
finlab.data

Core data-download module, providing historical data for the Taiwan and US stock markets.

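Use cases

  • Download historical price, financial-statement, and institutional/margin trading (chip) data
  • Filter data by market or industry category
  • Search available datasets and fields
  • Configure the data caching strategy
  • Limit the download range to save memory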

Quick examples

Basic usage: downloading data

from finlab import data

# Download closing prices
close = data.get('price:收盤價')

# Download P/E ratios
pe_ratio = data.get('price_earning_ratio:本益比')

# Download monthly revenue
revenue = data.get('monthly_revenue:當月營收')

Searching available fields

# Search for fields containing "收盤" (closing)
data.search('收盤')
# Output: ['price:收盤價', 'etl:不含除權息收盤價', ...]

# Search US stock data
data.search('close', market='us')

Restricting the market scope

# TWSE-listed companies only
with data.universe(market='TSE'):
    close = data.get('price:收盤價')

# Specific industry categories only
with data.universe(category=['水泥工業', '食品工業']):
    close = data.get('price:收盤價')

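Detailed tutorial

See the detailed data-retrieval guide for:
  • A complete data-download tutorial
  • Dataset structure
  • Advanced filtering techniques
  • Error handling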


Global configuration variables

Forcing cloud or local data

from finlab import data

# Force cloud data (re-download every time)
data.force_cloud_download = True

# Use the local cache only (offline environments)
data.use_local_data_only = True

Limiting the data date range

# Only keep 2020-2023 data (saves memory)
data.truncate_start = '2020-01-01'
data.truncate_end = '2023-12-31'

# All subsequent data.get() calls apply this range
close = data.get('price:收盤價')
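As a rough illustration of what a date-range limit does to a daily series, here is a plain-Python sketch that keeps only rows between a start and an end date. The helper name `truncate_rows` is hypothetical, inclusive bounds are an assumption, and this is not finlab's implementation; the sample values are the 0050 closes from the `data.get()` example output further down this page.

```python
from datetime import date


def truncate_rows(rows, start=None, end=None):
    """Keep (iso_date, value) rows whose date lies in [start, end].

    Hypothetical helper illustrating truncate_start / truncate_end;
    a bound of None leaves that side open. Inclusive bounds are an assumption.
    """
    lo = date.fromisoformat(start) if start else date.min
    hi = date.fromisoformat(end) if end else date.max
    return [(d, v) for d, v in rows if lo <= date.fromisoformat(d) <= hi]


# 0050 closing prices from the data.get() example output
rows = [
    ("2007-04-23", 57.85),
    ("2007-04-24", 58.1),
    ("2007-04-25", 57.6),
    ("2007-04-26", 57.7),
    ("2007-04-27", 57.5),
]

window = truncate_rows(rows, start="2007-04-24", end="2007-04-26")
print(window)  # [('2007-04-24', 58.1), ('2007-04-25', 57.6), ('2007-04-26', 57.7)]
```

Fewer rows kept in memory is the whole point: every downstream DataFrame built from a truncated table is proportionally smaller.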

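Recommended usage

  • Development: use truncate_start to limit the data range and speed up testing
  • Production backtests: remove the truncate limits and use the full history
  • Low memory: set truncate_start or use use_local_data_only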

API Reference

data.get()

finlab.data.get

get(dataset, save_to_storage=True, force_download=False)

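Download historical data.

See the historical data catalog for the names of all datasets, then use this function to fetch them. If save_to_storage is True, the program automatically keeps a local copy to avoid repeatedly downloading large amounts of data.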

PARAMETER DESCRIPTION
dataset

The name of the dataset.

TYPE: str

save_to_storage

Whether to save the dataset to storage for later use. Default is True. The argument will be removed in the future. Please use data.set_storage(FileStorage(use_cache=True)) instead.

TYPE: bool DEFAULT: True

force_download

Whether to force download the dataset from cloud. Default is False.

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION
DataFrame

financial data

Examples:

To download the closing-price history of all TWSE- and TPEx-listed stocks, simply call this function:

from finlab import data
close = data.get('price:收盤價')
close
date 0015 0050 0051 0052 0053
2007-04-23 9.54 57.85 32.83 38.4 nan
2007-04-24 9.54 58.1 32.99 38.65 nan
2007-04-25 9.52 57.6 32.8 38.59 nan
2007-04-26 9.59 57.7 32.8 38.6 nan
2007-04-27 9.55 57.5 32.72 38.4 nan

Note

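By default, data.get first downloads recent data and merges it with the local copy, to avoid repeatedly downloading large amounts of data.

To force a full download of all data, set the following before downloading:

data.force_cloud_download = True

To use only local data without any extra download, set the following before downloading:

data.use_local_data_only = True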
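The incremental behaviour described in this note can be pictured as merging a locally stored history with a freshly downloaded recent slice. The plain-Python sketch below uses dicts keyed by ISO date; `merge_incremental` is a hypothetical name, and letting the newer download win on overlapping dates is an assumption, not documented finlab behaviour. The values are the 0015 closes from the example output above.

```python
def merge_incremental(local, recent):
    """Merge cached {date: value} data with a recent download.

    Hypothetical helper: on overlapping dates the recent value wins
    (an assumption), and the result is returned in date order.
    """
    merged = dict(local)
    merged.update(recent)  # newer data overrides stale cached values
    return dict(sorted(merged.items()))


local = {"2007-04-23": 9.54, "2007-04-24": 9.54, "2007-04-25": 9.52}
recent = {"2007-04-25": 9.52, "2007-04-26": 9.59, "2007-04-27": 9.55}

history = merge_incremental(local, recent)
print(list(history))  # ['2007-04-23', '2007-04-24', '2007-04-25', '2007-04-26', '2007-04-27']
```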

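Common datasets

Price data:
  • price:收盤價 - daily closing price
  • price:開盤價 - daily opening price
  • price:最高價 / price:最低價 - daily high / low
  • price:成交股數 - trading volume (shares)

Fundamental data:
  • price_earning_ratio:本益比 - P/E ratio
  • price_earning_ratio:股價淨值比 - P/B ratio
  • fundamental_features:股東權益報酬率 - ROE
  • financial_statement:每股盈餘 - EPS

Chip (institutional and margin) data:
  • institutional_investors_trading_summary:投信買賣超股數 - investment-trust net buy/sell (shares)
  • margin_transactions:融資使用率 - margin utilization rate
  • etl:外資持股比例 - foreign ownership ratio

Monthly revenue:
  • monthly_revenue:當月營收 - revenue for the month
  • monthly_revenue:去年同月增減(%) - year-over-year change (%)

See the database catalog for the full list.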

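Common errors

  • KeyError: wrong dataset name, or API token not set
  • Empty DataFrame: query conditions too strict, or the data does not exist
  • Out of memory: too much data downloaded; limit the range with truncate_start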
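Because a misspelled dataset name is a common cause of `KeyError`, one option is to check a name against the list returned by `data.search()` before calling `data.get()`. The sketch below uses Python's standard `difflib.get_close_matches`; `suggest_names` is a hypothetical helper, and the candidate list is hard-coded from names on this page rather than fetched from FinLab.

```python
from difflib import get_close_matches


def suggest_names(name, available, n=3):
    """Return [name] if it exists, otherwise the closest known names.

    Hypothetical helper; `available` would normally come from data.search().
    """
    if name in available:
        return [name]
    return get_close_matches(name, available, n=n, cutoff=0.6)


available = [
    "price:收盤價",
    "price:開盤價",
    "price_earning_ratio:本益比",
    "monthly_revenue:當月營收",
]

print(suggest_names("price:收盤", available))
```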

data.search()

finlab.data.search

search(keyword=None, market='tw')

Search the fields available in the FinLab database.

PARAMETER DESCRIPTION
keyword

Search keyword. If None, all fields are listed.

TYPE: str DEFAULT: None

market

Market to search ('tw', 'us', 'all'). Default 'tw'.

TYPE: str DEFAULT: 'tw'

RETURNS DESCRIPTION
list

List of dataset names usable with data.get(), in the format "table:column"

TYPE: list
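To make the `"table:column"` format concrete, here is a plain-Python sketch of keyword filtering over such names and splitting a name into its table and column parts. `filter_names` and `split_name` are hypothetical helpers; case-insensitive substring matching is an assumption about how the search behaves, and the names are taken from the examples on this page.

```python
def filter_names(names, keyword=None):
    """Return names containing `keyword` (case-insensitive); all names if None."""
    if keyword is None:
        return list(names)
    kw = keyword.lower()
    return [n for n in names if kw in n.lower()]


def split_name(name):
    """Split 'table:column' into (table, column) at the first colon."""
    table, column = name.split(":", 1)
    return table, column


names = [
    "price:收盤價",
    "etl:不含除權息收盤價",
    "price_earning_ratio:本益比",
    "us_price:close",
    "us_div_adj_price:adj_close",
]

print(filter_names(names, "收盤"))  # ['price:收盤價', 'etl:不含除權息收盤價']
print(split_name("us_div_adj_price:adj_close"))  # ('us_div_adj_price', 'adj_close')
```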

Examples:

# List all Taiwan stock fields
tw_data = data.search()

# Search Taiwan stock fields containing '收盤'
close_data = data.search('收盤', market='tw')
# ['price:收盤價', 'etl:不含除權息收盤價', ...]

# Search US stock fields containing 'close'
us_close = data.search('close', market='us')
# ['us_price:close', 'us_div_adj_price:adj_close', ...]

# Search fields containing 'price' across all markets
all_price = data.search('price', market='all')

Examples

# Search fields containing "股東" (shareholder)
data.search('股東')
# ['fundamental_features:股東權益報酬率', 'internal_equity_pledge:百分之十以上大股東持有股數', ...]

# Search US P/E ratio fields
data.search('pe', market='us')

# List all Taiwan stock fields
all_fields = data.search()
print(f"{len(all_fields)} fields in total")

data.universe()

finlab.data.universe

universe

universe(market='ALL', category='ALL', exclude_category=None)

Context manager to set a global stock universe filter for data retrieval.

This context manager limits the set of stocks returned by functions such as finlab.data.get(...) and finlab.data.indicator(...) to a specific market and category selection. The filter is applied globally within the context and is restored after the context exits.

Parameters

market : str, default 'ALL'

Market scope to include. Supported values:
  • 'ALL': no market filter
  • 'TSE': TWSE (sii)
  • 'OTC': TPEx (otc)
  • 'TSE_OTC': include both TWSE and TPEx
  • 'ETF': exchange-traded funds
  • 'STOCK_FUTURE': underlying of single-stock futures/equity options

category : str | list[str], default 'ALL'

Category name(s) to include. Supports regex-like substring matching. For example, '電子' will match '電子工業', '電子通路業', etc. When a list is provided, the union of all matched categories is included.

exclude_category : str | list[str] | None, default None

Category name(s) to exclude from the resulting universe. Also supports regex-like substring matching. When None, no exclusion is applied.
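The include/exclude rules above (regex-like substring matching, union over a list, then exclusion) can be sketched in plain Python with `re.search`. `match_categories` is a hypothetical helper, not finlab's code, and the category names are the ones mentioned on this page.

```python
import re


def _as_list(patterns):
    return [patterns] if isinstance(patterns, str) else list(patterns)


def match_categories(categories, include="ALL", exclude=None):
    """Select categories by regex-like substring patterns.

    include: 'ALL', a pattern, or a list of patterns (union of matches).
    exclude: None, a pattern, or a list of patterns removed afterwards.
    """
    if include == "ALL":
        selected = set(categories)
    else:
        pats = _as_list(include)
        selected = {c for c in categories if any(re.search(p, c) for p in pats)}
    if exclude is not None:
        ex = _as_list(exclude)
        selected = {c for c in selected if not any(re.search(p, c) for p in ex)}
    return selected


categories = ["電子工業", "電子通路業", "水泥工業", "食品工業", "鋼鐵工業", "航運業", "半導體業"]

print(sorted(match_categories(categories, "電子")))
print(sorted(match_categories(categories, ["鋼鐵", "航運"])))
print(sorted(match_categories(categories, "工業", exclude="電子")))
```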

Notes
  • The filter is applied to the internal universe_stocks set, which is then used by the data processing pipeline to select columns/rows corresponding to the chosen stocks.
  • Inside the context, calls to data.get(...) will return data limited to the specified universe whenever applicable.
  • After exiting the with block, the previous universe is restored.
Examples

Limit to TSE/OTC and include only specific categories:

from finlab import data
with data.universe(market='TSE_OTC', category=['鋼鐵工業', '航運業']):
    close = data.get('price:收盤價')

Include categories but exclude financial-related stocks:

from finlab import data
with data.universe('TSE_OTC', ['鋼鐵工業', '航運業'], exclude_category=['金融']):
    close = data.get('price:收盤價')

Equivalent global (non-context) usage:

from finlab import data
data.set_universe(market='TSE_OTC', category='水泥', exclude_category='ETF')
close = data.get('price:收盤價')

us_universe

us_universe(market='ALL', sector='ALL', industry='ALL', exchange='ALL', exclude_delisted=True, exclude_special=True)

Context manager to set a global stock universe filter for US market data retrieval.

This context manager limits the set of US stocks returned by data functions to a specific market category, sector, industry, and exchange selection. The filter is applied globally within the context and is restored after the context exits.

Parameters

market : str, default 'ALL'

Market category to include. Supported values:
  • 'ALL': include all categories (default)
  • 'Common Stock': both ADR and Domestic common stocks
  • 'Preferred Stock': both ADR and Domestic preferred stocks
  • 'ADR': American Depositary Receipts
  • 'Domestic': Domestic stocks

sector : str | list[str], default 'ALL'

Sector name(s) to include. Supports regex-like substring matching. For example, 'Technology' will match the 'Technology' sector. When a list is provided, the union of all matched sectors is included.

industry : str | list[str], default 'ALL'

Industry name(s) to include. Supports regex-like substring matching. For example, 'Software' will match 'Software - Application', 'Software - Infrastructure', etc. When a list is provided, the union of all matched industries is included.

exchange : str | list[str], default 'ALL'

Exchange name(s) to include. Common values: 'NASDAQ', 'NYSE', 'AMEX'. When a list is provided, stocks from any of the listed exchanges are included.

exclude_delisted : bool, default True

Whether to exclude delisted stocks (isdelisted='Y'). Recommended to keep as True, since 61% of stocks in the dataset are delisted.

exclude_special : bool, default True

Whether to exclude special categories: Warrants, Rights, Units, and Closed-End Funds (CEF). These categories typically lack sector/industry information.

Notes
  • About 61% of stocks in the us_tickers dataset are delisted (isdelisted='Y').
  • ETF and CEF categories typically have None values for sector and industry.
  • After filtering with default settings, approximately 5,000-7,000 active common stocks remain.
  • The filter modifies the global universe_stocks set used by the data processing pipeline.
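As a rough illustration of how the default exclusions narrow the US universe, here is a plain-Python filter over ticker records. The record fields other than `isdelisted`, the category labels, and the placeholder tickers are all assumptions made for the example; this is not the layout of the actual us_tickers dataset, and `filter_us_tickers` is a hypothetical helper.

```python
SPECIAL_CATEGORIES = ("Warrant", "Right", "Unit", "CEF")  # assumed labels


def filter_us_tickers(records, exchange="ALL", exclude_delisted=True, exclude_special=True):
    """Return ticker symbols that pass the exchange and exclusion filters.

    Hypothetical helper: each record is a dict with 'ticker', 'category',
    'exchange', and 'isdelisted' ('Y' or 'N') keys.
    """
    if exchange == "ALL":
        exchanges = None
    else:
        exchanges = {exchange} if isinstance(exchange, str) else set(exchange)
    kept = []
    for r in records:
        if exclude_delisted and r["isdelisted"] == "Y":
            continue
        if exclude_special and any(s in r["category"] for s in SPECIAL_CATEGORIES):
            continue
        if exchanges is not None and r["exchange"] not in exchanges:
            continue
        kept.append(r["ticker"])
    return kept


records = [  # placeholder records, not real data
    {"ticker": "T1", "category": "Domestic Common Stock", "exchange": "NASDAQ", "isdelisted": "N"},
    {"ticker": "T2", "category": "Domestic Common Stock", "exchange": "NYSE", "isdelisted": "Y"},
    {"ticker": "T3", "category": "Domestic Warrant", "exchange": "NASDAQ", "isdelisted": "N"},
    {"ticker": "T4", "category": "ADR Common Stock", "exchange": "NYSE", "isdelisted": "N"},
]

print(filter_us_tickers(records))  # ['T1', 'T4']
print(filter_us_tickers(records, exchange="NASDAQ"))  # ['T1']
```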
Examples

Limit to active NASDAQ Technology stocks:

from finlab import data
with data.us_universe(sector='Technology', exchange='NASDAQ'):
    close = data.get('us_price:close')

Include all active common stocks on major exchanges:

from finlab import data
with data.us_universe(market='Common Stock', exchange=['NASDAQ', 'NYSE']):
    close = data.get('us_price:close')

Include delisted stocks for historical analysis:

from finlab import data
with data.us_universe(exclude_delisted=False):
    close = data.get('us_price:close')

set_us_universe

set_us_universe(market='ALL', sector='ALL', industry='ALL', exchange='ALL', exclude_delisted=True, exclude_special=True)

Set global US stock universe filter.

This function updates the global universe_stocks set to limit data retrieval to a specific subset of US stocks based on market category, sector, industry, exchange, and exclusion criteria.

Parameters

market : str, default 'ALL'
Market category filter (see the us_universe class for details).

sector : str | list[str], default 'ALL'
Sector filter with regex-like substring matching.

industry : str | list[str], default 'ALL'
Industry filter with regex-like substring matching.

exchange : str | list[str], default 'ALL'
Exchange filter (e.g., 'NASDAQ', 'NYSE', 'AMEX').

exclude_delisted : bool, default True
Exclude stocks with isdelisted='Y' (recommended, as 61% are delisted).

exclude_special : bool, default True
Exclude Warrants, Rights, Units, and Closed-End Funds.

Notes

This function modifies the global universe_stocks variable.

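Examples

# Example 1: TWSE-listed companies only
with data.universe(market='TSE'):
    close = data.get('price:收盤價')
    print(f"Number of TWSE-listed companies: {len(close.columns)}")

# Example 2: a specific category
with data.universe(category=['半導體業']):
    close = data.get('price:收盤價')

# Example 3: exclude a category
with data.universe(market='TSE_OTC', exclude_category=['金融']):
    close = data.get('price:收盤價')

# Example 4: combined conditions
with data.universe(market='TSE_OTC', category=['電子工業']):
    close = data.get('price:收盤價')

Available market values:
  • 'TSE': TWSE-listed
  • 'OTC': TPEx-listed
  • 'TSE_OTC': TWSE + TPEx
  • 'ETF': exchange-traded funds
  • 'STOCK_FUTURE': underlyings of single-stock futures/equity options
  • 'ALL': all stocks (including the emerging-stock board)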

data.us_universe()

finlab.data.us_universe


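US market filtering

# Active Technology stocks listed on NASDAQ
with data.us_universe(sector='Technology', exchange='NASDAQ'):
    close = data.get('us_price:close')

# Active common stocks on NASDAQ and NYSE
with data.us_universe(market='Common Stock', exchange=['NASDAQ', 'NYSE']):
    close = data.get('us_price:close')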

data.indicator()

finlab.data.indicator

indicator(indname, adjust_price=False, resample='D', **kwargs)

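Supports hundreds of technical indicators from TA-Lib and pandas_ta, computed for about 2,000 stocks over 10 years of history.

Before using this function, install the packages used to compute technical indicators (TA-Lib and/or pandas_ta).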

PARAMETER DESCRIPTION
indname

Indicator name. For TA-Lib, e.g. SMA, STOCH, RSI; see the talib documentation.

For pandas-ta, e.g. supertrend, ssf; see the pandas-ta documentation.

TYPE: str

adjust_price

Whether to compute the indicator on adjusted prices.

TYPE: bool DEFAULT: False

resample

Price period for the indicator, e.g. D for daily, W for weekly, M for monthly bars.

TYPE: str DEFAULT: 'D'

market

Market selection, e.g. TW_STOCK for Taiwan stocks, US_STOCK for US stocks.

TYPE: str

**kwargs

Indicator parameters. Taking TA-Lib's RSI as an example, the adjustable setting is the calculation period, timeperiod=14.

TYPE: dict DEFAULT: {}

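We suggest starting from the examples below together with the official TA-Lib documentation; that is enough to learn how to build technical indicators.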

Technical indicator examples

from finlab import data

# MACD indicator (TA-Lib indicator name)
macd = data.indicator('MACD')

# RSI with a 14-period window; extra keyword arguments are passed to the indicator
rsi = data.indicator('RSI', timeperiod=14)
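The `resample` parameter changes the bar period the indicator is computed on. As a plain-Python illustration of going from daily to weekly bars, the sketch below keeps the last close in each ISO week. `to_weekly_close` is a hypothetical helper, using the last daily close as the weekly close is an assumption about how such bars are built, and the prices are made-up illustrative values.

```python
from datetime import date


def to_weekly_close(daily):
    """Collapse (iso_date, close) pairs to one close per ISO week.

    Assumes `daily` is sorted by date; the last close of each week wins.
    """
    weekly = {}
    for d, close in daily:
        year, week = date.fromisoformat(d).isocalendar()[:2]
        weekly[(year, week)] = close  # later days overwrite earlier ones
    return weekly


daily = [  # illustrative prices spanning two ISO weeks (2024-01-08 is a Monday)
    ("2024-01-04", 100.0),
    ("2024-01-05", 101.0),
    ("2024-01-08", 102.5),
    ("2024-01-09", 101.5),
]

print(to_weekly_close(daily))  # {(2024, 1): 101.0, (2024, 2): 101.5}
```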

Cache management

finlab.data.set_storage

set_storage(storage)

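Set how historical data is stored locally. By default, data fetched with data.get is automatically copied locally to avoid repeatedly downloading large amounts of data; storage is the interface used to store it. Two storage interfaces are provided: finlab.data.CacheStorage, which keeps data in memory, and finlab.data.FileStorage, which keeps data in files. See CacheStorage and FileStorage for details. By default, the program uses finlab.data.FileStorage and stores repeatedly requested data in the operating system's default temporary folder.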

PARAMETER DESCRIPTION
storage

The storage interface to use.

TYPE: Storage

Examples:

To switch to file-based storage:

from finlab import data
data.set_storage(data.FileStorage())
close = data.get('price:收盤價')

The downloaded data can then be found locally in ./finlab_db/price#收盤價.pickle and loaded with pickle:

import pickle
with open('finlab_db/price#收盤價.pickle', 'rb') as f:
    close = pickle.load(f)
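The file name above follows a simple convention: the `:` in the dataset name becomes `#` and `.pickle` is appended. A minimal sketch of that mapping plus a pickle round trip in a temporary folder follows; `dataset_path`, `save_dataset`, and `load_dataset` are hypothetical helpers inferred from the example path, not FileStorage's actual code.

```python
import os
import pickle
import tempfile


def dataset_path(root, dataset):
    """Map 'price:收盤價' to '<root>/price#收盤價.pickle' (inferred convention)."""
    return os.path.join(root, dataset.replace(":", "#") + ".pickle")


def save_dataset(root, dataset, obj):
    os.makedirs(root, exist_ok=True)
    with open(dataset_path(root, dataset), "wb") as f:
        pickle.dump(obj, f)


def load_dataset(root, dataset):
    with open(dataset_path(root, dataset), "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    root = os.path.join(tmp, "finlab_db")
    save_dataset(root, "price:收盤價", {"0050": [57.85, 58.1]})
    restored = load_dataset(root, "price:收盤價")
    filename = os.path.basename(dataset_path(root, "price:收盤價"))

print(filename)  # price#收盤價.pickle
```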

finlab.data.CacheStorage

CacheStorage()

Stores historical data in an in-memory cache.

Examples:

To switch to in-memory cache storage:

from finlab import data
data.set_storage(data.CacheStorage())
close = data.get('price:收盤價')

The cached data can be accessed directly:

close = data._storage._cache['price:收盤價']

finlab.data.FileStorage

FileStorage(path=None, use_cache=True)

Stores historical data in files.

PARAMETER DESCRIPTION
path

Path where the data is stored.

TYPE: str DEFAULT: None

use_cache

Whether to additionally keep a copy of the data in memory as a cache.

TYPE: bool DEFAULT: True

Examples:

To switch to file-based storage:

from finlab import data
data.set_storage(data.FileStorage())
close = data.get('price:收盤價')

The downloaded data can then be found locally in ./finlab_db/price#收盤價.pickle and loaded with pickle:

import pickle
with open('finlab_db/price#收盤價.pickle', 'rb') as f:
    close = pickle.load(f)
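`use_cache=True` keeps an in-memory copy on top of the files. The idea can be sketched as a small class that reads from disk only on a cache miss. `LayeredStorage` and its `save`/`load` methods are hypothetical and do not reflect FileStorage's real interface; the `disk_reads` counter exists only to make the effect visible.

```python
import os
import pickle
import tempfile


class LayeredStorage:
    """Pickle files on disk with an optional in-memory cache in front."""

    def __init__(self, path, use_cache=True):
        self.path = path
        self.use_cache = use_cache
        self._cache = {}
        self.disk_reads = 0  # counts how often the file layer is hit
        os.makedirs(path, exist_ok=True)

    def _file(self, name):
        return os.path.join(self.path, name.replace(":", "#") + ".pickle")

    def save(self, name, obj):
        with open(self._file(name), "wb") as f:
            pickle.dump(obj, f)
        if self.use_cache:
            self._cache[name] = obj

    def load(self, name):
        if self.use_cache and name in self._cache:
            return self._cache[name]  # served from memory, no disk read
        with open(self._file(name), "rb") as f:
            obj = pickle.load(f)
        self.disk_reads += 1
        if self.use_cache:
            self._cache[name] = obj
        return obj


with tempfile.TemporaryDirectory() as tmp:
    cached = LayeredStorage(tmp, use_cache=True)
    cached.save("price:收盤價", [57.85, 58.1])
    first = cached.load("price:收盤價")
    cached.load("price:收盤價")

    uncached = LayeredStorage(tmp, use_cache=False)
    uncached.load("price:收盤價")
    uncached.load("price:收盤價")

print(cached.disk_reads, uncached.disk_reads)  # 0 2
```

The trade-off is memory for speed: the cached layer never touches disk again, at the cost of holding every loaded dataset in RAM.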

diagnose

diagnose(dataset=None)

Diagnose the state of local storage.

PARAMETER DESCRIPTION
dataset

Name of the dataset to check, e.g. 'price:收盤價'. If omitted, all local data is listed.

TYPE: str DEFAULT: None

Examples:

from finlab import data
data._storage.diagnose()  # list all local data
data._storage.diagnose('price:收盤價')  # check a specific dataset
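For a rough idea of what listing local data involves, here is a sketch that scans a folder of pickle files and turns each file name back into a dataset name by reversing the `:` to `#` convention seen in `./finlab_db/price#收盤價.pickle`. `list_local_datasets` is a hypothetical helper; it is not how `diagnose()` is implemented.

```python
import os
import tempfile


def list_local_datasets(folder):
    """Return (dataset_name, size_in_bytes) for each .pickle file in folder."""
    found = []
    for entry in sorted(os.listdir(folder)):
        if entry.endswith(".pickle"):
            name = entry[: -len(".pickle")].replace("#", ":", 1)
            found.append((name, os.path.getsize(os.path.join(folder, entry))))
    return found


with tempfile.TemporaryDirectory() as tmp:
    for fname in ("price#收盤價.pickle", "monthly_revenue#當月營收.pickle", "notes.txt"):
        with open(os.path.join(tmp, fname), "wb") as f:
            f.write(b"x" * 10)  # dummy 10-byte payload
    datasets = list_local_datasets(tmp)

print([name for name, _ in datasets])
```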

Custom caching strategy

from finlab import data
from finlab.data import set_storage, FileStorage

# Use a custom folder
storage = FileStorage('/path/to/custom/cache')
set_storage(storage)

# All subsequent data is cached in that location
close = data.get('price:收盤價')

Other tools

finlab.data.get_strategies

get_strategies(api_token=None)

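Retrieve the results of strategies uploaded to the quant platform.

Gives access to the data on your own strategy dashboard, such as each strategy's equity curve, return statistics, Sharpe ratio, recent positions, and recent rebalancing dates. This data can be used to combine and compare multiple strategies.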

PARAMETER DESCRIPTION
api_token

If the finlab module's api_token is not provided, a GUI page pops up automatically; copy the api_token from the web page and paste it into the input field.

TYPE: str DEFAULT: None

Returns: (dict) strategy data. Response details:

``` py
{
  strategy1:{
    'asset_type': '',
    'drawdown_details': {
       '2015-06-04': {
         'End': '2015-11-03',
         'Length': 152,
         'drawdown': -0.19879090089478024
         },
         ...
      },
    'fee_ratio': 0.000475,
    'last_trading_date': '2022-06-10',
    'last_updated': 'Sun, 03 Jul 2022 12:02:27 GMT',
    'ndays_return': {
      '1': -0.01132480035770611,
      '10': -0.0014737286933147464,
      '20': -0.06658015749110646,
      '5': -0.002292995729485159,
      '60': -0.010108700314771735
      },
    'next_trading_date': '2022-06-10',
    'positions': {
      '1413 宏洲': {
        'entry_date': '2022-05-10',
        'entry_price': 10.05,
        'exit_date': '',
        'next_weight': 0.1,
        'return': -0.010945273631840613,
        'status': '買進',
        'weight': 0.1479332345384493
        },
      'last_updated': 'Sun, 03 Jul 2022 12:02:27 GMT',
      'next_trading_date': '2022-06-10',
      'trade_at': 'open',
      'update_date': '2022-06-10'
      },
    'return_table': {
      '2014': {
        'Apr': 0.0,
        'Aug': 0.06315180932606546,
        'Dec': 0.0537589857541485,
        'Feb': 0.0,
        'Jan': 0.0,
        'Jul': 0.02937490104459939,
        'Jun': 0.01367930162104769,
        'Mar': 0.0,
        'May': 0.0,
        'Nov': -0.0014734320286596825,
        'Oct': -0.045082529665408266,
        'Sep': 0.04630906972509852,
        'YTD': 0.16626214846456966
        },
        ...
      },
    'returns': {
      'time': [
        '2014-06-10',
        '2014-06-11',
        '2014-06-12',
        ...
        ],
      'value': [
        100,
        99.9,
        100.2,
        ...
        ]
      },
    'stats': {
      'avg_down_month': -0.03304015302646822,
      'avg_drawdown': -0.0238021414698247,
      'avg_drawdown_days': 19.77952755905512,
      'avg_up_month': 0.05293384465715908,
      'cagr': 0.33236021285588846,
      'calmar': 1.65261094975066,
      'daily_kurt': 4.008888367138843,
      'daily_mean': 0.3090784769257415,
      'daily_sharpe': 1.747909002374217,
      'daily_skew': -0.6966018726321078,
      'daily_sortino': 2.8300677082214034,
      ...
      },
    'tax_ratio': 0.003,
    'trade_at': 'open',
    'update_date': '2022-06-10'
    },
  strategy2:{...},
  ...}
```
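Working with this response mostly means walking nested dicts. One detail worth noting: inside `positions`, stock entries sit alongside metadata keys such as `last_updated` and `trade_at`. The sketch below pulls headline stats and current holdings from a trimmed copy of the example response above; `summarize` and `holdings` are hypothetical helpers, and treating every dict-valued entry in `positions` as a holding is an assumption based on this example.

```python
def holdings(strategy):
    """Map stock name -> current weight, skipping non-dict metadata entries."""
    return {
        name: info["weight"]
        for name, info in strategy.get("positions", {}).items()
        if isinstance(info, dict)
    }


def summarize(strategies):
    """Return (name, cagr, daily_sharpe) tuples sorted by Sharpe, highest first."""
    rows = [
        (name, s["stats"].get("cagr"), s["stats"].get("daily_sharpe"))
        for name, s in strategies.items()
    ]
    return sorted(rows, key=lambda r: r[2] if r[2] is not None else float("-inf"), reverse=True)


# Trimmed copy of the example response shown above
strategies = {
    "strategy1": {
        "positions": {
            "1413 宏洲": {"entry_date": "2022-05-10", "weight": 0.1479332345384493},
            "last_updated": "Sun, 03 Jul 2022 12:02:27 GMT",
            "trade_at": "open",
        },
        "stats": {"cagr": 0.33236021285588846, "daily_sharpe": 1.747909002374217},
    },
}

print(summarize(strategies))
print(holdings(strategies["strategy1"]))  # {'1413 宏洲': 0.1479332345384493}
```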

FAQ

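Q: How do I find out which datasets are available?

Method 1: use search()

# List all fields and print the first 10
all_fields = data.search()
for field in all_fields[:10]:
    print(field)

Method 2: browse the online catalog. Go to the FinLab database catalog to see the full list.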

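Q: What if downloading is slow?

# Method 1: limit the date range
data.truncate_start = '2020-01-01'

# Method 2: rely on the cache (the second call is fast)
close = data.get('price:收盤價')  # first call: slow
close = data.get('price:收盤價')  # second call: fast (served from cache)

# Method 3: limit the stocks returned with universe
with data.universe(market='TSE'):
    close = data.get('price:收盤價')  # TWSE-listed stocks only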

Q: What do I do about KeyError: 'price:收盤價'?

Possible causes:
  1. Not logged in: run finlab.login() or finlab.login('YOUR_TOKEN')
  2. Wrong field name: use data.search('收盤') to confirm the correct name
  3. Invalid API token: obtain a new token

import finlab

# Check whether you are logged in
try:
    token, token_type = finlab.get_token()
    print(f"✅ Logged in ({token_type})")
except Exception:
    print("❌ Not logged in, please run finlab.login()")

Q: How do I download US stock data?

from finlab import data

# US closing prices
us_close = data.get('us_price:close')

# Search US fields
us_fields = data.search(market='us')

Q: What about missing values?

close = data.get('price:收盤價')

# Check the share of missing values
print(f"Missing-value ratio: {close.isna().sum().sum() / close.size:.2%}")

# Forward-fill missing values
close_filled = close.ffill()

# Or drop stocks with too many missing values
close_clean = close.dropna(axis=1, thresh=int(len(close) * 0.8))  # keep stocks with at least 80% of rows present

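Q: How do I save memory?

# Method 1: limit the date range
data.truncate_start = '2020-01-01'

# Method 2: process in batches
close = data.get('price:收盤價')
all_stocks = close.columns
for batch in [all_stocks[i:i+100] for i in range(0, len(all_stocks), 100)]:
    batch_close = close[batch]
    # process 100 stocks at a time...

# Method 3: only download the datasets you need
# avoid calling data.get() on too many datasets at once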

Resources