finlab.data
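Core data-download module, providing historical data for Taiwan and US stocks.
Use cases
- Download historical data such as prices, financial statements, and trading-flow (chip) data
- Filter data to specific markets or industry categories
- Search the available tables and columns
- Configure the data-caching strategy
- Limit the download range to save memory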
Quick examples
Basic usage: downloading data
```python
from finlab import data
# Closing prices
close = data.get('price:收盤價')
# Price-to-earnings ratio
pe_ratio = data.get('price_earning_ratio:本益比')
# Monthly revenue
revenue = data.get('monthly_revenue:當月營收')
```
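Searching available columns
```python
from finlab import data
# Columns whose name contains 「收盤」 (closing)
data.search('收盤')
# Output: ['price:收盤價', 'etl:不含除權息收盤價', ...]
# Search US stock data
data.search('close', market='us')
```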
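Limiting the market scope
```python
from finlab import data
# Listed (TSE) companies only
with data.universe(market='TSE'):
    close = data.get('price:收盤價')
# Specific industry categories only (cement and food)
with data.universe(category=['水泥工業', '食品工業']):
    close = data.get('price:收盤價')
```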
Detailed tutorial
See the data retrieval guide for:
- A complete data-download tutorial
- Table structure
- Advanced filtering techniques
- Error handling
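Global configuration
Forcing cloud or local data
```python
from finlab import data
# Always use cloud data (re-download on every call)
data.force_cloud_download = True
# Alternatively, use only the local cache (offline environments)
# data.use_local_data_only = True
```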
Limiting the date range
```python
from finlab import data
# Keep only 2020-2023 data (saves memory)
data.truncate_start = '2020-01-01'
data.truncate_end = '2023-12-31'
# Every later data.get() call applies this range
close = data.get('price:收盤價')
```
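Conceptually, `truncate_start` and `truncate_end` trim the date index of each table that `data.get()` returns. The sketch below reproduces that row filtering on a plain list of `(date, value)` rows. The helper `truncate_rows` is hypothetical, and treating both bounds as inclusive is an assumption, not confirmed finlab behavior.

```python
from datetime import date


def truncate_rows(rows, start=None, end=None):
    """Keep rows whose date falls within [start, end].

    Hypothetical helper that mimics data.truncate_start / data.truncate_end
    on a date-indexed table. Both bounds are assumed inclusive; None means
    the range is open on that side.
    """
    start_d = date.fromisoformat(start) if start else None
    end_d = date.fromisoformat(end) if end else None
    kept = []
    for day, value in rows:
        if start_d is not None and day < start_d:
            continue  # before the start bound
        if end_d is not None and day > end_d:
            continue  # after the end bound
        kept.append((day, value))
    return kept


rows = [
    (date(2019, 12, 31), 1.0),
    (date(2020, 1, 2), 2.0),
    (date(2023, 12, 29), 3.0),
    (date(2024, 1, 2), 4.0),
]
print(truncate_rows(rows, '2020-01-01', '2023-12-31'))
```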
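Recommended usage
- Development: set `truncate_start` to limit the data range and speed up testing
- Production backtests: remove the truncate limits and use the full history
- Low memory: set `truncate_start` to narrow the range kept in memory
- Offline: set `use_local_data_only` to read only from the local cache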
API Reference
data.get()
finlab.data.get
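Download historical data.
See the historical data catalog for the names of all available datasets, then pass a name to this function to retrieve it.
If `save_to_storage` is True, a local copy is saved automatically so large datasets are not downloaded repeatedly.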
| PARAMETER | DESCRIPTION |
|---|---|
| `dataset` | The name of the dataset. TYPE: `str` |
| `save_to_storage` | Whether to save the dataset to storage for later use. Default is True. This argument will be removed in the future; use `data.set_storage(FileStorage(use_cache=True))` instead. TYPE: `bool` |
| `force_download` | Whether to force downloading the dataset from the cloud. Default is False. TYPE: `bool` |

| RETURNS | DESCRIPTION |
|---|---|
| `DataFrame` | Financial data |
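Examples:
To download the closing-price history of all TSE- and OTC-listed stocks, one call to `data.get('price:收盤價')` is enough. The first rows look like this: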
| date | 0015 | 0050 | 0051 | 0052 | 0053 |
|---|---|---|---|---|---|
| 2007-04-23 | 9.54 | 57.85 | 32.83 | 38.4 | nan |
| 2007-04-24 | 9.54 | 58.1 | 32.99 | 38.65 | nan |
| 2007-04-25 | 9.52 | 57.6 | 32.8 | 38.59 | nan |
| 2007-04-26 | 9.59 | 57.7 | 32.8 | 38.6 | nan |
| 2007-04-27 | 9.55 | 57.5 | 32.72 | 38.4 | nan |
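In the returned DataFrame, each column is one stock ID and each row is one trading date. NaN marks dates where a stock has no data, as with 0053 above. pandas users would normally call `close.pct_change()`. As a stdlib-only sketch, the rows above can be turned into daily returns like this. The dict layout is only a stand-in for the DataFrame.

```python
import math

# The example output above as a plain mapping: one list per stock ID,
# aligned to the same trading dates (NaN = no data for that stock).
dates = ['2007-04-23', '2007-04-24', '2007-04-25', '2007-04-26', '2007-04-27']
close = {
    '0050': [57.85, 58.1, 57.6, 57.7, 57.5],
    '0053': [math.nan] * 5,
}


def daily_returns(prices):
    """Simple day-over-day returns; NaN whenever either price is missing."""
    out = []
    for prev, cur in zip(prices, prices[1:]):
        if math.isnan(prev) or math.isnan(cur):
            out.append(math.nan)
        else:
            out.append(cur / prev - 1)
    return out


r = daily_returns(close['0050'])
print([round(x, 4) for x in r])
```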
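Common tables
Price data:
- `price:收盤價` - daily close
- `price:開盤價` - daily open
- `price:最高價` / `price:最低價` - daily high / low
- `price:成交股數` - trading volume (shares)
Financial data:
- `price_earning_ratio:本益比` - P/E ratio
- `price_earning_ratio:股價淨值比` - P/B ratio
- `fundamental_features:股東權益報酬率` - ROE
- `financial_statement:每股盈餘` - EPS
Trading-flow (chip) data:
- `institutional_investors_trading_summary:投信買賣超股數` - investment trust net buy/sell (shares)
- `margin_transactions:融資使用率` - margin utilization rate
- `etl:外資持股比例` - foreign ownership ratio
Monthly revenue:
- `monthly_revenue:當月營收` - revenue for the month
- `monthly_revenue:去年同月增減(%)` - change vs. the same month last year (%)
See the database catalog for the full list.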
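Common errors
- KeyError: wrong table name, or the API token is not set
- Empty DataFrame: filter conditions too strict, or the data does not exist
- Out of memory: too much data downloaded; limit the range with `truncate_start`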
data.search()
finlab.data.search
Search the FinLab database for available data columns.
| PARAMETER | DESCRIPTION |
|---|---|
| `keyword` | Search keyword. If None, all columns are listed. TYPE: `str` or `None` |
| `market` | Market: 'tw', 'us', or 'all'. Default 'tw'. TYPE: `str` |

| RETURNS | DESCRIPTION |
|---|---|
| `list` | Names usable with `data.get()`, in "table:column" format. TYPE: `list[str]` |
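Each result is a `table:column` string that can be passed straight to `data.get()`. The sketch below mimics a keyword search over a small, hand-written list of names and splits each hit into its table and column. The `CATALOG` list and the `search_names` helper are illustrative assumptions, not the real FinLab catalog or implementation.

```python
# Illustrative subset of dataset names (not the full FinLab catalog).
CATALOG = [
    'price:收盤價',
    'price:開盤價',
    'etl:不含除權息收盤價',
    'monthly_revenue:當月營收',
]


def search_names(keyword=None, names=CATALOG):
    """Return names containing keyword, or all names when keyword is None."""
    if keyword is None:
        return list(names)
    return [n for n in names if keyword in n]


for name in search_names('收盤'):
    table, column = name.split(':', 1)  # "table:column" -> two parts
    print(table, column)
```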
Examples:
```python
from finlab import data
# List all Taiwan-stock columns
tw_data = data.search()
# Taiwan-stock columns containing '收盤' (closing)
close_data = data.search('收盤', market='tw')
# e.g. ['price:收盤價', ...]
# US-stock columns containing 'close'
us_close = data.search('close', market='us')
# ['us_price:close', 'us_div_adj_price:adj_close', ...]
# Columns containing 'price' across all markets
all_price = data.search('price', market='all')
```
More examples:
```python
from finlab import data
# Columns containing 「股東」 (shareholder)
data.search('股東')
# ['fundamental_features:股東權益報酬率', 'internal_equity_pledge:百分之十以上大股東持有股數', ...]
# US P/E ratio
data.search('pe', market='us')
# List every Taiwan-stock column
all_fields = data.search()
print(f"{len(all_fields)} columns in total")
```
data.universe()
finlab.data.universe
universe
Context manager to set a global stock universe filter for data retrieval.
This context manager limits the set of stocks returned by functions such as
finlab.data.get(...) and finlab.data.indicator(...) to a specific market
and category selection. The filter is applied globally within the context and
is restored after the context exits.
Parameters
market : str, default 'ALL'
Market scope to include. Supported values:
- 'ALL': no market filter
- 'TSE': TWSE (sii)
- 'OTC': TPEx (otc)
- 'TSE_OTC': include both TWSE and TPEx
- 'ETF': exchange-traded funds
- 'STOCK_FUTURE': underlying of single-stock futures/equity options
category : str | list[str], default 'ALL'
Category name(s) to include. Supports regex-like substring matching. For example, '電子' matches '電子工業', '電子通路業', etc. When a list is provided, the union of all matched categories is included.
exclude_category : str | list[str] | None, default None
Category name(s) to exclude from the resulting universe. Also supports regex-like substring matching. When None, no exclusion is applied.
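The regex-like matching described for `category` and `exclude_category` can be sketched with Python's `re` module. A stock is kept if its category matches any include pattern (a union), then dropped if it matches any exclude pattern. The category table and stock IDs below are placeholders, and `select_stocks` is a hypothetical helper, not finlab's implementation.

```python
import re

# Hypothetical stock -> category table (IDs are placeholders).
CATEGORIES = {
    'S1': '電子工業',
    'S2': '電子通路業',
    'S3': '水泥工業',
    'S4': '金融保險業',
    'S5': '食品工業',
}


def _as_list(value):
    """Accept a single pattern or a list of patterns."""
    return [value] if isinstance(value, str) else list(value)


def select_stocks(category='ALL', exclude_category=None, categories=CATEGORIES):
    """Union of regex matches for `category`, minus matches for `exclude_category`."""
    if category == 'ALL':
        selected = set(categories)
    else:
        selected = {
            sid for sid, cat in categories.items()
            if any(re.search(p, cat) for p in _as_list(category))
        }
    if exclude_category is not None:
        selected -= {
            sid for sid, cat in categories.items()
            if any(re.search(p, cat) for p in _as_list(exclude_category))
        }
    return selected


print(sorted(select_stocks('電子')))
print(sorted(select_stocks(exclude_category='金融')))
```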
Notes
- The filter is applied to the internal `universe_stocks` set, which the data processing pipeline then uses to select the columns/rows for the chosen stocks.
- Inside the context, calls to `data.get(...)` return data limited to the specified universe whenever applicable.
- After exiting the `with` block, the previous universe is restored.
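These notes describe a save-and-restore pattern around a module-level set. A minimal sketch of that pattern with `contextlib.contextmanager` follows. The global `universe_stocks`, the `universe` context manager, and the stand-in `get` here are simplified, hypothetical versions, not finlab's code. The `finally` clause restores the previous filter even if the body raises.

```python
from contextlib import contextmanager

# Hypothetical module-level filter; an empty set means "no filter".
universe_stocks = set()


@contextmanager
def universe(stocks):
    """Temporarily replace the global filter and restore it on exit."""
    global universe_stocks
    previous = universe_stocks
    universe_stocks = set(stocks)
    try:
        yield
    finally:
        universe_stocks = previous  # restored even if the body raises


def get(table):
    """Stand-in for data.get(): drop columns outside the active universe."""
    if not universe_stocks:
        return dict(table)
    return {sid: v for sid, v in table.items() if sid in universe_stocks}


prices = {'S1': 10.0, 'S2': 20.0, 'S3': 30.0}
with universe({'S1', 'S3'}):
    inside = get(prices)
outside = get(prices)
print(inside, outside)
```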
Examples
Limit to TSE/OTC and include only specific categories:
```python
from finlab import data
with data.universe(market='TSE_OTC', category=['鋼鐵工業', '航運業']):
    close = data.get('price:收盤價')
```
Include categories but exclude financial-related stocks:
```python
from finlab import data
with data.universe('TSE_OTC', ['鋼鐵工業', '航運業'], exclude_category=['金融']):
    close = data.get('price:收盤價')
```
Equivalent global (non-context) usage:
```python
from finlab import data
data.set_universe(market='TSE_OTC', category='水泥', exclude_category='ETF')
close = data.get('price:收盤價')
```
us_universe
us_universe(market='ALL', sector='ALL', industry='ALL', exchange='ALL', exclude_delisted=True, exclude_special=True)
Context manager to set a global stock universe filter for US market data retrieval.
This context manager limits the set of US stocks returned by data functions to a specific market category, sector, industry, and exchange selection. The filter is applied globally within the context and is restored after the context exits.
Parameters
market : str, default 'ALL'
Market category to include. Supported values:
- 'ALL': include all categories (default)
- 'Common Stock': both ADR and Domestic common stocks
- 'Preferred Stock': both ADR and Domestic preferred stocks
- 'ADR': American Depositary Receipts
- 'Domestic': Domestic stocks
sector : str | list[str], default 'ALL'
Sector name(s) to include. Supports regex-like substring matching. For example, 'Technology' matches the 'Technology' sector. When a list is provided, the union of all matched sectors is included.
industry : str | list[str], default 'ALL'
Industry name(s) to include. Supports regex-like substring matching. For example, 'Software' matches 'Software - Application', 'Software - Infrastructure', etc. When a list is provided, the union of all matched industries is included.
exchange : str | list[str], default 'ALL'
Exchange name(s) to include. Common values: 'NASDAQ', 'NYSE', 'AMEX'. When a list is provided, stocks from any of the listed exchanges are included.
exclude_delisted : bool, default True
Whether to exclude delisted stocks (isdelisted='Y'). Keeping this True is recommended, since 61% of stocks in the dataset are delisted.
exclude_special : bool, default True
Whether to exclude special categories: Warrants, Rights, Units, and Closed-End Funds (CEF). These categories typically lack sector/industry information.
Notes
- About 61% of stocks in the us_tickers dataset are delisted (isdelisted='Y').
- ETF and CEF categories typically have None values for sector and industry.
- After filtering with default settings, approximately 5,000-7,000 active common stocks remain.
- The filter modifies the global `universe_stocks` set used by the data processing pipeline.
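Below is a sketch of how these filters might combine, using a few made-up rows shaped loosely like `us_tickers`. Only `isdelisted` comes from the text above. The other field names, the substring test for special categories, and exact matching on `exchange` are assumptions, and the `market` filter is omitted for brevity. `select_us` is a hypothetical helper, not finlab's implementation.

```python
import re

# Substrings assumed to mark special categories (Warrants, Rights, Units, CEF).
SPECIAL = ('Warrant', 'Right', 'Unit', 'CEF')

# Hypothetical rows; field names other than isdelisted are assumptions.
TICKERS = [
    {'ticker': 'AAA', 'category': 'Domestic Common Stock', 'sector': 'Technology',
     'industry': 'Software - Application', 'exchange': 'NASDAQ', 'isdelisted': 'N'},
    {'ticker': 'BBB', 'category': 'Domestic Common Stock', 'sector': 'Technology',
     'industry': 'Software - Infrastructure', 'exchange': 'NYSE', 'isdelisted': 'N'},
    {'ticker': 'CCC', 'category': 'Domestic Common Stock', 'sector': 'Energy',
     'industry': 'Oil & Gas E&P', 'exchange': 'NYSE', 'isdelisted': 'Y'},
    {'ticker': 'DDD', 'category': 'CEF', 'sector': None,
     'industry': None, 'exchange': 'NYSE', 'isdelisted': 'N'},
]


def _patterns(value):
    return [value] if isinstance(value, str) else list(value)


def _matches(field, value):
    """'ALL' passes everything; otherwise regex substring match on the field."""
    if value == 'ALL':
        return True
    if field is None:  # e.g. CEF/ETF rows without sector or industry
        return False
    return any(re.search(p, field) for p in _patterns(value))


def select_us(tickers=TICKERS, sector='ALL', industry='ALL', exchange='ALL',
              exclude_delisted=True, exclude_special=True):
    selected = set()
    for row in tickers:
        if exclude_delisted and row['isdelisted'] == 'Y':
            continue
        if exclude_special and any(s in row['category'] for s in SPECIAL):
            continue
        if not _matches(row['sector'], sector):
            continue
        if not _matches(row['industry'], industry):
            continue
        if exchange != 'ALL' and row['exchange'] not in _patterns(exchange):
            continue
        selected.add(row['ticker'])
    return selected


print(sorted(select_us(sector='Technology', exchange='NASDAQ')))
```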
Examples
Limit to active NASDAQ Technology stocks:
```python
from finlab import data
with data.us_universe(sector='Technology', exchange='NASDAQ'):
    close = data.get('us_price:close')
```
Include all active common stocks on major exchanges:
```python
from finlab import data
with data.us_universe(market='Common Stock', exchange=['NASDAQ', 'NYSE']):
    close = data.get('us_price:close')
```
Include delisted stocks for historical analysis:
```python
from finlab import data
with data.us_universe(exclude_delisted=False):
    close = data.get('us_price:close')
```
set_us_universe
set_us_universe(market='ALL', sector='ALL', industry='ALL', exchange='ALL', exclude_delisted=True, exclude_special=True)
Set global US stock universe filter.
This function updates the global universe_stocks set to limit data retrieval
to a specific subset of US stocks based on market category, sector, industry,
exchange, and exclusion criteria.
Parameters
market : str, default 'ALL'
Market category filter (see the us_universe class for details).
sector : str | list[str], default 'ALL'
Sector filter with regex-like substring matching.
industry : str | list[str], default 'ALL'
Industry filter with regex-like substring matching.
exchange : str | list[str], default 'ALL'
Exchange filter (e.g., 'NASDAQ', 'NYSE', 'AMEX').
exclude_delisted : bool, default True
Exclude stocks with isdelisted='Y' (recommended, as 61% are delisted).
exclude_special : bool, default True
Exclude Warrants, Rights, Units, and Closed-End Funds.
Notes
This function modifies the global universe_stocks variable.
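Examples:
```python
from finlab import data
# Example 1: listed (TSE) companies only
with data.universe(market='TSE'):
    close = data.get('price:收盤價')
    print(f"Number of listed companies: {len(close.columns)}")
# Example 2: a specific industry category (semiconductors)
with data.universe(category=['半導體業']):
    close = data.get('price:收盤價')
# Example 3: exclude financial stocks
with data.universe(market='TSE_OTC', exclude_category=['金融']):
    close = data.get('price:收盤價')
# Example 4: combined conditions
with data.universe(market='TSE_OTC', category=['電子工業']):
    close = data.get('price:收盤價')
```
Available `market` values:
- 'TSE' - listed on TWSE
- 'OTC' - over-the-counter (TPEx)
- 'TSE_OTC' - TSE + OTC
- 'ETF' - exchange-traded funds
- 'STOCK_FUTURE' - underlyings of single-stock futures/equity options
- 'ALL' - no market filter (all stocks, including emerging-market stocks)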
data.us_universe()
finlab.data.us_universe
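US market filtering (`us_universe` has no index parameter, so index constituents such as the S&P 500 or NASDAQ 100 cannot be selected this way; filter by exchange, sector, or industry instead):
```python
from finlab import data
# NASDAQ-listed stocks only
with data.us_universe(exchange='NASDAQ'):
    close = data.get('us_price:close')
# Technology stocks on NASDAQ or NYSE
with data.us_universe(sector='Technology', exchange=['NASDAQ', 'NYSE']):
    close = data.get('us_price:close')
```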
data.indicator()
finlab.data.indicator
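Computes hundreds of technical indicators from TA-Lib and pandas_ta across roughly 2,000 stocks and 10 years of data.
Before using this function, install the packages that compute the indicators (TA-Lib and/or pandas_ta).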
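| PARAMETER | DESCRIPTION |
|---|---|
| `indname` | Indicator name. For TA-Lib, e.g. SMA, STOCH, RSI (see the TA-Lib documentation); for pandas_ta, e.g. supertrend, ssf (see the pandas_ta documentation). TYPE: `str` |
| `adjust_price` | Whether to compute on adjusted prices. TYPE: `bool` |
| `resample` | Price period (bar frequency) used for the indicator. TYPE: `str` |
| `market` | Market selection. TYPE: `str` |
| `**kwargs` | Indicator parameters. For TA-Lib's RSI, for example, the adjustable parameter is the calculation period. |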
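Start from the examples below and consult the official TA-Lib documentation to learn how to build technical indicators.
Technical indicator examples:
```python
from finlab import data
# MACD (TA-Lib returns three series: MACD line, signal line, histogram)
macd, macd_signal, macd_hist = data.indicator('MACD')
# 14-period RSI (TA-Lib names the period parameter timeperiod)
rsi = data.indicator('RSI', timeperiod=14)
```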
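To show what a parameter such as `timeperiod` controls, here is a plain-Python sketch of the classic Wilder-smoothed RSI on a single price list. It illustrates the formula only; it is not finlab's or TA-Lib's code, and edge cases may differ from TA-Lib (for example, the value reported for a perfectly flat window).

```python
def rsi(prices, timeperiod=14):
    """Wilder-smoothed RSI; the first `timeperiod` entries are None."""
    if len(prices) <= timeperiod:
        return [None] * len(prices)
    changes = [b - a for a, b in zip(prices, prices[1:])]
    gains = [max(c, 0.0) for c in changes]
    losses = [max(-c, 0.0) for c in changes]

    # Seed: simple average of the first `timeperiod` gains and losses.
    avg_gain = sum(gains[:timeperiod]) / timeperiod
    avg_loss = sum(losses[:timeperiod]) / timeperiod

    def value(g, l):
        # RSI = 100 - 100 / (1 + g/l), rewritten as 100 * g / (g + l).
        if g + l == 0:
            return 0.0  # flat window: convention chosen for this sketch
        return 100.0 * g / (g + l)

    out = [None] * timeperiod
    out.append(value(avg_gain, avg_loss))
    # Wilder smoothing: new_avg = (prev_avg * (n - 1) + current) / n
    for g, l in zip(gains[timeperiod:], losses[timeperiod:]):
        avg_gain = (avg_gain * (timeperiod - 1) + g) / timeperiod
        avg_loss = (avg_loss * (timeperiod - 1) + l) / timeperiod
        out.append(value(avg_gain, avg_loss))
    return out


print(rsi([10.0, 11.0, 10.0, 11.0], timeperiod=2))
```

A larger `timeperiod` delays the first value and smooths the line more heavily.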
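Cache management
finlab.data.set_storage
Set how historical data is stored locally.
When historical data is fetched with data.get, a local copy is saved by default so large datasets are not downloaded repeatedly.
The storage is the interface that holds this historical data. Two interfaces are provided: finlab.data.CacheStorage and finlab.data.FileStorage. The former keeps data in memory; the latter stores it in files. See CacheStorage and FileStorage for details.
By default, finlab.data.FileStorage is used, and repeatedly requested data is stored in the operating system's default temporary folder.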
| PARAMETER | DESCRIPTION |
|---|---|
| `storage` | The storage interface. TYPE: `CacheStorage` or `FileStorage` |
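Examples:
To switch to file-based storage, pass a `FileStorage` instance to `set_storage`.
The downloaded data can then be found locally, e.g. in `./finlab_db/price#收盤價.pickle`,
and the historical data can be read back with `pickle`.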
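The example path suggests that a dataset named `table:column` is cached as `table#column.pickle`. That naming rule is inferred from that one path, not documented. A stdlib sketch of mapping a name to its file and loading it with `pickle` is shown below. It writes a stand-in object to a temporary folder first; a real cache file presumably holds the DataFrame.

```python
import os
import pickle
import tempfile


def cache_filename(dataset):
    """Map 'table:column' to 'table#column.pickle' (inferred from the example path)."""
    return dataset.replace(':', '#') + '.pickle'


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, cache_filename('price:收盤價'))
    # Stand-in for a cached table, written here so the example runs offline.
    with open(path, 'wb') as f:
        pickle.dump({'0050': [57.85, 58.1]}, f)
    # Reading it back, as one would with ./finlab_db/price#收盤價.pickle
    with open(path, 'rb') as f:
        restored = pickle.load(f)
    print(os.path.basename(path), restored)
```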
finlab.data.CacheStorage
Stores historical data in memory.
finlab.data.FileStorage
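Stores historical data in files.
| PARAMETER | DESCRIPTION |
|---|---|
| `path` | Path where the data is stored. TYPE: `str` |
| `use_cache` | Whether to also keep a copy of the data in memory as a cache. TYPE: `bool` |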
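The `use_cache` option layers an in-memory copy over the files. The hypothetical `FileStore` class below sketches that idea; it is not finlab's `FileStorage`, and its method names are invented for illustration. Data is pickled under `path`, and with `use_cache=True` repeated reads are served from memory. The `disk_reads` counter makes the difference visible.

```python
import os
import pickle
import tempfile


class FileStore:
    """Hypothetical file-backed store with an optional in-memory layer."""

    def __init__(self, path, use_cache=False):
        self.path = path
        self.use_cache = use_cache
        self._memory = {}
        self.disk_reads = 0  # counts actual file reads
        os.makedirs(path, exist_ok=True)

    def _file(self, name):
        return os.path.join(self.path, name.replace(':', '#') + '.pickle')

    def save(self, name, obj):
        with open(self._file(name), 'wb') as f:
            pickle.dump(obj, f)
        if self.use_cache:
            self._memory[name] = obj  # keep an in-memory copy too

    def load(self, name):
        if self.use_cache and name in self._memory:
            return self._memory[name]  # served from memory, no disk access
        with open(self._file(name), 'rb') as f:
            obj = pickle.load(f)
        self.disk_reads += 1
        if self.use_cache:
            self._memory[name] = obj
        return obj


with tempfile.TemporaryDirectory() as folder:
    cached = FileStore(folder, use_cache=True)
    cached.save('price:收盤價', [57.85, 58.1])
    cached.load('price:收盤價')
    cached.load('price:收盤價')
    plain = FileStore(folder)  # same files, no memory layer
    plain.load('price:收盤價')
    plain.load('price:收盤價')
    print(cached.disk_reads, plain.disk_reads)
```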
Custom caching strategy:
```python
from finlab import data
from finlab.data import set_storage, FileStorage
# Use a custom folder
storage = FileStorage('/path/to/custom/cache')
set_storage(storage)
# All data is now cached in that location
close = data.get('price:收盤價')
```
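Other tools
finlab.data.get_strategies
Retrieve the data of strategies uploaded to the quant platform.
Returns the figures shown on your own strategy dashboard, such as each strategy's return curve, return statistics, Sharpe ratio, recent positions, and recent rebalance dates. These can be used to aggregate multiple strategies.
| PARAMETER | DESCRIPTION |
|---|---|
| `api_token` | If the finlab module's api_token is not provided, a GUI page opens automatically; copy the api_token from that page and paste it into the input field. TYPE: `str` |
Returns: `dict` - strategies data. Response detail: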
``` py
{
strategy1:{
'asset_type': '',
'drawdown_details': {
'2015-06-04': {
'End': '2015-11-03',
'Length': 152,
'drawdown': -0.19879090089478024
},
...
},
'fee_ratio': 0.000475,
'last_trading_date': '2022-06-10',
'last_updated': 'Sun, 03 Jul 2022 12:02:27 GMT',
'ndays_return': {
'1': -0.01132480035770611,
'10': -0.0014737286933147464,
'20': -0.06658015749110646,
'5': -0.002292995729485159,
'60': -0.010108700314771735
},
'next_trading_date': '2022-06-10',
'positions': {
'1413 宏洲': {
'entry_date': '2022-05-10',
'entry_price': 10.05,
'exit_date': '',
'next_weight': 0.1,
'return': -0.010945273631840613,
'status': '買進',
'weight': 0.1479332345384493
},
'last_updated': 'Sun, 03 Jul 2022 12:02:27 GMT',
'next_trading_date': '2022-06-10',
'trade_at': 'open',
'update_date': '2022-06-10'
},
'return_table': {
'2014': {
'Apr': 0.0,
'Aug': 0.06315180932606546,
'Dec': 0.0537589857541485,
'Feb': 0.0,
'Jan': 0.0,
'Jul': 0.02937490104459939,
'Jun': 0.01367930162104769,
'Mar': 0.0,
'May': 0.0,
'Nov': -0.0014734320286596825,
'Oct': -0.045082529665408266,
'Sep': 0.04630906972509852,
'YTD': 0.16626214846456966
},
...
},
'returns': {
'time': [
'2014-06-10',
'2014-06-11',
'2014-06-12',
...
],
'value': [
100,
99.9,
100.2,
...
]
},
'stats': {
'avg_down_month': -0.03304015302646822,
'avg_drawdown': -0.0238021414698247,
'avg_drawdown_days': 19.77952755905512,
'avg_up_month': 0.05293384465715908,
'cagr': 0.33236021285588846,
'calmar': 1.65261094975066,
'daily_kurt': 4.008888367138843,
'daily_mean': 0.3090784769257415,
'daily_sharpe': 1.747909002374217,
'daily_skew': -0.6966018726321078,
'daily_sortino': 2.8300677082214034,
...
},
'tax_ratio': 0.003,
'trade_at': 'open',
'update_date': '2022-06-10'
},
strategy2:{...},
...}
```
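Because the response is a plain `dict`, multi-strategy summaries need only standard Python. The sketch below works on a trimmed copy of the sample above, and the helper names are illustrative. In this sample, compounding the 2014 monthly returns reproduces the `YTD` figure. The drawdown `Length` of 152 also equals the calendar days from 2015-06-04 to `End` 2015-11-03. Both are observations about the sample, not documented guarantees.

```python
from datetime import date

# Trimmed copy of the sample response above (one strategy, a few fields).
strategies = {
    'strategy1': {
        'stats': {'cagr': 0.33236021285588846, 'daily_sharpe': 1.747909002374217},
        'drawdown_details': {
            '2015-06-04': {'End': '2015-11-03', 'Length': 152,
                           'drawdown': -0.19879090089478024},
        },
        'return_table': {
            '2014': {'Jan': 0.0, 'Feb': 0.0, 'Mar': 0.0, 'Apr': 0.0, 'May': 0.0,
                     'Jun': 0.01367930162104769, 'Jul': 0.02937490104459939,
                     'Aug': 0.06315180932606546, 'Sep': 0.04630906972509852,
                     'Oct': -0.045082529665408266, 'Nov': -0.0014734320286596825,
                     'Dec': 0.0537589857541485, 'YTD': 0.16626214846456966},
        },
    },
}

MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']


def compounded(year_row):
    """Chain monthly returns: prod(1 + r_m) - 1."""
    total = 1.0
    for m in MONTHS:
        total *= 1.0 + year_row.get(m, 0.0)
    return total - 1.0


def summary(strategies):
    """(name, CAGR, daily Sharpe, worst drawdown), sorted by CAGR descending."""
    rows = []
    for name, s in strategies.items():
        worst = min(d['drawdown'] for d in s['drawdown_details'].values())
        rows.append((name, s['stats']['cagr'], s['stats']['daily_sharpe'], worst))
    return sorted(rows, key=lambda r: r[1], reverse=True)


row_2014 = strategies['strategy1']['return_table']['2014']
print(round(compounded(row_2014), 6), round(row_2014['YTD'], 6))
print(summary(strategies))
start = date.fromisoformat('2015-06-04')
end = date.fromisoformat('2015-11-03')
print((end - start).days)
```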
FAQ
Q: How do I know which datasets are available?
Option 1: use `data.search()`.
Option 2: browse the online FinLab database catalog for the full list.
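Q: Downloading is slow. What can I do?
```python
from finlab import data
# Option 1: limit the date range
data.truncate_start = '2020-01-01'
# Option 2: rely on the cache (the second call is fast)
close = data.get('price:收盤價')  # first call: slow
close = data.get('price:收盤價')  # second call: fast (served from the cache)
# Option 3: limit the set of stocks with universe
with data.universe(market='TSE', category=['半導體業']):
    close = data.get('price:收盤價')  # returns only the selected stocks
```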
Q: How do I fix `KeyError: 'price:收盤價'`?
Possible causes:
1. Not logged in - run `finlab.login()` or `finlab.login('YOUR_TOKEN')`
2. Wrong column name - confirm the exact name with `data.search('收盤')`
3. Invalid API token - obtain a new token
```python
import finlab
# Check whether you are logged in
try:
    token, token_type = finlab.get_token()
    print(f"✅ Logged in ({token_type})")
except Exception:
    print("❌ Not logged in; run finlab.login()")
```
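Q: How do I download US stock data?
```python
from finlab import data
# US closing prices (US tables have their own names, e.g. us_price:close)
us_close = data.get('us_price:close')
# Search US stock columns
us_fields = data.search(market='us')
```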
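Q: How do I handle missing values?
```python
from finlab import data
close = data.get('price:收盤價')
# Share of missing values
print(f"Missing ratio: {close.isna().sum().sum() / close.size:.2%}")
# Forward-fill missing values
close_filled = close.ffill()
# Or keep only stocks with at least 80% non-missing rows (thresh must be an int)
close_clean = close.dropna(axis=1, thresh=int(len(close) * 0.8))
```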
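Q: How do I save memory?
```python
from finlab import data
# Option 1: limit the date range
data.truncate_start = '2020-01-01'
# Option 2: process in batches of 100 stocks
close = data.get('price:收盤價')
all_stocks = close.columns
for batch in [all_stocks[i:i+100] for i in range(0, len(all_stocks), 100)]:
    batch_close = close[batch]
    # process these 100 stocks...
# Option 3: load only the tables you need;
# avoid calling data.get() on many tables at once
```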
References
- Data retrieval tutorial - complete usage guide
- FinLab database catalog - all available tables
- Quick start guide - getting started
- FAQ - more troubleshooting