finlab.ml
finlab.ml.feature
combine
The combine function takes a dictionary of features as input and combines them into a single pandas DataFrame.
PARAMETER | DESCRIPTION |
---|---|
features | A dictionary of features. In each feature, the index is datetime and the columns are instruments. |
resample | Optional argument to resample the data in the features. Defaults to None. |
sample_filter | A boolean mask whose index is date and whose columns are instruments, used to filter the features. |
**kwargs | Additional keyword arguments to pass to the resampler function. |
RETURNS | DESCRIPTION |
---|---|
A pandas DataFrame containing all the input features combined. |
Examples:
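This code shows how to use the finlab.ml.feature and finlab.data modules to combine features such as RSI and the price-to-book ratio. The f.combine function performs the merge: each dictionary key is a feature name and the corresponding value is its data. The 'rsi' feature comes from data.indicator('RSI'), which computes the Relative Strength Index. The 'pb' feature comes from data.get('price_earning_ratio:股價淨值比'), which fetches the price-to-book ratio. The result is a DataFrame containing the combined features.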
from finlab import data
import finlab.ml.feature as f
import finlab.ml.qlib as q
features = f.combine({
    # Fetch a feature directly with data.get (here, the price-to-book ratio)
    'pb': data.get('price_earning_ratio:股價淨值比'),
    # Build a technical-indicator feature with data.indicator
    'rsi': data.indicator('RSI'),
    # Enumerate a large set of TA-Lib indicators with f.ta
    'talib': f.ta(f.ta_names()),
    # Build features from Qlib Alpha158 (run q.init() and q.dump() first)
    'qlib158': q.alpha('Alpha158')
})
features.head()
datetime | instrument | rsi | pb |
---|---|---|---|
2020-01-01 | 1101 | 0 | 2 |
2020-01-02 | 1102 | 100 | 3 |
2020-01-03 | 1108 | 100 | 4 |
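The output table above is in long form, with one row per (datetime, instrument) pair and one column per feature. The following is a minimal pandas sketch of that reshaping, assuming each input is a wide DataFrame (datetime index, instrument columns). The `combine_sketch` helper and the sample values are hypothetical and are not finlab's implementation; resampling and `sample_filter` are left out.

```python
import pandas as pd

dates = pd.to_datetime(["2020-01-02", "2020-01-03"])

# Hypothetical wide features: index = datetime, columns = instrument.
rsi = pd.DataFrame({"1101": [55.0, 60.0], "1102": [40.0, 45.0]}, index=dates)
pb = pd.DataFrame({"1101": [1.2, 1.3], "1102": [0.9, 1.0]}, index=dates)


def combine_sketch(features):
    """Stack each wide frame to long form, then join them column-wise."""
    long_columns = {}
    for name, frame in features.items():
        stacked = frame.stack()  # Series indexed by (datetime, instrument)
        stacked.index.names = ["datetime", "instrument"]
        long_columns[name] = stacked
    # Dictionary keys become the feature column names.
    return pd.concat(long_columns, axis=1)


combined = combine_sketch({"rsi": rsi, "pb": pb})
print(combined)
```

Each dictionary key ends up as a column, which mirrors how the feature names passed to `f.combine` appear in the result.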
ta
ta(feature_names, factories=None, resample=None, start_time=None, end_time=None, adj=False, cpu=-1, **kwargs)
Calculate technical indicator values for a list of feature names.
PARAMETER | DESCRIPTION |
---|---|
feature_names | A list of technical indicator feature names. Defaults to None. |
factories | A dictionary of factories to generate technical indicators. Defaults to {"talib": TalibIndicatorFactory()}. |
resample | The frequency to resample the data to. Defaults to None. |
start_time | The start time of the data. Defaults to None. |
end_time | The end time of the data. Defaults to None. |
**kwargs | Additional keyword arguments to pass to the resampler function. |

RETURNS | DESCRIPTION |
---|---|
DataFrame | pd.DataFrame: technical indicator feature names and their corresponding values. |
ta_names
Generate a list of technical indicator feature names.
PARAMETER | DESCRIPTION |
---|---|
lb | The lower bound of the multiplier of the default parameter for the technical indicators. |
ub | The upper bound of the multiplier of the default parameter for the technical indicators. |
n | The number of random samples for each technical indicator. |
factory | A factory object to generate technical indicators. Defaults to TalibIndicatorFactory. |

RETURNS | DESCRIPTION |
---|---|
List[str] | A list of technical indicator feature names. |
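One possible reading of `lb`, `ub`, and `n` is sketched below: each default parameter is scaled by a random multiplier drawn between `lb` and `ub`, and this is repeated `n` times. The `sample_parameters` helper, the uniform distribution, the rounding, and the bound values are all assumptions made for illustration; only the MACD defaults (12, 26, 9) come from TA-Lib itself.

```python
import random


def sample_parameters(defaults, lb, ub, n, seed=0):
    """Draw n parameter sets by scaling each default with a random multiplier.

    Assumption: multipliers are uniform in [lb, ub] and are drawn
    independently for every parameter.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        sample = {}
        for name, default in defaults.items():
            multiplier = rng.uniform(lb, ub)
            # Period-like parameters must stay positive integers.
            sample[name] = max(1, round(default * multiplier))
        samples.append(sample)
    return samples


macd_defaults = {"fastperiod": 12, "slowperiod": 26, "signalperiod": 9}
samples = sample_parameters(macd_defaults, lb=1, ub=10, n=3)
for sample in samples:
    print(sample)
```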
Examples:
import finlab.ml.feature as f
# method 1: generate each indicator with random parameters
features = f.ta()
# method 2: generate specific indicator
feature_names = ['talib.MACD__macdhist__fastperiod__52__slowperiod__212__signalperiod__75__']
features = f.ta(feature_names, resample='W')
# method 3: generate some indicator
feature_names = f.ta_names()
features = f.ta(feature_names)
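The feature name in method 2 appears to encode the factory, the indicator, the output series, and the parameter values, separated by double underscores. The sketch below splits such a name into its parts. The layout is inferred from that single example, and `parse_feature_name` is a hypothetical helper that is not part of finlab; it also assumes every parameter value is an integer.

```python
def parse_feature_name(name):
    """Split a name like 'talib.MACD__macdhist__fastperiod__52__' into parts."""
    factory, _, rest = name.partition(".")
    # Drop the empty token produced by the trailing '__'.
    tokens = [token for token in rest.split("__") if token]
    indicator, output, *param_tokens = tokens
    # The remaining tokens alternate between parameter name and value.
    params = {
        key: int(value)
        for key, value in zip(param_tokens[::2], param_tokens[1::2])
    }
    return {
        "factory": factory,
        "indicator": indicator,
        "output": output,
        "params": params,
    }


parsed = parse_feature_name(
    "talib.MACD__macdhist__fastperiod__52__slowperiod__212__signalperiod__75__"
)
print(parsed)
```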
finlab.ml.label
daytrading_percentage
Calculate the percentage change of market prices over a given period.
PARAMETER | DESCRIPTION |
---|---|
index | A multi-level index of datetime and instrument. |
resample | The resample frequency for the output data. Defaults to None. |
period | The number of periods to calculate the percentage change over. Defaults to 1. |
trade_at_price | The price for execution. Defaults to |
**kwargs | Additional arguments to be passed to the resampler function. |
RETURNS | DESCRIPTION |
---|---|
pd.Series: A pd.Series containing the percentage change of stock prices. |
excess_over_mean
Calculate the excess over mean of market prices over a given period.
PARAMETER | DESCRIPTION |
---|---|
index | A multi-level index of datetime and instrument. |
resample | The resample frequency for the output data. Defaults to None. |
period | The number of periods to calculate the percentage change over. Defaults to 1. |
trade_at_price | The price for execution. Defaults to |
**kwargs | Additional arguments to be passed to the resampler function. |

RETURNS | DESCRIPTION |
---|---|
pd.Series: A pd.Series containing the excess over mean of the percentage change of stock prices. |
excess_over_median
Calculate the excess over median of market prices over a given period.
PARAMETER | DESCRIPTION |
---|---|
index | A multi-level index of datetime and instrument. |
resample | The resample frequency for the output data. Defaults to None. |
period | The number of periods to calculate the percentage change over. Defaults to 1. |
trade_at_price | The price for execution. Defaults to |
**kwargs | Additional arguments to be passed to the resampler function. |

RETURNS | DESCRIPTION |
---|---|
pd.Series: A pd.Series containing the excess over median of the percentage change of stock prices. |
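To make "excess over mean" and "excess over median" concrete, here is a small plain-Python sketch for a single date: each instrument's return minus the cross-sectional mean or median of all returns on that date. The return values and the `excess_over` helper are hypothetical, and whether finlab computes the benchmark exactly this way is an assumption.

```python
from statistics import mean, median

# Hypothetical one-period returns on a single date, keyed by instrument.
returns = {"1101": 0.02, "1102": -0.01, "1108": 0.08}


def excess_over(returns_by_instrument, center):
    """Subtract a cross-sectional benchmark (mean or median) from each return."""
    benchmark = center(list(returns_by_instrument.values()))
    return {
        instrument: value - benchmark
        for instrument, value in returns_by_instrument.items()
    }


over_mean = excess_over(returns, mean)      # benchmark is about 0.03
over_median = excess_over(returns, median)  # benchmark is 0.02
print(over_mean)
print(over_median)
```

The median version is less sensitive to a single extreme return, which is why the two labels can differ noticeably on days with outliers.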
maximum_adverse_excursion
Calculate the maximum adverse excursion of market prices over a given period.
PARAMETER | DESCRIPTION |
---|---|
index | A multi-level index of datetime and instrument. |
resample | The resample frequency for the output data. Defaults to None. |
period | The number of periods to calculate the percentage change over. Defaults to 1. |
trade_at_price | The price for execution. Defaults to |
**kwargs | Additional arguments to be passed to the resampler function. |

RETURNS | DESCRIPTION |
---|---|
pd.Series: A pd.Series containing the maximum adverse excursion of stock prices. |
maximum_favorable_excursion
Calculate the maximum favorable excursion of market prices over a given period.
PARAMETER | DESCRIPTION |
---|---|
index | A multi-level index of datetime and instrument. |
resample | The resample frequency for the output data. Defaults to None. |
period | The number of periods to calculate the percentage change over. Defaults to 1. |
trade_at_price | The price for execution. Defaults to |
**kwargs | Additional arguments to be passed to the resampler function. |

RETURNS | DESCRIPTION |
---|---|
pd.Series: A pd.Series containing the maximum favorable excursion of stock prices. |
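Maximum adverse excursion (MAE) and maximum favorable excursion (MFE) are commonly defined as the worst and best percentage moves relative to the entry price during the holding window. The sketch below follows that common definition on a hypothetical price path. The `excursions` helper is illustrative only; details such as which prices finlab samples within the window, or whether it clips the values at zero, are not stated in the source.

```python
def excursions(prices):
    """Return (MAE, MFE) for one position.

    prices[0] is the entry price; the remaining values are the prices
    observed during the holding window.
    """
    entry = prices[0]
    changes = [price / entry - 1 for price in prices[1:]]
    return min(changes), max(changes)


# Hypothetical path: enter at 100, dip to 95, peak at 110, end at 105.
mae, mfe = excursions([100.0, 95.0, 110.0, 105.0])
print(mae, mfe)
```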
return_percentage
Calculate the percentage change of market prices over a given period.
PARAMETER | DESCRIPTION |
---|---|
index | A multi-level index of datetime and instrument. |
resample | The resample frequency for the output data. Defaults to None. |
period | The number of periods to calculate the percentage change over. Defaults to 1. |
trade_at_price | The price for execution. Defaults to |
**kwargs | Additional arguments to be passed to the resampler function. |
RETURNS | DESCRIPTION |
---|---|
pd.Series: A pd.Series containing the percentage change of stock prices. |
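The role of `period` can be illustrated with a plain-Python sketch of a forward percentage change: the price `period` steps ahead divided by the current price, minus one. Treating the label as forward-looking is an assumption based on how labels are typically built for supervised learning; `forward_returns` and the prices are hypothetical and ignore resampling and `trade_at_price`.

```python
def forward_returns(prices, period=1):
    """Percentage change from each price to the price `period` steps later.

    Positions without enough future data get None.
    """
    result = []
    for i, price in enumerate(prices):
        if i + period < len(prices):
            result.append(prices[i + period] / price - 1)
        else:
            result.append(None)
    return result


prices = [100.0, 102.0, 101.0, 105.0]
one_step = forward_returns(prices, period=1)
two_step = forward_returns(prices, period=2)
print(one_step)
print(two_step)
```

The trailing `None` values show why the most recent rows of a label series usually cannot be used for training: their future prices are not yet known.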
finlab.ml.qlib
DumpDataBase
DumpDataBase(csv_path, qlib_dir, backup_dir=None, freq='day', max_workers=16, date_field_name='date', file_suffix='.csv', symbol_field_name='symbol', exclude_fields='', include_fields='', limit_nums=None)
Base class for dumping data to Qlib format.
PARAMETER | DESCRIPTION |
---|---|
csv_path | The path to the CSV file or directory containing the CSV files. |
qlib_dir | The directory where the Qlib data will be saved. |
backup_dir | The directory where the backup of the Qlib data will be saved. Defaults to None. |
freq | The frequency of the data. Defaults to "day". |
max_workers | The maximum number of workers for parallel processing. Defaults to 16. |
date_field_name | The name of the date field in the CSV file. Defaults to "date". |
file_suffix | The suffix of the CSV file. Defaults to ".csv". |
symbol_field_name | The name of the symbol field in the CSV file. Defaults to "symbol". |
exclude_fields | The fields to exclude from the dumped data. Defaults to "". |
include_fields | The fields to include in the dumped data. Defaults to "". |
limit_nums | The maximum number of CSV files to process. Defaults to None. |
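As a rough illustration of how the field-related parameters might interact, the sketch below reads a small CSV that uses the default `date` and `symbol` column names and selects the value columns to dump. It assumes `include_fields` and `exclude_fields` are comma-separated strings and that an empty `include_fields` means "all other columns"; both are assumptions, and `select_fields` is a hypothetical helper rather than part of `DumpDataBase`.

```python
import csv
import io

# Hypothetical CSV content using the default field names.
csv_text = (
    "date,symbol,open,close,volume\n"
    "2020-01-02,1101,43.0,43.5,1000\n"
    "2020-01-03,1101,43.5,44.0,1200\n"
)


def select_fields(header, include_fields="", exclude_fields="",
                  date_field_name="date", symbol_field_name="symbol"):
    """Choose the value columns to dump (assumed comma-separated strings)."""
    include = [f.strip() for f in include_fields.split(",") if f.strip()]
    exclude = {f.strip() for f in exclude_fields.split(",") if f.strip()}
    key_fields = (date_field_name, symbol_field_name)
    candidates = include or [c for c in header if c not in key_fields]
    return [c for c in candidates if c not in exclude]


reader = csv.DictReader(io.StringIO(csv_text))
fields = select_fields(reader.fieldnames, exclude_fields="volume")
rows = [{field: float(row[field]) for field in fields} for row in reader]
print(fields)
print(rows)
```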
CatBoostModel
DEnsmbleModel
DNNModel
LGBModel
LinearModel
SFMModel
TabnetModel
XGBModel
alpha
dump
get_models
Return the available models as a dictionary that maps each model name to its class.
Examples:
output: {'LGBModel': LGBModel, 'XGBModel': XGBModel, 'DEnsmbleModel': DEnsmbleModel, 'CatBoostModel': CatBoostModel, 'LinearModel': LinearModel, 'TabnetModel': TabnetModel, 'DNNModel': DNNModel, 'SFMModel': SFMModel}