finlab.ml
finlab.ml.feature
combine
The combine function takes a dictionary of features as input and combines them into a single pandas DataFrame.
PARAMETER | DESCRIPTION |
---|---|
features | A dictionary of features, where each feature's index is datetime and its columns are instruments. |
resample | Optional argument to resample the data in the features. Defaults to None. |
sample_filter | A boolean dictionary, where the index is date and the columns are instruments, representing the filter applied to the features. |
**kwargs | Additional keyword arguments to pass to the resampler function. |
RETURNS | DESCRIPTION |
---|---|
A pandas DataFrame containing all the input features combined. |
Examples:
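This code shows how to use the finlab.ml.feature and finlab.data modules to combine features such as RSI and the price-to-book ratio. The f.combine function performs the merge: each dictionary key is a feature name and each value is the corresponding data. The 'rsi' feature comes from data.indicator('RSI'), which computes the Relative Strength Index. The 'pb' feature comes from data.get('price_earning_ratio:股價淨值比'), which fetches the price-to-book ratio. The example also passes 'talib' and 'qlib158' entries. The result is a single DataFrame containing the combined features.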
from finlab import data
import finlab.ml.feature as f
import finlab.ml.qlib as q
features = f.combine({
    # Use data.get to fetch a feature directly
    'pb': data.get('price_earning_ratio:股價淨值比'),
    # Use data.indicator to generate technical indicator features
    'rsi': data.indicator('RSI'),
    # Use f.ta to enumerate a large number of talib indicators
    'talib': f.ta(f.ta_names()),
    # Use qlib Alpha158 to generate technical indicator features
    # (run q.init() and q.dump() first)
    'qlib158': q.alpha('Alpha158')
})
features.head()
datetime | instrument | rsi | pb |
---|---|---|---|
2020-01-01 | 1101 | 0 | 2 |
2020-01-02 | 1102 | 100 | 3 |
2020-01-03 | 1108 | 100 | 4 |
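To make the output shape concrete, here is a minimal pandas-only sketch of the same idea: each wide feature table (datetime index, instrument columns) is stacked into long form and the results are aligned side by side. It is not finlab's implementation. `combine_sketch` and the toy values are hypothetical, and resampling and `sample_filter` are left out.

```python
import pandas as pd


def combine_sketch(features):
    """Stack each wide feature table and align the results column-wise.

    Hypothetical helper for illustration only; not part of finlab.
    """
    long_features = {}
    for name, frame in features.items():
        series = frame.stack()  # wide (datetime x instrument) -> long Series
        series.index.names = ["datetime", "instrument"]
        long_features[name] = series
    # Dictionary keys become the column names of the combined table.
    return pd.concat(long_features, axis=1)


# Toy data (made up): two dates, two instruments, no missing values.
dates = pd.to_datetime(["2020-01-01", "2020-01-02"])
rsi = pd.DataFrame({"1101": [30.0, 55.0], "1102": [70.0, 45.0]}, index=dates)
pb = pd.DataFrame({"1101": [2.0, 2.1], "1102": [3.0, 3.2]}, index=dates)

features = combine_sketch({"rsi": rsi, "pb": pb})
print(features)
```

Each row of the result is one (datetime, instrument) pair and each column is one feature, which matches the layout of the table above.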
ta
ta(feature_names, factories=None, resample=None, start_time=None, end_time=None, adj=False, cpu=-1, **kwargs)
Calculate technical indicator values for a list of feature names.
PARAMETER | DESCRIPTION |
---|---|
feature_names | A list of technical indicator feature names. Defaults to None. |
factories | A dictionary of factories to generate technical indicators. Defaults to {"talib": TalibIndicatorFactory()}. |
resample | The frequency to resample the data to. Defaults to None. |
start_time | The start time of the data. Defaults to None. |
end_time | The end time of the data. Defaults to None. |
**kwargs | Additional keyword arguments to pass to the resampler function. |
RETURNS | DESCRIPTION |
---|---|
pd.DataFrame | Technical indicator feature names and their corresponding values. |
ta_names
Generate a list of technical indicator feature names.
PARAMETER | DESCRIPTION |
---|---|
lb | The lower bound of the multiplier of the default parameter for the technical indicators. |
ub | The upper bound of the multiplier of the default parameter for the technical indicators. |
n | The number of random samples for each technical indicator. |
factory | A factory object to generate technical indicators. Defaults to TalibIndicatorFactory. |
RETURNS | DESCRIPTION |
---|---|
List[str] | A list of technical indicator feature names. |
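The `lb`, `ub`, and `n` parameters can be read as follows: for each indicator, `n` parameter sets are drawn by scaling the indicator's default parameters with a random multiplier between `lb` and `ub`. The sketch below illustrates that reading and is not the library's code. `sample_parameters`, the uniform distribution, the per-parameter multiplier, the rounding rule, and the bounds used in the call are all assumptions made for illustration.

```python
import random


def sample_parameters(defaults, lb, ub, n, seed=0):
    """Draw n parameter sets by scaling each default by a random multiplier.

    Hypothetical helper: the sampling rule is an assumption, not finlab's.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        params = {}
        for key, default in defaults.items():
            multiplier = rng.uniform(lb, ub)  # assumed: uniform in [lb, ub]
            params[key] = max(1, round(default * multiplier))
        samples.append(params)
    return samples


# TA-Lib's default MACD periods are 12, 26 and 9.
macd_defaults = {"fastperiod": 12, "slowperiod": 26, "signalperiod": 9}
samples = sample_parameters(macd_defaults, lb=1, ub=10, n=3)
for params in samples:
    print(params)
```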
Examples:
import finlab.ml.feature as f
# method 1: generate each indicator with random parameters
features = f.ta()
# method 2: generate one specific indicator, resampled weekly
feature_names = ['talib.MACD__macdhist__fastperiod__52__slowperiod__212__signalperiod__75__']
features = f.ta(feature_names, resample='W')
# method 3: generate the indicators named by ta_names
feature_names = f.ta_names()
features = f.ta(feature_names)
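The feature name in method 2 appears to pack the factory, the indicator, the output column, and the parameters into one string separated by `.` and `__`. Below is a small parser written under that assumption. The naming scheme is inferred from this single example, and `parse_feature_name` is a hypothetical helper, not part of finlab.

```python
def parse_feature_name(name):
    """Split a feature name into factory, indicator, output and parameters.

    Assumes the pattern '<factory>.<indicator>__<output>__<key>__<value>__...'.
    """
    factory, rest = name.split(".", 1)
    parts = [part for part in rest.split("__") if part]  # drop trailing empty piece
    indicator, output, *pairs = parts
    params = {}
    for key, value in zip(pairs[0::2], pairs[1::2]):
        params[key] = int(value) if value.isdigit() else float(value)
    return {
        "factory": factory,
        "indicator": indicator,
        "output": output,
        "params": params,
    }


parsed = parse_feature_name(
    "talib.MACD__macdhist__fastperiod__52__slowperiod__212__signalperiod__75__"
)
print(parsed)
```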
finlab.ml.label
daytrading_percentage
Calculate the percentage change of market prices over a given period.
PARAMETER | DESCRIPTION |
---|---|
index | A multi-level index of datetime and instrument. |
resample | The resample frequency for the output data. Defaults to None. |
period | The number of periods to calculate the percentage change over. Defaults to 1. |
trade_at_price | The price for execution. Defaults to |
**kwargs | Additional arguments to be passed to the resampler function. |
RETURNS | DESCRIPTION |
---|---|
pd.Series | A pd.Series containing the percentage change of stock prices. |
excess_over_mean
Calculate the excess over mean of market prices over a given period.
PARAMETER | DESCRIPTION |
---|---|
index | A multi-level index of datetime and instrument. |
resample | The resample frequency for the output data. Defaults to None. |
period | The number of periods to calculate the percentage change over. Defaults to 1. |
trade_at_price | The price for execution. Defaults to |
**kwargs | Additional arguments to be passed to the resampler function. |
RETURNS | DESCRIPTION |
---|---|
pd.Series | A pd.Series containing the percentage change of stock prices. |
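One plausible reading of this label is the forward return of each instrument minus the cross-sectional mean return on the same date. The sketch below illustrates that reading with plain pandas. It is an assumption rather than finlab's code, and `excess_over_mean_sketch` and the toy prices are hypothetical.

```python
import pandas as pd


def excess_over_mean_sketch(price, period=1):
    """Forward return minus the same-date mean across instruments.

    Hypothetical helper; `price` is wide (datetime index, instrument columns).
    """
    forward_return = price.shift(-period) / price - 1
    # Subtract each date's cross-sectional mean from every instrument.
    excess = forward_return.sub(forward_return.mean(axis=1), axis=0)
    label = excess.dropna(how="all").stack()  # drop dates with no future price
    label.index.names = ["datetime", "instrument"]
    return label


dates = pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"])
price = pd.DataFrame({"A": [100.0, 110.0, 121.0], "B": [50.0, 50.0, 55.0]}, index=dates)

label = excess_over_mean_sketch(price)
print(label)
```

On the first date A gains about 10% and B is flat, so their labels are roughly +0.05 and -0.05. By construction the labels on each date sum to zero.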
excess_over_median
Calculate the excess over median of market prices over a given period.
PARAMETER | DESCRIPTION |
---|---|
index | A multi-level index of datetime and instrument. |
resample | The resample frequency for the output data. Defaults to None. |
period | The number of periods to calculate the percentage change over. Defaults to 1. |
trade_at_price | The price for execution. Defaults to |
**kwargs | Additional arguments to be passed to the resampler function. |
RETURNS | DESCRIPTION |
---|---|
pd.Series | A pd.Series containing the percentage change of stock prices. |
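Subtracting the median rather than the mean keeps one extreme mover from shifting every other instrument's label. The comparison below uses made-up returns for a single date and plain pandas. It illustrates the statistical difference only and makes no claim about how finlab computes the label.

```python
import pandas as pd

# Made-up one-day returns: instrument C is an outlier.
date = pd.to_datetime(["2020-01-01"])
returns = pd.DataFrame({"A": [0.01], "B": [0.02], "C": [0.90]}, index=date)

# The mean (about 0.31) is pulled up by C, so A and B look like large losers.
excess_mean = returns.sub(returns.mean(axis=1), axis=0)

# The median (0.02) ignores the size of the outlier, so A and B stay near zero.
excess_median = returns.sub(returns.median(axis=1), axis=0)

print(excess_mean)
print(excess_median)
```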
maximum_adverse_excursion
Calculate the maximum adverse excursion of market prices over a given period.
PARAMETER | DESCRIPTION |
---|---|
index | A multi-level index of datetime and instrument. |
resample | The resample frequency for the output data. Defaults to None. |
period | The number of periods to calculate the percentage change over. Defaults to 1. |
trade_at_price | The price for execution. Defaults to |
**kwargs | Additional arguments to be passed to the resampler function. |
RETURNS | DESCRIPTION |
---|---|
pd.Series | A pd.Series containing the percentage change of stock prices. |
maximum_favorable_excursion
Calculate the maximum favorable excursion of market prices over a given period.
PARAMETER | DESCRIPTION |
---|---|
index | A multi-level index of datetime and instrument. |
resample | The resample frequency for the output data. Defaults to None. |
period | The number of periods to calculate the percentage change over. Defaults to 1. |
trade_at_price | The price for execution. Defaults to |
**kwargs | Additional arguments to be passed to the resampler function. |
RETURNS | DESCRIPTION |
---|---|
pd.Series | A pd.Series containing the percentage change of stock prices. |
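Maximum adverse and maximum favorable excursion are usually defined as the worst and the best return reached while a position is held. The sketch below computes both for one instrument, assuming entry at the current price and a holding window of the next `period` prices. The function name and the prices are hypothetical, and finlab's handling of `trade_at_price` and resampling is not reproduced.

```python
import numpy as np
import pandas as pd


def excursions_sketch(price, period):
    """Return (MAE, MFE) over the next `period` prices for each entry point.

    Hypothetical helper for illustration; entries without a full window are NaN.
    """
    mae = pd.Series(np.nan, index=price.index)
    mfe = pd.Series(np.nan, index=price.index)
    for i in range(len(price) - period):
        window = price.iloc[i + 1 : i + 1 + period]
        relative = window / price.iloc[i] - 1  # return relative to entry price
        mae.iloc[i] = relative.min()  # worst point reached in the window
        mfe.iloc[i] = relative.max()  # best point reached in the window
    return mae, mfe


price = pd.Series(
    [100.0, 90.0, 110.0, 105.0],
    index=pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-06"]),
)
mae, mfe = excursions_sketch(price, period=2)
print(pd.DataFrame({"mae": mae, "mfe": mfe}))
```

For the first entry at 100, the next two prices are 90 and 110, so the adverse excursion is about -10% and the favorable excursion is about +10%.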
return_percentage
Calculate the percentage change of market prices over a given period.
PARAMETER | DESCRIPTION |
---|---|
index | A multi-level index of datetime and instrument. |
resample | The resample frequency for the output data. Defaults to None. |
period | The number of periods to calculate the percentage change over. Defaults to 1. |
trade_at_price | The price for execution. Defaults to |
**kwargs | Additional arguments to be passed to the resampler function. |
RETURNS | DESCRIPTION |
---|---|
pd.Series | A pd.Series containing the percentage change of stock prices. |
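The role of the `index` argument can be sketched as follows: compute a forward percentage change from a wide price table, then align it to the (datetime, instrument) index of the features. This assumes the label at time t compares the price `period` rows later with the price at t. `return_percentage_sketch` and the prices are hypothetical, and this is not finlab's implementation.

```python
import pandas as pd


def return_percentage_sketch(index, price, period=1):
    """Forward percentage change aligned to a (datetime, instrument) index.

    Hypothetical helper; `price` is wide (datetime index, instrument columns).
    """
    forward_return = price.shift(-period) / price - 1
    label = forward_return.iloc[:-period].stack()  # last rows have no future price
    label.index.names = ["datetime", "instrument"]
    return label.reindex(index)  # keep only the samples present in `index`


dates = pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"])
price = pd.DataFrame({"A": [100.0, 110.0, 121.0], "B": [50.0, 50.0, 55.0]}, index=dates)
index = pd.MultiIndex.from_product(
    [dates[:2], ["A", "B"]], names=["datetime", "instrument"]
)

label = return_percentage_sketch(index, price)
print(label)
```

Aligning to the feature index means the label Series can be passed to a model alongside the feature table row for row.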
finlab.ml.qlib
DumpDataBase
DumpDataBase(csv_path, qlib_dir, backup_dir=None, freq='day', max_workers=16, date_field_name='date', file_suffix='.csv', symbol_field_name='symbol', exclude_fields='', include_fields='', limit_nums=None)
Base class for dumping data to Qlib format.
PARAMETER | DESCRIPTION |
---|---|
csv_path | The path to the CSV file or directory containing the CSV files. |
qlib_dir | The directory where the Qlib data will be saved. |
backup_dir | The directory where the backup of the Qlib data will be saved. Defaults to None. |
freq | The frequency of the data. Defaults to "day". |
max_workers | The maximum number of workers for parallel processing. Defaults to 16. |
date_field_name | The name of the date field in the CSV file. Defaults to "date". |
file_suffix | The suffix of the CSV file. Defaults to ".csv". |
symbol_field_name | The name of the symbol field in the CSV file. Defaults to "symbol". |
exclude_fields | The fields to exclude from the dumped data. Defaults to "". |
include_fields | The fields to include in the dumped data. Defaults to "". |
limit_nums | The maximum number of CSV files to process. Defaults to None. |
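As a rough illustration of the input these parameters describe, the sketch below writes one CSV per symbol into a temporary directory, using the default `date` and `symbol` column names and the `.csv` suffix. The one-file-per-symbol layout, the `close` column, and the values are assumptions for illustration. The code does not call `DumpDataBase` itself.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Temporary directory standing in for `csv_path`.
csv_dir = Path(tempfile.mkdtemp())

# Made-up closing prices for two symbols.
toy_prices = {"1101": [40.0, 40.5], "1102": [38.2, 38.0]}

for symbol, closes in toy_prices.items():
    frame = pd.DataFrame(
        {
            "date": ["2020-01-02", "2020-01-03"],  # matches date_field_name="date"
            "symbol": symbol,                       # matches symbol_field_name="symbol"
            "close": closes,
        }
    )
    frame.to_csv(csv_dir / f"{symbol}.csv", index=False)  # matches file_suffix=".csv"

written = sorted(path.name for path in csv_dir.glob("*.csv"))
print(written)
```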
CatBoostModel
DEnsmbleModel
DNNModel
LGBModel
LinearModel
SFMModel
TabnetModel
XGBModel
alpha
dump
get_models
Return the available models as a dictionary that maps each model name to its model class. Examples:
output: { 'LGBModel': LGBModel, 'XGBModel': XGBModel, 'DEnsmbleModel': DEnsmbleModel, 'CatBoostModel': CatBoostModel, 'LinearModel': LinearModel, 'TabnetModel': TabnetModel, 'DNNModel': DNNModel, 'SFMModel': SFMModel}
finlab.ml.alphalens
create_factor_data
Create factor data, which contains future returns.
PARAMETER | DESCRIPTION |
---|---|
factor | Factor data, where the index is datetime and the columns are asset IDs. |
adj_close | Adjusted close prices, where the index is datetime and the columns are asset IDs. |
days | The future return periods considered. |
Return
Analytic plots and tables
Examples:
import alphalens
from finlab import data
from finlab.ml.alphalens import create_factor_data
factor = data.get('price_earning_ratio:股價淨值比')
adj_close = data.get('etl:adj_close')
factor_data = create_factor_data(factor, adj_close)
alphalens.tears.create_full_tear_sheet(factor_data.dropna(), long_short=False,
group_neutral=False, by_group=False)
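Conceptually, factor data pairs each (date, asset) factor value with the return that follows it. The sketch below builds such a table with pandas alone, assuming a single forward return over `days` rows. The column names `factor` and `forward_return` and the helper itself are hypothetical, and the real function's output format may differ.

```python
import pandas as pd


def factor_data_sketch(factor, adj_close, days=1):
    """Pair each factor value with the return over the following `days` rows.

    Hypothetical helper; both inputs are wide (datetime index, asset columns).
    """
    forward_return = adj_close.shift(-days) / adj_close - 1
    table = pd.concat(
        {"forward_return": forward_return.stack(), "factor": factor.stack()},
        axis=1,
    )
    table.index.names = ["date", "asset"]
    return table.dropna()  # rows without a future price are removed


dates = pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"])
adj_close = pd.DataFrame({"A": [100.0, 110.0, 121.0], "B": [50.0, 50.0, 55.0]}, index=dates)
factor = pd.DataFrame({"A": [1.5, 1.6, 1.7], "B": [0.8, 0.9, 1.0]}, index=dates)

factor_data = factor_data_sketch(factor, adj_close)
print(factor_data)
```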
factor_weights
Computes asset weights from factor values by dividing each value by the sum of their absolute values (achieving gross leverage of 1). Positive factor values will result in positive weights and negative values in negative weights.
PARAMETER | DESCRIPTION |
---|---|
factor_data | A MultiIndex DataFrame indexed by date (level 0) and asset (level 1), containing the values for a single alpha factor, forward returns for each period, the factor quantile/bin that factor value belongs to, and (optionally) the group the asset belongs to. See the full explanation in utils.get_clean_factor_and_forward_returns. |
demeaned | Should this computation happen on a long-short portfolio? If True, weights are computed by demeaning factor values and dividing by the sum of their absolute value (achieving gross leverage of 1). The sum of positive weights will be the same as the sum of negative weights (in absolute value), which suits a dollar-neutral long-short portfolio. |
group_adjust | Should this computation happen on a group-neutral portfolio? If True, compute group-neutral weights: each group will weigh the same, and if 'demeaned' is enabled the factor values will be demeaned at the group level. |
equal_weight | If True, the assets will be equal-weighted instead of factor-weighted. If demeaned is True, the factor universe will be split into two equal-sized groups: top assets with positive weights and bottom assets with negative weights. |
RETURNS | DESCRIPTION |
---|---|
pd.Series | Assets weighted by factor value. |
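The weighting rule described above can be written in a few lines of pandas. This sketch covers only the `demeaned` option for a single date's cross-section. Group adjustment and equal weighting are omitted, and `factor_weights_sketch` is a hypothetical name rather than the library function.

```python
import pandas as pd


def factor_weights_sketch(factor, demeaned=True):
    """Weights proportional to factor values, scaled to gross leverage of 1.

    Hypothetical helper; `factor` holds one date's values indexed by asset.
    """
    values = factor - factor.mean() if demeaned else factor
    return values / values.abs().sum()


# Made-up factor values for three assets on one date.
factor = pd.Series({"A": 1.0, "B": 2.0, "C": 6.0})

long_short = factor_weights_sketch(factor, demeaned=True)
long_only = factor_weights_sketch(factor, demeaned=False)
print(long_short)
print(long_only)
```

With demeaning, the positive and negative weights offset each other, so the net exposure is zero while the absolute weights still sum to 1. Without demeaning, all-positive factor values give a long-only portfolio whose weights sum to 1.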