finlab.ml
finlab.ml.feature
combine
The combine function takes a dictionary of features as input and combines them into a single pandas DataFrame.
PARAMETER | DESCRIPTION |
---|---|
features | A dictionary of features; in each feature, the index is datetime and the columns are instruments. |
resample | Optional argument to resample the data in the features. Default is None. |
sample_filter | A boolean filter for the features, where the index is date and the columns are instruments. |
**kwargs | Additional keyword arguments to pass to the resampler function. |
RETURNS | DESCRIPTION |
---|---|
A pandas DataFrame containing all the input features combined. |
Examples:
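This code shows how to use the finlab.ml.feature and finlab.data modules to combine features such as RSI and the price-to-book ratio. We use f.combine to merge them: each dictionary key is a feature name and the corresponding value is its data. The 'rsi' feature comes from data.indicator('RSI'), which computes the Relative Strength Index. The 'pb' feature comes from data.get('price_earning_ratio:股價淨值比'), which fetches the price-to-book ratio. The result is a DataFrame containing these features.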
from finlab import data
import finlab.ml.feature as f
import finlab.ml.qlib as q

features = f.combine({
    # Use data.get to fetch a feature directly
    'pb': data.get('price_earning_ratio:股價淨值比'),
    # Use data.indicator to generate a technical indicator feature
    'rsi': data.indicator('RSI'),
    # Use f.ta to enumerate a large number of talib indicators
    'talib': f.ta(f.ta_names()),
    # Use qlib Alpha158 to generate technical indicator features
    # (run q.dump() and q.init() first)
    'qlib158': q.alpha('Alpha158')
})
features.head()
datetime | instrument | rsi | pb |
---|---|---|---|
2020-01-01 | 1101 | 0 | 2 |
2020-01-02 | 1102 | 100 | 3 |
2020-01-03 | 1108 | 100 | 4 |
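The reshaping that the output above implies, from one wide table per feature to one row per (datetime, instrument) pair, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not finlab's implementation: `combine_wide_features` and the toy data are hypothetical, and nested dicts stand in for the pandas DataFrames that the real function handles.

```python
from datetime import date


def combine_wide_features(features):
    """Stack wide feature tables into rows keyed by (datetime, instrument).

    Hypothetical helper: `features` maps a feature name to a table of the
    form {datetime: {instrument: value}}.
    """
    rows = {}
    for name, table in features.items():
        for day, per_instrument in table.items():
            for instrument, value in per_instrument.items():
                rows.setdefault((day, instrument), {})[name] = value
    names = list(features)
    # Fill missing observations with None so every row has the same columns.
    return {
        key: {n: values.get(n) for n in names}
        for key, values in sorted(rows.items())
    }


# Toy data (made up for illustration only).
rsi = {date(2020, 1, 1): {"1101": 55.0, "1102": 40.0}}
pb = {date(2020, 1, 1): {"1101": 2.0}}

combined = combine_wide_features({"rsi": rsi, "pb": pb})
for key, row in combined.items():
    print(key, row)
```

Instrument 1102 has no 'pb' observation in the toy data, so its row carries None for that column; how the real function treats missing values is not stated in this page.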
Source code in finlab/ml/feature.py
ta
ta(feature_names, factories=None, resample=None, start_time=None, end_time=None, adj=False, cpu=-1, **kwargs)
Calculate technical indicator values for a list of feature names.
PARAMETER | DESCRIPTION |
---|---|
feature_names | A list of technical indicator feature names. Defaults to None. |
factories | A dictionary of factories to generate technical indicators. Defaults to {"talib": TalibIndicatorFactory()}. |
resample | The frequency to resample the data to. Defaults to None. |
start_time | The start time of the data. Defaults to None. |
end_time | The end time of the data. Defaults to None. |
**kwargs | Additional keyword arguments to pass to the resampler function. |
RETURNS | DESCRIPTION |
---|---|
DataFrame | pd.DataFrame: technical indicator feature names and their corresponding values. |
Source code in finlab/ml/feature.py
ta_names
Generate a list of technical indicator feature names.
PARAMETER | DESCRIPTION |
---|---|
lb | The lower bound of the multiplier of the default parameter for the technical indicators. |
ub | The upper bound of the multiplier of the default parameter for the technical indicators. |
n | The number of random samples for each technical indicator. |
factory | A factory object to generate technical indicators. Defaults to TalibIndicatorFactory. |
RETURNS | DESCRIPTION |
---|---|
List[str] | List[str]: A list of technical indicator feature names. |
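One way to read lb, ub and n is that each indicator's default parameters are scaled by random multipliers drawn between lb and ub, n times per indicator. The sketch below illustrates that reading only. `sample_parameters` is a hypothetical helper, drawing one multiplier per parameter is an assumption, and the lb/ub values are made up; the MACD defaults (12, 26, 9) are the standard TA-Lib ones.

```python
import random


def sample_parameters(defaults, lb, ub, n, seed=0):
    """Draw n parameter sets by scaling each default by a random multiplier.

    Assumption: one multiplier per parameter, uniform in [lb, ub].
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        sample = {}
        for name, default in defaults.items():
            multiplier = rng.uniform(lb, ub)
            # Periods must be positive integers, so round and clamp at 1.
            sample[name] = max(1, round(default * multiplier))
        samples.append(sample)
    return samples


macd_defaults = {"fastperiod": 12, "slowperiod": 26, "signalperiod": 9}
samples = sample_parameters(macd_defaults, lb=1, ub=10, n=3)
for sample in samples:
    print(sample)
```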
Examples:
import finlab.ml.feature as f
# method 1: generate each indicator with random parameters
features = f.ta()
# method 2: generate specific indicator
feature_names = ['talib.MACD__macdhist__fastperiod__52__slowperiod__212__signalperiod__75__']
features = f.ta(feature_names, resample='W')
# method 3: generate some indicator
feature_names = f.ta_names()
features = f.ta(feature_names)
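The feature name in method 2 appears to encode the factory, the indicator, the output series, and the parameter values, separated by double underscores. The parser below is a sketch of that layout inferred from this single example; `parse_feature_name` is hypothetical, and treating every parameter value as an integer is an assumption.

```python
def parse_feature_name(name):
    """Split a feature name into factory, indicator, output and parameters.

    Assumed layout, inferred from one example:
    <factory>.<indicator>__<output>__<param>__<value>__...__
    """
    factory, _, rest = name.partition(".")
    parts = [part for part in rest.split("__") if part]
    indicator, output, *pairs = parts
    # Remaining parts alternate between parameter name and value.
    params = {pairs[i]: int(pairs[i + 1]) for i in range(0, len(pairs), 2)}
    return {
        "factory": factory,
        "indicator": indicator,
        "output": output,
        "params": params,
    }


parsed = parse_feature_name(
    "talib.MACD__macdhist__fastperiod__52__slowperiod__212__signalperiod__75__"
)
print(parsed)
```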
Source code in finlab/ml/feature.py
finlab.ml.label
daytrading_percentage
Calculate the percentage change of market prices over a given period.
PARAMETER | DESCRIPTION |
---|---|
index | A multi-level index of datetime and instrument. |
resample | The resample frequency for the output data. Defaults to None. |
period | The number of periods to calculate the percentage change over. Defaults to 1. |
trade_at_price | The price for execution. Defaults to |
**kwargs | Additional arguments to be passed to the resampler function. |
RETURNS | DESCRIPTION |
---|---|
pd.Series: A pd.Series containing the percentage change of stock prices. |
Source code in finlab/ml/label.py
excess_over_mean
Calculate the excess over mean of market prices over a given period.
PARAMETER | DESCRIPTION |
---|---|
index | A multi-level index of datetime and instrument. |
resample | The resample frequency for the output data. Defaults to None. |
period | The number of periods to calculate the percentage change over. Defaults to 1. |
trade_at_price | The price for execution. Defaults to |
**kwargs | Additional arguments to be passed to the resampler function. |
RETURNS | DESCRIPTION |
---|---|
pd.Series: A pd.Series containing the percentage change of stock prices. |
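A common reading of "excess over mean" is each instrument's return minus the cross-sectional average return on the same date. The sketch below follows that reading; it is an assumption rather than a statement of how finlab computes the label, and the helper name and toy returns are hypothetical.

```python
from statistics import mean


def excess_over_mean(returns_by_date):
    """Subtract each date's cross-sectional mean return from every instrument."""
    result = {}
    for day, returns in returns_by_date.items():
        average = mean(returns.values())
        for instrument, value in returns.items():
            result[(day, instrument)] = value - average
    return result


# Toy period returns for one date (made up for illustration only).
returns = {"2020-01-02": {"1101": 0.03, "1102": -0.01, "1108": 0.01}}
excess = excess_over_mean(returns)
for key, value in excess.items():
    print(key, round(value, 4))
```

Swapping `statistics.mean` for `statistics.median` gives the analogous excess-over-median variant.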
Source code in finlab/ml/label.py
excess_over_median
Calculate the excess over median of market prices over a given period.
PARAMETER | DESCRIPTION |
---|---|
index | A multi-level index of datetime and instrument. |
resample | The resample frequency for the output data. Defaults to None. |
period | The number of periods to calculate the percentage change over. Defaults to 1. |
trade_at_price | The price for execution. Defaults to |
**kwargs | Additional arguments to be passed to the resampler function. |
RETURNS | DESCRIPTION |
---|---|
pd.Series: A pd.Series containing the percentage change of stock prices. |
Source code in finlab/ml/label.py
maximum_adverse_excursion
Calculate the maximum adverse excursion of market prices over a given period.
PARAMETER | DESCRIPTION |
---|---|
index | A multi-level index of datetime and instrument. |
resample | The resample frequency for the output data. Defaults to None. |
period | The number of periods to calculate the percentage change over. Defaults to 1. |
trade_at_price | The price for execution. Defaults to |
**kwargs | Additional arguments to be passed to the resampler function. |
RETURNS | DESCRIPTION |
---|---|
pd.Series: A pd.Series containing the percentage change of stock prices. |
Source code in finlab/ml/label.py
maximum_favorable_excursion
Calculate the maximum favorable excursion of market prices over a given period.
PARAMETER | DESCRIPTION |
---|---|
index | A multi-level index of datetime and instrument. |
resample | The resample frequency for the output data. Defaults to None. |
period | The number of periods to calculate the percentage change over. Defaults to 1. |
trade_at_price | The price for execution. Defaults to |
**kwargs | Additional arguments to be passed to the resampler function. |
RETURNS | DESCRIPTION |
---|---|
pd.Series: A pd.Series containing the percentage change of stock prices. |
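Maximum adverse and favorable excursion are usually defined as the worst and best percentage moves relative to the entry price during the holding window. The sketch below uses that common definition for a long position; it is an assumption, not finlab's code, and `excursions` and the price path are hypothetical.

```python
def excursions(prices, entry_index, period):
    """Return (MAE, MFE) for a long position over the next `period` prices.

    Both are percentage changes relative to the entry price. Assumes at
    least one price exists after the entry.
    """
    entry = prices[entry_index]
    window = prices[entry_index + 1 : entry_index + 1 + period]
    changes = [price / entry - 1 for price in window]
    return min(changes), max(changes)


# Toy price path (made up for illustration only).
prices = [100.0, 95.0, 110.0, 105.0]
mae, mfe = excursions(prices, entry_index=0, period=3)
print(round(mae, 4), round(mfe, 4))
```

With this path, the position is at worst about 5% below the entry price and at best about 10% above it.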
Source code in finlab/ml/label.py
return_percentage
Calculate the percentage change of market prices over a given period.
PARAMETER | DESCRIPTION |
---|---|
index | A multi-level index of datetime and instrument. |
resample | The resample frequency for the output data. Defaults to None. |
period | The number of periods to calculate the percentage change over. Defaults to 1. |
trade_at_price | The price for execution. Defaults to |
**kwargs | Additional arguments to be passed to the resampler function. |
RETURNS | DESCRIPTION |
---|---|
pd.Series: A pd.Series containing the percentage change of stock prices. |
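Since this value serves as a training label, a natural reading is a forward-looking return: the price `period` steps ahead divided by the current price, minus one. The sketch below assumes that reading for a single instrument; this local `return_percentage` is a hypothetical stand-in, not the library function, and the prices are made up.

```python
def return_percentage(prices, period=1):
    """Forward percentage change over `period` steps; None where no future price exists."""
    labels = []
    for i, price in enumerate(prices):
        j = i + period
        labels.append(prices[j] / price - 1 if j < len(prices) else None)
    return labels


# Toy execution prices for one instrument (made up for illustration only).
closes = [100.0, 102.0, 96.9]
labels = return_percentage(closes, period=1)
print(labels)
```

The last entry has no future price to compare against, so its label is left undefined.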
Source code in finlab/ml/label.py
finlab.ml.qlib
DumpDataBase
DumpDataBase(csv_path, qlib_dir, backup_dir=None, freq='day', max_workers=16, date_field_name='date', file_suffix='.csv', symbol_field_name='symbol', exclude_fields='', include_fields='', limit_nums=None)
Base class for dumping data to Qlib format.
PARAMETER | DESCRIPTION |
---|---|
csv_path | The path to the CSV file or directory containing the CSV files. |
qlib_dir | The directory where the Qlib data will be saved. |
backup_dir | The directory where the backup of the Qlib data will be saved. Defaults to None. |
freq | The frequency of the data. Defaults to "day". |
max_workers | The maximum number of workers for parallel processing. Defaults to 16. |
date_field_name | The name of the date field in the CSV file. Defaults to "date". |
file_suffix | The suffix of the CSV file. Defaults to ".csv". |
symbol_field_name | The name of the symbol field in the CSV file. Defaults to "symbol". |
exclude_fields | The fields to exclude from the dumped data. Defaults to "". |
include_fields | The fields to include in the dumped data. Defaults to "". |
limit_nums | The maximum number of CSV files to process. Defaults to None. |
Source code in finlab/ml/qlib.py
CatBoostModel
CatBoostModel is a wrapper model for CatBoost model.
import finlab.ml.qlib as q
# build X_train, y_train, X_test
model = q.CatBoostModel()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
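The "build X_train, y_train, X_test" step is left open in the example above. For time-series data, a split by date avoids training on samples that come after the test period. The sketch below shows only that split logic with plain lists; the rows, column names and cutoff are hypothetical, and the wrapper models presumably expect pandas objects rather than lists.

```python
from datetime import date


def split_by_date(rows, cutoff):
    """Rows dated before `cutoff` go to train; the rest go to test."""
    train = [row for row in rows if row["datetime"] < cutoff]
    test = [row for row in rows if row["datetime"] >= cutoff]
    return train, test


# Hypothetical samples: one dict per (datetime, instrument) pair.
rows = [
    {"datetime": date(2020, 1, 1), "instrument": "1101", "rsi": 30.0, "pb": 2.0, "label": 0.02},
    {"datetime": date(2020, 1, 2), "instrument": "1101", "rsi": 45.0, "pb": 2.1, "label": -0.01},
    {"datetime": date(2020, 1, 3), "instrument": "1101", "rsi": 60.0, "pb": 2.2, "label": 0.03},
]

train, test = split_by_date(rows, cutoff=date(2020, 1, 3))
feature_columns = ["rsi", "pb"]
X_train = [[row[c] for c in feature_columns] for row in train]
y_train = [row["label"] for row in train]
X_test = [[row[c] for c in feature_columns] for row in test]
print(len(X_train), len(X_test))
```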
Source code in finlab/ml/qlib.py
DEnsmbleModel
DEnsmbleModel is a wrapper model for Double Ensemble model.
import finlab.ml.qlib as q
# build X_train, y_train, X_test
model = q.DEnsmbleModel()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
Source code in finlab/ml/qlib.py
DNNModel
DNNModel is a wrapper model for Deep Neural Network model.
import finlab.ml.qlib as q
# build X_train, y_train, X_test
model = q.DNNModel()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
Source code in finlab/ml/qlib.py
LGBModel
LGBModel is a wrapper model for LightGBM model.
import finlab.ml.qlib as q
# build X_train, y_train, X_test
model = q.LGBModel()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
Source code in finlab/ml/qlib.py
LinearModel
LinearModel is a wrapper model for Linear model.
import finlab.ml.qlib as q
# build X_train, y_train, X_test
model = q.LinearModel()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
Source code in finlab/ml/qlib.py
SFMModel
SFMModel is a wrapper model for SFM.
import finlab.ml.qlib as q
# build X_train, y_train, X_test
model = q.SFMModel()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
Source code in finlab/ml/qlib.py
TabnetModel
TabnetModel is a wrapper model for Tabnet model.
import finlab.ml.qlib as q
# build X_train, y_train, X_test
model = q.TabnetModel()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
Source code in finlab/ml/qlib.py
XGBModel
XGBModel is a wrapper model for XGBoost model.
import finlab.ml.qlib as q
# build X_train, y_train, X_test
model = q.XGBModel()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
Source code in finlab/ml/qlib.py
alpha
Generate Qlib features.
PARAMETER | DESCRIPTION |
---|---|
handler | The handler name (str). Defaults to 'alpha158'; can also be set to 'Alpha360'. |
Source code in finlab/ml/qlib.py
dump
Generate the Qlib database for Taiwan stocks.
Examples:
import qlib
import finlab.ml.qlib as q
q.dump() # generate the Taiwan stock database
q.init() # initialize Qlib with Taiwan stock data for machine learning tasks (similar to qlib.init)
# qlib functions and operations
Source code in finlab/ml/qlib.py
get_models
Return the available models as a dictionary that maps each model name to its class.
Examples:
output: {'LGBModel': LGBModel, 'XGBModel': XGBModel, 'DEnsmbleModel': DEnsmbleModel, 'CatBoostModel': CatBoostModel, 'LinearModel': LinearModel, 'TabnetModel': TabnetModel, 'DNNModel': DNNModel, 'SFMModel': SFMModel}
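A name-to-class mapping like this makes it easy to train every model in a loop and compare predictions. The sketch below shows the pattern with two stand-in classes so it runs without finlab; `MeanModel` and `LastValueModel` are hypothetical, and with finlab installed the mapping would presumably come from q.get_models() instead.

```python
class MeanModel:
    """Stand-in model: always predicts the mean of the training targets."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class LastValueModel:
    """Stand-in model: always predicts the last training target."""

    def fit(self, X, y):
        self.last_ = y[-1]
        return self

    def predict(self, X):
        return [self.last_ for _ in X]


# Hypothetical registry with the same shape as the output shown above.
available = {"MeanModel": MeanModel, "LastValueModel": LastValueModel}

X_train, y_train, X_test = [[1.0], [2.0]], [0.5, 1.5], [[3.0]]

predictions = {}
for name, model_class in available.items():
    model = model_class()
    model.fit(X_train, y_train)
    predictions[name] = model.predict(X_test)

print(predictions)
```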
Source code in finlab/ml/qlib.py
init
Initialize Qlib (similar to qlib.init(), but set up for Taiwan stocks and simpler to use).
Examples:
import qlib
import finlab.ml.qlib as q
q.dump() # generate the Taiwan stock database
q.init() # initialize Qlib with Taiwan stock data for machine learning tasks (similar to qlib.init)
# qlib functions and operations
Source code in finlab/ml/qlib.py
finlab.ml.alphalens
create_factor_data
Create factor data, which contains future returns.
PARAMETER | DESCRIPTION |
---|---|
factor | Factor data where the index is datetime and the columns are asset IDs. |
adj_close | Adjusted close prices where the index is datetime and the columns are asset IDs. |
days | The future return periods to consider. |
Return
Analytic plots and tables
Examples:
import alphalens
from finlab import data
from finlab.ml.alphalens import create_factor_data
factor = data.get('price_earning_ratio:股價淨值比')
adj_close = data.get('etl:adj_close')
factor_data = create_factor_data(factor, adj_close)
alphalens.tears.create_full_tear_sheet(factor_data.dropna(), long_short=False,
group_neutral=False, by_group=False)
Source code in finlab/ml/alphalens.py
factor_weights
Computes asset weights from factor values by dividing each value by the sum of their absolute values (achieving gross leverage of 1). Positive factor values result in positive weights and negative values in negative weights.
PARAMETER | DESCRIPTION |
---|---|
factor_data | A MultiIndex DataFrame indexed by date (level 0) and asset (level 1), containing the values for a single alpha factor, forward returns for each period, the factor quantile/bin that factor value belongs to, and (optionally) the group the asset belongs to. See the full explanation in utils.get_clean_factor_and_forward_returns. |
demeaned | Should this computation happen on a long-short portfolio? If True, weights are computed by demeaning factor values and dividing by the sum of their absolute values (achieving gross leverage of 1). The sum of positive weights will equal the absolute sum of negative weights, which suits a dollar-neutral long-short portfolio. |
group_adjust | Should this computation happen on a group-neutral portfolio? If True, compute group-neutral weights: each group is weighted the same, and if 'demeaned' is enabled the factor values are demeaned at the group level. |
equal_weight | If True, the assets are equal-weighted instead of factor-weighted. If demeaned is True, the factor universe is split into two equal-sized groups: top assets with positive weights and bottom assets with negative weights. |
RETURNS | DESCRIPTION |
---|---|
returns | pd.Series: Assets weighted by factor value. |
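The weighting rule described above can be checked by hand for a single date. The sketch below covers only the factor-weighted case with the optional demeaning step; it is a simplified plain-Python illustration (no grouping, no equal weighting), the local `factor_weights` helper is hypothetical, and the factor values are made up.

```python
def factor_weights(factors, demeaned=True):
    """Weights for one date: optionally demean, then scale to gross leverage 1.

    Assumes the (demeaned) factor values are not all zero.
    """
    values = dict(factors)
    if demeaned:
        average = sum(values.values()) / len(values)
        values = {asset: value - average for asset, value in values.items()}
    gross = sum(abs(value) for value in values.values())
    return {asset: value / gross for asset, value in values.items()}


# Toy factor values for one date (made up for illustration only).
factor = {"1101": 1.0, "1102": 2.0, "1108": 6.0}

long_only = factor_weights(factor, demeaned=False)
long_short = factor_weights(factor, demeaned=True)
print(long_only)
print(long_short)
```

Without demeaning, all weights are positive and sum to 1. With demeaning, the positive and negative weights each sum to 0.5 in absolute value, which is the dollar-neutral property described for `demeaned`.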