跳轉到

finlab.ml

finlab.ml.feature

combine

combine(features, resample=None, sample_filter=None, **kwargs)

The combine function takes a dictionary of features as input and combines them into a single pandas DataFrame. combine 函數接受一個特徵字典作為輸入,並將它們合併成一個 pandas DataFrame。

PARAMETER DESCRIPTION
features

a dictionary of features where index is datetime and column is instrument. 一個特徵字典,其中索引為日期時間,欄位為證券代碼。

TYPE: Dict[str, DataFrame]

resample

Optional argument to resample the data in the features. Default is None. 選擇性的參數,用於重新取樣特徵中的資料。預設為 None。

TYPE: str DEFAULT: None

sample_filter

a boolean dictionary where index is date and columns are instrument representing the filter of features.

TYPE: DataFrame DEFAULT: None

**kwargs

Additional keyword arguments to pass to the resampler function. 傳遞給重新取樣函數 resampler 的其他關鍵字引數。

DEFAULT: {}

RETURNS DESCRIPTION

A pandas DataFrame containing all the input features combined. 一個包含所有輸入特徵合併後的 pandas DataFrame。

Examples:

這段程式碼教我們如何使用finlab.ml.feature和finlab.data模組,來合併兩個特徵:RSI和股價淨值比。我們使用f.combine函數來進行合併,其中特徵的名稱是字典的鍵,對應的資料是值。 我們從data.indicator('RSI')取得'rsi'特徵,這個函數計算相對強弱指數。我們從data.get('price_earning_ratio:股價淨值比')取得'pb'特徵,這個函數獲取股價淨值比。最後,我們得到一個包含這兩個特徵的DataFrame。

from finlab import data
import finlab.ml.feature as f
import finlab.ml.qlib as q

features = f.combine({

    # 用 data.get 簡單產生出技術指標
    'pb': data.get('price_earning_ratio:股價淨值比'),

    # 用 data.indicator 產生技術指標的特徵
    'rsi': data.indicator('RSI'),

    # 用 f.ta 枚舉超多種 talib 指標
    'talib': f.ta(f.ta_names()),

    # 利用 qlib alph158 產生技術指標的特徵(請先執行 q.init(), q.dump() 才能使用)
    'qlib158': q.alpha('Alpha158')

    })

features.head()
datetime instrument rsi pb
2020-01-01 1101 0 2
2020-01-02 1102 100 3
2020-01-03 1108 100 4

ta

ta(feature_names, factories=None, resample=None, start_time=None, end_time=None, adj=False, cpu=-1, **kwargs)

Calculate technical indicator values for a list of feature names.

PARAMETER DESCRIPTION
feature_names

A list of technical indicator feature names. Defaults to None.

TYPE: Optional[List[str]]

factories

A dictionary of factories to generate technical indicators. Defaults to {"talib": TalibIndicatorFactory()}.

TYPE: Optioanl[Dict[str, TalibIndicatorFactory]] DEFAULT: None

resample

The frequency to resample the data to. Defaults to None.

TYPE: Optional[str] DEFAULT: None

start_time

The start time of the data. Defaults to None.

TYPE: Optional[str] DEFAULT: None

end_time

The end time of the data. Defaults to None.

TYPE: Optional[str] DEFAULT: None

**kwargs

Additional keyword arguments to pass to the resampler function.

DEFAULT: {}

RETURNS DESCRIPTION
DataFrame

pd.DataFrame: technical indicator feature names and their corresponding values.

ta_names

ta_names(lb=1, ub=10, n=1, factory=None)

Generate a list of technical indicator feature names.

PARAMETER DESCRIPTION
lb

The lower bound of the multiplier of the default parameter for the technical indicators.

TYPE: int DEFAULT: 1

ub

The upper bound of the multiplier of the default parameter for the technical indicators.

TYPE: int DEFAULT: 10

n

The number of random samples for each technical indicator.

TYPE: int DEFAULT: 1

factory

A factory object to generate technical indicators. Defaults to TalibIndicatorFactory.

TYPE: IndicatorFactory DEFAULT: None

RETURNS DESCRIPTION
List[str]

List[str]: A list of technical indicator feature names.

Examples:

import finlab.ml.feature as f


# method 1: generate each indicator with random parameters
features = f.ta()

# method 2: generate specific indicator
feature_names = ['talib.MACD__macdhist__fastperiod__52__slowperiod__212__signalperiod__75__']
features = f.ta(feature_names, resample='W')

# method 3: generate some indicator
feature_names = f.ta_names()
features = f.ta(feature_names)

finlab.ml.label

daytrading_percentage

daytrading_percentage(index, **kwargs)

Calculate the percentage change of market prices over a given period.

PARAMETER DESCRIPTION
index

A multi-level index of datetime and instrument.

TYPE: Index

resample

The resample frequency for the output data. Defaults to None.

TYPE: Optional[str]

period

The number of periods to calculate the percentage change over. Defaults to 1.

TYPE: int

trade_at_price

The price for execution. Defaults to close.

TYPE: str

**kwargs

Additional arguments to be passed to the resampler function.

DEFAULT: {}

RETURNS DESCRIPTION

pd.Series: A pd.Series containing the percentage change of stock prices.

excess_over_mean

excess_over_mean(index, resample=None, period=1, trade_at_price='close', **kwargs)

Calculate the excess over mean of market prices over a given period.

PARAMETER DESCRIPTION
index

A multi-level index of datetime and instrument.

TYPE: Index

resample

The resample frequency for the output data. Defaults to None.

TYPE: Optional[str] DEFAULT: None

period

The number of periods to calculate the percentage change over. Defaults to 1.

TYPE: int DEFAULT: 1

trade_at_price

The price for execution. Defaults to close.

TYPE: str DEFAULT: 'close'

**kwargs

Additional arguments to be passed to the resampler function.

DEFAULT: {}

RETURNS DESCRIPTION

pd.Series: A pd.Series containing the percentage change of stock prices.

excess_over_median

excess_over_median(index, resample=None, period=1, trade_at_price='close', **kwargs)

Calculate the excess over median of market prices over a given period.

PARAMETER DESCRIPTION
index

A multi-level index of datetime and instrument.

TYPE: Index

resample

The resample frequency for the output data. Defaults to None.

TYPE: Optional[str] DEFAULT: None

period

The number of periods to calculate the percentage change over. Defaults to 1.

TYPE: int DEFAULT: 1

trade_at_price

The price for execution. Defaults to close.

TYPE: str DEFAULT: 'close'

**kwargs

Additional arguments to be passed to the resampler function.

DEFAULT: {}

RETURNS DESCRIPTION

pd.Series: A pd.Series containing the percentage change of stock prices.

maximum_adverse_excursion

maximum_adverse_excursion(index, period=1, trade_at_price='close')

Calculate the maximum adverse excursion of market prices over a given period.

PARAMETER DESCRIPTION
index

A multi-level index of datetime and instrument.

TYPE: Index

resample

The resample frequency for the output data. Defaults to None.

TYPE: Optional[str]

period

The number of periods to calculate the percentage change over. Defaults to 1.

TYPE: int DEFAULT: 1

trade_at_price

The price for execution. Defaults to close.

TYPE: str DEFAULT: 'close'

**kwargs

Additional arguments to be passed to the resampler function.

RETURNS DESCRIPTION

pd.Series: A pd.Series containing the percentage change of stock prices.

maximum_favorable_excursion

maximum_favorable_excursion(index, period=1, trade_at_price='close')

Calculate the maximum favorable excursion of market prices over a given period.

PARAMETER DESCRIPTION
index

A multi-level index of datetime and instrument.

TYPE: Index

resample

The resample frequency for the output data. Defaults to None.

TYPE: Optional[str]

period

The number of periods to calculate the percentage change over. Defaults to 1.

TYPE: int DEFAULT: 1

trade_at_price

The price for execution. Defaults to close.

TYPE: str DEFAULT: 'close'

**kwargs

Additional arguments to be passed to the resampler function.

RETURNS DESCRIPTION

pd.Series: A pd.Series containing the percentage change of stock prices.

return_percentage

return_percentage(index, resample=None, period=1, trade_at_price='close', bfill=False, **kwargs)

Calculate the percentage change of market prices over a given period.

PARAMETER DESCRIPTION
index

A multi-level index of datetime and instrument.

TYPE: Index

resample

The resample frequency for the output data. Defaults to None.

TYPE: Optional[str] DEFAULT: None

period

The number of periods to calculate the percentage change over. Defaults to 1.

TYPE: int DEFAULT: 1

trade_at_price

The price for execution. Defaults to close.

TYPE: str DEFAULT: 'close'

**kwargs

Additional arguments to be passed to the resampler function.

DEFAULT: {}

RETURNS DESCRIPTION

pd.Series: A pd.Series containing the percentage change of stock prices.

finlab.ml.qlib

DumpDataBase

DumpDataBase(csv_path, qlib_dir, backup_dir=None, freq='day', max_workers=16, date_field_name='date', file_suffix='.csv', symbol_field_name='symbol', exclude_fields='', include_fields='', limit_nums=None)

Base class for dumping data to Qlib format.

PARAMETER DESCRIPTION
csv_path

The path to the CSV file or directory containing the CSV files.

TYPE: str

qlib_dir

The directory where the Qlib data will be saved.

TYPE: str

backup_dir

The directory where the backup of the Qlib data will be saved. Defaults to None.

TYPE: str DEFAULT: None

freq

The frequency of the data. Defaults to "day".

TYPE: str DEFAULT: 'day'

max_workers

The maximum number of workers for parallel processing. Defaults to 16.

TYPE: int DEFAULT: 16

date_field_name

The name of the date field in the CSV file. Defaults to "date".

TYPE: str DEFAULT: 'date'

file_suffix

The suffix of the CSV file. Defaults to ".csv".

TYPE: str DEFAULT: '.csv'

symbol_field_name

The name of the symbol field in the CSV file. Defaults to "symbol".

TYPE: str DEFAULT: 'symbol'

exclude_fields

The fields to exclude from the dumped data. Defaults to "".

TYPE: str DEFAULT: ''

include_fields

The fields to include in the dumped data. Defaults to "".

TYPE: str DEFAULT: ''

limit_nums

The maximum number of CSV files to process. Defaults to None.

TYPE: int DEFAULT: None

CatBoostModel

CatBoostModel()

CatBoostModel is a wrapper model for CatBoost model.

import finlab.ml.qlib as q

# build X_train, y_train, X_test

model = q.CatBoostModel()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

DEnsmbleModel

DEnsmbleModel()

DEnsmbleModel is a wrapper model for Double Ensemble model.

import finlab.ml.qlib as q

# build X_train, y_train, X_test

model = q.DEnsmbleModel()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

DNNModel

DNNModel()

DNNModel is a wrapper model for Deep Neural Network model.

import finlab.ml.qlib as q

# build X_train, y_train, X_test

model = q.DNNModel()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

LGBModel

LGBModel()

LGBModel is a wrapper model for LightGBM model.

import finlab.ml.qlib as q

# build X_train, y_train, X_test

model = q.LGBModel()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

LinearModel

LinearModel()

LinearModel is a wrapper model for Linear model.

import finlab.ml.qlib as q

# build X_train, y_train, X_test

model = q.LinearModel()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

SFMModel

SFMModel()

SFMModel is a wrapper model for SFM.

import finlab.ml.qlib as q

# build X_train, y_train, X_test

model = q.SFMModel()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

TabnetModel

TabnetModel()

TabnetModel is a wrapper model for Tabnet model.

import finlab.ml.qlib as q

# build X_train, y_train, X_test

model = q.TabnetModel()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

XGBModel

XGBModel()

XGBModel is a wrapper model for XGBoost model.

import finlab.ml.qlib as q

# build X_train, y_train, X_test

model = q.XGBModel()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

alpha

alpha(handler='Alpha158', **kwargs)

產生 Qlib 的特徵 Args: handler (str): 預設為 'alpha158' 也可以設定成 'Alpha360' Examples:

import finlab.ml.qlib as q
features = q.alpha('Alpha158')

dump

dump(freq='day')

產生Qlib 於台股的資料庫 Examples:

import qlib
import finlab.ml.qlib as q

q.dump() # generate tw stock database
q.init() # initiate tw stock to perform machine leraning tasks (similar to qlib.init)

import qlib
# qlib functions and operations

get_models

get_models()

Return a list of available models. Examples:

import finlab.ml.qlib as q

models = q.get_models()
print(models)
output:

{ 'LGBModel': LGBModel, 'XGBModel': XGBModel, 'DEnsmbleModel': DEnsmbleModel, 'CatBoostModel': CatBoostModel, 'LinearModel': LinearModel, 'TabnetModel': TabnetModel, 'DNNModel': DNNModel, 'SFMModel': SFMModel}

init

init()

Qlib 初始化 (類似於台股版 qlib.init() 但更簡單易用) Examples:

import qlib
import finlab.ml.qlib as q

q.dump() # generate tw stock database
q.init() # initiate tw stock to perform machine leraning tasks (similar to qlib.init)

import qlib
# qlib functions and operations

finlab.ml.alphalens

create_factor_data

create_factor_data(factor, adj_close, days=[5, 10, 20, 60])

create factor data, which contains future return

PARAMETER DESCRIPTION
factor

factor data where index is datetime and columns is asset id

TYPE: DataFrame

adj_close

adj close where index is datetime and columns is asset id

TYPE: DataFrame

days

future return considered

TYPE: List[int] DEFAULT: [5, 10, 20, 60]

Return

Analytic plots and tables

Examples:

股價淨值比分析
import alphalens
from finlab import data
from finlab.ml.alphalens import create_factor_data

factor = data.get('price_earning_ratio:股價淨值比')
adj_close = data.get('etl:adj_close')

factor_data = create_factor_data(factor, adj_close)

alphalens.tears.create_full_tear_sheet(factor_data.dropna(), long_short=False,
                                       group_neutral=False, by_group=False)

factor_weights

factor_weights(factor_data, demeaned=True, group_adjust=False, equal_weight=False)

Computes asset weights by factor values and dividing by the sum of their absolute value (achieving gross leverage of 1). Positive factor values will results in positive weights and negative values in negative weights.

PARAMETER DESCRIPTION
factor_data

A MultiIndex DataFrame indexed by date (level 0) and asset (level 1), containing the values for a single alpha factor, forward returns for each period, the factor quantile/bin that factor value belongs to, and (optionally) the group the asset belongs to. - See full explanation in utils.get_clean_factor_and_forward_returns

TYPE: DataFrame - MultiIndex

demeaned

Should this computation happen on a long short portfolio? if True, weights are computed by demeaning factor values and dividing by the sum of their absolute value (achieving gross leverage of 1). The sum of positive weights will be the same as the negative weights (absolute value), suitable for a dollar neutral long-short portfolio

TYPE: bool DEFAULT: True

group_adjust

Should this computation happen on a group neutral portfolio? If True, compute group neutral weights: each group will weight the same and if 'demeaned' is enabled the factor values demeaning will occur on the group level.

TYPE: bool DEFAULT: False

equal_weight

if True the assets will be equal-weighted instead of factor-weighted If demeaned is True then the factor universe will be split in two equal sized groups, top assets with positive weights and bottom assets with negative weights

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION
returns

pd.Series Assets weighted by factor value.