utils module
- utils.Fill_RTH_Minutes(df)
Data preprocessing. Given a pandas dataframe, it fills the missing values within the Regular Trading Hours (RTH), 09:30 - 16:00. Only days with at least one observation are used; they are assumed to be exactly the working days.
- INPUTS:
- df: pandas DataFrame,
the dataframe, made up of minute-by-minute values
- OUTPUTS:
- df: pandas DataFrame,
the preprocessed dataframe, with no holes and filled missing values.
Example of usage
    import pandas as pd
    from utils import Fill_RTH_Minutes

    df = pd.read_csv('stock.csv')
    df.index = pd.to_datetime(df.index)
    df_full = Fill_RTH_Minutes(df)  # Data preprocessing
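The filling step can be pictured with a minimal sketch. It is not the code behind Fill_RTH_Minutes: the name fill_rth_minutes_sketch, the inclusive 09:30 to 16:00 grid (391 rows per day), and the forward-then-backward fill within each day are all assumptions made for illustration.

```python
import pandas as pd


def fill_rth_minutes_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for Fill_RTH_Minutes (assumed behaviour)."""
    # Days with at least one observation are treated as the working days.
    days = df.index.normalize().unique()
    # Assumption: one row per minute from 09:30 to 16:00, both ends included.
    full_index = pd.DatetimeIndex(
        [
            ts
            for day in days
            for ts in pd.date_range(
                day + pd.Timedelta(hours=9, minutes=30),
                day + pd.Timedelta(hours=16),
                freq="min",
            )
        ]
    )
    out = df.reindex(full_index)
    # Assumption: fill gaps inside each day only, forward first, then backward.
    day_key = out.index.normalize()
    out = out.groupby(day_key).ffill()
    out = out.groupby(day_key).bfill()
    return out


# Toy data: two observed days with missing minutes.
raw = pd.DataFrame(
    {"Close": [1.0, 2.0, 3.0]},
    index=pd.to_datetime(
        ["2024-01-02 09:30", "2024-01-02 09:32", "2024-01-03 10:00"]
    ),
)
filled = fill_rth_minutes_sketch(raw)
```

Filling within each day separately keeps a price from one session from leaking into the next one; whether the library does the same is not stated here.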
- utils.IntraSeries_2_DayReturn(series)
Computes the daily log returns of a time series of intraday prices.
- INPUTS:
- series: pandas Series,
the time series of intraday prices
- OUTPUTS:
- output: numpy array,
the daily log returns
Example of usage
    import pandas as pd
    from utils import IntraSeries_2_DayReturn

    df = pd.read_csv('stock.csv')
    df.index = pd.to_datetime(df.index)
    daily_ret = IntraSeries_2_DayReturn(df.Close)  # Compute the daily log returns
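The exact return definition is not spelled out above. As a rough sketch, assuming an open-to-close log return per day (log of the last price minus log of the first price), the computation could look as follows. The name intraday_to_day_log_return is hypothetical, and the library may define the daily return differently.

```python
import numpy as np
import pandas as pd


def intraday_to_day_log_return(series: pd.Series) -> np.ndarray:
    """Hypothetical open-to-close daily log return (assumed definition)."""
    by_day = series.groupby(series.index.normalize())
    # log(last price of the day) - log(first price of the day)
    return (np.log(by_day.last()) - np.log(by_day.first())).to_numpy()


# Toy data: day one goes from 100 to 110, day two from 50 to 25.
prices = pd.Series(
    [100.0, 105.0, 110.0, 50.0, 25.0],
    index=pd.to_datetime(
        [
            "2024-01-02 09:30",
            "2024-01-02 12:00",
            "2024-01-02 16:00",
            "2024-01-03 09:30",
            "2024-01-03 16:00",
        ]
    ),
)
daily = intraday_to_day_log_return(prices)
```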
- class utils.Subordinator(c, sub_type='clock')
Bases: object
Subordinator class. It is intended to be used on each day separately.
Parameters:
- c: int
the number of time indexes to sample
- sub_type: str, optional
Type of subordinator to use. Either ‘clock’ for the clock time; ‘tpv’ for the TriPower Variation; ‘vol’ for the volume; or ‘identity’ for the identity (that is, all the indexes are returned). Default is ‘clock’
Example of usage
    import numpy as np
    import pandas as pd
    from utils import Subordinator

    df = pd.read_csv('stock.csv')
    df.index = pd.to_datetime(df.index)
    day_price = df.Close
    day_vol = df.Volume
    subordinator = Subordinator(78, sub_type='vol')  # Initialize the subordinator
    target_points = day_price.iloc[subordinator.predict(day_price, vol=day_vol)]  # Extract the prices at the subordinated indexes
    log_ret = np.log(target_points).diff().dropna()  # Compute the subordinated logarithmic returns
Methods:
- predict(data, vol=None, tpv_int_min=15)
Returns the index positions of the subordinated values. That is, it returns the vector τ(j), with j=0,...,c, where τ is the subordinator.
- INPUTS:
- data: pd.Series
the time series of intra-day prices over the whole day (that is, from 09:30 to 16:00)
- tpv_int_min: int, optional
half-length of the window used for computing the tri-power variation. Only used when self.sub_type == ‘tpv’. Default is 15
- vol: pd.Series, optional
the volume series, on the same indexes as data. Only used when self.sub_type == ‘vol’
- OUTPUTS:
- sub_idxs: list
the indexes corresponding to the subordinated values
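To make the idea of τ(j) concrete, here is a minimal sketch of two of the subordinator types. It is an illustration under assumptions, not the body of predict: the helper names clock_indices and volume_indices are hypothetical, and the rule "first position where the normalised cumulative volume reaches j / c" is one plausible reading of a volume subordinator.

```python
import numpy as np


def clock_indices(n: int, c: int) -> np.ndarray:
    """c + 1 evenly spaced positions in a day of n observations."""
    return np.linspace(0, n - 1, c + 1).round().astype(int)


def volume_indices(vol, c: int) -> np.ndarray:
    """Positions where the normalised cumulative volume first reaches j / c."""
    cum = np.cumsum(np.asarray(vol, dtype=float))
    cum = cum / cum[-1]  # cumulative intensity, ends at exactly 1.0
    targets = np.arange(c + 1) / c  # j / c for j = 0, ..., c
    return np.searchsorted(cum, targets, side="left")


# With 391 one-minute observations and c = 78, clock time means 5-minute steps.
five_minute = clock_indices(391, 78)
# Equal volume in every minute.
even_volume = volume_indices([1, 1, 1, 1], 2)
# All the volume traded in the first minute: the sampled positions collide.
front_loaded = volume_indices([10, 0, 0, 0], 2)
```

The last case returns the same position several times, which is the situation sample_with_tie_break is documented to handle.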
- sample_with_tie_break(a)
Ensures the sampled time indexes are unique (i.e., not overlapping). That is, it manages the tie-break rule for transforming a non-injective function into an injective subordinator.
- INPUTS:
- a: ndarray
the cumulative intensities
- OUTPUTS:
- indices: ndarray
the indexes corresponding to the subordinated values
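The tie-break rule itself is not described above. One simple way to turn repeated positions into unique ones, shown purely as an assumption, is a forward pass that pushes duplicates to the right followed by a backward pass that pulls overflowing positions back inside the day. The function make_strictly_increasing is hypothetical and may differ from what sample_with_tie_break does.

```python
import numpy as np


def make_strictly_increasing(idx, n: int) -> np.ndarray:
    """Hypothetical tie-break: unique, increasing positions in [0, n - 1]."""
    idx = np.asarray(idx, dtype=int).copy()
    if len(idx) > n:
        raise ValueError("cannot pick more unique positions than observations")
    # Forward pass: each position must exceed the previous one.
    for i in range(1, len(idx)):
        idx[i] = max(idx[i], idx[i - 1] + 1)
    # Backward pass: keep everything inside the day without creating new ties.
    idx[-1] = min(idx[-1], n - 1)
    for i in range(len(idx) - 2, -1, -1):
        idx[i] = min(idx[i], idx[i + 1] - 1)
    return idx


ties_at_start = make_strictly_increasing([0, 0, 0, 3], n=4)
ties_at_end = make_strictly_increasing([0, 3, 3, 3], n=4)
no_ties = make_strictly_increasing([0, 2, 5], n=6)
```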
- utils.price2params(y, c, mu=0.0, sub_type='clock', vol=None)
Fit the intra-day distribution, which is assumed to be a Student’s t-distribution
- INPUTS:
- y: pandas Series,
the price time series, with a minute-by-minute granularity, over the whole period considered (e.g., one year of data)
- c: int
the number of time indexes to sample
- mu: float,
the intra-day mean. It could be either a float or None. In the latter case, it will be estimated from the data. It is preferable to not set mu=None. If you don’t have a reliable estimate for it, simply use 0. Default is 0
- sub_type: str, optional
Type of subordinator to use. Either ‘clock’ for the clock time; ‘tpv’ for the TriPower Variation; ‘vol’ for the volume; or ‘identity’ for the identity (that is, all the indexes are returned). Default is ‘clock’
- vol: pd.Series, optional
the volume series, on the same indexes as y. Only used when sub_type == ‘vol’
- OUTPUTS:
- out_pars: dict,
the fitted parameters. Every key corresponds to a day of the y series. Every value is a list containing nu, mu, sigma.
Example of usage
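    import pandas as pd
    from utils import price2params

    df = pd.read_csv('stock.csv')
    df.index = pd.to_datetime(df.index)  # Days are read from a datetime index, as in the other examples
    price = df.Close
    vol = df.Volume
    fitted_pars = price2params(price, c=78, sub_type='vol', vol=vol)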
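For intuition about the shape of the output, the following sketch fits a Student's t-distribution to the clock-time log returns of each day, using scipy.stats.t.fit with the location fixed at mu. The function fit_daily_student_t, the synthetic data, and the use of SciPy's maximum likelihood routine are assumptions; the estimator inside price2params may be different.

```python
import numpy as np
import pandas as pd
from scipy import stats


def fit_daily_student_t(y: pd.Series, c: int, mu: float = 0.0) -> dict:
    """Hypothetical per-day fit; returns {day: [nu, mu, sigma]}."""
    out_pars = {}
    for day, prices in y.groupby(y.index.normalize()):
        # Clock-time subsampling: c + 1 evenly spaced positions in the day.
        idx = np.linspace(0, len(prices) - 1, c + 1).round().astype(int)
        log_ret = np.log(prices.iloc[idx]).diff().dropna()
        # Maximum likelihood with the location fixed at mu (floc).
        nu, loc, sigma = stats.t.fit(log_ret.to_numpy(), floc=mu)
        out_pars[day] = [nu, loc, sigma]
    return out_pars


# Synthetic two-day minute series with heavy-tailed returns (illustration only).
rng = np.random.default_rng(0)
index = pd.date_range("2024-01-02 09:30", "2024-01-02 16:00", freq="min").append(
    pd.date_range("2024-01-03 09:30", "2024-01-03 16:00", freq="min")
)
minute_ret = 0.001 * rng.standard_t(df=4, size=len(index))
y = pd.Series(100.0 * np.exp(np.cumsum(minute_ret)), index=index)
fitted = fit_daily_student_t(y, c=78)
```

Fixing the location mirrors the advice above to pass a value for mu rather than None when no reliable estimate is available.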
- utils.price2params_ma(y, c, mu=0, sub_type='clock', vol=None)
Fit the intra-day distribution, which is assumed to be a MA(1) process with Student’s t innovations.
- INPUTS:
- y: pandas Series,
the price time series, with a minute-by-minute granularity, over the whole period considered (e.g., one year of data)
- c: int
the number of time indexes to sample
- mu: float,
the intra-day mean. It could be either a float or None. In the latter case, it will be estimated from the data. It is preferable to not set mu=None. If you don’t have a reliable estimate for it, simply use 0. Default is 0
- sub_type: str, optional
Type of subordinator to use. Either ‘clock’ for the clock time; ‘tpv’ for the TriPower Variation; ‘vol’ for the volume; or ‘identity’ for the identity (that is, all the indexes are returned). Default is ‘clock’
- vol: pd.Series, optional
the volume series, on the same indexes as y. Only used when sub_type == ‘vol’
- OUTPUTS:
- out_pars: dict,
the fitted parameters. Every key corresponds to a day of the y series. Every value is a list containing nu, mu, sigma.
Example of usage
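    import pandas as pd
    from utils import price2params_ma

    df = pd.read_csv('stock.csv')
    df.index = pd.to_datetime(df.index)  # Days are read from a datetime index, as in the other examples
    price = df.Close
    vol = df.Volume
    fitted_pars = price2params_ma(price, c=39, sub_type='vol', vol=vol)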
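As a sketch of what an MA(1) model with Student's t innovations means, assume x_t = mu + eps_t + theta * eps_{t-1}, where eps_t follows a t-distribution with nu degrees of freedom and scale sigma. The code below recovers the innovations and evaluates the conditional negative log-likelihood that a fitting routine could minimise. The names ma1_innovations and ma1_student_t_nll, the symbol theta for the MA coefficient (not listed in the documented output), and the starting value eps_{-1} = 0 are assumptions, not the library's implementation.

```python
import math

import numpy as np


def ma1_innovations(x, theta: float, mu: float = 0.0) -> np.ndarray:
    """Recover eps_t from x_t = mu + eps_t + theta * eps_{t-1}, eps_{-1} = 0."""
    eps = np.empty(len(x))
    prev = 0.0
    for t, x_t in enumerate(x):
        eps[t] = x_t - mu - theta * prev
        prev = eps[t]
    return eps


def ma1_student_t_nll(x, theta, nu, mu=0.0, sigma=1.0) -> float:
    """Conditional negative log-likelihood under scaled Student's t innovations."""
    eps = ma1_innovations(x, theta, mu)
    z2 = (eps / sigma) ** 2
    # Log of the normalising constant of the scaled t density.
    const = (
        math.lgamma((nu + 1) / 2)
        - math.lgamma(nu / 2)
        - 0.5 * math.log(nu * math.pi)
        - math.log(sigma)
    )
    return float(-(len(eps) * const - (nu + 1) / 2 * np.log1p(z2 / nu).sum()))


# Simulate an MA(1) path and check that the recursion inverts it.
rng = np.random.default_rng(1)
theta_true = 0.4
eps_true = 0.01 * rng.standard_t(df=5, size=50)
x = eps_true.copy()
x[1:] += theta_true * eps_true[:-1]
recovered = ma1_innovations(x, theta_true)
nll = ma1_student_t_nll(x, theta_true, nu=5.0, sigma=0.01)
```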