utils module

utils.Fill_RTH_Minutes(df)[source]

Data preprocessing. Given a pandas dataframe, it fills the missing values within Regular Trading Hours (RTH, 09:30 - 16:00). Only days with at least one observation are kept; these are assumed to be exactly the trading days.

INPUTS:
  • df: pandas DataFrame,

    the dataframe, made up of minute-by-minute values

OUTPUTS:
  • df: pandas DataFrame,

    the preprocessed dataframe, with no holes and filled missing values.

Example of usage

import pandas as pd
from utils import Fill_RTH_Minutes

df = pd.read_csv('stock.csv', index_col=0) #Assumes the first column holds the timestamps
df.index = pd.to_datetime(df.index)

df_full = Fill_RTH_Minutes(df) #Data Preprocessing
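The filling step described above can be illustrated with a minimal sketch. This is not the implementation of Fill_RTH_Minutes: the helper name fill_rth_minutes_sketch and the fill rule (forward-fill, then back-fill for a leading gap) are assumptions, since the fill rule is not specified here.

```python
import pandas as pd


def fill_rth_minutes_sketch(df):
    """Reindex df on a full 09:30-16:00 minute grid for every observed day.

    Hypothetical helper: the fill rule (ffill, then bfill) is an assumption.
    """
    df = df.sort_index()
    # Days with at least one observation are treated as the trading days
    days = df.index.normalize().unique()
    minutes = [
        ts
        for day in days
        for ts in pd.date_range(
            day + pd.Timedelta(hours=9, minutes=30),
            day + pd.Timedelta(hours=16),
            freq="min",
        )
    ]
    full_index = pd.DatetimeIndex(minutes)
    # Missing minutes become NaN after reindexing and are then filled
    return df.reindex(full_index).ffill().bfill()


# Toy data: two trading days with most minutes missing
idx = pd.to_datetime(["2024-01-02 09:30", "2024-01-02 09:32", "2024-01-03 16:00"])
toy = pd.DataFrame({"Close": [100.0, 101.0, 102.0]}, index=idx)
filled = fill_rth_minutes_sketch(toy)
```

Each day contributes 391 rows (09:30 to 16:00 inclusive). Observations outside RTH are dropped by the reindexing step in this sketch.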
utils.IntraSeries_2_DayReturn(series)[source]

Computes the daily log returns of a time series of intraday prices.

INPUTS:
  • series: pandas Series,

    the time series of intraday prices

OUTPUTS:
  • output: numpy array,

    the daily log returns

Example of usage

import pandas as pd
from utils import IntraSeries_2_DayReturn

df = pd.read_csv('stock.csv', index_col=0) #Assumes the first column holds the timestamps
df.index = pd.to_datetime(df.index)

daily_ret = IntraSeries_2_DayReturn(df.Close) #Compute the daily log returns
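As an illustration of the computation, here is a minimal sketch under the assumption that the daily return is close-to-close, i.e. the log difference of the last price of consecutive days. The documentation does not state which convention IntraSeries_2_DayReturn uses, and the helper name below is hypothetical.

```python
import numpy as np
import pandas as pd


def intraday_to_daily_log_return(series):
    """Hypothetical helper: close-to-close daily log returns (an assumption)."""
    # Last observed price of each calendar day
    daily_close = series.groupby(series.index.normalize()).last()
    # Log difference between consecutive daily closes; the first day has no return
    return np.log(daily_close).diff().dropna().to_numpy()


idx = pd.to_datetime([
    "2024-01-02 09:30", "2024-01-02 16:00",
    "2024-01-03 09:30", "2024-01-03 16:00",
])
toy_prices = pd.Series([99.0, 100.0, 105.0, 110.0], index=idx)
toy_ret = intraday_to_daily_log_return(toy_prices)
```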
class utils.Subordinator(c, sub_type='clock')[source]

Bases: object

Subordinator class. It is intended to be used on each day separately.

Parameters:

  • c: int

    the number of time indexes to sample

  • sub_type: str, optional

    Type of subordinator to use. Either ‘clock’ for the clock time; ‘tpv’ for the TriPower Variation; ‘vol’ for the volume; or ‘identity’ for the identity (that is, all the indexes are returned). Default is ‘clock’

Example of usage

import numpy as np
import pandas as pd
from utils import Subordinator

df = pd.read_csv('stock.csv', index_col=0) #Assumes the file holds a single trading day, with timestamps in the first column
df.index = pd.to_datetime(df.index)
day_price = df.Close
day_vol = df.Volume

subordinator = Subordinator(78, sub_type='vol') #Initialize the subordinator

target_points = day_price.iloc[subordinator.predict(day_price, vol=day_vol)] #Extract the prices at the subordinated indexes
log_ret = np.log(target_points).diff().dropna() #Compute the subordinated logarithmic returns

Methods:

predict(data, vol=None, tpv_int_min=15)[source]

Returns the index positions of the subordinated values, that is, the vector τ(j), with j=0,..,c, where τ denotes the subordinator.

INPUTS:
  • data: pd.Series

    the time series of intra-day prices, over the whole day (that is, from 09:30 to 16:00)

  • tpv_int_min: int, optional

    half-length of the window used for computing the tri-power variation. Only used when self.sub_type == ‘tpv’. Default is 15

  • vol: pd.Series, optional

    the volume series, on the same indexes as data. Only used when self.sub_type == ‘vol’

OUTPUTS:
  • sub_idxs: list

    the indexes corresponding to the subordinated values
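To make the idea of a volume subordinator concrete, here is a minimal sketch, assuming that τ(j) is the first minute at which the normalised cumulative volume reaches j/c. This is an illustration only; the actual rule used by predict may differ, and the function name is hypothetical.

```python
import numpy as np


def volume_clock_indexes(vol, c):
    """Hypothetical helper: c + 1 index positions equally spaced in cumulative volume."""
    cum = np.cumsum(np.asarray(vol, dtype=float))
    cum = cum / cum[-1]                    # normalised cumulative intensity, ends at 1
    targets = np.arange(1, c + 1) / c      # levels 1/c, 2/c, ..., 1
    # First position where the cumulative intensity reaches each level
    idx = np.searchsorted(cum, targets, side="left")
    idx = np.minimum(idx, len(cum) - 1)    # guard against round-off at the last level
    return np.concatenate(([0], idx))      # tau(0) is the first minute of the day


# With constant volume, the volume clock coincides with an even spacing in clock time
even_idx = volume_clock_indexes(np.ones(8), c=4)
```

When volume is concentrated in a few minutes, several levels can map to the same position, which is why a tie-break rule is needed afterwards.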

sample_with_tie_break(a)[source]

Ensures the sampled time indexes are unique (i.e., not overlapping). That is, it manages the tie-break rule for transforming a non-injective function into an injective subordinator.

INPUTS:
  • a: ndarray

    the cumulative intensities

OUTPUTS:
  • indices: ndarray

    the indexes corresponding to the subordinated values
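One possible tie-break rule is sketched below. The rule actually applied by sample_with_tie_break is not described here, so this is an assumption: duplicated positions are pushed forward by one step each, and any overflow past the end of the day is pulled back. Unlike the method, this hypothetical helper takes candidate index positions rather than cumulative intensities.

```python
import numpy as np


def make_strictly_increasing(idx, n):
    """Hypothetical helper: turn non-decreasing positions in [0, n) into unique ones."""
    idx = np.asarray(idx, dtype=int).copy()
    if len(idx) > n:
        raise ValueError("cannot pick more unique positions than available minutes")
    # Forward pass: every position is at least one step after the previous one
    for k in range(1, len(idx)):
        idx[k] = max(idx[k], idx[k - 1] + 1)
    # Backward pass: pull back positions pushed beyond the last valid index
    idx[-1] = min(idx[-1], n - 1)
    for k in range(len(idx) - 2, -1, -1):
        idx[k] = min(idx[k], idx[k + 1] - 1)
    return idx


ties_at_start = make_strictly_increasing([0, 0, 0, 3], n=5)
ties_at_end = make_strictly_increasing([2, 4, 4, 4], n=5)
```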

utils.price2params(y, c, mu=0.0, sub_type='clock', vol=None)[source]

Fit the intra-day distribution, which is assumed to be a Student’s t-distribution.

INPUTS:
  • y: pandas Series,

    the price time series, with a minute-by-minute granularity, over the whole period considered (e.g., one year of data)

  • c: int

    the number of time indexes to sample

  • mu: float,

    the intra-day mean, either a float or None. If None, it is estimated from the data. Setting mu=None is discouraged: if you don’t have a reliable estimate, simply use 0. Default is 0

  • sub_type: str, optional

    Type of subordinator to use. Either ‘clock’ for the clock time; ‘tpv’ for the TriPower Variation; ‘vol’ for the volume; or ‘identity’ for the identity (that is, all the indexes are returned). Default is ‘clock’

  • vol: pd.Series, optional

    the volume series, on the same indexes as y. Only used when sub_type == ‘vol’

OUTPUTS:
  • out_pars: dict,

    the fitted parameters. Every key corresponds to a day of the y series. Every value is a list containing nu, mu, sigma.

Example of usage

import pandas as pd
from utils import price2params

df = pd.read_csv('stock.csv', index_col=0) #Assumes the first column holds the timestamps
df.index = pd.to_datetime(df.index)
price = df.Close
vol = df.Volume

fitted_pars = price2params(price, c=78, sub_type='vol', vol=vol)
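The per-day fitting step can be sketched with scipy. This is a minimal illustration under stated assumptions, not the code behind price2params: the helper name is hypothetical, and it uses the generic maximum-likelihood fit of scipy.stats.t, with the location fixed to mu unless mu is None.

```python
import numpy as np
from scipy import stats


def fit_student_t(log_returns, mu=0.0):
    """Hypothetical helper: fit a Student's t to one day of subordinated log returns."""
    x = np.asarray(log_returns, dtype=float)
    if mu is None:
        # Location estimated from the data together with nu and sigma
        nu, loc, sigma = stats.t.fit(x)
    else:
        # Location held fixed at the supplied intra-day mean
        nu, loc, sigma = stats.t.fit(x, floc=mu)
    return [float(nu), float(loc), float(sigma)]


# Simulated returns stand in for the subordinated log returns of one day
sample = stats.t.rvs(df=5, scale=0.01, size=2000, random_state=0)
nu_hat, mu_hat, sigma_hat = fit_student_t(sample, mu=0.0)
```

The returned list follows the same nu, mu, sigma order as the values in out_pars.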
utils.price2params_ma(y, c, mu=0, sub_type='clock', vol=None)[source]

Fit the intra-day distribution, which is assumed to be an MA(1) process with Student’s t innovations.

INPUTS:
  • y: pandas Series,

    the price time series, with a minute-by-minute granularity, over the whole period considered (e.g., one year of data)

  • c: int

    the number of time indexes to sample

  • mu: float,

    the intra-day mean, either a float or None. If None, it is estimated from the data. Setting mu=None is discouraged: if you don’t have a reliable estimate, simply use 0. Default is 0

  • sub_type: str, optional

    Type of subordinator to use. Either ‘clock’ for the clock time; ‘tpv’ for the TriPower Variation; ‘vol’ for the volume; or ‘identity’ for the identity (that is, all the indexes are returned). Default is ‘clock’

  • vol: pd.Series, optional

    the volume series, on the same indexes as y. Only used when sub_type == ‘vol’

OUTPUTS:
  • out_pars: dict,

    the fitted parameters. Every key corresponds to a day of the y series. Every value is a list containing nu, mu, sigma.

Example of usage

import pandas as pd
from utils import price2params_ma

df = pd.read_csv('stock.csv', index_col=0) #Assumes the first column holds the timestamps
df.index = pd.to_datetime(df.index)
price = df.Close
vol = df.Volume

fitted_pars = price2params_ma(price, c=39, sub_type='vol', vol=vol)
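For intuition, an MA(1) model with Student’s t innovations can be fitted by conditional maximum likelihood, as in the sketch below. This is an assumption about one reasonable approach, not the estimator used by price2params_ma. The helper name, the parameter transforms and the choice of the Nelder-Mead optimizer are all illustrative.

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize
from scipy.signal import lfilter


def fit_ma1_student_t(returns, mu=0.0):
    """Hypothetical helper: fit r_t = mu + e_t + theta * e_(t-1), with e_t = sigma * t_nu."""
    x = np.asarray(returns, dtype=float) - mu

    def unpack(params):
        theta = np.tanh(params[0])      # keeps |theta| < 1 (invertibility)
        nu = 2.0 + np.exp(params[1])    # keeps nu > 2 (finite variance)
        sigma = np.exp(params[2])       # keeps sigma > 0
        return theta, nu, sigma

    def neg_log_lik(params):
        theta, nu, sigma = unpack(params)
        # Innovations from the recursion e_t = x_t - theta * e_(t-1), started at zero
        eps = lfilter([1.0], [1.0, theta], x)
        return -np.sum(stats.t.logpdf(eps, df=nu, loc=0.0, scale=sigma))

    start = np.array([0.0, np.log(3.0), np.log(np.std(x))])
    res = minimize(neg_log_lik, start, method="Nelder-Mead")
    return unpack(res.x)


# Simulated MA(1) returns with theta = 0.5 and t(5) innovations
rng = np.random.default_rng(0)
e = 0.01 * rng.standard_t(5, size=3001)
simulated = e[1:] + 0.5 * e[:-1]
theta_hat, nu_hat_ma, sigma_hat_ma = fit_ma1_student_t(simulated, mu=0.0)
```

The transforms keep the optimizer inside the admissible region without explicit constraints; the initial innovation is set to zero, which is the usual simplification in conditional likelihood.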