The HierarchicalForecast package contains utility functions to wrangle and visualize hierarchical series datasets. The `aggregate` function of the module builds a hierarchy from categorical variables that represent the structure levels, and also returns the aggregation constraints matrix $\mathbf{S}$. In addition, HierarchicalForecast keeps its reconciliation methods compatible with other popular machine-learning libraries through its external forecast adapters, which transform the base forecasts produced by external libraries into a compatible data frame format.

Aggregate Function

aggregate

 aggregate (df:Union[ForwardRef('DataFrame[Any]'),ForwardRef('LazyFrame[Any]')],
            spec:list[list[str]],
            exog_vars:Optional[dict[str,Union[str,list[str]]]]=None,
            sparse_s:bool=False, id_col:str='unique_id',
            time_col:str='ds', id_time_col:Optional[str]=None,
            target_cols:collections.abc.Sequence[str]=('y',))

Utils Aggregation Function. Aggregates the bottom-level series contained in the DataFrame `df` according to the levels defined in the `spec` list.

	Type	Default	Details
df	Union		DataFrame with columns `[time_col, *target_cols]`, the columns to aggregate and, optionally, the columns named in `exog_vars`.
spec	list		List of levels. Each element of the list should contain a list of columns of `df` to aggregate.
exog_vars	Optional	None
sparse_s	bool	False	Return `S_df` as a sparse pandas DataFrame.
id_col	str	unique_id	Column that will identify each series after aggregation.
time_col	str	ds	Column that identifies each timestep; its values can be timestamps or integers.
id_time_col	Optional	None	Column that will identify each timestep after temporal aggregation. If provided, `aggregate` will operate temporally.
target_cols	Sequence	('y',)	List of columns that contain the targets to aggregate.
Returns	tuple		Hierarchically structured series.
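
To make the role of `spec` and of the matrix $\mathbf{S}$ concrete, the following is a minimal pandas sketch of the idea behind cross-sectional aggregation. It does not call `aggregate`; the toy data, the helper `make_ids`, and the `/` separator used to build identifiers are assumptions made for illustration only.

```python
import pandas as pd

# Toy bottom-level data: two regions of one country, two timesteps each.
df = pd.DataFrame({
    "Country": ["AU", "AU", "AU", "AU"],
    "Region": ["NSW", "NSW", "VIC", "VIC"],
    "ds": [1, 2, 1, 2],
    "y": [10.0, 12.0, 5.0, 6.0],
})
# Each inner list names the columns that define one level, top to bottom.
spec = [["Country"], ["Country", "Region"]]


def make_ids(frame, cols):
    """Hypothetical helper: join level columns into one id, e.g. 'AU/NSW'."""
    return frame[cols].apply("/".join, axis=1)


# One row per bottom-level series (the last, most detailed level of spec).
bottom = df[spec[-1]].drop_duplicates().reset_index(drop=True)
bottom_ids = make_ids(bottom, spec[-1]).tolist()

frames, tags, s_rows = [], {}, {}
for cols in spec:
    level_ids = make_ids(df, cols)
    # Sum the target over all bottom series that share this level's id.
    agg = (
        df.assign(unique_id=level_ids)
        .groupby(["unique_id", "ds"], as_index=False)["y"]
        .sum()
    )
    frames.append(agg)
    tags["/".join(cols)] = level_ids.unique()
    # Row of S: 1.0 where a bottom series belongs to the aggregate, else 0.0.
    membership = make_ids(bottom, cols)
    for uid in tags["/".join(cols)]:
        s_rows[uid] = (membership == uid).astype(float).tolist()

Y_df = pd.concat(frames, ignore_index=True)
S_df = pd.DataFrame.from_dict(s_rows, orient="index", columns=bottom_ids)
```

In this sketch `S_df` has one row per series at any level and one column per bottom-level series, so multiplying it by the bottom-level values at a given timestep reproduces every aggregated value in `Y_df`.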

aggregate_temporal

 aggregate_temporal (df:Union[ForwardRef('DataFrame[Any]'),ForwardRef('LazyFrame[Any]')],
                     spec:dict[str,int],
                     exog_vars:Optional[dict[str,Union[str,list[str]]]]=None,
                     sparse_s:bool=False, id_col:str='unique_id',
                     time_col:str='ds', id_time_col:str='temporal_id',
                     target_cols:collections.abc.Sequence[str]=('y',),
                     aggregation_type:str='local')

Utils Aggregation Function for Temporal aggregations. Aggregates the bottom-level timesteps contained in the DataFrame `df` according to the temporal levels defined in the `spec` dictionary.

	Type	Default	Details
df	Union		DataFrame with columns `[time_col, *target_cols]` and columns to aggregate.
spec	dict		Dictionary of temporal levels. Each key should be a string naming the level, and its value is the number of bottom-level timesteps contained in one aggregated timestep.
exog_vars	Optional	None
sparse_s	bool	False	Return `S_df` as a sparse pandas DataFrame.
id_col	str	unique_id	Column that will identify each series after aggregation.
time_col	str	ds	Column that identifies each timestep; its values can be timestamps or integers.
id_time_col	str	temporal_id	Column that will identify each timestep after aggregation.
target_cols	Sequence	('y',)	List of columns that contain the targets to aggregate.
aggregation_type	str	local	If 'local', the aggregation is performed on the timestamps of each time series independently. If 'global', the aggregation is performed on the unique timestamps of all time series.
Returns	tuple		Temporally hierarchically structured series.
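
The meaning of the integer values in the temporal `spec` can be illustrated with a small NumPy sketch. It does not call `aggregate_temporal`; the level names and the single toy series are assumptions, and the series length is chosen to be an exact multiple of every level size.

```python
import numpy as np

# Twelve monthly observations of a single series (bottom temporal level).
y_month = np.arange(1.0, 13.0)

# Each value is the number of bottom-level timesteps per aggregated timestep.
spec = {"year": 12, "quarter": 3, "month": 1}

aggregated, blocks = {}, []
for name, k in spec.items():
    n_agg = len(y_month) // k
    # Row i of this block sums bottom timesteps i*k through (i+1)*k - 1.
    block = np.kron(np.eye(n_agg), np.ones((1, k)))
    blocks.append(block)
    aggregated[name] = block @ y_month

# Temporal summing matrix: 1 yearly + 4 quarterly + 12 monthly rows.
S_temporal = np.vstack(blocks)
```

Here `aggregated["quarter"]` holds four values, each the sum of three consecutive months, and `S_temporal` plays the same role for timesteps that $\mathbf{S}$ plays for series.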

make_future_dataframe

 make_future_dataframe (df:Union[ForwardRef('DataFrame[Any]'),ForwardRef('LazyFrame[Any]')],
                        freq:Union[str,int], h:int,
                        id_col:str='unique_id', time_col:str='ds')

Create future dataframe for forecasting.

	Type	Default	Details
df	Union		Dataframe with ids, times and values for the exogenous regressors.
freq	Union		Frequency of the data. Must be a valid pandas or polars offset alias, or an integer.
h	int		Forecast horizon.
id_col	str	unique_id	Column that identifies each series.
time_col	str	ds	Column that identifies each timestep, its values can be timestamps or integers.
Returns	FrameT		DataFrame with future values
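
As a rough illustration of what a future dataframe looks like, the sketch below builds one by hand with pandas for timestamp-valued `ds`. It does not call `make_future_dataframe`; the toy data is hypothetical, and the assumption that the result holds only the id and time columns is mine.

```python
import pandas as pd

df = pd.DataFrame({
    "unique_id": ["A", "A", "B", "B"],
    "ds": pd.to_datetime(["2024-01-01", "2024-02-01", "2024-01-01", "2024-02-01"]),
    "y": [1.0, 2.0, 3.0, 4.0],
})
freq, h = "MS", 3  # month-start frequency, three steps ahead

# Last observed timestamp of each series.
last_ds = df.groupby("unique_id")["ds"].max()

future = pd.concat(
    [
        pd.DataFrame({
            "unique_id": uid,
            # h + 1 periods starting at the last timestamp, dropping the first
            # one so that only unseen timesteps remain.
            "ds": pd.date_range(start=last, periods=h + 1, freq=freq)[1:],
        })
        for uid, last in last_ds.items()
    ],
    ignore_index=True,
)
```

Each series gets `h` new rows that continue its own calendar, which is the shape a forecasting model needs when exogenous regressors have to be supplied for the horizon.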

get_cross_temporal_tags

 get_cross_temporal_tags (df:Union[ForwardRef('DataFrame[Any]'),ForwardRef('LazyFrame[Any]')],
                          tags_cs:dict[str,numpy.ndarray],
                          tags_te:dict[str,numpy.ndarray], sep:str='//',
                          id_col:str='unique_id',
                          id_time_col:str='temporal_id',
                          cross_temporal_id_col:str='cross_temporal_id')

Get cross-temporal tags.

	Type	Default	Details
df	Union		DataFrame with temporal ids.
tags_cs	dict		Tags for the cross-sectional hierarchies
tags_te	dict		Tags for the temporal hierarchies
sep	str	//	Separator for the cross-temporal tags.
id_col	str	unique_id	Column that identifies each series.
id_time_col	str	temporal_id	Column that identifies each (aggregated) timestep.
cross_temporal_id_col	str	cross_temporal_id	Column that will identify each cross-temporal aggregation.
Returns	tuple		DataFrame with cross-temporal ids, together with the cross-temporal tags.
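
One plausible way to combine cross-sectional and temporal tags is a Cartesian product joined with `sep`, sketched below. This is an assumption about the general idea, not a copy of the library's implementation; the tag values and the temporal id format (`year-1`, `quarter-1`) are hypothetical.

```python
from itertools import product

import numpy as np

# Hypothetical cross-sectional and temporal tags.
tags_cs = {
    "Country": np.array(["AU"]),
    "Country/Region": np.array(["AU/NSW", "AU/VIC"]),
}
tags_te = {
    "year": np.array(["year-1"]),
    "quarter": np.array(["quarter-1", "quarter-2"]),
}
sep = "//"

# Pair every cross-sectional level with every temporal level, and every
# series id with every temporal id inside that pair of levels.
tags_ct = {}
for (key_cs, ids_cs), (key_te, ids_te) in product(tags_cs.items(), tags_te.items()):
    tags_ct[f"{key_cs}{sep}{key_te}"] = np.array(
        [f"{uid}{sep}{tid}" for uid, tid in product(ids_cs, ids_te)]
    )
```

The separator `//` differs from the `/` used inside cross-sectional ids in this sketch, so a combined id can still be split back into its series part and its temporal part.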

Hierarchical Visualization

HierarchicalPlot

 HierarchicalPlot (S:Union[ForwardRef('DataFrame[Any]'),ForwardRef('LazyFrame[Any]')],
                   tags:dict[str,numpy.ndarray],
                   S_id_col:str='unique_id')

Hierarchical Plot

This class contains a collection of matplotlib visualization methods, suited for small to medium sized hierarchical series.

Parameters:
S: DataFrame with the summing matrix of size (base, bottom), see the aggregate function.
tags: dict[str, np.ndarray], hierarchical aggregation indexes, where each key is a level and its value contains the tags associated with that level.
S_id_col: str='unique_id', column that identifies each aggregation.

plot_summing_matrix

 plot_summing_matrix ()

Summation Constraints plot

This method plots the hierarchical aggregation constraints matrix $\mathbf{S}$.

Returns:
fig: matplotlib.figure.Figure, figure object containing the plot of the summing matrix.

plot_series

 plot_series (series:str,
              Y_df:Union[ForwardRef('DataFrame[Any]'),ForwardRef('LazyFrame[Any]')],
              models:Optional[list[str]]=None,
              level:Optional[list[int]]=None, id_col:str='unique_id',
              time_col:str='ds', target_col:str='y')

Single Series plot

Parameters:
series: str, string identifying the 'unique_id' of the series to plot, at any level.
Y_df: DataFrame, hierarchically structured series ($\mathbf{y}_{[a,b]}$). It contains the columns ['unique_id', 'ds', 'y'] and may also contain model columns.
models: list[str], names of the model columns to include.
level: int list 0-100, confidence levels for the prediction intervals available in Y_df.
id_col: str='unique_id', column that identifies each series.
time_col: str='ds', column that identifies each timestep; its values can be timestamps or integers.
target_col: str='y', column that contains the target.

Returns:
fig: matplotlib.figure.Figure, figure object containing the plot of the single series.

plot_hierarchically_linked_series

 plot_hierarchically_linked_series (bottom_series:str,
                                    Y_df:Union[ForwardRef('DataFrame[Any]'),ForwardRef('LazyFrame[Any]')],
                                    models:Optional[list[str]]=None,
                                    level:Optional[list[int]]=None,
                                    id_col:str='unique_id',
                                    time_col:str='ds', target_col:str='y')

Hierarchically Linked Series plot

Parameters:
bottom_series: str, string identifying the 'unique_id' of the bottom-level series to plot.
Y_df: DataFrame, hierarchically structured series ($\mathbf{y}_{[a,b]}$). It contains the columns ['unique_id', 'ds', 'y'] and the model columns.
models: list[str], names of the model columns to include.
level: int list 0-100, confidence levels for the prediction intervals available in Y_df.
id_col: str='unique_id', column that identifies each series.
time_col: str='ds', column that identifies each timestep; its values can be timestamps or integers.
target_col: str='y', column that contains the target.

Returns:
fig: matplotlib.figure.Figure, figure object containing the plots of the hierarchically linked series.

plot_hierarchical_predictions_gap

 plot_hierarchical_predictions_gap (Y_df:Union[ForwardRef('DataFrame[Any]'),ForwardRef('LazyFrame[Any]')],
                                    models:Optional[list[str]]=None,
                                    xlabel:Optional[str]=None,
                                    ylabel:Optional[str]=None,
                                    id_col:str='unique_id',
                                    time_col:str='ds', target_col:str='y')

Hierarchical Predictions Gap plot

Parameters:
Y_df: DataFrame, hierarchically structured series ($\mathbf{y}_{[a,b]}$). It contains the columns ['unique_id', 'ds', 'y'] and the model columns.
models: list[str], names of the model columns to include.
xlabel: str, label for the plot's x axis.
ylabel: str, label for the plot's y axis.
id_col: str='unique_id', column that identifies each series.
time_col: str='ds', column that identifies each timestep; its values can be timestamps or integers.
target_col: str='y', column that contains the target.

Returns:
fig: matplotlib.figure.Figure, figure object containing the plot of the aggregated predictions at different levels of the hierarchical structure.
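
The quantity behind such a plot can be sketched without any plotting: sum the base forecasts of all series in each level and compare the totals. The snippet below is an illustration under that assumption, with hypothetical tags and forecast values; it is not the method's implementation.

```python
import numpy as np

tags = {
    "Country": np.array(["AU"]),
    "Country/Region": np.array(["AU/NSW", "AU/VIC"]),
}
# Hypothetical base forecasts per series over a 3-step horizon.
y_hat = {
    "AU": np.array([100.0, 102.0, 104.0]),
    "AU/NSW": np.array([60.0, 61.0, 62.0]),
    "AU/VIC": np.array([35.0, 36.0, 37.0]),
}

# Sum the forecasts of all series that belong to each level.
level_totals = {
    level: np.sum([y_hat[uid] for uid in ids], axis=0)
    for level, ids in tags.items()
}

# Difference between the top-level forecast and the sum of the bottom-level
# forecasts; coherent forecasts would give zeros here.
gap = level_totals["Country"] - level_totals["Country/Region"]
```

A nonzero `gap` means the base forecasts are incoherent across levels, which is the situation reconciliation methods are meant to resolve.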

# pandas
import pandas as pd
from statsforecast.core import StatsForecast
from statsforecast.models import AutoETS
from datasetsforecast.hierarchical import HierarchicalData
from hierarchicalforecast.utils import HierarchicalPlot

# Load the Labour dataset: series, summing matrix and level tags
Y_df, S, tags = HierarchicalData.load('./data', 'Labour')
Y_df['ds'] = pd.to_datetime(Y_df['ds'])
S = S.reset_index(names="unique_id")

# Hold out the last 24 observations of each series for testing
Y_test_df  = Y_df.groupby('unique_id').tail(24)
Y_train_df = Y_df.drop(Y_test_df.index)

# Compute base forecasts for every series in the hierarchy
fcst = StatsForecast(
    models=[AutoETS(season_length=12, model='AAZ')],
    freq='MS',
    n_jobs=-1
)
Y_hat_df = fcst.forecast(df=Y_train_df, h=24).reset_index()

# Plot prediction difference of different aggregation
# Levels Country, Country/Region, Country/Gender/Region ...
hplots = HierarchicalPlot(S=S, tags=tags)

# models expects a list of model column names
hplots.plot_hierarchical_predictions_gap(
    Y_df=Y_hat_df, models=['AutoETS'],
    xlabel='Month', ylabel='Predictions',
)

# polars
import pandas as pd
import polars as pl
from statsforecast.core import StatsForecast
from statsforecast.models import AutoETS
from datasetsforecast.hierarchical import HierarchicalData
from hierarchicalforecast.utils import HierarchicalPlot

# Load the Labour dataset (returned as pandas objects)
Y_df, S, tags = HierarchicalData.load('./data', 'Labour')
Y_df['ds'] = pd.to_datetime(Y_df['ds'])
S = S.reset_index(names="unique_id")

# Hold out the last 24 observations of each series, then convert to polars
Y_test_df  = Y_df.groupby('unique_id').tail(24)
Y_train_df = Y_df.drop(Y_test_df.index)
Y_test_df_pl  = pl.from_pandas(Y_test_df)
Y_train_df_pl = pl.from_pandas(Y_train_df)

# polars duration strings use '1mo' for one month ('1m' means one minute)
fcst = StatsForecast(
    models=[AutoETS(season_length=12, model='AAZ')],
    freq='1mo',
    n_jobs=-1
)
Y_hat_df = fcst.forecast(df=Y_train_df_pl, h=24)

# Plot prediction difference of different aggregation
# Levels Country, Country/Region, Country/Gender/Region ...
hplots = HierarchicalPlot(S=S, tags=tags)

# models expects a list of model column names
hplots.plot_hierarchical_predictions_gap(
    Y_df=Y_hat_df, models=['AutoETS'],
    xlabel='Month', ylabel='Predictions',
)

External Forecast Adapters

samples_to_quantiles_df

 samples_to_quantiles_df (samples:numpy.ndarray,
                          unique_ids:collections.abc.Sequence[str],
                          dates:list[str],
                          quantiles:Optional[list[float]]=None,
                          level:Optional[list[int]]=None,
                          model_name:str='model', id_col:str='unique_id',
                          time_col:str='ds', backend:str='pandas')

Transform Random Samples into HierarchicalForecast input.

Auxiliary function to create a Y_hat_df dataframe that is compatible with HierarchicalForecast.

Parameters:
samples: numpy array. Samples from the forecast distribution, of shape [n_series, n_samples, horizon].
unique_ids: string list. Unique identifiers for each time series.
dates: datetime list. List of forecast dates.
quantiles: float list in [0., 1.]. Alternative to level, quantiles to estimate from the y distribution.
level: int list in [0, 100]. Probability levels for the prediction intervals.
model_name: string. Name of the forecasting model.
id_col: str='unique_id', column that identifies each series.
time_col: str='ds', column that identifies each timestep; its values can be timestamps or integers.
backend: str='pandas', backend to use for the output dataframe, either 'pandas' or 'polars'.

Returns:
quantiles: float list in [0., 1.]. Quantiles to estimate from the y distribution.
Y_hat_df: DataFrame. Base quantile forecasts, with columns ds and the models to reconcile, indexed by unique_id.
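
The core reduction, from samples to quantiles, can be sketched in NumPy as follows. This is an illustration of the idea rather than the adapter's code: the simulated samples are hypothetical, and both the conversion from `level` to quantiles and the inclusion of the median are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n_series, n_samples, horizon = 2, 500, 3
# Hypothetical forecast samples of shape [n_series, n_samples, horizon].
samples = rng.normal(loc=10.0, scale=2.0, size=(n_series, n_samples, horizon))

level = [80]
# An 80% interval spans the 10th to the 90th percentile; the median is
# added here as a central reference.
lower = {round(0.5 - lv / 200, 4) for lv in level}
upper = {round(0.5 + lv / 200, 4) for lv in level}
quantiles = sorted({0.5} | lower | upper)

# Reduce over the sample axis: the result has shape
# [n_quantiles, n_series, horizon].
q_values = np.quantile(samples, quantiles, axis=1)
# Mean forecast per series and horizon step.
mean = samples.mean(axis=1)
```

Flattening `mean` and each slice of `q_values` into columns, with `unique_ids` repeated per horizon step and `dates` tiled per series, would give a long-format table in the spirit of the Y_hat_df described above.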
