To help evaluate hierarchical forecasting systems, the library provides an `evaluate` function that can be combined with loss functions from `utilsforecast.losses`.

evaluate

 evaluate (df:~FrameT, metrics:list[typing.Callable],
           tags:dict[str,numpy.ndarray], models:Optional[list[str]]=None,
           train_df:Optional[~FrameT]=None,
           level:Optional[list[int]]=None, id_col:str='unique_id',
           time_col:str='ds', target_col:str='y',
           agg_fn:Optional[str]='mean', benchmark:Optional[str]=None)

Evaluate hierarchical forecasts using different metrics.

Argument	Type	Default	Details
df	FrameT		Forecasts to evaluate. Must have `id_col`, `time_col`, `target_col` and models’ predictions.
metrics	list		Functions with arguments `df`, `models`, `id_col`, `target_col` and optionally `train_df`.
tags	dict		Each key is a level in the hierarchy and its value contains the ids of the series that belong to that level.
models	Optional	None	Names of the models to evaluate. If `None`, every column in the dataframe other than the id, time and target columns is used.
train_df	Optional	None	Training set. Used to evaluate metrics such as `mase`.
level	Optional	None	Prediction interval levels. Used to compute losses that rely on quantiles.
id_col	str	unique_id	Column that identifies each series.
time_col	str	ds	Column that identifies each timestep, its values can be timestamps or integers.
target_col	str	y	Column that contains the target.
agg_fn	Optional	mean	Statistic to compute on the scores by id to reduce them to a single number.
benchmark	Optional	None	If passed, the metric values are scaled by the error of this benchmark model.
Returns	FrameT		Metrics with one row per (id, metric) combination and one column per model. If `agg_fn` is not `None`, there is only one row per metric.
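
As a rough illustration of the `metrics` contract described in the table, the sketch below defines a custom loss that accepts `df`, `models`, `id_col` and `target_col` and returns one row per series with one column per model. The name `mean_bias` and the toy data are hypothetical, and the return shape is an assumption modelled on how the losses in `utilsforecast.losses` appear to behave, not a specification.

```python
import pandas as pd


def mean_bias(df, models, id_col="unique_id", target_col="y"):
    """Hypothetical metric: mean signed error (forecast - actual) per series."""
    # Signed error of each model column against the target
    errors = df[models].sub(df[target_col], axis=0)
    # Average within each series: one row per id, one column per model
    out = errors.groupby(df[id_col]).mean()
    out.index.name = id_col
    return out.reset_index()


# Toy forecasts for two series and two models (hypothetical data)
toy = pd.DataFrame(
    {
        "unique_id": ["a", "a", "b", "b"],
        "ds": [1, 2, 1, 2],
        "y": [10.0, 12.0, 5.0, 7.0],
        "model_1": [11.0, 13.0, 4.0, 6.0],
        "model_2": [10.0, 12.0, 5.0, 9.0],
    }
)

scores = mean_bias(toy, models=["model_1", "model_2"])
print(scores)
```

A function of this shape could presumably be placed in the `metrics` list next to losses such as `rmse`; metrics that need extra arguments can be wrapped with `functools.partial`, as the example below does for `mase`.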

Example

import pandas as pd

from hierarchicalforecast.core import HierarchicalReconciliation
from hierarchicalforecast.methods import BottomUp, MinTrace
from hierarchicalforecast.utils import aggregate
from hierarchicalforecast.evaluation import evaluate
from statsforecast.core import StatsForecast
from statsforecast.models import AutoETS
from utilsforecast.losses import mase, rmse
from functools import partial

# Load TourismSmall dataset
df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/tourism.csv')
df = df.rename({'Trips': 'y', 'Quarter': 'ds'}, axis=1)
df.insert(0, 'Country', 'Australia')
# Convert the quarterly ds strings to timestamps
qs = df['ds'].str.replace(r'(\d+) (Q\d)', r'\1-\2', regex=True)
df['ds'] = pd.PeriodIndex(qs, freq='Q').to_timestamp()

# Create hierarchical series based on geographic levels and purpose
hierarchy_levels = [['Country'],
                    ['Country', 'State'], 
                    ['Country', 'Purpose'], 
                    ['Country', 'State', 'Region'], 
                    ['Country', 'State', 'Purpose'], 
                    ['Country', 'State', 'Region', 'Purpose']]

Y_df, S_df, tags = aggregate(df=df, spec=hierarchy_levels)

# Split train/test sets
Y_test_df  = Y_df.groupby('unique_id').tail(8)
Y_train_df = Y_df.drop(Y_test_df.index)

# Compute base auto-ETS predictions
# Take care to set the correct frequency: the data is quarterly and the
# timestamps mark quarter starts, hence 'QS'
fcst = StatsForecast(models=[AutoETS(season_length=4, model='ZZA')], freq='QS', n_jobs=-1)
Y_hat_df = fcst.forecast(df=Y_train_df, h=8, fitted=True)
Y_fitted_df = fcst.forecast_fitted_values()

reconcilers = [
    BottomUp(),
    MinTrace(method='ols'),
    MinTrace(method='mint_shrink'),
]
hrec = HierarchicalReconciliation(reconcilers=reconcilers)
Y_rec_df = hrec.reconcile(Y_hat_df=Y_hat_df, 
                          Y_df=Y_fitted_df,
                          S_df=S_df, tags=tags)

# Evaluate
eval_tags = {}
eval_tags['Total'] = tags['Country']
eval_tags['Purpose'] = tags['Country/Purpose']
eval_tags['State'] = tags['Country/State']
eval_tags['Regions'] = tags['Country/State/Region']
eval_tags['Bottom'] = tags['Country/State/Region/Purpose']

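# Attach the test-set actuals to the reconciled forecasts
Y_rec_df_with_y = Y_rec_df.merge(Y_test_df, on=['unique_id', 'ds'], how='left')
# MASE scaled with a seasonal period of 4 quarters
mase_p = partial(mase, seasonality=4)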

evaluation = evaluate(Y_rec_df_with_y, 
         metrics=[mase_p, rmse], 
         tags=eval_tags, 
         train_df=Y_train_df)

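# Format numeric columns to two decimals for display
# (DataFrame.map requires pandas >= 2.1; older versions use applymap)
numeric_cols = evaluation.select_dtypes(include="number").columns
evaluation[numeric_cols] = evaluation[numeric_cols].map('{:.2f}'.format)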
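
To make the role of `tags` and `agg_fn` more concrete, the following sketch reproduces the idea by hand on toy data: compute an RMSE per series, then average those scores within each hierarchy level. The data, the `toy_tags` mapping and the helper `rmse_per_series` are hypothetical, and this is not the library's implementation, only a pandas approximation of what the per-level rows are assumed to represent when `agg_fn='mean'`.

```python
import numpy as np
import pandas as pd

# Toy forecasts for one total series and its two children (hypothetical data)
toy = pd.DataFrame(
    {
        "unique_id": ["Total"] * 2 + ["Total/A"] * 2 + ["Total/B"] * 2,
        "ds": [1, 2] * 3,
        "y": [10.0, 12.0, 4.0, 5.0, 6.0, 7.0],
        "model": [13.0, 15.0, 4.0, 5.0, 8.0, 9.0],
    }
)

# Same structure as `tags`: level name -> array of series ids in that level
toy_tags = {
    "Total": np.array(["Total"]),
    "Bottom": np.array(["Total/A", "Total/B"]),
}


def rmse_per_series(df, model_col, id_col="unique_id", target_col="y"):
    """Root mean squared error of one model column, computed per series."""
    sq_err = (df[model_col] - df[target_col]) ** 2
    return np.sqrt(sq_err.groupby(df[id_col]).mean())


per_series = rmse_per_series(toy, "model")

# Reduce the per-series scores to one number per level (the role of agg_fn='mean')
per_level = pd.Series(
    {level: per_series.loc[ids].mean() for level, ids in toy_tags.items()},
    name="rmse",
)
print(per_level)
```

Here the `Total` level contains a single series, so its score is that series' RMSE, while the `Bottom` score is the mean of the two bottom-level RMSEs.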
