hierarchical_pipeline

Classes

HierarchicalPipeline(reconciliator, model[, ...])

Pipeline of transforms with a final estimator for hierarchical time series data.

class HierarchicalPipeline(reconciliator: etna.reconciliation.base.BaseReconciliator, model: Union[etna.models.base.NonPredictionIntervalContextIgnorantAbstractModel, etna.models.base.NonPredictionIntervalContextRequiredAbstractModel, etna.models.base.PredictionIntervalContextIgnorantAbstractModel, etna.models.base.PredictionIntervalContextRequiredAbstractModel], transforms: Sequence[etna.transforms.base.Transform] = (), horizon: int = 1)[source]

Pipeline of transforms with a final estimator for hierarchical time series data.

Notes

Aggregation of target quantiles and components is performed along with the target itself. It uses a provided hierarchical structure and a reconciliation method.
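The aggregation described in the note can be pictured with a minimal, library-independent sketch. It assumes a bottom-up rule (a parent series is the pointwise sum of its children) and uses a hypothetical hierarchy and the helper name `aggregate_bottom_up`; it is not the library's implementation, and other reconciliation methods would use a different mapping.

```python
from typing import Dict, List

# Hypothetical two-level hierarchy: each target-level segment lists the
# source-level segments it aggregates.
HIERARCHY: Dict[str, List[str]] = {"total": ["store_a", "store_b"]}


def aggregate_bottom_up(
    source_series: Dict[str, List[float]],
    hierarchy: Dict[str, List[str]],
) -> Dict[str, List[float]]:
    """Sum source-level series point by point to obtain target-level series."""
    target: Dict[str, List[float]] = {}
    for parent, children in hierarchy.items():
        series = [source_series[child] for child in children]
        # zip(*series) walks the children in lockstep, one timestamp at a time.
        target[parent] = [sum(values) for values in zip(*series)]
    return target


# The same rule is applied to the target column and to a quantile column.
point = aggregate_bottom_up({"store_a": [1.0, 2.0], "store_b": [3.0, 4.0]}, HIERARCHY)
lower = aggregate_bottom_up({"store_a": [0.5, 1.5], "store_b": [2.0, 3.0]}, HIERARCHY)
```

Note that the sum of per-segment quantiles is not, in general, the quantile of the sum, so aggregated interval borders should be read with care.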

Create instance of HierarchicalPipeline with given parameters.

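Parameters
  • reconciliator (etna.reconciliation.base.BaseReconciliator) – Instance of the reconciliation method

  • model (Union[etna.models.base.NonPredictionIntervalContextIgnorantAbstractModel, etna.models.base.NonPredictionIntervalContextRequiredAbstractModel, etna.models.base.PredictionIntervalContextIgnorantAbstractModel, etna.models.base.PredictionIntervalContextRequiredAbstractModel]) – Instance of the model

  • transforms (Sequence[etna.transforms.base.Transform]) – Sequence of transforms

  • horizon (int) – Number of timestamps in the future to forecast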

Warning

Estimation of forecast intervals with the forecast(prediction_interval=True) method and BottomUpReconciliator may not be reliable.

backtest(ts: etna.datasets.tsdataset.TSDataset, metrics: List[etna.metrics.base.Metric], n_folds: Union[int, List[etna.pipeline.base.FoldMask]] = 5, mode: Optional[str] = None, aggregate_metrics: bool = False, n_jobs: int = 1, refit: Union[bool, int] = True, stride: Optional[int] = None, joblib_params: Optional[Dict[str, Any]] = None, forecast_params: Optional[Dict[str, Any]] = None) Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]

Run backtest with the pipeline.

If refit is not True and some component of the pipeline doesn’t support forecasting with a gap, that component raises an exception.

Parameters
  • ts (etna.datasets.tsdataset.TSDataset) – Dataset to fit models in backtest

  • metrics (List[etna.metrics.base.Metric]) – List of metrics to compute for each fold

  • n_folds (Union[int, List[etna.pipeline.base.FoldMask]]) – Number of folds or the list of fold masks

  • mode (Optional[str]) – Train generation policy: ‘expand’ or ‘constant’. Works only if n_folds is an integer. Defaults to ‘expand’.

  • aggregate_metrics (bool) – If True, aggregate metrics over folds; otherwise return raw per-fold metrics

  • n_jobs (int) – Number of jobs to run in parallel

  • refit (Union[bool, int]) –

    Determines how often pipeline should be retrained during iteration over folds.

    • If True: pipeline is retrained on each fold.

    • If False: pipeline is trained only on the first fold.

    • If an integer value: pipeline is retrained every value folds, starting from the first.

  • stride (Optional[int]) – Number of points between folds. Works only if n_folds is an integer. Defaults to horizon.

  • joblib_params (Optional[Dict[str, Any]]) – Additional parameters for joblib.Parallel

  • forecast_params (Optional[Dict[str, Any]]) – Additional parameters for forecast()
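How mode, stride and refit interact can be sketched without the library. The helpers below (`make_folds`, `refit_flags`) are hypothetical and work on integer positions rather than timestamps; they assume the last fold's test window ends at the last point of the series, which is an illustrative assumption rather than a documented guarantee.

```python
from typing import List, Optional, Tuple, Union

Fold = Tuple[range, range]  # (train positions, test positions)


def make_folds(
    n_points: int,
    n_folds: int,
    horizon: int,
    mode: str = "expand",
    stride: Optional[int] = None,
) -> List[Fold]:
    """Lay out train/test windows for an integer number of folds."""
    if mode not in ("expand", "constant"):
        raise ValueError("mode must be 'expand' or 'constant'")
    stride = horizon if stride is None else stride  # stride defaults to horizon
    # Assumption: folds are aligned to the end of the series.
    first_test_start = n_points - horizon - (n_folds - 1) * stride
    if first_test_start <= 0:
        raise ValueError("not enough points for the requested folds")
    folds: List[Fold] = []
    for i in range(n_folds):
        test_start = first_test_start + i * stride
        # 'expand' keeps the train start fixed; 'constant' slides it forward
        # so that every train window has the same length.
        train_start = 0 if mode == "expand" else i * stride
        folds.append((range(train_start, test_start), range(test_start, test_start + horizon)))
    return folds


def refit_flags(n_folds: int, refit: Union[bool, int]) -> List[bool]:
    """Mark the folds on which the pipeline would be retrained."""
    if refit is True:
        return [True] * n_folds
    if refit is False:
        return [i == 0 for i in range(n_folds)]
    return [i % refit == 0 for i in range(n_folds)]


expand_folds = make_folds(n_points=10, n_folds=3, horizon=2)
constant_folds = make_folds(n_points=10, n_folds=3, horizon=2, mode="constant")
```

With refit=2 and five folds, `refit_flags(5, 2)` marks the first, third and fifth folds for retraining.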

Returns

metrics_df, forecast_df, fold_info_df – Metrics dataframe, forecast dataframe and dataframe with information about folds

Return type

Tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]

Raises
  • ValueError: – If mode is set when n_folds is a List[FoldMask].

  • ValueError: – If stride is set when n_folds is a List[FoldMask].
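The two error conditions above, together with the documented defaults, amount to a small argument check. The sketch below is a hypothetical stand-alone version: `FoldMaskStub` and `resolve_backtest_args` are illustrative names, and the error messages are not the library's.

```python
from typing import List, Optional, Tuple, Union


class FoldMaskStub:
    """Hypothetical stand-in for a fold mask; it carries no logic here."""


def resolve_backtest_args(
    n_folds: Union[int, List[FoldMaskStub]],
    horizon: int,
    mode: Optional[str] = None,
    stride: Optional[int] = None,
) -> Tuple[Optional[str], Optional[int]]:
    """Apply defaults for integer n_folds, reject mode/stride for fold masks."""
    if isinstance(n_folds, int):
        # Defaults from the parameter list: mode -> 'expand', stride -> horizon.
        return (mode or "expand", horizon if stride is None else stride)
    if mode is not None:
        raise ValueError("mode must not be set when n_folds is a list of fold masks")
    if stride is not None:
        raise ValueError("stride must not be set when n_folds is a list of fold masks")
    return (None, None)
```

Passing explicit masks together with mode="expand" would therefore fail fast, before any model is fitted.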

fit(ts: etna.datasets.tsdataset.TSDataset) etna.pipeline.hierarchical_pipeline.HierarchicalPipeline[source]

Fit the HierarchicalPipeline.

Fit and apply the given transforms to the data, then fit the model on the transformed data. The provided hierarchical dataset is aggregated to the source level before the pipeline is fitted.

Parameters

ts (etna.datasets.tsdataset.TSDataset) – Dataset with hierarchical timeseries data

Returns

Fitted HierarchicalPipeline instance

Return type

etna.pipeline.hierarchical_pipeline.HierarchicalPipeline

forecast(ts: Optional[etna.datasets.tsdataset.TSDataset] = None, prediction_interval: bool = False, quantiles: Sequence[float] = (0.025, 0.975), n_folds: int = 3, return_components: bool = False) etna.datasets.tsdataset.TSDataset[source]

Make a forecast of the next points of a dataset at a target level.

The result of forecasting starts from the last point of ts, not including it.

The method makes a prediction for the target at the source level of the hierarchy and then reconciles it to the target level.

Parameters
  • ts (Optional[etna.datasets.tsdataset.TSDataset]) – Dataset to forecast. If not given, the dataset given during fit() is used.

  • prediction_interval (bool) – If True returns prediction interval for forecast

  • quantiles (Sequence[float]) – Levels of prediction distribution. By default 2.5% and 97.5% are taken to form a 95% prediction interval

  • n_folds (int) – Number of folds to use in the backtest for prediction interval estimation

  • return_components (bool) – If True additionally returns forecast components

Returns

Dataset with predictions at the target level of hierarchy.

Return type

etna.datasets.tsdataset.TSDataset
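How a pair of quantile levels turns into interval borders can be illustrated with a small sketch. It assumes residuals collected from backtest folds are roughly normal with zero mean; that assumption and the helper names `interval_borders` and `nominal_coverage` are illustrative and do not describe how the library computes its intervals.

```python
from statistics import NormalDist, pstdev
from typing import Dict, List, Sequence


def interval_borders(
    forecast: List[float],
    residuals: List[float],
    quantiles: Sequence[float] = (0.025, 0.975),
) -> Dict[float, List[float]]:
    """Shift the point forecast by sigma * z_q for every requested level q."""
    sigma = pstdev(residuals)  # spread of the (hypothetical) backtest residuals
    std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
    return {q: [y + sigma * std_normal.inv_cdf(q) for y in forecast] for q in quantiles}


def nominal_coverage(quantiles: Sequence[float]) -> float:
    """Probability mass between the lowest and the highest level."""
    return max(quantiles) - min(quantiles)


borders = interval_borders([10.0, 12.0], residuals=[-1.0, 1.0, -1.0, 1.0])
```

Under these assumptions the levels (0.025, 0.975) enclose 95% of the probability mass, while a pair such as (0.25, 0.75) encloses only 50%.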

classmethod load(path: pathlib.Path, ts: Optional[etna.datasets.tsdataset.TSDataset] = None) etna.pipeline.hierarchical_pipeline.HierarchicalPipeline[source]

Load an object.

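Parameters
  • path (pathlib.Path) – Path to load object from.

  • ts (Optional[etna.datasets.tsdataset.TSDataset]) – Dataset to set into the loaded pipeline.

Returns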

Loaded object.

Return type

etna.pipeline.hierarchical_pipeline.HierarchicalPipeline

params_to_tune() Dict[str, etna.distributions.distributions.BaseDistribution]

Get hyperparameter grid to tune.

Parameters of the model have the prefix “model.”, e.g. “model.alpha”.

Parameters of the transforms have the prefix “transforms.idx.”, e.g. “transforms.0.mode”.

Returns

Grid with parameters from model and transforms.

Return type

Dict[str, etna.distributions.distributions.BaseDistribution]
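The prefixing scheme can be shown with a short sketch. The function `merge_tuning_grids` is hypothetical, and plain tuples stand in for distribution objects, so this only illustrates how the keys are formed.

```python
from typing import Any, Dict, List


def merge_tuning_grids(
    model_grid: Dict[str, Any],
    transform_grids: List[Dict[str, Any]],
) -> Dict[str, Any]:
    """Combine per-component grids into one flat grid with prefixed keys."""
    grid: Dict[str, Any] = {f"model.{name}": dist for name, dist in model_grid.items()}
    for idx, transform_grid in enumerate(transform_grids):
        for name, dist in transform_grid.items():
            # idx is the position of the transform in the pipeline's sequence.
            grid[f"transforms.{idx}.{name}"] = dist
    return grid


# Placeholder tuples instead of real distribution objects.
grid = merge_tuning_grids(
    model_grid={"alpha": ("float", 0.0, 1.0)},
    transform_grids=[{"mode": ("categorical", ["per-segment", "macro"])}, {}],
)
```

A transform with nothing to tune, like the second one here, simply contributes no keys.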

predict(ts: Optional[etna.datasets.tsdataset.TSDataset] = None, start_timestamp: Optional[pandas._libs.tslibs.timestamps.Timestamp] = None, end_timestamp: Optional[pandas._libs.tslibs.timestamps.Timestamp] = None, prediction_interval: bool = False, quantiles: Sequence[float] = (0.025, 0.975), return_components: bool = False) etna.datasets.tsdataset.TSDataset[source]

Make in-sample predictions on dataset at the target level in a given range.

Method makes a prediction for target at the source level of hierarchy and then makes reconciliation to the target level.

Currently, when segments start at different timestamps, the method is only guaranteed to work if start_timestamp is >= the beginning of every segment.

Parameters
  • ts (Optional[etna.datasets.tsdataset.TSDataset]) – Dataset to make predictions on. If not given, the dataset given during fit() is used.

  • start_timestamp (Optional[pandas._libs.tslibs.timestamps.Timestamp]) – First timestamp of the prediction range to return; should be >= the first timestamp in ts. It is expected that the beginning of each segment is <= start_timestamp. If not set, the first timestamp at which every segment has begun is taken.

  • end_timestamp (Optional[pandas._libs.tslibs.timestamps.Timestamp]) – Last timestamp of the prediction range to return; if not set, the last timestamp of ts is taken. The value is expected to be less than or equal to the last timestamp in ts.

  • prediction_interval (bool) – If True returns prediction interval for forecast.

  • quantiles (Sequence[float]) – Levels of prediction distribution. By default 2.5% and 97.5% are taken to form a 95% prediction interval.

  • return_components (bool) – If True additionally returns forecast components.

Returns

Dataset with predictions at the target level in [start_timestamp, end_timestamp] range.

Return type

etna.datasets.tsdataset.TSDataset
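The rules for start_timestamp and end_timestamp can be summarised in a small sketch. The helper `resolve_range` is hypothetical and uses datetime.date instead of pandas timestamps; it reads the default start as the latest of the segment beginnings, and the error messages are illustrative.

```python
from datetime import date
from typing import Dict, Optional, Tuple


def resolve_range(
    segment_starts: Dict[str, date],
    last_timestamp: date,
    start: Optional[date] = None,
    end: Optional[date] = None,
) -> Tuple[date, date]:
    """Fill in default borders and check them against the dataset."""
    # First timestamp at which every segment has already begun.
    common_start = max(segment_starts.values())
    start = common_start if start is None else start
    end = last_timestamp if end is None else end
    if start < common_start:
        raise ValueError("start_timestamp is before the beginning of some segment")
    if end > last_timestamp:
        raise ValueError("end_timestamp is after the last timestamp of the dataset")
    if start > end:
        raise ValueError("start_timestamp is after end_timestamp")
    return start, end


starts = {"segment_a": date(2020, 1, 1), "segment_b": date(2020, 1, 5)}
default_range = resolve_range(starts, last_timestamp=date(2020, 1, 31))
```

Here the default range begins on 2020-01-05, the first date on which both hypothetical segments have data.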

raw_forecast(ts: etna.datasets.tsdataset.TSDataset, prediction_interval: bool = False, quantiles: Sequence[float] = (0.25, 0.75), n_folds: int = 3, return_components: bool = False) etna.datasets.tsdataset.TSDataset[source]

Make a forecast of the next points of a dataset at the source level.

The result of forecasting starts from the last point of ts, not including it.

Parameters
  • ts (etna.datasets.tsdataset.TSDataset) – Dataset to forecast

  • prediction_interval (bool) – If True returns prediction interval for forecast

  • quantiles (Sequence[float]) – Levels of prediction distribution. By default 25% and 75% are taken to form a 50% prediction interval

  • n_folds (int) – Number of folds to use in the backtest for prediction interval estimation

  • return_components (bool) – If True additionally returns forecast components

Returns

Dataset with predictions at the source level

Return type

etna.datasets.tsdataset.TSDataset

raw_predict(ts: etna.datasets.tsdataset.TSDataset, start_timestamp: Optional[pandas._libs.tslibs.timestamps.Timestamp] = None, end_timestamp: Optional[pandas._libs.tslibs.timestamps.Timestamp] = None, prediction_interval: bool = False, quantiles: Sequence[float] = (0.025, 0.975), return_components: bool = False) etna.datasets.tsdataset.TSDataset[source]

Make in-sample predictions on dataset at the source level in a given range.

Parameters
  • ts (etna.datasets.tsdataset.TSDataset) – Dataset to make predictions on.

  • start_timestamp (Optional[pandas._libs.tslibs.timestamps.Timestamp]) – First timestamp of the prediction range to return; should be >= the first timestamp in ts. It is expected that the beginning of each segment is <= start_timestamp. If not set, the first timestamp at which every segment has begun is taken.

  • end_timestamp (Optional[pandas._libs.tslibs.timestamps.Timestamp]) – Last timestamp of the prediction range to return; if not set, the last timestamp of ts is taken. The value is expected to be less than or equal to the last timestamp in ts.

  • prediction_interval (bool) – If True returns prediction interval for forecast.

  • quantiles (Sequence[float]) – Levels of prediction distribution. By default 2.5% and 97.5% are taken to form a 95% prediction interval.

  • return_components (bool) – If True additionally returns forecast components.

Returns

Dataset with predictions at the source level in [start_timestamp, end_timestamp] range.

Return type

etna.datasets.tsdataset.TSDataset

save(path: pathlib.Path)[source]

Save the object.

Parameters

path (pathlib.Path) – Path to save object to.

set_params(**params: dict) etna.core.mixins.TMixin

Return new object instance with modified parameters.

The method also allows changing parameters of nested objects within the current object. For example, it is possible to change parameters of a model in a Pipeline.

Nested parameters are expected to be in a <component_1>.<...>.<parameter> form, where components are separated by a dot.

Parameters
  • **params (dict) – Estimator parameters

Returns

New instance with changed parameters

Return type

etna.core.mixins.TMixin
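The dotted form of nested parameters can be unpacked with a few lines of plain Python. The helper `nest_params` is hypothetical and only shows how a key such as "transforms.0.value" splits into components and a final parameter name; it is not the library's parsing code.

```python
from typing import Any, Dict


def nest_params(flat: Dict[str, Any]) -> Dict[str, Any]:
    """Turn {'a.b.c': v} into {'a': {'b': {'c': v}}}."""
    nested: Dict[str, Any] = {}
    for dotted_key, value in flat.items():
        # Everything before the last dot names a component; the rest is the parameter.
        *components, parameter = dotted_key.split(".")
        node = nested
        for component in components:
            node = node.setdefault(component, {})
        node[parameter] = value
    return nested


nested = nest_params({"model.lag": 3, "transforms.0.value": 2, "horizon": 5})
```

A key without dots, such as "horizon", stays at the top level, while the index of a transform remains a string component in this sketch.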

Examples

>>> from etna.pipeline import Pipeline
>>> from etna.models import NaiveModel
>>> from etna.transforms import AddConstTransform
>>> model = NaiveModel(lag=1)
>>> transforms = [AddConstTransform(in_column="target", value=1)]
>>> pipeline = Pipeline(model, transforms=transforms, horizon=3)
>>> pipeline.set_params(**{"model.lag": 3, "transforms.0.value": 2})
Pipeline(model = NaiveModel(lag = 3, ), transforms = [AddConstTransform(in_column = 'target', value = 2, inplace = True, out_column = None, )], horizon = 3, )

to_dict()

Collect all information about etna object in dict.