DirectEnsemble

class DirectEnsemble(pipelines: List[etna.pipeline.base.BasePipeline], n_jobs: int = 1, joblib_params: Optional[Dict[str, Any]] = None)

Bases: etna.ensembles.mixins.EnsembleMixin, etna.ensembles.mixins.SaveEnsembleMixin, etna.pipeline.base.BasePipeline

DirectEnsemble is a pipeline that forecasts future values by merging the forecasts of its base pipelines.

The ensemble expects several pipelines at initialization, each with a different forecasting horizon. For each future point, the forecast of the ensemble is taken from the base pipeline with the shortest horizon that still covers that point, so the ensemble forecasts as far ahead as its longest-horizon pipeline.
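The selection rule can be illustrated with a minimal pure-Python sketch. It assumes each base pipeline's forecast is given as a list with one value per future step; `merge_direct_forecasts` is a hypothetical helper, not part of etna. Applied to the example below, steps 1 to 3 would come from the horizon-3 Prophet pipeline and steps 4 to 5 from the horizon-5 naive pipeline.

```python
from typing import List, Sequence, Tuple


def merge_direct_forecasts(forecasts: Sequence[Tuple[int, Sequence[float]]]) -> List[float]:
    """Merge base forecasts following the DirectEnsemble selection rule.

    `forecasts` holds (horizon, values) pairs, where `values` has one entry
    per future step. Hypothetical helper for illustration only.
    """
    horizons = [horizon for horizon, _ in forecasts]
    if len(set(horizons)) != len(horizons):
        # Mirrors the documented ValueError for duplicate horizons
        raise ValueError("Pipelines should have different horizons")

    merged: List[float] = []
    covered = 0  # number of leading steps already taken from shorter pipelines
    for horizon, values in sorted(forecasts, key=lambda pair: pair[0]):
        if len(values) != horizon:
            raise ValueError("Each forecast must contain exactly `horizon` values")
        # Steps covered + 1 .. horizon are not covered by any shorter pipeline
        merged.extend(values[covered:horizon])
        covered = horizon
    return merged


# Horizon-3 and horizon-5 pipelines, passed in any order
print(merge_direct_forecasts([(5, [10, 11, 12, 13, 14]), (3, [1, 2, 3])]))
# [1, 2, 3, 13, 14]
```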

Examples

>>> from etna.datasets import generate_ar_df
>>> from etna.datasets import TSDataset
>>> from etna.ensembles import DirectEnsemble
>>> from etna.models import NaiveModel
>>> from etna.models import ProphetModel
>>> from etna.pipeline import Pipeline
>>> df = generate_ar_df(periods=30, start_time="2021-06-01", ar_coef=[1.2], n_segments=3)
>>> df_ts_format = TSDataset.to_dataset(df)
>>> ts = TSDataset(df_ts_format, "D")
>>> prophet_pipeline = Pipeline(model=ProphetModel(), transforms=[], horizon=3)
>>> naive_pipeline = Pipeline(model=NaiveModel(lag=10), transforms=[], horizon=5)
>>> ensemble = DirectEnsemble(pipelines=[prophet_pipeline, naive_pipeline])
>>> _ = ensemble.fit(ts=ts)
>>> forecast = ensemble.forecast()
>>> forecast
segment    segment_0 segment_1 segment_2
feature       target    target    target
timestamp
2021-07-01    -10.37   -232.60    163.16
2021-07-02    -10.59   -242.05    169.62
2021-07-03    -11.41   -253.82    177.62
2021-07-04     -5.85   -139.57     96.99
2021-07-05     -6.11   -167.69    116.59

Init DirectEnsemble.

Parameters
  • pipelines (List[etna.pipeline.base.BasePipeline]) – List of pipelines to use in the ensemble; their horizons must be pairwise distinct

  • n_jobs (int) – Number of jobs to run in parallel

  • joblib_params (Optional[Dict[str, Any]]) – Additional parameters for joblib.Parallel

Raises

ValueError: – If two or more pipelines have the same horizon.

Methods

backtest(ts, metrics[, n_folds, mode, ...]) – Run backtest with the pipeline.

fit(ts) – Fit pipelines in ensemble.

forecast([ts, prediction_interval, ...]) – Make a forecast of the next points of a dataset.

load(path[, ts]) – Load an object.

params_to_tune() – Get hyperparameter grid to tune.

predict(ts[, start_timestamp, ...]) – Make in-sample predictions on dataset in a given range.

save(path) – Save the object.

set_params(**params) – Return new object instance with modified parameters.

to_dict() – Collect all information about etna object in dict.

backtest(ts: etna.datasets.tsdataset.TSDataset, metrics: List[etna.metrics.base.Metric], n_folds: Union[int, List[etna.pipeline.base.FoldMask]] = 5, mode: Optional[str] = None, aggregate_metrics: bool = False, n_jobs: int = 1, refit: Union[bool, int] = True, stride: Optional[int] = None, joblib_params: Optional[Dict[str, Any]] = None, forecast_params: Optional[Dict[str, Any]] = None) → Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]

Run backtest with the pipeline.

If refit is not True and some component of the pipeline doesn’t support forecasting with a gap, that component raises an exception.

Parameters
  • ts (etna.datasets.tsdataset.TSDataset) – Dataset to fit models in backtest

  • metrics (List[etna.metrics.base.Metric]) – List of metrics to compute for each fold

  • n_folds (Union[int, List[etna.pipeline.base.FoldMask]]) – Number of folds or the list of fold masks

  • mode (Optional[str]) – Train generation policy: ‘expand’ or ‘constant’. Works only if n_folds is an integer. Defaults to ‘expand’.

  • aggregate_metrics (bool) – If True, aggregate metrics over folds; otherwise, return raw per-fold metrics

  • n_jobs (int) – Number of jobs to run in parallel

  • refit (Union[bool, int]) –

    Determines how often the pipeline is retrained while iterating over folds.

    • If True: the pipeline is retrained on each fold.

    • If False: the pipeline is trained only on the first fold.

    • If an integer value: the pipeline is retrained every value folds, starting from the first.

  • stride (Optional[int]) – Number of points between folds. Works only if n_folds is an integer. Defaults to horizon.

  • joblib_params (Optional[Dict[str, Any]]) – Additional parameters for joblib.Parallel

  • forecast_params (Optional[Dict[str, Any]]) – Additional parameters for forecast()
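The refit options can be read as a per-fold retraining schedule. The sketch below uses a hypothetical helper, `refit_schedule` (not an etna function), and assumes folds are numbered from 0 and that a non-positive integer refit is invalid; etna's own validation may differ.

```python
from typing import List, Union


def refit_schedule(n_folds: int, refit: Union[bool, int]) -> List[bool]:
    """Return, for each fold, whether the pipeline is (re)trained before it."""
    # bool must be checked before int: True and False are also ints in Python
    if isinstance(refit, bool):
        return [refit or fold == 0 for fold in range(n_folds)]
    if refit < 1:
        raise ValueError("Integer refit is assumed to be positive")
    # Train on the first fold, then on every `refit`-th fold after it
    return [fold % refit == 0 for fold in range(n_folds)]


print(refit_schedule(5, 2))  # [True, False, True, False, True]
```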

Returns

metrics_df, forecast_df, fold_info_df – Metrics dataframe, forecast dataframe and dataframe with information about folds

Return type

Tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]

Raises
  • ValueError: – If mode is set when n_folds is a List[FoldMask].

  • ValueError: – If stride is set when n_folds is a List[FoldMask].
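For integer n_folds, mode and stride control how train and test windows are laid out. The sketch below is one plausible reading of the parameter descriptions and is not etna's implementation. It assumes that test windows of length horizon are placed so the last one ends at the final point, that consecutive test windows start stride points apart, and that ‘constant’ keeps the first fold's train length while ‘expand’ always trains from the start. `fold_bounds` is a hypothetical helper.

```python
from typing import List, Optional, Tuple


def fold_bounds(
    n_points: int,
    horizon: int,
    n_folds: int,
    stride: Optional[int] = None,
    mode: str = "expand",
) -> List[Tuple[int, int, int, int]]:
    """Return (train_start, train_end, test_start, test_end) positions, end-exclusive."""
    if stride is None:
        stride = horizon  # documented default
    if mode not in ("expand", "constant"):
        raise ValueError("mode should be 'expand' or 'constant'")
    # Position where the first test window starts, so the last one ends at n_points
    first_test_start = n_points - horizon - (n_folds - 1) * stride
    if first_test_start <= 0:
        raise ValueError("Not enough points for the requested folds")
    folds = []
    for fold in range(n_folds):
        test_start = first_test_start + fold * stride
        # 'expand' always starts at 0; 'constant' slides the start to keep the train length fixed
        train_start = 0 if mode == "expand" else fold * stride
        folds.append((train_start, test_start, test_start, test_start + horizon))
    return folds


print(fold_bounds(30, 3, 3, mode="constant"))
# [(0, 21, 21, 24), (3, 24, 24, 27), (6, 27, 27, 30)]
```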

fit(ts: etna.datasets.tsdataset.TSDataset) → etna.ensembles.direct_ensemble.DirectEnsemble

Fit the pipelines in the ensemble.

Parameters

ts (etna.datasets.tsdataset.TSDataset) – TSDataset to fit the ensemble on

Returns

Fitted ensemble

Return type

self

forecast(ts: Optional[etna.datasets.tsdataset.TSDataset] = None, prediction_interval: bool = False, quantiles: Sequence[float] = (0.025, 0.975), n_folds: int = 3, return_components: bool = False) → etna.datasets.tsdataset.TSDataset

Make a forecast of the next points of a dataset.

The forecast starts right after the last point of ts; that point itself is not included.
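As a worked example of this rule, assuming daily frequency as in the class example: the dataset generated there ends on 2021-06-30, so a horizon of 5 yields 2021-07-01 through 2021-07-05, which matches the printed forecast. `forecast_timestamps` is a hypothetical helper, not part of etna.

```python
from datetime import date, timedelta
from typing import List


def forecast_timestamps(last_timestamp: date, horizon: int) -> List[date]:
    """Timestamps a daily forecast would cover: strictly after the last point of ts."""
    return [last_timestamp + timedelta(days=step) for step in range(1, horizon + 1)]


print([d.isoformat() for d in forecast_timestamps(date(2021, 6, 30), 5)])
# ['2021-07-01', '2021-07-02', '2021-07-03', '2021-07-04', '2021-07-05']
```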

Parameters
  • ts (Optional[etna.datasets.tsdataset.TSDataset]) – Dataset to forecast. If not given, the dataset passed to fit() is used.

  • prediction_interval (bool) – If True, returns prediction intervals for the forecast

  • quantiles (Sequence[float]) – Levels of prediction distribution. By default, the 2.5% and 97.5% quantiles are taken to form a 95% prediction interval

  • n_folds (int) – Number of folds to use in the backtest for prediction interval estimation

  • return_components (bool) – If True, additionally returns forecast components
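To make quantiles and n_folds concrete, here is a minimal sketch of one common way to build such an interval: collect residuals from backtest folds at each forecast step, then shift the point forecast by the residual standard deviation times a standard normal quantile. This normal approximation is an assumption for illustration, not a description of etna's estimator, and `normal_interval` is hypothetical. With the default quantiles (0.025, 0.975), the band covers 97.5% minus 2.5%, that is 95%, of the assumed distribution.

```python
from statistics import NormalDist, pstdev
from typing import Dict, List, Sequence


def normal_interval(
    point_forecast: Sequence[float],
    residuals_per_step: Sequence[Sequence[float]],
    quantiles: Sequence[float] = (0.025, 0.975),
) -> Dict[float, List[float]]:
    """Map each quantile level to per-step bounds around the point forecast."""
    standard_normal = NormalDist()
    bounds: Dict[float, List[float]] = {}
    for q in quantiles:
        z = standard_normal.inv_cdf(q)  # about -1.96 for 0.025 and +1.96 for 0.975
        bounds[q] = [
            y + z * pstdev(residuals)  # residual spread observed at this step across folds
            for y, residuals in zip(point_forecast, residuals_per_step)
        ]
    return bounds


# One forecast step with point value 10 and backtest residuals -1 and 1 (std 1)
print(normal_interval([10.0], [[-1.0, 1.0]]))
```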

Returns

Dataset with predictions

Raises

NotImplementedError: – Adding target components is not currently implemented

Return type

etna.datasets.tsdataset.TSDataset

classmethod load(path: pathlib.Path, ts: Optional[etna.datasets.tsdataset.TSDataset] = None) → typing_extensions.Self

Load an object.

Warning

This method uses dill module which is not secure. It is possible to construct malicious data which will execute arbitrary code during loading. Never load data that could have come from an untrusted source, or that could have been tampered with.

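Parameters
  • path (pathlib.Path) – Path to load object from.

  • ts (Optional[etna.datasets.tsdataset.TSDataset]) – Dataset to set into the loaded ensemble.

Returns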

Loaded object.

Return type

typing_extensions.Self

params_to_tune() → Dict[str, etna.distributions.distributions.BaseDistribution]

Get hyperparameter grid to tune.

Not implemented for this class.

Returns

Grid with hyperparameters.

Return type

Dict[str, etna.distributions.distributions.BaseDistribution]

predict(ts: etna.datasets.tsdataset.TSDataset, start_timestamp: Optional[pandas._libs.tslibs.timestamps.Timestamp] = None, end_timestamp: Optional[pandas._libs.tslibs.timestamps.Timestamp] = None, prediction_interval: bool = False, quantiles: Sequence[float] = (0.025, 0.975), return_components: bool = False) → etna.datasets.tsdataset.TSDataset

Make in-sample predictions on dataset in a given range.

Currently, when segments start at different timestamps, correct behavior is only guaranteed for start_timestamp >= the start of every segment.

Parameters
  • ts (etna.datasets.tsdataset.TSDataset) – Dataset to make predictions on.

  • start_timestamp (Optional[pandas._libs.tslibs.timestamps.Timestamp]) – First timestamp of the prediction range to return. It should be >= the first timestamp in ts, and every segment is expected to begin at or before start_timestamp. If not set, the first timestamp at which all segments have started is taken.

  • end_timestamp (Optional[pandas._libs.tslibs.timestamps.Timestamp]) – Last timestamp of the prediction range to return. It is expected to be <= the last timestamp in ts. If not set, the last timestamp of ts is taken.

  • prediction_interval (bool) – If True, returns prediction intervals for the forecast.

  • quantiles (Sequence[float]) – Levels of prediction distribution. By default, the 2.5% and 97.5% quantiles are taken to form a 95% prediction interval.

  • return_components (bool) – If True, additionally returns forecast components.

Returns

Dataset with predictions in [start_timestamp, end_timestamp] range.

Raises
  • ValueError: – Value of end_timestamp is less than start_timestamp.

  • ValueError: – Value of start_timestamp is earlier than the point at which all segments have started.

  • ValueError: – Value of end_timestamp is later than the last timestamp.

  • NotImplementedError: – Adding target components is not currently implemented
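The defaults and checks above can be summarized as a small validation routine. This sketch uses plain `datetime.date` values instead of pandas timestamps. `resolve_predict_range` is a hypothetical helper, and the order in which etna performs these checks is not documented here.

```python
from datetime import date
from typing import Optional, Sequence, Tuple


def resolve_predict_range(
    segment_starts: Sequence[date],
    last_timestamp: date,
    start_timestamp: Optional[date] = None,
    end_timestamp: Optional[date] = None,
) -> Tuple[date, date]:
    """Apply the documented defaults and raise ValueError for the documented invalid ranges."""
    all_started = max(segment_starts)  # first point at which every segment has begun
    if start_timestamp is None:
        start_timestamp = all_started
    if end_timestamp is None:
        end_timestamp = last_timestamp
    if end_timestamp < start_timestamp:
        raise ValueError("Value of end_timestamp is less than start_timestamp")
    if start_timestamp < all_started:
        raise ValueError("Value of start_timestamp goes before point where each segment started")
    if end_timestamp > last_timestamp:
        raise ValueError("Value of end_timestamp goes after the last timestamp")
    return start_timestamp, end_timestamp


print(resolve_predict_range([date(2021, 6, 1), date(2021, 6, 5)], date(2021, 6, 30)))
# (datetime.date(2021, 6, 5), datetime.date(2021, 6, 30))
```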

Return type

etna.datasets.tsdataset.TSDataset

save(path: pathlib.Path)

Save the object.

Parameters

path (pathlib.Path) – Path to save object to.

set_params(**params: dict) → etna.core.mixins.TMixin

Return new object instance with modified parameters.

The method also allows changing parameters of nested objects within the current object. For example, it is possible to change parameters of a model in a Pipeline.

Nested parameters are expected to be in a <component_1>.<...>.<parameter> form, where components are separated by a dot.
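The dotted-key convention can be illustrated without etna. In this sketch, nested dicts and lists stand in for a pipeline's components, and numeric parts such as the 0 in "transforms.0.value" index into lists. `set_nested_params` is a hypothetical helper that only mimics the key format, not how etna rebuilds objects.

```python
import copy
from typing import Any, Dict


def set_nested_params(config: Dict[str, Any], **params: Any) -> Dict[str, Any]:
    """Return a copy of `config` with dotted keys such as "model.lag" applied."""
    new_config = copy.deepcopy(config)  # the input is left unchanged, like set_params returning a new object
    for dotted_key, value in params.items():
        *path, last = dotted_key.split(".")
        node: Any = new_config
        for part in path:
            # Numeric parts index into lists, e.g. the 0 in "transforms.0.value"
            node = node[int(part)] if isinstance(node, list) else node[part]
        if isinstance(node, list):
            node[int(last)] = value
        else:
            node[last] = value
    return new_config


config = {"model": {"lag": 1}, "transforms": [{"in_column": "target", "value": 1}], "horizon": 3}
print(set_nested_params(config, **{"model.lag": 3, "transforms.0.value": 2}))
# {'model': {'lag': 3}, 'transforms': [{'in_column': 'target', 'value': 2}], 'horizon': 3}
```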

Parameters
  • **params – Estimator parameters

Returns

New instance with changed parameters

Return type

etna.core.mixins.TMixin

Examples

>>> from etna.pipeline import Pipeline
>>> from etna.models import NaiveModel
>>> from etna.transforms import AddConstTransform
>>> model = NaiveModel(lag=1)
>>> transforms = [AddConstTransform(in_column="target", value=1)]
>>> pipeline = Pipeline(model, transforms=transforms, horizon=3)
>>> pipeline.set_params(**{"model.lag": 3, "transforms.0.value": 2})
Pipeline(model = NaiveModel(lag = 3, ), transforms = [AddConstTransform(in_column = 'target', value = 2, inplace = True, out_column = None, )], horizon = 3, )

to_dict()

Collect all information about the etna object in a dict.