TSDataset

class TSDataset(df: pandas.core.frame.DataFrame, freq: str, df_exog: Optional[pandas.core.frame.DataFrame] = None, known_future: Union[Literal['all'], Sequence] = (), hierarchical_structure: Optional[etna.datasets.hierarchical_structure.HierarchicalStructure] = None)

Bases: object

TSDataset is the main class to handle your time series data. It prepares the series for exploratory analysis, implements feature generation with Transforms, and generates future points.

Notes

TSDataset supports custom indexing and slicing through the interface TSDataset[timestamp, segment, column]. If the dataset contains NaN values at the start of the period, those timestamps are removed.

During creation, the segment column is cast to string type.
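As a rough illustration of the TSDataset[timestamp, segment, column] interface, the pandas sketch below slices a plain wide frame whose columns form a (segment, feature) MultiIndex, the layout shown in the examples. It is an analogy under that assumption, not ETNA's indexing code.

```python
import pandas as pd

# Small wide frame with a (segment, feature) column MultiIndex,
# mirroring the layout produced by TSDataset.to_dataset in the examples.
timestamps = pd.date_range("2021-06-01", periods=30, freq="D", name="timestamp")
columns = pd.MultiIndex.from_product(
    [["segment_0", "segment_1"], ["target"]], names=["segment", "feature"]
)
wide = pd.DataFrame(1.0, index=timestamps, columns=columns)

# Analogue of ts["2021-06-01":"2021-06-07", "segment_0", "target"]:
# a label slice on the timestamp index plus a (segment, feature) column key.
week = wide.loc["2021-06-01":"2021-06-07", ("segment_0", "target")]
print(week)
```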

Examples

>>> import pandas as pd
>>> from etna.datasets import TSDataset, generate_const_df
>>> df = generate_const_df(periods=30, start_time="2021-06-01", n_segments=2, scale=1)
>>> df_ts_format = TSDataset.to_dataset(df)
>>> ts = TSDataset(df_ts_format, "D")
>>> ts["2021-06-01":"2021-06-07", "segment_0", "target"]
timestamp
2021-06-01    1.0
2021-06-02    1.0
2021-06-03    1.0
2021-06-04    1.0
2021-06-05    1.0
2021-06-06    1.0
2021-06-07    1.0
Freq: D, Name: (segment_0, target), dtype: float64
>>> from etna.datasets import generate_ar_df
>>> pd.options.display.float_format = '{:,.2f}'.format
>>> df_to_forecast = generate_ar_df(100, start_time="2021-01-01", n_segments=1)
>>> df_regressors = generate_ar_df(120, start_time="2021-01-01", n_segments=5)
>>> df_regressors = df_regressors.pivot(index="timestamp", columns="segment").reset_index()
>>> df_regressors.columns = ["timestamp"] + [f"regressor_{i}" for i in range(5)]
>>> df_regressors["segment"] = "segment_0"
>>> df_to_forecast = TSDataset.to_dataset(df_to_forecast)
>>> df_regressors = TSDataset.to_dataset(df_regressors)
>>> tsdataset = TSDataset(df=df_to_forecast, freq="D", df_exog=df_regressors, known_future="all")
>>> tsdataset.df.head(5)
segment      segment_0
feature    regressor_0 regressor_1 regressor_2 regressor_3 regressor_4 target
timestamp
2021-01-01        1.62       -0.02       -0.50       -0.56        0.52   1.62
2021-01-02        1.01       -0.80       -0.81        0.38       -0.60   1.01
2021-01-03        0.48        0.47       -0.81       -1.56       -1.37   0.48
2021-01-04       -0.59        2.44       -2.21       -1.21       -0.69  -0.59
2021-01-05        0.28        0.58       -3.07       -1.45        0.77   0.28
>>> from etna.datasets import generate_hierarchical_df
>>> pd.options.display.width = 0
>>> df = generate_hierarchical_df(periods=100, n_segments=[2, 4], start_time="2021-01-01",)
>>> df, hierarchical_structure = TSDataset.to_hierarchical_dataset(df=df, level_columns=["level_0", "level_1"])
>>> tsdataset = TSDataset(df=df, freq="D", hierarchical_structure=hierarchical_structure)
>>> tsdataset.df.head(5)
segment    l0s0_l1s3 l0s1_l1s0 l0s1_l1s1 l0s1_l1s2
feature       target    target    target    target
timestamp
2021-01-01      2.07      1.62     -0.45     -0.40
2021-01-02      0.59      1.01      0.78      0.42
2021-01-03     -0.24      0.48      1.18     -0.14
2021-01-04     -1.12     -0.59      1.77      1.82
2021-01-05     -1.40      0.28      0.68      0.48

Init TSDataset.

Parameters
  • df (pandas.core.frame.DataFrame) – dataframe with time series

  • freq (str) – frequency of timestamp in df

  • df_exog (Optional[pandas.core.frame.DataFrame]) – dataframe with exogenous data

  • known_future (Union[Literal['all'], typing.Sequence]) – columns of df_exog that are regressors; if “all” is given, all columns are treated as regressors

  • hierarchical_structure (Optional[etna.datasets.hierarchical_structure.HierarchicalStructure]) – Structure of the levels in the hierarchy. If None, there is no hierarchical structure in the dataset.
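A minimal sketch of how known_future could be resolved to a list of regressor names, assuming df_exog is in wide format with a "feature" column level. resolve_known_future is a hypothetical helper for illustration, not part of ETNA.

```python
from typing import List, Literal, Sequence, Union

import pandas as pd


def resolve_known_future(
    df_exog: pd.DataFrame, known_future: Union[Literal["all"], Sequence[str]]
) -> List[str]:
    """Hypothetical helper: return the df_exog feature names treated as regressors."""
    # Feature names live on the "feature" level of the wide column MultiIndex.
    exog_features = sorted(set(df_exog.columns.get_level_values("feature")))
    if known_future == "all":
        return exog_features
    # Otherwise keep only the explicitly listed columns.
    wanted = set(known_future)
    return [feature for feature in exog_features if feature in wanted]


index = pd.date_range("2021-01-01", periods=3, freq="D", name="timestamp")
columns = pd.MultiIndex.from_product(
    [["segment_0"], ["regressor_1", "regressor_2"]], names=["segment", "feature"]
)
df_exog = pd.DataFrame(0.0, index=index, columns=columns)
all_regressors = resolve_known_future(df_exog, "all")
some_regressors = resolve_known_future(df_exog, ["regressor_2"])
```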


Methods

add_columns_from_pandas(df_update[, ...])

Update the dataset with the new columns from pandas dataframe.

add_target_components(target_components_df)

Add target components into dataset.

describe([segments])

Overview of the dataset that returns a DataFrame.

drop_features(features[, drop_from_exog])

Drop columns with features from the dataset.

drop_target_components()

Drop target components from dataset.

fit_transform(transforms)

Fit and apply given transforms to the data.

get_level_dataset(target_level)

Generate new TSDataset on target level.

get_target_components()

Get DataFrame with target components.

has_hierarchy()

Check whether dataset has hierarchical structure.

head([n_rows])

Return the first n_rows rows.

info([segments])

Overview of the dataset that prints the result.

inverse_transform(transforms)

Apply inverse transform method of transforms to the data.

isnull()

Return a boolean dataframe that flags whether each corresponding value in self.df is null.

level_names()

Return names of the levels in the hierarchical structure.

make_future(future_steps[, transforms, ...])

Return new TSDataset with features extended into the future.

plot([n_segments, column, segments, start, ...])

Plot of random or chosen segments.

tail([n_rows])

Return the last n_rows rows.

to_dataset(df)

Convert pandas dataframe to ETNA Dataset format.

to_flatten(df[, features])

Return pandas DataFrame with flattened index.

to_hierarchical_dataset(df, level_columns[, ...])

Convert pandas dataframe from long hierarchical to ETNA Dataset format.

to_pandas([flatten, features])

Return pandas DataFrame.

to_torch_dataset(make_samples[, dropna])

Convert the TSDataset to a torch.Dataset.

train_test_split([train_start, train_end, ...])

Split the dataset into train and test parts by timestamps or by the size of the test set.

transform(transforms)

Apply given transforms to the data.

tsdataset_idx_slice([start_idx, end_idx])

Return new TSDataset with integer-location based indexing.

update_columns_from_pandas(df_update)

Update the existing columns in the dataset with the new values from pandas dataframe.

Attributes

columns

Return columns of self.df.

idx

index

Return TSDataset timestamp index.

loc

Return self.df.loc method.

regressors

Get list of all regressors across all segments in dataset.

segments

Get list of all segments in dataset.

target_components_names

Get tuple with target components names.

target_quantiles_names

Get tuple with target quantiles names.

add_columns_from_pandas(df_update: pandas.core.frame.DataFrame, update_exog: bool = False, regressors: Optional[List[str]] = None)

Update the dataset with the new columns from pandas dataframe.

Before updating columns in df, rows of df_update after the last timestamp of df are dropped.

Parameters
  • df_update (pandas.core.frame.DataFrame) – Dataframe with the new columns in wide ETNA format.

  • update_exog (bool) – If True, also update columns in df_exog. To add new regressors to the dataset, it is recommended to enable this flag.

  • regressors (Optional[List[str]]) – List of regressors in the passed dataframe.
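The cropping step can be pictured with plain pandas. add_columns_sketch is a hypothetical helper that assumes both frames are in wide format with a timestamp index; it ignores update_exog and regressors and is not ETNA's implementation.

```python
import pandas as pd


def add_columns_sketch(df: pd.DataFrame, df_update: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: append new wide-format columns, cropped to df's last timestamp."""
    # Drop rows of df_update that lie after the last timestamp of df.
    cropped = df_update.loc[: df.index.max()]
    # Align on the timestamp index and keep (segment, feature) columns sorted.
    return pd.concat([df, cropped], axis=1).sort_index(axis=1)


names = ["segment", "feature"]
df = pd.DataFrame(
    1.0,
    index=pd.date_range("2021-06-01", periods=5, freq="D", name="timestamp"),
    columns=pd.MultiIndex.from_tuples([("segment_0", "target")], names=names),
)
df_update = pd.DataFrame(
    2.0,
    index=pd.date_range("2021-06-01", periods=8, freq="D", name="timestamp"),
    columns=pd.MultiIndex.from_tuples([("segment_0", "regressor_1")], names=names),
)
result = add_columns_sketch(df, df_update)
```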

add_target_components(target_components_df: pandas.core.frame.DataFrame)

Add target components into dataset.

Parameters

target_components_df (pandas.core.frame.DataFrame) – Dataframe in ETNA wide format with target components

Raises
  • ValueError: – If dataset already contains target components

  • ValueError: – If target component names differ between segments

  • ValueError: – If components don’t sum up to target
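The "components sum up to target" condition can be checked row by row. components_sum_to_target is a hypothetical single-segment helper, and the absolute tolerance is an assumption; the tolerance ETNA itself uses is not stated here.

```python
import numpy as np
import pandas as pd


def components_sum_to_target(
    target: pd.Series, components: pd.DataFrame, atol: float = 1e-10
) -> bool:
    """Hypothetical check: do the component columns add up to the target?"""
    # Sum the components row-wise and compare with the target within a tolerance.
    return bool(np.allclose(components.sum(axis=1).to_numpy(), target.to_numpy(), atol=atol))


target = pd.Series([3.0, 5.0])
components = pd.DataFrame({"trend": [1.0, 2.0], "seasonality": [2.0, 3.0]})
ok = components_sum_to_target(target, components)
bad = components_sum_to_target(target, components.assign(seasonality=[2.0, 2.0]))
```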

describe(segments: Optional[Sequence[str]] = None) pandas.core.frame.DataFrame

Overview of the dataset that returns a DataFrame.

Method describes dataset in segment-wise fashion. Description columns:

  • start_timestamp: start of the segment; missing values at the beginning are ignored

  • end_timestamp: end of the segment; missing values at the end are ignored

  • length: length according to start_timestamp and end_timestamp

  • num_missing: number of missing values between start_timestamp and end_timestamp

  • num_segments: total number of segments, common for all segments

  • num_exogs: number of exogenous features, common for all segments

  • num_regressors: number of exogenous features that are regressors, common for all segments

  • num_known_future: number of regressors that are known since creation, common for all segments

  • freq: frequency of the series, common for all segments
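The per-segment columns above can be approximated with pandas' first_valid_index and last_valid_index. segment_overview is a hypothetical sketch that assumes a wide target frame with one column per segment and at least one non-missing value per segment; it is not ETNA's describe implementation.

```python
import numpy as np
import pandas as pd


def segment_overview(target: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical sketch of the per-segment describe() columns."""
    rows = {}
    for segment, series in target.items():
        start = series.first_valid_index()  # leading NaNs are ignored
        end = series.last_valid_index()  # trailing NaNs are ignored
        window = series.loc[start:end]
        rows[segment] = {
            "start_timestamp": start,
            "end_timestamp": end,
            "length": len(window),
            "num_missing": int(window.isna().sum()),  # gaps inside the window
        }
    return pd.DataFrame.from_dict(rows, orient="index")


target = pd.DataFrame(
    {"segment_0": [np.nan, 1.0, np.nan, 2.0, np.nan], "segment_1": [1.0] * 5},
    index=pd.date_range("2021-06-01", periods=5, freq="D"),
)
overview = segment_overview(target)
```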

Parameters

segments (Optional[Sequence[str]]) – segments to show in overview, if None all segments are shown.

Returns

result_table – table with results of the overview

Return type

pd.DataFrame

Examples

>>> import pandas as pd
>>> from etna.datasets import TSDataset, generate_const_df
>>> pd.options.display.expand_frame_repr = False
>>> df = generate_const_df(
...    periods=30, start_time="2021-06-01",
...    n_segments=2, scale=1
... )
>>> df_ts_format = TSDataset.to_dataset(df)
>>> regressors_timestamp = pd.date_range(start="2021-06-01", periods=50)
>>> df_regressors_1 = pd.DataFrame(
...     {"timestamp": regressors_timestamp, "regressor_1": 1, "segment": "segment_0"}
... )
>>> df_regressors_2 = pd.DataFrame(
...     {"timestamp": regressors_timestamp, "regressor_1": 2, "segment": "segment_1"}
... )
>>> df_exog = pd.concat([df_regressors_1, df_regressors_2], ignore_index=True)
>>> df_exog_ts_format = TSDataset.to_dataset(df_exog)
>>> ts = TSDataset(df_ts_format, df_exog=df_exog_ts_format, freq="D", known_future="all")
>>> ts.describe()
          start_timestamp end_timestamp  length  num_missing  num_segments  num_exogs  num_regressors  num_known_future freq
segments
segment_0      2021-06-01    2021-06-30      30            0             2          1               1                 1    D
segment_1      2021-06-01    2021-06-30      30            0             2          1               1                 1    D

drop_features(features: List[str], drop_from_exog: bool = False)

Drop columns with features from the dataset.

Parameters
  • features (List[str]) – List of features to drop.

  • drop_from_exog (bool) –

    • If False, drop features only from df. Features will appear again in df after make_future.

    • If True, drop features from df and df_exog. Features won’t appear in df after make_future.

Raises

ValueError: – If features list contains target components
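In the wide layout, dropping a feature for every segment at once amounts to dropping labels on the "feature" column level. The pandas sketch below shows only that effect on df; it does not model df_exog or the drop_from_exog flag.

```python
import pandas as pd

# Wide frame with one target and two features for a single segment.
index = pd.date_range("2021-06-01", periods=3, freq="D", name="timestamp")
columns = pd.MultiIndex.from_product(
    [["segment_0"], ["regressor_1", "regressor_2", "target"]], names=["segment", "feature"]
)
wide = pd.DataFrame(0.0, index=index, columns=columns)

# Drop the listed features in every segment by targeting the "feature" level.
features = ["regressor_1"]
reduced = wide.drop(columns=features, level="feature")
print(reduced.columns.tolist())
```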

drop_target_components()

Drop target components from dataset.

fit_transform(transforms: Sequence[Transform])

Fit and apply given transforms to the data.

Parameters

transforms (Sequence[Transform]) – transforms to fit and apply

get_level_dataset(target_level: str) etna.datasets.tsdataset.TSDataset

Generate new TSDataset on target level.

Parameters

target_level (str) – target level name

Returns

generated dataset

Return type

TSDataset

get_target_components() Optional[pandas.core.frame.DataFrame]

Get DataFrame with target components.

Returns

Dataframe with target components

Return type

Optional[pandas.core.frame.DataFrame]

has_hierarchy() bool

Check whether dataset has hierarchical structure.

Return type

bool

head(n_rows: int = 5) pandas.core.frame.DataFrame

Return the first n_rows rows.

Mimics the pandas method.

This function returns the first n_rows rows for the object based on position. It is useful for quickly testing if your object has the right type of data in it.

For negative values of n_rows, this function returns all rows except the last |n_rows| rows, equivalent to df[:n_rows].

Parameters

n_rows (int) – number of rows to select.

Returns

the first n_rows rows or 5 by default.

Return type

pd.DataFrame

info(segments: Optional[Sequence[str]] = None) None

Overview of the dataset that prints the result.

Method describes dataset in segment-wise fashion.

Information about dataset in general:

  • num_segments: total number of segments

  • num_exogs: number of exogenous features

  • num_regressors: number of exogenous features that are regressors

  • num_known_future: number of regressors that are known since creation

  • freq: frequency of the dataset

Information about individual segments:

  • start_timestamp: start of the segment; missing values at the beginning are ignored

  • end_timestamp: end of the segment; missing values at the end are ignored

  • length: length according to start_timestamp and end_timestamp

  • num_missing: number of missing values between start_timestamp and end_timestamp

Parameters

segments (Optional[Sequence[str]]) – segments to show in overview, if None all segments are shown.

Return type

None

Examples

>>> import pandas as pd
>>> from etna.datasets import TSDataset, generate_const_df
>>> df = generate_const_df(
...    periods=30, start_time="2021-06-01",
...    n_segments=2, scale=1
... )
>>> df_ts_format = TSDataset.to_dataset(df)
>>> regressors_timestamp = pd.date_range(start="2021-06-01", periods=50)
>>> df_regressors_1 = pd.DataFrame(
...     {"timestamp": regressors_timestamp, "regressor_1": 1, "segment": "segment_0"}
... )
>>> df_regressors_2 = pd.DataFrame(
...     {"timestamp": regressors_timestamp, "regressor_1": 2, "segment": "segment_1"}
... )
>>> df_exog = pd.concat([df_regressors_1, df_regressors_2], ignore_index=True)
>>> df_exog_ts_format = TSDataset.to_dataset(df_exog)
>>> ts = TSDataset(df_ts_format, df_exog=df_exog_ts_format, freq="D", known_future="all")
>>> ts.info()
<class 'etna.datasets.TSDataset'>
num_segments: 2
num_exogs: 1
num_regressors: 1
num_known_future: 1
freq: D
          start_timestamp end_timestamp  length  num_missing
segments
segment_0      2021-06-01    2021-06-30      30            0
segment_1      2021-06-01    2021-06-30      30            0

inverse_transform(transforms: Sequence[Transform])

Apply inverse transform method of transforms to the data.

Transforms are applied in reverse order.
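Why the reverse order matters can be seen with two toy scalar transforms. AddConstant and Scale are hypothetical stand-ins, not ETNA Transform classes; the sketch only shows that undoing a chain must walk it backwards.

```python
class AddConstant:
    """Hypothetical toy transform: adds a constant."""

    def __init__(self, value: float) -> None:
        self.value = value

    def transform(self, x: float) -> float:
        return x + self.value

    def inverse_transform(self, x: float) -> float:
        return x - self.value


class Scale:
    """Hypothetical toy transform: multiplies by a factor."""

    def __init__(self, factor: float) -> None:
        self.factor = factor

    def transform(self, x: float) -> float:
        return x * self.factor

    def inverse_transform(self, x: float) -> float:
        return x / self.factor


transforms = [AddConstant(3.0), Scale(2.0)]

y = 10.0
for t in transforms:  # forward pass in the given order: (10 + 3) * 2
    y = t.transform(y)

x = y
for t in reversed(transforms):  # inverse pass in reverse order: 26 / 2 - 3
    x = t.inverse_transform(x)

naive = y
for t in transforms:  # inverting in the forward order is wrong: (26 - 3) / 2
    naive = t.inverse_transform(naive)
```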

Parameters

transforms (Sequence[Transform]) – transforms whose inverse transformation is applied

isnull() pandas.core.frame.DataFrame

Return a boolean dataframe that flags whether each corresponding value in self.df is null.

Returns

is_null dataframe

Return type

pd.DataFrame

level_names() Optional[List[str]]

Return names of the levels in the hierarchical structure.

Return type

Optional[List[str]]

make_future(future_steps: int, transforms: Sequence[Transform] = (), tail_steps: int = 0) TSDataset

Return new TSDataset with features extended into the future.

The resulting dataset contains neither quantiles nor target components.

Parameters
  • future_steps (int) – number of steps to extend dataset into the future.

  • transforms (Sequence[Transform]) – sequence of transforms to be applied.

  • tail_steps (int) – number of steps to keep from the tail of the original dataset.

Returns

dataset with features extended into the future.

Return type

TSDataset
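The timestamps of the extended dataset can be sketched with pd.date_range. future_index is a hypothetical helper, and combining the last tail_steps timestamps with the future ones is an assumption based on the tail_steps description above, not ETNA's code.

```python
import pandas as pd


def future_index(index: pd.DatetimeIndex, future_steps: int, freq: str) -> pd.DatetimeIndex:
    """Hypothetical helper: the future_steps timestamps that follow index[-1]."""
    # Generate future_steps + 1 points starting at the last known timestamp,
    # then drop that first point so only new timestamps remain.
    return pd.date_range(start=index[-1], periods=future_steps + 1, freq=freq)[1:]


history = pd.date_range("2021-06-01", periods=30, freq="D")
future = future_index(history, future_steps=4, freq="D")

# Assumed tail_steps behavior: keep the last tail_steps known timestamps in front.
tail_steps = 2
combined = history[len(history) - tail_steps :].append(future)
```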

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from etna.datasets import TSDataset, generate_const_df
>>> df = generate_const_df(
...    periods=30, start_time="2021-06-01",
...    n_segments=2, scale=1
... )
>>> df_regressors = pd.DataFrame({
...     "timestamp": list(pd.date_range("2021-06-01", periods=40))*2,
...     "regressor_1": np.arange(80), "regressor_2": np.arange(80) + 5,
...     "segment": ["segment_0"]*40 + ["segment_1"]*40
... })
>>> df_ts_format = TSDataset.to_dataset(df)
>>> df_regressors_ts_format = TSDataset.to_dataset(df_regressors)
>>> ts = TSDataset(
...     df_ts_format, "D", df_exog=df_regressors_ts_format, known_future="all"
... )
>>> ts.make_future(4)
segment      segment_0                      segment_1
feature    regressor_1 regressor_2 target regressor_1 regressor_2 target
timestamp
2021-07-01          30          35    NaN          70          75    NaN
2021-07-02          31          36    NaN          71          76    NaN
2021-07-03          32          37    NaN          72          77    NaN
2021-07-04          33          38    NaN          73          78    NaN

plot(n_segments: int = 10, column: str = 'target', segments: Optional[Sequence[str]] = None, start: Optional[str] = None, end: Optional[str] = None, seed: int = 1, figsize: Tuple[int, int] = (10, 5))

Plot of random or chosen segments.

Parameters
  • n_segments (int) – number of random segments to plot

  • column (str) – feature to plot

  • segments (Optional[Sequence[str]]) – segments to plot

  • seed (int) – seed for local random state

  • start (Optional[str]) – start plot from this timestamp

  • end (Optional[str]) – end plot at this timestamp

  • figsize (Tuple[int, int]) – size of the figure per subplot with one segment in inches

tail(n_rows: int = 5) pandas.core.frame.DataFrame

Return the last n_rows rows.

Mimics the pandas method.

This function returns the last n_rows rows from the object based on position. It is useful for quickly verifying data, for example, after sorting or appending rows.

For negative values of n_rows, this function returns all rows except the first |n_rows| rows, equivalent to df[-n_rows:].
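Since head and tail mimic the pandas methods, their negative-n_rows behavior can be checked directly on a plain DataFrame. The sketch uses pandas only, not a TSDataset.

```python
import pandas as pd

df = pd.DataFrame({"target": range(10)})

# Negative n_rows: head drops the last |n_rows| rows, tail drops the first |n_rows| rows.
n_rows = -3
head_part = df.head(n_rows)  # same rows as df[:n_rows], i.e. rows 0..6
tail_part = df.tail(n_rows)  # same rows as df[-n_rows:], i.e. rows 3..9

assert head_part.equals(df.iloc[:n_rows])
assert tail_part.equals(df.iloc[-n_rows:])
```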

Parameters

n_rows (int) – number of rows to select.

Returns

the last n_rows rows or 5 by default.

Return type

pd.DataFrame

static to_dataset(df: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame

Convert pandas dataframe to ETNA Dataset format.

Columns “timestamp” and “segment” are required.

Parameters

df (pandas.core.frame.DataFrame) – DataFrame with columns [“timestamp”, “segment”]. Other columns are considered features.

Return type

pandas.core.frame.DataFrame

Notes

During conversion, the segment column is cast to string type.
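A rough pandas approximation of the long-to-wide conversion, built on DataFrame.pivot. to_wide_sketch is a hypothetical helper and the column handling is an assumption based on the examples below; ETNA's own to_dataset may differ in details such as dtype handling.

```python
import pandas as pd


def to_wide_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical approximation of TSDataset.to_dataset using pandas.pivot."""
    df = df.copy()
    df["segment"] = df["segment"].astype(str)  # segments are cast to str
    features = [c for c in df.columns if c not in ("timestamp", "segment")]
    # Pivot to columns (feature, segment), then swap to (segment, feature).
    wide = df.pivot(index="timestamp", columns="segment", values=features)
    wide.columns = wide.columns.swaplevel(0, 1)
    wide.columns.names = ["segment", "feature"]
    return wide.sort_index(axis=1)


long_df = pd.DataFrame(
    {
        "timestamp": list(pd.date_range("2021-06-01", periods=3, freq="D")) * 2,
        "segment": [0, 0, 0, 1, 1, 1],  # integer ids become "0" and "1"
        "target": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    }
)
wide = to_wide_sketch(long_df)
```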

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from etna.datasets import TSDataset, generate_const_df
>>> df = generate_const_df(
...    periods=30, start_time="2021-06-01",
...    n_segments=2, scale=1
... )
>>> df.head(5)
   timestamp    segment  target
0 2021-06-01  segment_0    1.00
1 2021-06-02  segment_0    1.00
2 2021-06-03  segment_0    1.00
3 2021-06-04  segment_0    1.00
4 2021-06-05  segment_0    1.00
>>> df_ts_format = TSDataset.to_dataset(df)
>>> df_ts_format.head(5)
segment    segment_0 segment_1
feature       target    target
timestamp
2021-06-01      1.00      1.00
2021-06-02      1.00      1.00
2021-06-03      1.00      1.00
2021-06-04      1.00      1.00
2021-06-05      1.00      1.00
>>> df_regressors = pd.DataFrame({
...     "timestamp": pd.date_range("2021-01-01", periods=10),
...     "regressor_1": np.arange(10), "regressor_2": np.arange(10) + 5,
...     "segment": ["segment_0"]*10
... })
>>> TSDataset.to_dataset(df_regressors).head(5)
segment      segment_0
feature    regressor_1 regressor_2
timestamp
2021-01-01           0           5
2021-01-02           1           6
2021-01-03           2           7
2021-01-04           3           8
2021-01-05           4           9

static to_flatten(df: pandas.core.frame.DataFrame, features: Union[Literal['all'], Sequence[str]] = 'all') pandas.core.frame.DataFrame

Return pandas DataFrame with flattened index.

The order of columns is (timestamp, segment, target, features in alphabetical order).

Parameters
  • df (pandas.core.frame.DataFrame) – DataFrame in ETNA format.

  • features (Union[Literal['all'], typing.Sequence[str]]) – List of features to return. If “all”, return all the features in the dataset. Columns with timestamp and segment are always returned.

Returns

dataframe with TSDataset data

Return type

pd.DataFrame
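The wide-to-long direction can be approximated by stacking each segment's sub-frame and ordering the columns as described above. flatten_sketch is a hypothetical helper that assumes a (segment, feature) column MultiIndex and an index named "timestamp"; it ignores the features argument.

```python
import pandas as pd


def flatten_sketch(wide: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical approximation of TSDataset.to_flatten for a wide frame."""
    parts = []
    for segment in wide.columns.get_level_values("segment").unique():
        part = wide[segment].copy()  # columns are now the feature names
        part.insert(0, "segment", segment)
        parts.append(part)
    long_df = pd.concat(parts).reset_index()  # index "timestamp" becomes a column
    # Column order: timestamp, segment, target, remaining features alphabetically.
    others = sorted(c for c in long_df.columns if c not in ("timestamp", "segment", "target"))
    return long_df[["timestamp", "segment", "target"] + others]


index = pd.date_range("2021-06-01", periods=3, freq="D", name="timestamp")
columns = pd.MultiIndex.from_product(
    [["segment_0", "segment_1"], ["regressor_1", "target"]], names=["segment", "feature"]
)
wide = pd.DataFrame([[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]], index=index, columns=columns)
flat = flatten_sketch(wide)
```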

Examples

>>> from etna.datasets import TSDataset, generate_const_df
>>> df = generate_const_df(
...    periods=30, start_time="2021-06-01",
...    n_segments=2, scale=1
... )
>>> df.head(5)
    timestamp    segment  target
0  2021-06-01  segment_0    1.00
1  2021-06-02  segment_0    1.00
2  2021-06-03  segment_0    1.00
3  2021-06-04  segment_0    1.00
4  2021-06-05  segment_0    1.00
>>> df_ts_format = TSDataset.to_dataset(df)
>>> TSDataset.to_flatten(df_ts_format).head(5)
   timestamp    segment  target
0 2021-06-01  segment_0    1.0
1 2021-06-02  segment_0    1.0
2 2021-06-03  segment_0    1.0
3 2021-06-04  segment_0    1.0
4 2021-06-05  segment_0    1.0

static to_hierarchical_dataset(df: pandas.core.frame.DataFrame, level_columns: List[str], keep_level_columns: bool = False, sep: str = '_', return_hierarchy: bool = True) Tuple[pandas.core.frame.DataFrame, Optional[etna.datasets.hierarchical_structure.HierarchicalStructure]]

Convert pandas dataframe from long hierarchical to ETNA Dataset format.

Parameters
  • df (pandas.core.frame.DataFrame) – Dataframe in long hierarchical format with columns [timestamp, target] + [level_columns] + [other_columns]

  • level_columns (List[str]) – Columns of the dataframe that define the levels of the hierarchy, ordered from top to bottom, i.e. [level_name_1, level_name_2, …]. Column names are used as the names of the levels in the hierarchy.

  • keep_level_columns (bool) – If True, keep the level columns in the resulting dataframe. By default, level columns are concatenated into the “segment” column and dropped.

  • sep (str) – Separator used to concatenate the level names

  • return_hierarchy (bool) – If True, return the hierarchical structure

Returns

Dataframe in wide format and optionally hierarchical structure

Raises

ValueError – If level_columns is empty

Return type

Tuple[pandas.core.frame.DataFrame, Optional[etna.datasets.hierarchical_structure.HierarchicalStructure]]
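The segment names seen in the class example (such as l0s0_l1s3) suggest that the level values of each row are joined top to bottom with sep. The pandas sketch below shows only that assumed naming step; it does not build a HierarchicalStructure, and the data values are made up for illustration.

```python
import pandas as pd

# Long hierarchical frame with two level columns, ordered top to bottom.
df = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(["2021-01-01", "2021-01-01", "2021-01-01"]),
        "level_0": ["l0s0", "l0s1", "l0s1"],
        "level_1": ["l1s3", "l1s0", "l1s1"],
        "target": [1.0, 2.0, 3.0],
    }
)
level_columns = ["level_0", "level_1"]
sep = "_"

# Join the level values row by row with sep to form the segment name,
# then drop the level columns (the keep_level_columns=False default).
df["segment"] = df[level_columns].astype(str).agg(sep.join, axis=1)
df = df.drop(columns=level_columns)
print(df["segment"].tolist())
```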

to_pandas(flatten: bool = False, features: Union[Literal['all'], Sequence[str]] = 'all') pandas.core.frame.DataFrame

Return pandas DataFrame.

Parameters
  • flatten (bool) –

    • If False, return pd.DataFrame with multiindex

    • If True, return pd.DataFrame with a flattened index; its order of columns is (timestamp, segment, target, features in alphabetical order).

  • features (Union[Literal['all'], typing.Sequence[str]]) – List of features to return. If “all”, return all the features in the dataset.

Returns

dataframe with TSDataset data

Return type

pd.DataFrame

Examples

>>> from etna.datasets import TSDataset, generate_const_df
>>> df = generate_const_df(
...    periods=30, start_time="2021-06-01",
...    n_segments=2, scale=1
... )
>>> df.head(5)
    timestamp    segment  target
0  2021-06-01  segment_0    1.00
1  2021-06-02  segment_0    1.00
2  2021-06-03  segment_0    1.00
3  2021-06-04  segment_0    1.00
4  2021-06-05  segment_0    1.00
>>> df_ts_format = TSDataset.to_dataset(df)
>>> ts = TSDataset(df_ts_format, "D")
>>> ts.to_pandas(True).head(5)
    timestamp    segment  target
0  2021-06-01  segment_0    1.00
1  2021-06-02  segment_0    1.00
2  2021-06-03  segment_0    1.00
3  2021-06-04  segment_0    1.00
4  2021-06-05  segment_0    1.00
>>> ts.to_pandas(False).head(5)
segment    segment_0 segment_1
feature       target    target
timestamp
2021-06-01      1.00      1.00
2021-06-02      1.00      1.00
2021-06-03      1.00      1.00
2021-06-04      1.00      1.00
2021-06-05      1.00      1.00

to_torch_dataset(make_samples: Callable[[pandas.core.frame.DataFrame], Union[Iterator[dict], Iterable[dict]]], dropna: bool = True) torch.utils.data.dataset.Dataset

Convert the TSDataset to a torch.Dataset.

Parameters
  • make_samples (Callable[[pandas.core.frame.DataFrame], Union[Iterator[dict], Iterable[dict]]]) – function that takes a per-segment DataFrame and returns an iterable of samples

  • dropna (bool) – if True, missing rows are dropped

Returns

torch.Dataset with train or test samples to infer on

Return type

torch.utils.data.dataset.Dataset
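A make_samples callable only has to turn one segment's DataFrame into an iterable of dicts. The generator below is a hypothetical example: it assumes the per-segment frame has a "target" column, and the sample keys and window length are illustrative choices, not names required by ETNA.

```python
from typing import Dict, Iterator

import pandas as pd


def make_samples(segment_df: pd.DataFrame, window: int = 3) -> Iterator[Dict[str, object]]:
    """Hypothetical make_samples: sliding windows over one segment's target."""
    values = segment_df["target"].to_list()
    # Each sample pairs `window` past values with the value that follows them.
    for start in range(len(values) - window):
        yield {
            "encoder_target": values[start : start + window],
            "decoder_target": values[start + window],
        }


segment_df = pd.DataFrame({"target": [1.0, 2.0, 3.0, 4.0, 5.0]})
samples = list(make_samples(segment_df))
```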

train_test_split(train_start: Optional[Union[str, pandas._libs.tslibs.timestamps.Timestamp]] = None, train_end: Optional[Union[str, pandas._libs.tslibs.timestamps.Timestamp]] = None, test_start: Optional[Union[str, pandas._libs.tslibs.timestamps.Timestamp]] = None, test_end: Optional[Union[str, pandas._libs.tslibs.timestamps.Timestamp]] = None, test_size: Optional[int] = None) Tuple[etna.datasets.tsdataset.TSDataset, etna.datasets.tsdataset.TSDataset]

Split the dataset into train and test parts by timestamps or by the size of the test set.

In case of inconsistencies between test_size and (test_start, test_end), test_size is ignored.

Parameters
  • train_start (Optional[Union[str, pandas._libs.tslibs.timestamps.Timestamp]]) – start timestamp of the new train dataset; if None, the first timestamp is used

  • train_end (Optional[Union[str, pandas._libs.tslibs.timestamps.Timestamp]]) – end timestamp of the new train dataset; if None, the timestamp preceding test_start is used

  • test_start (Optional[Union[str, pandas._libs.tslibs.timestamps.Timestamp]]) – start timestamp of the new test dataset; if None, the timestamp following train_end is used

  • test_end (Optional[Union[str, pandas._libs.tslibs.timestamps.Timestamp]]) – end timestamp of the new test dataset; if None, the last timestamp is used

  • test_size (Optional[int]) – number of timestamps to use in test set

Returns

generated datasets

Return type

train, test
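When only test_size is given, the defaults above determine all four boundaries. split_by_test_size is a hypothetical helper showing that arithmetic on a timestamp index; it does not model the inconsistency handling between test_size and explicit timestamps.

```python
from typing import Tuple

import pandas as pd


def split_by_test_size(
    index: pd.DatetimeIndex, test_size: int
) -> Tuple[pd.Timestamp, pd.Timestamp, pd.Timestamp, pd.Timestamp]:
    """Hypothetical helper: (train_start, train_end, test_start, test_end) from test_size."""
    train_start, test_end = index[0], index[-1]  # defaults: first and last timestamp
    test_start = index[-test_size]  # the last test_size timestamps form the test set
    train_end = index[-test_size - 1]  # the timestamp preceding test_start
    return train_start, train_end, test_start, test_end


index = pd.date_range("2021-01-01", periods=100, freq="D")
bounds = split_by_test_size(index, test_size=7)
print(bounds)
```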

Examples

>>> import pandas as pd
>>> from etna.datasets import TSDataset, generate_ar_df
>>> pd.options.display.float_format = '{:,.2f}'.format
>>> df = generate_ar_df(100, start_time="2021-01-01", n_segments=3)
>>> df = TSDataset.to_dataset(df)
>>> ts = TSDataset(df, "D")
>>> train_ts, test_ts = ts.train_test_split(
...     train_start="2021-01-01", train_end="2021-02-01",
...     test_start="2021-02-02", test_end="2021-02-07"
... )
>>> train_ts.df.tail(5)
segment    segment_0 segment_1 segment_2
feature       target    target    target
timestamp
2021-01-28     -2.06      2.03      1.51
2021-01-29     -2.33      0.83      0.81
2021-01-30     -1.80      1.69      0.61
2021-01-31     -2.49      1.51      0.85
2021-02-01     -2.89      0.91      1.06
>>> test_ts.df.head(5)
segment    segment_0 segment_1 segment_2
feature       target    target    target
timestamp
2021-02-02     -3.57     -0.32      1.72
2021-02-03     -4.42      0.23      3.51
2021-02-04     -5.09      1.02      3.39
2021-02-05     -5.10      0.40      2.15
2021-02-06     -6.22      0.92      0.97

transform(transforms: Sequence[Transform])

Apply given transforms to the data.

Parameters

transforms (Sequence[Transform]) – transforms to apply

tsdataset_idx_slice(start_idx: Optional[int] = None, end_idx: Optional[int] = None) etna.datasets.tsdataset.TSDataset

Return new TSDataset with integer-location based indexing.

Parameters
  • start_idx (Optional[int]) – starting index of the slice.

  • end_idx (Optional[int]) – last index of the slice.

Returns

TSDataset based on indexing slice.

Return type

etna.datasets.tsdataset.TSDataset

update_columns_from_pandas(df_update: pandas.core.frame.DataFrame)

Update the existing columns in the dataset with the new values from pandas dataframe.

Before updating columns in df, rows of df_update after the last timestamp of df are dropped. Columns in df_exog are not updated. To update df_exog, create a new instance of TSDataset.

Parameters

df_update (pandas.core.frame.DataFrame) – Dataframe with new values in wide ETNA format.

property columns: pandas.core.indexes.multi.MultiIndex

Return columns of self.df.

Returns

multiindex of dataframe with target and features.

Return type

pd.core.indexes.multi.MultiIndex

property index: pandas.core.indexes.datetimes.DatetimeIndex

Return TSDataset timestamp index.

Returns

timestamp index of TSDataset

Return type

pd.core.indexes.datetimes.DatetimeIndex

property loc: pandas.core.indexing._LocIndexer

Return self.df.loc method.

Returns

dataframe with self.df.loc[…]

Return type

pd.core.indexing._LocIndexer

property regressors: List[str]

Get list of all regressors across all segments in dataset.

Examples

>>> import pandas as pd
>>> from etna.datasets import TSDataset, generate_const_df
>>> df = generate_const_df(
...    periods=30, start_time="2021-06-01",
...    n_segments=2, scale=1
... )
>>> df_ts_format = TSDataset.to_dataset(df)
>>> regressors_timestamp = pd.date_range(start="2021-06-01", periods=50)
>>> df_regressors_1 = pd.DataFrame(
...     {"timestamp": regressors_timestamp, "regressor_1": 1, "segment": "segment_0"}
... )
>>> df_regressors_2 = pd.DataFrame(
...     {"timestamp": regressors_timestamp, "regressor_1": 2, "segment": "segment_1"}
... )
>>> df_exog = pd.concat([df_regressors_1, df_regressors_2], ignore_index=True)
>>> df_exog_ts_format = TSDataset.to_dataset(df_exog)
>>> ts = TSDataset(
...     df_ts_format, df_exog=df_exog_ts_format, freq="D", known_future="all"
... )
>>> ts.regressors
['regressor_1']

property segments: List[str]

Get list of all segments in dataset.

Examples

>>> from etna.datasets import TSDataset, generate_const_df
>>> df = generate_const_df(
...    periods=30, start_time="2021-06-01",
...    n_segments=2, scale=1
... )
>>> df_ts_format = TSDataset.to_dataset(df)
>>> ts = TSDataset(df_ts_format, "D")
>>> ts.segments
['segment_0', 'segment_1']

property target_components_names: Tuple[str, ...]

Get tuple with target components names. Components sum up to target. Return an empty tuple if there are no components.

property target_quantiles_names: Tuple[str, ...]

Get tuple with target quantiles names. Return an empty tuple if there are no quantiles.