utils

Functions

duplicate_data(df, segments[, format])

Duplicate dataframe for all the segments.

get_level_dataframe(df, mapping_matrix, ...)

Perform mapping to dataframe at the target level.

get_target_with_quantiles(columns)

Find "target" column and target quantiles among dataframe columns.

inverse_transform_target_components(...)

Inverse transform target components.

match_target_components(features)

Find target components in a set of features.

match_target_quantiles(features)

Find quantiles in dataframe columns.

set_columns_wide(df_left, df_right[, ...])

Set columns in a left dataframe with values from the right dataframe.

Classes

DataFrameFormat(value)

Enum for different types of result.

_TorchDataset(ts_samples)

In memory dataset for torch dataloader.

class DataFrameFormat(value)

Enum of the supported result formats: wide (TSDataset inner format) and long (flattened format).

duplicate_data(df: pandas.core.frame.DataFrame, segments: Sequence[str], format: str = DataFrameFormat.wide) → pandas.core.frame.DataFrame

Duplicate dataframe for all the segments.

Parameters
  • df (pandas.core.frame.DataFrame) – dataframe to duplicate; it must contain a “timestamp” column

  • segments (Sequence[str]) – list of segments to duplicate the dataframe for

  • format (str) – format of the result: TSDataset inner format (wide) or flattened format (long)

Returns

result – result of duplication for all the segments

Return type

pd.DataFrame

Raises
  • ValueError – if the list of segments is empty

  • ValueError – if an incorrect format is given

  • ValueError – if the dataframe doesn’t contain a “timestamp” column

Examples

>>> import pandas as pd
>>> from etna.datasets import generate_const_df
>>> from etna.datasets import duplicate_data
>>> from etna.datasets import TSDataset
>>> df = generate_const_df(
...    periods=50, start_time="2020-03-10",
...    n_segments=2, scale=1
... )
>>> timestamp = pd.date_range("2020-03-10", periods=100, freq="D")
>>> is_friday_13 = (timestamp.weekday == 4) & (timestamp.day == 13)
>>> df_exog_raw = pd.DataFrame({"timestamp": timestamp, "is_friday_13": is_friday_13})
>>> df_exog = duplicate_data(df_exog_raw, segments=["segment_0", "segment_1"], format="wide")
>>> df_ts_format = TSDataset.to_dataset(df)
>>> ts = TSDataset(df=df_ts_format, df_exog=df_exog, freq="D", known_future="all")
>>> ts.head()
segment       segment_0           segment_1
feature    is_friday_13 target is_friday_13 target
timestamp
2020-03-10        False   1.00        False   1.00
2020-03-11        False   1.00        False   1.00
2020-03-12        False   1.00        False   1.00
2020-03-13         True   1.00         True   1.00
2020-03-14        False   1.00        False   1.00
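
The wide and long formats mentioned above can be illustrated without the library. The following is a minimal pandas-only sketch of the idea, not the library's implementation; the helper name duplicate_long_sketch and the column is_holiday are hypothetical and exist only for this illustration.

```python
import pandas as pd


def duplicate_long_sketch(df: pd.DataFrame, segments: list) -> pd.DataFrame:
    """Copy df once per segment and stack the copies (long format)."""
    if len(segments) == 0:
        raise ValueError("Parameter segments shouldn't be empty")
    if "timestamp" not in df.columns:
        raise ValueError("There should be 'timestamp' column")
    # One copy of the dataframe per segment, tagged with the segment name.
    parts = [df.assign(segment=segment) for segment in segments]
    return pd.concat(parts, ignore_index=True)


df_raw = pd.DataFrame(
    {
        "timestamp": pd.date_range("2020-03-10", periods=3, freq="D"),
        "is_holiday": [False, True, False],
    }
)
df_long = duplicate_long_sketch(df_raw, ["segment_0", "segment_1"])

# Long -> wide: timestamps in the index, (segment, feature) pairs in the columns.
df_wide = df_long.pivot(index="timestamp", columns="segment")
df_wide = df_wide.swaplevel(axis=1).sort_index(axis=1)
df_wide.columns.names = ["segment", "feature"]
```

In the long result every row carries its own segment label; in the wide result each segment gets its own block of feature columns, which is the layout shown in the ts.head() output above.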

get_level_dataframe(df: pandas.core.frame.DataFrame, mapping_matrix: scipy.sparse._csr.csr_matrix, source_level_segments: List[str], target_level_segments: List[str])

Perform mapping to dataframe at the target level.

Parameters
  • df (pandas.core.frame.DataFrame) – dataframe at the source level

  • mapping_matrix (scipy.sparse._csr.csr_matrix) – mapping matrix between levels

  • source_level_segments (List[str]) – list of segments at the source level; their order must match the order used in the mapping matrix

  • target_level_segments (List[str]) – list of segments at the target level

Returns

dataframe at the target level
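
To make the role of the mapping matrix concrete, here is a minimal sketch that aggregates a single feature from three source segments into two target segments. It assumes the matrix has one row per target segment and one column per source segment, and it uses plain segment names as columns instead of the wide TSDataset layout; the segment names and values are made up for the illustration.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

source_level_segments = ["a", "b", "c"]
target_level_segments = ["X", "Y"]

# Assumed layout: rows are target segments, columns are source segments.
# X = a + b, Y = c
mapping_matrix = csr_matrix(np.array([[1, 1, 0], [0, 0, 1]]))

index = pd.date_range("2020-01-01", periods=2, freq="D")
df_source = pd.DataFrame(
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    index=index,
    columns=source_level_segments,
)

# Select the columns in the order the mapping matrix expects,
# then map every timestamp's row to the target level.
source_values = df_source[source_level_segments].to_numpy()
target_values = (mapping_matrix @ source_values.T).T

df_target = pd.DataFrame(target_values, index=index, columns=target_level_segments)
```

Selecting the columns through source_level_segments before multiplying is what ties the column order of the dataframe to the column order of the matrix.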

get_target_with_quantiles(columns: pandas.core.indexes.base.Index) → Set[str]

Find “target” column and target quantiles among dataframe columns.

Parameters

columns (pandas.core.indexes.base.Index) – dataframe columns to search

Return type

Set[str]
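
As an illustration of this lookup, the sketch below collects “target” and quantile columns from a wide column index. It assumes that quantile columns are named target_<level> (for example target_0.025) and that the column index has a level called “feature”; both the pattern and the helper name target_with_quantiles_sketch are assumptions made for this example, not the library's code.

```python
import re

import pandas as pd

# Assumed naming convention for quantile columns, e.g. "target_0.025".
QUANTILE_PATTERN = re.compile(r"target_\d+\.\d+$")


def target_with_quantiles_sketch(columns: pd.Index) -> set:
    """Return "target" (if present) plus all quantile-like feature names."""
    names = set(columns.get_level_values("feature"))
    found = {name for name in names if QUANTILE_PATTERN.match(name)}
    if "target" in names:
        found.add("target")
    return found


columns = pd.MultiIndex.from_product(
    [["segment_0", "segment_1"], ["target", "target_0.025", "target_0.975", "price"]],
    names=["segment", "feature"],
)
result = target_with_quantiles_sketch(columns)
```

Here result contains target, target_0.025 and target_0.975, while price is ignored.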

inverse_transform_target_components(target_components_df: pandas.core.frame.DataFrame, target_df: pandas.core.frame.DataFrame, inverse_transformed_target_df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame

Inverse transform target components.

Parameters
  • target_components_df (pandas.core.frame.DataFrame) – Dataframe with target components

  • target_df (pandas.core.frame.DataFrame) – Dataframe with transformed target

  • inverse_transformed_target_df (pandas.core.frame.DataFrame) – Dataframe with inverse transformed target

Returns

Dataframe with inverse transformed target components

Return type

pandas.core.frame.DataFrame
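
This reference does not state which rule is applied to the components. One plausible reading, shown here purely as an assumption, is proportional rescaling: each component is multiplied by the ratio of the inverse transformed target to the transformed target, so that the components still sum to the target. The sketch uses a single segment with flat column names, and the component names are hypothetical.

```python
import pandas as pd

index = pd.date_range("2020-01-01", periods=2, freq="D")

# Components of the transformed target (single segment, flat columns).
components = pd.DataFrame(
    {"target_component_a": [1.0, 2.0], "target_component_b": [3.0, 2.0]},
    index=index,
)
target = components.sum(axis=1)  # transformed target: 4.0, 4.0
inverse_target = pd.Series([8.0, 2.0], index=index)  # target after inverse transform

# Assumed rule: scale every component by the same per-timestamp coefficient.
scale = inverse_target / target
inverse_components = components.mul(scale, axis=0)
```

Under this rule the rescaled components add up to the inverse transformed target at every timestamp. The rule breaks down when the transformed target is zero, because the coefficient is then undefined.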

match_target_components(features: Set[str]) → Set[str]

Find target components in a set of features.

Parameters

features (Set[str]) – set of feature names to search

Return type

Set[str]
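
A minimal sketch of such a filter follows. It assumes that component columns share the prefix target_component_; the prefix and the helper name match_components_sketch are assumptions for illustration only.

```python
# Assumed naming convention for target component columns.
COMPONENT_PREFIX = "target_component_"


def match_components_sketch(features: set) -> set:
    """Keep only the feature names that look like target components."""
    return {name for name in features if name.startswith(COMPONENT_PREFIX)}


features = {
    "target",
    "target_0.5",
    "target_component_trend",
    "target_component_seasonality",
    "price",
}
components = match_components_sketch(features)
```

The plain target and quantile-like names are left out because they do not carry the assumed prefix.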

match_target_quantiles(features: Set[str]) → Set[str]

Find quantiles in dataframe columns.

Parameters

features (Set[str]) – set of feature names to search

Return type

Set[str]

set_columns_wide(df_left: pandas.core.frame.DataFrame, df_right: pandas.core.frame.DataFrame, timestamps_left: Optional[Sequence[pandas._libs.tslibs.timestamps.Timestamp]] = None, timestamps_right: Optional[Sequence[pandas._libs.tslibs.timestamps.Timestamp]] = None, segments_left: Optional[Sequence[str]] = None, features_right: Optional[Sequence[str]] = None, features_left: Optional[Sequence[str]] = None, segments_right: Optional[Sequence[str]] = None) → pandas.core.frame.DataFrame

Set columns in a left dataframe with values from the right dataframe.

Parameters
  • df_left (pandas.core.frame.DataFrame) – dataframe to set columns in

  • df_right (pandas.core.frame.DataFrame) – dataframe to set columns from

  • timestamps_left (Optional[Sequence[pandas._libs.tslibs.timestamps.Timestamp]]) – timestamps to select in df_left

  • timestamps_right (Optional[Sequence[pandas._libs.tslibs.timestamps.Timestamp]]) – timestamps to select in df_right

  • segments_left (Optional[Sequence[str]]) – segments to select in df_left

  • segments_right (Optional[Sequence[str]]) – segments to select in df_right

  • features_left (Optional[Sequence[str]]) – features to select in df_left

  • features_right (Optional[Sequence[str]]) – features to select in df_right

Returns

a new dataframe with changed columns

Return type

pandas.core.frame.DataFrame
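
The selection-and-assignment idea behind these parameters can be sketched with plain pandas indexing on wide dataframes. The helper set_columns_sketch below is hypothetical and only approximates the described behaviour: it copies the left dataframe, selects a block of timestamps, segments and features on each side, and writes the right block into the left one by position.

```python
import numpy as np
import pandas as pd


def set_columns_sketch(df_left, df_right, ts_left, ts_right,
                       segments_left, segments_right,
                       features_left, features_right):
    """Return a copy of df_left with the selected block taken from df_right."""
    result = df_left.copy()
    # Use raw values so the blocks are matched by position, not by labels.
    block = df_right.loc[ts_right, (segments_right, features_right)].to_numpy()
    result.loc[ts_left, (segments_left, features_left)] = block
    return result


timestamps = pd.date_range("2020-01-01", periods=3, freq="D")
columns = pd.MultiIndex.from_product(
    [["segment_0", "segment_1"], ["target"]], names=["segment", "feature"]
)
df_left = pd.DataFrame(np.zeros((3, 2)), index=timestamps, columns=columns)
df_right = pd.DataFrame(
    np.arange(1.0, 7.0).reshape(3, 2), index=timestamps, columns=columns
)

# Write the first two values of segment_0 from df_right
# into the last two timestamps of segment_1 in df_left.
df_new = set_columns_sketch(
    df_left, df_right,
    ts_left=timestamps[1:], ts_right=timestamps[:2],
    segments_left=["segment_1"], segments_right=["segment_0"],
    features_left=["target"], features_right=["target"],
)
```

Because the blocks are matched by position, the selections on both sides must have the same shape. The input df_left is left unchanged, matching the “new dataframe” wording of the return value.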