utils
Functions
- duplicate_data – Duplicate dataframe for all the segments.
- get_level_dataframe – Perform mapping to dataframe at the target level.
- get_target_with_quantiles – Find "target" column and target quantiles among dataframe columns.
- inverse_transform_target_components – Inverse transform target components.
- match_target_components – Find target components in a set of features.
- match_target_quantiles – Find quantiles in dataframe columns.
- set_columns_wide – Set columns in a left dataframe with values from the right dataframe.
Classes
- Enum for different types of result.
- In-memory dataset for torch dataloader.
- duplicate_data(df: pandas.core.frame.DataFrame, segments: Sequence[str], format: str = DataFrameFormat.wide) → pandas.core.frame.DataFrame
Duplicate dataframe for all the segments.
- Parameters
df (pandas.core.frame.DataFrame) – dataframe to duplicate; it must contain a “timestamp” column
segments (Sequence[str]) – list of segments to duplicate the dataframe for
format (str) – whether to return the result in the TSDataset inner format (wide) or in the flattened format (long)
- Returns
result – result of duplication for all the segments
- Return type
pd.DataFrame
- Raises
ValueError – if the segments list is empty
ValueError – if an incorrect format is given
ValueError – if the dataframe doesn’t contain a “timestamp” column
Examples
>>> import pandas as pd
>>> from etna.datasets import generate_const_df
>>> from etna.datasets import duplicate_data
>>> from etna.datasets import TSDataset
>>> df = generate_const_df(
...     periods=50, start_time="2020-03-10",
...     n_segments=2, scale=1
... )
>>> timestamp = pd.date_range("2020-03-10", periods=100, freq="D")
>>> is_friday_13 = (timestamp.weekday == 4) & (timestamp.day == 13)
>>> df_exog_raw = pd.DataFrame({"timestamp": timestamp, "is_friday_13": is_friday_13})
>>> df_exog = duplicate_data(df_exog_raw, segments=["segment_0", "segment_1"], format="wide")
>>> df_ts_format = TSDataset.to_dataset(df)
>>> ts = TSDataset(df=df_ts_format, df_exog=df_exog, freq="D", known_future="all")
>>> ts.head()
segment       segment_0           segment_1
feature    is_friday_13 target is_friday_13 target
timestamp
2020-03-10        False   1.00        False   1.00
2020-03-11        False   1.00        False   1.00
2020-03-12        False   1.00        False   1.00
2020-03-13         True   1.00         True   1.00
2020-03-14        False   1.00        False   1.00
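For comparison with the wide result above, the long (flattened) layout can be pictured with plain pandas. This is only a minimal sketch, not the library's implementation: the helper name `duplicate_long` is hypothetical, and it assumes the long layout keeps the segment name in a `segment` column next to `timestamp`.

```python
import pandas as pd


def duplicate_long(df: pd.DataFrame, segments: list) -> pd.DataFrame:
    """Hypothetical helper: repeat ``df`` once per segment in a long layout."""
    if len(segments) == 0:
        raise ValueError("segments must not be empty")
    if "timestamp" not in df.columns:
        raise ValueError("df must contain a 'timestamp' column")
    # One copy of the frame per segment, tagged with the segment name
    # (the 'segment' column name is an assumption of this sketch).
    parts = [df.assign(segment=segment) for segment in segments]
    return pd.concat(parts, ignore_index=True)


raw = pd.DataFrame(
    {
        "timestamp": pd.date_range("2020-03-10", periods=3, freq="D"),
        "is_friday_13": [False, False, False],
    }
)
long_df = duplicate_long(raw, segments=["segment_0", "segment_1"])
```

Each input row appears once per segment, so three timestamps and two segments give six rows.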
- get_level_dataframe(df: pandas.core.frame.DataFrame, mapping_matrix: scipy.sparse._csr.csr_matrix, source_level_segments: List[str], target_level_segments: List[str])
Perform mapping to dataframe at the target level.
- Parameters
df (pandas.core.frame.DataFrame) – dataframe at the source level
mapping_matrix (scipy.sparse._csr.csr_matrix) – mapping matrix between levels
source_level_segments (List[str]) – list of segments at the source level; their order must match the mapping matrix
target_level_segments (List[str]) – list of segments at the target level
- Returns
dataframe at the target level
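The idea of mapping between levels can be sketched as a sparse matrix product. The example below is an illustration under stated assumptions, not the function's actual code: it assumes the mapping matrix has one row per target segment and one column per source segment, uses a single-feature frame whose columns are the segment names, and all segment names are made up.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

source_segments = ["a", "b", "c"]      # hypothetical source-level segments
target_segments = ["a_plus_b", "c"]    # hypothetical target-level segments

# Assumed orientation: rows are target segments, columns are source segments.
mapping = csr_matrix(np.array([[1, 1, 0], [0, 0, 1]]))

timestamps = pd.date_range("2020-01-01", periods=3, freq="D")
source = pd.DataFrame(
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]],
    index=timestamps,
    columns=source_segments,
)

# Put the columns in the order the mapping matrix expects, then aggregate.
values = source[source_segments].to_numpy()   # shape: (n_timestamps, n_source)
target_values = (mapping @ values.T).T        # shape: (n_timestamps, n_target)
target = pd.DataFrame(target_values, index=timestamps, columns=target_segments)
```

Selecting `source[source_segments]` before the product is what makes the segment order line up with the columns of the mapping matrix.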
- get_target_with_quantiles(columns: pandas.core.indexes.base.Index) → Set[str]
Find “target” column and target quantiles among dataframe columns.
- Parameters
columns (pandas.core.indexes.base.Index) – dataframe columns to search
- Return type
Set[str]
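A minimal sketch of such a lookup is shown below. It rests on assumptions that this page does not state: that the columns carry a `feature` level and that quantile columns are named like `target_0.025`. The function name `find_target_with_quantiles` is hypothetical.

```python
import re

import pandas as pd

# Assumed naming convention for quantile columns, e.g. "target_0.025".
QUANTILE_PATTERN = re.compile(r"target_\d+\.\d+")


def find_target_with_quantiles(columns: pd.Index) -> set:
    """Hypothetical helper: collect 'target' and quantile feature names."""
    # Assumes the index has a level named "feature".
    names = set(columns.get_level_values("feature"))
    found = {name for name in names if QUANTILE_PATTERN.fullmatch(name)}
    if "target" in names:
        found.add("target")
    return found


columns = pd.MultiIndex.from_product(
    [["segment_0", "segment_1"], ["target", "target_0.025", "target_0.975", "exog"]],
    names=["segment", "feature"],
)
result = find_target_with_quantiles(columns)
```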
- inverse_transform_target_components(target_components_df: pandas.core.frame.DataFrame, target_df: pandas.core.frame.DataFrame, inverse_transformed_target_df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame
Inverse transform target components.
- Parameters
target_components_df (pandas.core.frame.DataFrame) – Dataframe with target components
target_df (pandas.core.frame.DataFrame) – Dataframe with transformed target
inverse_transformed_target_df (pandas.core.frame.DataFrame) – Dataframe with inverse-transformed target
- Returns
Dataframe with inverse transformed target components
- Return type
pandas.core.frame.DataFrame
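This page does not say how the components are transformed. One plausible approach, shown here purely as an assumption, is proportional rescaling: every component is multiplied by the ratio of the inverse-transformed target to the transformed target, so the components still sum to the target. The column names and values below are made up, and the frame is flat rather than in the wide TSDataset layout.

```python
import pandas as pd

# Transformed target and its inverse-transformed counterpart (made-up values).
target = pd.Series([2.0, 4.0])
inverse_transformed_target = pd.Series([20.0, 10.0])

# Components that sum to the transformed target at every timestamp.
components = pd.DataFrame(
    {
        "target_component_a": [0.5, 1.0],
        "target_component_b": [1.5, 3.0],
    }
)

# Assumed rule: scale each row by inverse_transformed_target / target.
scale = inverse_transformed_target / target
inverse_components = components.mul(scale, axis=0)
```

Under this rule the rescaled components add up to the inverse-transformed target at each row. The rule breaks down wherever the transformed target is zero.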
- match_target_components(features: Set[str]) → Set[str]
Find target components in a set of features.
- Parameters
features (Set[str]) – set of feature names to search
- Return type
Set[str]
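As a rough illustration, matching can be done by name prefix. The prefix `target_component_` is an assumption of this sketch and is not confirmed by this page; `find_target_components` is a hypothetical name.

```python
def find_target_components(features: set) -> set:
    """Hypothetical helper: keep feature names that look like target components."""
    prefix = "target_component_"  # assumed naming convention
    return {name for name in features if name.startswith(prefix)}


features = {
    "target",
    "target_component_trend",
    "target_component_seasonal",
    "target_0.5",
    "exog",
}
components = find_target_components(features)
```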
- match_target_quantiles(features: Set[str]) → Set[str]
Find quantiles in dataframe columns.
- Parameters
features (Set[str]) – set of feature names to search
- Return type
Set[str]
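The sketch below shows one way to pick quantile names out of a feature set and read the quantile level back from each name. It assumes quantile features are named `target_<level>` with a decimal level such as `target_0.025`; that convention and the name `find_target_quantiles` are assumptions.

```python
import re

# Assumed naming convention: "target_" followed by a decimal quantile level.
QUANTILE_PATTERN = re.compile(r"target_(\d+\.\d+)")


def find_target_quantiles(features: set) -> set:
    """Hypothetical helper: keep feature names that look like target quantiles."""
    return {name for name in features if QUANTILE_PATTERN.fullmatch(name)}


features = {"target", "target_0.025", "target_0.975", "target_component_a", "exog"}
quantiles = find_target_quantiles(features)

# The quantile level can be parsed back out of each matched name.
levels = sorted(float(QUANTILE_PATTERN.fullmatch(name).group(1)) for name in quantiles)
```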
- set_columns_wide(df_left: pandas.core.frame.DataFrame, df_right: pandas.core.frame.DataFrame, timestamps_left: Optional[Sequence[pandas._libs.tslibs.timestamps.Timestamp]] = None, timestamps_right: Optional[Sequence[pandas._libs.tslibs.timestamps.Timestamp]] = None, segments_left: Optional[Sequence[str]] = None, features_right: Optional[Sequence[str]] = None, features_left: Optional[Sequence[str]] = None, segments_right: Optional[Sequence[str]] = None) → pandas.core.frame.DataFrame
Set columns in a left dataframe with values from the right dataframe.
- Parameters
df_left (pandas.core.frame.DataFrame) – dataframe to set columns in
df_right (pandas.core.frame.DataFrame) – dataframe to set columns from
timestamps_left (Optional[Sequence[pandas._libs.tslibs.timestamps.Timestamp]]) – timestamps to select in df_left
timestamps_right (Optional[Sequence[pandas._libs.tslibs.timestamps.Timestamp]]) – timestamps to select in df_right
segments_left (Optional[Sequence[str]]) – segments to select in df_left
segments_right (Optional[Sequence[str]]) – segments to select in df_right
features_left (Optional[Sequence[str]]) – features to select in df_left
features_right (Optional[Sequence[str]]) – features to select in df_right
- Returns
a new dataframe with changed columns
- Return type
pandas.core.frame.DataFrame
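The effect described above can be approximated with plain pandas indexing on a wide frame. This is a sketch of the idea, not the function's implementation: it assumes wide frames use a (segment, feature) column MultiIndex, and it copies values positionally from the selected block of the right frame into the selected block of a copy of the left frame.

```python
import pandas as pd

timestamps = pd.date_range("2020-01-01", periods=3, freq="D")
columns = pd.MultiIndex.from_product(
    [["segment_0", "segment_1"], ["exog", "target"]],
    names=["segment", "feature"],
)
df_left = pd.DataFrame(0.0, index=timestamps, columns=columns)
df_right = pd.DataFrame(1.0, index=timestamps, columns=columns)

# Selections for this example; the same block is selected on both sides.
rows = timestamps[:2]
segments = ["segment_0"]
features = ["target"]
cols = [(segment, feature) for segment in segments for feature in features]

# Work on a copy so the left frame itself is not modified.
result = df_left.copy()
# to_numpy() drops the labels, so values are assigned by position.
result.loc[rows, cols] = df_right.loc[rows, cols].to_numpy()
```

Only the selected timestamps, segments and features change in `result`; every other cell keeps its value from the left frame.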