datasets_generation

Functions

generate_ar_df(periods, start_time[, ...])

Create DataFrame with AR process data.

generate_const_df(periods, start_time, scale)

Create DataFrame with constant data.

generate_from_patterns_df(periods, ...[, ...])

Create DataFrame from patterns.

generate_hierarchical_df(periods, n_segments)

Create DataFrame with hierarchical structure and AR process data.

generate_periodic_df(periods, start_time[, ...])

Create DataFrame with periodic data.

generate_ar_df(periods: int, start_time: str, ar_coef: Optional[list] = None, sigma: float = 1, n_segments: int = 1, freq: str = '1D', random_seed: int = 1) → pandas.core.frame.DataFrame

Create DataFrame with AR process data.

Parameters
  • periods (int) – number of timestamps

  • start_time (str) – start timestamp

  • ar_coef (Optional[list]) – AR coefficients

  • sigma (float) – scale of AR noise

  • n_segments (int) – number of segments

  • freq (str) – pandas frequency string passed to pandas.date_range() to generate the timestamps

  • random_seed (int) – random seed

Return type

pandas.core.frame.DataFrame
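To make the roles of ar_coef and sigma concrete, here is a minimal sketch of an AR process in plain Python. It is not the library's implementation: the helper name simulate_ar, the zero treatment of lags before the start of the series, and the fallback to a single coefficient of 1 when ar_coef is None are all assumptions made for illustration.

```python
import random
from typing import List, Optional


def simulate_ar(
    periods: int,
    ar_coef: Optional[List[float]] = None,
    sigma: float = 1.0,
    random_seed: int = 1,
) -> List[float]:
    """Simulate x[t] = sum_k ar_coef[k] * x[t-1-k] + eps[t], eps ~ N(0, sigma^2)."""
    # Assumption: without coefficients, fall back to a random walk (one coefficient of 1).
    coefs = [1.0] if ar_coef is None else list(ar_coef)
    rng = random.Random(random_seed)
    series: List[float] = []
    for t in range(periods):
        # Assumption: lags that fall before the start of the series count as zero.
        value = sum(
            c * series[t - 1 - k] for k, c in enumerate(coefs) if t - 1 - k >= 0
        )
        series.append(value + rng.gauss(0.0, sigma))
    return series
```

Calling simulate_ar(100, ar_coef=[0.5, 0.3], sigma=2.0) returns 100 values of one such series; repeating the call with the same random_seed reproduces them.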

generate_const_df(periods: int, start_time: str, scale: float, n_segments: int = 1, freq: str = '1D', add_noise: bool = False, sigma: float = 1, random_seed: int = 1) → pandas.core.frame.DataFrame

Create DataFrame with constant data.

Parameters
  • periods (int) – number of timestamps

  • start_time (str) – start timestamp

  • scale (float) – constant value to fill the series with

  • n_segments (int) – number of segments

  • freq (str) – pandas frequency string passed to pandas.date_range() to generate the timestamps

  • add_noise (bool) – if True, noise is added to the final samples

  • sigma (float) – scale of added noise

  • random_seed (int) – random seed

Return type

pandas.core.frame.DataFrame
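The parameters above can be illustrated with a small dependency-free sketch. It builds (timestamp, segment, value) rows rather than a DataFrame. The row layout, the segment names, the helper name constant_rows, and the fixed daily step (standing in for freq='1D') are hypothetical choices for this example, not details confirmed by the reference.

```python
import random
from datetime import datetime, timedelta
from typing import List, Tuple


def constant_rows(
    periods: int,
    start_time: str,
    scale: float,
    n_segments: int = 1,
    add_noise: bool = False,
    sigma: float = 1.0,
    random_seed: int = 1,
) -> List[Tuple[datetime, str, float]]:
    """Build rows holding the constant `scale`, optionally with Gaussian noise."""
    rng = random.Random(random_seed)
    start = datetime.fromisoformat(start_time)
    rows: List[Tuple[datetime, str, float]] = []
    for seg in range(n_segments):
        for i in range(periods):
            # Noise is drawn only when requested, so the default output is exactly constant.
            noise = rng.gauss(0.0, sigma) if add_noise else 0.0
            # Assumption: a daily step; segment names are made up for the example.
            rows.append((start + timedelta(days=i), f"segment_{seg}", scale + noise))
    return rows
```

With add_noise=False every value equals scale; with add_noise=True the values scatter around scale with standard deviation sigma.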

generate_from_patterns_df(periods: int, start_time: str, patterns: List[List[float]], freq: str = '1D', add_noise=False, sigma: float = 1, random_seed: int = 1) → pandas.core.frame.DataFrame

Create DataFrame from patterns.

Parameters
  • periods (int) – number of timestamps

  • start_time (str) – start timestamp

  • patterns (List[List[float]]) – list of lists with patterns to be repeated

  • freq (str) – pandas frequency string passed to pandas.date_range() to generate the timestamps

  • add_noise (bool) – if True, noise is added to the final samples

  • sigma (float) – scale of added noise

  • random_seed (int) – random seed

Return type

pandas.core.frame.DataFrame
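What "patterns to be repeated" means can be sketched in a few lines of plain Python. The helper name repeat_patterns is hypothetical, and the sketch assumes that each inner list yields one series, cycled and truncated to periods values. Noise is left out.

```python
from itertools import cycle, islice
from typing import List


def repeat_patterns(periods: int, patterns: List[List[float]]) -> List[List[float]]:
    """Cycle each pattern until it covers `periods` values, then truncate."""
    # Assumption: one output series per pattern.
    return [list(islice(cycle(pattern), periods)) for pattern in patterns]


series = repeat_patterns(5, [[1, 2], [0, 0, 1]])
# series == [[1, 2, 1, 2, 1], [0, 0, 1, 0, 0]]
```

A pattern whose length does not divide periods is simply cut off partway through its last repetition.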

generate_hierarchical_df(periods: int, n_segments: List[int], freq: str = 'D', start_time: str = '2000-01-01', ar_coef: Optional[list] = None, sigma: float = 1, random_seed: int = 1) → pandas.core.frame.DataFrame

Create DataFrame with hierarchical structure and AR process data.

The hierarchical structure is generated as follows:
  1. The number of levels in the structure equals the length of the n_segments parameter.

  2. Each level contains the number of segments given by the corresponding entry of n_segments.

  3. Connections from each parent level to its child level are generated randomly.

Parameters
  • periods (int) – number of timestamps

  • n_segments (List[int]) – number of segments on each level.

  • freq (str) – pandas frequency string passed to pandas.date_range() to generate the timestamps

  • start_time (str) – start timestamp

  • ar_coef (Optional[list]) – AR coefficients

  • sigma (float) – scale of AR noise

  • random_seed (int) – random seed

Returns

DataFrame at the bottom level of the hierarchy

Raises
  • ValueError – if n_segments is empty

  • ValueError – if n_segments contains non-positive integers

  • ValueError – if n_segments is not a non-decreasing sequence

Return type

pandas.core.frame.DataFrame
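The three generation rules and the three error conditions can be sketched as follows. This is an illustration, not the library's code: the helper name random_hierarchy and the child-to-parent dictionaries are hypothetical, and giving every parent at least one child is an assumption (the reference only says connections are random).

```python
import random
from typing import Dict, List


def random_hierarchy(n_segments: List[int], random_seed: int = 1) -> List[Dict[int, int]]:
    """Return one child -> parent mapping per pair of adjacent levels."""
    if len(n_segments) == 0:
        raise ValueError("n_segments is empty")
    if any(n <= 0 for n in n_segments):
        raise ValueError("n_segments contains non-positive integers")
    if any(a > b for a, b in zip(n_segments, n_segments[1:])):
        raise ValueError("n_segments is not a non-decreasing sequence")

    rng = random.Random(random_seed)
    links: List[Dict[int, int]] = []
    for n_parents, n_children in zip(n_segments, n_segments[1:]):
        # Assumption: each parent appears once, so no parent is left childless;
        # the remaining children get a parent chosen uniformly at random.
        parents = list(range(n_parents))
        parents += [rng.randrange(n_parents) for _ in range(n_children - n_parents)]
        rng.shuffle(parents)
        links.append({child: parent for child, parent in enumerate(parents)})
    return links
```

For n_segments=[2, 4, 8] the result holds two mappings: one from the 4 middle-level segments to the 2 top-level ones, and one from the 8 bottom-level segments to the 4 middle-level ones. The non-decreasing requirement is what makes it possible for every parent to have a child.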

generate_periodic_df(periods: int, start_time: str, scale: float = 10, period: int = 1, n_segments: int = 1, freq: str = '1D', add_noise: bool = False, sigma: float = 1, random_seed: int = 1) → pandas.core.frame.DataFrame

Create DataFrame with periodic data.

Parameters
  • periods (int) – number of timestamps

  • start_time (str) – start timestamp

  • scale (float) – data is sampled from Uniform[0, scale)

  • period (int) – length of one period, in timestamps, so that x[i + period] = x[i]

  • n_segments (int) – number of segments

  • freq (str) – pandas frequency string passed to pandas.date_range() to generate the timestamps

  • add_noise (bool) – if True, noise is added to the final samples

  • sigma (float) – scale of added noise

  • random_seed (int) – random seed

Return type

pandas.core.frame.DataFrame
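The relation x[i + period] = x[i] and the role of scale can be shown with a short plain-Python sketch. The helper name periodic_series is hypothetical, and drawing one base block of period values and tiling it is an assumed way to satisfy that relation. Noise and timestamps are omitted.

```python
import random
from typing import List


def periodic_series(
    periods: int, scale: float = 10.0, period: int = 1, random_seed: int = 1
) -> List[float]:
    """Draw `period` values from Uniform[0, scale) and tile them over `periods` steps."""
    rng = random.Random(random_seed)
    # random() lies in [0, 1), so each base value lies in [0, scale).
    base = [rng.random() * scale for _ in range(period)]
    return [base[i % period] for i in range(periods)]
```

With the default period=1 the series is a single random constant; larger values of period give a repeating block of that length.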