plots

Functions

_cross_correlation(a, b[, maxlags, normed])

Calculate cross correlation between arrays.

acf_plot(ts[, n_segments, lags, partial, ...])

Autocorrelation and partial autocorrelation plot for multiple timeseries.

cross_corr_plot(ts[, n_segments, maxlags, ...])

Cross-correlation plot between multiple timeseries.

distribution_plot(ts[, n_segments, ...])

Distribution of z-values grouped by segments and time frequency.

plot_clusters(ts, segment2cluster[, ...])

Plot clusters [with centroids].

plot_correlation_matrix(ts[, columns, ...])

Plot pairwise correlation heatmap for selected segments.

plot_holidays(ts, holidays[, segments, ...])

Plot holidays for segments.

plot_imputation(ts, imputer[, segments, ...])

Plot the result of imputation by a given imputer.

plot_periodogram(ts, period[, ...])

Plot the periodogram using scipy.signal.periodogram().

acf_plot(ts: TSDataset, n_segments: int = 10, lags: int = 21, partial: bool = False, columns_num: int = 2, segments: Optional[List[str]] = None, figsize: Tuple[int, int] = (10, 5))[source]

Autocorrelation and partial autocorrelation plot for multiple timeseries.

Notes

Definition of autocorrelation.

Definition of partial autocorrelation.

  • If partial=False, the function works with NaNs at any position in the time series.

  • If partial=True, the function works only with NaNs at the edges of the time series and fails if there are NaNs inside it.

Parameters
  • ts (TSDataset) – TSDataset with timeseries data

  • n_segments (int) – number of random segments to plot

  • lags (int) – number of timeseries shifts for autocorrelation

  • partial (bool) – if True, plot partial autocorrelation; otherwise plot autocorrelation

  • columns_num (int) – number of columns in subplots

  • segments (Optional[List[str]]) – segments to plot

  • figsize (Tuple[int, int]) – size of the figure per subplot with one segment in inches

Raises

ValueError: – If partial=True and there is a NaN in the middle of the time series
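
The distinction between NaNs at the edges and NaNs inside a series can be checked before plotting. The following is a minimal sketch in plain Python; the helper name has_internal_nans is hypothetical and is not part of the library, and the function only illustrates the condition described above rather than how acf_plot detects it.

```python
import math
from typing import Sequence


def has_internal_nans(values: Sequence[float]) -> bool:
    """Return True if a NaN remains after trimming NaNs from both edges.

    Hypothetical helper for illustration; not part of the documented API.
    """
    start, end = 0, len(values)
    # Skip leading NaNs.
    while start < end and math.isnan(values[start]):
        start += 1
    # Skip trailing NaNs.
    while end > start and math.isnan(values[end - 1]):
        end -= 1
    # Anything left between the edges that is NaN counts as an internal gap.
    return any(math.isnan(v) for v in values[start:end])


nan = float("nan")
edges_only = [nan, 1.0, 2.0, 3.0, nan]  # acceptable for partial=True per the notes
with_gap = [1.0, nan, 3.0]              # the case that raises ValueError per the docs

print(has_internal_nans(edges_only), has_internal_nans(with_gap))
```

A series for which this check returns True would need imputation before it is passed with partial=True.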

cross_corr_plot(ts: TSDataset, n_segments: int = 10, maxlags: int = 21, segments: Optional[List[str]] = None, columns_num: int = 2, figsize: Tuple[int, int] = (10, 5))[source]

Cross-correlation plot between multiple timeseries.

Parameters
  • ts (TSDataset) – TSDataset with timeseries data

  • n_segments (int) – number of random segments to plot, ignored if parameter segments is set

  • maxlags (int) – number of timeseries shifts for cross-correlation, should be >=1 and <= len(timeseries)

  • segments (Optional[List[str]]) – segments to plot

  • columns_num (int) – number of columns in subplots

  • figsize (Tuple[int, int]) – size of the figure per subplot with one segment in inches

Raises

ValueError: – if maxlags doesn't satisfy the constraint 1 <= maxlags <= len(timeseries)
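
To make the role of maxlags concrete, here is a minimal sketch of a lagged cross-correlation in plain Python. It is an assumption-laden illustration, not the library's _cross_correlation: the function name, the sign convention for lags, and the normalization (dividing by the square root of the product of the two sums of squares) are choices made for this example only.

```python
import math
from typing import Dict, Sequence


def cross_correlation(
    a: Sequence[float], b: Sequence[float], maxlags: int, normed: bool = True
) -> Dict[int, float]:
    """Illustrative cross-correlation for lags in [-maxlags, maxlags]."""
    n = len(a)
    if len(b) != n:
        raise ValueError("a and b must have the same length")
    # Constraint taken from the parameter description above.
    if not 1 <= maxlags <= n:
        raise ValueError("maxlags should be >= 1 and <= len(timeseries)")
    # Assumed normalization; an all-zero input would make this zero.
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b)) if normed else 1.0
    result: Dict[int, float] = {}
    for lag in range(-maxlags, maxlags + 1):
        total = 0.0
        for t in range(n):
            s = t + lag
            if 0 <= s < n:  # only overlapping points contribute
                total += a[s] * b[t]
        result[lag] = total / norm
    return result


a = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0]
b = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0]  # a delayed by one step
corr = cross_correlation(a, b, maxlags=3)
best_lag = max(corr, key=corr.get)
print(best_lag, corr[best_lag])
```

Because b is a copy of a delayed by one step, the peak falls at lag -1 under this sign convention.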

distribution_plot(ts: TSDataset, n_segments: int = 10, segments: Optional[List[str]] = None, shift: int = 30, window: int = 30, freq: str = '1M', n_rows: int = 10, figsize: Tuple[int, int] = (10, 5))[source]

Distribution of z-values grouped by segments and time frequency.

Mean is calculated by the windows:

\[mean_{i} = \sum_{j=i-\text{shift}}^{i-\text{shift}+\text{window}} \frac{x_{j}}{\text{window}}\]

The same is applied to standard deviation.
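
A small worked example may help with the indexing. The sketch below is not the library's implementation and rests on three assumptions: the window is read as `window` consecutive points starting at index i - shift, the standard deviation is the population standard deviation over the same points, and the z-value is (x_i - mean_i) / std_i. The function name z_values is hypothetical.

```python
import statistics
from typing import List, Optional, Sequence


def z_values(x: Sequence[float], shift: int, window: int) -> List[Optional[float]]:
    """Illustrative z-values from a shifted window (assumed indexing, see lead-in)."""
    out: List[Optional[float]] = []
    for i in range(len(x)):
        start = i - shift
        stop = start + window  # exclusive upper bound: exactly `window` points
        if start < 0 or stop > len(x):
            out.append(None)  # not enough history for this point
            continue
        chunk = x[start:stop]
        mean = sum(chunk) / window
        std = statistics.pstdev(chunk)  # assumption: population std
        out.append((x[i] - mean) / std if std > 0 else None)
    return out


series = [1.0, 3.0, 1.0, 3.0, 10.0]
z = z_values(series, shift=4, window=4)
print(z)
```

For the last point the window covers the first four values, which have mean 2 and standard deviation 1, so the value 10 gets a z-value of 8.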

Parameters
  • ts (TSDataset) – dataset with timeseries data

  • n_segments (int) – number of random segments to plot

  • segments (Optional[List[str]]) – segments to plot

  • shift (int) – number of timeseries shifts for the statistics calculation

  • window (int) – number of points for the statistics calculation

  • freq (str) – frequency by which z-values are grouped

  • n_rows (int) – maximum number of rows to plot

  • figsize (Tuple[int, int]) – size of the figure per subplot with one segment in inches

plot_clusters(ts: TSDataset, segment2cluster: Dict[str, int], centroids_df: Optional[pandas.core.frame.DataFrame] = None, columns_num: int = 2, figsize: Tuple[int, int] = (10, 5))[source]

Plot clusters [with centroids].

Parameters
  • ts (TSDataset) – TSDataset with timeseries

  • segment2cluster (Dict[str, int]) – mapping from segment to cluster in format {segment: cluster}

  • centroids_df (Optional[pandas.core.frame.DataFrame]) – dataframe with centroids

  • columns_num (int) – number of columns in subplots

  • figsize (Tuple[int, int]) – size of the figure per subplot with one segment in inches
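
As an illustration of the {segment: cluster} format expected for segment2cluster, the sketch below builds such a mapping and inverts it to list the segments in each cluster. The segment names and the helper group_segments are made up for this example.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical segment names mapped to integer cluster labels.
segment2cluster: Dict[str, int] = {
    "segment_a": 0,
    "segment_b": 1,
    "segment_c": 0,
}


def group_segments(mapping: Dict[str, int]) -> Dict[int, List[str]]:
    """Invert {segment: cluster} into {cluster: [segments]}."""
    clusters: Dict[int, List[str]] = defaultdict(list)
    for segment, cluster in mapping.items():
        clusters[cluster].append(segment)
    return dict(clusters)


clusters = group_segments(segment2cluster)
print(clusters)
```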

plot_correlation_matrix(ts: TSDataset, columns: Optional[List[str]] = None, segments: Optional[List[str]] = None, method: str = 'pearson', mode: str = 'macro', columns_num: int = 2, figsize: Tuple[int, int] = (10, 10), **heatmap_kwargs)[source]

Plot pairwise correlation heatmap for selected segments.

Parameters
  • ts (TSDataset) – TSDataset with timeseries data

  • columns (Optional[List[str]]) – Columns to use, if None use all columns

  • segments (Optional[List[str]]) – Segments to use

  • method (str) –

    Method of correlation:

    • pearson: standard correlation coefficient

    • kendall: Kendall Tau correlation coefficient

    • spearman: Spearman rank correlation

  • mode ('macro' or 'per-segment') – Aggregation mode

  • columns_num (int) – Number of subplots columns

  • figsize (Tuple[int, int]) – size of the figure in inches

plot_holidays(ts: TSDataset, holidays: Union[str, pandas.core.frame.DataFrame], segments: Optional[List[str]] = None, columns_num: int = 2, figsize: Tuple[int, int] = (10, 5), start: Optional[str] = None, end: Optional[str] = None, as_is: bool = False)[source]

Plot holidays for segments.

A sequence of timestamps with one holiday is drawn as a colored region. An individual holiday is drawn as a colored point.

It is not possible to distinguish points plotted at the same timestamp, but this case is considered rare. This problem isn't relevant for regions because they are partially transparent.

Parameters
  • ts (TSDataset) – TSDataset with timeseries data

  • holidays (Union[str, pandas.core.frame.DataFrame]) –

    one of the following:

    • if str, the code of the country in the holidays library;

    • if DataFrame, the dataframe is expected to be in prophet's holiday format.

  • segments (Optional[List[str]]) – segments to use

  • columns_num (int) – number of columns in subplots

  • figsize (Tuple[int, int]) – size of the figure per subplot with one segment in inches

  • as_is (bool) –

    • Use this option if holidays is a DataFrame with a timestamp index and one column per holiday name.
      In a holiday column, 0 means there is no holiday at that timestamp and 1 means the holiday is present.

  • start (Optional[str]) – start timestamp for plot

  • end (Optional[str]) – end timestamp for plot

Raises

ValueError:

  • If holidays is neither a pd.DataFrame nor a string.

  • If holidays is an empty pd.DataFrame.

  • If as_is=True while holidays is a string.

  • If upper_window is negative.

  • If lower_window is positive.
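
The two DataFrame layouts can be easier to compare side by side. The sketch below uses plain Python dictionaries instead of DataFrames to show how rows in prophet's holiday format (columns holiday, ds, lower_window, upper_window) relate to the 0/1 indicator layout described for as_is=True. It assumes daily timestamps, the function name to_indicator_table is hypothetical, and it does not reproduce how plot_holidays performs the conversion internally.

```python
from datetime import date, timedelta
from typing import Dict, List

# Rows in prophet-style holiday format (illustrative data).
prophet_rows: List[dict] = [
    {"holiday": "new_year", "ds": date(2020, 1, 1), "lower_window": -1, "upper_window": 1},
    {"holiday": "sale", "ds": date(2020, 1, 4), "lower_window": 0, "upper_window": 0},
]


def to_indicator_table(rows: List[dict], start: date, end: date) -> Dict[date, Dict[str, int]]:
    """Build {timestamp: {holiday_name: 0 or 1}} for daily timestamps in [start, end]."""
    names = sorted({row["holiday"] for row in rows})
    n_days = (end - start).days + 1
    table = {start + timedelta(days=k): {name: 0 for name in names} for k in range(n_days)}
    for row in rows:
        # Same sign constraints as listed under Raises.
        if row["upper_window"] < 0:
            raise ValueError("upper_window should be non-negative")
        if row["lower_window"] > 0:
            raise ValueError("lower_window should be non-positive")
        for offset in range(row["lower_window"], row["upper_window"] + 1):
            day = row["ds"] + timedelta(days=offset)
            if day in table:
                table[day][row["holiday"]] = 1
    return table


table = to_indicator_table(prophet_rows, date(2019, 12, 30), date(2020, 1, 5))
for day, flags in table.items():
    print(day, flags)
```

In this example new_year spans three consecutive days, which corresponds to the region case, while sale covers a single day, which corresponds to the point case.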

plot_imputation(ts: TSDataset, imputer: TimeSeriesImputerTransform, segments: Optional[List[str]] = None, columns_num: int = 2, figsize: Tuple[int, int] = (10, 5), start: Optional[str] = None, end: Optional[str] = None)[source]

Plot the result of imputation by a given imputer.

Parameters
  • ts (TSDataset) – TSDataset with timeseries data

  • imputer (TimeSeriesImputerTransform) – transform to make imputation of NaNs

  • segments (Optional[List[str]]) – segments to use

  • columns_num (int) – number of columns in subplots

  • figsize (Tuple[int, int]) – size of the figure per subplot with one segment in inches

  • start (Optional[str]) – start timestamp for plot

  • end (Optional[str]) – end timestamp for plot

plot_periodogram(ts: TSDataset, period: float, amplitude_aggregation_mode: Union[str, Literal['per-segment']] = AggregationMode.mean, periodogram_params: Optional[Dict[str, Any]] = None, segments: Optional[List[str]] = None, xticks: Optional[List[Any]] = None, columns_num: int = 2, figsize: Tuple[int, int] = (10, 5))[source]

Plot the periodogram using scipy.signal.periodogram().

It is useful to determine the optimal order parameter for FourierTransform.

Parameters
  • ts (TSDataset) – TSDataset with timeseries data

  • period (float) – the period of the seasonality to capture in frequency units of time series, it should be >= 2; it is translated to the fs parameter of scipy.signal.periodogram()

  • amplitude_aggregation_mode (Union[str, Literal['per-segment']]) – aggregation strategy for obtained per segment periodograms; all the strategies can be examined at AggregationMode

  • periodogram_params (Optional[Dict[str, Any]]) – additional keyword arguments for periodogram, scipy.signal.periodogram() is used

  • segments (Optional[List[str]]) – segments to use

  • xticks (Optional[List[Any]]) – list of tick locations of the x-axis, useful to highlight specific reference periodicities

  • columns_num (int) – number of columns in subplots if amplitude_aggregation_mode="per-segment"; otherwise the value is ignored

  • figsize (Tuple[int, int]) – size of the figure per subplot with one segment in inches

Raises
  • ValueError: – if period < 2

  • ValueError: – if periodogram can’t be calculated on segment because of the NaNs inside it

Notes

In any mode other than per-segment, all segments are cut to the same length by keeping their last values.
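
To show what passing period as the sampling frequency does to the frequency axis, the sketch below uses a naive discrete Fourier transform in plain Python as a stand-in for scipy.signal.periodogram(). The scaling of the power values is simplified and will not match scipy's output; only the position of the peak is meant to be illustrative. The function name naive_periodogram is hypothetical.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def naive_periodogram(x: Sequence[float], fs: float) -> Tuple[List[float], List[float]]:
    """Frequencies and unscaled power from a naive DFT (illustration only)."""
    if fs < 2:
        raise ValueError("period should be >= 2")
    n = len(x)
    freqs: List[float] = []
    power: List[float] = []
    for k in range(n // 2 + 1):
        coef = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        freqs.append(k * fs / n)  # with fs=period, units are cycles per period
        power.append(abs(coef) ** 2 / n)
    return freqs, power


period = 7  # e.g. weekly seasonality in daily data
series = [math.sin(2 * math.pi * t / period) for t in range(28)]
freqs, power = naive_periodogram(series, fs=period)
peak = freqs[max(range(len(power)), key=power.__getitem__)]
print(peak)
```

With fs set to the period, a pattern that repeats once per period peaks at frequency 1, and its harmonics appear at 2, 3, and so on, which is presumably what makes the plot helpful when choosing the order for FourierTransform.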