# hist_outliers

Functions

• `adjust_estimation`(i, k, sse, sse_one_bin) – Compute sse_one_bin[i][k] using binary search.

• `compute_f`(series, k, p, pp) – Compute F.

• `get_anomalies_hist`(ts[, in_column, bins_number]) – Get point outliers in time series using histogram model.

• `hist`(series, bins_number) – Compute outlier indices according to the hist rule.

• `optimal_sse`(left, right, p, pp) – Compute the approximation error of a single bin spanning the elements from left to right.

• `v_optimal_hist`(series, bins_number, p, pp) – Compute the approximation error of a series for each number of bins in [1, bins_number].

adjust_estimation(i: int, k: int, sse: numpy.ndarray, sse_one_bin: numpy.ndarray) -> float

Compute sse_one_bin[i][k] using binary search.

Parameters
• i (int) – left border of series

• k (int) – number of bins

• sse (numpy.ndarray) – array of approximation errors

• sse_one_bin (numpy.ndarray) – array of approximation errors with one bin

Returns

result – calculated sse_one_bin[i][k]

Return type

float

compute_f(series: numpy.ndarray, k: int, p: numpy.ndarray, pp: numpy.ndarray) -> Tuple[numpy.ndarray, list]

Compute F, where F[a][b][k] is the minimum approximation error on series[a:b+1] with k outliers.

Parameters
• series (numpy.ndarray) – array for which F is computed

• k (int) – number of outliers

• p (numpy.ndarray) – array of prefix sums of elements, `p[i]` is the sum of elements 0 through i

• pp (numpy.ndarray) – array of prefix sums of squared elements, `pp[i]` is the sum of squares of elements 0 through i

Returns

result – array F, outliers_indices

Return type

Tuple[numpy.ndarray, list]
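
The definition of F can be illustrated by brute force on a tiny segment. The sketch below reads "approximation error with k outliers" as the one-bin sum of squared errors left after dropping exactly k points, which is an interpretation of the description above rather than a statement about the module's internals. The helper `brute_force_error` is hypothetical and is far slower than a dynamic-programming approach.

```python
from itertools import combinations

import numpy as np


def brute_force_error(segment: np.ndarray, k: int) -> tuple:
    """Return (minimum one-bin SSE after dropping exactly k points, dropped indices)."""
    best_error, best_dropped = float("inf"), ()
    # Try every set of k positions to drop and keep the one with the lowest error.
    for dropped in combinations(range(len(segment)), k):
        kept = np.array([x for idx, x in enumerate(segment) if idx not in dropped])
        # SSE of the remaining points around their mean; an empty remainder has zero error.
        error = float(np.sum((kept - kept.mean()) ** 2)) if kept.size else 0.0
        if error < best_error:
            best_error, best_dropped = error, dropped
    return best_error, best_dropped


segment = np.array([1.0, 1.1, 0.9, 8.0, 1.0])
error, dropped = brute_force_error(segment, k=1)
```

Here the point at index 3 is the one whose removal lowers the error the most, which matches the intuition of an outlier in this model.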

get_anomalies_hist(ts: TSDataset, in_column: str = 'target', bins_number: int = 10) -> Dict[str, List[pandas._libs.tslibs.timestamps.Timestamp]]

Get point outliers in time series using histogram model.

Outliers are all points whose removal yields a histogram with a lower approximation error, even when the number of bins is less than the number of outliers.

Parameters
• ts (TSDataset) – TSDataset with timeseries data

• in_column (str) – name of the column in which to search for anomalies

• bins_number (int) – number of bins

Returns

dict of outliers in format {segment: [outliers_timestamps]}

Return type

Dict[str, List[pandas._libs.tslibs.timestamps.Timestamp]]

hist(series: numpy.ndarray, bins_number: int) -> numpy.ndarray

Compute outlier indices according to the hist rule.

Parameters
• series (numpy.ndarray) – array in which to search for outliers

• bins_number (int) – number of bins

Returns

indices – outlier indices

Return type

np.ndarray

optimal_sse(left: int, right: int, p: numpy.ndarray, pp: numpy.ndarray) -> float

Compute the approximation error of a single bin spanning the elements from left to right.

Parameters
• left (int) – left border

• right (int) – right border

• p (numpy.ndarray) – array of prefix sums of elements, `p[i]` is the sum of elements 0 through i

• pp (numpy.ndarray) – array of prefix sums of squared elements, `pp[i]` is the sum of squares of elements 0 through i

Returns

result – approximation error

Return type

float
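
With prefix sums available, the one-bin error of any range is a constant-time lookup, because the sum of squared deviations from the mean equals the sum of squares minus the squared sum divided by the element count. The following is a minimal sketch of that identity, assuming inclusive borders and that `p[i]` and `pp[i]` cover elements 0 through i. The helper name `one_bin_sse` is hypothetical and not part of the module.

```python
import numpy as np


def one_bin_sse(left: int, right: int, p: np.ndarray, pp: np.ndarray) -> float:
    """SSE of approximating series[left..right] (inclusive) by its mean."""
    # Range sums from inclusive prefix sums; nothing is subtracted when left == 0.
    total = p[right] - (p[left - 1] if left > 0 else 0.0)
    total_sq = pp[right] - (pp[left - 1] if left > 0 else 0.0)
    count = right - left + 1
    # sum((x - mean)^2) == sum(x^2) - (sum(x))^2 / count
    return float(total_sq - total**2 / count)


series = np.array([1.0, 2.0, 3.0, 10.0])
p = np.cumsum(series)        # p[i]  = series[0] + ... + series[i]
pp = np.cumsum(series**2)    # pp[i] = series[0]^2 + ... + series[i]^2

sse_prefix = one_bin_sse(1, 3, p, pp)
sse_direct = float(np.sum((series[1:4] - series[1:4].mean()) ** 2))
```

Both `sse_prefix` and `sse_direct` describe the same quantity; the prefix-sum form avoids rescanning the range for every query.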

v_optimal_hist(series: numpy.ndarray, bins_number: int, p: numpy.ndarray, pp: numpy.ndarray) -> numpy.ndarray

Compute the approximation error of a series for each number of bins in [1, bins_number].

Parameters
• series (numpy.ndarray) – array whose approximation error is computed

• bins_number (int) – number of bins

• p (numpy.ndarray) – array of prefix sums of elements, `p[i]` is the sum of elements 0 through i

• pp (numpy.ndarray) – array of prefix sums of squared elements, `pp[i]` is the sum of squares of elements 0 through i

Returns

error – approximation error of a series with [1, bins_number] bins

Return type

np.ndarray
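
A V-optimal histogram chooses contiguous bin boundaries that minimize the total squared error, which lends itself to dynamic programming over prefixes of the series. The sketch below is one straightforward way to do that under stated assumptions: the name `v_optimal_sketch` is hypothetical, it builds the prefix sums itself instead of taking `p` and `pp`, it assumes `bins_number` does not exceed the series length, and it returns only the whole-series error for each bin count, which may differ from the shape of the array the module returns.

```python
import numpy as np


def v_optimal_sketch(series: np.ndarray, bins_number: int) -> np.ndarray:
    """Return err where err[k - 1] is the minimum total SSE of the series with k bins."""
    n = len(series)
    p = np.cumsum(series)
    pp = np.cumsum(series**2)

    def one_bin(left: int, right: int) -> float:
        # SSE of series[left..right] (inclusive) around its mean, via prefix sums.
        total = p[right] - (p[left - 1] if left > 0 else 0.0)
        total_sq = pp[right] - (pp[left - 1] if left > 0 else 0.0)
        return total_sq - total**2 / (right - left + 1)

    # best[k][i]: minimum error for series[0..i] split into k + 1 contiguous bins.
    best = np.full((bins_number, n), np.inf)
    for i in range(n):
        best[0][i] = one_bin(0, i)
    for k in range(1, bins_number):
        for i in range(k, n):
            # The last bin covers series[j + 1..i]; the prefix series[0..j] uses k bins.
            best[k][i] = min(best[k - 1][j] + one_bin(j + 1, i) for j in range(k - 1, i))
    return best[:, n - 1]


series = np.array([1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 9.0, 9.0, 9.0])
errors = v_optimal_sketch(series, bins_number=3)
```

The error cannot increase as bins are added, and it reaches zero here once each of the three plateaus gets its own bin. This sketch runs in O(n² · bins_number) time, which is fine for illustration but slow for long series.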