mrmr

mrmr(relevance_table: pandas.core.frame.DataFrame, regressors: pandas.core.frame.DataFrame, top_k: int, fast_redundancy: bool = False, relevance_aggregation_mode: str = AggregationMode.mean, redundancy_aggregation_mode: str = AggregationMode.mean, atol: float = 1e-10) → List[str]

Maximum Relevance and Minimum Redundancy feature selection method.

The relevance of each regressor is calculated as the per-segment aggregation of its relevance values in relevance_table. The redundancy term for a regressor is calculated as the mean absolute correlation between this regressor and the other ones. The correlation between two regressors is an aggregation of the pairwise correlations of the regressors' values in each segment.
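
The selection loop described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the library's implementation: the name mrmr_sketch, the nested-dict inputs, the relevance/redundancy quotient used as the score, and the use of atol as a lower bound on redundancy are all hypothetical choices. Correlations are computed only inside each segment and aggregated with a mean.

```python
from math import sqrt
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equally long sequences (0.0 if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def mrmr_sketch(
    relevance: Dict[str, Dict[str, float]],    # {segment: {feature: relevance}}
    values: Dict[str, Dict[str, List[float]]],  # {segment: {feature: series values}}
    top_k: int,
    atol: float = 1e-10,  # assumption: used here only to keep redundancy above zero
) -> List[str]:
    segments = list(relevance)
    features = list(relevance[segments[0]])

    # Relevance: mean over segments of the per-segment relevance values.
    agg_relevance = {
        f: sum(relevance[s][f] for s in segments) / len(segments) for f in features
    }

    def redundancy(f: str, g: str) -> float:
        # Mean over segments of the absolute within-segment correlation.
        corr = sum(abs(pearson(values[s][f], values[s][g])) for s in segments)
        return max(corr / len(segments), atol)

    def score(f: str, selected: List[str]) -> float:
        if not selected:
            return agg_relevance[f]
        mean_redundancy = sum(redundancy(f, g) for g in selected) / len(selected)
        return agg_relevance[f] / mean_redundancy  # assumed quotient scheme

    selected: List[str] = []
    remaining = list(features)
    # Greedy loop: stops early when there are fewer than top_k features.
    while remaining and len(selected) < top_k:
        best = max(remaining, key=lambda f: score(f, selected))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: "a_copy" duplicates "a", "b" is uncorrelated with "a".
relevance = {
    "segment_0": {"a": 0.8, "a_copy": 0.85, "b": 0.5},
    "segment_1": {"a": 1.0, "a_copy": 0.85, "b": 0.5},
}
values = {
    "segment_0": {"a": [1, 2, 3, 4], "a_copy": [1, 2, 3, 4], "b": [1, -1, -1, 1]},
    "segment_1": {"a": [2, 4, 6, 8], "a_copy": [2, 4, 6, 8], "b": [1, -1, -1, 1]},
}
print(mrmr_sketch(relevance, values, top_k=2))  # ['a', 'b']
```

In this toy data the duplicate regressor is more relevant than "b" but fully redundant with the first pick, so the less relevant, uncorrelated regressor is selected second.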

Parameters
  • relevance_table (pandas.core.frame.DataFrame) – dataframe of shape n_segment x n_exog_series with the relevance table, where relevance_table[i][j] contains the relevance of the j-th df_exog series to the i-th df series

  • regressors (pandas.core.frame.DataFrame) – dataframe with regressors in etna format

  • top_k (int) – number of regressors to select; if there are fewer than top_k regressors, all of them will be selected

  • fast_redundancy (bool) –

    • True: compute redundancy only inside the segments, time complexity \(O(top\_k * n\_segments * n\_features * history\_len)\)

    • False: compute redundancy for all the pairs of segments, time complexity \(O(top\_k * n\_segments^2 * n\_features * history\_len)\)

  • relevance_aggregation_mode (str) – the method for relevance values per-segment aggregation

  • redundancy_aggregation_mode (str) – the method for redundancy values per-segment aggregation

  • atol (float) – the absolute tolerance to compare the float values
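
To make the layout of relevance_table and the effect of the aggregation mode concrete, here is a small sketch in plain Python. The table values, the helper aggregate_relevance, and the mode names other than "mean" are assumptions made for illustration; the signature above only confirms AggregationMode.mean.

```python
from statistics import mean, median

# Hypothetical relevance table: one row per segment, one column per regressor.
relevance_table = {
    "segment_0": {"regressor_1": 0.2, "regressor_2": 0.6},
    "segment_1": {"regressor_1": 0.8, "regressor_2": 0.5},
}

# Assumed mode names; only "mean" appears in the documented signature.
AGGREGATORS = {"mean": mean, "max": max, "min": min, "median": median}


def aggregate_relevance(table, mode="mean"):
    """Collapse the segment axis so that each regressor gets a single relevance value."""
    aggregate = AGGREGATORS[mode]
    regressors = next(iter(table.values())).keys()
    return {r: aggregate([row[r] for row in table.values()]) for r in regressors}


by_mean = aggregate_relevance(relevance_table, "mean")
by_max = aggregate_relevance(relevance_table, "max")

# The ranking of regressors can change with the aggregation mode.
print(max(by_mean, key=by_mean.get))  # regressor_2
print(max(by_max, key=by_max.get))    # regressor_1
```

The choice of mode therefore matters when a regressor is highly relevant for a few segments but weak on average.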

Returns

selected_features – list of top_k selected regressors, sorted by their importance

Return type

List[str]