catboost
Classes
CatBoostMultiSegmentModel – Class for holding CatBoost model for all segments.
CatBoostPerSegmentModel – Class for holding per segment CatBoost model.
- class CatBoostMultiSegmentModel(iterations: Optional[int] = None, depth: Optional[int] = None, learning_rate: Optional[float] = None, logging_level: Optional[str] = 'Silent', l2_leaf_reg: Optional[float] = None, thread_count: Optional[int] = None, **kwargs)
Class for holding CatBoost model for all segments.
Examples
>>> import pandas as pd
>>> from etna.datasets import generate_periodic_df
>>> from etna.datasets import TSDataset
>>> from etna.models import CatBoostMultiSegmentModel
>>> from etna.transforms import LagTransform
>>> classic_df = generate_periodic_df(
...     periods=100,
...     start_time="2020-01-01",
...     n_segments=4,
...     period=7,
...     sigma=3
... )
>>> df = TSDataset.to_dataset(df=classic_df)
>>> ts = TSDataset(df, freq="D")
>>> horizon = 7
>>> transforms = [
...     LagTransform(in_column="target", lags=[horizon, horizon+1, horizon+2])
... ]
>>> ts.fit_transform(transforms=transforms)
>>> future = ts.make_future(horizon, transforms=transforms)
>>> model = CatBoostMultiSegmentModel()
>>> model.fit(ts=ts)
CatBoostMultiSegmentModel(iterations = None, depth = None, learning_rate = None, logging_level = 'Silent', l2_leaf_reg = None, thread_count = None, )
>>> forecast = model.forecast(future)
>>> forecast.inverse_transform(transforms)
>>> pd.options.display.float_format = '{:,.2f}'.format
>>> forecast[:, :, "target"].round()
segment    segment_0 segment_1 segment_2 segment_3
feature       target    target    target    target
timestamp
2020-04-10      9.00      9.00      4.00      6.00
2020-04-11      5.00      2.00      7.00      9.00
2020-04-12     -0.00      4.00      7.00      9.00
2020-04-13      0.00      5.00      9.00      7.00
2020-04-14      1.00      2.00      1.00      6.00
2020-04-15      5.00      7.00      4.00      7.00
2020-04-16      8.00      6.00      2.00      0.00
Create instance of CatBoostMultiSegmentModel with given parameters.
- Parameters
iterations (Optional[int]) – The maximum number of trees that can be built when solving machine learning problems. When using other parameters that limit the number of iterations, the final number of trees may be less than the number specified in this parameter.
depth (Optional[int]) –
Depth of the tree. The range of supported values depends on the processing unit type and the type of the selected loss function:
CPU — Any integer up to 16.
GPU — Any integer up to 8 for pairwise modes (YetiRank, PairLogitPairwise and QueryCrossEntropy) and up to 16 for all other loss functions.
learning_rate (Optional[float]) – The learning rate. Used for reducing the gradient step. If None the value is defined automatically depending on the number of iterations.
logging_level (Optional[str]) –
The logging level to output to stdout. Possible values:
Silent — Do not output any logging information to stdout.
Verbose — Output the following data to stdout:
optimized metric
elapsed time of training
remaining time of training
Info — Output additional information and the number of trees.
Debug — Output debugging information.
l2_leaf_reg (Optional[float]) – Coefficient at the L2 regularization term of the cost function. Any positive value is allowed.
thread_count (Optional[int]) –
The number of threads to use during the training.
For CPU. Optimizes the speed of execution. This parameter doesn’t affect results.
For GPU. The given value is used for reading the data from the hard drive and does not affect the training. During the training one main thread and one thread for each GPU are used.
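The depth limits listed in the parameters above can be written down as a small check. This is only a sketch of the documented rule, not code from etna or CatBoost; the helper names `max_depth` and `validate_depth` and the argument names `processing_unit` and `loss_function` are assumptions made for illustration.

```python
# Pairwise modes named in the description of the depth parameter.
PAIRWISE_MODES = {"YetiRank", "PairLogitPairwise", "QueryCrossEntropy"}


def max_depth(processing_unit: str, loss_function: str) -> int:
    """Return the largest documented depth for the given setup (hypothetical helper)."""
    if processing_unit == "GPU" and loss_function in PAIRWISE_MODES:
        return 8
    return 16


def validate_depth(depth: int, processing_unit: str = "CPU", loss_function: str = "RMSE") -> int:
    """Raise ValueError if depth exceeds the documented limit, otherwise return it."""
    limit = max_depth(processing_unit, loss_function)
    if depth > limit:
        raise ValueError(f"depth={depth} exceeds the limit of {limit}")
    return depth


print(validate_depth(10))            # accepted on CPU
print(max_depth("GPU", "YetiRank"))  # stricter limit for pairwise modes on GPU
```

Running such a check before constructing the model gives an early, readable error instead of a failure during training.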
- fit(ts: etna.datasets.tsdataset.TSDataset) → etna.models.mixins.MultiSegmentModelMixin
Fit model.
- Parameters
ts (etna.datasets.tsdataset.TSDataset) – Dataset with features
- Returns
Model after fit
- Return type
etna.models.mixins.MultiSegmentModelMixin
- forecast(ts: etna.datasets.tsdataset.TSDataset, return_components: bool = False) → etna.datasets.tsdataset.TSDataset
Make predictions.
- Parameters
ts (etna.datasets.tsdataset.TSDataset) – Dataset with features
return_components (bool) – If True additionally returns forecast components
- Returns
Dataset with predictions
- Return type
etna.datasets.tsdataset.TSDataset
- get_model() → Any
Get internal model that is used inside etna class.
Internal model is a model that is used inside etna to forecast segments, e.g. catboost.CatBoostRegressor or sklearn.linear_model.Ridge.
- Returns
Internal model
- Return type
Any
- classmethod load(path: pathlib.Path) → typing_extensions.Self
Load an object.
Warning
This method uses the dill module, which is not secure. It is possible to construct malicious data which will execute arbitrary code during loading. Never load data that could have come from an untrusted source, or that could have been tampered with.
- Parameters
path (pathlib.Path) – Path to load object from.
- Returns
Loaded object.
- Return type
typing_extensions.Self
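The warning above applies to pickle-based formats in general. The sketch below uses the standard-library pickle module rather than dill (dill builds on the same protocol) to show that loading alone can trigger a call chosen by whoever produced the data. The class name `Payload` and the harmless `print` call are illustrative assumptions and have nothing to do with etna's own save format.

```python
import pickle


class Payload:
    """Hypothetical object whose pickled form asks the loader to call a function."""

    def __reduce__(self):
        # On load, pickle calls print("this ran during pickle.loads").
        # Malicious data could name any importable callable here instead.
        return (print, ("this ran during pickle.loads",))


data = pickle.dumps(Payload())

# Loading executes the callable stored in the data and returns its result,
# so `loaded` is the return value of print (None), not a Payload instance.
loaded = pickle.loads(data)
```

This is why a saved model should be loaded only from a location you control.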
- params_to_tune() → Dict[str, etna.distributions.distributions.BaseDistribution]
Get default grid for tuning hyperparameters.
This grid tunes parameters: learning_rate, depth, random_strength, l2_leaf_reg. Other parameters are expected to be set by the user.
- Returns
Grid to tune.
- Return type
Dict[str, etna.distributions.distributions.BaseDistribution]
- predict(ts: etna.datasets.tsdataset.TSDataset, return_components: bool = False) → etna.datasets.tsdataset.TSDataset
Make predictions using true values as autoregression context if possible (teacher forcing).
- Parameters
ts (etna.datasets.tsdataset.TSDataset) – Dataset with features
return_components (bool) – If True additionally returns prediction components
- Returns
Dataset with predictions
- Return type
etna.datasets.tsdataset.TSDataset
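The term "teacher forcing" is easiest to see on a model that consumes its own history. The toy sketch below assumes a hypothetical one-step rule (the next value is half of the previous one); it is not an etna model and only illustrates the two prediction modes. Since this model's context size is documented as zero, the distinction matters mainly for autoregressive models.

```python
from typing import List


def one_step(previous: float) -> float:
    """Hypothetical one-step model: the next value is half of the previous one."""
    return 0.5 * previous


def autoregressive(last_known: float, steps: int) -> List[float]:
    """Feed each prediction back in as the context for the next step."""
    predictions = []
    context = last_known
    for _ in range(steps):
        context = one_step(context)
        predictions.append(context)
    return predictions


def teacher_forced(true_values: List[float]) -> List[float]:
    """Predict each point from the true previous value, never from a prediction."""
    return [one_step(previous) for previous in true_values[:-1]]


history = [8.0, 6.0, 5.0, 3.0]
print(autoregressive(history[0], steps=3))  # each step builds on the previous prediction
print(teacher_forced(history))              # each step starts from the observed value
```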
- save(path: pathlib.Path)
Save the object.
- Parameters
path (pathlib.Path) – Path to save object to.
- set_params(**params: dict) → etna.core.mixins.TMixin
Return new object instance with modified parameters.
The method also allows changing parameters of nested objects within the current object. For example, it is possible to change parameters of a model in a Pipeline.
Nested parameters are expected to be in a <component_1>.<...>.<parameter> form, where components are separated by a dot.
- Parameters
**params – Estimator parameters
self (etna.core.mixins.TMixin) –
params (dict) –
- Returns
New instance with changed parameters
- Return type
etna.core.mixins.TMixin
Examples
>>> from etna.pipeline import Pipeline
>>> from etna.models import NaiveModel
>>> from etna.transforms import AddConstTransform
>>> model = NaiveModel(lag=1)
>>> transforms = [AddConstTransform(in_column="target", value=1)]
>>> pipeline = Pipeline(model, transforms=transforms, horizon=3)
>>> pipeline.set_params(**{"model.lag": 3, "transforms.0.value": 2})
Pipeline(model = NaiveModel(lag = 3, ), transforms = [AddConstTransform(in_column = 'target', value = 2, inplace = True, out_column = None, )], horizon = 3, )
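The dotted form used for nested parameters can be read as a path into a nested structure. The sketch below shows one possible way to expand such keys into nested dictionaries. The helper `expand_dotted` is hypothetical and is not how etna implements set_params; it only illustrates how a key like "transforms.0.value" is split into components.

```python
from typing import Any, Dict


def expand_dotted(params: Dict[str, Any]) -> Dict[str, Any]:
    """Turn {"a.b.c": 1} into {"a": {"b": {"c": 1}}} (illustrative helper)."""
    nested: Dict[str, Any] = {}
    for dotted_key, value in params.items():
        # Everything before the last dot names a component; the last part is the parameter.
        *components, parameter = dotted_key.split(".")
        node = nested
        for name in components:
            node = node.setdefault(name, {})
        node[parameter] = value
    return nested


expanded = expand_dotted({"model.lag": 3, "transforms.0.value": 2})
print(expanded)
```

Note that list positions such as the 0 in "transforms.0.value" stay strings in this sketch; a real implementation has to decide how to map them onto list indices.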
- to_dict()
Collect all information about etna object in dict.
- property context_size: int
Context size of the model. Determines how many history points we ask to be passed to the model.
Zero for this model.
- class CatBoostPerSegmentModel(iterations: Optional[int] = None, depth: Optional[int] = None, learning_rate: Optional[float] = None, logging_level: Optional[str] = 'Silent', l2_leaf_reg: Optional[float] = None, thread_count: Optional[int] = None, **kwargs)
Class for holding per segment CatBoost model.
Examples
>>> import pandas as pd
>>> from etna.datasets import generate_periodic_df
>>> from etna.datasets import TSDataset
>>> from etna.models import CatBoostPerSegmentModel
>>> from etna.transforms import LagTransform
>>> classic_df = generate_periodic_df(
...     periods=100,
...     start_time="2020-01-01",
...     n_segments=4,
...     period=7,
...     sigma=3
... )
>>> df = TSDataset.to_dataset(df=classic_df)
>>> ts = TSDataset(df, freq="D")
>>> horizon = 7
>>> transforms = [
...     LagTransform(in_column="target", lags=[horizon, horizon+1, horizon+2])
... ]
>>> ts.fit_transform(transforms=transforms)
>>> future = ts.make_future(horizon, transforms=transforms)
>>> model = CatBoostPerSegmentModel()
>>> model.fit(ts=ts)
CatBoostPerSegmentModel(iterations = None, depth = None, learning_rate = None, logging_level = 'Silent', l2_leaf_reg = None, thread_count = None, )
>>> forecast = model.forecast(future)
>>> forecast.inverse_transform(transforms)
>>> pd.options.display.float_format = '{:,.2f}'.format
>>> forecast[:, :, "target"]
segment    segment_0 segment_1 segment_2 segment_3
feature       target    target    target    target
timestamp
2020-04-10      9.00      9.00      4.00      6.00
2020-04-11      5.00      2.00      7.00      9.00
2020-04-12      0.00      4.00      7.00      9.00
2020-04-13      0.00      5.00      9.00      7.00
2020-04-14      1.00      2.00      1.00      6.00
2020-04-15      5.00      7.00      4.00      7.00
2020-04-16      8.00      6.00      2.00      0.00
Create instance of CatBoostPerSegmentModel with given parameters.
- Parameters
iterations (Optional[int]) – The maximum number of trees that can be built when solving machine learning problems. When using other parameters that limit the number of iterations, the final number of trees may be less than the number specified in this parameter.
depth (Optional[int]) –
Depth of the tree. The range of supported values depends on the processing unit type and the type of the selected loss function:
CPU — Any integer up to 16.
GPU — Any integer up to 8 for pairwise modes (YetiRank, PairLogitPairwise and QueryCrossEntropy) and up to 16 for all other loss functions.
learning_rate (Optional[float]) – The learning rate. Used for reducing the gradient step. If None the value is defined automatically depending on the number of iterations.
logging_level (Optional[str]) –
The logging level to output to stdout. Possible values:
Silent — Do not output any logging information to stdout.
Verbose — Output the following data to stdout:
optimized metric
elapsed time of training
remaining time of training
Info — Output additional information and the number of trees.
Debug — Output debugging information.
l2_leaf_reg (Optional[float]) – Coefficient at the L2 regularization term of the cost function. Any positive value is allowed.
thread_count (Optional[int]) –
The number of threads to use during the training.
For CPU. Optimizes the speed of execution. This parameter doesn’t affect results.
For GPU. The given value is used for reading the data from the hard drive and does not affect the training. During the training one main thread and one thread for each GPU are used.
- fit(ts: etna.datasets.tsdataset.TSDataset) → etna.models.mixins.PerSegmentModelMixin
Fit model.
- Parameters
ts (etna.datasets.tsdataset.TSDataset) – Dataset with features
- Returns
Model after fit
- Return type
etna.models.mixins.PerSegmentModelMixin
- forecast(ts: etna.datasets.tsdataset.TSDataset, return_components: bool = False) → etna.datasets.tsdataset.TSDataset
Make predictions.
- Parameters
ts (etna.datasets.tsdataset.TSDataset) – Dataset with features
return_components (bool) – If True additionally returns forecast components
- Returns
Dataset with predictions
- Return type
etna.datasets.tsdataset.TSDataset
- get_model() → Dict[str, Any]
Get internal models that are used inside etna class.
Internal model is a model that is used inside etna to forecast segments, e.g. catboost.CatBoostRegressor or sklearn.linear_model.Ridge.
- Returns
Dictionary where key is segment and value is internal model
- Return type
Dict[str, Any]
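The return types of get_model reflect the difference between the two classes: the multi-segment class holds a single internal model, while the per-segment class holds one model per segment, keyed by segment name. The toy sketch below illustrates that layout with a hypothetical `MeanModel` stand-in and made-up numbers; it does not use etna or CatBoost.

```python
from statistics import mean
from typing import Dict, List

# Hypothetical toy data: two segments with different levels.
series: Dict[str, List[float]] = {
    "segment_0": [1.0, 2.0, 3.0],
    "segment_1": [10.0, 20.0, 30.0],
}


class MeanModel:
    """Stand-in for an internal regressor: remembers the mean it was fitted on."""

    def fit(self, values: List[float]) -> "MeanModel":
        self.level = mean(values)
        return self


# Per-segment layout: one fitted model per segment, keyed by segment name.
per_segment: Dict[str, MeanModel] = {
    segment: MeanModel().fit(values) for segment, values in series.items()
}

# Multi-segment layout: a single model fitted on the pooled data of all segments.
pooled = [value for values in series.values() for value in values]
multi_segment = MeanModel().fit(pooled)

print({segment: model.level for segment, model in per_segment.items()})
print(multi_segment.level)
```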
- classmethod load(path: pathlib.Path) → typing_extensions.Self
Load an object.
Warning
This method uses the dill module, which is not secure. It is possible to construct malicious data which will execute arbitrary code during loading. Never load data that could have come from an untrusted source, or that could have been tampered with.
- Parameters
path (pathlib.Path) – Path to load object from.
- Returns
Loaded object.
- Return type
typing_extensions.Self
- params_to_tune() → Dict[str, etna.distributions.distributions.BaseDistribution]
Get default grid for tuning hyperparameters.
This grid tunes parameters: learning_rate, depth, random_strength, l2_leaf_reg. Other parameters are expected to be set by the user.
- Returns
Grid to tune.
- Return type
Dict[str, etna.distributions.distributions.BaseDistribution]
- predict(ts: etna.datasets.tsdataset.TSDataset, return_components: bool = False) → etna.datasets.tsdataset.TSDataset
Make predictions using true values as autoregression context if possible (teacher forcing).
- Parameters
ts (etna.datasets.tsdataset.TSDataset) – Dataset with features
return_components (bool) – If True additionally returns prediction components
- Returns
Dataset with predictions
- Return type
etna.datasets.tsdataset.TSDataset
- save(path: pathlib.Path)
Save the object.
- Parameters
path (pathlib.Path) – Path to save object to.
- set_params(**params: dict) → etna.core.mixins.TMixin
Return new object instance with modified parameters.
The method also allows changing parameters of nested objects within the current object. For example, it is possible to change parameters of a model in a Pipeline.
Nested parameters are expected to be in a <component_1>.<...>.<parameter> form, where components are separated by a dot.
- Parameters
**params – Estimator parameters
self (etna.core.mixins.TMixin) –
params (dict) –
- Returns
New instance with changed parameters
- Return type
etna.core.mixins.TMixin
Examples
>>> from etna.pipeline import Pipeline
>>> from etna.models import NaiveModel
>>> from etna.transforms import AddConstTransform
>>> model = NaiveModel(lag=1)
>>> transforms = [AddConstTransform(in_column="target", value=1)]
>>> pipeline = Pipeline(model, transforms=transforms, horizon=3)
>>> pipeline.set_params(**{"model.lag": 3, "transforms.0.value": 2})
Pipeline(model = NaiveModel(lag = 3, ), transforms = [AddConstTransform(in_column = 'target', value = 2, inplace = True, out_column = None, )], horizon = 3, )
- to_dict()
Collect all information about etna object in dict.
- property context_size: int
Context size of the model. Determines how many history points we ask to be passed to the model.
Zero for this model.