1. XGBClassifier

Scikit-Learn API
If one tool can do the job, don't pile on more tools. For the python_api section, the focus is on translating and studying the Scikit-Learn API, mainly the classification, regression, and ranking parts; this article covers the classification part.

class xgboost.XGBClassifier(*, objective='binary:logistic', use_label_encoder=False, **kwargs)

Bases: xgboost.sklearn.XGBModel, sklearn.base.ClassifierMixin

Implementation of the scikit-learn API for XGBoost classification.
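To ground the description, here is a minimal sketch of the class in use; the breast-cancer dataset and the train/test split are illustrative choices, not part of the original docs:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
import xgboost as xgb

# Minimal sketch: fit the sklearn-style classifier on a toy binary task.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = xgb.XGBClassifier(objective="binary:logistic", use_label_encoder=False)
clf.fit(X_train, y_train)
print(clf.predict_proba(X_test)[:3])  # per-class probabilities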

2. Parameters

2.1 Parameters, part 1

n_estimators (int) – Number of boosting rounds.

max_depth (Optional[int]) – Maximum tree depth for base learners.

max_leaves – Maximum number of leaves; 0 indicates no limit.

max_bin – If using histogram-based algorithm, maximum number of bins per feature

grow_policy – Tree growing policy. 0: favor splitting at nodes closest to the root, i.e. grow depth-wise. 1: favor splitting at nodes with highest loss change.

learning_rate (Optional[float]) – Boosting learning rate (xgb’s “eta”)

verbosity (Optional[int]) – The degree of verbosity. Valid values are 0 (silent) - 3 (debug).

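A hedged sketch of passing these part-1 parameters through the constructor; the values are illustrative, not tuned recommendations:

import xgboost as xgb

clf = xgb.XGBClassifier(
    n_estimators=200,         # number of boosting rounds
    max_depth=4,              # maximum depth of each base learner
    max_leaves=0,             # 0 means no limit on leaves
    max_bin=256,              # bins per feature for histogram-based methods
    grow_policy="depthwise",  # or "lossguide" to split where loss changes most
    learning_rate=0.1,        # xgb's "eta"
    verbosity=1,              # 0 (silent) to 3 (debug)
)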

2.2 Parameters, part 2

objective (Union[str, Callable[[numpy.ndarray, numpy.ndarray], Tuple[numpy.ndarray, numpy.ndarray]], NoneType]) – Specify the learning task and the corresponding learning objective or a custom objective function to be used (see note below).

booster (Optional[str]) – Specify which booster to use: gbtree, gblinear or dart.

tree_method (Optional[str]) – Specify which tree method to use. Default to auto. If this parameter is set to default, XGBoost will choose the most conservative option available. It's recommended to study this option in the parameters document under tree method.

n_jobs (Optional[int]) – Number of parallel threads used to run xgboost. When used with other Scikit-Learn algorithms like grid search, you may choose which algorithm to parallelize and balance the threads. Creating thread contention will significantly slow down both algorithms.

gamma (Optional[float]) – (min_split_loss) Minimum loss reduction required to make a further partition on a leaf node of the tree.

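Likewise, a sketch for the part-2 parameters; choosing tree_method="hist" explicitly is an assumption made for illustration:

import xgboost as xgb

clf = xgb.XGBClassifier(
    objective="binary:logistic",  # learning task and objective
    booster="gbtree",             # gbtree, gblinear, or dart
    tree_method="hist",           # explicit choice instead of "auto"
    n_jobs=4,                     # parallel threads; balance with outer tools
    gamma=0.1,                    # min loss reduction to split a leaf further
)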

2.3 Parameters, part 3

min_child_weight (Optional[float]) – Minimum sum of instance weight(hessian) needed in a child.

max_delta_step (Optional[float]) – Maximum delta step we allow each tree’s weight estimation to be.

subsample (Optional[float]) – Subsample ratio of the training instance.

sampling_method – Sampling method. Used only by the gpu_hist tree method.
uniform: select random training instances uniformly.
gradient_based: select random training instances with higher probability when the gradient and hessian are larger. (cf. CatBoost)

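A sketch for the part-3 parameters; note that sampling_method only takes effect with the gpu_hist tree method, so it is left commented out here:

import xgboost as xgb

clf = xgb.XGBClassifier(
    min_child_weight=1.0,   # minimum sum of instance hessian in a child
    max_delta_step=0.0,     # cap on each tree's weight estimate (0 = off)
    subsample=0.8,          # row subsample ratio per boosting round
    # sampling_method is honored only with tree_method="gpu_hist":
    # tree_method="gpu_hist", sampling_method="gradient_based",
)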

2.4 Parameters, part 4

colsample_bytree (Optional[float]) – Subsample ratio of columns when constructing each tree.

colsample_bylevel (Optional[float]) – Subsample ratio of columns for each level.

colsample_bynode (Optional[float]) – Subsample ratio of columns for each split.

reg_alpha (Optional[float]) – L1 regularization term on weights (xgb’s alpha).

reg_lambda (Optional[float]) – L2 regularization term on weights (xgb’s lambda).

scale_pos_weight (Optional[float]) – Balancing of positive and negative weights.

base_score (Optional[float]) – The initial prediction score of all instances, global bias.


random_state (Optional[Union[numpy.random.RandomState, int]]) – Random number seed.

Note: Using gblinear booster with shotgun updater is nondeterministic as it uses the Hogwild algorithm.

missing (float, default np.nan) – Value in the data which needs to be present as a missing value.

num_parallel_tree (Optional[int]) – Used for boosting random forest.

monotone_constraints (Optional[Union[Dict[str, int], str]]) – Constraint of variable monotonicity. See tutorial for more information.

interaction_constraints (Optional[Union[str, List[Tuple[str]]]]) – Constraints for interaction representing permitted interactions. The constraints must be specified in the form of a nested list, e.g. [[0, 1], [2, 3, 4]], where each inner list is a group of indices of features that are allowed to interact with each other. See tutorial for more information.

importance_type (Optional[str]) – The feature importance type for the feature_importances_ property:

For tree model, it’s either “gain”, “weight”, “cover”, “total_gain” or “total_cover”.

For linear model, only “weight” is defined and it’s the normalized coefficients without bias.

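A combined sketch for the part-4 parameters; the interaction_constraints groups and the scale_pos_weight value are made-up illustrations:

import xgboost as xgb

clf = xgb.XGBClassifier(
    colsample_bytree=0.8,    # column subsample per tree
    colsample_bylevel=1.0,   # column subsample per level
    colsample_bynode=1.0,    # column subsample per split
    reg_alpha=0.0,           # L1 term on weights (alpha)
    reg_lambda=1.0,          # L2 term on weights (lambda)
    scale_pos_weight=3.0,    # e.g. roughly #negatives / #positives
    random_state=42,         # random number seed
    num_parallel_tree=1,     # >1 boosts a random forest
    interaction_constraints=[[0, 1], [2, 3, 4]],  # illustrative feature groups
    importance_type="gain",  # drives feature_importances_
)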

2.5 Parameters, part 5

gpu_id (Optional[int]) – Device ordinal.

validate_parameters (Optional[bool]) – Give warnings for unknown parameter.

predictor (Optional[str]) – Force XGBoost to use specific predictor, available choices are [cpu_predictor, gpu_predictor].

enable_categorical (bool) – New in version 1.5.0.

Note: This parameter is experimental.

Experimental support for categorical data. When enabled, cudf/pandas.DataFrame should be used to specify categorical data type. Also, JSON/UBJSON serialization format is required.

max_cat_to_onehot (Optional[int]) – New in version 1.6.0.

Note: This parameter is experimental.

A threshold for deciding whether XGBoost should use one-hot encoding based split for categorical data. When number of categories is lesser than the threshold then one-hot encoding is chosen, otherwise the categories will be partitioned into children nodes. Only relevant for regression and binary classification. See Categorical Data for details.

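A hedged sketch of the experimental categorical support, assuming XGBoost >= 1.6 (where tree_method="hist" handles categoricals); the DataFrame here is hypothetical:

import pandas as pd
import xgboost as xgb

# Hypothetical data; the "city" column uses the pandas categorical
# dtype, which enable_categorical requires.
df = pd.DataFrame({
    "city": pd.Categorical(["a", "b", "a", "c", "b", "c"]),
    "income": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})
y = [0, 1, 0, 1, 0, 1]

clf = xgb.XGBClassifier(
    tree_method="hist",
    enable_categorical=True,  # experimental, per the note above
    max_cat_to_onehot=4,      # one-hot split when categories < threshold
)
clf.fit(df, y)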

eval_metric (Optional[Union[str, List[str], Callable]]) – New in version 1.6.0.

Metric used for monitoring the training result and early stopping. It can be a string or list of strings as names of predefined metric in XGBoost (See doc/parameter.rst), one of the metrics in sklearn.metrics, or any other user defined metric that looks like sklearn.metrics.

If custom objective is also provided, then custom metric should implement the corresponding reverse link function.

Unlike the scoring parameter commonly used in scikit-learn, when a callable object is provided, it’s assumed to be a cost function and by default XGBoost will minimize the result during early stopping.

For advanced usage on Early stopping like directly choosing to maximize instead of minimize, see xgboost.callback.EarlyStopping.

See Custom Objective and Evaluation Metric for more.

Note: This parameter replaces eval_metric in fit() method. The old one receives un-transformed prediction regardless of whether custom objective is being used.


from sklearn.datasets import load_diabetes
from sklearn.metrics import mean_absolute_error
import xgboost as xgb

# Docs example: an sklearn.metrics-style callable can be passed as
# eval_metric and is reported once per boosting round.
X, y = load_diabetes(return_X_y=True)
reg = xgb.XGBRegressor(
    tree_method="hist",
    eval_metric=mean_absolute_error,
)
reg.fit(X, y, eval_set=[(X, y)])

[0]	validation_0-rmse:125.60229	validation_0-mean_absolute_error:107.86327
[1]	validation_0-rmse:94.53059	validation_0-mean_absolute_error:78.02611
[2]	validation_0-rmse:72.70615	validation_0-mean_absolute_error:57.60754
[3]	validation_0-rmse:57.41636	validation_0-mean_absolute_error:44.09879
[4]	validation_0-rmse:46.72110	validation_0-mean_absolute_error:35.53532
[5]	validation_0-rmse:39.40697	validation_0-mean_absolute_error:30.12643
[6]	validation_0-rmse:33.75610	validation_0-mean_absolute_error:25.94312
[7]	validation_0-rmse:29.48226	validation_0-mean_absolute_error:22.60080
[8]	validation_0-rmse:26.30025	validation_0-mean_absolute_error:20.16968
[9]	validation_0-rmse:23.10979	validation_0-mean_absolute_error:17.79017
[10]	validation_0-rmse:21.35165	validation_0-mean_absolute_error:16.31033
[11]	validation_0-rmse:19.53509	validation_0-mean_absolute_error:14.95299
[12]	validation_0-rmse:18.42825	validation_0-mean_absolute_error:14.12309
[13]	validation_0-rmse:17.15199	validation_0-mean_absolute_error:13.13535
[14]	validation_0-rmse:15.62577	validation_0-mean_absolute_error:11.86242
[15]	validation_0-rmse:15.10604	validation_0-mean_absolute_error:11.44101
[16]	validation_0-rmse:14.26707	validation_0-mean_absolute_error:10.78244
[17]	validation_0-rmse:13.26172	validation_0-mean_absolute_error:9.95314
[18]	validation_0-rmse:12.79556	validation_0-mean_absolute_error:9.53170
[19]	validation_0-rmse:11.83669	validation_0-mean_absolute_error:8.79309
[20]	validation_0-rmse:11.68172	validation_0-mean_absolute_error:8.60161
[21]	validation_0-rmse:11.11805	validation_0-mean_absolute_error:8.19914
[22]	validation_0-rmse:10.69094	validation_0-mean_absolute_error:7.83645
[23]	validation_0-rmse:10.13389	validation_0-mean_absolute_error:7.40532
[24]	validation_0-rmse:9.45950	validation_0-mean_absolute_error:6.84925
[25]	validation_0-rmse:8.81668	validation_0-mean_absolute_error:6.35195
[26]	validation_0-rmse:8.38823	validation_0-mean_absolute_error:6.05989
[27]	validation_0-rmse:7.87457	validation_0-mean_absolute_error:5.65038
[28]	validation_0-rmse:7.58585	validation_0-mean_absolute_error:5.41957
[29]	validation_0-rmse:7.12735	validation_0-mean_absolute_error:5.09982
[30]	validation_0-rmse:6.77737	validation_0-mean_absolute_error:4.83167
[31]	validation_0-rmse:6.52171	validation_0-mean_absolute_error:4.64407
[32]	validation_0-rmse:6.15232	validation_0-mean_absolute_error:4.32023
[33]	validation_0-rmse:5.78856	validation_0-mean_absolute_error:4.08167
[34]	validation_0-rmse:5.73804	validation_0-mean_absolute_error:4.03156
[35]	validation_0-rmse:5.52209	validation_0-mean_absolute_error:3.87176
[36]	validation_0-rmse:5.28810	validation_0-mean_absolute_error:3.71434
[37]	validation_0-rmse:4.86828	validation_0-mean_absolute_error:3.41697
[38]	validation_0-rmse:4.56639	validation_0-mean_absolute_error:3.22589
[39]	validation_0-rmse:4.33981	validation_0-mean_absolute_error:3.05319
[40]	validation_0-rmse:4.00879	validation_0-mean_absolute_error:2.83498
[41]	validation_0-rmse:3.79877	validation_0-mean_absolute_error:2.66336
[42]	validation_0-rmse:3.69970	validation_0-mean_absolute_error:2.57036
[43]	validation_0-rmse:3.62628	validation_0-mean_absolute_error:2.49631
[44]	validation_0-rmse:3.35754	validation_0-mean_absolute_error:2.31807
[45]	validation_0-rmse:3.27957	validation_0-mean_absolute_error:2.26423
[46]	validation_0-rmse:3.21589	validation_0-mean_absolute_error:2.20716
[47]	validation_0-rmse:3.17717	validation_0-mean_absolute_error:2.16696
[48]	validation_0-rmse:3.00932	validation_0-mean_absolute_error:2.05033
[49]	validation_0-rmse:2.88938	validation_0-mean_absolute_error:1.97796
[50]	validation_0-rmse:2.80247	validation_0-mean_absolute_error:1.90016
[51]	validation_0-rmse:2.66098	validation_0-mean_absolute_error:1.81172
[52]	validation_0-rmse:2.39878	validation_0-mean_absolute_error:1.65243
[53]	validation_0-rmse:2.22235	validation_0-mean_absolute_error:1.53393
[54]	validation_0-rmse:2.11483	validation_0-mean_absolute_error:1.45341
[55]	validation_0-rmse:2.01234	validation_0-mean_absolute_error:1.37645
[56]	validation_0-rmse:1.88209	validation_0-mean_absolute_error:1.28805
[57]	validation_0-rmse:1.85998	validation_0-mean_absolute_error:1.26770
[58]	validation_0-rmse:1.68246	validation_0-mean_absolute_error:1.16017
[59]	validation_0-rmse:1.58247	validation_0-mean_absolute_error:1.08573
[60]	validation_0-rmse:1.49737	validation_0-mean_absolute_error:1.02531
[61]	validation_0-rmse:1.42128	validation_0-mean_absolute_error:0.98005
[62]	validation_0-rmse:1.35798	validation_0-mean_absolute_error:0.93843
[63]	validation_0-rmse:1.32351	validation_0-mean_absolute_error:0.90900
[64]	validation_0-rmse:1.24190	validation_0-mean_absolute_error:0.85659
[65]	validation_0-rmse:1.23084	validation_0-mean_absolute_error:0.84815
[66]	validation_0-rmse:1.17905	validation_0-mean_absolute_error:0.81719
[67]	validation_0-rmse:1.12405	validation_0-mean_absolute_error:0.77899
[68]	validation_0-rmse:1.11357	validation_0-mean_absolute_error:0.76819
[69]	validation_0-rmse:1.08249	validation_0-mean_absolute_error:0.75051
[70]	validation_0-rmse:0.99841	validation_0-mean_absolute_error:0.69092
[71]	validation_0-rmse:0.94803	validation_0-mean_absolute_error:0.65164
[72]	validation_0-rmse:0.92989	validation_0-mean_absolute_error:0.63650
[73]	validation_0-rmse:0.90956	validation_0-mean_absolute_error:0.61828
[74]	validation_0-rmse:0.89545	validation_0-mean_absolute_error:0.60560
[75]	validation_0-rmse:0.87946	validation_0-mean_absolute_error:0.59108
[76]	validation_0-rmse:0.81718	validation_0-mean_absolute_error:0.55461
[77]	validation_0-rmse:0.75697	validation_0-mean_absolute_error:0.50971
[78]	validation_0-rmse:0.70997	validation_0-mean_absolute_error:0.48306
[79]	validation_0-rmse:0.69258	validation_0-mean_absolute_error:0.46948
[80]	validation_0-rmse:0.65017	validation_0-mean_absolute_error:0.44389
[81]	validation_0-rmse:0.60811	validation_0-mean_absolute_error:0.41713
[82]	validation_0-rmse:0.59621	validation_0-mean_absolute_error:0.40903
[83]	validation_0-rmse:0.56955	validation_0-mean_absolute_error:0.38835
[84]	validation_0-rmse:0.53451	validation_0-mean_absolute_error:0.36762
[85]	validation_0-rmse:0.51161	validation_0-mean_absolute_error:0.35368
[86]	validation_0-rmse:0.48165	validation_0-mean_absolute_error:0.32819
[87]	validation_0-rmse:0.45540	validation_0-mean_absolute_error:0.31016
[88]	validation_0-rmse:0.44418	validation_0-mean_absolute_error:0.30254
[89]	validation_0-rmse:0.43365	validation_0-mean_absolute_error:0.29612
[90]	validation_0-rmse:0.40963	validation_0-mean_absolute_error:0.28010
[91]	validation_0-rmse:0.38638	validation_0-mean_absolute_error:0.26281
[92]	validation_0-rmse:0.37208	validation_0-mean_absolute_error:0.25216
[93]	validation_0-rmse:0.36221	validation_0-mean_absolute_error:0.24535
[94]	validation_0-rmse:0.33923	validation_0-mean_absolute_error:0.23225
[95]	validation_0-rmse:0.33415	validation_0-mean_absolute_error:0.22625
[96]	validation_0-rmse:0.31783	validation_0-mean_absolute_error:0.21434
[97]	validation_0-rmse:0.30309	validation_0-mean_absolute_error:0.20348
[98]	validation_0-rmse:0.29466	validation_0-mean_absolute_error:0.19853
[99]	validation_0-rmse:0.28283	validation_0-mean_absolute_error:0.18969
XGBRegressor(base_score=0.5, booster='gbtree', callbacks=None,
         colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1,
         early_stopping_rounds=None, enable_categorical=False,
         eval_metric=<function mean_absolute_error at 0x0000020058394790>,
         gamma=0, gpu_id=-1, grow_policy='depthwise', importance_type=None,
         interaction_constraints='', learning_rate=0.300000012, max_bin=256,
         max_cat_to_onehot=4, max_delta_step=0, max_depth=6, max_leaves=0,
         min_child_weight=1, missing=nan, monotone_constraints='()',
         n_estimators=100, n_jobs=0, num_parallel_tree=1, predictor='auto',
         random_state=0, reg_alpha=0, reg_lambda=1, ...)

early_stopping_rounds (Optional[int]) – New in version 1.6.0.

Activates early stopping. Validation metric needs to improve at least once in every early_stopping_rounds round(s) to continue training. Requires at least one item in eval_set in fit().

The method returns the model from the last iteration (not the best one). If there’s more than one item in eval_set, the last entry will be used for early stopping. If there’s more than one metric in eval_metric, the last metric will be used for early stopping.

If early stopping occurs, the model will have three additional fields: best_score, best_iteration and best_ntree_limit.

Note: This parameter replaces early_stopping_rounds in fit() method.

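A sketch of constructor-based early stopping, assuming XGBoost >= 1.6; the dataset and split are illustrative:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
import xgboost as xgb

X, y = load_breast_cancer(return_X_y=True)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

clf = xgb.XGBClassifier(
    n_estimators=1000,
    early_stopping_rounds=10,  # stop if no improvement for 10 rounds
    eval_metric="logloss",
)
clf.fit(X_train, y_train, eval_set=[(X_valid, y_valid)], verbose=False)
print(clf.best_iteration, clf.best_score)  # set when training stops early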

2.6 Parameters, part 6

callbacks (Optional[List[TrainingCallback]]) – List of callback functions that are applied at end of each iteration. It is possible to use predefined callbacks by using Callback API.

Note: States in callback are not preserved during training, which means callback objects can not be reused for multiple training sessions without reinitialization or deepcopy.


# Sketch from the note above: parameters_grid, custom_rates, and Xy
# (a DMatrix) are assumed to be defined elsewhere.
for params in parameters_grid:
    # be sure to (re)initialize the callbacks before each run
    callbacks = [xgb.callback.LearningRateScheduler(custom_rates)]
    xgb.train(params, Xy, callbacks=callbacks)

kwargs (dict, optional) – Keyword arguments for XGBoost Booster object. Full documentation of parameters can be found here. Attempting to set a parameter via the constructor args and **kwargs dict simultaneously will result in a TypeError.

Note: **kwargs is unsupported by scikit-learn. We do not guarantee that parameters passed via this argument will interact properly with scikit-learn.

Note: Custom objective function
A custom objective function can be provided for the objective parameter.
In this case, it should have the signature objective(y_true, y_pred) -> grad, hess:

y_true: array_like of shape [n_samples]
The target values

y_pred: array_like of shape [n_samples]
The predicted values

grad: array_like of shape [n_samples]
The value of the gradient for each sample point.

hess: array_like of shape [n_samples]
The value of the second derivative for each sample point

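A sketch of a custom objective matching the documented signature; squared error is chosen purely for illustration, and the random training data is made up:

import numpy as np
import xgboost as xgb

# Custom objective with the documented
# objective(y_true, y_pred) -> grad, hess signature.
def squared_error(y_true: np.ndarray, y_pred: np.ndarray):
    grad = y_pred - y_true        # gradient for each sample point
    hess = np.ones_like(y_pred)   # second derivative for each sample point
    return grad, hess

X = np.random.rand(20, 3)  # illustrative data
y = np.random.rand(20)
reg = xgb.XGBRegressor(objective=squared_error, n_estimators=5)
reg.fit(X, y)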

3. apply()

apply(X, ntree_limit=0, iteration_range=None)

Return the predicted leaf of every tree for each sample. If the model is trained with early stopping, then best_iteration is used automatically.

Parameters
X (array_like, shape=[n_samples, n_features]) – Input features matrix.

iteration_range (Optional[Tuple[int, int]]) – See predict().

ntree_limit (int) – Deprecated, use iteration_range instead.

Returns
X_leaves – For each datapoint x in X and for each tree, return the index of the leaf x ends up in. Leaves are numbered within [0; 2**(self.max_depth+1)), possibly with gaps in the numbering.

Return type
array_like, shape=[n_samples, n_trees]

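A minimal sketch of apply(); the dataset and model settings are illustrative:

from sklearn.datasets import load_breast_cancer
import xgboost as xgb

# Leaf indices per sample and per tree.
X, y = load_breast_cancer(return_X_y=True)
clf = xgb.XGBClassifier(n_estimators=10, max_depth=3).fit(X, y)

leaves = clf.apply(X)          # shape = [n_samples, n_trees]
print(leaves.shape, leaves[0])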

4. Properties

property best_iteration: int
The best iteration obtained by early stopping. This attribute is 0-based, for instance if the best iteration is the first round, then best_iteration is 0.

property best_score: float
The best score obtained by early stopping.

property coef_: numpy.ndarray
Coefficients property

Note: Coefficients are defined only for linear learners

Coefficients are only defined when the linear model is chosen as base learner (booster=gblinear). It is not defined for other base learner types, such as tree learners (booster=gbtree).

Returns
coef_

Return type
array of shape [n_features] or [n_classes, n_features]

property feature_importances_: numpy.ndarray
Feature importances property, return depends on importance_type parameter.

Returns
feature_importances_ (array of shape [n_features], except for the multi-class linear model, which returns an array with shape (n_features, n_classes))

property feature_names_in_: numpy.ndarray
Names of features seen during fit(). Defined only when X has feature names that are all strings.

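A sketch contrasting the properties across boosters; the dataset and settings are illustrative:

from sklearn.datasets import load_breast_cancer
import xgboost as xgb

X, y = load_breast_cancer(return_X_y=True)

# Tree booster: feature_importances_ is defined, coef_ is not.
tree_clf = xgb.XGBClassifier(n_estimators=10).fit(X, y)
print(tree_clf.feature_importances_.shape)  # (n_features,)

# Linear booster: coef_ is defined (normalized coefficients without bias).
lin_clf = xgb.XGBClassifier(booster="gblinear", n_estimators=10).fit(X, y)
print(lin_clf.coef_.shape)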

5. evals_result()

Return the evaluation results.

If eval_set is passed to the fit() function, you can call evals_result() to get evaluation results for all passed eval_sets. When eval_metric is also passed to the fit() function, the evals_result will contain the eval_metrics passed to the fit() function.

The returned evaluation result is a dictionary:

{'validation_0': {'logloss': ['0.604835', '0.531479']},
'validation_1': {'logloss': ['0.41965', '0.17686']}}

Return type
evals_result

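A minimal sketch of retrieving evals_result() after fitting with an eval_set; all names and data here are illustrative:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
import xgboost as xgb

X, y = load_breast_cancer(return_X_y=True)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

clf = xgb.XGBClassifier(n_estimators=5, eval_metric="logloss")
clf.fit(
    X_train, y_train,
    eval_set=[(X_train, y_train), (X_valid, y_valid)],
    verbose=False,
)
results = clf.evals_result()
print(results["validation_0"]["logloss"])  # one value per boosting round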
