Scikit-Learn API

1.fit()

fit(X, y, *, sample_weight=None, base_margin=None, eval_set=None, eval_metric=None, early_stopping_rounds=None, verbose=True, xgb_model=None, sample_weight_eval_set=None, base_margin_eval_set=None, feature_weights=None, callbacks=None)

Fit gradient boosting model.
Note that calling fit() multiple times will cause the model object to be re-fit from scratch. To resume training from a previous checkpoint, explicitly pass xgb_model argument.


Parameters

X (Any) – Feature matrix

y (Any) – Labels

sample_weight (Optional[Any]) – instance weights

base_margin (Optional[Any]) – global bias for each instance.

eval_set (Optional[Sequence[Tuple[Any, Any]]]) – A list of (X, y) tuple pairs to use as validation sets, for which metrics will be computed. Validation metrics will help us track the performance of the model.

eval_metric (str, list of str, or callable, optional) – Deprecated since version 1.6.0: Use eval_metric in __init__() or set_params() instead.

early_stopping_rounds (int) – Deprecated since version 1.6.0: Use early_stopping_rounds in __init__() or set_params() instead.

verbose (Optional[bool]) – If verbose and an evaluation set is used, writes the evaluation metric measured on the validation set to stderr.

xgb_model (Optional[Union[xgboost.core.Booster, xgboost.sklearn.XGBModel, str]]) – file name of a stored XGBoost model or a Booster instance to be loaded before training (allows training continuation).

sample_weight_eval_set (Optional[Sequence[Any]]) – A list of the form [L_1, L_2, …, L_n], where each L_i is an array-like object storing instance weights for the i-th validation set.

base_margin_eval_set (Optional[Sequence[Any]]) – A list of the form [M_1, M_2, …, M_n], where each M_i is an array-like object storing the base margin for the i-th validation set.

feature_weights (Optional[Any]) – Weight for each feature, defines the probability of each feature being selected when colsample is being used. All values must be greater than 0, otherwise a ValueError is thrown.

callbacks (Optional[Sequence[xgboost.callback.TrainingCallback]]) – Deprecated since version 1.6.0: Use callbacks in __init__() or set_params() instead.

Return type
xgboost.sklearn.XGBModel


get_booster()
Get the underlying xgboost Booster of this model.
This will raise an exception when fit() was not called.

Returns
booster

Return type
an xgboost Booster of the underlying model

get_num_boosting_rounds()
Gets the number of xgboost boosting rounds.

Return type
int

get_params(deep=True)
Get parameters.

Parameters
deep (bool) –

Return type
Dict[str, Any]

get_xgb_params()
Get xgboost specific parameters.

Return type
Dict[str, Any]


property intercept_: numpy.ndarray
Intercept (bias) property

Note
Intercept is defined only for linear learners

Intercept (bias) is only defined when the linear model is chosen as base learner (booster=gblinear). It is not defined for other base learner types, such as tree learners (booster=gbtree).

Returns
intercept_

Return type
array of shape (1,) or [n_classes]


2.load_model(fname)

load_model(fname)
Load the model from a file or bytearray. The path to the file can be local or a URI.

The model is loaded from XGBoost format which is universal among the various XGBoost interfaces. Auxiliary attributes of the Python Booster object (such as feature_names) will not be loaded when using binary format. To save those attributes, use JSON/UBJ instead. See Model IO for more info.

model.load_model("model.json")
or
model.load_model("model.ubj")

Parameters
fname (Union[str, bytearray, os.PathLike]) – Input file name or memory buffer (see also save_raw)

Return type
None


property n_features_in_: int
Number of features seen during fit().

3.predict()

predict(X, output_margin=False, ntree_limit=None, validate_features=True, base_margin=None, iteration_range=None)
Predict with X. If the model is trained with early stopping, then best_iteration is used automatically. For tree models, when data is on GPU, like cupy array or cuDF dataframe and predictor is not specified, the prediction is run on GPU automatically, otherwise it will run on CPU.

Note: This function is only thread safe for gbtree and dart.

Parameters
X (Any) – Data to predict with.

output_margin (bool) – Whether to output the raw untransformed margin value.

ntree_limit (Optional[int]) – Deprecated, use iteration_range instead.

validate_features (bool) – When this is True, validate that the Booster’s and data’s feature_names are identical. Otherwise, it is assumed that the feature_names are the same.

base_margin (Optional[Any]) – Margin added to prediction.

iteration_range (Optional[Tuple[int, int]]) – Specifies which layer of trees are used in prediction. For example, if a random forest is trained with 100 rounds, specifying iteration_range=(10, 20) means only the forests built during the [10, 20) (half-open set) rounds are used in this prediction. New in version 1.4.0.

Return type
prediction


4.predict_proba()

predict_proba(X, ntree_limit=None, validate_features=True, base_margin=None, iteration_range=None)
Predict the probability of each X example being of a given class.

Note
This function is only thread safe for gbtree and dart.

Parameters
X (array_like) – Feature matrix.

ntree_limit (int) – Deprecated, use iteration_range instead.

validate_features (bool) – When this is True, validate that the Booster’s and data’s feature_names are identical. Otherwise, it is assumed that the feature_names are the same.

base_margin (array_like) – Margin added to prediction.

iteration_range (Optional[Tuple[int, int]]) – Specifies which layer of trees are used in prediction. For example, if a random forest is trained with 100 rounds, specifying iteration_range=(10, 20) means only the forests built during the [10, 20) (half-open set) rounds are used in this prediction.

Returns
a numpy array of shape (n_samples, n_classes) with the probability of each data example being of a given class.

Return type
prediction


5.save_model(fname)

save_model(fname)

Save the model to a file.

The model is saved in an XGBoost internal format which is universal among the various XGBoost interfaces. Auxiliary attributes of the Python Booster object (such as feature_names) will not be saved when using binary format. To save those attributes, use JSON/UBJ instead. See Model IO for more info.

model.save_model("model.json")
or
model.save_model("model.ubj")

Parameters
fname (string or os.PathLike) – Output file name

Return type
None


6.score(X, y, sample_weight=None)

score(X, y, sample_weight=None)

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters
X (array-like of shape (n_samples, n_features)) – Test samples.

y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True labels for X.

sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.

Returns
score – Mean accuracy of self.predict(X) w.r.t. y.

Return type
float

