【机器学习笔记】使用lightgbm画并保存Feature Importance

资料参考：1. Evaluate Feature Importance using Tree-based Model2. lgbm.fi.plot: LightGBM Feature Importance Plotting3. lightgbm官方文档前言基于树的模型可以用来评估特征的重要性。在本博客中，我将使用LightGBM中的GBDT模型来评估特性重要性的步骤。 ...

chestnut--

40294人浏览 · 2018-06-04 14:38:30

chestnut-- · 2018-06-04 14:38:30 发布

资料参考：
1. Evaluate Feature Importance using Tree-based Model
2. lgbm.fi.plot: LightGBM Feature Importance Plotting
3. lightgbm官方文档

前言

基于树的模型可以用来评估特征的重要性。在本博客中，我将使用LightGBM中的GBDT模型来评估特性重要性的步骤。 LightGBM是由微软发布的高精度和高速度梯度增强框架（一些测试表明LightGBM可以产生与XGBoost一样的准确预测，但速度可以提高25倍）。

首先，我们导入所需的软件包：用于数据预处理的pandas，用于GBDT模型的LightGBM以及用于构建功能重要性条形图的matplotlib。

import pandas as pd
import matplotlib.pylab as plt
import lightgbm as lgb

然后，我们需要加载和预处理训练数据。在这个例子中，我们使用预测性维护数据集。

# read data
train = pd.read_csv('E:\Data\predicitivemaintance_processed.csv')

# drop the columns that are not used for the model
train = train.drop(['Date', 'FailureDate'],axis=1)

# set the target column
target = 'FailNextWeek'

# One-hot encoding
feature_categorical = ['Model']
train = pd.get_dummies(train, columns=feature_categorical)

接下来，我们用训练数据训练GBDT模型：

lgb_params = {
    'boosting_type': 'gbdt',
    'objective': 'binary',
    'num_leaves': 30,
    'num_round': 360,
    'max_depth':8,
    'learning_rate': 0.01,
    'feature_fraction': 0.5,
    'bagging_fraction': 0.8,
    'bagging_freq': 12
}
lgb_train = lgb.Dataset(train.drop(target, 1), train[target])
model = lgb.train(lgb_params, lgb_train)

模型训练完成后，我们可以调用训练模型的plot_importance函数来获取特征的重要性。

plt.figure(figsize=(12,6))
lgb.plot_importance(model, max_num_features=30)
plt.title("Featurertances")
plt.show()

这里写图片描述

保存feature importance

booster = model.booster_
importance = booster.feature_importance(importance_type='split')
feature_name = booster.feature_name()
# for (feature_name,importance) in zip(feature_name,importance):
#     print (feature_name,importance) 
feature_importance = pd.DataFrame({'feature_name':feature_name,'importance':importance} )
feature_importance.to_csv('feature_importance.csv',index=False)

完美~

CSDN学习社区

CSDN联合极客时间，共同打造面向开发者的精品内容学习社区，助力成长！

更多推荐

嵌入式作业（七）：基于Ardunio的STM32串口通信

嵌入式作业（七）0作业要求1Ardunio 完成STM32的串口通信（1）安装Ardunio IDE（2）stm32串口通信2关于 stduino IDE0作业要求安装 Ardunio IDE 和相关软件支持库，在Ardunio 完成STM32板子的串口通信程序：（1）持续向串口输出“Hello world！”；（2）当接收到“stop!”时，停止输出。网上有一个国人版的MCU集成开发平台， st

CSDN学习社区

JDBC详解

JDBC文章目录JDBC什么是JDBC?JDBC驱动程序:Java使用JDBC访问数据库的步骤:设置classpath:Oracle连接字符串的书写格式:简单的例子:常用数据库的驱动程序及JDBC URL:Oracle数据库:SQL Server数据库MySQL数据库Access数据库PreparedStatement接口:JNDI-数据源（Data Source）与连接池（Connection

CSDN学习社区

“模式识别与机器学习”学习笔记no2.再谈感知机

接**上篇：上篇主要进行了PLA，Pocket算法的理论过程分析和在给定数据集上利用pocket算法对数据集进行分类学习，得到错分数量最少的分类面。上篇中pocket算法的过程已经进行了编程和测试，框架已经建立了起来，这一篇主要上篇中没有提到或涉及不深的几个问题。1.数据集的构造。上篇是直接使用了题目给的向量，这次来根据正态分布来产生数据集。np.random.normal函数可以根据均值和方差生