A GridSearchCV hyperparameter-tuning exercise on the iris dataset: K-nearest neighbors, SVM, decision tree, random forest, and AdaBoost
This is a practice run of hyperparameter tuning with GridSearchCV that optimizes five models in one script: K-nearest neighbors, SVM, decision tree, random forest, and AdaBoost.
The implementation follows this article: https://blog.csdn.net/weixin_41171061/article/details/83859856
The goal is to compare the tuned models and pick the configuration that classifies best.
The data are the iris dataset.
#!/usr/bin/python
# -*- coding:utf-8 -*-
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.metrics import accuracy_score
import numpy as np
import pandas as pd

seed = 1231
np.random.seed(seed)

names = ['KNearestNeighbors', 'SVC', 'Decision Tree',
         'Random Forest', 'AdaBoostClassifier']
classifiers = [KNeighborsClassifier(),
               SVC(kernel='rbf'),
               DecisionTreeClassifier(),
               RandomForestClassifier(),
               # note: base_estimator was renamed to estimator in scikit-learn 1.2
               AdaBoostClassifier(base_estimator=DecisionTreeClassifier())]

parameter_knn = {'n_neighbors': [3, 5]}
parameter_svc = {'C': np.logspace(-2, 2, 10), 'gamma': np.logspace(-2, 2, 10)}
parameter_dtc = {'max_features': ['auto', 'sqrt', 'log2', None],
                 'max_depth': range(3, 100, 2)}
parameter_rfc = {'n_estimators': range(5, 200, 40),
                 'max_features': ['auto', 'sqrt', 'log2', None],
                 'max_depth': range(3, 100, 20)}
parameter_adb = {'base_estimator__criterion': ['gini', 'entropy'],
                 'base_estimator__splitter': ['best', 'random'],
                 'n_estimators': [3, 5, 7]}
parameters = [parameter_knn, parameter_svc, parameter_dtc,
              parameter_rfc, parameter_adb]


def grid_search_model(clf, param, name, x_train, y_train, x_test, y_test):
    # clf: classifier; param: parameter grid; name: classifier name
    model = GridSearchCV(clf, param, cv=5)       # 5-fold cross-validated grid search
    fit = model.fit(x_train, y_train)            # fit() returns the fitted GridSearchCV
    y_train_pred = fit.best_estimator_.predict(x_train)  # predict with the best estimator
    y_test_pred = fit.best_estimator_.predict(x_test)
    print('MODEL : %r' % name)
    # best_score_ is the mean cross-validated accuracy (the default scoring) of best_params_
    print('Best CV accuracy: %f using %s' % (fit.best_score_, fit.best_params_))
    train_score_list = []
    test_score_list = []
    score_list = []
    model_metrics_name = [accuracy_score]   # evaluation metrics, matching the scoring used
    for metric in model_metrics_name:       # compute each evaluation metric
        train_score = metric(y_train, y_train_pred)  # on the training set
        test_score = metric(y_test, y_test_pred)     # on the test set
        train_score_list.append(train_score)
        test_score_list.append(test_score)
    score_list.append(train_score_list)     # stack train and test rows for display
    score_list.append(test_score_list)
    score_df = pd.DataFrame(score_list, index=['train', 'test'],
                            columns=['accuracy'])    # labelled DataFrame for printing
    print('EVALUATE_METRICS:')
    print(score_df)
    return score_list, y_train_pred, y_test_pred


if __name__ == "__main__":
    path = 'iris.data'                      # data file path
    data = pd.read_csv(path, header=None)
    x_prime = data.iloc[:, 0:4]             # all four features (0:3 would drop petal width)
    y = pd.Categorical(data[4]).codes
    train_score_list = []
    test_score_list = []
    x_train, x_test, y_train, y_test = train_test_split(
        x_prime, y, train_size=0.7, random_state=0)
    for clf, param, name in zip(classifiers, parameters, names):
        score_list, y_train_pred, y_test_pred = grid_search_model(
            clf, param, name, x_train, y_train, x_test, y_test)
        train_score_list.append(score_list[0])
        test_score_list.append(score_list[1])
        print('-' * 100)
    train_score_df = pd.DataFrame(train_score_list, index=names, columns=['acc'])
    test_score_df = pd.DataFrame(test_score_list, index=names, columns=['acc'])
    print('TRAIN_SCORE:')
    print(train_score_df)
    print('TEST_SCORE:')
    print(test_score_df)
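The script above only prints the single best parameter combination per model. GridSearchCV also records the cross-validated score of every candidate in its cv_results_ attribute, which is handy for seeing how close the runners-up were. A minimal sketch, using the built-in iris loader and the same KNN grid as above:

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
model = GridSearchCV(KNeighborsClassifier(), {'n_neighbors': [3, 5]}, cv=5)
model.fit(X, y)

# cv_results_ holds the mean CV score and rank for every candidate, not just the best
results = pd.DataFrame(model.cv_results_)[
    ['param_n_neighbors', 'mean_test_score', 'rank_test_score']]
print(results.sort_values('rank_test_score'))
```

Sorting by rank_test_score makes it easy to spot when two settings perform nearly identically, in which case the simpler one is usually preferable.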
Output: the best parameters and scores for each model
MODEL : 'KNearestNeighbors'
Best CV accuracy: 0.961905 using {'n_neighbors': 5}
EVALUATE_METRICS:
accuracy
train 0.961905
test 0.977778
--------------------------------------------------------------------------------------------------------
MODEL : 'SVC'
Best CV accuracy: 0.980952 using {'C': 1.6681005372000592, 'gamma': 0.21544346900318834}
EVALUATE_METRICS:
accuracy
train 0.980952
test 0.955556
--------------------------------------------------------------------------------------------------------
MODEL : 'Decision Tree'
Best CV accuracy: 0.952381 using {'max_depth': 3, 'max_features': None}
EVALUATE_METRICS:
accuracy
train 0.961905
test 0.911111
--------------------------------------------------------------------------------------------------------
MODEL : 'Random Forest'
Best CV accuracy: 0.952381 using {'max_depth': 3, 'max_features': None, 'n_estimators': 85}
EVALUATE_METRICS:
accuracy
train 0.961905
test 0.911111
--------------------------------------------------------------------------------------------------------
MODEL : 'AdaBoostClassifier'
Best CV accuracy: 0.933333 using {'base_estimator__criterion': 'gini', 'base_estimator__splitter': 'best', 'n_estimators': 5}
EVALUATE_METRICS:
accuracy
train 1.000000
test 0.955556
--------------------------------------------------------------------------------------------------------
TRAIN_SCORE:
acc
KNearestNeighbors 0.961905
SVC 0.980952
Decision Tree 0.961905
Random Forest 0.961905
AdaBoostClassifier 1.000000
TEST_SCORE:
acc
KNearestNeighbors 0.977778
SVC 0.955556
Decision Tree 0.911111
Random Forest 0.911111
AdaBoostClassifier 0.955556
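Reading the winner off the test-score table can also be done programmatically with idxmax. A small sketch, hard-coding the test accuracies printed above (note AdaBoost's perfect train score against its lower test score is a hint of overfitting, so the test column is the one to rank on):

```python
import pandas as pd

# test-set accuracies, copied from the TEST_SCORE table above
test_score_df = pd.DataFrame(
    {'acc': [0.977778, 0.955556, 0.911111, 0.911111, 0.955556]},
    index=['KNearestNeighbors', 'SVC', 'Decision Tree',
           'Random Forest', 'AdaBoostClassifier'])

best = test_score_df['acc'].idxmax()   # index label of the highest test accuracy
print(best, test_score_df.loc[best, 'acc'])  # KNN wins on this split
```

With this split and these grids, K-nearest neighbors comes out on top; with a different random_state the ranking could change, so a more robust comparison would average over several splits.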