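4.3.1 Linear Regression with L2 Regularization: Ridge Regression

Ridge regression is still a form of linear regression. The only difference is that an L2 regularization penalty is added when the regression equation is fitted, which helps counter overfitting.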
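To make the effect of the L2 penalty concrete, here is a minimal NumPy sketch of the closed-form ridge solution, w = (XᵀX + αI)⁻¹Xᵀy, without an intercept term. The synthetic data, the `true_w` vector, and the helper name `ridge_closed_form` are assumptions made for illustration; they are not part of the original tutorial.

```python
import numpy as np

# Synthetic data (an assumption for illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
true_w = np.array([3.0, -2.0, 0.5, 1.0, 0.0])
y = X @ true_w + rng.normal(scale=0.5, size=100)


def ridge_closed_form(X, y, alpha):
    """Solve (X^T X + alpha * I) w = X^T y for w (no intercept term)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)


w_ols = ridge_closed_form(X, y, alpha=0.0)     # alpha = 0 is ordinary least squares
w_ridge = ridge_closed_form(X, y, alpha=10.0)  # alpha > 0 shrinks the weights

print("OLS weights:  ", w_ols)
print("Ridge weights:", w_ridge)
print("Weight norms (OLS, ridge):", np.linalg.norm(w_ols), np.linalg.norm(w_ridge))
```

With any alpha greater than zero, the norm of the ridge weight vector is smaller than that of the least-squares solution, which is the shrinkage that limits overfitting.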

1. API

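sklearn.linear_model.Ridge(alpha=1.0, fit_intercept=True, solver="auto", normalize=False)

  • Linear regression with L2 regularization
  • alpha: regularization strength, i.e. the coefficient of the penalty term, also written λ. Typical values to try: 0~1 or 1~10
  • solver: "auto" selects an optimization method automatically based on the data
    sag: stochastic average gradient descent; a good choice when both the number of samples and the number of features are large
  • normalize: whether to normalize the data before regression (subtract the mean and divide by the L2 norm). This parameter was deprecated in scikit-learn 1.0 and removed in 1.2
    normalize=False: to standardize the data, apply preprocessing.StandardScaler before calling fit
  • Ridge.coef_: regression weights
  • Ridge.intercept_: regression bias (intercept)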
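The parameters and attributes listed above can be tried out with a short sketch like the following. It standardizes the features with StandardScaler first, as recommended, and then reads `coef_` and `intercept_`. The synthetic data with deliberately different feature scales is an assumption for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

# Synthetic features on very different scales (an assumption for illustration)
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=[1.0, 10.0, 100.0], size=(200, 3))
y = X @ np.array([2.0, 0.3, 0.01]) + 1.0 + rng.normal(scale=0.1, size=200)

# Standardize before fitting so the penalty treats every feature equally
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

model = Ridge(alpha=1.0, fit_intercept=True, solver="auto")
model.fit(X_scaled, y)

print("Weights:", model.coef_)
print("Intercept:", model.intercept_)
```

Because the standardized features have zero mean, the fitted intercept is essentially the mean of `y`.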

Note from the scikit-learn documentation for the solver parameter: the last four solvers in its list all support both dense and sparse data. However, only 'sag' supports sparse input when fit_intercept is True.

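Ridge fits the same model as SGDRegressor(penalty='l2', loss="squared_loss") (the loss is named "squared_error" in scikit-learn 1.0 and later). The difference is that SGDRegressor implements plain stochastic gradient descent, whereas Ridge also offers SAG (stochastic average gradient), so Ridge is the recommended choice.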
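As a rough check of this relationship, the sketch below fits both estimators on the same synthetic data and compares their weights. The data and the hyperparameter values are assumptions for illustration. The `loss` argument of SGDRegressor is left at its default (squared error) because its string name differs between scikit-learn versions. The two `alpha` values are also not on the same scale, since the two estimators weight the penalty differently, so the weights are expected to be close rather than identical.

```python
import numpy as np
from sklearn.linear_model import Ridge, SGDRegressor

# Synthetic, already well-scaled data (an assumption for illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
true_w = np.array([3.0, -2.0, 0.5, 1.0, 0.0])
y = X @ true_w + rng.normal(scale=0.1, size=500)

# Ridge: solved directly (or with SAG when solver="sag")
ridge = Ridge(alpha=1.0).fit(X, y)

# SGDRegressor: plain stochastic gradient descent with an L2 penalty;
# the default loss is the squared error
sgd = SGDRegressor(penalty="l2", alpha=1e-4, max_iter=1000, tol=1e-4,
                   random_state=0).fit(X, y)

print("Ridge weights:", ridge.coef_)
print("SGD weights:  ", sgd.coef_)
```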

sklearn.linear_model.RidgeCV(_BaseRidgeCV, RegressorMixin)

  • Linear regression with L2 regularization and built-in cross-validation
  • coef_: regression coefficients

class _BaseRidgeCV(LinearModel):
    def __init__(self, alphas=(0.1, 1.0, 10.0),
                 fit_intercept=True, normalize=False, scoring=None,
                 cv=None, gcv_mode=None,
                 store_cv_values=False):
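A minimal usage sketch for RidgeCV follows. It passes the candidate `alphas` shown in the signature above and reads back the selected value from the fitted `alpha_` attribute. The synthetic data is an assumption for illustration.

```python
import numpy as np
from sklearn.linear_model import RidgeCV

# Synthetic data (an assumption for illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([3.0, -2.0, 0.5, 1.0, 0.0]) + rng.normal(scale=0.5, size=200)

# Candidate regularization strengths; RidgeCV picks the best one by cross-validation
alphas = (0.1, 1.0, 10.0)
model = RidgeCV(alphas=alphas).fit(X, y)

print("Selected alpha:", model.alpha_)
print("Coefficients:", model.coef_)
```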

2. How does the regularization strength affect the result?

[Figure: weight coefficients plotted against the regularization strength]

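  • The stronger the regularization (moving from right to left in the figure), the smaller the weight coefficients
  • The weaker the regularization (moving from left to right in the figure), the larger the weight coefficients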
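The trend in the two points above can be reproduced numerically. The sketch below fits Ridge for a range of alpha values and prints the norm of the weight vector each time. The synthetic data and the particular alpha grid are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data (an assumption for illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, -2.0, 0.5, 1.0, 0.0]) + rng.normal(scale=0.5, size=100)

alphas = [0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]
norms = []
for alpha in alphas:
    model = Ridge(alpha=alpha).fit(X, y)
    # Record the overall size of the weight vector for this alpha
    norms.append(float(np.linalg.norm(model.coef_)))
    print(f"alpha={alpha:>8}: ||w|| = {norms[-1]:.4f}")
```

The printed norms decrease as alpha increases, matching the behaviour described for the figure.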

3. Predicting Boston house prices with ridge regression

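# Note: load_boston was removed in scikit-learn 1.2, so this script needs an older version
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

# 1. Load the data
boston = load_boston()
print("Data shape (samples, features):\n", boston.data.shape)

# 2. Split into training and test sets
x_train, x_test, y_train, y_test = train_test_split(boston.data, boston.target, random_state=22)

# 3. Standardize: fit the scaler on the training set only, then reuse it on the test set
transfer = StandardScaler()
x_train = transfer.fit_transform(x_train)
x_test = transfer.transform(x_test)

# Ridge regression for Boston house price prediction
# 4. Estimator
estimator = Ridge()
estimator.fit(x_train, y_train)

# 5. Inspect the fitted model
print("Ridge regression weight coefficients:\n", estimator.coef_)
print("Ridge regression intercept:\n", estimator.intercept_)


# 6. Evaluate the model
y_predict = estimator.predict(x_test)
print("Predicted house prices:\n", y_predict)
error = mean_squared_error(y_test, y_predict)
print("Ridge regression mean squared error:\n", error)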

Output:

Data shape (samples, features):
 (506, 13)
Ridge regression weight coefficients:
 [-0.63591916  1.12109181 -0.09319611  0.74628129 -1.91888749  2.71927719
 -0.08590464 -3.25882705  2.41315949 -1.76930347 -1.74279405  0.87205004
 -3.89758657]
Ridge regression intercept:
 22.62137203166228
Predicted house prices:
 [28.22119941 31.49858594 21.14690941 32.64962343 20.03976087 19.07187629
 21.11827061 19.61935024 19.64669848 32.83666525 21.01034708 27.47939935
 15.55875601 19.80406014 36.86415472 18.79442579  9.42343608 18.5205955
 30.67129766 24.30659711 19.07820077 34.08772738 29.77396117 17.50394928
 34.87750492 26.52508961 34.65566473 27.42939944 19.08639183 15.04854291
 30.84974343 15.76894723 37.18814441  7.81864035 16.27847433 17.15510852
  7.46590141 19.98474662 40.55565604 28.96103939 25.25570196 17.7598197
 38.78171653  6.87935126 21.76805062 25.25888823 20.47319256 20.48808719
 17.24949519 26.11755181  8.61005188 27.47070495 30.57806886 16.57080888
  9.42312214 35.50731907 32.20467352 21.93128073 17.62011278 22.08454636
 23.50121152 24.08248876 20.16840581 38.47001591 24.69276673 19.7638548
 13.96547058  6.76070715 42.04033544 21.9237625  16.88030656 22.60637682
 40.74664535 21.44631815 36.86936185 27.17135794 21.09470367 20.40689317
 25.35934079 22.35676321 31.1513028  20.39303322 23.99948991 31.54251155
 26.77734347 20.89368871 29.05880401 22.00850263 26.31965286 20.04852734
 25.46476799 24.08084537 19.90846889 16.47030743 15.27936372 18.39475348
 24.80822272 16.62280764 20.86393724 26.70418608 20.74534996 17.89544942
 24.25949423 23.35743497 21.51817773 36.76202304 15.90293344 21.52915882
 32.78684766 33.68666117 20.61700911 26.78345059 22.72685584 17.40478038
 21.67136433 21.6912557  27.66684993 25.08825085 23.72539867 14.64260535
 15.21105331  3.81916568 29.16662813 20.67913144 22.33386579 28.01241753
 28.531445  ]
Ridge regression mean squared error:
 20.656448214354967

Changing the ridge regression parameters

estimator = Ridge(alpha=0.5,max_iter=1000)
estimator.fit(x_train,y_train)
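To see whether a different alpha actually helps, one option is to refit with several values and compare the mean squared error on the test set. The sketch below follows the same split, standardize, fit, and evaluate steps as the walkthrough, but on synthetic data, which is an assumption made so that it runs on any scikit-learn version. Its numbers are therefore not comparable to the Boston results above.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic regression data (an assumption for illustration)
rng = np.random.default_rng(22)
X = rng.normal(size=(300, 8))
y = X @ rng.normal(size=8) + rng.normal(scale=1.0, size=300)

x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=22)

# Fit the scaler on the training set only, then reuse it on the test set
transfer = StandardScaler()
x_train = transfer.fit_transform(x_train)
x_test = transfer.transform(x_test)

errors = {}
for alpha in (0.1, 0.5, 1.0, 10.0):
    estimator = Ridge(alpha=alpha)
    estimator.fit(x_train, y_train)
    errors[alpha] = mean_squared_error(y_test, estimator.predict(x_test))
    print(f"alpha={alpha}: test MSE = {errors[alpha]:.4f}")
```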