4.3 Improving Linear Regression: Ridge Regression
4.3.1 Linear Regression with L2 Regularization: Ridge Regression
Ridge regression is still a form of linear regression. The only difference is that an L2 regularization penalty is added when fitting the regression equation, which helps counter overfitting.
1. API
sklearn.linear_model.Ridge(alpha=1.0, fit_intercept=True, solver="auto", normalize=False)
- Linear regression with L2 regularization
- alpha: regularization strength, i.e. the penalty coefficient (also written λ); typical values are in 0–1 or 1–10
- solver: automatically chooses an optimization method based on the data
  - sag: for large datasets with many features, this stochastic average gradient solver is selected
- normalize: whether to standardize the data
  - normalize=False: you can instead standardize the data with preprocessing.StandardScaler before calling fit
- Ridge.coef_: regression weights
- Ridge.intercept_: regression bias
All of the last four solvers support both dense and sparse data. However, only 'sag' supports sparse input when fit_intercept is True.
Ridge is equivalent to SGDRegressor(penalty='l2', loss='squared_loss'), except that SGDRegressor implements plain stochastic gradient descent. Ridge is recommended because it implements SAG (stochastic average gradient).
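The equivalence above can be checked empirically. Below is a minimal sketch on small synthetic data (the data, alpha values, and the per-sample alpha scaling are illustrative assumptions, not part of the original text): both estimators recover nearly the same coefficients when the L2 penalty is small.

```python
# A rough comparison of Ridge and SGDRegressor with an L2 penalty
# on synthetic data; both should land close to the true weights.
import numpy as np
from sklearn.linear_model import Ridge, SGDRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))                 # features already ~standardized
true_w = np.array([3.0, -2.0, 1.0])
y = X @ true_w + rng.normal(scale=0.1, size=500)

ridge = Ridge(alpha=1.0).fit(X, y)
# SGDRegressor averages the loss over samples, so its alpha is
# (approximately) the Ridge alpha divided by the sample count
sgd = SGDRegressor(penalty="l2", alpha=1.0 / len(y),
                   max_iter=5000, random_state=0).fit(X, y)

print("Ridge coefficients:", np.round(ridge.coef_, 3))
print("SGD coefficients:  ", np.round(sgd.coef_, 3))
```

In practice Ridge converges deterministically via its solver, while SGDRegressor's result depends on the learning-rate schedule and random state, which is why Ridge is the recommended choice here.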
sklearn.linear_model.RidgeCV(_BaseRidgeCV, RegressorMixin)
- Linear regression with L2 regularization and built-in cross-validation
- coef_: regression coefficients

class _BaseRidgeCV(LinearModel):
    def __init__(self, alphas=(0.1, 1.0, 10.0),
                 fit_intercept=True, normalize=False, scoring=None,
                 cv=None, gcv_mode=None,
                 store_cv_values=False):
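A minimal sketch of how RidgeCV is used (the synthetic data and candidate alphas below are illustrative assumptions): it fits one model per candidate alpha and keeps the one with the best cross-validated score, exposed afterwards as alpha_.

```python
# RidgeCV picks the best alpha from the candidates by cross-validation
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV

X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

reg = RidgeCV(alphas=(0.1, 1.0, 10.0), cv=5).fit(X, y)
print("best alpha:", reg.alpha_)            # alpha chosen by cross-validation
print("coefficients shape:", reg.coef_.shape)
```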
2. How does changing the regularization strength affect the result?
- The stronger the regularization, the smaller the weight coefficients become
- The weaker the regularization, the larger the weight coefficients become
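The bullets above can be verified directly from the closed-form ridge solution w = (XᵀX + αI)⁻¹Xᵀy (a standard result, not from the original text; the data below is synthetic): as alpha grows, the norm of the weight vector shrinks toward zero.

```python
# Closed-form ridge: stronger regularization => smaller weights
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([3.0, -2.0, 1.0]) + rng.normal(scale=0.5, size=100)

norms = []
for alpha in [0.0, 1.0, 10.0, 100.0]:
    # solve (X^T X + alpha*I) w = X^T y for the ridge weights
    w = np.linalg.solve(X.T @ X + alpha * np.eye(3), X.T @ y)
    norms.append(float(np.linalg.norm(w)))
    print(f"alpha={alpha:6.1f}  ||w|| = {norms[-1]:.3f}")
```

The printed norms decrease monotonically: alpha=0 reproduces ordinary least squares, and larger alpha pulls every weight toward zero.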
3. Predicting Boston Housing Prices with Ridge Regression
from sklearn.datasets import load_boston  # note: removed in scikit-learn 1.2; requires an older version
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
# 1. Load the data
boston = load_boston()
print("Feature shape:\n", boston.data.shape)
# 2. Split into training and test sets
x_train, x_test, y_train, y_test = train_test_split(boston.data, boston.target, random_state=22)
# 3. Standardize
transfer = StandardScaler()
x_train = transfer.fit_transform(x_train)
x_test = transfer.transform(x_test)
# 4. Estimator: ridge regression for the Boston housing price prediction
estimator = Ridge()
estimator.fit(x_train, y_train)
# 5. Inspect the fitted model
print("Ridge regression weights:\n", estimator.coef_)
print("Ridge regression bias:\n", estimator.intercept_)
# 6. Evaluate the model
y_predict = estimator.predict(x_test)
print("Predicted prices:\n", y_predict)
error = mean_squared_error(y_test, y_predict)
print("Ridge regression MSE:\n", error)
Output:
Feature shape:
 (506, 13)
Ridge regression weights:
[-0.63591916 1.12109181 -0.09319611 0.74628129 -1.91888749 2.71927719
-0.08590464 -3.25882705 2.41315949 -1.76930347 -1.74279405 0.87205004
-3.89758657]
Ridge regression bias:
22.62137203166228
Predicted prices:
[28.22119941 31.49858594 21.14690941 32.64962343 20.03976087 19.07187629
21.11827061 19.61935024 19.64669848 32.83666525 21.01034708 27.47939935
15.55875601 19.80406014 36.86415472 18.79442579 9.42343608 18.5205955
30.67129766 24.30659711 19.07820077 34.08772738 29.77396117 17.50394928
34.87750492 26.52508961 34.65566473 27.42939944 19.08639183 15.04854291
30.84974343 15.76894723 37.18814441 7.81864035 16.27847433 17.15510852
7.46590141 19.98474662 40.55565604 28.96103939 25.25570196 17.7598197
38.78171653 6.87935126 21.76805062 25.25888823 20.47319256 20.48808719
17.24949519 26.11755181 8.61005188 27.47070495 30.57806886 16.57080888
9.42312214 35.50731907 32.20467352 21.93128073 17.62011278 22.08454636
23.50121152 24.08248876 20.16840581 38.47001591 24.69276673 19.7638548
13.96547058 6.76070715 42.04033544 21.9237625 16.88030656 22.60637682
40.74664535 21.44631815 36.86936185 27.17135794 21.09470367 20.40689317
25.35934079 22.35676321 31.1513028 20.39303322 23.99948991 31.54251155
26.77734347 20.89368871 29.05880401 22.00850263 26.31965286 20.04852734
25.46476799 24.08084537 19.90846889 16.47030743 15.27936372 18.39475348
24.80822272 16.62280764 20.86393724 26.70418608 20.74534996 17.89544942
24.25949423 23.35743497 21.51817773 36.76202304 15.90293344 21.52915882
32.78684766 33.68666117 20.61700911 26.78345059 22.72685584 17.40478038
21.67136433 21.6912557 27.66684993 25.08825085 23.72539867 14.64260535
15.21105331 3.81916568 29.16662813 20.67913144 22.33386579 28.01241753
28.531445 ]
Ridge regression MSE:
20.656448214354967
Changing the ridge regression parameters:
estimator = Ridge(alpha=0.5,max_iter=1000)
estimator.fit(x_train,y_train)
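To see how such parameter changes play out, the alpha values can be swept in a loop. A hedged sketch follows; because load_boston is gone from modern scikit-learn, synthetic regression data of the same shape stands in for the housing set, so the exact MSE values are illustrative only.

```python
# Sweep alpha and compare test MSE (synthetic stand-in for the Boston data)
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=506, n_features=13, noise=15.0, random_state=22)
x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=22)

mses = []
for alpha in [0.1, 0.5, 1.0, 10.0]:
    est = Ridge(alpha=alpha, max_iter=1000).fit(x_train, y_train)
    mse = mean_squared_error(y_test, est.predict(x_test))
    mses.append(mse)
    print(f"alpha={alpha:5.1f}  test MSE = {mse:.3f}")
```

On real data the best alpha is problem-dependent, which is exactly what RidgeCV automates.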