1 API

sklearn.linear_model.Ridge(alpha=1.0, fit_intercept=True, solver="auto", normalize=False)

  • Linear regression with L2 regularization
  • alpha: regularization strength, also written λ
    • typical λ values: 0~1 or 1~10
  • solver: picks an optimization method automatically based on the data
    • sag: if both the dataset and the number of features are large, this stochastic average gradient method is chosen
  • normalize: whether to standardize the data
    • normalize=False: you can standardize the data yourself with preprocessing.StandardScaler before calling fit
  • Ridge.coef_: regression weights
  • Ridge.intercept_: regression bias

Ridge is equivalent to SGDRegressor(penalty='l2', loss="squared_loss"), except that SGDRegressor implements plain stochastic gradient descent; Ridge is recommended because it implements SAG (stochastic average gradient).
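A minimal sketch of that equivalence on synthetic data (the use of make_regression and the alpha rescaling are my assumptions; also note that recent scikit-learn versions name the loss "squared_error" rather than "squared_loss"):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, SGDRegressor
from sklearn.preprocessing import StandardScaler

# synthetic standardized data, standing in for a real dataset
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X = StandardScaler().fit_transform(X)

ridge = Ridge(alpha=1.0).fit(X, y)

# SGDRegressor averages the loss over samples, so its alpha roughly
# corresponds to Ridge's alpha divided by n_samples
sgd = SGDRegressor(penalty="l2", alpha=1.0 / len(y),
                   max_iter=5000, tol=1e-6, random_state=0).fit(X, y)

print(np.round(ridge.coef_, 2))
print(np.round(sgd.coef_, 2))
```

The two coefficient vectors come out close, but SGD only approximates the exact ridge solution, which is why the text recommends Ridge itself.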

sklearn.linear_model.RidgeCV(_BaseRidgeCV, RegressorMixin)

  • Linear regression with L2 regularization that selects alpha via cross-validation
  • coef_: regression coefficients
class _BaseRidgeCV(LinearModel):
    def __init__(self, alphas=(0.1, 1.0, 10.0),
                 fit_intercept=True, normalize=False, scoring=None,
                 cv=None, gcv_mode=None,
                 store_cv_values=False):
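A short usage sketch of RidgeCV (synthetic data via make_regression is my assumption): it fits once per candidate alpha using (generalized) cross-validation and exposes the winner as alpha_.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV

# synthetic data; in practice use your own features and targets
X, y = make_regression(n_samples=100, n_features=4, noise=5.0, random_state=0)

# RidgeCV evaluates each candidate alpha with cross-validation
model = RidgeCV(alphas=(0.1, 1.0, 10.0)).fit(X, y)

print(model.alpha_)  # the alpha selected by cross-validation
print(model.coef_)   # regression coefficients under that alpha
```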

 

2 How does changing the degree of regularization affect the results?

  • The stronger the regularization, the smaller the weight coefficients
  • The weaker the regularization, the larger the weight coefficients
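This shrinkage effect can be checked directly (synthetic data via make_regression is my assumption): as alpha grows, the L2 norm of the learned ridge coefficients shrinks.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=50, n_features=10, noise=5.0, random_state=0)

# L2 norm of the learned weights for increasing regularization strength
norms = [np.linalg.norm(Ridge(alpha=a).fit(X, y).coef_)
         for a in (0.01, 0.1, 1.0, 10.0, 100.0)]
print(np.round(norms, 2))
```

The printed norms decrease monotonically as alpha increases.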

 

3 Boston housing price prediction

from sklearn.datasets import load_boston
from sklearn.linear_model import Ridge, RidgeCV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def linear_model3():
    """
    Linear regression: ridge regression
    :return:
    """
    # 1. Load the data
    # (load_boston was removed in scikit-learn 1.2; kept here as in the original tutorial)
    data = load_boston()

    # 2. Split the dataset
    x_train, x_test, y_train, y_test = train_test_split(data.data, data.target, random_state=22)

    # 3. Feature engineering: standardization
    transfer = StandardScaler()
    x_train = transfer.fit_transform(x_train)
    # reuse the scaler fitted on the training set; do not re-fit on the test set
    x_test = transfer.transform(x_test)

    # 4. Machine learning: linear regression (ridge regression)
    estimator = Ridge(alpha=1)
    # estimator = RidgeCV(alphas=(0.1, 1, 10))
    estimator.fit(x_train, y_train)

    # 5. Model evaluation
    # 5.1 Inspect the coefficients and intercept
    y_predict = estimator.predict(x_test)
    print("Predictions:\n", y_predict)
    print("Model coefficients:\n", estimator.coef_)
    print("Model intercept:\n", estimator.intercept_)

    # 5.2 Evaluation
    # mean squared error
    error = mean_squared_error(y_test, y_predict)
    print("Mean squared error:\n", error)

'''
Predictions:
 [28.13514381 31.28742806 20.54637256 31.45779505 19.05568933 18.26035004
 20.59277879 18.46395399 18.49310689 32.89149735 20.38916336 27.19539571
 14.82641534 19.22385973 36.98699955 18.29852297  7.78481347 17.58930015
 30.19228148 23.61186682 18.14688039 33.81334203 28.44588593 16.97492092
 34.72357533 26.19400705 34.77212916 26.62689656 18.63066492 13.34246426
 30.35128911 14.59472585 37.18259957  8.93178571 15.10673508 16.1072542
  7.22299512 19.14535184 39.53308652 28.26937936 24.62676357 16.76310494
 37.85719041  5.71249289 21.17777272 24.60640023 18.90197753 19.95020929
 15.1922374  26.27853095  7.55102357 27.10160025 29.17947182 16.275476
  8.02888564 35.42165713 32.28262473 20.9525814  16.43494393 20.88177884
 22.92764493 23.58271167 19.35870763 38.27704421 23.98459232 18.96691367
 12.66552625  6.122414   41.44033214 21.09214394 16.23412117 21.51649375
 40.72274345 20.53192898 36.78646575 27.01972904 19.91315009 19.66906691
 24.59629369 21.2589005  30.93402996 19.33386041 22.3055747  31.07671682
 26.39230161 20.24709071 28.79113538 20.85968277 26.04247756 19.25344252
 24.9235031  22.29606909 18.94734935 18.83346051 14.09641763 17.43434945
 24.16599713 15.86179766 20.05792005 26.51141362 20.11472351 17.03501767
 23.83611956 22.82305362 20.88305157 36.10592864 14.72050619 20.67225818
 32.43628539 33.17614341 19.8129561  26.45401305 20.97734485 16.47095097
 20.76417338 20.58558754 26.85985053 24.18030055 23.22217136 13.7919355
 15.38830634  2.78927979 28.87941047 19.80046894 21.50479706 27.53668749
 28.48598562]
Model coefficients:
 [-0.63591916  1.12109181 -0.09319611  0.74628129 -1.91888749  2.71927719
 -0.08590464 -3.25882705  2.41315949 -1.76930347 -1.74279405  0.87205004
 -3.89758657]
Model intercept:
 22.62137203166228
Mean squared error:
 20.064724392806898

'''
