Algorithm steps
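  1. Compute the cosine of the angle between the sample to be classified and every sample x in the training set.
  2. The training sample with the largest cosine (i.e. the smallest angle, so the most similar sample) determines the predicted class.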
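As a quick illustration of the two steps, here is a minimal sketch on a hypothetical 2-D toy training set. The data and the helper name `cosine_similarity` are made up for this example and are not part of the original code:

```python
import numpy as np

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (||a|| * ||b||); a larger value means a smaller angle
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical toy training set with two classes in 2-D
x_train = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y_train = np.array([0, 1, 1])
sample = np.array([2.0, 0.5])

# Step 1: cosine similarity between the sample and every training sample
sims = [cosine_similarity(t, sample) for t in x_train]
# Step 2: take the class of the most similar training sample (largest cosine)
predicted = y_train[int(np.argmax(sims))]
print([round(s, 4) for s in sims], predicted)
```

Here the sample points mostly along the first axis, so it is closest in angle to `[1, 0]` and is assigned class 0.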
Algorithm implementation

Computing the angle cosine

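import numpy as np

def anglecos(x_train,y_train,sample):
    """
    :function Compute the cosine similarity between the sample to be classified and every sample in the training set, and return the class of the most similar one
    :param x_train: training set, M*N, where M is the number of samples and N the number of features
    :param y_train: training labels, length M
    :param sample: the sample to be classified, length N
    :return: the predicted class
    """
    label = 0
    disMax = -np.inf  # largest cosine seen so far
    for i,train in enumerate(x_train):
        # cos(theta) = <train, sample> / (||train|| * ||sample||)
        dis = np.sum(train*sample)/(np.sqrt(np.sum(train*train)*np.sum(sample*sample)))
        # keep the label of the most similar training sample
        if disMax<dis:
            disMax = dis
            label = y_train[i]
    return label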
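The loop can also be written with NumPy array operations. The sketch below, under the hypothetical name `anglecos_vectorized`, computes all M cosines at once and should return the same label as `anglecos`, assuming no training row and no sample is all zeros (the norm would then be 0):

```python
import numpy as np

def anglecos_vectorized(x_train, y_train, sample):
    """Cosine nearest-neighbour classification without a Python loop.

    Assumes no training row and no sample is all zeros.
    """
    x_train = np.asarray(x_train, dtype=float)
    sample = np.asarray(sample, dtype=float)
    # Dot product of every training row with the sample: shape (M,)
    dots = x_train @ sample
    # Product of the two norms for every pair: shape (M,)
    norms = np.linalg.norm(x_train, axis=1) * np.linalg.norm(sample)
    sims = dots / norms
    # argmax returns the first index among ties
    return y_train[int(np.argmax(sims))]
```

Using `np.argmax` keeps the first index when several training samples tie, which matches the strict `<` comparison in the loop version.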

Test code

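from sklearn import datasets
from Include.chapter3 import function  # the author's own module, providing train_test_split and anglecos
import numpy as np

# Load the data
digits = datasets.load_digits()
x , y = digits.data,digits.target

# Split the dataset (this custom helper returns x_train, y_train, x_test, y_test)
x_train, y_train, x_test, y_test = function.train_test_split(x,y)
# Pick one test sample at random
testId = np.random.randint(0, x_test.shape[0])
sample = x_test[testId, :]

# Classify it and check the prediction against the true label
ans = function.anglecos(x_train,y_train,sample)
print(ans==y_test[testId])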
Result (True means the predicted class matches the true label of the randomly chosen test sample)
True
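Testing one randomly chosen sample only tells whether that single prediction was right. A sketch for measuring accuracy over the whole test set follows. It swaps in scikit-learn's `sklearn.model_selection.train_test_split` instead of the author's `function.train_test_split` (so the split and the return order differ), and the 30% test size and `random_state=0` are arbitrary choices. It also assumes no digit image is all zeros:

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
x, y = digits.data, digits.target

# scikit-learn returns (x_train, x_test, y_train, y_test); note the order
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=0)

# Scale every row to unit length so a plain dot product equals the cosine
train_unit = x_train / np.linalg.norm(x_train, axis=1, keepdims=True)
test_unit = x_test / np.linalg.norm(x_test, axis=1, keepdims=True)

# sims[i, j] = cosine between test sample i and training sample j
sims = test_unit @ train_unit.T
# For each test sample, take the label of the most similar training sample
pred = y_train[np.argmax(sims, axis=1)]
accuracy = np.mean(pred == y_test)
print(f"accuracy on {len(y_test)} test samples: {accuracy:.4f}")
```

Normalising the rows once up front avoids recomputing the training norms for every test sample.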