Handwritten Digit Recognition with the KNN Algorithm
1. Data: the digits are stored as text files, etc.
2. Convert each 32×32 binary image into a 1×1024 vector.
3. Test the algorithm.
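Step 2 is plain row-major flattening: pixel (i, j) of the 32×32 grid ends up at position 32·i + j. The sketch below shows the same conversion with NumPy's `reshape` instead of nested loops. `text_to_vector` is a hypothetical helper, not part of the original code. It takes a list of strings rather than a file name so it can be tried without the dataset. `line[:32]` drops the trailing newline that each line of a real file would carry.

```python
import numpy as np


def text_to_vector(lines):
    """Flatten a 32x32 grid of '0'/'1' characters into a 1x1024 float vector.

    Hypothetical helper: pixel (i, j) lands at column 32*i + j, which is
    NumPy's default row-major (C-order) flattening.
    """
    # Keep only the first 32 characters of each line (drops any '\n')
    grid = np.array([[int(ch) for ch in line[:32]] for line in lines[:32]],
                    dtype=float)
    return grid.reshape(1, 1024)


# A made-up 32x32 image: all zeros except a vertical stroke in column 15
rows = ["0" * 15 + "1" + "0" * 16 for _ in range(32)]
vec = text_to_vector(rows)
print(vec.shape)             # (1, 1024)
print(vec[0, 32 * 5 + 15])   # 1.0, i.e. pixel (5, 15)
```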
```python
#!/usr/bin/env python
# -*- coding: UTF-8 -*-
'''=================================================
@Project -> File :KNN -> kNN
@IDE :PyCharm
@Author :zgq
@Date :2021/1/7 14:15
@Desc :
=================================================='''
import operator  # operator.itemgetter is used to sort the vote counts
from os import listdir  # lists the file names in a directory

from numpy import tile, zeros


def classify0(inX, dataSet, labels, k):
    """Classify inX by majority vote among its k nearest neighbours.

    inX     -- input vector to classify, shape (1, n)
    dataSet -- training samples, shape (m, n)
    labels  -- list of m labels, aligned with the rows of dataSet
    k       -- number of nearest neighbours that vote
    """
    # Euclidean distance computation
    dataSetSize = dataSet.shape[0]  # number of training samples (rows)
    # Repeat inX once per training row and subtract: each entry is the difference
    # between the new sample and one training sample along one dimension
    diffMat = tile(inX, (dataSetSize, 1)) - dataSet
    sqDiffMat = diffMat ** 2
    sqDistances = sqDiffMat.sum(axis=1)  # sum across each row
    distances = sqDistances ** 0.5  # 1-D array: distance from each training sample to inX
    sortedDistIndicies = distances.argsort()  # indices that sort distances in ascending order
    classCount = {}  # label -> number of votes
    for i in range(k):  # take the k closest points
        voteIlabel = labels[sortedDistIndicies[i]]  # label of the i-th nearest sample
        # dict.get returns 0 when the label has not been counted yet
        classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1
    # Sort labels by vote count, most votes first, and return the winner
    sortedClassCount = sorted(classCount.items(),
                              key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]


def img2vector(filename):
    """Convert a 32x32 text image into a 1x1024 vector."""
    returnVect = zeros((1, 1024))
    with open(filename) as fr:
        for i in range(32):
            lineStr = fr.readline()
            for j in range(32):
                # Pixel (i, j) is stored at position 32*i + j (row-major order)
                returnVect[0, 32 * i + j] = int(lineStr[j])
    return returnVect


def handwritingClassTest():
    """Test driver for the handwritten digit recognition system."""
    hwLabels = []
    trainingFileList = listdir('trainingDigits')  # file names in the training directory
    m = len(trainingFileList)
    trainingMat = zeros((m, 1024))
    for i in range(m):
        fileNameStr = trainingFileList[i]  # name of the i-th file, e.g. '0_0.txt'
        fileStr = fileNameStr.split('.')[0]  # split on '.' -> ['0_0', 'txt'], keep '0_0'
        classNumStr = int(fileStr.split('_')[0])  # the digit before '_' is the label
        hwLabels.append(classNumStr)  # labels stay in the same order as the rows of trainingMat
        trainingMat[i, :] = img2vector('trainingDigits/%s' % fileNameStr)  # each file becomes one row
    testFileList = listdir('testDigits')  # file names in the test directory
    errorCount = 0.0
    mTest = len(testFileList)  # total number of test samples
    for i in range(mTest):
        fileNameStr = testFileList[i]
        fileStr = fileNameStr.split('.')[0]
        classNumStr = int(fileStr.split('_')[0])
        vectorUnderTest = img2vector('testDigits/%s' % fileNameStr)  # one test sample as a vector
        # trainingMat rows and hwLabels are aligned, so they can be passed together
        classifierResult = classify0(vectorUnderTest, trainingMat, hwLabels, 3)
        print("the classifier came back with : %d,the real answer is : %d" % (classifierResult, classNumStr))
        if classifierResult != classNumStr:
            errorCount += 1.0
    print("\n the total number of errors is :%d" % errorCount)
    print("\n the total error rate is: %f" % (errorCount / float(mTest)))


if __name__ == '__main__':
    handwritingClassTest()
```
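`classify0` builds an explicit copy of the input with `tile()` before subtracting. NumPy broadcasting can do the same subtraction without that copy, and `collections.Counter` can replace the hand-written vote dictionary. The sketch below takes that approach under the same inputs (a row vector, a 2-D training matrix, an aligned label list, and `k`). `classify_knn` and the toy data are hypothetical illustrations, not part of the original code.

```python
from collections import Counter

import numpy as np


def classify_knn(in_x, data_set, labels, k):
    """Majority-vote k-NN using broadcasting instead of tile().

    Hypothetical alternative to classify0: same Euclidean distance and vote.
    Counter.most_common orders equal counts by first appearance, i.e. ties go
    to the label met first among the nearest neighbours.
    """
    # in_x (shape (n,) or (1, n)) is broadcast against every row of data_set
    distances = np.sqrt(((data_set - in_x) ** 2).sum(axis=1))
    nearest = distances.argsort()[:k]  # indices of the k smallest distances
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy data (made up for illustration): two small clusters in 2-D
group = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
labels = ["A", "A", "B", "B"]
print(classify_knn(np.array([0.0, 0.2]), group, labels, 3))  # B
```

Because `img2vector` returns a (1, 1024) array, the result could in principle be passed straight in as `in_x`. Broadcasting then handles the shape difference with `trainingMat`.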
Test results:

```
the classifier came back with : 0,the real answer is : 0
the classifier came back with : 0,the real answer is : 0
the classifier came back with : 0,the real answer is : 0
the classifier came back with : 0,the real answer is : 0
the classifier came back with : 0,the real answer is : 0
the classifier came back with : 0,the real answer is : 0
……
the classifier came back with : 1,the real answer is : 1
the classifier came back with : 7,the real answer is : 1
the classifier came back with : 1,the real answer is : 1
……
the classifier came back with : 8,the real answer is : 8
the classifier came back with : 6,the real answer is : 8
the classifier came back with : 8,the real answer is : 8
the classifier came back with : 8,the real answer is : 8
the classifier came back with : 8,the real answer is : 8
the classifier came back with : 8,the real answer is : 8
the classifier came back with : 8,the real answer is : 8
the classifier came back with : 8,the real answer is : 8
the classifier came back with : 8,the real answer is : 8
the classifier came back with : 8,the real answer is : 8
the classifier came back with : 8,the real answer is : 8
the classifier came back with : 8,the real answer is : 8
the classifier came back with : 8,the real answer is : 8
the classifier came back with : 8,the real answer is : 8
the classifier came back with : 3,the real answer is : 8
the classifier came back with : 8,the real answer is : 8
……
the total number of errors is :10
the total error rate is: 0.010571
```
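The printed rate is `errorCount / mTest`. The output does not show `mTest` itself, but working backwards from the two printed numbers suggests roughly 946 test files. This inference assumes the rate was rounded to six decimals by `%f`. The corresponding accuracy is then:

$$
\frac{\text{errorCount}}{m_{\text{Test}}} = \frac{10}{m_{\text{Test}}} \approx 0.010571
\;\Rightarrow\;
m_{\text{Test}} \approx \frac{10}{0.010571} \approx 946,
\qquad
\text{accuracy} = 1 - \frac{10}{946} = \frac{936}{946} \approx 98.94\%
$$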