kNN Algorithm Examples (Dating Preference Prediction and Handwriting Recognition)

import numpy as np
import operator
import random
import os

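def file2matrix(filePath):  # Extract the feature matrix and the label list from a text file
    with open(filePath, 'r') as fh:  # read-only; the file is closed when the block exits
        lines = fh.readlines()
    fileLength = len(lines)
    dataSet = np.zeros((fileLength, 3), np.float64)
    labelList = []
    for i in range(fileLength):
        row = lines[i].strip().split('\t')
        dataSet[i, :] = [float(v) for v in row[0:3]]  # the first three columns are the features
        labelList.append(row[-1])  # the last column is the class label
    return dataSet, labelList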

def autoNormal(data):  # Min-max normalization: scale every feature column to [0, 1]
    dataMin = data.min(0)  # per-column minimum
    dataMax = data.max(0)  # per-column maximum
    diff = dataMax - dataMin  # per-column range
    # NumPy broadcasting applies the per-column values to every row, so np.tile is not needed
    normalDataSet = (data - dataMin) / diff
    return normalDataSet, diff, dataMin

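def dataClassTest(dataSet, labelList):  # Estimate the classifier's accuracy on a random sample
    ratio = 0.1  # fraction of the data set used for testing
    correctCount = 0
    testNumber = int(ratio * dataSet.shape[0])
    for i in range(testNumber):
        # randrange(n) yields an index in [0, n), so it always stays inside the array
        k = random.randrange(dataSet.shape[0])
        # the sampled row is also part of dataSet, so it counts among its own neighbours
        label = classify0(dataSet[k], dataSet, labelList, 20)
        if label == labelList[k]:
            correctCount += 1
    return correctCount * 100 / testNumber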

def classifyPerson():  # Read feature values from the user and predict the label
    dataSet, labelSet = file2matrix('datingTestSet.txt')
    percentTats = float(input('Please input percentage of time spent playing video games?'))
    miles = float(input('Please input frequent flier miles earned per year?'))
    cream = float(input('Please input liters of ice cream consumed per year?'))
    dataSet, diff, dataMin = autoNormal(dataSet)
    # The feature order must match the columns of the file; in the book's datingTestSet.txt
    # these are flier miles, percentage of time playing games, liters of ice cream
    intX = np.array([miles, percentTats, cream], np.float64)
    label = classify0((intX - dataMin) / diff, dataSet, labelSet, 20)
    print("You likely {0} the man!".format(label))
    correctPercent = dataClassTest(dataSet, labelSet)
    print("The estimated correct percent is {0}%!".format(correctPercent))

def classify0(intX, dataSet, labelSet, k):  # kNN classifier
    # Euclidean distance from intX to every row of dataSet (broadcast over the rows)
    squaredDiff = (intX - dataSet) ** 2
    distances = squaredDiff.sum(axis=1) ** 0.5
    sortedDistIndicies = distances.argsort()  # row indices, nearest first
    classCount = {}
    for i in range(k):  # tally the labels of the k nearest neighbours
        label = labelSet[sortedDistIndicies[i]]
        classCount[label] = classCount.get(label, 0) + 1
    # the label with the most votes wins
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]

def img2vector(filename):  # Convert a 32x32 text image into a 1x1024 vector
    vector = np.zeros((1, 1024))
    with open(filename) as f:  # the file is closed when the block exits
        for i in range(32):
            fr = f.readline()
            for j in range(32):
                vector[0, 32 * i + j] = int(fr[j])
    return vector

def handwritingClassTest():  # Measure accuracy on the handwritten digit test set
    trainDir = r'machinelearninginaction\Ch02\digits\trainingDigits'
    testDir = r'machinelearninginaction\Ch02\digits\testDigits'
    filenameList = os.listdir(trainDir)
    m = len(filenameList)
    trainLabelList = []
    trainDataMatrix = np.zeros((m, 1024))
    for i in range(m):
        # the part of the file name before '_' is the digit label
        trainLabelList.append(int(filenameList[i].split('_')[0]))
        trainDataMatrix[i, :] = img2vector(os.path.join(trainDir, filenameList[i]))
    filenameList = os.listdir(testDir)
    m = len(filenameList)
    correct = 0.0
    for i in range(m):
        testLabel = int(filenameList[i].split('_')[0])
        testIn = img2vector(os.path.join(testDir, filenameList[i]))
        testOut = classify0(testIn, trainDataMatrix, trainLabelList, 3)
        if testOut == testLabel:
            correct += 1
        else:
            print("Error:the classifier came back with:{0}, the real answer is:{1}.".format(testOut, testLabel))
    print("the correct percent is:%.2f %%." % (correct * 100 / m))

if __name__ == '__main__':
    classifyPerson()  # dating prediction
    # handwritingClassTest()  # handwriting recognition
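
To see what the kNN step does without the data files, here is a minimal, self-contained sketch on a toy data set. The function name `knn_classify`, the four points, and the labels 'A' and 'B' are assumptions made for illustration; the logic follows the listing above (Euclidean distance, then a majority vote among the k nearest rows), with `collections.Counter` swapped in for the hand-written tally.

```python
import numpy as np
from collections import Counter

def knn_classify(x, data, labels, k):
    """Return the majority label among the k rows of data nearest to x."""
    # Euclidean distance from x to every row (broadcast over the rows)
    distances = np.sqrt(((data - x) ** 2).sum(axis=1))
    nearest = distances.argsort()[:k]  # indices of the k nearest rows
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two points near (1, 1) and two near (0, 0)
data = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
labels = ['A', 'A', 'B', 'B']

print(knn_classify(np.array([0.1, 0.2]), data, labels, 3))  # B
print(knn_classify(np.array([0.9, 1.0]), data, labels, 3))  # A
```

With k=3, each query point has two neighbours from its own cluster and one from the other, so the vote is 2 to 1.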

Dating prediction output:

Please input percentage of time spent playing video games?100
Please input frequent flier miles earned per year?8
Please input liters of ice cream consumed per year?200
You likely didntLike the man!
The estimated correct percent is 96.0%!

Process finished with exit code 0
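
The accuracy estimate in the listing is measured on rows that are also part of the reference set, so every test row counts itself among its 20 neighbours. A common alternative is to hold out part of the data and classify it against the remaining rows only. The following is a sketch of that idea under stated assumptions: `knn_predict`, `holdout_accuracy`, the seeds, and the synthetic two-cluster data are all hypothetical, not part of the original code.

```python
import numpy as np
from collections import Counter

def knn_predict(x, data, labels, k):
    """Majority label among the k rows of data nearest to x."""
    distances = np.sqrt(((data - x) ** 2).sum(axis=1))
    nearest = distances.argsort()[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

def holdout_accuracy(data, labels, ratio=0.1, k=3, seed=0):
    """Shuffle, hold out a fraction as test rows, classify them against the rest."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(data))
    n_test = int(ratio * len(data))
    test_idx, train_idx = order[:n_test], order[n_test:]
    # Min-max statistics come from the training rows only
    train = data[train_idx]
    lo = train.min(axis=0)
    span = train.max(axis=0) - lo
    train_norm = (train - lo) / span
    train_labels = [labels[i] for i in train_idx]
    correct = 0
    for i in test_idx:
        pred = knn_predict((data[i] - lo) / span, train_norm, train_labels, k)
        correct += int(pred == labels[i])
    return 100.0 * correct / n_test

# Synthetic, well-separated two-class data (hypothetical)
rng = np.random.default_rng(1)
a = rng.normal(loc=[0.0, 0.0], scale=0.1, size=(50, 2))
b = rng.normal(loc=[5.0, 5.0], scale=0.1, size=(50, 2))
data = np.vstack([a, b])
labels = ['a'] * 50 + ['b'] * 50

print(holdout_accuracy(data, labels))
```

On real data the held-out figure is usually a little lower than the in-sample one, because no test row can vote for itself.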

Handwriting recognition output:

Error:the classifier came back with:7, the real answer is:1.
Error:the classifier came back with:9, the real answer is:3.
Error:the classifier came back with:3, the real answer is:5.
Error:the classifier came back with:6, the real answer is:5.
Error:the classifier came back with:6, the real answer is:8.
Error:the classifier came back with:3, the real answer is:8.
Error:the classifier came back with:1, the real answer is:8.
Error:the classifier came back with:1, the real answer is:8.
Error:the classifier came back with:1, the real answer is:9.
Error:the classifier came back with:7, the real answer is:9.
the correct percent is:98.94 %.

Process finished with exit code 0
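
The error lines can be tallied by true digit to see where the classifier struggles. The sketch below copies the ten (predicted, actual) pairs from the output above; the variable names are illustrative only.

```python
from collections import Counter

# (predicted, actual) pairs copied from the ten error lines above
errors = [(7, 1), (9, 3), (3, 5), (6, 5), (6, 8),
          (3, 8), (1, 8), (1, 8), (1, 9), (7, 9)]

# Count the misclassifications per true digit, most frequent first
by_actual = Counter(actual for _, actual in errors)
for digit, count in by_actual.most_common():
    print("digit {0}: {1} misclassified".format(digit, count))
```

In this run the digit 8 accounts for four of the ten errors, and it is mistaken for 1 twice.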

Test data:
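
The data files themselves are not reproduced here. Judging from the parsing code, each line of datingTestSet.txt holds three tab-separated numeric features followed by a label, and each digit file holds 32 lines of 32 characters, each 0 or 1. The sketch below builds one sample of each format in memory; the sample values and the helper names `parse_dating_line` and `text_image_to_vector` are hypothetical.

```python
import numpy as np

def parse_dating_line(line):
    """Split one tab-separated line into three float features and a label."""
    fields = line.strip().split('\t')
    return [float(v) for v in fields[0:3]], fields[-1]

def text_image_to_vector(text):
    """Flatten 32 lines of 32 '0'/'1' characters into a 1x1024 array."""
    rows = text.splitlines()[:32]
    pixels = [[int(ch) for ch in row[:32]] for row in rows]
    return np.array(pixels).reshape(1, 1024)

# Hypothetical sample line in the dating file format
features, label = parse_dating_line("40920\t8.326976\t0.953952\tlargeDoses\n")
print(features, label)

# Hypothetical 32x32 image: a vertical bar in columns 14 to 17
image_text = "\n".join("0" * 14 + "1" * 4 + "0" * 14 for _ in range(32))
vector = text_image_to_vector(image_text)
print(vector.shape, int(vector.sum()))
```

Pixel (row i, column j) of the image lands at index 32*i + j of the vector, the same layout img2vector uses.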

Note: the code is adapted from Machine Learning in Action (《机器学习实战》).
