sklearn Study Notes 1

Image recognition with Support Vector Machines

#our dataset is provided within scikit-learn

#let's start by importing and printing its description

import sklearn as sk

import numpy as np

import matplotlib.pyplot as plt

from sklearn.datasets import fetch_olivetti_faces

faces = fetch_olivetti_faces()

print(faces.DESCR)

Modified Olivetti faces dataset. The original database was available from (now defunct)

http://www.uk.research.att.com/facedatabase.html

The version retrieved here comes in MATLAB format from the personal web page of Sam Roweis:

http://www.cs.nyu.edu/~roweis/

There are ten different images of each of 40 distinct subjects. For some subjects, the images were taken at different times, varying the lighting, facial expressions (open / closed eyes, smiling / not smiling) and facial details (glasses / no glasses). All the images were taken against a dark homogeneous background with the subjects in an upright, frontal position (with tolerance for some side movement). The original dataset consisted of 92 x 112, while the Roweis version consists of 64x64 images.

print(faces.keys())

print(faces.images.shape)

print(faces.data.shape)

print(faces.target.shape)

print(np.max(faces.data))

print(np.min(faces.data))

print(np.mean(faces.data))

Before learning, let’s plot some faces.

def print_faces(images, target, top_n):

    #set up the figure size in inches

    fig = plt.figure(figsize=(12, 12))

    fig.subplots_adjust(left = 0, right = 1, bottom = 0, top = 1, hspace = 0.05,wspace = 0.05)

    for i in range(top_n):

        #plot the images in a matrix of 20x20

        p = fig.add_subplot(20, 20, i + 1, xticks = [], yticks = [])

        p.imshow(images[i], cmap = plt.cm.bone)

        #label the image with target value

        p.text(0, 14, str(target[i]))

        p.text(0, 60, str(i))

    #display the figure (needed when running as a script rather than in a notebook)

    plt.show()

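If we print the first 20 images, we can see faces from two people. (Note: the images did not display when I ran this, and I was not sure why. When running as a script rather than in a notebook, a figure only appears once plt.show() is called.)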

print_faces(faces.images, faces.target, 20)

Training a Support Vector Machine

Import the SVC class from the sklearn.svm module:

from sklearn.svm import SVC

To start, we will use the simplest kernel, the linear one:

svc_1 = SVC(kernel = 'linear')

Before continuing, we will split our dataset into training and testing datasets.

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(faces.data,faces.target, test_size = 0.25, random_state = 0)
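
As a rough illustration of what this split does, the sketch below shuffles the 400 sample indices with a fixed seed and holds out 25% of them. It is a plain-Python approximation, not scikit-learn's implementation: the helper name `simple_split` is hypothetical, and the exact indices it picks will differ from those chosen by `train_test_split`.

```python
import math
import random

def simple_split(n_samples, test_size=0.25, seed=0):
    """Hypothetical helper: shuffle indices and hold out a fraction for testing."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)      # fixed seed gives a repeatable split
    n_test = math.ceil(n_samples * test_size)
    return indices[n_test:], indices[:n_test]  # (train indices, test indices)

train_idx, test_idx = simple_split(400)
print(len(train_idx), len(test_idx))
```

For the 400 faces this leaves 300 samples for training and 100 for testing, which is why the classification reports further down show a total support of 100.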

And we will define a function to evaluate K-fold cross-validation.

from sklearn.model_selection import cross_val_score, KFold

from scipy.stats import sem

def evaluate_cross_validation(clf, X, y, K):

    #create a k-fold cross validation iterator

    cv = KFold(n_splits=K, shuffle=True, random_state=0)

    #by default the score used is the one returned by the score method of the estimator (accuracy)

    scores = cross_val_score(clf, X, y, cv = cv)

    print(scores)

    print(("Mean score: {0: .3f} (+/-{1: .3f})").format(np.mean(scores), sem(scores)))

evaluate_cross_validation(svc_1, X_train, y_train, 5)

[ 0.93333333  0.86666667  0.91666667  0.93333333  0.91666667]

Mean score:  0.913 (+/- 0.012)
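
The printed numbers can be checked by hand. With 300 training samples and K = 5, each fold holds out 60 samples, so every score is a multiple of 1/60. The sketch below recomputes the mean and the standard error of the mean using only the standard library, assuming (as `scipy.stats.sem` does by default) the sample standard deviation with ddof=1.

```python
import math
import statistics

# fold scores copied from the output above
scores = [0.93333333, 0.86666667, 0.91666667, 0.93333333, 0.91666667]

mean_score = statistics.mean(scores)
# standard error of the mean: sample standard deviation / sqrt(n)
std_error = statistics.stdev(scores) / math.sqrt(len(scores))

print("Mean score: {0:.3f} (+/-{1:.3f})".format(mean_score, std_error))

# each fold has 300 / 5 = 60 samples, so a score times 60 is a count of correct predictions
correct_per_fold = [round(s * 60) for s in scores]
print(correct_per_fold)
```

The counts come out as 56, 52, 55, 56 and 55 correct predictions out of 60, so the spread between folds is only a handful of images.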

We will also define a function to perform training on the training set and evaluate the performance on the testing set.

from sklearn import metrics

def train_and_evaluate(clf, X_train, X_test, y_train, y_test):

    clf.fit(X_train, y_train)

    print("Accuracy on training set:")

    print(clf.score(X_train, y_train))

    print("Accuracy on testing set:")

    print(clf.score(X_test, y_test))

    y_pred = clf.predict(X_test)

    print("Classification Report:")

    print(metrics.classification_report(y_test, y_pred))

    print("Confusion Matrix:")

    print(metrics.confusion_matrix(y_test, y_pred))

train_and_evaluate(svc_1, X_train, X_test, y_train, y_test)

Accuracy on training set:

1.0

Accuracy on testing set:

0.99

Classifying faces as people with and without glasses

The first thing to do is to define the ranges of images that show faces wearing glasses.
The following list shows the index ranges of these images:

# the index ranges of images of people with glasses

glasses = [

(10, 19), (30, 32), (37, 38), (50, 59), (63, 64),

(69, 69), (120, 121), (124, 129), (130, 139), (160, 161),

(164, 169), (180, 182), (185, 185), (189, 189), (190, 192),

(194, 194), (196, 199), (260, 269), (270, 279), (300, 309),

(330, 339), (358, 359), (360, 369)

]

Then we'll define a function that takes those segments and returns a new target array, marking faces with glasses as 1 and faces without glasses as 0 (our new target classes):

def create_target(segments):

    # create a new y array of target size initialized with zeros

    y = np.zeros(faces.target.shape[0])

    # put 1 in the specified segments

    for (start, end) in segments:

        y[start:end + 1] = 1

    return y

target_glasses = create_target(glasses)
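
Because each segment is an inclusive (start, end) pair, it is easy to be off by one when filling the array. The plain-Python sketch below expands the same ranges and counts the marked images as a sanity check; `expand_segments` is a hypothetical helper, not part of the original code.

```python
# inclusive (start, end) index ranges, copied from the list above
glasses = [
    (10, 19), (30, 32), (37, 38), (50, 59), (63, 64),
    (69, 69), (120, 121), (124, 129), (130, 139), (160, 161),
    (164, 169), (180, 182), (185, 185), (189, 189), (190, 192),
    (194, 194), (196, 199), (260, 269), (270, 279), (300, 309),
    (330, 339), (358, 359), (360, 369)
]

def expand_segments(segments):
    """Hypothetical helper: return every index covered by the inclusive ranges."""
    covered = []
    for start, end in segments:
        covered.extend(range(start, end + 1))  # end + 1 because `end` is inclusive
    return covered

with_glasses = expand_segments(glasses)
print(len(with_glasses))                              # images marked as 1
print(sum(1 for i in with_glasses if 30 <= i <= 39))  # marked images among indexes 30 to 39
```

This gives 119 images with glasses out of 400, and 5 of the 10 images with indexes 30 to 39.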

So we must perform the training/testing split again.

X_train, X_test, y_train, y_test = train_test_split(faces.data, target_glasses, test_size=0.25, random_state=0)

Now let's create a new SVC classifier, which we will train with the new target vector:

svc_2 = SVC(kernel='linear')

We can check the performance with cross-validation using the following code:

evaluate_cross_validation(svc_2, X_train, y_train, 5)

[ 1.          0.95        0.98333333  0.98333333  0.93333333]

Mean score:  0.970 (+/- 0.012)

We obtain a mean accuracy of 0.970 with cross-validation. Next, we evaluate on our testing set:

train_and_evaluate(svc_2, X_train, X_test, y_train, y_test)

Accuracy on training set:

1.0

Accuracy on testing set:

0.99

Classification Report:

             precision    recall  f1-score   support

        0.0       1.00      0.99      0.99        67

        1.0       0.97      1.00      0.99        33

avg / total       0.99      0.99      0.99       100

Confusion Matrix:

[[66  1]

 [ 0 33]]
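
The figures in the classification report follow directly from the confusion matrix, whose rows are the true classes and whose columns are the predicted classes. The sketch below recomputes them in plain Python; `class_metrics` is a hypothetical helper written for this illustration.

```python
# confusion matrix from the output above: rows are true classes, columns are predictions
cm = [[66, 1],
      [0, 33]]

def class_metrics(cm, k):
    """Precision, recall and F1 for class k of a square confusion matrix."""
    true_pos = cm[k][k]
    predicted_k = sum(row[k] for row in cm)  # column sum: everything predicted as k
    actual_k = sum(cm[k])                    # row sum: everything that really is k
    precision = true_pos / predicted_k
    recall = true_pos / actual_k
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

for k in range(2):
    print(k, ["{0:.2f}".format(v) for v in class_metrics(cm, k)])

# accuracy: correct predictions (the diagonal) over all predictions
accuracy = sum(cm[k][k] for k in range(2)) / sum(map(sum, cm))
print(accuracy)
```

The single error is a face without glasses predicted as wearing them, which lowers recall for class 0 (66 of 67) and precision for class 1 (33 of 34).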

Could it be that our classifier has learned to recognize the faces of particular people who do or do not wear glasses, rather than the glasses themselves? How can we be sure that this is not happening, and that the classifier will work as expected on new, unseen faces? Let's hold out all the images of one person who sometimes wears glasses and sometimes does not: the ones with indexes from 30 to 39. We train on the remaining instances and evaluate on this new 10-instance set. With this experiment we try to rule out the possibility that the classifier is remembering faces rather than glasses-related features.

X_test = faces.data[30:40]

y_test = target_glasses[30:40]

print(y_test.shape[0])

select = np.ones(target_glasses.shape[0])

select[30:40] = 0

X_train = faces.data[select == 1]

y_train = target_glasses[select == 1]

print(y_train.shape[0])

svc_3 = SVC(kernel='linear')

train_and_evaluate(svc_3, X_train, X_test, y_train, y_test)

10

390

Accuracy on training set:

1.0

Accuracy on testing set:

0.9

Classification Report:

             precision    recall  f1-score   support

        0.0       0.83      1.00      0.91         5

        1.0       1.00      0.80      0.89         5

avg / total       0.92      0.90      0.90        10

Confusion Matrix:

[[5 0]

 [1 4]]
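
The same hold-out idea can be written for any subject. The sketch below assumes the images are stored in subject order, ten per subject (so indexes 30 to 39 belong to one person, as the slice above implies). The helper `holdout_subject` is hypothetical and only computes index lists.

```python
def holdout_subject(subject, n_samples=400, per_subject=10):
    """Hypothetical helper: split indices so one subject's images form the test set.

    Assumes images are ordered by subject, with `per_subject` images each.
    """
    start = subject * per_subject
    test_idx = list(range(start, start + per_subject))
    held_out = set(test_idx)
    train_idx = [i for i in range(n_samples) if i not in held_out]
    return train_idx, test_idx

train_idx, test_idx = holdout_subject(3)
print(len(test_idx), len(train_idx))
print(test_idx[0], test_idx[-1])
```

For subject 3 this reproduces the 10 / 390 split used above. Repeating the experiment for other subjects would give a broader picture than a single held-out person.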

Of the 10 images, only one was misclassified, which is still a pretty good result. Let's check which one it was. First, we have to reshape the data from arrays to 64 x 64 matrices:

y_pred = svc_3.predict(X_test)

eval_faces = [np.reshape(a, (64, 64)) for a in X_test]
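
Each row of `X_test` is a flat vector of 4096 values. With its default row-major order, `np.reshape` puts the first 64 values in the first image row, the next 64 in the second, and so on. The plain-Python sketch below shows the same index mapping; `to_rows` is a hypothetical helper used only for illustration.

```python
def to_rows(flat, width=64):
    """Hypothetical helper: cut a flat pixel list into rows of `width` values (row-major)."""
    return [flat[i:i + width] for i in range(0, len(flat), width)]

flat = list(range(64 * 64))  # stand-in for one flattened 64 x 64 image
image = to_rows(flat)

print(len(image), len(image[0]))

# the pixel at row r, column c comes from flat index r * 64 + c
r, c = 2, 5
print(image[r][c] == flat[r * 64 + c])
```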

Then plot with our print_faces function:

print_faces(eval_faces, y_pred, 10)

Image number 8 in the preceding figure has glasses and was classified as no glasses. If we look at that instance, we can see that it differs from the rest of the images with glasses (the border of the glasses cannot be seen clearly and the person has closed eyes), which could be the reason it was misclassified.
