机器学习--Classifier comparison

最近在学习机器学习，学习和积累和一些关于机器学习的算法，今天介绍一种机器学习里面各种分类算法的比较

#!/usr/bin/python

# -*- coding: utf-8 -*-

"""

=====================

Classifier comparison

=====================

A comparison of a several classifiers in scikit-learn on synthetic datasets.

The point of this example is to illustrate the nature of decision boundaries

of different classifiers.

与其他的机器学习的分类的算法在合成数据方面相比较，本示例为了说明不同算法边界的性质。

This should be taken with a grain of salt, as the intuition conveyed by

these examples does not necessarily carry over to real datasets.

Particularly in high-dimensional spaces, data can more easily be separated

linearly and the simplicity of classifiers such as naive Bayes and linear SVMs

might lead to better generalization than is achieved by other classifiers.

The plots show training points in solid colors and testing points

semi-transparent. The lower right shows the classification accuracy on the test

set.

"""

print(__doc__)

# Code source: Gaël Varoquaux

#              Andreas Müller

# Modified for documentation by Jaques Grobler

# License: BSD 3 clause

import numpy as np

import matplotlib.pyplot as plt

from matplotlib.colors import ListedColormap

from sklearn.model_selection import train_test_split

from sklearn.preprocessing import StandardScaler

from sklearn.datasets import make_moons, make_circles, make_classification

from sklearn.neural_network import MLPClassifier

from sklearn.neighbors import KNeighborsClassifier

from sklearn.svm import SVC

from sklearn.gaussian_process import GaussianProcessClassifier

from sklearn.gaussian_process.kernels import RBF

from sklearn.tree import DecisionTreeClassifier

from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier

from sklearn.naive_bayes import GaussianNB

from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

h = .02  # step size in the mesh

names = ["Nearest Neighbors", "Linear SVM", "RBF SVM", "Gaussian Process",

         "Decision Tree", "Random Forest", "Neural Net", "AdaBoost",

         "Naive Bayes", "QDA"]

classifiers = [

    KNeighborsClassifier(3),

    SVC(kernel="linear", C=0.025),

    SVC(gamma=2, C=1),

    GaussianProcessClassifier(1.0 * RBF(1.0), warm_start=True),

    DecisionTreeClassifier(max_depth=5),

    RandomForestClassifier(max_depth=5, n_estimators=10, max_features=1),

    MLPClassifier(alpha=1),

    AdaBoostClassifier(),

    GaussianNB(),

    QuadraticDiscriminantAnalysis()]

X, y = make_classification(n_features=2, n_redundant=0, n_informative=2,

                           random_state=1, n_clusters_per_class=1)

#print X

#print len(y)

rng = np.random.RandomState(2)

#print X.shape

X+= 2 * rng.uniform(size=X.shape)

#print X

linearly_separable = (X, y)

datasets = [make_moons(noise=0.3, random_state=0),

            make_circles(noise=0.2, factor=0.5, random_state=1),

            linearly_separable

            ]

figure = plt.figure(figsize=(27, 9))

i = 1

# iterate over datasets

for ds_cnt, ds in enumerate(datasets):

    '''

    上面的循环 ds_cnt是从0-datasets的长度变换

    ds 代表datasets的每个值，在这里相当于每个数据生成方法的返回值

    '''

    # preprocess dataset, split into training and test part

    '''

    将 ds 的返回值赋值给X,y

    '''

    X, y = ds

    '''

    标准化，均值去除和按方差比例缩放数据集的标准化：

    当个体特征太过或明显不遵从高斯正态分布时，标准化表现的效果较差。

    实际操作中，经常忽略特征数据的分布形状，移除每个特征均值，划分离散特征的标准差，从而等级化，进而实现数据中心化。

    通过删除平均值和缩放到单位方差来标准化特征

    '''

    X = StandardScaler().fit_transform(X)

    '''

    定义了四个变量

    '''

    '''

    利用数据分割函数将数据分为训练数据集和测试数据集

    以及训练数据集和测试数据集对应的整数标签

    '''

    X_train, X_test, y_train, y_test =train_test_split(X, y, test_size=.4, random_state=42)

    '''

    '''

    x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5

    y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5

    #print X[:, 0]

    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),

                         np.arange(y_min, y_max, h))

    # just plot the dataset first

    cm = plt.cm.RdBu

    '''

    红色和蓝色

    '''

    cm_bright = ListedColormap(['#FF0000', '#0000FF'])

    ax = plt.subplot(len(datasets), len(classifiers) + 1, i)

    if ds_cnt == 0:

        ax.set_title("Input data")

    # Plot the training points

    '''

    scatter函数绘制散列图:

    '''

    '''

    深红色和深蓝色是划分出来的训练数据

    '''

    ax.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=cm_bright)

    # and testing points

    '''

    浅红色和浅蓝色是划分出来的测试数据

    这样就形成了四种颜色的数据

    '''

    ax.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap=cm_bright, alpha=0.6)

    ax.set_xlim(xx.min(), xx.max())

    ax.set_ylim(yy.min(), yy.max())

    ax.set_xticks(())

    ax.set_yticks(())

    i += 1

    # iterate over classifiers

    for name, clf in zip(names, classifiers):

        ax = plt.subplot(len(datasets), len(classifiers) + 1, i)

        clf.fit(X_train, y_train)

        score = clf.score(X_test, y_test)

        # Plot the decision boundary. For that, we will assign a color to each

        # point in the mesh [x_min, x_max]x[y_min, y_max].

        if hasattr(clf, "decision_function"):

            Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])

        else:

            Z = clf.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:, 1]

        # Put the result into a color plot

        Z = Z.reshape(xx.shape)

        ax.contourf(xx, yy, Z, cmap=cm, alpha=.8)

        # Plot also the training points

        ax.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=cm_bright)

        # and testing points

        ax.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap=cm_bright,

                   alpha=0.6)

        ax.set_xlim(xx.min(), xx.max())

        ax.set_ylim(yy.min(), yy.max())

        ax.set_xticks(())

        ax.set_yticks(())

        if ds_cnt == 0:

            ax.set_title(name)

        ax.text(xx.max() - .3, yy.min() + .3, ('%.2f' % score).lstrip(''),

                size=15, horizontalalignment='right')

        i += 1

plt.tight_layout()

plt.show()

机器学习--Classifier comparison的更多相关文章

Google发布机器学习平台Tensorflow游乐场～带你玩神经网络（转载）
Google发布机器学习平台Tensorflow游乐场-带你玩神经网络原文地址:http://f.dataguru.cn/article-9324-1.html> 摘要: 昨天,Google发 ...
[Bayesian] “我是bayesian我怕谁”系列 - Gaussian Process
科班出身,贝叶斯护体,正本清源,故拿”九阳神功“自比,而非邪气十足的”九阴真经“: 现在看来,此前的八层功力都为这第九层作基础: 本系列第九篇,助/祝你早日hold住神功第九重,加入血统纯正的人工智能 ...
学习笔记之scikit-learn
scikit-learn: machine learning in Python — scikit-learn 0.20.0 documentation https://scikit-learn.or ...
sklearn中的数据预处理----good!! 标准化归一化在何时使用
RESCALING attribute data to values to scale the range in [0, 1] or [−1, 1] is useful for the optimiz ...
Tensorflow游乐场
昨天,Google发布了Tensorflow游乐场.Tensorflow是Google今年推出的机器学习开源平台.而有了Tensorflow游乐场,我们在浏览器中就可以训练自己的神经网络,还有酷酷的图 ...
机器学习算法 --- Naive Bayes classifier
一.引言在开始算法介绍之前,让我们先来思考一个问题,假设今天你准备出去登山,但起床后发现今天早晨的天气是多云,那么你今天是否应该选择出去呢? 你有最近这一个月的天气情况数据如下,请做出判断. 这个月 ...
机器学习：eclipse中调用weka的Classifier分类器代码Demo
weka中实现了很多机器学习算法,不管实验室研究或者公司研发,都会或多或少的要使用weka,我的理解是weka是在本地的SparkML,SparkML是分布式的大数据处理机器学习算法,数据量不是很大的 ...
从线性模型（linear model）衍生出的机器学习分类器（classifier）
1. 线性模型简介 0x1:线性模型的现实意义在一个理想的连续世界中,任何非线性的东西都可以被线性的东西来拟合(参考Taylor Expansion公式),所以理论上线性模型可以模拟物理世界中的绝大 ...
机器学习---朴素贝叶斯分类器（Machine Learning Naive Bayes Classifier）
朴素贝叶斯分类器是一组简单快速的分类算法.网上已经有很多文章介绍,比如这篇写得比较好:https://blog.csdn.net/sinat_36246371/article/details/6014 ...

随机推荐

虚拟机拷贝后网卡eth0变成了eth1的解决办法
一.修改/etc/udev/rules.d/70-persistent-net.rules文件将之前的eth0那行删了,将eth1改为eth0 二.配置ifcfg-eth0脚本,注意HWADDR那行 ...
转 Dynamics CRM Alert and Notification JavaScript Methods
http://www.powerobjects.com/2015/09/23/dynamics-crm-alert-and-notification-javascript-methods/ Befor ...
[SQL]复制数据库某一个表到另一个数据库中
SQL:复制数据库某一个表到另一个数据库中 SELECT * INTO 表1 FROM 表2 --复制表2如果只复制结构而不复制内容或只复制某一列只要加WHERE条件就好了例子:SELECT * I ...
[DFNews] Touch ID不是神话，指模依旧能搞定。
扫描制作翻模,使用含石墨硅胶压膜,前者复制指纹纹路,后者欺骗活体检测.
windows环境PhpStorm中简单使用PHP_CodeSniffer规范php代码
为什么使用PHP_CodeSniffer 一个开发团队统一的编码风格,有助于他人对代码的理解和维护,对于大项目来说尤其重要. PHP_CodeSniffer是PEAR中的一个用PHP5写的用来检查嗅探 ...
SQL Server选项综述
I. 基本概念 SQL Server中的选项根据其作用范围分为如下几类: 实例选项 —— 在数据库实例范围内有效,通过 sp_configure 存储过程进行配置. 数据库选项 —— 在数据库范围内有 ...
动手学servlet(五) 共享变量
1. 无论对象的作用域如何,设置和读取共享变量的方法是一致的 -setAttribute("varName",obj); -getAttribute("varName&q ...
在 Arch Linux 玩百度 Flash 战曲游戏乱码
#!/bin/sh #From: http://hi.baidu.com/imtinge/item/3516761d314481542b3e22f0 #Info: CJK Unicode font M ...
PowerDesigner设计时表显示注释选项
PowerDesigner设计时表显示注释选项:选定编辑的表,右键- >Properties- >Columns- >Customize Columns and Filter(或直接 ...
Hadoop_配置_linux下编译eclipse插件
使用的hadoop版本为hadoop-1.2.1(对应的含源码的安装包为hadoop-1.2.1.tar.gz) 将hadoop和eclipse都解压在home中的用户目录下 /home/chen/h ...

机器学习--Classifier comparison

机器学习--Classifier comparison的更多相关文章

随机推荐

热门专题