最近在学习机器学习,学习和积累和一些关于机器学习的算法,今天介绍一种机器学习里面各种分类算法的比较

#!/usr/bin/python
# -*- coding: utf-8 -*- """
=====================
Classifier comparison
===================== A comparison of a several classifiers in scikit-learn on synthetic datasets.
The point of this example is to illustrate the nature of decision boundaries
of different classifiers.
与其他的机器学习的分类的算法在合成数据方面相比较,本示例为了说明不同算法边界的性质。
This should be taken with a grain of salt, as the intuition conveyed by
these examples does not necessarily carry over to real datasets. Particularly in high-dimensional spaces, data can more easily be separated
linearly and the simplicity of classifiers such as naive Bayes and linear SVMs
might lead to better generalization than is achieved by other classifiers. The plots show training points in solid colors and testing points
semi-transparent. The lower right shows the classification accuracy on the test
set.
"""
print(__doc__) # Code source: Gaël Varoquaux
# Andreas Müller
# Modified for documentation by Jaques Grobler
# License: BSD 3 clause import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import make_moons, make_circles, make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis h = .02 # step size in the mesh names = ["Nearest Neighbors", "Linear SVM", "RBF SVM", "Gaussian Process",
"Decision Tree", "Random Forest", "Neural Net", "AdaBoost",
"Naive Bayes", "QDA"] classifiers = [
KNeighborsClassifier(3),
SVC(kernel="linear", C=0.025),
SVC(gamma=2, C=1),
GaussianProcessClassifier(1.0 * RBF(1.0), warm_start=True),
DecisionTreeClassifier(max_depth=5),
RandomForestClassifier(max_depth=5, n_estimators=10, max_features=1),
MLPClassifier(alpha=1),
AdaBoostClassifier(),
GaussianNB(),
QuadraticDiscriminantAnalysis()] X, y = make_classification(n_features=2, n_redundant=0, n_informative=2,
random_state=1, n_clusters_per_class=1)
#print X
#print len(y)
rng = np.random.RandomState(2)
#print X.shape
X+= 2 * rng.uniform(size=X.shape)
#print X
linearly_separable = (X, y) datasets = [make_moons(noise=0.3, random_state=0),
make_circles(noise=0.2, factor=0.5, random_state=1),
linearly_separable
] figure = plt.figure(figsize=(27, 9))
i = 1
# iterate over datasets
for ds_cnt, ds in enumerate(datasets):
'''
上面的循环 ds_cnt是从0-datasets的长度变换
ds 代表datasets的每个值,在这里相当于每个数据生成方法的返回值
'''
# preprocess dataset, split into training and test part
'''
将 ds 的返回值赋值给X,y
'''
X, y = ds
'''
标准化,均值去除和按方差比例缩放数据集的标准化:
当个体特征太过或明显不遵从高斯正态分布时,标准化表现的效果较差。
实际操作中,经常忽略特征数据的分布形状,移除每个特征均值,划分离散特征的标准差,从而等级化,进而实现数据中心化。
通过删除平均值和缩放到单位方差来标准化特征 '''
X = StandardScaler().fit_transform(X)
'''
定义了四个变量 '''
'''
利用数据分割函数将数据分为训练数据集和测试数据集
以及训练数据集和测试数据集对应的整数标签
'''
X_train, X_test, y_train, y_test =train_test_split(X, y, test_size=.4, random_state=42) ''' '''
x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5
#print X[:, 0]
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
np.arange(y_min, y_max, h)) # just plot the dataset first
cm = plt.cm.RdBu
''' 红色和蓝色
'''
cm_bright = ListedColormap(['#FF0000', '#0000FF'])
ax = plt.subplot(len(datasets), len(classifiers) + 1, i)
if ds_cnt == 0:
ax.set_title("Input data")
# Plot the training points
'''
scatter函数绘制散列图:
'''
'''
深红色和深蓝色是划分出来的训练数据
'''
ax.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=cm_bright)
# and testing points
'''
浅红色和浅蓝色是划分出来的测试数据
这样就形成了四种颜色的数据 '''
ax.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap=cm_bright, alpha=0.6) ax.set_xlim(xx.min(), xx.max())
ax.set_ylim(yy.min(), yy.max())
ax.set_xticks(())
ax.set_yticks(())
i += 1
# iterate over classifiers
for name, clf in zip(names, classifiers):
ax = plt.subplot(len(datasets), len(classifiers) + 1, i)
clf.fit(X_train, y_train)
score = clf.score(X_test, y_test) # Plot the decision boundary. For that, we will assign a color to each
# point in the mesh [x_min, x_max]x[y_min, y_max].
if hasattr(clf, "decision_function"):
Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
else:
Z = clf.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:, 1] # Put the result into a color plot
Z = Z.reshape(xx.shape)
ax.contourf(xx, yy, Z, cmap=cm, alpha=.8) # Plot also the training points
ax.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=cm_bright)
# and testing points
ax.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap=cm_bright,
alpha=0.6) ax.set_xlim(xx.min(), xx.max())
ax.set_ylim(yy.min(), yy.max())
ax.set_xticks(())
ax.set_yticks(())
if ds_cnt == 0:
ax.set_title(name)
ax.text(xx.max() - .3, yy.min() + .3, ('%.2f' % score).lstrip(''),
size=15, horizontalalignment='right')
i += 1 plt.tight_layout()
plt.show()

机器学习--Classifier comparison的更多相关文章

  1. Google发布机器学习平台Tensorflow游乐场~带你玩神经网络(转载)

    Google发布机器学习平台Tensorflow游乐场-带你玩神经网络 原文地址:http://f.dataguru.cn/article-9324-1.html> 摘要: 昨天,Google发 ...

  2. [Bayesian] “我是bayesian我怕谁”系列 - Gaussian Process

    科班出身,贝叶斯护体,正本清源,故拿”九阳神功“自比,而非邪气十足的”九阴真经“: 现在看来,此前的八层功力都为这第九层作基础: 本系列第九篇,助/祝你早日hold住神功第九重,加入血统纯正的人工智能 ...

  3. 学习笔记之scikit-learn

    scikit-learn: machine learning in Python — scikit-learn 0.20.0 documentation https://scikit-learn.or ...

  4. sklearn中的数据预处理----good!! 标准化 归一化 在何时使用

    RESCALING attribute data to values to scale the range in [0, 1] or [−1, 1] is useful for the optimiz ...

  5. Tensorflow游乐场

    昨天,Google发布了Tensorflow游乐场.Tensorflow是Google今年推出的机器学习开源平台.而有了Tensorflow游乐场,我们在浏览器中就可以训练自己的神经网络,还有酷酷的图 ...

  6. 机器学习算法 --- Naive Bayes classifier

    一.引言 在开始算法介绍之前,让我们先来思考一个问题,假设今天你准备出去登山,但起床后发现今天早晨的天气是多云,那么你今天是否应该选择出去呢? 你有最近这一个月的天气情况数据如下,请做出判断. 这个月 ...

  7. 机器学习:eclipse中调用weka的Classifier分类器代码Demo

    weka中实现了很多机器学习算法,不管实验室研究或者公司研发,都会或多或少的要使用weka,我的理解是weka是在本地的SparkML,SparkML是分布式的大数据处理机器学习算法,数据量不是很大的 ...

  8. 从线性模型(linear model)衍生出的机器学习分类器(classifier)

    1. 线性模型简介 0x1:线性模型的现实意义 在一个理想的连续世界中,任何非线性的东西都可以被线性的东西来拟合(参考Taylor Expansion公式),所以理论上线性模型可以模拟物理世界中的绝大 ...

  9. 机器学习---朴素贝叶斯分类器(Machine Learning Naive Bayes Classifier)

    朴素贝叶斯分类器是一组简单快速的分类算法.网上已经有很多文章介绍,比如这篇写得比较好:https://blog.csdn.net/sinat_36246371/article/details/6014 ...

随机推荐

  1. Azure

    ylbtech-Miscellaneos:Azure A,返回顶部 1, Windows Azure是微软基于云计算的操作系统,现在更名为“Microsoft Azure”,和Azure Servic ...

  2. Axure 使用心得总结

    Axure的本意是高效快捷的完成原型制作,能够清晰的说明功能,交互就是好的,"够漂亮"就行,不需要做到很完美,至于完美还是交给专业的UI吧. 一些心得记录下来: 1.下载一些常用的 ...

  3. windows apache24 php Call to undefined function curl_init

    check dll files in dir: Apache24/bin libssh2.dll, ssleay32.dll, libeay32.dll http://nz.php.net/manua ...

  4. ECS挂载数据盘

    1.先在阿里控制台挂载硬盘: 2.df -h 确认没有分区 3.fdisk -l 4.fdisk /dev/xvdb 分区 根据提示m-n-p-1-Enter-Enter-w 5.fdisk -l 查 ...

  5. WPF 绑定的校验

    为进行校验,必须准备一个RangeValidationRule类,该类继承自ValidationRule 该类实现如下: class RangeValidationRule : ValidationR ...

  6. winform窗体之间通过 windows API SendMessage函数传值

    -----------------------------------------------------------‘接收窗体’代码.cs------------------------------ ...

  7. Retina屏实现1px边框

    问题描述 通常我们实现边框的方法都是设置1px的边框,但是在retina屏上因为设备像素比的不同,边框在移动设备上的表现也不相同,例如在devicePixelRatio = 2的retina屏下会显示 ...

  8. 通过Nginx+tomcat+redis实现反向代理 、负载均衡及session同步

    一直对于负载均衡比较陌生,今天尝试着去了解了一下,并做了一个小的实验,对于这个概念有一些认识,在此做一个简单的总结 什么是负载均衡 负载均衡,英文 名称为Load Balance,指由多台服务器以对称 ...

  9. QTP学习笔记之—VBS

    1.ToString() : Returns a string that represents the current test object. Example The following examp ...

  10. 时间序列分析之ARIMA模型预测__R篇

    http://www.cnblogs.com/bicoffee/p/3838049.html