机器学习--Classifier comparison
最近在学习机器学习,学习和积累和一些关于机器学习的算法,今天介绍一种机器学习里面各种分类算法的比较
#!/usr/bin/python
# -*- coding: utf-8 -*- """
=====================
Classifier comparison
===================== A comparison of a several classifiers in scikit-learn on synthetic datasets.
The point of this example is to illustrate the nature of decision boundaries
of different classifiers.
与其他的机器学习的分类的算法在合成数据方面相比较,本示例为了说明不同算法边界的性质。
This should be taken with a grain of salt, as the intuition conveyed by
these examples does not necessarily carry over to real datasets. Particularly in high-dimensional spaces, data can more easily be separated
linearly and the simplicity of classifiers such as naive Bayes and linear SVMs
might lead to better generalization than is achieved by other classifiers. The plots show training points in solid colors and testing points
semi-transparent. The lower right shows the classification accuracy on the test
set.
"""
print(__doc__) # Code source: Gaël Varoquaux
# Andreas Müller
# Modified for documentation by Jaques Grobler
# License: BSD 3 clause import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import make_moons, make_circles, make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis h = .02 # step size in the mesh names = ["Nearest Neighbors", "Linear SVM", "RBF SVM", "Gaussian Process",
"Decision Tree", "Random Forest", "Neural Net", "AdaBoost",
"Naive Bayes", "QDA"] classifiers = [
KNeighborsClassifier(3),
SVC(kernel="linear", C=0.025),
SVC(gamma=2, C=1),
GaussianProcessClassifier(1.0 * RBF(1.0), warm_start=True),
DecisionTreeClassifier(max_depth=5),
RandomForestClassifier(max_depth=5, n_estimators=10, max_features=1),
MLPClassifier(alpha=1),
AdaBoostClassifier(),
GaussianNB(),
QuadraticDiscriminantAnalysis()] X, y = make_classification(n_features=2, n_redundant=0, n_informative=2,
random_state=1, n_clusters_per_class=1)
#print X
#print len(y)
rng = np.random.RandomState(2)
#print X.shape
X+= 2 * rng.uniform(size=X.shape)
#print X
linearly_separable = (X, y) datasets = [make_moons(noise=0.3, random_state=0),
make_circles(noise=0.2, factor=0.5, random_state=1),
linearly_separable
] figure = plt.figure(figsize=(27, 9))
i = 1
# iterate over datasets
for ds_cnt, ds in enumerate(datasets):
'''
上面的循环 ds_cnt是从0-datasets的长度变换
ds 代表datasets的每个值,在这里相当于每个数据生成方法的返回值
'''
# preprocess dataset, split into training and test part
'''
将 ds 的返回值赋值给X,y
'''
X, y = ds
'''
标准化,均值去除和按方差比例缩放数据集的标准化:
当个体特征太过或明显不遵从高斯正态分布时,标准化表现的效果较差。
实际操作中,经常忽略特征数据的分布形状,移除每个特征均值,划分离散特征的标准差,从而等级化,进而实现数据中心化。
通过删除平均值和缩放到单位方差来标准化特征 '''
X = StandardScaler().fit_transform(X)
'''
定义了四个变量 '''
'''
利用数据分割函数将数据分为训练数据集和测试数据集
以及训练数据集和测试数据集对应的整数标签
'''
X_train, X_test, y_train, y_test =train_test_split(X, y, test_size=.4, random_state=42) ''' '''
x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5
#print X[:, 0]
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
np.arange(y_min, y_max, h)) # just plot the dataset first
cm = plt.cm.RdBu
''' 红色和蓝色
'''
cm_bright = ListedColormap(['#FF0000', '#0000FF'])
ax = plt.subplot(len(datasets), len(classifiers) + 1, i)
if ds_cnt == 0:
ax.set_title("Input data")
# Plot the training points
'''
scatter函数绘制散列图:
'''
'''
深红色和深蓝色是划分出来的训练数据
'''
ax.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=cm_bright)
# and testing points
'''
浅红色和浅蓝色是划分出来的测试数据
这样就形成了四种颜色的数据 '''
ax.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap=cm_bright, alpha=0.6) ax.set_xlim(xx.min(), xx.max())
ax.set_ylim(yy.min(), yy.max())
ax.set_xticks(())
ax.set_yticks(())
i += 1
# iterate over classifiers
for name, clf in zip(names, classifiers):
ax = plt.subplot(len(datasets), len(classifiers) + 1, i)
clf.fit(X_train, y_train)
score = clf.score(X_test, y_test) # Plot the decision boundary. For that, we will assign a color to each
# point in the mesh [x_min, x_max]x[y_min, y_max].
if hasattr(clf, "decision_function"):
Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
else:
Z = clf.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:, 1] # Put the result into a color plot
Z = Z.reshape(xx.shape)
ax.contourf(xx, yy, Z, cmap=cm, alpha=.8) # Plot also the training points
ax.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=cm_bright)
# and testing points
ax.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap=cm_bright,
alpha=0.6) ax.set_xlim(xx.min(), xx.max())
ax.set_ylim(yy.min(), yy.max())
ax.set_xticks(())
ax.set_yticks(())
if ds_cnt == 0:
ax.set_title(name)
ax.text(xx.max() - .3, yy.min() + .3, ('%.2f' % score).lstrip(''),
size=15, horizontalalignment='right')
i += 1 plt.tight_layout()
plt.show()

机器学习--Classifier comparison的更多相关文章
- Google发布机器学习平台Tensorflow游乐场~带你玩神经网络(转载)
Google发布机器学习平台Tensorflow游乐场-带你玩神经网络 原文地址:http://f.dataguru.cn/article-9324-1.html> 摘要: 昨天,Google发 ...
- [Bayesian] “我是bayesian我怕谁”系列 - Gaussian Process
科班出身,贝叶斯护体,正本清源,故拿”九阳神功“自比,而非邪气十足的”九阴真经“: 现在看来,此前的八层功力都为这第九层作基础: 本系列第九篇,助/祝你早日hold住神功第九重,加入血统纯正的人工智能 ...
- 学习笔记之scikit-learn
scikit-learn: machine learning in Python — scikit-learn 0.20.0 documentation https://scikit-learn.or ...
- sklearn中的数据预处理----good!! 标准化 归一化 在何时使用
RESCALING attribute data to values to scale the range in [0, 1] or [−1, 1] is useful for the optimiz ...
- Tensorflow游乐场
昨天,Google发布了Tensorflow游乐场.Tensorflow是Google今年推出的机器学习开源平台.而有了Tensorflow游乐场,我们在浏览器中就可以训练自己的神经网络,还有酷酷的图 ...
- 机器学习算法 --- Naive Bayes classifier
一.引言 在开始算法介绍之前,让我们先来思考一个问题,假设今天你准备出去登山,但起床后发现今天早晨的天气是多云,那么你今天是否应该选择出去呢? 你有最近这一个月的天气情况数据如下,请做出判断. 这个月 ...
- 机器学习:eclipse中调用weka的Classifier分类器代码Demo
weka中实现了很多机器学习算法,不管实验室研究或者公司研发,都会或多或少的要使用weka,我的理解是weka是在本地的SparkML,SparkML是分布式的大数据处理机器学习算法,数据量不是很大的 ...
- 从线性模型(linear model)衍生出的机器学习分类器(classifier)
1. 线性模型简介 0x1:线性模型的现实意义 在一个理想的连续世界中,任何非线性的东西都可以被线性的东西来拟合(参考Taylor Expansion公式),所以理论上线性模型可以模拟物理世界中的绝大 ...
- 机器学习---朴素贝叶斯分类器(Machine Learning Naive Bayes Classifier)
朴素贝叶斯分类器是一组简单快速的分类算法.网上已经有很多文章介绍,比如这篇写得比较好:https://blog.csdn.net/sinat_36246371/article/details/6014 ...
随机推荐
- [ActionScript 3.0] 获取随机颜色
import flash.display.MovieClip; import flash.geom.ColorTransform; //方法一 var red:Number = Math.random ...
- tomcat 9.0配置管理员用户名和密码
登录后,可以管理查看发布网站的信息,岗安装好的tomcat是没有管理密码的,可以在配置文件中修改. 在conf/tomcat-users.xml的文件中修改配置,默认的角色用户是被注销掉的.我去掉注释 ...
- Sorry, but the Android VPN API doesn’t currently allow TAP-based tunnels.
Sorry, but the Android VPN API doesn’t currently allow TAP-based tunnels. Edit .ovpn configfile “dev ...
- zz Must read
http://www.opengpu.org/forum.php?mod=viewthread&tid=965&extra=page%3D1 游戏引擎剖析(Game Engine An ...
- jQuery String Functions
In today's post, I have put together all jQuery String Functions. Well, I should say that these are ...
- 浅谈oracle10G spfile与pfile(转)
转自:http://blog.csdn.net/onebigday/article/details/6108348,http://www.linuxidc.com/Linux/2012-11/7371 ...
- pthread 学习
1. 创建线程 int pthread_create (pthread_t* thread, pthread_attr_t* attr, void* (*start_routine)(void*), ...
- JS正则大全
验证数字:^[0-9]*$ 验证n位的数字:^\d{n}$ 验证至少n位数字:^\d{n,}$ 验证m-n位的数字:^\d{m,n}$ 验证零和非零开头的数字:^(0|[1-9][0-9]*)$ 验证 ...
- vc-complex-type.2.3: Element 'filter-mapping' cannot have character [children], because the type's content type is element-only.
报这种错一般是因为导入的项目或者黏贴的代码中有中文空格或者中文编码,只需将报错代码重写一遍即可.
- 二模12day1解题报告
T1.笨笨与电影票(ticket) 有n个1和m个0,求每个数前1的个数都大于等于0的个数的排列数. 非常坑的一道题,推导过程很烦.首先求出所有排列数是 C(n+m,m),然后算不合法的个数. 假设存 ...