scikit-learn

Machine Learning in Python

Simple and efficient tools for data mining and data analysis
Accessible to everybody, and reusable in various contexts
Built on NumPy, SciPy, and matplotlib
Open source, commercially usable - BSD license

http://scikit-learn.org/stable/index.html

sklearn中算法有四类，分类，回归，聚类，降维。

分类和回归是监督式学习，即每个数据对应一个 label。

聚类是非监督式学习，即没有 label。

降维，当数据集有很多很多属性的时候，可以通过降维算法把属性归纳起来。例如 20 个属性只变成 2 个，注意，这不是挑出 2 个，而是压缩成为 2 个，它们集合了 20 个属性的所有特征，相当于把重要的信息提取的更好，不重要的信息就不要了。

然后看问题属于哪一类问题，是分类还是回归，还是聚类，就选择相应的算法。当然还要考虑数据的大小，例如 100K 是一个阈值。

可以发现有些方法是既可以作为分类，也可以作为回归，例如 SGD

监督学习（supervised learning）：监督学习的任务是学习一个模型，使模型能够对任意一个输入给出一个预测的输出，监督学习是统计学的一个重要分支。

from sklearn import datasets

from sklearn.model_selection import train_test_split

from sklearn.neighbors import KNeighborsClassifier

#下载iris数据集

iris = datasets.load_iris()
#将数据的data部分和target进行赋值， data包含iris花朵的长宽和茎的长宽

iris_X = iris.data

iris_Y = iris.target

iris_X

Out[9]:

array([[5.1, 3.5, 1.4, 0.2],

       [4.9, 3. , 1.4, 0.2],

       [4.7, 3.2, 1.3, 0.2],

       。。。 。。。

       [6.7, 3. , 5.2, 2.3],

       [6.3, 2.5, 5. , 1.9],

       [6.5, 3. , 5.2, 2. ],

       [6.2, 3.4, 5.4, 2.3],

       [5.9, 3. , 5.1, 1.8]])

#iris_Y是花的种类，共三种类型

iris_Y

Out[10]:

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,

       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,

       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,

       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,

       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,

       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,

       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

#将数据分为训练集合测试集， 用到sklearn API train_test_split, test_size=0.3代表测试集占总数据集的30%。

X_train, X_test, y_train, y_test = train_test_split(iris_X, iris_Y, test_size=0.3)

y_train

Out[13]:

array([2, 0, 0, 0, 0, 2, 2, 0, 2, 0, 1, 2, 0, 2, 1, 1, 1, 1, 1, 2, 2, 2,

       1, 2, 0, 0, 1, 2, 2, 1, 1, 1, 2, 1, 2, 1, 1, 0, 0, 1, 1, 1, 0, 0,

       0, 0, 0, 2, 0, 0, 2, 2, 0, 2, 2, 2, 1, 2, 1, 2, 0, 0, 2, 2, 0, 2,

       0, 2, 0, 1, 1, 1, 2, 0, 2, 1, 2, 1, 2, 2, 0, 1, 2, 0, 1, 2, 0, 0,

       2, 0, 1, 1, 2, 2, 0, 0, 1, 2, 1, 1, 2, 0, 0, 0, 1])

X_train

Out[14]:

array([[6.7, 3.1, 5.6, 2.4],

       [5.4, 3.4, 1.7, 0.2],

       [5.1, 3.8, 1.9, 0.4],

      。。。 。。。

       [5.4, 3.9, 1.7, 0.4],

       [4.6, 3.4, 1.4, 0.3],

       [5.5, 3.5, 1.3, 0.2],

       [5.5, 2.6, 4.4, 1.2]])

#建立模型

knn = KNeighborsClassifier()
#训练

knn.fit(X_train, y_train)

Out[16]:

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',

           metric_params=None, n_jobs=1, n_neighbors=5, p=2,

           weights='uniform')
#预测

knn.predict(X_test)

Out[17]:

array([0, 0, 2, 2, 1, 2, 0, 0, 1, 1, 0, 2, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0,

       0, 0, 2, 2, 2, 0, 1, 0, 2, 2, 1, 1, 1, 2, 2, 0, 1, 0, 2, 1, 2, 1,

       1])
#对比预测值和测试值

y_test

Out[18]:

array([0, 0, 2, 1, 1, 2, 0, 0, 2, 1, 0, 2, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0,

       0, 0, 2, 1, 2, 0, 1, 0, 2, 2, 1, 1, 1, 2, 2, 0, 1, 0, 2, 1, 2, 1,

       1])

AI-sklearn 学习笔记（一）sklearn 一般概念的更多相关文章

.NET Remoting学习笔记（一）概念
目录 .NET Remoting学习笔记(一)概念 .NET Remoting学习笔记(二)激活方式 .NET Remoting学习笔记(三)信道背景自接触编程以来,一直听过这个名词Remotin ...
【转载】.NET Remoting学习笔记（一）概念
目录 .NET Remoting学习笔记(一)概念 .NET Remoting学习笔记(二)激活方式 .NET Remoting学习笔记(三)信道背景自接触编程以来,一直听过这个名词Remotin ...
【学习笔记】sklearn数据集与估计器
数据集划分机器学习一般的数据集会划分为两个部分: 训练数据:用于训练,构建模型测试数据:在模型检验时使用,用于评估模型是否有效训练数据和测试数据常用的比例一般为:70%: 30%, 80%: 2 ...
sklearn学习笔记1
Image recognition with Support Vector Machines #our dataset is provided within scikit-learn #let's s ...
sklearn学习笔记之简单线性回归
简单线性回归线性回归是数据挖掘中的基础算法之一,从某种意义上来说,在学习函数的时候已经开始接触线性回归了,只不过那时候并没有涉及到误差项.线性回归的思想其实就是解一组方程,得到回归函数,不过在出现误 ...
sklearn学习笔记3
Explaining Titanic hypothesis with decision trees decision trees are very simple yet powerful superv ...
sklearn学习笔记2
Text classifcation with Naïve Bayes In this section we will try to classify newsgroup messages using ...
sklearn学习笔记
用Bagging优化模型的过程:1.对于要使用的弱模型(比如线性分类器.岭回归),通过交叉验证的方式找到弱模型本身的最好超参数:2.然后用这个带着最好超参数的弱模型去构建强模型:3.对强模型也是通过交 ...
sklearn学习笔记（一）——数据预处理 sklearn.preprocessing
https://blog.csdn.net/zhangyang10d/article/details/53418227 数据预处理 sklearn.preprocessing 标准化 (Standar ...
sklearn学习笔记之岭回归
岭回归岭回归是一种专用于共线性数据分析的有偏估计回归方法,实质上是一种改良的最小二乘估计法,通过放弃最小二乘法的无偏性,以损失部分信息.降低精度为代价获得回归系数更为符合实际.更可靠的回归方法,对病 ...

随机推荐

Sonys TRC save data plolicy
帖子地址:https://forums.unrealengine.com/showthread.php?130820-Sonys-TRC-save-data-plolicy we had severa ...
[CSP-S模拟测试]:树（树上上升序列+主席树+线段树）
题目传送门(内部题78) 输入格式第一行输入两个整数$n,q$,表示节点数和询问数. 第二行输入$n$个整数$w_i$,表示第$i$个点的智商. 第三行至第$n+1$行每行输入两个数$x,y$,表示 ...
HTML,CSS,JavaScript的思维导图
一个思维导图是把抽象的事物具体化,以一个东西为思想核心内容,映射出一系列的组成及作用影响的内容. HTML的思维导图 HTML是一种超文本标记语言.我认为要学习一门语言首先要知道其是什么,编辑工具是 ...
view组件
view标签的属性值: hover-class:按下的点击态属性值:字符串如果:hover-class="none" 按下就没有点击态 hover-stop-pro ...
iOS 命令行打包--xcworkspace
参考: 打包的具体操作步骤: https://www.jianshu.com/p/6a0aa8cd2e97 打包时使用到的参数详解,参考这篇: https://debugtalk.com/post/i ...
format和urlencode的使用对比
一:format的基本语法使用基本语法是通过 {} 和 : 来代替以前的 % . format 函数可以接受不限个参数,位置可以不按顺序. 例如: >>>"{} {}&q ...
*args 和**kwargs 的理解以及函数的参数的总结
一:函数参数的理解: def 函数名(函数参数): 函数体例如: def func(a): # a 是形参 print(a) func(123) # 123 是实参形参又分为: 关键字参数,位置参 ...
python-网络编程urllib模块
一.操作网络发送请求 from urllib.request import urlopen #发送请求 from urllib.parse import urlencode #用来把字典形式转 ...
005-unity3d 添加背景音乐、音效以及天空盒子
一.基础知识 1.项目中需要有AudioListener,播放器中播放的声音就是AudioListener组件坐在的位置听到的声音.默认AudioListener是放到Main Camera上.没有A ...
三种方式创建bean对象在springIOC容器中初始化、销毁阶段要调用的自定义方法
1. 使用@Bean注解定义initMethod和destroyMethod 所谓initMethod和destroyMethod,是指在springIOC容器中,对于bean对象执行到初始化阶段和销 ...

AI-sklearn 学习笔记（一）sklearn 一般概念

scikit-learn

Machine Learning in Python

AI-sklearn 学习笔记（一）sklearn 一般概念的更多相关文章

随机推荐

热门专题