scikit-learn(project中用的相对较多的模型介绍):2.3. Clustering(可用于特征的无监督降维)
參考:http://scikit-learn.org/stable/modules/clustering.html
在实际项目中,我们真的非常少用到那些简单的模型,比方LR、kNN、NB等。尽管经典,但在project中确实不有用。
今天我们不关注详细的模型,而关注无监督的聚类方法。
之所以关注无监督聚类方法。是由于。在实际项目中,我们除了使用PCA等方法降维外。有时候我们也会考虑使用聚类的方法降维特征。
Overview of clustering methods:
A comparison of the clustering algorithms in scikit-learn
| Method name | Parameters | Scalability | Usecase | Geometry (metric used) |
|---|---|---|---|---|
| K-Means | number of clusters |
Very large n_samples, medium n_clusterswith MiniBatch code |
General-purpose, even cluster size, flat geometry, not too many clusters | Distances between points |
| Affinity propagation | damping, sample preference | Not scalable with n_samples | Many clusters, uneven cluster size, non-flat geometry | Graph distance (e.g. nearest-neighbor graph) |
| Mean-shift | bandwidth | Not scalable withn_samples | Many clusters, uneven cluster size, non-flat geometry | Distances between points |
| Spectral clustering | number of clusters | Medium n_samples, small n_clusters | Few clusters, even cluster size, non-flat geometry | Graph distance (e.g. nearest-neighbor graph) |
| Ward hierarchical clustering | number of clusters | Large n_samples andn_clusters | Many clusters, possibly connectivity constraints | Distances between points |
| Agglomerative clustering | number of clusters, linkage type, distance | Large n_samples andn_clusters | Many clusters, possibly connectivity constraints, non Euclidean distances | Any pairwise distance |
| DBSCAN | neighborhood size | Very large n_samples, medium n_clusters | Non-flat geometry, uneven cluster sizes | Distances between nearest points |
| Gaussian mixtures | many | Not scalable | Flat geometry, good for density estimation | Mahalanobis distances to centers |
| Birch | branching factor, threshold, optional global clusterer. | Large n_clusters andn_samples | Large dataset, outlier removal, data reduction. |
Euclidean distance between points |
scikit-learn(project中用的相对较多的模型介绍):2.3. Clustering(可用于特征的无监督降维)的更多相关文章
- scikit-learn(project中用的相对较多的模型介绍):1.14. Semi-Supervised
參考:http://scikit-learn.org/stable/modules/label_propagation.html The semi-supervised estimators insk ...
- scikit learn 模块 调参 pipeline+girdsearch 数据举例:文档分类 (python代码)
scikit learn 模块 调参 pipeline+girdsearch 数据举例:文档分类数据集 fetch_20newsgroups #-*- coding: UTF-8 -*- import ...
- (原创)(三)机器学习笔记之Scikit Learn的线性回归模型初探
一.Scikit Learn中使用estimator三部曲 1. 构造estimator 2. 训练模型:fit 3. 利用模型进行预测:predict 二.模型评价 模型训练好后,度量模型拟合效果的 ...
- (原创)(四)机器学习笔记之Scikit Learn的Logistic回归初探
目录 5.3 使用LogisticRegressionCV进行正则化的 Logistic Regression 参数调优 一.Scikit Learn中有关logistics回归函数的介绍 1. 交叉 ...
- Scikit Learn: 在python中机器学习
转自:http://my.oschina.net/u/175377/blog/84420#OSC_h2_23 Scikit Learn: 在python中机器学习 Warning 警告:有些没能理解的 ...
- Scikit Learn
Scikit Learn Scikit-Learn简称sklearn,基于 Python 语言的,简单高效的数据挖掘和数据分析工具,建立在 NumPy,SciPy 和 matplotlib 上.
- Linear Regression with Scikit Learn
Before you read This is a demo or practice about how to use Simple-Linear-Regression in scikit-lear ...
- 【359】scikit learn 官方帮助文档
官方网站链接 sklearn.neighbors.KNeighborsClassifier sklearn.tree.DecisionTreeClassifier sklearn.naive_baye ...
- 如何使用scikit—learn处理文本数据
答案在这里:http://www.tuicool.com/articles/U3uiiu http://scikit-learn.org/stable/modules/feature_extracti ...
随机推荐
- jquery 获取 checkbox 的 checked 状态问题
这个郁闷了,今天写这个功能的时候发现了问题,上网找了好多资料对照,更加纠结... 事实证明一切,自己测试了N遍,发现网上的说法和自己以前的理解都是错的,不知道大家有没发现. 下面来看看网上大多资料的说 ...
- python(7)-- 文件I/O
1 打印到屏幕:print 语句.你可以给它传递零个或多个用逗号隔开的表达式.此函数把你传递的表达式转换成一个字符串表达式,并将结果写到标准输出,eg:print "Python 是一个非常 ...
- linux中的strip命令简介------给文件脱衣服【转】
转自:http://blog.csdn.net/stpeace/article/details/47090255 版权声明:本文为博主原创文章,转载时请务必注明本文地址, 禁止用于任何商业用途, 否则 ...
- 和菜鸟一起学android4.0.3源码之lcd屏幕背光调节
周六的中午还是依旧来了公司,本来也没有打算来的,既然来了,那就把上次遗留下来的一些问题给解决吧,把android下的pwm调lcd背光给总结下吧.关于android的背光,是用pwm波来控制的,通过占 ...
- java基础(1-50)-------->超级简单,不信你不会!!!
1:java中的保留字:const&goto; 2:&和&&都可以做逻辑运算符,即运算符两边的表达式都为true,结果才为true,一方为false,则结果为false ...
- OpenMP参考链接
做个笔记. http://www.cnblogs.com/China3S/p/3500507.html
- Process 'command 'D:\IDE\SDK\build-tools\28.0.3\aapt.exe'' finished with non-zero exit value 1问题分析解决
当在Android Studio的XML布局文件写错属性或单词拼错时,会出现如下所列的错误,而AS编辑器又没任何提示, 再次点击下方的"Run build",也只能得到:app:p ...
- AC日记——3的幂的和 51nod 1013
3的幂的和 思路: 矩阵快速幂: sn-1 3 1 sn * = 1 0 1 1 来,上代码: #include <cstdio> ...
- Tallest Cow
题目描述 FJ's N (1 ≤ N ≤ 10,000) cows conveniently indexed 1..N are standing in a line. Each cow has a p ...
- 学习GRPC(一) 简单实现
Grpc 实现流程图 资料 https://grpc.io/docs/quickstart/go/ https://studygolang.com/articles/16627 使用方法 make r ...