scikit-learn (models commonly used in engineering practice): 2.3. Clustering (usable for unsupervised feature dimensionality reduction)
Reference: http://scikit-learn.org/stable/modules/clustering.html
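In real projects we rarely use simple models such as logistic regression (LR), k-nearest neighbors (kNN), or naive Bayes (NB). Classic as they are, they are not very practical in engineering work.
Today we will not look at any specific model; instead we focus on unsupervised clustering methods.
Why clustering? In practice, besides dimensionality reduction methods such as PCA, we sometimes also use clustering to reduce the number of features.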
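A minimal sketch of two ways clustering can reduce feature dimensionality, next to a PCA baseline. It assumes scikit-learn and a synthetic dataset from `make_classification`; the data shape and the choice of 5 output dimensions are illustrative assumptions, not from the original. `FeatureAgglomeration` clusters the *columns* hierarchically and pools each group into one feature, while `KMeans.fit_transform` re-describes each *sample* by its distances to the cluster centers.

```python
import numpy as np
from sklearn.cluster import FeatureAgglomeration, KMeans
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA

# Synthetic data (illustrative only): 500 samples, 40 features
X, _ = make_classification(n_samples=500, n_features=40, n_informative=10, random_state=0)

# 1) Baseline: project onto 5 principal components
X_pca = PCA(n_components=5, random_state=0).fit_transform(X)

# 2) Feature agglomeration: group the 40 columns into 5 clusters and
#    replace each group with the mean of its member features
agglo = FeatureAgglomeration(n_clusters=5, pooling_func=np.mean)
X_agglo = agglo.fit_transform(X)

# 3) K-Means as a feature map: each sample becomes its distances to 5 centroids
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
X_km = kmeans.fit_transform(X)

print(X_pca.shape, X_agglo.shape, X_km.shape)
print(agglo.labels_)  # feature-group index assigned to each original column
```

Design note: each agglomerated feature is an average of original columns, so it stays somewhat interpretable, whereas the K-Means distances are a nonlinear map of the input. Both are ordinary transformers, so either can be placed in a `Pipeline` before a supervised model, just as PCA would be.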
Overview of clustering methods:
[Figure: A comparison of the clustering algorithms in scikit-learn]
| Method name | Parameters | Scalability | Use case | Geometry (metric used) |
|---|---|---|---|---|
| K-Means | number of clusters | Very large n_samples, medium n_clusters with MiniBatch code | General-purpose, even cluster size, flat geometry, not too many clusters | Distances between points |
| Affinity propagation | damping, sample preference | Not scalable with n_samples | Many clusters, uneven cluster size, non-flat geometry | Graph distance (e.g. nearest-neighbor graph) |
| Mean-shift | bandwidth | Not scalable with n_samples | Many clusters, uneven cluster size, non-flat geometry | Distances between points |
| Spectral clustering | number of clusters | Medium n_samples, small n_clusters | Few clusters, even cluster size, non-flat geometry | Graph distance (e.g. nearest-neighbor graph) |
| Ward hierarchical clustering | number of clusters | Large n_samples and n_clusters | Many clusters, possibly connectivity constraints | Distances between points |
| Agglomerative clustering | number of clusters, linkage type, distance | Large n_samples and n_clusters | Many clusters, possibly connectivity constraints, non-Euclidean distances | Any pairwise distance |
| DBSCAN | neighborhood size | Very large n_samples, medium n_clusters | Non-flat geometry, uneven cluster sizes | Distances between nearest points |
| Gaussian mixtures | many | Not scalable | Flat geometry, good for density estimation | Mahalanobis distances to centers |
| Birch | branching factor, threshold, optional global clusterer | Large n_clusters and n_samples | Large dataset, outlier removal, data reduction | Euclidean distance between points |
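To see how the "Geometry" and "Use case" columns play out, here is a small sketch that fits most of the estimators in the table to the same two-moons dataset and scores each against the true labels with the adjusted Rand index (ARI). The dataset, the scaling step, and every parameter value (e.g. `eps=0.3`, `threshold=0.3`, `quantile=0.3`) are illustrative assumptions, not recommendations; affinity propagation is left out to keep the run short.

```python
from sklearn.cluster import (
    DBSCAN, AgglomerativeClustering, Birch, KMeans, MeanShift,
    MiniBatchKMeans, SpectralClustering, estimate_bandwidth,
)
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# Two interleaving half-moons: a standard example of non-flat geometry
X, y_true = make_moons(n_samples=500, noise=0.05, random_state=0)
X = StandardScaler().fit_transform(X)

# One estimator per table row; parameter values are illustrative guesses
models = {
    "K-Means": KMeans(n_clusters=2, n_init=10, random_state=0),
    "MiniBatch K-Means": MiniBatchKMeans(n_clusters=2, n_init=10, random_state=0),
    "Mean-shift": MeanShift(bandwidth=estimate_bandwidth(X, quantile=0.3, random_state=0)),
    "Spectral": SpectralClustering(n_clusters=2, affinity="nearest_neighbors", random_state=0),
    "Ward": AgglomerativeClustering(n_clusters=2, linkage="ward"),
    "Agglomerative (single)": AgglomerativeClustering(n_clusters=2, linkage="single"),
    "DBSCAN": DBSCAN(eps=0.3, min_samples=5),
    "Birch": Birch(threshold=0.3, n_clusters=2),
    "Gaussian mixture": GaussianMixture(n_components=2, random_state=0),
}

scores = {}
for name, model in models.items():
    labels = model.fit_predict(X)  # every estimator above exposes fit_predict
    scores[name] = adjusted_rand_score(y_true, labels)
    print(f"{name:24s} ARI = {scores[name]:.2f}")
```

On data shaped like this, the methods the table marks as suited to non-flat geometry (DBSCAN, spectral clustering, single-linkage agglomerative) would be expected to separate the moons better than K-Means or Ward, which favor roughly convex clusters; the exact scores depend heavily on the parameters chosen.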