An example showing univariate feature selection.

Noisy (non informative) features are added to the iris data and univariate feature selection(单因素特征选择) is applied. For each feature, we plot the p-values for the univariate feature selection and the corresponding weights of an SVM. We can see that univariate feature selection selects the informative features and that these have larger SVM weights.

In the total set of features, only the 4 first ones are significant. We can see that they have the highest score with univariate feature selection. The SVM assigns a large weight to one of these features, but also Selects many of the non-informative features. Applying univariate feature selection before the SVM increases the SVM weight attributed to the significant features, and will thus improve classification.

#encoding:utf-8
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets,svm
from sklearn.feature_selection import SelectPercentile,f_classif ###load iris dateset
iris=datasets.load_iris() ###Some Noisy data not correlated
E=np.random.uniform(0,0.1,size=(len(iris.data),20)) ###uniform distribution 150*20
X=np.hstack((iris.data,E))
y=iris.target plt.figure(1)
plt.clf() X_indices=np.arange(X.shape[-1]) ###X.shape=(150,24) X.shape([-1])=24 selector=SelectPercentile(f_classif,percentile=10)
selector.fit(X,y)
scores=-np.log10(selector.pvalues_)
scores/=scores.max() plt.bar(X_indices-0.45,scores,width=0.2,label=r"Univariate score ($-Log(p_{value})$)",color='darkorange')
# plt.show() ####Compare to weight of an svm
clf=svm.SVC(kernel='linear')
clf.fit(X,y) svm_weights=(clf.coef_**2).sum(axis=0)
svm_weights/=svm_weights.max()
plt.bar(X_indices - .25, svm_weights, width=.2, label='SVM weight',
color='navy')
clf_selected=svm.SVC(kernel='linear')
# clf_selected.fit(selector.transform((X,y)))
clf_selected.fit(selector.transform(X),y) svm_weights_selected=(clf_selected.coef_**2).sum(axis=0)
svm_weights_selected/=svm_weights_selected.max() plt.bar(X_indices[selector.get_support()]-.05,svm_weights_selected,width=.2,label='SVM weight after selection',color='c') plt.title("Comparing feature selection")
plt.xlabel('Feature number')
plt.yticks(())
plt.axis('tight')
plt.legend(loc='upper right')
plt.show()

实验结果:

单因素特征选择--Univariate Feature Selection的更多相关文章

  1. 机器学习概念之特征选择(Feature selection)之RFormula算法介绍

    不多说,直接上干货! RFormula算法介绍: RFormula通过R模型公式来选择列.支持R操作中的部分操作,包括‘~’, ‘.’, ‘:’, ‘+’以及‘-‘,基本操作如下: 1. ~分隔目标和 ...

  2. 机器学习概念之特征选择(Feature selection)之VectorSlicer算法介绍

    不多说,直接上干货! VectorSlicer 算法介绍: VectorSlicer是一个转换器,输入特征向量,输出原始特征向量子集.VectorSlicer接收带有特定索引的向量列,通过对这些索引的 ...

  3. 机器学习概念之特征选择(Feature selection)

    不多说,直接上干货! .

  4. 特征选择与稀疏学习(Feature Selection and Sparse Learning)

    本博客是针对周志华教授所著<机器学习>的"第11章 特征选择与稀疏学习"部分内容的学习笔记. 在实际使用机器学习算法的过程中,往往在特征选择这一块是一个比较让人模棱两可 ...

  5. [Feature] Feature selection

    Ref: 1.13. Feature selection Ref: 1.13. 特征选择(Feature selection) 大纲列表 3.1 Filter 3.1.1 方差选择法 3.1.2 相关 ...

  6. 【转】[特征选择] An Introduction to Feature Selection 翻译

    中文原文链接:http://www.cnblogs.com/AHappyCat/p/5318042.html 英文原文链接: An Introduction to Feature Selection ...

  7. 机器学习-特征选择 Feature Selection 研究报告

    原文:http://www.cnblogs.com/xbinworld/archive/2012/11/27/2791504.html 机器学习-特征选择 Feature Selection 研究报告 ...

  8. highly variable gene | 高变异基因的选择 | feature selection | 特征选择

    在做单细胞的时候,有很多基因属于noise,就是变化没有规律,或者无显著变化的基因.在后续分析之前,我们需要把它们去掉. 以下是一种找出highly variable gene的方法: The fea ...

  9. the steps that may be taken to solve a feature selection problem:特征选择的步骤

    參考:JMLR的paper<an introduction to variable and feature selection> we summarize the steps that m ...

随机推荐

  1. js 表单验证方法二

    function ckReight () { var pass = true; var new = $("#new"); if( new.find('input[name=name ...

  2. CGBitmapContextCreate函数参数详解

    函数原型: CGContextRef CGBitmapContextCreate ( void *data, size_t width, size_t height, size_t bitsPerCo ...

  3. [解决方案] pythonchallenge level 1

    http://www.pythonchallenge.com/pc/def/map.html g fmnc wms bgblr rpylqjyrc gr zw fylb. rfyrq ufyr amk ...

  4. 实现Launcher默认壁纸、选择壁纸定制化功能

    需求功能说明:     该定制需求为在系统中增加一个新的分区如myimage,用以实现存放定制资源.例如在myimage下新建wallpaper文件夹用于存放定制的墙纸图片资源,当Launcher加载 ...

  5. [GodLove]Wine93 Tarining Round #4

    比赛链接: http://acm.hust.edu.cn/vjudge/contest/view.action?cid=44903#overview 题目来源: 2011 Asia ChengDu R ...

  6. 第三个Sprint冲刺第九天

    讨论地点:宿舍 讨论成员:邵家文.李新.朱浩龙.陈俊金 讨论问题:做最后的工作

  7. Cocos2d-x建工程时避免copy文件夹和库

    方法一:(官方做法) 打开F:\cocos2d-1.0.1-x-0.9.1目录下的cocos2d-win32.vc2010.sln文件,然后右键点击解决方案,选择"添加"—&quo ...

  8. Bootstrap模态弹出窗

    Bootstrap模态弹出窗有三种方式: 1.href触发模态弹出窗元素: <a class="btn btn-primary" data-toggle="moda ...

  9. C#编程语言与面向对象——继承

    现实生活中的事物都归属于一定的类别,比如,狮子是一种(IS_A)动物,为了在计算机中模拟这种关系,面向对象的语言引入了继承(inherit)特性. 构成继承关系的两个类中,Animal称为父类(par ...

  10. EL与Velocity基本语法总结:

    El(expression language): 基本语法点: $与{}搭配使用是常态取值 . 与[]的区别,后者可以取特殊值:- .等 支持一些基本的逻辑运算: && || > ...