The Practical Importance of Feature Selection(变量筛选重要性)
python机器学习-乳腺癌细胞挖掘(博主亲自录制视频)
https://study.163.com/course/introduction.htm?courseId=1005269003&utm_campaign=commission&utm_source=cp-400000000398149&utm_medium=share
原文链接
https://www.kdnuggets.com/2017/06/practical-importance-feature-selection.html
Feature selection is useful on a variety of fronts: it is the best weapon against the Curse of Dimensionality; it can reduce overall training times; and it is a powerful defense against overfitting, increasing generalizability.
特征选择在各个方面都很有用:它是反对过度拟合的最佳武器; 它可以减少整体培训时间; 它是对过度拟合,增加普遍性的有力防御。
By Matthew Mayo, KDnuggets.
If you wanted to classify animals, for example, based on a plethora of relevant collected data, you would quickly find that all sorts of potential data attributes, or features, were relatively unhelpful for classification. For example, given that most living creatures have precisely 1 heart, this particular feature would not be beneficial, from a learning perspective. On the other hand, an attribute denoting whether or not a given animal is hoofed would likely be a powerful predictor.
如果您想对动物进行分类,例如,基于过多的相关收集数据,您会很快发现各种潜在的数据属性或特征对于分类而言相对无益。例如,鉴于大多数生物只有1颗心脏,从学习的角度来看,这一特殊功能并不是有益的。另一方面,表示给定动物是否有蹄的属性可能是强有力的预测因子。
Further, using all of these irrelevant attributes, mixed in with the powerful predictors, may actually have a negative effect on the resulting model. This is to say nothing of the increased training times that may come along with the inclusion of useless attributes, or the overfitting which may occur on the training data.
此外,使用所有这些无关属性,与强大的预测变量混合,实际上可能对结果模型产生负面影响。这也就是说,可能伴随着包含无用属性或训练数据可能出现的过度拟合而增加的训练时间。
Feature selection is the process of narrowing down a subset of features, or attributes, to be used in the predictive modeling process. Feature selection is useful on a variety of fronts: it is the best weapon against the Curse of Dimensionality; it can reduce overall training times; and it is a powerful defense against overfitting, increasing model generalizability.
特征选择是缩小要在预测建模过程中使用的特征或属性子集的过程。特征选择在各个方面都很有用:它是反对维度诅咒的最佳武器; 它可以减少整体培训时间; 它是对过度拟合的强大防御,增加了模型的普遍性。
Something I read recently -- written so eloquently and concisely by data scientist Rubens Zimbres -- alludes to the importance of feature selection from a practical standpoint:
After some experiences, using stacked neural nets, parallel neural nets, asymmetric configs, simple neural nets, multiple layers, dropouts, activation functions etc there is one conclusion: There's NOTHING like a good Feature Selection.
Having had some previous professional contacts with Rubens Zimbres in the past, I reached out to him for some elaboration. He provided the following:
Feature selection should be one of the main concerns for a Data Scientist. Accuracy and generalization power can be leveraged by a correct feature selection, based in correlation, skewness, t-test, ANOVA, entropy and information gain.
Many times a correct feature selection allows you to develop simpler and faster Machine Learning models. Consider the picture below (Support Vector Machine classification of the IRIS dataset): on the left side a wrong variable selection is presented. The linear kernel cannot handle the classification task properly, neither the radial basis function kernel. On the right side, petal width and petal length were selected as features and even the linear kernel is quite accurate. A correct variable selection, a good algorithm choice and hyperparameter tuning are the keys to success. Picture below made with Python.
特征选择应该是数据科学家的主要关注点之一。基于相关性,偏度,t检验,ANOVA,熵和信息增益,通过正确的特征选择可以利用准确性和泛化能力。
很多时候,正确的功能选择可以让您开发更简单,更快速的机器学习模型。考虑下面的图片(IRIS数据集的支持向量机分类):在左侧显示错误的变量选择。线性内核无法正确处理分类任务,也不能处理径向基函数内核。在右侧,选择花瓣宽度和花瓣长度作为特征,甚至线性内核也非常准确。正确的变量选择,良好的算法选择和超参数调整是成功的关键。下面用Python制作的图片。
In a time when ample processing power can tempt us to think that feature selection may not be as relevant as it once was, it's important to remember that this only accounts for one of the numerous benefits of informed feature selection -- decreased training times. As Zimbres notes above, with a simple concrete example, feature selection can quite literally mean the difference between valid, generalizable models and a big waste of time.
在充足的处理能力可以诱使我们认为特征选择可能不像以前那样具有相关性的时代,重要的是要记住,这仅仅是知情特征选择的众多好处之一 - 减少了训练时间。 正如Zimbres上面所说,通过一个简单的具体例子,特征选择可以完全意味着有效的,可推广的模型之间的差异和浪费大量时间。
https://study.163.com/provider/400000000398149/index.htm?share=2&shareId=400000000398149( 欢迎关注博主主页,学习python视频资源,还有大量免费python经典文章)
The Practical Importance of Feature Selection(变量筛选重要性)的更多相关文章
- Feature Selection Can Reduce Overfitting And RF Show Feature Importance
一.特征选择可以减少过拟合代码实例 该实例来自机器学习实战第四章 #coding=utf-8 ''' We use KNN to show that feature selection maybe r ...
- 【转】[特征选择] An Introduction to Feature Selection 翻译
中文原文链接:http://www.cnblogs.com/AHappyCat/p/5318042.html 英文原文链接: An Introduction to Feature Selection ...
- 数据准备<5>:变量筛选-实战篇
在上一篇文章<数据准备<4>:变量筛选-理论篇>中,我们介绍了变量筛选的三种方法:基于经验的方法.基于统计的方法和基于机器学习的方法,本文将介绍后两种方法在Python(skl ...
- the steps that may be taken to solve a feature selection problem:特征选择的步骤
參考:JMLR的paper<an introduction to variable and feature selection> we summarize the steps that m ...
- [Feature] Feature selection
Ref: 1.13. Feature selection Ref: 1.13. 特征选择(Feature selection) 大纲列表 3.1 Filter 3.1.1 方差选择法 3.1.2 相关 ...
- [Feature] Feature selection - Embedded topic
基于惩罚项的特征选择法 一.直接对特征筛选 Ref: 1.13.4. 使用SelectFromModel选择特征(Feature selection using SelectFromModel) 通过 ...
- Feature Engineering and Feature Selection
首先,弄清楚三个相似但是不同的任务: feature extraction and feature engineering: 将原始数据转换为特征,以适合建模. feature transformat ...
- 机器学习-特征工程-Feature generation 和 Feature selection
概述:上节咱们说了特征工程是机器学习的一个核心内容.然后咱们已经学习了特征工程中的基础内容,分别是missing value handling和categorical data encoding的一些 ...
- 单因素特征选择--Univariate Feature Selection
An example showing univariate feature selection. Noisy (non informative) features are added to the i ...
随机推荐
- Linux中通过ssh将客户端与服务端的远程连接
前提需要:1.在VMware中装上两台linux虚拟机,本博客使用的都是CentOS 7.2.两部虚拟机可以通过命令ping通.3.两部虚拟机中已经通过yum本地仓库安装了sshd服务. 首先 1. ...
- Ubuntu 18.04 使用apt-get 华为源支持 arm64 鲲鹏处理器
网上搜的源,什么阿里云163等等的,都不支持arm64 执行以下代码,使用华为源 wget -O /etc/apt/sources.list https://repo.huaweicloud.com/ ...
- 使用Cloudera Manager部署oozie
使用Cloudera Manager部署oozie 作者:尹正杰 版权声明:原创作品,谢绝转载!否则将追究法律责任. 1>.进入CM服务安装向导 2>.选择要添加的oozie服务 3> ...
- 安装 uwsgi报错解决
背景: 安装 uwsgi时报错如下,查阅相关资料说是 python-devel的问题,于是安装之后python-devel后问题解决 报错如下: (venv) [xxxxxxx]# pip insta ...
- 行为型模式(六) 状态模式(State)
一.动机(Motivate) 在软件构建过程中,某些对象的状态如果改变,其行为也会随之而发生变化,比如文档处于只读状态,其支持的行为和读写状态支持的行为就可能完全不同. 如何在运行时根据对象的状态 ...
- PPT扁平化设计总结
注:以下内容基本都来自知乎,由于已经不记得网址了,所以未能附上所有相关链接,抱歉. PPT扁平化设计原则一.亲密:意思相近的内容放在一起二.对齐:页面上的某两个元素之间总是围绕一条直线对齐三.对比:有 ...
- (转载) 搭建非域AlwaysOn win2016+SQL2016
非域搭建Alwayson只是省去搭建域控那一部分,其他大同小异 条件: 操作系统:windows server 2016 数据库:SQL Server 2016 SSMS版本:17.3 节点1:HDD ...
- MySQL常用五大引擎的区别
MyISAM: 如果你有一个 MyISAM 数据表包含着 FULLTEXT 或 SPATIAL 索引,你将不能把它转换为使用 另一种引擎,因为只有 MyISAM 支持这两种索引. BLOB: 如果你有 ...
- C++语言第一课的学习
// HelloApp.cpp: 定义控制台应用程序的入口点. // #include "stdafx.h" #include <iostream> #include ...
- C#实现上传/下载Excel文档
要求 环境信息:WIN2008SERVER 开发工具:VS2015 开发语言:C# 要求: 1.点击同步数据后接口获取数据展示页面同时过滤无效数据并写入数据库,数据可导出Excel并支持分类导出 2 ...