Ref: 1.13. Feature selection


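Outline

3.1 Filter

3.1.1 Variance threshold

3.1.2 Correlation coefficient

3.1.3 Chi-squared test

3.1.4 Mutual information

3.2 Wrapper

3.2.1 Recursive feature elimination

3.3 Embedded

3.3.1 Penalty-based feature selection

3.3.2 Tree-based feature selection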

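Class              Category   Description
VarianceThreshold  Filter     Variance threshold method
SelectKBest        Filter     Scores features with a chosen function: a correlation coefficient, the chi-squared test, or the maximal information coefficient
RFE                Wrapper    Recursively trains a base model and eliminates the features with the smallest weight coefficients
SelectFromModel    Embedded   Trains a base model and selects the features with the largest weight coefficients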

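Rationale

Features are selected on two grounds:

    • Whether the feature varies: if a feature barely varies, for example its variance is close to 0, the samples are essentially identical on that feature, so it does little to tell them apart.
    • Relevance to the target: features that are highly correlated with the target should be preferred. Apart from the variance method, every method covered here works from relevance.

  By the form the selection takes, feature selection methods fall into three kinds:

    • Filter: scores each feature by its variability or its relevance, then selects features by a score threshold or by the number of features to keep.
    • Wrapper: guided by an objective function (usually a prediction score), selects or excludes a few features at each round.
    • Embedded: first trains a machine learning model to obtain a weight coefficient for each feature, then selects features in descending order of coefficient. This resembles the Filter approach, except that the quality of a feature is determined by training.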

Feature selection


Filter

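1. Variance threshold

Suppose we have a data set with Boolean features and want to remove every feature that is 1 or 0 in more than 80% of the samples. A Boolean feature is a Bernoulli variable with variance p(1 - p), so the threshold is 0.8 * (1 - 0.8) = 0.16.

Result: the first column is removed, because it is 0 in 5 of the 6 samples (5/6 > 0.8).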

>>> from sklearn.feature_selection import VarianceThreshold
>>> X = [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 1], [0, 1, 0], [0, 1, 1]]
>>> sel = VarianceThreshold(threshold=(.8 * (1 - .8)))
>>> sel.fit_transform(X)
array([[0, 1],
       [1, 0],
       [0, 0],
       [1, 1],
       [1, 0],
       [1, 1]])
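
The threshold can be checked by hand. The following is a minimal sketch, assuming NumPy is available alongside scikit-learn; the variable names `p` and `manual_variances` are illustrative and not part of the original example. It computes p(1 - p) for each Boolean column and compares it with the variances that VarianceThreshold stores after fitting.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

X = np.array([[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 1], [0, 1, 0], [0, 1, 1]])

# For a Boolean (Bernoulli) feature, Var[X] = p * (1 - p),
# where p is the fraction of ones in the column.
p = X.mean(axis=0)
manual_variances = p * (1 - p)

threshold = 0.8 * (1 - 0.8)  # 0.16
sel = VarianceThreshold(threshold=threshold).fit(X)

print(manual_variances)   # per-column variances computed by hand
print(sel.variances_)     # variances computed by scikit-learn
print(sel.get_support())  # boolean mask of the columns that are kept
```

Only columns whose variance exceeds the threshold are kept, which is why the first column (variance 5/36, about 0.139) is dropped.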

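2. Chi-squared test

Supports sparse data. The two most commonly used APIs:

(1) SelectKBest removes all but the k highest-scoring features

(2) SelectPercentile removes all but a user-specified highest-scoring percentage of features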

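from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2

iris = load_iris()

# Keep the 2 best features according to the chi-squared score
rst = SelectKBest(chi2, k=2).fit_transform(iris.data, iris.target)
print(rst[:5])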
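
To see which columns were kept rather than only the transformed array, the fitted selector can be inspected. This is a hedged sketch, assuming the same iris data; `kbest`, `pct` and `X_half` are illustrative names. It uses the `scores_` attribute and `get_support` method that both selectors expose, and shows SelectPercentile with a 50% cutoff as the percentage-based counterpart.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, SelectPercentile, chi2

iris = load_iris()
X, y = iris.data, iris.target

# Fit without transforming so that the scores can be inspected
kbest = SelectKBest(chi2, k=2).fit(X, y)
print(dict(zip(iris.feature_names, kbest.scores_.round(2))))
print(kbest.get_support(indices=True))  # indices of the selected columns

# Percentage-based selection: keep the top-scoring half of the features
pct = SelectPercentile(chi2, percentile=50).fit(X, y)
X_half = pct.transform(X)
print(X_half.shape)
```

Note that chi2 requires non-negative feature values, which holds for the iris measurements.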

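Parameter settings

Add noise columns (features) to the data to check how the scoring function behaves.

(1) For regression: f_regression

(2) For classification: chi2 or f_classif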
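
For the regression case, the same noise-column check can be sketched as follows. The data set here is synthetic and entirely an assumption for illustration (two informative columns followed by five pure-noise columns); it is not from the original post.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression

rng = np.random.default_rng(0)
n_samples = 200

# Two informative columns followed by five pure-noise columns
informative = rng.normal(size=(n_samples, 2))
noise = rng.normal(size=(n_samples, 5))
X = np.hstack([informative, noise])

# The target depends linearly on the informative columns only
y = 3.0 * informative[:, 0] - 2.0 * informative[:, 1] + 0.1 * rng.normal(size=n_samples)

selector = SelectKBest(f_regression, k=2).fit(X, y)
print(selector.scores_.round(1))           # F-statistic per column
print(selector.get_support(indices=True))  # columns kept by the selector
```

If the scoring function works as intended, the noise columns receive much lower F-statistics than the two informative ones.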

#%%
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.feature_selection import SelectPercentile, f_classif, chi2

# Load the iris data set
iris = datasets.load_iris()

# Some noisy data not correlated with the target
E = np.random.uniform(0, 0.1, size=(len(iris.data), 20))

# Add the noisy data to the informative features
X = np.hstack((iris.data, E))
y = iris.target

plt.figure(1)
plt.clf()
X_indices = np.arange(X.shape[-1])

# Univariate feature selection, keeping the 10% most significant features.
# Swap in the commented-out line to score with the F-test instead of chi2.
# selector = SelectPercentile(f_classif, percentile=10)
selector = SelectPercentile(chi2, percentile=10)
selector.fit(X, y)
scores = -np.log10(selector.pvalues_)
scores /= scores.max()
plt.bar(X_indices - .45, scores, width=.2,
        label=r'Univariate score ($-Log(p_{value})$)', color='g')
plt.legend()
plt.show()

Result with f_classif:

[Figure: bar chart of the normalized univariate score, -log10(p-value), for each feature index]

Result with chi2:

[Figure: bar chart of the normalized univariate score, -log10(p-value), for each feature index]

3. Pearson correlation coefficient
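
A minimal sketch of ranking features by the absolute value of their Pearson correlation with the target. The synthetic data and the names `r` and `ranking` are assumptions made for illustration; only `numpy.corrcoef` is used, so no particular scikit-learn scorer is implied.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 500
X = rng.normal(size=(n_samples, 3))

# y depends strongly on column 0, weakly on column 1, and not on column 2
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n_samples)

# Pearson r between each feature column and the target
r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])

# Rank features from most to least linearly correlated with y
ranking = np.argsort(-np.abs(r))
print(r.round(3))
print(ranking)
```

The absolute value is used because a strong negative correlation is as informative as a strong positive one.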

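4. Mutual information

Link: https://www.zhihu.com/question/28641663/answer/41653367

Compute the correlation between each feature and the response variable. In practice the usual tools are the Pearson coefficient and mutual information:
the Pearson coefficient measures only linear correlation;
mutual information captures many kinds of dependence well, but is somewhat more expensive to compute.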
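
The difference between the two measures shows up on a purely nonlinear relationship. The sketch below assumes a synthetic quadratic dependence (y = x squared plus a little noise, chosen only for illustration) and uses scikit-learn's `mutual_info_regression` as the mutual information estimator.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=1000)

# y is fully determined by x (up to small noise), but not linearly
y = x ** 2 + 0.01 * rng.normal(size=1000)

pearson_r = np.corrcoef(x, y)[0, 1]
mi = mutual_info_regression(x.reshape(-1, 1), y, random_state=0)[0]

print(round(pearson_r, 3))  # close to zero: no linear relationship
print(round(mi, 3))         # clearly positive: strong dependence
```

Because the relationship is symmetric around x = 0, the linear correlation nearly vanishes, while the mutual information estimate stays well above zero.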

Wrapper

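1. Recursive feature elimination

The idea is to score every feature:

First, a predictive model is trained on the original features, which assigns a weight to each feature.

Then the features with the smallest absolute weights are dropped from the feature set.

This is repeated recursively until the number of remaining features reaches the desired number.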
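
The three steps above can be sketched as an explicit loop. This is a simplified illustration and not scikit-learn's RFE implementation: the choice of LogisticRegression on the iris data, the target of two features, and the names `remaining` and `importance` are all assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
remaining = list(range(X.shape[1]))  # indices of features still in play
n_features_to_select = 2

while len(remaining) > n_features_to_select:
    # Step 1: train the model on the current feature set
    model = LogisticRegression(max_iter=1000).fit(X[:, remaining], y)

    # A multiclass model has one row of coefficients per class,
    # so aggregate them into a single importance per feature.
    importance = (model.coef_ ** 2).sum(axis=0)

    # Step 2: drop the feature with the smallest importance
    weakest = int(np.argmin(importance))
    removed = remaining.pop(weakest)
    print("dropped feature", removed)

# Step 3: the loop stops once the requested number of features is left
print("kept features", remaining)
```

Dropping one feature per round corresponds to `step=1`; a larger step would trade ranking resolution for fewer model fits.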

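(1) Recursive feature elimination: a recursive feature elimination example showing the relevance of each pixel in a digit classification task.

(2) Recursive feature elimination with cross-validation: a recursive feature elimination example that tunes the number of selected features automatically through cross-validation.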

from sklearn.svm import SVC
from sklearn.datasets import load_digits
from sklearn.feature_selection import RFE
import matplotlib.pyplot as plt

# Load the digits dataset
digits = load_digits()
X = digits.images.reshape((len(digits.images), -1))
y = digits.target

# Create the RFE object and rank each pixel
svc = SVC(kernel="linear", C=1)
rfe = RFE(estimator=svc, n_features_to_select=1, step=1)
rfe.fit(X, y)
ranking = rfe.ranking_.reshape(digits.images[0].shape)

# Plot pixel ranking
plt.matshow(ranking)
plt.colorbar()
plt.title("Ranking of pixels with RFE")
plt.show()

The ranking of the 64 features (pixels) is plotted below:

[Figure: 8x8 heat map titled "Ranking of pixels with RFE", with a color bar]

print(ranking) shows the same 8x8 array of ranks, where rank 1 marks the single pixel that survives elimination.
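
The cross-validated variant listed as example (2) can be sketched with RFECV. To keep the run short, this sketch assumes a small synthetic data set from `make_classification` and a LogisticRegression base model instead of the digits data and SVC; those substitutions are illustrative choices, not part of the original post.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

# 15 features, of which only 3 carry signal and 2 are linear combinations
X, y = make_classification(
    n_samples=300, n_features=15, n_informative=3, n_redundant=2,
    n_classes=2, random_state=0,
)

rfecv = RFECV(
    estimator=LogisticRegression(max_iter=1000),
    step=1,                   # remove one feature per round
    cv=StratifiedKFold(5),    # 5-fold stratified cross-validation
    scoring="accuracy",
    min_features_to_select=1,
)
rfecv.fit(X, y)

print("selected feature count:", rfecv.n_features_)
print("selected mask:", rfecv.support_)
```

Unlike plain RFE, the number of features to keep is not given up front; it is chosen as the count with the best cross-validated score.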

Embedded

1. Penalty-based feature selection

2. Tree-based feature selection

This topic has its own chapter; see: [Feature] Feature selection - Embedded topic
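
As a brief preview of both embedded approaches, the sketch below uses SelectFromModel with an L1-penalized LinearSVC and with an ExtraTreesClassifier. The iris data, `C=0.01` and `n_estimators=50` are assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)

# Penalty-based: the L1 penalty drives some coefficients to zero,
# and SelectFromModel keeps the features with non-zero coefficients.
# The L1 penalty is only supported with the primal formulation (dual=False).
lsvc = LinearSVC(C=0.01, penalty="l1", dual=False).fit(X, y)
X_l1 = SelectFromModel(lsvc, prefit=True).transform(X)

# Tree-based: selection uses the fitted model's feature_importances_
forest = ExtraTreesClassifier(n_estimators=50, random_state=0).fit(X, y)
X_tree = SelectFromModel(forest, prefit=True).transform(X)

print(X.shape, X_l1.shape, X_tree.shape)
```

A smaller C means a stronger penalty and therefore fewer selected features.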

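Integration with a pipeline

In the code snippet below,

(1) sklearn.svm.LinearSVC is combined with sklearn.feature_selection.SelectFromModel to evaluate feature importance and select the most relevant features.

(2) A sklearn.ensemble.RandomForestClassifier is then trained on the transformed output, that is, only on the selected features.

Ref: sklearn.pipeline.Pipeline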

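from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# The L1 penalty is only supported with the primal formulation (dual=False)
clf = Pipeline([
    ('feature_selection', SelectFromModel(LinearSVC(penalty="l1", dual=False))),
    ('classification', RandomForestClassifier())
])
clf.fit(X, y)  # X, y: any feature matrix and label vector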
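
A self-contained version of this pipeline, sketched on the iris data, shows how it can be cross-validated and how to see which features the selection step kept. The data set, `C=0.1`, `n_estimators=50` and the fixed `random_state` are assumptions for illustration; `dual=False` is passed because LinearSVC does not support the L1 penalty in its dual formulation.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)

clf = Pipeline([
    ("feature_selection", SelectFromModel(LinearSVC(C=0.1, penalty="l1", dual=False))),
    ("classification", RandomForestClassifier(n_estimators=50, random_state=0)),
])

# The selection step is refit inside every fold, so the test fold
# never influences which features are chosen.
scores = cross_val_score(clf, X, y, cv=5)
print(scores.mean())

# After fitting on all data, inspect the mask of selected features
clf.fit(X, y)
mask = clf.named_steps["feature_selection"].get_support()
print(mask)
```

Keeping the selector inside the pipeline, rather than selecting features once on the full data, avoids leaking information from the held-out folds into the selection.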

Dimensionality reduction


1. Principal component analysis (PCA)

2. Linear discriminant analysis (LDA)

Goto: [Scikit-learn] 4.4 Dimensionality reduction - PCA

Ref: [Scikit-learn] 2.5 Dimensionality reduction - Probabilistic PCA & Factor Analysis

Ref: [Scikit-learn] 2.5 Dimensionality reduction - ICA

Goto: [Scikit-learn] 1.2 Dimensionality reduction - Linear and Quadratic Discriminant Analysis

End.
