Random Forest

Score comparison between a single decision tree and a random forest

# Import packages
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
# Load the wine dataset
wine = load_wine()
# Split into training and test sets
x_train, x_test, y_train, y_test = train_test_split(wine.data, wine.target, test_size=0.3)
# Instantiate a decision tree and a random forest with random_state=0
clf = DecisionTreeClassifier(random_state=0)
rfc = RandomForestClassifier(random_state=0)
# Fit both models
clf.fit(x_train, y_train)
rfc.fit(x_train, y_train)


RandomForestClassifier(random_state=0)

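The fitted forest can also be inspected tree by tree, which makes the "many single trees" idea concrete. A minimal sketch using the estimators_ attribute of the rfc fitted above (standard scikit-learn attributes, nothing project-specific is assumed):

# A fitted RandomForestClassifier stores its individual trees in estimators_
print(len(rfc.estimators_))             # number of trees in the forest (n_estimators)
print(rfc.estimators_[0].get_depth())   # depth of the first tree, itself a DecisionTreeClassifier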
# Score on the test set
clf_score = clf.score(x_test, y_test)
rfc_score = rfc.score(x_test, y_test)
print("single tree: {0}\nrandom forest: {1}".format(clf_score, rfc_score))
single tree: 0.9074074074074074
random forest: 0.9629629629629629
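The two scores above come from a single random train/test split, so they vary from run to run. A minimal sketch of a reproducible version of the same comparison; the random_state=0 passed to train_test_split is my own choice, not from the original:

# Fix the split so the comparison is reproducible across runs
x_train, x_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.3, random_state=0)
clf = DecisionTreeClassifier(random_state=0).fit(x_train, y_train)
rfc = RandomForestClassifier(random_state=0).fit(x_train, y_train)
print("single tree: {}\nrandom forest: {}".format(
    clf.score(x_test, y_test), rfc.score(x_test, y_test)))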

Comparison plot of a single tree and a random forest under cross-validation

# Import cross-validation and plotting tools
%matplotlib inline
from sklearn.model_selection import cross_val_score
import matplotlib.pyplot as plt
# Instantiate a decision tree and a random forest
clf = DecisionTreeClassifier()
rfc = RandomForestClassifier(n_estimators=25)  # a random forest of 25 trees
# Run 10-fold cross-validation
clf_cross = cross_val_score(clf, wine.data, wine.target, cv=10)
rfc_cross = cross_val_score(rfc, wine.data, wine.target, cv=10)
# Compare the mean cross-validation scores of the two models
print("single tree mean score: {}\nrandom forest mean score: {}".format(clf_cross.mean(), rfc_cross.mean()))
single tree mean score: 0.8705882352941178
random forest mean score: 0.9722222222222221
# Plot the per-fold scores of the decision tree and the random forest
plt.plot(range(1, 11), clf_cross, label="single tree")
plt.plot(range(1, 11), rfc_cross, label="random forest")
plt.xticks(range(1, 11))
plt.legend()
<matplotlib.legend.Legend at 0x7ff6f4815d50>



clf_cross = cross_val_score(clf, wine.data, wine.target, cv=10)
clf_cross
array([0.88888889, 0.88888889, 0.72222222, 0.88888889, 0.83333333,
       0.83333333, 1.        , 0.94444444, 0.94117647, 0.76470588])
rfc_cross = cross_val_score(rfc, wine.data, wine.target, cv=10)
rfc_cross
array([1.        , 1.        , 0.94444444, 0.94444444, 0.88888889,
       1.        , 1.        , 1.        , 1.        , 1.        ])
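To put a single number on the gap visible in the two arrays, a small sketch that reuses the clf_cross and rfc_cross arrays above and prints each model's mean and standard deviation across folds:

# clf_cross and rfc_cross are NumPy arrays, so .mean() and .std() are available directly
for name, scores in [("single tree", clf_cross), ("random forest", rfc_cross)]:
    print("{}: mean={:.4f}, std={:.4f}".format(name, scores.mean(), scores.std()))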

Decision tree vs. random forest over ten repetitions of cross-validation

# Lists to collect the mean scores
clf_list = []
rfc_list = []
# Repeat 10-fold cross-validation ten times
for i in range(10):
    clf = DecisionTreeClassifier()
    rfc = RandomForestClassifier(n_estimators=25)
    clf_cross_mean = cross_val_score(clf, wine.data, wine.target, cv=10).mean()
    rfc_cross_mean = cross_val_score(rfc, wine.data, wine.target, cv=10).mean()
    clf_list.append(clf_cross_mean)
    rfc_list.append(rfc_cross_mean)
# Plot the decision tree against the random forest
plt.plot(range(1, 11), clf_list, label="single tree")
plt.plot(range(1, 11), rfc_list, label="random forest")
plt.xticks(range(1, 11))
plt.legend()
<matplotlib.legend.Legend at 0x7ff6f490f670>

n_estimators learning curve

# Learning curve for forests of 1 to 200 trees
superpa = []
for i in range(200):
    rfc = RandomForestClassifier(n_estimators=i+1, n_jobs=-1)
    rfc_cross = cross_val_score(rfc, wine.data, wine.target, cv=10).mean()
    superpa.append(rfc_cross)
# Best mean score and the index at which it occurs (index + 1 = number of trees)
print(max(superpa), superpa.index(max(superpa)))
plt.figure(figsize=(20, 8))
plt.plot(range(1, 201), superpa, label="rfc_cross_mean")
plt.legend()
0.9888888888888889 20

<matplotlib.legend.Legend at 0x7ff6f540f100>
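The best mean score above is reached at list index 20, i.e. with 21 trees, since the list index is 0-based while n_estimators=i+1. A minimal sketch for re-checking a narrower band around that value instead of all 200 trees; the 10-40 range is my own choice, not from the original:

# Re-scan a narrow band of n_estimators around the best value found above
band = range(10, 41)
band_scores = []
for n in band:
    rfc = RandomForestClassifier(n_estimators=n, n_jobs=-1)
    band_scores.append(cross_val_score(rfc, wine.data, wine.target, cv=10).mean())
best = max(band_scores)
print(best, list(band)[band_scores.index(best)])  # best mean score and its n_estimators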



