Random Forest

Score comparison between a single decision tree and a random forest

# Import packages
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
# Load the wine dataset
wine = load_wine()
# Split into training and test sets
x_train, x_test, y_train, y_test = train_test_split(wine.data, wine.target, test_size=0.3)
# Instantiate a decision tree and a random forest with random_state=0
clf = DecisionTreeClassifier(random_state=0)
rfc = RandomForestClassifier(random_state=0)
# Fit both models
clf.fit(x_train, y_train)
rfc.fit(x_train, y_train)


RandomForestClassifier(random_state=0)

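The fitted forest can also be inspected tree by tree, which makes the "many single trees" idea concrete. A minimal sketch using the estimators_ attribute of the rfc fitted above (standard scikit-learn attributes, nothing project-specific is assumed):

# A fitted RandomForestClassifier stores its individual trees in estimators_
print(len(rfc.estimators_))             # number of trees in the forest (n_estimators)
print(rfc.estimators_[0].get_depth())   # depth of the first tree, itself a DecisionTreeClassifier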
# Score on the test set
clf_score = clf.score(x_test, y_test)
rfc_score = rfc.score(x_test, y_test)
print("single tree: {0}\nrandom forest: {1}".format(clf_score, rfc_score))
single tree: 0.9074074074074074
random forest: 0.9629629629629629
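The two scores above come from a single random train/test split, so they vary from run to run. A minimal sketch of a reproducible version of the same comparison; the random_state=0 passed to train_test_split is my own choice, not from the original:

# Fix the split so the comparison is reproducible across runs
x_train, x_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.3, random_state=0)
clf = DecisionTreeClassifier(random_state=0).fit(x_train, y_train)
rfc = RandomForestClassifier(random_state=0).fit(x_train, y_train)
print("single tree: {}\nrandom forest: {}".format(
    clf.score(x_test, y_test), rfc.score(x_test, y_test)))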

Comparison plot of a single tree and a random forest under cross-validation

# Import cross-validation and plotting tools
%matplotlib inline
from sklearn.model_selection import cross_val_score
import matplotlib.pyplot as plt
# Instantiate a decision tree and a random forest
clf = DecisionTreeClassifier()
rfc = RandomForestClassifier(n_estimators=25)  # a random forest of 25 trees
# Run 10-fold cross-validation
clf_cross = cross_val_score(clf, wine.data, wine.target, cv=10)
rfc_cross = cross_val_score(rfc, wine.data, wine.target, cv=10)
# Compare the mean cross-validation scores of the two models
print("single tree mean score: {}\nrandom forest mean score: {}".format(clf_cross.mean(), rfc_cross.mean()))
single tree mean score: 0.8705882352941178
random forest mean score: 0.9722222222222221
# Plot the per-fold scores of the decision tree and the random forest
plt.plot(range(1, 11), clf_cross, label="single tree")
plt.plot(range(1, 11), rfc_cross, label="random forest")
plt.xticks(range(1, 11))
plt.legend()
<matplotlib.legend.Legend at 0x7ff6f4815d50>



clf_cross = cross_val_score(clf, wine.data, wine.target, cv=10)
clf_cross
array([0.88888889, 0.88888889, 0.72222222, 0.88888889, 0.83333333,
       0.83333333, 1.        , 0.94444444, 0.94117647, 0.76470588])
rfc_cross = cross_val_score(rfc, wine.data, wine.target, cv=10)
rfc_cross
array([1.        , 1.        , 0.94444444, 0.94444444, 0.88888889,
       1.        , 1.        , 1.        , 1.        , 1.        ])
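To put a single number on the gap visible in the two arrays, a small sketch that reuses the clf_cross and rfc_cross arrays above and prints each model's mean and standard deviation across folds:

# clf_cross and rfc_cross are NumPy arrays, so .mean() and .std() are available directly
for name, scores in [("single tree", clf_cross), ("random forest", rfc_cross)]:
    print("{}: mean={:.4f}, std={:.4f}".format(name, scores.mean(), scores.std()))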

Decision tree vs. random forest over ten repetitions of cross-validation

# Lists to collect the mean scores
clf_list = []
rfc_list = []
# Repeat 10-fold cross-validation ten times
for i in range(10):
    clf = DecisionTreeClassifier()
    rfc = RandomForestClassifier(n_estimators=25)
    clf_cross_mean = cross_val_score(clf, wine.data, wine.target, cv=10).mean()
    rfc_cross_mean = cross_val_score(rfc, wine.data, wine.target, cv=10).mean()
    clf_list.append(clf_cross_mean)
    rfc_list.append(rfc_cross_mean)
# Plot the decision tree against the random forest
plt.plot(range(1, 11), clf_list, label="single tree")
plt.plot(range(1, 11), rfc_list, label="random forest")
plt.xticks(range(1, 11))
plt.legend()
<matplotlib.legend.Legend at 0x7ff6f490f670>

n_estimators learning curve

# Learning curve for forests of 1 to 200 trees
superpa = []
for i in range(200):
    rfc = RandomForestClassifier(n_estimators=i+1, n_jobs=-1)
    rfc_cross = cross_val_score(rfc, wine.data, wine.target, cv=10).mean()
    superpa.append(rfc_cross)
# Best mean score and the index at which it occurs (index + 1 = number of trees)
print(max(superpa), superpa.index(max(superpa)))
plt.figure(figsize=(20, 8))
plt.plot(range(1, 201), superpa, label="rfc_cross_mean")
plt.legend()
0.9888888888888889 20

<matplotlib.legend.Legend at 0x7ff6f540f100>
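The best mean score above is reached at list index 20, i.e. with 21 trees, since the list index is 0-based while n_estimators=i+1. A minimal sketch for re-checking a narrower band around that value instead of all 200 trees; the 10-40 range is my own choice, not from the original:

# Re-scan a narrow band of n_estimators around the best value found above
band = range(10, 41)
band_scores = []
for n in band:
    rfc = RandomForestClassifier(n_estimators=n, n_jobs=-1)
    band_scores.append(cross_val_score(rfc, wine.data, wine.target, cv=10).mean())
best = max(band_scores)
print(best, list(band)[band_scores.index(best)])  # best mean score and its n_estimators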



