Coursera, Big Data 4, Machine Learning With Big Data (week 3/4/5)
week 3 Classification

KNN :基本思想是 input value 类似,就可能是同一类的


Decision Tree




Naive Bayes







Week 4 Evaluating model
Over-fitting
怎么在Decision Tree 训练时避免 overfitting: Pre-Pruning 和 Post-Pruning

pre-pruning 两个停止条件:1. 某个node上的record数目小于一定量,比如 <20个, 2. 纯度到达一定数值,比如80%, 就不再split了.




怎么取 validation set

holdout 方法如下表示,为了解决training set 和validation set 可能distribution 不同,还有一个引申出来的repeated-holdout



除了 accuracy, error rate, F1, Confusion Matrix

Week 5 Regression, Cluster, Association
Association:










Coursera, Big Data 4, Machine Learning With Big Data (week 3/4/5)的更多相关文章
- Coursera, Big Data 4, Machine Learning With Big Data (week 1/2)
Week 1 Machine Learning with Big Data KNime - GUI based Spark MLlib - inside Spark CRISP-DM Week 2, ...
- In machine learning, is more data always better than better algorithms?
In machine learning, is more data always better than better algorithms? No. There are times when mor ...
- [Javascript] Classify JSON text data with machine learning in Natural
In this lesson, we will learn how to train a Naive Bayes classifier and a Logistic Regression classi ...
- Coursera 学习笔记|Machine Learning by Standford University - 吴恩达
/ 20220404 Week 1 - 2 / Chapter 1 - Introduction 1.1 Definition Arthur Samuel The field of study tha ...
- [Machine Learning with Python] Data Preparation through Transformation Pipeline
In the former article "Data Preparation by Pandas and Scikit-Learn", we discussed about a ...
- [Machine Learning with Python] Data Preparation by Pandas and Scikit-Learn
In this article, we dicuss some main steps in data preparation. Drop Labels Firstly, we drop labels ...
- 斯坦福大学公开课机器学习:machine learning system design | data for machine learning(数据量很大时,学习算法表现比较好的原理)
下图为四种不同算法应用在不同大小数据量时的表现,可以看出,随着数据量的增大,算法的表现趋于接近.即不管多么糟糕的算法,数据量非常大的时候,算法表现也可以很好. 数据量很大时,学习算法表现比较好的原理: ...
- [Machine Learning with Python] Data Visualization by Matplotlib Library
Before you can plot anything, you need to specify which backend Matplotlib should use. The simplest ...
- Coursera《machine learning》--(14)数据降维
本笔记为Coursera在线课程<Machine Learning>中的数据降维章节的笔记. 十四.降维 (Dimensionality Reduction) 14.1 动机一:数据压缩 ...
随机推荐
- Nginx作为HTTP服务器--Nginx配置图片服务器
首先安装nginx安装环境 nginx是C语言开发,建议在linux上运行,本教程使用Centos6.5作为安装环境. --> gcc 安装nginx需要先将官网下载的源码进行编译,编译依赖 ...
- [SNOI2017]炸弹
嘟嘟嘟 这题有一些别的瞎搞神奇做法,而且复杂度似乎更优,不过我为了练线段树,就乖乖的官方正解了. 做法就是线段树优化建图+强连通分量缩点+DAGdp. 如果一个炸弹\(i\)能引爆另一个炸弹\(j\) ...
- sqlplus: error while loading shared libraries: libsqlplus.so: cannot open shared object file
sqlplus: error while loading shared libraries: libsqlplus.so: cannot open shared object file 1. 权限问题 ...
- Neutron:浮动ip
如果需要从外网直接访问 instance,则可以利用 floating IP. 下面是关于 floating IP 必须知道的事实: 1. floating IP 提供静态 NAT 功能,建立外网 ...
- FLOAT 和 DOUBLE区别
以下是 FLOAT 和 DOUBLE 的区别: float : 单精度浮点数 double : 双精度浮点数 ·浮点数以 8 位精度存储在 FLOAT 中,并且有四个字节. ·浮点数存储在 DOUBL ...
- Python测试模块doctest
面试被问到了却没有用过,很尴尬:今天看了一下,真的是一个很简单的测试模块 方便起见,这里直接拿菜鸟教程的介绍和例子过来 开发高质量软件的方法之一是为每一个函数开发测试代码,并且在开发过程中经常进行测试 ...
- Windows平台安装TensorFlow Q&A
·本文讲的是Windows平台使用原生pip进行TensorFlow(CPU版本)安装的注意事项及常见问题解决方法 ·这是TensorFlow官网的安装介绍:在 Windows 上安装 TensorF ...
- SpringBoot配置日志logback
1.这里我们选择logback,首先加入pom依赖 <dependency> <groupId>ch.qos.logback</groupId> <artif ...
- CentOS安装python3.6
下载Python安装包 cd /usr/local/src 编译时要提前装好gcc编译器和zlib zlib-devel 1.下载文件 wget https://www.python.org/ftp/ ...
- bzoj 3282: Tree (Link Cut Tree)
链接:https://www.lydsy.com/JudgeOnline/problem.php?id=3282 题面: 3282: Tree Time Limit: 30 Sec Memory L ...