week 3 Classification

  

KNN :基本思想是 input value 类似,就可能是同一类的

  

  

Decision Tree

  

  

  

  

Naive Bayes

  

  

Week 4 Evaluating model


Over-fitting

怎么在Decision Tree 训练时避免 overfitting: Pre-Pruning 和 Post-Pruning

pre-pruning 两个停止条件:1. 某个node上的record数目小于一定量,比如 <20个, 2. 纯度到达一定数值,比如80%, 就不再split了.

怎么取 validation set

holdout 方法如下表示,为了解决training set 和validation set 可能distribution 不同,还有一个引申出来的repeated-holdout

除了 accuracy, error rate, F1, Confusion Matrix

Week 5 Regression, Cluster, Association

Association:

Coursera, Big Data 4, Machine Learning With Big Data (week 3/4/5)的更多相关文章

  1. Coursera, Big Data 4, Machine Learning With Big Data (week 1/2)

    Week 1 Machine Learning with Big Data KNime - GUI based Spark MLlib - inside Spark CRISP-DM Week 2, ...

  2. In machine learning, is more data always better than better algorithms?

    In machine learning, is more data always better than better algorithms? No. There are times when mor ...

  3. [Javascript] Classify JSON text data with machine learning in Natural

    In this lesson, we will learn how to train a Naive Bayes classifier and a Logistic Regression classi ...

  4. Coursera 学习笔记|Machine Learning by Standford University - 吴恩达

    / 20220404 Week 1 - 2 / Chapter 1 - Introduction 1.1 Definition Arthur Samuel The field of study tha ...

  5. [Machine Learning with Python] Data Preparation through Transformation Pipeline

    In the former article "Data Preparation by Pandas and Scikit-Learn", we discussed about a ...

  6. [Machine Learning with Python] Data Preparation by Pandas and Scikit-Learn

    In this article, we dicuss some main steps in data preparation. Drop Labels Firstly, we drop labels ...

  7. 斯坦福大学公开课机器学习:machine learning system design | data for machine learning(数据量很大时,学习算法表现比较好的原理)

    下图为四种不同算法应用在不同大小数据量时的表现,可以看出,随着数据量的增大,算法的表现趋于接近.即不管多么糟糕的算法,数据量非常大的时候,算法表现也可以很好. 数据量很大时,学习算法表现比较好的原理: ...

  8. [Machine Learning with Python] Data Visualization by Matplotlib Library

    Before you can plot anything, you need to specify which backend Matplotlib should use. The simplest ...

  9. Coursera《machine learning》--(14)数据降维

    本笔记为Coursera在线课程<Machine Learning>中的数据降维章节的笔记. 十四.降维 (Dimensionality Reduction) 14.1 动机一:数据压缩 ...

随机推荐

  1. Nginx作为HTTP服务器--Nginx配置图片服务器

      首先安装nginx安装环境 nginx是C语言开发,建议在linux上运行,本教程使用Centos6.5作为安装环境. --> gcc 安装nginx需要先将官网下载的源码进行编译,编译依赖 ...

  2. [SNOI2017]炸弹

    嘟嘟嘟 这题有一些别的瞎搞神奇做法,而且复杂度似乎更优,不过我为了练线段树,就乖乖的官方正解了. 做法就是线段树优化建图+强连通分量缩点+DAGdp. 如果一个炸弹\(i\)能引爆另一个炸弹\(j\) ...

  3. sqlplus: error while loading shared libraries: libsqlplus.so: cannot open shared object file

    sqlplus: error while loading shared libraries: libsqlplus.so: cannot open shared object file 1. 权限问题 ...

  4. Neutron:浮动ip

    如果需要从外网直接访问 instance,则可以利用 floating IP.   下面是关于 floating IP 必须知道的事实: 1. floating IP 提供静态 NAT 功能,建立外网 ...

  5. FLOAT 和 DOUBLE区别

    以下是 FLOAT 和 DOUBLE 的区别: float : 单精度浮点数 double : 双精度浮点数 ·浮点数以 8 位精度存储在 FLOAT 中,并且有四个字节. ·浮点数存储在 DOUBL ...

  6. Python测试模块doctest

    面试被问到了却没有用过,很尴尬:今天看了一下,真的是一个很简单的测试模块 方便起见,这里直接拿菜鸟教程的介绍和例子过来 开发高质量软件的方法之一是为每一个函数开发测试代码,并且在开发过程中经常进行测试 ...

  7. Windows平台安装TensorFlow Q&A

    ·本文讲的是Windows平台使用原生pip进行TensorFlow(CPU版本)安装的注意事项及常见问题解决方法 ·这是TensorFlow官网的安装介绍:在 Windows 上安装 TensorF ...

  8. SpringBoot配置日志logback

    1.这里我们选择logback,首先加入pom依赖 <dependency> <groupId>ch.qos.logback</groupId> <artif ...

  9. CentOS安装python3.6

    下载Python安装包 cd /usr/local/src 编译时要提前装好gcc编译器和zlib zlib-devel 1.下载文件 wget https://www.python.org/ftp/ ...

  10. bzoj 3282: Tree (Link Cut Tree)

    链接:https://www.lydsy.com/JudgeOnline/problem.php?id=3282 题面: 3282: Tree Time Limit: 30 Sec  Memory L ...