week 3 Classification

  

KNN :基本思想是 input value 类似,就可能是同一类的

  

  

Decision Tree

  

  

  

  

Naive Bayes

  

  

Week 4 Evaluating model


Over-fitting

怎么在Decision Tree 训练时避免 overfitting: Pre-Pruning 和 Post-Pruning

pre-pruning 两个停止条件:1. 某个node上的record数目小于一定量,比如 <20个, 2. 纯度到达一定数值,比如80%, 就不再split了.

怎么取 validation set

holdout 方法如下表示,为了解决training set 和validation set 可能distribution 不同,还有一个引申出来的repeated-holdout

除了 accuracy, error rate, F1, Confusion Matrix

Week 5 Regression, Cluster, Association

Association:

Coursera, Big Data 4, Machine Learning With Big Data (week 3/4/5)的更多相关文章

  1. Coursera, Big Data 4, Machine Learning With Big Data (week 1/2)

    Week 1 Machine Learning with Big Data KNime - GUI based Spark MLlib - inside Spark CRISP-DM Week 2, ...

  2. In machine learning, is more data always better than better algorithms?

    In machine learning, is more data always better than better algorithms? No. There are times when mor ...

  3. [Javascript] Classify JSON text data with machine learning in Natural

    In this lesson, we will learn how to train a Naive Bayes classifier and a Logistic Regression classi ...

  4. Coursera 学习笔记|Machine Learning by Standford University - 吴恩达

    / 20220404 Week 1 - 2 / Chapter 1 - Introduction 1.1 Definition Arthur Samuel The field of study tha ...

  5. [Machine Learning with Python] Data Preparation through Transformation Pipeline

    In the former article "Data Preparation by Pandas and Scikit-Learn", we discussed about a ...

  6. [Machine Learning with Python] Data Preparation by Pandas and Scikit-Learn

    In this article, we dicuss some main steps in data preparation. Drop Labels Firstly, we drop labels ...

  7. 斯坦福大学公开课机器学习:machine learning system design | data for machine learning(数据量很大时,学习算法表现比较好的原理)

    下图为四种不同算法应用在不同大小数据量时的表现,可以看出,随着数据量的增大,算法的表现趋于接近.即不管多么糟糕的算法,数据量非常大的时候,算法表现也可以很好. 数据量很大时,学习算法表现比较好的原理: ...

  8. [Machine Learning with Python] Data Visualization by Matplotlib Library

    Before you can plot anything, you need to specify which backend Matplotlib should use. The simplest ...

  9. Coursera《machine learning》--(14)数据降维

    本笔记为Coursera在线课程<Machine Learning>中的数据降维章节的笔记. 十四.降维 (Dimensionality Reduction) 14.1 动机一:数据压缩 ...

随机推荐

  1. vmware 14 新安装centos7 没法联网

    vmware14 刚安装好centos7后,想下载安装一些软件发现无法联网,于是就百度了一下.下面 记录下解决方法. 1 确报主机能上网. 2 设置虚拟机网络适配器 3 设置虚拟机网卡 4 修改cen ...

  2. 24G的SSD有什么用

    有台12G内存,带24G的SSD的笔记本,系统自带WINDOWS8,最近感觉很慢,就动手把1T的硬盘升级到512的SSD. BIOS里面明明看到24G的SSD,Windows里面就消失了(应该是坏掉了 ...

  3. 订制rpm包到Centos7镜像中

    本文以CentOS 7.4 最小化镜像(CentOS-7-x86_64-Minimal-1708.iso)为模版 要达到的目的: 1.订制所需的rpm软件包集成到iso文件中 2.制作完成的ISO全自 ...

  4. js操作文章、字符串换行

    操作前: 操作后: 第一步: 把中英文的逗号和顿号置换为 '\n’ support_unit = support_unit.replace(/,|,|./g, '\n')   第二步: //为了使\n ...

  5. 【spring源码分析】IOC容器初始化(六)

    前言:经过前几篇文章的讲解,我们已经得到了BeanDefinition,接下来将分析Bean的加载. 获取Bean的入口:AbstractApplicationContext#getBean publ ...

  6. 013_针对单个pid的cpu/内存/io的资源占用统计

    #!/usr/bin/env python import sys import os import subprocess from decimal import Decimal from decima ...

  7. soamanager发布的Webservice服务,调用时出现http500报错

    最近再给薪酬那边发布ws服务时出现了报错,调用方反馈了errorCode:BEA-380002.在使用XMLspy工具去调用这个WSDL时候,则反馈http500的错误消息.如下图: 遇到这种问题我通 ...

  8. keepalived的主从备份服务器

    一.环境说明 1.操作系统内核版本:linux 6.0 2.Keepalived软件版本:keepalived-1.1.20.tar.gz 二.环境配置 1.主Keepalived服务器IP地址 19 ...

  9. 如何把Office365的更新从半年通道改成月度通道

    转自msdn,转发链接:www.cnblogs.com/Charltsing/p/Office365month.html 作者QQ: 564955427 建立一个Bat文件,写入 下面内容 setlo ...

  10. 菜鸟学IT之python词云初体验

    作业来源:https://edu.cnblogs.com/campus/gzcc/GZCC-16SE1/homework/2822 1. 下载一长篇中文小说. 2. 从文件读取待分析文本. txt = ...