Coursera, Big Data 4, Machine Learning With Big Data (week 1/2)
Week 1 Machine Learning with Big Data
KNime - GUI based
Spark MLlib - inside Spark
CRISP-DM



Week 2, Data Exploration
一般有两种方法,summary statistics 和 visualization

Summary statistics (mean 平均数,median 中位数, mode 最常见的数)




high Kurtosis 预示着有outlier的存在

visualization

这里详细讲一下 box plot
下图的 upper quartile 和 lower quartile 分别指的是 75% 和 25% 的点, median 很明显是中位数点,中间柱状部分的数据占了总数据的50%. Upper extreme 和 Lower extreme 分别是90% 和 10% 数据的点,超出部分就是outliers.

Data preparing


data wrangling 主要是transformation

Coursera, Big Data 4, Machine Learning With Big Data (week 1/2)的更多相关文章
- Coursera, Big Data 4, Machine Learning With Big Data (week 3/4/5)
week 3 Classification KNN :基本思想是 input value 类似,就可能是同一类的 Decision Tree Naive Bayes Week 4 Evaluating ...
- In machine learning, is more data always better than better algorithms?
In machine learning, is more data always better than better algorithms? No. There are times when mor ...
- [Javascript] Classify JSON text data with machine learning in Natural
In this lesson, we will learn how to train a Naive Bayes classifier and a Logistic Regression classi ...
- Coursera 学习笔记|Machine Learning by Standford University - 吴恩达
/ 20220404 Week 1 - 2 / Chapter 1 - Introduction 1.1 Definition Arthur Samuel The field of study tha ...
- [Machine Learning with Python] Data Preparation through Transformation Pipeline
In the former article "Data Preparation by Pandas and Scikit-Learn", we discussed about a ...
- [Machine Learning with Python] Data Preparation by Pandas and Scikit-Learn
In this article, we dicuss some main steps in data preparation. Drop Labels Firstly, we drop labels ...
- 斯坦福大学公开课机器学习:machine learning system design | data for machine learning(数据量很大时,学习算法表现比较好的原理)
下图为四种不同算法应用在不同大小数据量时的表现,可以看出,随着数据量的增大,算法的表现趋于接近.即不管多么糟糕的算法,数据量非常大的时候,算法表现也可以很好. 数据量很大时,学习算法表现比较好的原理: ...
- [Machine Learning with Python] Data Visualization by Matplotlib Library
Before you can plot anything, you need to specify which backend Matplotlib should use. The simplest ...
- Coursera《machine learning》--(14)数据降维
本笔记为Coursera在线课程<Machine Learning>中的数据降维章节的笔记. 十四.降维 (Dimensionality Reduction) 14.1 动机一:数据压缩 ...
随机推荐
- Windows -- 从注册表删除IE浏览器加载项
Windows -- 从注册表删除IE浏览器加载项 1. 一部分加载项从注册表以下位置直接删除 2. 一部分扩展项从注册表以下位置直接删除
- 「插件」Runner更新Pro版,帮助设计师远离996
三年多前Runner团队在德国汉堡的骇客松上第一次发布了Sketch插件Runner的beta版本.从那以后,这个团队的目标一直很清晰: 创造一个加速设计工作流的工具. 他们只给Runner添加真正能 ...
- 数论 C - Aladdin and the Flying Carpet
It's said that Aladdin had to solve seven mysteries before getting the Magical Lamp which summons a ...
- C++ 精英化趋势
精英化趋势 C++ 是一门引起无数争议的语言.眼下最常听到的声音则是 C++ 将趋于没落,会被某某语言取代.我很怀疑这种论调的起点是商业宣传,C++ 的真实趋势应该是越来越倾向于精英化. 精英化是指在 ...
- 解决mySQL数据库锁表问题。
先用这条命令查询数据库阻塞的进程 SELECT * FROM information_schema.innodb_trx 找到后在根据下图这个字段:try_mysql_thread_id 作为这条数据 ...
- C语言函数的格式
#include <stdio.h>#include <stdlib.h>extern int addf(int a,int b);//函数能多次声明//int addf(in ...
- iOS 关于监听手机截图,UIView生成UIImage, UIImage裁剪与压缩的总结
一. 关于监听手机截图 1. 背景: 发现商品的售价页总是被人转发截图,为了方便用户添加截图分享的小功能 首先要注册用户截屏操作的通知 - (void)viewDidLoad { [super vi ...
- CRM公海自动回收规则
企微云CRM操作指南 – 道一云|企微https://wbg.do1.com.cn/xueyuan/2568.html 销售云 - 美洽 - 连接客户,亲密无间https://meiqia.com/s ...
- ABP拦截器之UnitOfWorkRegistrar(二)
在上面一篇中我们主要是了解了在ABP系统中是如何使用UnitOfWork以及整个ABP系统中如何执行这些过程的,那么这一篇就让我们来看看UnitOfWorkManager中在执行Begin和Compl ...
- document对象获取例子
<!DOCTYPE html> <html> <head> <meta charset="utf-8" /> <title&g ...