The 10 Statistical Techniques Data Scientists Need to Master
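In my experience, far too many software engineers trying to move into data science reach blindly for machine learning frameworks such as TensorFlow or Apache Spark to process their data, without fully understanding the statistical theory behind them. That is where statistical learning comes in: it is a theoretical framework for machine learning that grew out of statistics and functional analysis.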
Three recommended books:
- Intro to Statistical Learning (Hastie, Tibshirani, Witten, James)
- Doing Bayesian Data Analysis (Kruschke)
- Time Series Analysis and Its Applications (Shumway, Stoffer)
I practiced extensively on the following topics:
Bayesian Analysis, Markov Chain Monte Carlo, Hierarchical Modeling, Supervised and Unsupervised Learning
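Markov Chain Monte Carlo is one of the practice topics above. The example below shows what it looks like in code. It is a minimal random-walk Metropolis sampler, written in plain Python with only the standard library. It assumes a one-dimensional target known up to a normalizing constant. A standard normal is used only so the result is easy to check. The names `metropolis` and `log_standard_normal` are hypothetical helpers for illustration, not any library's API.

```python
import math
import random


def log_standard_normal(x: float) -> float:
    """Unnormalized log-density of N(0, 1); MCMC only needs the target up to a constant."""
    return -0.5 * x * x


def metropolis(log_target, n_samples: int, step: float = 1.0,
               x0: float = 0.0, seed: int = 0) -> list:
    """Hypothetical random-walk Metropolis sampler.

    Proposes x' = x + N(0, step^2) and accepts with probability
    min(1, p(x') / p(x)). The Gaussian proposal is symmetric,
    so no Hastings correction term is needed.
    """
    rng = random.Random(seed)
    x = x0
    log_p = log_target(x)
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        log_p_new = log_target(proposal)
        # Compare in log space; 1.0 - random() lies in (0, 1], so log() never sees 0.
        if math.log(1.0 - rng.random()) < log_p_new - log_p:
            x, log_p = proposal, log_p_new
        # On rejection the current state is repeated, as Metropolis requires.
        samples.append(x)
    return samples


if __name__ == "__main__":
    draws = metropolis(log_standard_normal, n_samples=20000, step=1.0)
    kept = draws[2000:]  # discard the first 2000 draws as burn-in
    mean = sum(kept) / len(kept)
    var = sum((d - mean) ** 2 for d in kept) / len(kept)
    print(f"mean={mean:.3f} var={var:.3f}")
```

The sample mean and variance should come out close to 0 and 1. Successive draws are autocorrelated, so the estimates are noisier than independent sampling with the same count would give.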
Recommended course:
Recently, I completed the Statistical Learning online course on Stanford Lagunita, which covers all the material in the Intro to Statistical Learning book I read in my Independent Study. Having now worked through the content twice, I want to share the 10 statistical techniques from the book that I believe every data scientist should learn in order to handle big datasets more effectively.