From: http://nirvacana.com/thoughts/becoming-a-data-scientist/

Data Science, Machine Learning, Big Data Analytics, Cognitive Computing …. well all of us have been avalanched with articles, skills demand info graph’s and point of views on these topics (yawn!). One thing is for sure; you cannot become a data scientist overnight. Its a journey, for sure a challenging one. But how do you go about becoming one? Where to start? When do you start seeing light at the end of the tunnel? What is the learning roadmap? What tools and techniques do I need to know? How will you know when you have achieved your goal?

Given how critical visualization is for data science, ironically I was not able to find (except for a few), pragmatic and yet visual representation of what it takes to become a data scientist. So here is my modest attempt at creating a curriculum, a learning plan that one can use in this becoming a data scientist journey. I took inspiration from the metro maps and used it to depict the learning path. I organized the overall plan progressively into the following areas / domains,

  1. Fundamentals
  2. Statistics
  3. Programming
  4. Machine Learning
  5. Text Mining / Natural Language Processing
  6. Data Visualization
  7. Big Data
  8. Data Ingestion
  9. Data Munging
  10. Toolbox

Each area  / domain is represented as a “metro line”, with the stations depicting the topics you must learn / master / understand in a progressive fashion. The idea is you pick a line, catch a train and go thru all the stations (topics) till you reach the final destination (or) switch to the next line. I have progressively marked each station (line) 1 thru 10 to indicate the order in which you travel. You can use this as an individual learning plan to identify the areas you most want to develop and the acquire skills. By no means this is the end; but a solid start. Feel free to leave your comments and constructive feedback.

Becoming a Data Scientist – Curriculum via Metromap的更多相关文章

  1. What do data scientist do?

    What do data scientist do? 1. Define the question 2.Define the ideal data set 3.Determine what data ...

  2. 现在很火的数据科学到底是什么?你对做DATA SCIENTIST感兴趣吗?

    转自– Warald (Email: iamxiaoning@gmail.com) 博客: http://www.1point3acres.com,微博:http://www.weibo.com/wa ...

  3. 记录一下我做Udacity 的Data Scientist Nano Degree Project

    做项目的时候看了别人的blog,决定自己也随手记录下在做项目中遇到的好的小知识点. 最近在做Udacity的Data Scientist Nano Degree Project的Customer_Se ...

  4. 数据分析师(Data Analyst),数据工程师(Data Engineer),数据科学家(Data Scientist)的区别

    数据分析师(Data Analyst):负责从数据中提取出有用的信息,以帮助公司形成业务决策.工作内容包括:对数据进行提取,清洗,分析(用描述统计量,趋势分析,多维度分析,假设检验等统计常用方法对数据 ...

  5. 数据科学工作者(Data Scientist) 的日常工作内容包括什么

    数据科学工作者(Data Scientist) 的日常工作内容包括什么 众所周知,数据科学是这几年才火起来的概念,而应运而生的数据科学家(data scientist)明显缺乏清晰的录取标准和工作内容 ...

  6. Principal Data Scientist

    http://stackoverflow.com/jobs/124781/principal-data-scientist-concur-technologies-inc?med=clc&re ...

  7. 微软职位内部推荐-Senior Data Scientist

    微软近期Open的职位: Extracting accurate, insightful and actionable information from data is part art and pa ...

  8. 微软职位内部推荐-Data Scientist

    微软近期Open的职位: Job Description:Extracting accurate, insightful and actionable information from data is ...

  9. Data scientist———java实现常见的机器学习代码(跟百度深度学习研究院师兄学机器学习)

    2016-05-02开始决定好好记录一切有关<数据科学家>的学习过程.记录学习笔记. --------------------------------------------------- ...

随机推荐

  1. C++字符串常量

    C++字符串常量 当一个字符串常量出现于表达式中时,它的值是个指针常量.编译器把这个指定字符的一份copy存储在内存的某个位置(全局区),并存储一个指向第一个字符的指针. #include <i ...

  2. 人脸识别经典算法二:LBP方法

    与第一篇博文特征脸方法不同,LBP(Local Binary Patterns,局部二值模式)是提取局部特征作为判别依据的.LBP方法显著的优点是对光照不敏感,但是依然没有解决姿态和表情的问题.不过相 ...

  3. [ASE][Daily Scrum]11.24

    今天开会总结了一下第一周的进度,讨论了无限地图的访存方法,做了简单的人员调整, Client的包接收分析与服务器通信这块基本上完成了, 之后Jiafan Zhu会开始和Junbei以及Songtao一 ...

  4. JSON与XML的区别

    1.定义介绍 (1).XML定义扩展标记语言 (Extensible Markup Language, XML) ,用于标记电子文件使其具有结构性的标记语言,可以用来标记数据.定义数据类型,是一种允许 ...

  5. C#设计模式(18)——中介者模式(Mediator Pattern)

    一.引言 在现实生活中,有很多中介者模式的身影,例如QQ游戏平台,聊天室.QQ群和短信平台,这些都是中介者模式在现实生活中的应用,下面就具体分享下我对中介者模式的理解. 二. 中介者模式的介绍 2.1 ...

  6. 基于 IdentityServer3 实现 OAuth 2.0 授权服务数据持久化

    最近花了一点时间,阅读了IdentityServer的源码,大致了解项目整体的抽象思维.面向对象的重要性; 生产环境如果要使用 IdentityServer3 ,主要涉及授权服务,资源服务的部署负载的 ...

  7. 学习WPF——初识依赖项属性

    入门 首先创建一个依赖项属性 然后绑定父容器的DataContext到这个依赖项的实例 接着绑定子元素的属性到依赖项属性(注意Button的Content属性) 程序最终的运行结果:   说明 首先是 ...

  8. Python模拟用户登录

    # coding=utf8 import hashlib db = { 'michael':'e10adc3949ba59abbe56e057f20f883e', 'bob':'878ef96e861 ...

  9. IBM的IT战略规划方法论

    IBM的IT战略规划方法论 http://wenku.baidu.com/view/42489e21aaea998fcc220e92.html?re=view http://wenku.baidu.c ...

  10. AsyncTask实现多线程断点续传

    前面一篇博客<AsyncTask实现断点续传>讲解了如何实现单线程下的断点续传,也就是一个文件只有一个线程进行下载.   对于大文件而言,使用多线程下载就会比单线程下载要快一些.多线程下载 ...