Comprehensive learning path – Data Science in Python
http://blog.csdn.net/pipisorry/article/details/44245575
A very good article on how to learn Python and apply it to data science, data analysis, and machine learning.
Journey from a Python noob to a Kaggler on Python
So, you want to become a data scientist, or maybe you are already one and want to expand your tool repository. You have landed at the right place. The aim of this page is to provide a comprehensive learning path for people new to Python for data analysis. It gives an overview of the steps you need to learn to use Python for data analysis. If you already have some background, or don't need all the components, feel free to adapt the path to your own needs and let us know what you changed.

Step 0: Warming up
Before starting your journey, the first question to answer is:
Why use Python?
or
How would Python be useful?
Watch the first 30 minutes of this talk from Jeremy (video ID: CoxjADZHUQA).
Step 1: Setting up your machine
Now that you have made up your mind, it is time to set up your machine. The easiest way to proceed is to download Anaconda from Continuum.io. It comes packaged with most of the things you will ever need. The major downside of taking this route is that you will need to wait for Continuum to update their packages, even when an update to the underlying libraries is already available. If you are a beginner, that should hardly matter.
If you face any challenges in installing, you can find more detailed instructions for various operating systems here.
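Once the installation finishes, a quick way to confirm that the libraries used in the later steps are available is to ask Python whether it can find them. The snippet below is a minimal sketch using only the standard library; the list of import names and the helper `check_installed` are my own choices for illustration, not part of Anaconda.

```python
import importlib.util

# Import names of the libraries used later in this path (assumed list;
# note that scikit-learn is imported as "sklearn").
LIBRARIES = ["numpy", "scipy", "matplotlib", "pandas", "sklearn"]


def check_installed(names):
    """Return a dict mapping each import name to True if Python can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_installed(LIBRARIES).items():
        print(f"{name:12s} {'OK' if found else 'missing'}")
```

Anything reported as missing can usually be added with the package manager that ships with your distribution.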
Step 2: Learn the basics of Python language
You should start by understanding the basics of the language, its libraries, and its data structures. The Python track from Codecademy is one of the best places to start your journey. By the end of this course, you should be comfortable writing small scripts in Python and should also understand classes and objects.
Specifically learn: Lists, Tuples, Dictionaries, List comprehensions, Dictionary comprehensions
Assignment: Solve the Python tutorial questions on HackerRank. These should get your brain thinking about Python scripting.
Alternate resources: If interactive coding is not your style of learning, you can also look at the Google Class for Python. It is a 2-day class series and also covers some of the parts discussed later.
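To give a feel for the data structures listed above, here is a small sketch. The variable names and values are invented for illustration.

```python
# A list is ordered and mutable; a tuple is ordered and immutable.
scores = [72, 88, 95, 61]
point = (3, 4)

# A dictionary maps keys to values.
ages = {"ann": 31, "bob": 27}

# List comprehension: build a new list from an existing iterable.
passed = [s for s in scores if s >= 70]

# Dictionary comprehension: build a dict in one expression.
squares = {n: n * n for n in range(1, 5)}

# Tuples unpack directly into variables.
x, y = point

print(passed)   # [72, 88, 95]
print(squares)  # {1: 1, 2: 4, 3: 9, 4: 16}
print(x + y)    # 7
```

Comprehensions come up constantly in data work, so it is worth practicing them until they read as naturally as a loop.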
Step 3: Learn Regular Expressions in Python
You will need to use them a lot for data cleansing, especially if you are working on text data. The best way to learn regular expressions is to go through the Google class and keep this cheat sheet handy.
Assignment: Do the baby names exercise.
If you still need more practice, follow this tutorial for text cleaning. It will challenge you on the various steps involved in data wrangling.
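As a taste of what regular expressions do in data cleansing, the sketch below tidies a messy string with Python's built-in `re` module. The sample text and the patterns are made up for this example; the email pattern in particular is deliberately simple and would not cover every valid address.

```python
import re

# Hypothetical messy input, invented for this example.
raw = "  Contact: JOHN.DOE@Example.com, tel. 555-1234;   joined 2014-03-12  "

# Collapse runs of whitespace into single spaces and trim the ends.
clean = re.sub(r"\s+", " ", raw).strip()

# Extract a date in YYYY-MM-DD form using named groups.
date_match = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})", clean)

# Find anything that looks like an email address and normalise its case.
emails = [e.lower() for e in re.findall(r"[\w.+-]+@[\w-]+\.[\w.]+", clean)]

print(clean)
print(date_match.group("year"), date_match.group("month"))
print(emails)
```

`re.sub`, `re.search`, and `re.findall` cover a large share of everyday cleaning tasks.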
Step 4: Learn Scientific libraries in Python – NumPy, SciPy, Matplotlib and Pandas
This is where the fun begins! Here is a brief introduction to various libraries. Let's start practicing some common operations.
- Practice the NumPy tutorial thoroughly, especially NumPy arrays. This will form a good foundation for things to come.
- Next, look at the SciPy tutorials. Go through the introduction and the basics, and do the remaining ones based on your needs.
- If you guessed Matplotlib tutorials next, you are wrong! They are too comprehensive for our needs here. Instead, look at this IPython notebook till Line 68 (i.e. till animations).
- Finally, let us look at Pandas. Pandas provides DataFrame functionality (like R) for Python. This is also where you should spend a good amount of time practicing. Pandas will become your most effective tool for all mid-size data analysis. Start with a short introduction, 10 minutes to pandas. Then move on to a more detailed tutorial on pandas.
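Since NumPy arrays underpin everything else in the list above, here is a minimal sketch of the operations you will use most often. It assumes NumPy is installed (it ships with Anaconda); the array values are arbitrary.

```python
import numpy as np

# Create a 2-D array (2 rows, 3 columns) from nested lists.
a = np.array([[1, 2, 3], [4, 5, 6]])

print(a.shape)         # (2, 3)
print(a[:, 1])         # second column: [2 5]
print(a * 10)          # element-wise arithmetic, no explicit loop
print(a.sum(axis=0))   # column sums: [5 7 9]
print(a[a > 3])        # boolean mask selects [4 5 6]
```

Slicing, element-wise arithmetic, axis-wise reductions, and boolean masks carry over almost unchanged to Pandas.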
You can also look at Exploratory Data Analysis with Pandas and Data munging with Pandas.
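To make "exploratory analysis" and "data munging" concrete, the sketch below builds a tiny DataFrame, fills a missing value, and aggregates by group. The data is a toy table invented for this example, and mean imputation is just one simple choice among many.

```python
import pandas as pd

# Hypothetical toy data, invented for this example.
df = pd.DataFrame(
    {
        "city": ["Pune", "Delhi", "Pune", "Delhi", "Mumbai"],
        "sales": [120.0, 80.0, None, 100.0, 150.0],
    }
)

# Data munging: fill the missing value with the column mean.
df["sales"] = df["sales"].fillna(df["sales"].mean())

# Exploratory analysis: summary statistics and a group-wise total.
print(df["sales"].describe())
by_city = df.groupby("city")["sales"].sum()
print(by_city)
```

On real data you would typically load the table with `pd.read_csv` instead of typing it in by hand.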
Additional Resources:
- If you need a book on Pandas and NumPy, read "Python for Data Analysis" by Wes McKinney.
- There are a lot of tutorials as part of the Pandas documentation. You can have a look at them here.
Assignment: Solve this assignment from the CS109 course from Harvard.
Step 5: Effective Data Visualization
Go through this lecture from CS109. You can ignore the initial 2 minutes, but what follows after that is awesome! Follow this lecture up with this assignment.
Step 6: Learn Scikit-learn and Machine Learning
Now, we come to the meat of this entire process. Scikit-learn is the most useful library in Python for machine learning. Here is a brief overview of the library. Go through lecture 10 to lecture 18 from the CS109 course from Harvard. You will go through an overview of machine learning, supervised learning algorithms like regressions, decision trees, and ensemble modeling, and unsupervised learning algorithms like clustering. Follow the individual lectures with the assignments from those lectures.
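Before diving into the lectures, it may help to see the fit / predict pattern that scikit-learn estimators share. The sketch below trains a decision tree on the small iris dataset bundled with the library. It assumes a reasonably recent scikit-learn (one that provides `sklearn.model_selection`), and the choices of model, `max_depth`, and split size are arbitrary ones made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load a small built-in dataset: 150 flowers, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the rows to estimate performance on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit the model on the training rows only.
model = DecisionTreeClassifier(max_depth=3, random_state=0)
model.fit(X_train, y_train)

# Predict labels for the held-out rows and score them.
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f"Test accuracy: {accuracy:.2f}")
```

Swapping in another estimator, such as a regression or clustering model, usually changes only the import and the constructor line.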
Additional Resources:
- If there is one book you must read, it is Programming Collective Intelligence: a classic, but still one of the best books on the subject.
- Additionally, you can also follow one of the best courses on machine learning, the Machine Learning course from Yaser Abu-Mostafa. If you need a more lucid explanation of the techniques, you can opt for the Machine Learning course from Andrew Ng and follow the exercises in Python.
- Tutorials on Scikit-learn
Assignment: Try out this challenge on Kaggle.
Step 7: Practice, practice and practice
Congratulations, you made it!
You now have all the technical skills you need. It is a matter of practice, and what better place to practice than competing with fellow data scientists on Kaggle? Go, dive into one of the live competitions currently running on Kaggle and give everything you have learnt a try!
Step 8: Deep Learning
Now that you have learnt most of the machine learning techniques, it is time to give Deep Learning a shot. There is a good chance that you already know what Deep Learning is, but if you still need a brief intro, here it is.
I am myself new to deep learning, so please take these suggestions with a pinch of salt. The most comprehensive resource is deeplearning.net. You will find everything there: lectures, datasets, challenges, tutorials. You can also give the course from Geoff Hinton a try in a bid to understand the basics of neural networks.
P.S. In case you need to use Big Data libraries, give Pydoop and PyMongo a try. They are not included here as the Big Data learning path is an entire topic in itself.
from:http://blog.csdn.net/pipisorry/article/details/44245575
ref:http://www.analyticsvidhya.com/learning-paths-data-science-business-analytics-business-intelligence-big-data/learning-path-data-science-python/