Using Pandas Library

The simplest way is to read data from .csv files and store it as a data frame object:

import pandas as pd

df = pd.read_csv('olympics.csv', index_col=0, skiprows=1)

You can also read .xsl files and directly select the rows and columns you are interested in by setting parameters skiprows, usecols. Also, you can indicate index column by parameter index_col.

energy=pd.read_excel('Energy Indicators.xls', sheet_name='Energy',skiprows=8,usecols='E,G', index_col=None, na_values=['NA'])

For .txt files, you can also use read_csv function by defining the separation symbol:

university_towns=pd.read_csv('university_towns.txt',sep='\n',header=None)

See more about pandas io operations in http://pandas.pydata.org/pandas-docs/stable/io.html

Using os Module

Read .csv files:

import os

import csv

for file in os.listdir("objective_folder"):

	with open('objective_folder/'+file, newline='') as csvfile:

	rows = csv.reader(csvfile) # read csc file

	for row in rows: # print each line in the file

		print(row)

Read .xsl files:

import os

import xlrd

for file in os.listdir("objective_folder/"):

	data = xlrd.open_workbook('objective_folder/'+file)

	table = sheel_1 = data.sheet_by_index(0)#the first sheet in Excel

	nrows = table.nrows #row number

	for i in range(nrows):

		if i == 0: # skip the first row if it defines variable names

		continue

		row_values = table.row_values(i) #read each row value

		print(row_values)

Download from Website Automatically

We can also try to read data directly from url link. This time, the .csv file is compressed as housing.tgz. We need to download the file and then decompress it. So you can write a small function as below to realize it. It is a worthy effort because you can get the most recent data every time you run the function.

 import os

 import tarfile

 from six.moves import urllib

 DOWNLOAD_ROOT = "https://raw.githubusercontent.com/ageron/handson-ml/master/"

 HOUSING_PATH = "datasets/housing"

 HOUSING_URL = DOWNLOAD_ROOT + HOUSING_PATH + "/housing.tgz"

 def fetch_housing_data(housing_url=HOUSING_URL, housing_path=HOUSING_PATH):

     if not os.path.isdir(housing_path):

         os.makedirs(housing_path)

         tgz_path = os.path.join(housing_path, "housing.tgz")

         urllib.request.urlretrieve(housing_url, tgz_path)

         housing_tgz = tarfile.open(tgz_path)

         housing_tgz.extractall(path=housing_path)

         housing_tgz.close()

when you call fetch_housing_data(), it creates a datasets/housing directory in your workspace, downloads the housing.tgz file, and extracts the housing.csv from it in this directory.
Now let’s load the data using Pandas. Once again you should write a small function to load the data:

import pandas as pd

def load_housing_data(housing_path=HOUSING_PATH):

    csv_path = os.path.join(housing_path, "housing.csv")

    return pd.read_csv(csv_path)

What’s more?

These methods are what I have met so far. In typical environments your data would be available in a relational database (or some other common datastore) and spread across multiple tables/documents/files. To access it, you would first need to get your credentials and access authorizations, and familiarize yourself with the data schema. I will supplement more methods if I encounter in the future.

[Machine Learning with Python] How to get your data?的更多相关文章

【Machine Learning】Python开发工具：Anaconda+Sublime
Python开发工具:Anaconda+Sublime 作者:白宁超 2016年12月23日21:24:51 摘要:随着机器学习和深度学习的热潮,各种图书层出不穷.然而多数是基础理论知识介绍,缺乏实现 ...
Python (1) - 7 Steps to Mastering Machine Learning With Python
Step 1: Basic Python Skills install Anacondaincluding numpy, scikit-learn, and matplotlib Step 2: Fo ...
Getting started with machine learning in Python
Getting started with machine learning in Python Machine learning is a field that uses algorithms to ...
《Learning scikit-learn Machine Learning in Python》chapter1
前言由于实验原因,准备入坑 python 机器学习,而 python 机器学习常用的包就是 scikit-learn ,准备先了解一下这个工具.在这里搜了有 scikit-learn 关键字的书,找 ...
Machine Learning的Python环境设置
Machine Learning目前经常使用的语言有Python.R和MATLAB.如果采用Python,需要安装大量的数学相关和Machine Learning的包.一般安装Anaconda,可以把 ...
[Machine Learning with Python] Familiar with Your Data
Here I list some useful functions in Python to get familiar with your data. As an example, we load a ...
[Machine Learning with Python] My First Data Preprocessing Pipeline with Titanic Dataset
The Dataset was acquired from https://www.kaggle.com/c/titanic For data preprocessing, I firstly def ...
[Machine Learning with Python] Data Preparation through Transformation Pipeline
In the former article "Data Preparation by Pandas and Scikit-Learn", we discussed about a ...
[Machine Learning with Python] Data Preparation by Pandas and Scikit-Learn
In this article, we dicuss some main steps in data preparation. Drop Labels Firstly, we drop labels ...

随机推荐

【整理】虚拟机和主机ping不通解决办法，虚拟机ping不通外网的解决方法
检查几个方面: 1.检查虚拟网卡有没有被禁用2.检查虚拟机与物理机是否在一个VMNet中3.检查虚拟机的IP地址与物理机对应的VMNet是否在一个网段4.检查虚拟机与物理机的防火墙是否允许PING, ...
跟踪路由 tracert
由于最近遇到网络出现故障的问题,便使用到Tracert来确定了下出现故障的网络节点记录下tracert命令相关内容 1. 简介 2. Tracert工作原理... 3. 常用参数 4. 使用示例与输 ...
中国电信物联网平台入门学习笔记7：NB-IOT信号如何检测
NB-IOT设备会因为信号的原因,数据发不出.但数据发不出的原因有很多,这么排除是NB-IOT信号的问题呢?那就需要NB-IOT信号检测装置. 网上的信号检测设备作为一个常年蜗居在实验室的穷屌丝而言 ...
线段树：CDOJ1592-An easy problem B （线段树的区间合并）
An easy problem B Time Limit: 2000/1000MS (Java/Others) Memory Limit: 65535/65535KB (Java/Others) Pr ...
Cplex: MIP Control Callback
*本文主要记录和分享学习到的知识,算不上原创 *参考文献见链接之前,我们有简单提到Cplex中的MIP Callback Interface,包括了Informational callback, q ...
如何利用App打造自明星实现自盈利
1.了解各个概念为了大家都能看懂这篇文章,先说明几个概念. App(Application):可以在移动设备上使用,满足人们咨询.购物.社交.娱乐.搜索等需求的一切应用程序. ...
[git 学习篇] 创建公钥
http://riny.net/2014/git-ssh-key/ 1 安装 windows gitbash msysgit是Windows版的Git,从https://git-for-wind ...
[错误解决]刚拿到的服务器vim退格键(backspace)失灵
刚拿到的服务器vim退格键(backspace)失灵: 解决方案: 在主目录下建立.vimrc 覆盖/etc/vimrc的配置 .vimrc 与 /etc/vimrc的区别: 在启动的时候vim会读取 ...
DS博客作业-05--树
1.本周学习总结 1.1思维导图 1.2学习体会 1.课堂上的知识也很难听懂,打代码就更难听懂了,真的需要不断练习代码. 2.在学习本章的内容中,一开始只是理解了概念,在真正做题中,一点思路都没有 ...
ruby操作mysql
require "win32ole" require 'pathname' require 'mysql2' excel = WIN32OLE.new('excel.applica ...