Loading, Processing and Storing Data in Multiple Formats with Python
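In practice we run into many different data formats in many different places (the familiar CSV and TXT files, web pages in HTML, XML, and so on). Let's look at how Python deals with data in each of these formats.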
2016-08
from __future__ import division
from numpy.random import randn
import numpy as np
import os
import sys
import matplotlib.pyplot as plt
np.random.seed(12345)
plt.rc('figure', figsize=(10, 6))
from pandas import Series, DataFrame
import pandas as pd
np.set_printoptions(precision=4)
%pwd
u'/Users/zzy/GitHub/jupter-project/lesson_4_ipython_notebooks/different_data_formats'
1. Text data of all kinds
1.1 Reading CSV and TXT files
!cat data1.csv
a,b,c,d,message
1,2,3,4,hello
5,6,7,8,world
9,10,11,12,foo
df = pd.read_csv('data1.csv')
df
|  | a | b | c | d | message |
|---|---|---|---|---|---|
| 0 | 1 | 2 | 3 | 4 | hello |
| 1 | 5 | 6 | 7 | 8 | world |
| 2 | 9 | 10 | 11 | 12 | foo |
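`read_csv` is not limited to file paths: it accepts any file-like object. The sketch below assumes the contents of `data1.csv` shown above are held in a string and wrapped in `io.StringIO`, which is handy for experimenting without the file on disk.

```python
import io

import pandas as pd

# Same contents as data1.csv above, held in memory instead of on disk
csv_text = """a,b,c,d,message
1,2,3,4,hello
5,6,7,8,world
9,10,11,12,foo
"""

# read_csv accepts any object with a read() method, not only a file path;
# the first line is used as the header by default
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)                # (3, 5)
print(df['message'].tolist())  # ['hello', 'world', 'foo']
```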
pd.read_table('data1.csv', sep=',')
|  | a | b | c | d | message |
|---|---|---|---|---|---|
| 0 | 1 | 2 | 3 | 4 | hello |
| 1 | 5 | 6 | 7 | 8 | world |
| 2 | 9 | 10 | 11 | 12 | foo |
!cat data2.csv
1,2,3,4,hello
5,6,7,8,world
9,10,11,12,foo
pd.read_csv('data2.csv', header=None)
|  | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | 1 | 2 | 3 | 4 | hello |
| 1 | 5 | 6 | 7 | 8 | world |
| 2 | 9 | 10 | 11 | 12 | foo |
pd.read_csv('data2.csv', names=['a', 'b', 'c', 'd', 'message'])
|  | a | b | c | d | message |
|---|---|---|---|---|---|
| 0 | 1 | 2 | 3 | 4 | hello |
| 1 | 5 | 6 | 7 | 8 | world |
| 2 | 9 | 10 | 11 | 12 | foo |
names = ['a', 'b', 'c', 'd', 'message']
pd.read_csv('data2.csv', names=names, index_col='message')
|  | a | b | c | d |
|---|---|---|---|---|
| message |  |  |  |  |
| hello | 1 | 2 | 3 | 4 |
| world | 5 | 6 | 7 | 8 |
| foo | 9 | 10 | 11 | 12 |
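The same `names` + `index_col` combination can be checked in memory. A minimal sketch, assuming the three header-less rows of `data2.csv` above; once `message` is the index, rows can be looked up by label with `.loc`.

```python
import io

import pandas as pd

# The header-less rows of data2.csv above
raw = "1,2,3,4,hello\n5,6,7,8,world\n9,10,11,12,foo\n"
names = ['a', 'b', 'c', 'd', 'message']

# names supplies the column labels; index_col turns 'message' into the row index
frame = pd.read_csv(io.StringIO(raw), names=names, index_col='message')
print(frame.loc['world', 'c'])  # 7
```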
!cat csv_mindex.csv
key1,key2,value1,value2
one,a,1,2
one,b,3,4
one,c,5,6
one,d,7,8
two,a,9,10
two,b,11,12
two,c,13,14
two,d,15,16
parsed = pd.read_csv('csv_mindex.csv', index_col=['key1', 'key2'])
parsed
|  |  | value1 | value2 |
|---|---|---|---|
| key1 | key2 |  |  |
| one | a | 1 | 2 |
|  | b | 3 | 4 |
|  | c | 5 | 6 |
|  | d | 7 | 8 |
| two | a | 9 | 10 |
|  | b | 11 | 12 |
|  | c | 13 | 14 |
|  | d | 15 | 16 |
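With a two-level index, selection works level by level. A sketch on a subset of the `csv_mindex.csv` rows above (only keys `a` and `b`, to keep it short):

```python
import io

import pandas as pd

# A subset of csv_mindex.csv above
csv_text = """key1,key2,value1,value2
one,a,1,2
one,b,3,4
two,a,9,10
two,b,11,12
"""

parsed = pd.read_csv(io.StringIO(csv_text), index_col=['key1', 'key2'])

# Selecting a label on the outer level returns the rows below it, indexed by key2
two = parsed.loc['two']
# A (key1, key2) tuple addresses a single row
print(parsed.loc[('one', 'b'), 'value2'])  # 4
```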
list(open('data3.txt'))
[' A B C\n',
'aaa -0.264438 -1.026059 -0.619500\n',
'bbb 0.927272 0.302904 -0.032399\n',
'ccc -0.264273 -0.386314 -0.217601\n',
'ddd -0.871858 -0.348382 1.100491\n']
result = pd.read_table('data3.txt', sep=r'\s+')  # raw string avoids an invalid-escape warning on Python 3
result
|  | A | B | C |
|---|---|---|---|
| aaa | -0.264438 | -1.026059 | -0.619500 |
| bbb | 0.927272 | 0.302904 | -0.032399 |
| ccc | -0.264273 | -0.386314 | -0.217601 |
| ddd | -0.871858 | -0.348382 | 1.100491 |
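Why did the first column become the index? The header line of `data3.txt` has one field fewer than the data rows, so pandas infers that the first column holds the row labels. A sketch reproducing this with the first two rows of the file above:

```python
import io

import pandas as pd

# First two rows of data3.txt above; the header line has only three fields
text = """            A         B         C
aaa -0.264438 -1.026059 -0.619500
bbb  0.927272  0.302904 -0.032399
"""

# r'\s+' splits on any run of whitespace; because the header has one field
# fewer than the data rows, pandas uses the first column as the row index
result = pd.read_csv(io.StringIO(text), sep=r'\s+')
print(result.index.tolist())  # ['aaa', 'bbb']
```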
!cat data4.csv
# hey!
a,b,c,d,message
# just wanted to make things more difficult for you
# who reads CSV files with computers, anyway?
1,2,3,4,hello
5,6,7,8,world
9,10,11,12,foo
pd.read_csv('data4.csv', skiprows=[0, 2, 3])
|  | a | b | c | d | message |
|---|---|---|---|---|---|
| 0 | 1 | 2 | 3 | 4 | hello |
| 1 | 5 | 6 | 7 | 8 | world |
| 2 | 9 | 10 | 11 | 12 | foo |
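`skiprows` needs the exact line numbers of the junk lines. When they all start with a marker such as `#`, the `comment` parameter is an alternative: a line that begins with the comment character is ignored entirely. A sketch with the same contents as `data4.csv`:

```python
import io

import pandas as pd

# Same contents as data4.csv above
text = """# hey!
a,b,c,d,message
# just wanted to make things more difficult for you
# who reads CSV files with computers, anyway?
1,2,3,4,hello
5,6,7,8,world
9,10,11,12,foo
"""

# Lines starting with '#' are skipped, so their positions need not be known
df = pd.read_csv(io.StringIO(text), comment='#')
print(df.shape)  # (3, 5)
```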
!cat data5.csv
something,a,b,c,d,message
one,1,2,3,4,NA
two,5,6,,8,world
three,9,10,11,12,foo
result = pd.read_csv('data5.csv')
result
pd.isnull(result)
|  | something | a | b | c | d | message |
|---|---|---|---|---|---|---|
| 0 | False | False | False | False | False | True |
| 1 | False | False | False | True | False | False |
| 2 | False | False | False | False | False | False |
result = pd.read_csv('data5.csv', na_values=['NULL'])
result
|  | something | a | b | c | d | message |
|---|---|---|---|---|---|---|
| 0 | one | 1 | 2 | 3.0 | 4 | NaN |
| 1 | two | 5 | 6 | NaN | 8 | world |
| 2 | three | 9 | 10 | 11.0 | 12 | foo |
sentinels = {'message': ['foo', 'NA'], 'something': ['two']}
pd.read_csv('data5.csv', na_values=sentinels)
|  | something | a | b | c | d | message |
|---|---|---|---|---|---|---|
| 0 | one | 1 | 2 | 3.0 | 4 | NaN |
| 1 | NaN | 5 | 6 | NaN | 8 | world |
| 2 | three | 9 | 10 | 11.0 | 12 | NaN |
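By default pandas also treats markers such as `NA`, `NULL` and empty fields as missing, in addition to whatever `na_values` adds. To make *only* the listed sentinels count, combine `na_values` with `keep_default_na=False`. A sketch with the same contents as `data5.csv`:

```python
import io

import pandas as pd

# Same contents as data5.csv above
text = """something,a,b,c,d,message
one,1,2,3,4,NA
two,5,6,,8,world
three,9,10,11,12,foo
"""

# keep_default_na=False switches off the built-in markers ('NA', 'NULL', '', ...),
# so in 'message' only 'foo' becomes NaN and the literal string 'NA' survives
df = pd.read_csv(io.StringIO(text), keep_default_na=False,
                 na_values={'message': ['foo']})
print(df['message'].tolist())
```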
1.2 Reading text data in pieces (chunks)
result = pd.read_csv('data6.csv')
result
|  | one | two | three | four | key |
|---|---|---|---|---|---|
| 0 | 0.467976 | -0.038649 | -0.295344 | -1.824726 | L |
| 1 | -0.358893 | 1.404453 | 0.704965 | -0.200638 | B |
| 2 | -0.501840 | 0.659254 | -0.421691 | -0.057688 | G |
| 3 | 0.204886 | 1.074134 | 1.388361 | -0.982404 | R |
| 4 | 0.354628 | -0.133116 | 0.283763 | -0.837063 | Q |
| 5 | 1.817480 | 0.742273 | 0.419395 | -2.251035 | Q |
| 6 | -0.776764 | 0.935518 | -0.332872 | -1.875641 | U |
| 7 | -0.913135 | 1.530624 | -0.572657 | 0.477252 | K |
| 8 | 0.358480 | -0.497572 | -0.367016 | 0.507702 | S |
| 9 | -1.740877 | -1.160417 | -1.637830 | 2.172201 | G |
| 10 | 0.240564 | -0.328249 | 1.252155 | 1.072796 | 8 |
| 11 | 0.764018 | 1.165476 | -0.639544 | 1.495258 | R |
| 12 | 0.571035 | -0.310537 | 0.582437 | -0.298765 | 1 |
| 13 | 2.317658 | 0.430710 | -1.334216 | 0.199679 | P |
| 14 | 1.547771 | -1.119753 | -2.277634 | 0.329586 | J |
| 15 | -1.310608 | 0.401719 | -1.000987 | 1.156708 | E |
| 16 | -0.088496 | 0.634712 | 0.153324 | 0.415335 | B |
| 17 | -0.018663 | -0.247487 | -1.446522 | 0.750938 | A |
| 18 | -0.070127 | -1.579097 | 0.120892 | 0.671432 | F |
| 19 | -0.194678 | -0.492039 | 2.359605 | 0.319810 | H |
| 20 | -0.248618 | 0.868707 | -0.492226 | -0.717959 | W |
| 21 | -1.091549 | -0.867110 | -0.647760 | -0.832562 | C |
| 22 | 0.641404 | -0.138822 | -0.621963 | -0.284839 | C |
| 23 | 1.216408 | 0.992687 | 0.165162 | -0.069619 | V |
| 24 | -0.564474 | 0.792832 | 0.747053 | 0.571675 | I |
| 25 | 1.759879 | -0.515666 | -0.230481 | 1.362317 | S |
| 26 | 0.126266 | 0.309281 | 0.382820 | -0.239199 | L |
| 27 | 1.334360 | -0.100152 | -0.840731 | -0.643967 | 6 |
| 28 | -0.737620 | 0.278087 | -0.053235 | -0.950972 | J |
| 29 | -1.148486 | -0.986292 | -0.144963 | 0.124362 | Y |
| ... | ... | ... | ... | ... | ... |
| 9970 | 0.633495 | -0.186524 | 0.927627 | 0.143164 | 4 |
| 9971 | 0.308636 | -0.112857 | 0.762842 | -1.072977 | 1 |
| 9972 | -1.627051 | -0.978151 | 0.154745 | -1.229037 | Z |
| 9973 | 0.314847 | 0.097989 | 0.199608 | 0.955193 | P |
| 9974 | 1.666907 | 0.992005 | 0.496128 | -0.686391 | S |
| 9975 | 0.010603 | 0.708540 | -1.258711 | 0.226541 | K |
| 9976 | 0.118693 | -0.714455 | -0.501342 | -0.254764 | K |
| 9977 | 0.302616 | -2.011527 | -0.628085 | 0.768827 | H |
| 9978 | -0.098572 | 1.769086 | -0.215027 | -0.053076 | A |
| 9979 | -0.019058 | 1.964994 | 0.738538 | -0.883776 | F |
| 9980 | -0.595349 | 0.001781 | -1.423355 | -1.458477 | M |
| 9981 | 1.392170 | -1.396560 | -1.425306 | -0.847535 | H |
| 9982 | -0.896029 | -0.152287 | 1.924483 | 0.365184 | 6 |
| 9983 | -2.274642 | -0.901874 | 1.500352 | 0.996541 | N |
| 9984 | -0.301898 | 1.019906 | 1.102160 | 2.624526 | I |
| 9985 | -2.548389 | -0.585374 | 1.496201 | -0.718815 | D |
| 9986 | -0.064588 | 0.759292 | -1.568415 | -0.420933 | E |
| 9987 | -0.143365 | -1.111760 | -1.815581 | 0.435274 | 2 |
| 9988 | -0.070412 | -1.055921 | 0.338017 | -0.440763 | X |
| 9989 | 0.649148 | 0.994273 | -1.384227 | 0.485120 | Q |
| 9990 | -0.370769 | 0.404356 | -1.051628 | -1.050899 | 8 |
| 9991 | -0.409980 | 0.155627 | -0.818990 | 1.277350 | W |
| 9992 | 0.301214 | -1.111203 | 0.668258 | 0.671922 | A |
| 9993 | 1.821117 | 0.416445 | 0.173874 | 0.505118 | X |
| 9994 | 0.068804 | 1.322759 | 0.802346 | 0.223618 | H |
| 9995 | 2.311896 | -0.417070 | -1.409599 | -0.515821 | L |
| 9996 | -0.479893 | -0.650419 | 0.745152 | -0.646038 | E |
| 9997 | 0.523331 | 0.787112 | 0.486066 | 1.093156 | K |
| 9998 | -0.362559 | 0.598894 | -1.843201 | 0.887292 | G |
| 9999 | -0.096376 | -1.012999 | -0.657431 | -0.573315 | 0 |
10000 rows × 5 columns
pd.read_csv('data6.csv', nrows=5)
|  | one | two | three | four | key |
|---|---|---|---|---|---|
| 0 | 0.467976 | -0.038649 | -0.295344 | -1.824726 | L |
| 1 | -0.358893 | 1.404453 | 0.704965 | -0.200638 | B |
| 2 | -0.501840 | 0.659254 | -0.421691 | -0.057688 | G |
| 3 | 0.204886 | 1.074134 | 1.388361 | -0.982404 | R |
| 4 | 0.354628 | -0.133116 | 0.283763 | -0.837063 | Q |
chunker = pd.read_csv('data6.csv', chunksize=100)
chunker
<pandas.io.parsers.TextFileReader at 0x10d3b5950>
chunker = pd.read_csv('data6.csv', chunksize=100)
tot = Series([], dtype='float64')
for piece in chunker:
    tot = tot.add(piece['key'].value_counts(), fill_value=0)
tot = tot.sort_values(ascending=False)  # Series.order() is deprecated/removed; use sort_values
tot[:10]
E 368.0
X 364.0
L 346.0
O 343.0
Q 340.0
M 338.0
J 337.0
F 335.0
K 334.0
H 330.0
dtype: float64
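The chunked loop above can be tried on a tiny input. A sketch, assuming a 10-row stand-in for `data6.csv` that contains only the `key` column, read with a chunk size of 4; `Series.add(..., fill_value=0)` keeps keys that are missing from one of the two operands instead of producing NaN:

```python
import io

import pandas as pd

# Hypothetical 10-row stand-in for data6.csv with only a 'key' column
text = "key\n" + "\n".join("LBGRQQUKSG") + "\n"

counts = pd.Series(dtype='float64')
# chunksize=4 yields DataFrames of at most 4 rows, so the file is never fully loaded
for piece in pd.read_csv(io.StringIO(text), chunksize=4):
    counts = counts.add(piece['key'].value_counts(), fill_value=0)

counts = counts.sort_values(ascending=False)
print(counts.head(3))
```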
1.3 Writing data out to text formats
data = pd.read_csv('data5.csv')
data
|  | something | a | b | c | d | message |
|---|---|---|---|---|---|---|
| 0 | one | 1 | 2 | 3.0 | 4 | NaN |
| 1 | two | 5 | 6 | NaN | 8 | world |
| 2 | three | 9 | 10 | 11.0 | 12 | foo |
data.to_csv('out.csv')
!cat out.csv
,something,a,b,c,d,message
0,one,1,2,3.0,4,
1,two,5,6,,8,world
2,three,9,10,11.0,12,foo
data.to_csv(sys.stdout, sep='|')
|something|a|b|c|d|message
0|one|1|2|3.0|4|
1|two|5|6||8|world
2|three|9|10|11.0|12|foo
data.to_csv(sys.stdout, na_rep='NULL')
,something,a,b,c,d,message
0,one,1,2,3.0,4,NULL
1,two,5,6,NULL,8,world
2,three,9,10,11.0,12,foo
data.to_csv(sys.stdout, index=False, header=False)
one,1,2,3.0,4,
two,5,6,,8,world
three,9,10,11.0,12,foo
data.to_csv(sys.stdout, index=False, columns=['a', 'b', 'c'])
a,b,c
1,2,3.0
5,6,
9,10,11.0
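Instead of `sys.stdout`, `to_csv` can write into an `io.StringIO` buffer, which makes a round trip easy to check. A sketch with a small hand-made frame (not the `data5.csv` contents); `'NULL'` happens to be one of the strings `read_csv` treats as missing by default, so values written with `na_rep='NULL'` come back as NaN:

```python
import io

import numpy as np
import pandas as pd

# Small hand-made frame with one missing value in each of 'c' and 'message'
data = pd.DataFrame({'a': [1, 5],
                     'c': [3.0, np.nan],
                     'message': [np.nan, 'world']})

buf = io.StringIO()
# na_rep controls how missing values are written; index=False drops the row labels
data.to_csv(buf, index=False, na_rep='NULL')
text = buf.getvalue()
print(text)

# Read the text back: 'NULL' is in read_csv's default list of NA markers
back = pd.read_csv(io.StringIO(text))
```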
dates = pd.date_range('1/1/2000', periods=7)
ts = Series(np.arange(7), index=dates)
ts.to_csv('tseries.csv', header=False)
!cat tseries.csv
2000-01-01,0
2000-01-02,1
2000-01-03,2
2000-01-04,3
2000-01-05,4
2000-01-06,5
2000-01-07,6
pd.read_csv('tseries.csv', header=None, index_col=0, parse_dates=True).squeeze('columns')  # Series.from_csv was removed in pandas 1.0
2000-01-01 0
2000-01-02 1
2000-01-03 2
2000-01-04 3
2000-01-05 4
2000-01-06 5
2000-01-07 6
dtype: int64
1.4 Reading and writing data manually (custom formats)
!cat data7.csv
"a","b","c"
"1","2","3"
"1","2","3","4"
import csv
with open('data7.csv') as f:
    reader = csv.reader(f)
    for line in reader:
        print(line)
['a', 'b', 'c']
['1', '2', '3']
['1', '2', '3', '4']
lines = list(csv.reader(open('data7.csv')))
header, values = lines[0], lines[1:]
data_dict = {h: v for h, v in zip(header, zip(*values))}
data_dict
{'a': ('1', '1'), 'b': ('2', '2'), 'c': ('3', '3')}
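Note that `zip(*values)` stops at the shortest row, so the fourth field of the last row is silently dropped. If the extra fields matter, `csv.DictReader` is one alternative; in the sketch below the `restkey` name `'extra'` is an arbitrary choice:

```python
import csv
import io

# Same contents as data7.csv above: the last row has an extra field
text = '"a","b","c"\n"1","2","3"\n"1","2","3","4"\n'

# DictReader keys every row by the header line; surplus fields are collected
# in a list under restkey instead of being silently dropped
rows = list(csv.DictReader(io.StringIO(text), restkey='extra'))
print(rows[1]['extra'])  # ['4']
```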
class my_dialect(csv.Dialect):
    lineterminator = '\n'
    delimiter = ';'
    quotechar = '"'
    quoting = csv.QUOTE_MINIMAL
with open('mydata.csv', 'w') as f:
    writer = csv.writer(f, dialect=my_dialect)
    writer.writerow(('one', 'two', 'three'))
    writer.writerow(('1', '2', '3'))
    writer.writerow(('4', '5', '6'))
    writer.writerow(('7', '8', '9'))
%cat mydata.csv
one;two;three
1;2;3
4;5;6
7;8;9
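Defining a `csv.Dialect` subclass is one option; the same settings can also be passed straight to `csv.writer` (and `csv.reader`) as keyword arguments. A sketch writing into an in-memory buffer; the value `'2;x'` is made up to show that `QUOTE_MINIMAL` quotes a field only when it contains a special character such as the delimiter:

```python
import csv
import io

buf = io.StringIO()
# Dialect options given directly as keyword arguments to csv.writer
writer = csv.writer(buf, delimiter=';', lineterminator='\n',
                    quotechar='"', quoting=csv.QUOTE_MINIMAL)
writer.writerow(('one', 'two', 'three'))
writer.writerow(('1', '2;x', '3'))  # contains the delimiter, so it gets quoted
text = buf.getvalue()
print(text)
# one;two;three
# 1;"2;x";3

# Reading back with the same delimiter restores the original fields
rows = list(csv.reader(io.StringIO(text), delimiter=';'))
```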
1.5 JSON data
obj = \
"""
{"姓名": "张三",
"住处": ["天朝", "挖煤国", "万恶的资本主义日不落帝国"],
"宠物": null,
"兄弟": [{"姓名": "李四", "年龄": 25, "宠物": "汪星人"},
{"姓名": "王五", "年龄": 23, "宠物": "喵星人"}]
}
"""
import json
result = json.loads(obj)
result
{u'\u4f4f\u5904': [u'\u5929\u671d',
u'\u6316\u7164\u56fd',
u'\u4e07\u6076\u7684\u8d44\u672c\u4e3b\u4e49\u65e5\u4e0d\u843d\u5e1d\u56fd'],
u'\u5144\u5f1f': [{u'\u59d3\u540d': u'\u674e\u56db',
u'\u5ba0\u7269': u'\u6c6a\u661f\u4eba',
u'\u5e74\u9f84': 25},
{u'\u59d3\u540d': u'\u738b\u4e94',
u'\u5ba0\u7269': u'\u55b5\u661f\u4eba',
u'\u5e74\u9f84': 23}],
u'\u59d3\u540d': u'\u5f20\u4e09',
u'\u5ba0\u7269': None}
print(json.dumps(result, ensure_ascii=False))
{"兄弟": [{"年龄": 25, "宠物": "汪星人", "姓名": "李四"}, {"年龄": 23, "宠物": "喵星人", "姓名": "王五"}], "住处": ["天朝", "挖煤国", "万恶的资本主义日不落帝国"], "宠物": null, "姓名": "张三"}
result[u"兄弟"][0]
{u'\u59d3\u540d': u'\u674e\u56db',
u'\u5ba0\u7269': u'\u6c6a\u661f\u4eba',
u'\u5e74\u9f84': 25}
print(json.dumps(result[u"兄弟"][0], ensure_ascii=False))
{"年龄": 25, "宠物": "汪星人", "姓名": "李四"}
asjson = json.dumps(result)
brothers = DataFrame(result[u'兄弟'], columns=[u'姓名', u'年龄'])
brothers
|  | 姓名 | 年龄 |
|---|---|---|
| 0 | 李四 | 25 |
| 1 | 王五 | 23 |
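Python 3's `json.dumps` has no `encoding` argument; whether non-ASCII text stays readable is controlled by `ensure_ascii` alone. A sketch with one record taken from the example above, showing that both forms decode to the same object:

```python
import json

record = {"姓名": "李四", "年龄": 25}

# Default ensure_ascii=True escapes every non-ASCII character as \uXXXX
escaped = json.dumps(record)
# ensure_ascii=False keeps the characters readable
readable = json.dumps(record, ensure_ascii=False)

print(escaped)   # {"\u59d3\u540d": "\u674e\u56db", "\u5e74\u9f84": 25}
print(readable)  # {"姓名": "李四", "年龄": 25}
```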
1.6 Everyone loves web scraping, so everyone needs to parse XML and HTML
from lxml.html import parse
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2
parsed = parse(urlopen('https://ask.julyedu.com/'))
doc = parsed.getroot()
doc
<Element html at 0x1092ed100>
links = doc.findall('.//a')
links[15:20]
[<Element a at 0x1091afcb0>,
<Element a at 0x1091afd08>,
<Element a at 0x1091afd60>,
<Element a at 0x1091afdb8>,
<Element a at 0x1091afe10>]
lnk = links[14]
lnk
lnk.get('href')
print(lnk.text_content())
全部问题
urls = [lnk.get('href') for lnk in doc.findall('.//a')]
urls[-10:]
['https://ask.julyedu.com/people/July',
'https://ask.julyedu.com/people/July',
'http://weibo.com/askjulyedu',
None,
'https://www.julyedu.com/help/index/about',
'https://www.julyedu.com/help/index/contact',
'https://www.julyedu.com/help/index/join',
'http://ask.julyedu.com/question/55',
'http://www.julyapp.com',
'http://www.miitbeian.gov.cn/']
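If lxml is not available, the standard library's `html.parser` can collect links too, though it is an event-based parser rather than a tree you can query with `findall`. A sketch; the class name `LinkCollector` and the sample HTML are hypothetical, and nested `<a>` tags are not handled:

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> element; href is None if absent."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._in_a = False
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self._in_a = True
            self._href = dict(attrs).get('href')
            self._text = []

    def handle_data(self, data):
        if self._in_a:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._in_a:
            self.links.append((self._href, ''.join(self._text).strip()))
            self._in_a = False


html = '<p><a href="/questions">全部问题</a> <a>no link</a></p>'
parser = LinkCollector()
parser.feed(html)
parser.close()
print(parser.links)  # [('/questions', '全部问题'), (None, 'no link')]
```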
spans = doc.findall('.//span')
len(spans)
137
def _unpack(spans):
    # text_content() returns all text inside the element, including nested tags
    return [val.text_content() for val in spans]
contents = _unpack(spans)
for content in contents:
    print(content)
通知设置
贡献
回复了问题 • 2 人关注 • 1 个回复 • 22 次浏览 • 2 小时前
• 来自相关主题
发起了问题 • 2 人关注 • 0 个回复 • 21 次浏览 • 22 小时前
• 来自相关主题
发起了问题 • 1 人关注 • 0 个回复 • 17 次浏览 • 1 天前
• 来自相关主题
贡献
回复了问题 • 3 人关注 • 1 个回复 • 38 次浏览 • 1 天前
• 来自相关主题
发起了问题 • 1 人关注 • 0 个回复 • 35 次浏览 • 1 天前
• 来自相关主题
发起了问题 • 1 人关注 • 0 个回复 • 29 次浏览 • 2 天前
• 来自相关主题
发起了问题 • 1 人关注 • 0 个回复 • 31 次浏览 • 2 天前
• 来自相关主题
贡献
回复了问题 • 37 人关注 • 36 个回复 • 484 次浏览 • 1 天前
• 来自相关主题
贡献
回复了问题 • 12 人关注 • 17 个回复 • 795 次浏览 • 2 天前
• 来自相关主题
贡献
回复了问题 • 4 人关注 • 4 个回复 • 197 次浏览 • 3 天前
• 来自相关主题
贡献
回复了问题 • 2 人关注 • 1 个回复 • 44 次浏览 • 3 天前
• 来自相关主题
贡献
回复了问题 • 2 人关注 • 1 个回复 • 34 次浏览 • 3 天前
• 来自相关主题
贡献
回复了问题 • 2 人关注 • 1 个回复 • 37 次浏览 • 3 天前
• 来自相关主题
贡献
回复了问题 • 2 人关注 • 1 个回复 • 102 次浏览 • 3 天前
• 来自相关主题
贡献
回复了问题 • 2 人关注 • 1 个回复 • 39 次浏览 • 4 天前
• 来自相关主题
发起了问题 • 1 人关注 • 0 个回复 • 26 次浏览 • 4 天前
• 来自相关主题
贡献
回复了问题 • 3 人关注 • 2 个回复 • 126 次浏览 • 4 天前
• 来自相关主题
发表了文章 • 1 个评论 • 764 次浏览 • 4 天前
• 来自相关主题
发起了问题 • 1 人关注 • 0 个回复 • 49 次浏览 • 5 天前
• 来自相关主题
发起了问题 • 1 人关注 • 0 个回复 • 35 次浏览 • 5 天前
• 来自相关主题
发起了问题 • 1 人关注 • 0 个回复 • 59 次浏览 • 6 天前
• 来自相关主题
发起了问题 • 1 人关注 • 0 个回复 • 42 次浏览 • 6 天前
• 来自相关主题
贡献
回复了问题 • 2 人关注 • 1 个回复 • 55 次浏览 • 6 天前
• 来自相关主题
贡献
回复了问题 • 2 人关注 • 1 个回复 • 44 次浏览 • 6 天前
• 来自相关主题
发起了问题 • 1 人关注 • 0 个回复 • 42 次浏览 • 2016-08-28 16:12
• 来自相关主题
发起了问题 • 1 人关注 • 0 个回复 • 42 次浏览 • 2016-08-28 15:47
• 来自相关主题
贡献
回复了问题 • 4 人关注 • 1 个回复 • 530 次浏览 • 2016-08-28 14:48
• 来自相关主题
贡献
回复了问题 • 2 人关注 • 3 个回复 • 911 次浏览 • 2016-08-27 19:18
• 来自相关主题
贡献
回复了问题 • 39 人关注 • 82 个回复 • 3670 次浏览 • 2016-08-27 11:59
• 来自相关主题
贡献
回复了问题 • 4 人关注 • 6 个回复 • 639 次浏览 • 2016-08-28 14:32
• 来自相关主题
发起了问题 • 1 人关注 • 0 个回复 • 77 次浏览 • 2016-08-25 14:13
• 来自相关主题
贡献
回复了问题 • 2 人关注 • 1 个回复 • 274 次浏览 • 2016-08-25 10:37
• 来自相关主题
回复了问题 • 1 人关注 • 1 个回复 • 48 次浏览 • 2016-08-24 17:20
• 来自相关主题
贡献
回复了问题 • 11 人关注 • 14 个回复 • 1052 次浏览 • 2016-08-24 10:01
• 来自相关主题
发起了问题 • 2 人关注 • 0 个回复 • 76 次浏览 • 2016-08-23 15:16
• 来自相关主题
贡献
回复了问题 • 2 人关注 • 1 个回复 • 62 次浏览 • 2016-08-22 21:51
• 来自相关主题
贡献
回复了问题 • 2 人关注 • 2 个回复 • 584 次浏览 • 2016-08-22 12:26
• 来自相关主题
回复了问题 • 1 人关注 • 1 个回复 • 78 次浏览 • 2016-08-21 18:57
• 来自相关主题
贡献
回复了问题 • 2 人关注 • 1 个回复 • 62 次浏览 • 2016-08-21 18:25
• 来自相关主题
贡献
回复了问题 • 2 人关注 • 1 个回复 • 105 次浏览 • 2016-08-21 15:17
• 来自相关主题
贡献
回复了问题 • 3 人关注 • 2 个回复 • 749 次浏览 • 2016-08-21 11:16
• 来自相关主题
贡献
回复了问题 • 5 人关注 • 4 个回复 • 434 次浏览 • 2016-08-19 16:07
• 来自相关主题
贡献
回复了问题 • 2 人关注 • 1 个回复 • 69 次浏览 • 2016-08-19 15:55
• 来自相关主题
回复了问题 • 1 人关注 • 1 个回复 • 108 次浏览 • 2016-08-19 14:48
• 来自相关主题
回复了问题 • 2 人关注 • 1 个回复 • 168 次浏览 • 2016-08-19 14:54
• 来自相关主题
发起了问题 • 1 人关注 • 0 个回复 • 79 次浏览 • 2016-08-19 10:59
• 来自相关主题
发起了问题 • 1 人关注 • 0 个回复 • 96 次浏览 • 2016-08-18 19:04
• 来自相关主题
贡献
回复了问题 • 9 人关注 • 5 个回复 • 864 次浏览 • 2016-08-18 18:41
• 来自相关主题
贡献
回复了问题 • 3 人关注 • 1 个回复 • 1005 次浏览 • 2016-08-18 14:01
• 来自相关主题
贡献
回复了问题 • 7 人关注 • 12 个回复 • 619 次浏览 • 2016-08-18 12:05
• 来自相关主题
python
leetcode
返回顶部
var cnzz_protocol = (("https:" == document.location.protocol) ? " https://" : " http://");document.write(unescape("%3Cspan id='cnzz_stat_icon_1259748782'%3E%3C/span%3E%3Cscript src='" + cnzz_protocol + "s11.cnzz.com/z_stat.php%3Fid%3D1259748782%26show%3Dpic' type='text/javascript'%3E%3C/script%3E"));
questions = doc.findall('.//h4')
len(questions)
50
contents = _unpack(questions)
for content in contents:
    print(content)
@寒老师,CTR课程中的LR-MLlib的代码哪里下载?
逻辑回归如何做特征选择?
opencv在vs2015配置失败,有朋友能给一些可用的教程吗
数据集具有倾斜性或呈长尾分布问题
端到端深度学习在自动驾驶汽车上的应用
CV班:转发微博狂送100元上课券,所有人 人人都有份
关于word2vec的一些资料
python数据分析班作业1
第8类新班:《计算机视觉班》大纲讨论稿
有奖竞答第二题(IBM ponder this 专题):国王、硬币与电子秤的故事
一个数学问题
关于推荐系统基于内容的推荐中使用TF-IDF建模词对资料重要度问题
python数据分析班-安装igraph后运行报错
anaconda程序组中的各个组件都是做什么用的
wide&deep learning问题
SMO算法中关于b值的求解
caffe车辆检测
2017年各大互联网公司校招日程统计
应用班第8次课 图像检索作业
大事记:官网上线自动支付、七月在线APP全面开放下载
趋势科技南京研发中心招聘机器学习、数据挖掘人才
CNN 在文本中的应用
关于特征工程的一些问题
【DNN】为什么神经网络需要激励函数
关于python的一个问题
conda和pip install各种package有什么区别
《python数据分析班》环境安装
16年8月内推公司名单:VIPKID(算法负责人)、溢思得瑞(数据科学家)、洋钱罐(数据挖掘)
一起来吐槽——对官网跟问答社区,有任何反馈或意见?
关于EM算法的一些疑问
deep learning做文本分类
关于特征工程的三个问题
python 3 读取数据编码报错 跪求解决
有奖竞答第一题:LR和线性回归的区别和联系有哪些?
创新工场涂鸦移动团队2017校园招聘
python 安装问题谁能帮一下忙?
PCA本质理解问题
用softmax做输出层碰到的问题
最近做图像花图或者屏幕花屏检测,请问特征提取这块有没有什么思路或者方法
【内推】【美团点评】2017届校招内推开始了
Scince上发表的聚类算法,C++编程实现,有一个bug不知道怎么解决!!跪求大神!!!
求e^x ,为什么不直接使用泰勒展开式?
自动聊天机器人
一般完整机器学习项目的工作流程
如何入门机器学习
搭建一个自动驾驶车模
“Google TensorFlow 深度学习笔记”
海量数据处理问题
关于 弱监督学习
《Python数据分析班》,已超过400人火爆报名
1.7 Parsing XML
!head -21 Performance_MNR.xml
from lxml import objectify
path = 'Performance_MNR.xml'
parsed = objectify.parse(open(path))
root = parsed.getroot()
data = []
skip_fields = ['PARENT_SEQ', 'INDICATOR_SEQ',
               'DESIRED_CHANGE', 'DECIMAL_PLACES']
for elt in root.INDICATOR:
    el_data = {}
    for child in elt.getchildren():
        if child.tag in skip_fields:
            continue
        # pyval converts the element text to a Python int/float/str
        el_data[child.tag] = child.pyval
    data.append(el_data)
perf = DataFrame(data)
perf
|  | AGENCY_NAME | CATEGORY | DESCRIPTION | FREQUENCY | INDICATOR_NAME | INDICATOR_UNIT | MONTHLY_ACTUAL | MONTHLY_TARGET | PERIOD_MONTH | PERIOD_YEAR | YTD_ACTUAL | YTD_TARGET |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Metro-North Railroad | Service Indicators | Percent of commuter trains that arrive at thei... | M | On-Time Performance (West of Hudson) | % | 96.9 | 95 | 1 | 2008 | 96.9 | 95 |
| 1 | Metro-North Railroad | Service Indicators | Percent of commuter trains that arrive at thei... | M | On-Time Performance (West of Hudson) | % | 95 | 95 | 2 | 2008 | 96 | 95 |
| 2 | Metro-North Railroad | Service Indicators | Percent of commuter trains that arrive at thei... | M | On-Time Performance (West of Hudson) | % | 96.9 | 95 | 3 | 2008 | 96.3 | 95 |
| 3 | Metro-North Railroad | Service Indicators | Percent of commuter trains that arrive at thei... | M | On-Time Performance (West of Hudson) | % | 98.3 | 95 | 4 | 2008 | 96.8 | 95 |
| 4 | Metro-North Railroad | Service Indicators | Percent of commuter trains that arrive at thei... | M | On-Time Performance (West of Hudson) | % | 95.8 | 95 | 5 | 2008 | 96.6 | 95 |
| 5 | Metro-North Railroad | Service Indicators | Percent of commuter trains that arrive at thei... | M | On-Time Performance (West of Hudson) | % | 94.4 | 95 | 6 | 2008 | 96.2 | 95 |
| 6 | Metro-North Railroad | Service Indicators | Percent of commuter trains that arrive at thei... | M | On-Time Performance (West of Hudson) | % | 96 | 95 | 7 | 2008 | 96.2 | 95 |
| 7 | Metro-North Railroad | Service Indicators | Percent of commuter trains that arrive at thei... | M | On-Time Performance (West of Hudson) | % | 96.4 | 95 | 8 | 2008 | 96.2 | 95 |
| 8 | Metro-North Railroad | Service Indicators | Percent of commuter trains that arrive at thei... | M | On-Time Performance (West of Hudson) | % | 93.7 | 95 | 9 | 2008 | 95.9 | 95 |
| 9 | Metro-North Railroad | Service Indicators | Percent of commuter trains that arrive at thei... | M | On-Time Performance (West of Hudson) | % | 96.4 | 95 | 10 | 2008 | 96 | 95 |
| 10 | Metro-North Railroad | Service Indicators | Percent of commuter trains that arrive at thei... | M | On-Time Performance (West of Hudson) | % | 96.9 | 95 | 11 | 2008 | 96.1 | 95 |
| 11 | Metro-North Railroad | Service Indicators | Percent of commuter trains that arrive at thei... | M | On-Time Performance (West of Hudson) | % | 95.1 | 95 | 12 | 2008 | 96 | 95 |
| 12 | Metro-North Railroad | Service Indicators | Percent of commuter trains that arrive at thei... | M | On-Time Performance (West of Hudson) | % | 92.6 | 96.2 | 1 | 2009 | 92.6 | 96.2 |
| 13 | Metro-North Railroad | Service Indicators | Percent of commuter trains that arrive at thei... | M | On-Time Performance (West of Hudson) | % | 96.8 | 96.2 | 2 | 2009 | 94.6 | 96.2 |
| 14 | Metro-North Railroad | Service Indicators | Percent of commuter trains that arrive at thei... | M | On-Time Performance (West of Hudson) | % | 96.9 | 96.2 | 3 | 2009 | 95.4 | 96.2 |
| 15 | Metro-North Railroad | Service Indicators | Percent of commuter trains that arrive at thei... | M | On-Time Performance (West of Hudson) | % | 97.1 | 96.2 | 4 | 2009 | 95.9 | 96.2 |
| 16 | Metro-North Railroad | Service Indicators | Percent of commuter trains that arrive at thei... | M | On-Time Performance (West of Hudson) | % | 97.8 | 96.2 | 5 | 2009 | 96.2 | 96.2 |
| 17 | Metro-North Railroad | Service Indicators | Percent of commuter trains that arrive at thei... | M | On-Time Performance (West of Hudson) | % | 97.3 | 96.2 | 6 | 2009 | 96.4 | 96.2 |
| 18 | Metro-North Railroad | Service Indicators | Percent of commuter trains that arrive at thei... | M | On-Time Performance (West of Hudson) | % | 96.7 | 96.2 | 7 | 2009 | 96.5 | 96.2 |
| 19 | Metro-North Railroad | Service Indicators | Percent of commuter trains that arrive at thei... | M | On-Time Performance (West of Hudson) | % | 95.7 | 96.2 | 8 | 2009 | 96.4 | 96.2 |
| 20 | Metro-North Railroad | Service Indicators | Percent of commuter trains that arrive at thei... | M | On-Time Performance (West of Hudson) | % | 96.1 | 96.2 | 9 | 2009 | 96.3 | 96.2 |
| 21 | Metro-North Railroad | Service Indicators | Percent of commuter trains that arrive at thei... | M | On-Time Performance (West of Hudson) | % | 94.8 | 96.2 | 10 | 2009 | 96.2 | 96.2 |
| 22 | Metro-North Railroad | Service Indicators | Percent of commuter trains that arrive at thei... | M | On-Time Performance (West of Hudson) | % | 95.7 | 96.2 | 11 | 2009 | 96.1 | 96.2 |
| 23 | Metro-North Railroad | Service Indicators | Percent of commuter trains that arrive at thei... | M | On-Time Performance (West of Hudson) | % | 95 | 96.2 | 12 | 2009 | 96 | 96.2 |
| 24 | Metro-North Railroad | Service Indicators | Percent of commuter trains that arrive at thei... | M | On-Time Performance (West of Hudson) | % | 98 | 96.3 | 1 | 2010 | 98 | 96.3 |
| 25 | Metro-North Railroad | Service Indicators | Percent of commuter trains that arrive at thei... | M | On-Time Performance (West of Hudson) | % | 93 | 96.3 | 2 | 2010 | 95.6 | 96.3 |
| 26 | Metro-North Railroad | Service Indicators | Percent of commuter trains that arrive at thei... | M | On-Time Performance (West of Hudson) | % | 96.9 | 96.3 | 3 | 2010 | 96.1 | 96.3 |
| 27 | Metro-North Railroad | Service Indicators | Percent of commuter trains that arrive at thei... | M | On-Time Performance (West of Hudson) | % | 98.1 | 96.3 | 4 | 2010 | 96.6 | 96.3 |
| 28 | Metro-North Railroad | Service Indicators | Percent of commuter trains that arrive at thei... | M | On-Time Performance (West of Hudson) | % | 97.6 | 96.3 | 5 | 2010 | 96.8 | 96.3 |
| 29 | Metro-North Railroad | Service Indicators | Percent of commuter trains that arrive at thei... | M | On-Time Performance (West of Hudson) | % | 97.4 | 96.3 | 6 | 2010 | 96.9 | 96.3 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 618 | Metro-North Railroad | Service Indicators | Percent of the time that escalators are operat... | M | Escalator Availability | % | 94 | 7 | 2009 | 95.14 | ||
| 619 | Metro-North Railroad | Service Indicators | Percent of the time that escalators are operat... | M | Escalator Availability | % | 97 | 8 | 2009 | 95.38 | ||
| 620 | Metro-North Railroad | Service Indicators | Percent of the time that escalators are operat... | M | Escalator Availability | % | 98.3 | 9 | 2009 | 95.7 | ||
| 621 | Metro-North Railroad | Service Indicators | Percent of the time that escalators are operat... | M | Escalator Availability | % | 98.7 | 10 | 2009 | 96 | ||
| 622 | Metro-North Railroad | Service Indicators | Percent of the time that escalators are operat... | M | Escalator Availability | % | 98.1 | 11 | 2009 | 96.21 | ||
| 623 | Metro-North Railroad | Service Indicators | Percent of the time that escalators are operat... | M | Escalator Availability | % | 100 | 12 | 2009 | 96.5 | ||
| 624 | Metro-North Railroad | Service Indicators | Percent of the time that escalators are operat... | M | Escalator Availability | % | 97.95 | 97 | 1 | 2010 | 97.95 | 97 |
| 625 | Metro-North Railroad | Service Indicators | Percent of the time that escalators are operat... | M | Escalator Availability | % | 100 | 97 | 2 | 2010 | 98.92 | 97 |
| 626 | Metro-North Railroad | Service Indicators | Percent of the time that escalators are operat... | M | Escalator Availability | % | 100 | 97 | 3 | 2010 | 99.29 | 97 |
| 627 | Metro-North Railroad | Service Indicators | Percent of the time that escalators are operat... | M | Escalator Availability | % | 100 | 97 | 4 | 2010 | 99.47 | 97 |
| 628 | Metro-North Railroad | Service Indicators | Percent of the time that escalators are operat... | M | Escalator Availability | % | 100 | 97 | 5 | 2010 | 99.58 | 97 |
| 629 | Metro-North Railroad | Service Indicators | Percent of the time that escalators are operat... | M | Escalator Availability | % | 91.21 | 97 | 6 | 2010 | 98.19 | 97 |
| 630 | Metro-North Railroad | Service Indicators | Percent of the time that escalators are operat... | M | Escalator Availability | % | 100 | 97 | 7 | 2010 | 98.46 | 97 |
| 631 | Metro-North Railroad | Service Indicators | Percent of the time that escalators are operat... | M | Escalator Availability | % | 100 | 97 | 8 | 2010 | 98.69 | 97 |
| 632 | Metro-North Railroad | Service Indicators | Percent of the time that escalators are operat... | M | Escalator Availability | % | 95.2 | 97 | 9 | 2010 | 98.3 | 97 |
| 633 | Metro-North Railroad | Service Indicators | Percent of the time that escalators are operat... | M | Escalator Availability | % | 90.91 | 97 | 10 | 2010 | 97.55 | 97 |
| 634 | Metro-North Railroad | Service Indicators | Percent of the time that escalators are operat... | M | Escalator Availability | % | 96.67 | 97 | 11 | 2010 | 97.47 | 97 |
| 635 | Metro-North Railroad | Service Indicators | Percent of the time that escalators are operat... | M | Escalator Availability | % | 90.03 | 97 | 12 | 2010 | 96.84 | 97 |
| 636 | Metro-North Railroad | Service Indicators | Percent of the time that escalators are operat... | M | Escalator Availability | % | 100 | 97 | 1 | 2011 | 100 | 97 |
| 637 | Metro-North Railroad | Service Indicators | Percent of the time that escalators are operat... | M | Escalator Availability | % | 100 | 97 | 2 | 2011 | 100 | 97 |
| 638 | Metro-North Railroad | Service Indicators | Percent of the time that escalators are operat... | M | Escalator Availability | % | 97.07 | 97 | 3 | 2011 | 98.86 | 97 |
| 639 | Metro-North Railroad | Service Indicators | Percent of the time that escalators are operat... | M | Escalator Availability | % | 98.18 | 97 | 4 | 2011 | 98.76 | 97 |
| 640 | Metro-North Railroad | Service Indicators | Percent of the time that escalators are operat... | M | Escalator Availability | % | 79.18 | 97 | 5 | 2011 | 90.91 | 97 |
| 641 | Metro-North Railroad | Service Indicators | Percent of the time that escalators are operat... | M | Escalator Availability | % | 97 | 6 | 2011 | 97 | ||
| 642 | Metro-North Railroad | Service Indicators | Percent of the time that escalators are operat... | M | Escalator Availability | % | 97 | 7 | 2011 | 97 | ||
| 643 | Metro-North Railroad | Service Indicators | Percent of the time that escalators are operat... | M | Escalator Availability | % | 97 | 8 | 2011 | 97 | ||
| 644 | Metro-North Railroad | Service Indicators | Percent of the time that escalators are operat... | M | Escalator Availability | % | 97 | 9 | 2011 | 97 | ||
| 645 | Metro-North Railroad | Service Indicators | Percent of the time that escalators are operat... | M | Escalator Availability | % | 97 | 10 | 2011 | 97 | ||
| 646 | Metro-North Railroad | Service Indicators | Percent of the time that escalators are operat... | M | Escalator Availability | % | 97 | 11 | 2011 | 97 | ||
| 647 | Metro-North Railroad | Service Indicators | Percent of the time that escalators are operat... | M | Escalator Availability | % | 97 | 12 | 2011 | 97 |
648 rows × 12 columns
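`objectify` is part of lxml. With only the standard library, `xml.etree.ElementTree` can walk the same structure; the difference is that it returns element text as strings, so the numeric conversion that `pyval` does has to be written by hand. A sketch on a tiny hand-written document that only assumes the `<PERFORMANCE>`/`<INDICATOR>` layout used above; `to_value` is a hypothetical helper:

```python
import xml.etree.ElementTree as ET

# Tiny hand-written stand-in for Performance_MNR.xml (layout assumed from above)
xml_text = """<PERFORMANCE>
  <INDICATOR>
    <PARENT_SEQ></PARENT_SEQ>
    <AGENCY_NAME>Metro-North Railroad</AGENCY_NAME>
    <PERIOD_YEAR>2008</PERIOD_YEAR>
    <MONTHLY_ACTUAL>96.9</MONTHLY_ACTUAL>
  </INDICATOR>
  <INDICATOR>
    <PARENT_SEQ></PARENT_SEQ>
    <AGENCY_NAME>Metro-North Railroad</AGENCY_NAME>
    <PERIOD_YEAR>2008</PERIOD_YEAR>
    <MONTHLY_ACTUAL>95.0</MONTHLY_ACTUAL>
  </INDICATOR>
</PERFORMANCE>"""

skip_fields = {'PARENT_SEQ'}


def to_value(text):
    """ElementTree yields raw strings: try int, then float, else keep the string."""
    if text is None:
        return None
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


root = ET.fromstring(xml_text)
records = []
for elt in root.iter('INDICATOR'):
    records.append({child.tag: to_value(child.text)
                    for child in elt if child.tag not in skip_fields})

print(records[0])
```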
root
<Element PERFORMANCE at 0x108a0f290>
root.get('href')
root.text
Binary data formats
frame = pd.read_csv('data1.csv')
frame
frame.to_pickle('frame_pickle')
pd.read_pickle('frame_pickle')
|  | a | b | c | d | message |
|---|---|---|---|---|---|
| 0 | 1 | 2 | 3 | 4 | hello |
| 1 | 5 | 6 | 7 | 8 | world |
| 2 | 9 | 10 | 11 | 12 | foo |
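`to_pickle`/`read_pickle` serialize with Python's built-in `pickle` module. A sketch of the same round trip with `pickle` directly, using a plain dict rather than a DataFrame; as the `pickle` documentation warns, only unpickle data from sources you trust, since loading a pickle can execute arbitrary code.

```python
import os
import pickle
import tempfile

obj = {'a': [1, 5, 9], 'message': ['hello', 'world', 'foo']}

# Write into a throwaway directory so no file is left in the working directory
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'frame_pickle')
    with open(path, 'wb') as f:  # pickle data is binary: use 'wb' / 'rb'
        pickle.dump(obj, f)
    with open(path, 'rb') as f:
        restored = pickle.load(f)

print(restored == obj)  # True
```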
Using the HDF5 format
store = pd.HDFStore('mydata.h5')
store['obj1'] = frame
store['obj1_col'] = frame['a']
store
<class 'pandas.io.pytables.HDFStore'>
File path: mydata.h5
/obj1 frame (shape->[3,5])
/obj1_col series (shape->[3])
store['obj1']
store.close()
os.remove('mydata.h5')
Interacting with HTML and web APIs
import requests
url = 'https://api.github.com/repos/pydata/pandas/milestones/28/labels'
resp = requests.get(url)
resp
<Response [200]>
data = resp.json()  # decode the JSON response body into Python objects
data[:5]
issue_labels = DataFrame(data)
issue_labels
|  | AGENCY_NAME | CATEGORY | DESCRIPTION | FREQUENCY | INDICATOR_NAME | INDICATOR_UNIT | MONTHLY_ACTUAL | MONTHLY_TARGET | PERIOD_MONTH | PERIOD_YEAR | YTD_ACTUAL | YTD_TARGET |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Metro-North Railroad | Service Indicators | Percent of commuter trains that arrive at thei... | M | On-Time Performance (West of Hudson) | % | 96.9 | 95 | 1 | 2008 | 96.9 | 95 |
| 1 | Metro-North Railroad | Service Indicators | Percent of commuter trains that arrive at thei... | M | On-Time Performance (West of Hudson) | % | 95 | 95 | 2 | 2008 | 96 | 95 |
| 2 | Metro-North Railroad | Service Indicators | Percent of commuter trains that arrive at thei... | M | On-Time Performance (West of Hudson) | % | 96.9 | 95 | 3 | 2008 | 96.3 | 95 |
| 3 | Metro-North Railroad | Service Indicators | Percent of commuter trains that arrive at thei... | M | On-Time Performance (West of Hudson) | % | 98.3 | 95 | 4 | 2008 | 96.8 | 95 |
| 4 | Metro-North Railroad | Service Indicators | Percent of commuter trains that arrive at thei... | M | On-Time Performance (West of Hudson) | % | 95.8 | 95 | 5 | 2008 | 96.6 | 95 |
| 5 | Metro-North Railroad | Service Indicators | Percent of commuter trains that arrive at thei... | M | On-Time Performance (West of Hudson) | % | 94.4 | 95 | 6 | 2008 | 96.2 | 95 |
| 6 | Metro-North Railroad | Service Indicators | Percent of commuter trains that arrive at thei... | M | On-Time Performance (West of Hudson) | % | 96 | 95 | 7 | 2008 | 96.2 | 95 |
| 7 | Metro-North Railroad | Service Indicators | Percent of commuter trains that arrive at thei... | M | On-Time Performance (West of Hudson) | % | 96.4 | 95 | 8 | 2008 | 96.2 | 95 |
| 8 | Metro-North Railroad | Service Indicators | Percent of commuter trains that arrive at thei... | M | On-Time Performance (West of Hudson) | % | 93.7 | 95 | 9 | 2008 | 95.9 | 95 |
| 9 | Metro-North Railroad | Service Indicators | Percent of commuter trains that arrive at thei... | M | On-Time Performance (West of Hudson) | % | 96.4 | 95 | 10 | 2008 | 96 | 95 |
| 10 | Metro-North Railroad | Service Indicators | Percent of commuter trains that arrive at thei... | M | On-Time Performance (West of Hudson) | % | 96.9 | 95 | 11 | 2008 | 96.1 | 95 |
| 11 | Metro-North Railroad | Service Indicators | Percent of commuter trains that arrive at thei... | M | On-Time Performance (West of Hudson) | % | 95.1 | 95 | 12 | 2008 | 96 | 95 |
| 12 | Metro-North Railroad | Service Indicators | Percent of commuter trains that arrive at thei... | M | On-Time Performance (West of Hudson) | % | 92.6 | 96.2 | 1 | 2009 | 92.6 | 96.2 |
| 13 | Metro-North Railroad | Service Indicators | Percent of commuter trains that arrive at thei... | M | On-Time Performance (West of Hudson) | % | 96.8 | 96.2 | 2 | 2009 | 94.6 | 96.2 |
| 14 | Metro-North Railroad | Service Indicators | Percent of commuter trains that arrive at thei... | M | On-Time Performance (West of Hudson) | % | 96.9 | 96.2 | 3 | 2009 | 95.4 | 96.2 |
| 15 | Metro-North Railroad | Service Indicators | Percent of commuter trains that arrive at thei... | M | On-Time Performance (West of Hudson) | % | 97.1 | 96.2 | 4 | 2009 | 95.9 | 96.2 |
| 16 | Metro-North Railroad | Service Indicators | Percent of commuter trains that arrive at thei... | M | On-Time Performance (West of Hudson) | % | 97.8 | 96.2 | 5 | 2009 | 96.2 | 96.2 |
| 17 | Metro-North Railroad | Service Indicators | Percent of commuter trains that arrive at thei... | M | On-Time Performance (West of Hudson) | % | 97.3 | 96.2 | 6 | 2009 | 96.4 | 96.2 |
| 18 | Metro-North Railroad | Service Indicators | Percent of commuter trains that arrive at thei... | M | On-Time Performance (West of Hudson) | % | 96.7 | 96.2 | 7 | 2009 | 96.5 | 96.2 |
| 19 | Metro-North Railroad | Service Indicators | Percent of commuter trains that arrive at thei... | M | On-Time Performance (West of Hudson) | % | 95.7 | 96.2 | 8 | 2009 | 96.4 | 96.2 |
| 20 | Metro-North Railroad | Service Indicators | Percent of commuter trains that arrive at thei... | M | On-Time Performance (West of Hudson) | % | 96.1 | 96.2 | 9 | 2009 | 96.3 | 96.2 |
| 21 | Metro-North Railroad | Service Indicators | Percent of commuter trains that arrive at thei... | M | On-Time Performance (West of Hudson) | % | 94.8 | 96.2 | 10 | 2009 | 96.2 | 96.2 |
| 22 | Metro-North Railroad | Service Indicators | Percent of commuter trains that arrive at thei... | M | On-Time Performance (West of Hudson) | % | 95.7 | 96.2 | 11 | 2009 | 96.1 | 96.2 |
| 23 | Metro-North Railroad | Service Indicators | Percent of commuter trains that arrive at thei... | M | On-Time Performance (West of Hudson) | % | 95 | 96.2 | 12 | 2009 | 96 | 96.2 |
| 24 | Metro-North Railroad | Service Indicators | Percent of commuter trains that arrive at thei... | M | On-Time Performance (West of Hudson) | % | 98 | 96.3 | 1 | 2010 | 98 | 96.3 |
| 25 | Metro-North Railroad | Service Indicators | Percent of commuter trains that arrive at thei... | M | On-Time Performance (West of Hudson) | % | 93 | 96.3 | 2 | 2010 | 95.6 | 96.3 |
| 26 | Metro-North Railroad | Service Indicators | Percent of commuter trains that arrive at thei... | M | On-Time Performance (West of Hudson) | % | 96.9 | 96.3 | 3 | 2010 | 96.1 | 96.3 |
| 27 | Metro-North Railroad | Service Indicators | Percent of commuter trains that arrive at thei... | M | On-Time Performance (West of Hudson) | % | 98.1 | 96.3 | 4 | 2010 | 96.6 | 96.3 |
| 28 | Metro-North Railroad | Service Indicators | Percent of commuter trains that arrive at thei... | M | On-Time Performance (West of Hudson) | % | 97.6 | 96.3 | 5 | 2010 | 96.8 | 96.3 |
| 29 | Metro-North Railroad | Service Indicators | Percent of commuter trains that arrive at thei... | M | On-Time Performance (West of Hudson) | % | 97.4 | 96.3 | 6 | 2010 | 96.9 | 96.3 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 618 | Metro-North Railroad | Service Indicators | Percent of the time that escalators are operat... | M | Escalator Availability | % | 94 | NaN | 7 | 2009 | 95.14 | NaN |
| 619 | Metro-North Railroad | Service Indicators | Percent of the time that escalators are operat... | M | Escalator Availability | % | 97 | NaN | 8 | 2009 | 95.38 | NaN |
| 620 | Metro-North Railroad | Service Indicators | Percent of the time that escalators are operat... | M | Escalator Availability | % | 98.3 | NaN | 9 | 2009 | 95.7 | NaN |
| 621 | Metro-North Railroad | Service Indicators | Percent of the time that escalators are operat... | M | Escalator Availability | % | 98.7 | NaN | 10 | 2009 | 96 | NaN |
| 622 | Metro-North Railroad | Service Indicators | Percent of the time that escalators are operat... | M | Escalator Availability | % | 98.1 | NaN | 11 | 2009 | 96.21 | NaN |
| 623 | Metro-North Railroad | Service Indicators | Percent of the time that escalators are operat... | M | Escalator Availability | % | 100 | NaN | 12 | 2009 | 96.5 | NaN |
| 624 | Metro-North Railroad | Service Indicators | Percent of the time that escalators are operat... | M | Escalator Availability | % | 97.95 | 97 | 1 | 2010 | 97.95 | 97 |
| 625 | Metro-North Railroad | Service Indicators | Percent of the time that escalators are operat... | M | Escalator Availability | % | 100 | 97 | 2 | 2010 | 98.92 | 97 |
| 626 | Metro-North Railroad | Service Indicators | Percent of the time that escalators are operat... | M | Escalator Availability | % | 100 | 97 | 3 | 2010 | 99.29 | 97 |
| 627 | Metro-North Railroad | Service Indicators | Percent of the time that escalators are operat... | M | Escalator Availability | % | 100 | 97 | 4 | 2010 | 99.47 | 97 |
| 628 | Metro-North Railroad | Service Indicators | Percent of the time that escalators are operat... | M | Escalator Availability | % | 100 | 97 | 5 | 2010 | 99.58 | 97 |
| 629 | Metro-North Railroad | Service Indicators | Percent of the time that escalators are operat... | M | Escalator Availability | % | 91.21 | 97 | 6 | 2010 | 98.19 | 97 |
| 630 | Metro-North Railroad | Service Indicators | Percent of the time that escalators are operat... | M | Escalator Availability | % | 100 | 97 | 7 | 2010 | 98.46 | 97 |
| 631 | Metro-North Railroad | Service Indicators | Percent of the time that escalators are operat... | M | Escalator Availability | % | 100 | 97 | 8 | 2010 | 98.69 | 97 |
| 632 | Metro-North Railroad | Service Indicators | Percent of the time that escalators are operat... | M | Escalator Availability | % | 95.2 | 97 | 9 | 2010 | 98.3 | 97 |
| 633 | Metro-North Railroad | Service Indicators | Percent of the time that escalators are operat... | M | Escalator Availability | % | 90.91 | 97 | 10 | 2010 | 97.55 | 97 |
| 634 | Metro-North Railroad | Service Indicators | Percent of the time that escalators are operat... | M | Escalator Availability | % | 96.67 | 97 | 11 | 2010 | 97.47 | 97 |
| 635 | Metro-North Railroad | Service Indicators | Percent of the time that escalators are operat... | M | Escalator Availability | % | 90.03 | 97 | 12 | 2010 | 96.84 | 97 |
| 636 | Metro-North Railroad | Service Indicators | Percent of the time that escalators are operat... | M | Escalator Availability | % | 100 | 97 | 1 | 2011 | 100 | 97 |
| 637 | Metro-North Railroad | Service Indicators | Percent of the time that escalators are operat... | M | Escalator Availability | % | 100 | 97 | 2 | 2011 | 100 | 97 |
| 638 | Metro-North Railroad | Service Indicators | Percent of the time that escalators are operat... | M | Escalator Availability | % | 97.07 | 97 | 3 | 2011 | 98.86 | 97 |
| 639 | Metro-North Railroad | Service Indicators | Percent of the time that escalators are operat... | M | Escalator Availability | % | 98.18 | 97 | 4 | 2011 | 98.76 | 97 |
| 640 | Metro-North Railroad | Service Indicators | Percent of the time that escalators are operat... | M | Escalator Availability | % | 79.18 | 97 | 5 | 2011 | 90.91 | 97 |
| 641 | Metro-North Railroad | Service Indicators | Percent of the time that escalators are operat... | M | Escalator Availability | % | NaN | 97 | 6 | 2011 | NaN | 97 |
| 642 | Metro-North Railroad | Service Indicators | Percent of the time that escalators are operat... | M | Escalator Availability | % | NaN | 97 | 7 | 2011 | NaN | 97 |
| 643 | Metro-North Railroad | Service Indicators | Percent of the time that escalators are operat... | M | Escalator Availability | % | NaN | 97 | 8 | 2011 | NaN | 97 |
| 644 | Metro-North Railroad | Service Indicators | Percent of the time that escalators are operat... | M | Escalator Availability | % | NaN | 97 | 9 | 2011 | NaN | 97 |
| 645 | Metro-North Railroad | Service Indicators | Percent of the time that escalators are operat... | M | Escalator Availability | % | NaN | 97 | 10 | 2011 | NaN | 97 |
| 646 | Metro-North Railroad | Service Indicators | Percent of the time that escalators are operat... | M | Escalator Availability | % | NaN | 97 | 11 | 2011 | NaN | 97 |
| 647 | Metro-North Railroad | Service Indicators | Percent of the time that escalators are operat... | M | Escalator Availability | % | NaN | 97 | 12 | 2011 | NaN | 97 |
648 rows × 12 columns
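When a DataFrame is built from a list of dicts, its columns are the union of all keys, and any record that lacks a key gets NaN in that column. Presumably this is how some escalator rows above end up with missing targets or actuals. The sketch below uses two illustrative records (values borrowed from the escalator rows, with one key deliberately dropped) to show the behavior and one way to filter incomplete rows.

```python
import pandas as pd

# Illustrative records: the second one deliberately has no MONTHLY_TARGET key
records = [
    {'PERIOD_MONTH': 1, 'MONTHLY_ACTUAL': 97.95, 'MONTHLY_TARGET': 97.0},
    {'PERIOD_MONTH': 2, 'MONTHLY_ACTUAL': 100.0},
]

frame = pd.DataFrame(records)
# Columns are the union of all keys; the missing key becomes NaN
print(frame)

# Count missing values per column
print(frame.isnull().sum())

# Keep only rows that actually have a target
complete = frame.dropna(subset=['MONTHLY_TARGET'])
print(complete)
```

On the full data set, `frame.isnull().sum()` is a quick way to see which indicators are incomplete before computing averages.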
2. Working with databases
2.1 SQLite
import sqlite3
query = """
CREATE TABLE test
(a VARCHAR(20), b VARCHAR(20),
c REAL, d INTEGER
);"""
con = sqlite3.connect(':memory:')
con.execute(query)
con.commit()
data = [('Atlanta', 'Georgia', 1.25, 6),
('Tallahassee', 'Florida', 2.6, 3),
('Sacramento', 'California', 1.7, 5)]
stmt = "INSERT INTO test VALUES(?, ?, ?, ?)"
con.executemany(stmt, data)
con.commit()
cursor = con.execute('select * from test')
rows = cursor.fetchall()
rows
[(u'Atlanta', u'Georgia', 1.25, 6),
(u'Tallahassee', u'Florida', 2.6, 3),
(u'Sacramento', u'California', 1.7, 5)]
cursor.description
(('a', None, None, None, None, None, None),
('b', None, None, None, None, None, None),
('c', None, None, None, None, None, None),
('d', None, None, None, None, None, None))
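# The first field of each cursor.description entry is the column name
# (a list comprehension also works on Python 3, where zip() is not subscriptable)
DataFrame(rows, columns=[col[0] for col in cursor.description])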
| a | b | c | d | |
|---|---|---|---|---|
| 0 | Atlanta | Georgia | 1.25 | 6 |
| 1 | Tallahassee | Florida | 2.60 | 3 |
| 2 | Sacramento | California | 1.70 | 5 |
import pandas.io.sql as sql
sql.read_sql('select * from test', con)
| a | b | c | d | |
|---|---|---|---|---|
| 0 | Atlanta | Georgia | 1.25 | 6 |
| 1 | Tallahassee | Florida | 2.60 | 3 |
| 2 | Sacramento | California | 1.70 | 5 |
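Storage works in the other direction too: `DataFrame.to_sql` writes a frame into a table, and `pd.read_sql_query` can take query parameters instead of having values pasted into the SQL string. The sketch below is a round trip under a few assumptions: it uses a fresh in-memory database (`demo_con`) rather than the connection above, and the table name `cities` is chosen only for illustration.

```python
import sqlite3
import pandas as pd

# A separate in-memory database so the test table above is untouched
demo_con = sqlite3.connect(':memory:')

frame = pd.DataFrame({
    'a': ['Atlanta', 'Tallahassee', 'Sacramento'],
    'b': ['Georgia', 'Florida', 'California'],
    'c': [1.25, 2.6, 1.7],
    'd': [6, 3, 5],
})

# Write the DataFrame to a new table; index=False skips the row-index column
frame.to_sql('cities', demo_con, index=False)

# Read back with a parameterized query; sqlite3 uses '?' placeholders
result = pd.read_sql_query(
    'SELECT a, d FROM cities WHERE d > ? ORDER BY d DESC',
    demo_con,
    params=(4,),
)
print(result)
demo_con.close()
```

Passing `params` lets the driver handle quoting, which matters once the values come from user input.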
2.2 MySQL
#coding=utf-8
import MySQLdb

conn = MySQLdb.connect(
    host='localhost',
    port=3306,
    user='root',
    passwd='123456',
    db='test',
)
cur = conn.cursor()
# Create the table
#cur.execute("create table student(id int, name varchar(20), class varchar(30), age varchar(10))")
# Insert one row
#cur.execute("insert into student values('2','Tom','3 year 2 class','9')")
# Update the rows that match the condition
#cur.execute("update student set class='3 year 1 class' where name = 'Tom'")
# Delete the rows that match the condition
#cur.execute("delete from student where age='9'")
# Commit so any changes above are persisted, then release the cursor and connection
conn.commit()
cur.close()
conn.close()
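The commented statements above embed their values directly in the SQL text. DB-API drivers can instead take the values as a separate argument, which avoids quoting bugs (for example a name such as O'Brien). MySQLdb uses `%s` placeholders (its `paramstyle` is `'format'`), while sqlite3 uses `?` (`'qmark'`). The sketch below is a hypothetical helper, `insert_students`, written against any DB-API connection; it is exercised with in-memory SQLite because no MySQL server is assumed here.

```python
import sqlite3

def insert_students(conn, rows, placeholder='?'):
    """Hypothetical helper: insert (id, name, class, age) tuples.

    placeholder: '?' for sqlite3, '%s' for MySQLdb.
    Values are passed to executemany, never formatted into the SQL.
    """
    marks = ', '.join([placeholder] * 4)
    sql = 'INSERT INTO student VALUES (%s)' % marks
    cur = conn.cursor()
    try:
        cur.executemany(sql, rows)
        conn.commit()
    finally:
        cur.close()

# Demonstration against SQLite, using the same schema as the MySQL example
demo_conn = sqlite3.connect(':memory:')
demo_conn.execute(
    'CREATE TABLE student (id INT, name VARCHAR(20), class VARCHAR(30), age VARCHAR(10))')
insert_students(demo_conn, [(2, 'Tom', '3 year 2 class', '9'),
                            (3, "O'Brien", '3 year 1 class', '10')])
names = [row[0] for row in demo_conn.execute('SELECT name FROM student ORDER BY id')]
print(names)
```

With the MySQLdb connection above, the equivalent call would presumably be `insert_students(conn, rows, placeholder='%s')`.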
2.3 Memcache
#coding:utf8
import memcache

class MemcachedClient(object):
    ''' Example of python-memcached client operations '''
    def __init__(self, hostList):
        self.__mc = memcache.Client(hostList)
    def set(self, key, value):
        # Use the caller's key and value instead of hard-coded ones
        return self.__mc.set(key, value)
    def get(self, key):
        return self.__mc.get(key)
    def delete(self, key):
        return self.__mc.delete(key)

if __name__ == '__main__':
    mc = MemcachedClient(["127.0.0.1:11511", "127.0.0.1:11512"])
    key = "name"
    result = mc.set(key, "NieYong")
    print("set result: %s" % result)
    name = mc.get(key)
    print("get result: %s" % name)
    result = mc.delete(key)
    print("delete result: %s" % result)
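A common way to use memcached is the cache-aside pattern: try the cache first, and on a miss load the value from the slow source and store it in the cache. The sketch below assumes a hypothetical dict-backed `DictCache` that mirrors the `set`/`get`/`delete` method names used above (returning `None` on a miss), so the pattern can run without a memcached server; `get_or_load` and `load_from_db` are also illustrative names.

```python
class DictCache(object):
    """Hypothetical in-memory stand-in with set/get/delete methods."""
    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value
        return True

    def get(self, key):
        # None signals a cache miss
        return self._data.get(key)

    def delete(self, key):
        return self._data.pop(key, None) is not None


def get_or_load(cache, key, loader):
    """Cache-aside: return the cached value, or load and cache it on a miss."""
    value = cache.get(key)
    if value is None:
        value = loader(key)
        cache.set(key, value)
    return value


calls = []

def load_from_db(key):
    # Record each hit on the slow source so the caching effect is visible
    calls.append(key)
    return 'NieYong'


cache = DictCache()
first = get_or_load(cache, 'name', load_from_db)
second = get_or_load(cache, 'name', load_from_db)
print(first, second, len(calls))
```

With python-memcached installed, a `memcache.Client(['127.0.0.1:11211'])` (11211 is memcached's default port) could be passed as `cache` in place of the stand-in.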
2.4 MongoDB
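# -*- coding: utf-8 -*-
import pymongo

# MongoClient replaces pymongo.Connection, which was removed in PyMongo 3.0
connection = pymongo.MongoClient('10.32.38.50', 27017)
# Select the myblog database
db = connection.myblog
# Use the users collection
collection = db.users
# Add a single document; age is stored as a number so range queries compare numerically
user = {"name": "cui", "age": 10}
collection.insert_one(user)
# Add several documents at once
users = [{"name": "cui", "age": 9}, {"name": "cui", "age": 11}]
collection.insert_many(users)
# Fetch one document
print(collection.find_one())
# Fetch all documents
for data in collection.find():
    print(data)
# Count the documents in this collection
print(collection.count_documents({}))
# Simple query by field value
for data in collection.find({"name": "cui"}):
    print(data)
# Use find_one to get a single matching document
print(collection.find_one({"name": "cui"}))
# Advanced query: age greater than 10, sorted by age
print("__________________________________________")
print('collection.find({"age": {"$gt": 10}})')
print("__________________________________________")
for data in collection.find({"age": {"$gt": 10}}).sort("age"):
    print(data)
# List all collections in the database
print(db.list_collection_names())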
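If numeric fields such as `age` are stored as strings, MongoDB's `$gt` and `sort` compare them as strings, character by character, rather than as numbers, so "9" ends up greater than "10". Python's string comparison behaves the same way for ASCII digits, so the effect can be reproduced without a database. This is only an analogy for the query semantics, not a call into MongoDB.

```python
ages_as_strings = ['9', '10', '11']
ages_as_ints = [9, 10, 11]

# String order compares the first character first: '1' < '9', so '10' < '9'
sorted_strings = sorted(ages_as_strings)
# The string equivalent of {"age": {"$gt": "10"}} wrongly includes '9'
gt_strings = [a for a in ages_as_strings if a > '10']
# The numeric equivalent of {"age": {"$gt": 10}} returns only 11
gt_ints = [a for a in ages_as_ints if a > 10]

print(sorted_strings)
print(gt_strings)
print(gt_ints)
```

Storing numbers as numbers when documents are inserted avoids the problem entirely.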