pandas: Hierarchical Indexing
In many applications, data may be spread across a number of files or datasets, or be arranged in a form that is not easy to analyze. This chapter focuses on tools that help combine and rearrange data.
import numpy as np
import pandas as pd
Hierarchical indexing
Hierarchical indexing is an important feature of pandas that enables you to have multiple (two or more) index levels on an axis. Somewhat abstractly, it provides a way for you to work with higher-dimensional data in a lower-dimensional form. Let's start with a simple example; create a Series with a list of lists (or arrays) as the index:
data = pd.Series(np.random.randn(9),
index=['a,a,a,b,b,c,c,d,d'.split(','),
[1,2,3,1,3,1,2,2,3]])
data
a 1 0.874880
2 1.424326
3 -2.028509
b 1 -1.081833
3 -0.072116
c 1 0.575918
2 -1.246831
d 2 -1.008064
3 0.988234
dtype: float64
What you're seeing is a prettified view of a Series with a MultiIndex as its index. The 'gaps' in the index display mean "use the label directly above":
data.index
MultiIndex(levels=[['a', 'b', 'c', 'd'], [1, 2, 3]],
labels=[[0, 0, 0, 1, 1, 2, 2, 3, 3], [0, 1, 2, 0, 2, 0, 1, 1, 2]])
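The pieces of a MultiIndex can also be inspected one at a time. The sketch below is only an illustration: the Series `s` and its integer values are made up for the example (the data above is random), and the variable names are mine. Note that recent pandas versions print `codes` where the output above shows `labels`.

```python
import pandas as pd

# Hypothetical example data: same index layout as above, but with fixed values
s = pd.Series(range(9),
              index=[list('aaabbccdd'),
                     [1, 2, 3, 1, 3, 1, 2, 2, 3]])

n_levels = s.index.nlevels                    # number of index levels
outer = list(s.index.get_level_values(0))     # outer label of every row
inner_unique = list(s.index.levels[1])        # distinct labels of the inner level

print(n_levels)
print(outer)
print(inner_unique)
```

`get_level_values` returns one label per row, while `levels` holds each distinct label once, which is why the two have different lengths.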
With a hierarchically indexed object, so-called partial indexing is possible, enabling you to concisely select subsets of the data:
data['b'] # 1 3
1 -1.081833
3 -0.072116
dtype: float64
data['b':'c'] # 1 3 1 2
b 1 -1.081833
3 -0.072116
c 1 0.575918
2 -1.246831
dtype: float64
data.loc[['b', 'd']] # loc selects by label; iloc selects by position
b 1 -1.081833
3 -0.072116
d 2 -1.008064
3 0.988234
dtype: float64
Selection is even possible from an inner level:
data.loc[:, 2]
a 1.424326
c -1.246831
d -1.008064
dtype: float64
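Inner-level selection can also be spelled with the `xs` method, which names the level explicitly. This is a minimal sketch on a made-up Series `s` with fixed values (not the random data above), so the results are reproducible:

```python
import pandas as pd

# Hypothetical example data with the same index layout as the Series above
s = pd.Series(range(9),
              index=[list('aaabbccdd'),
                     [1, 2, 3, 1, 3, 1, 2, 2, 3]])

# Every row whose inner label is 2; the selected level is dropped from the result
inner_two = s.xs(2, level=1)

# A single value, selected with one label per level
single = s.loc[('b', 3)]

print(inner_two)
print(single)
```

Passing a tuple with one label per level picks out a scalar, whereas a partial key such as `s['b']` returns a smaller Series.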
Hierarchical indexing plays an important role in reshaping data and in group-based operations like forming a pivot table. For example, you could rearrange the data into a DataFrame using its unstack method:
data.unstack()
| | 1 | 2 | 3 |
|---|---|---|---|
| a | 0.874880 | 1.424326 | -2.028509 |
| b | -1.081833 | NaN | -0.072116 |
| c | 0.575918 | -1.246831 | NaN |
| d | NaN | -1.008064 | 0.988234 |
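By default unstack moves the innermost level into the columns, but a different level can be chosen. The sketch below uses a made-up Series `s` with fixed values (an assumption for reproducibility) to show both choices and where the NaN entries come from:

```python
import pandas as pd

# Hypothetical example data with the same index layout as the Series above
s = pd.Series(range(9),
              index=[list('aaabbccdd'),
                     [1, 2, 3, 1, 3, 1, 2, 2, 3]])

wide = s.unstack()                 # inner level (1, 2, 3) becomes the columns
wide_outer = s.unstack(level=0)    # outer level (a..d) becomes the columns instead

# Label pairs that never occur in s, such as ('b', 2), show up as NaN
n_missing = int(wide.isna().sum().sum())

print(wide.shape, wide_outer.shape)
print(n_missing)
```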
The inverse operation of unstack is stack:
data.unstack().stack() # the round trip gives back the original Series
a 1 0.874880
2 1.424326
3 -2.028509
b 1 -1.081833
3 -0.072116
c 1 0.575918
2 -1.246831
d 2 -1.008064
3 0.988234
dtype: float64
stack and unstack will be explored in more detail later in this chapter.
With a DataFrame, either axis can have a hierarchical index:
frame = pd.DataFrame(np.arange(12).reshape((4,3)),
index=[['a','a','b','b'], [1,2,1,2]],
columns=[['Ohio', 'Ohio', 'Colorado'],
['Green', 'Red', 'Green']]
)
frame
| | | Ohio | Ohio | Colorado |
|---|---|---|---|---|
| | | Green | Red | Green |
| a | 1 | 0 | 1 | 2 |
| | 2 | 3 | 4 | 5 |
| b | 1 | 6 | 7 | 8 |
| | 2 | 9 | 10 | 11 |
The hierarchical levels can have names (as strings or any Python objects). If so, these will show up in the console output:
frame.index.names = ['key1', 'key2']
frame.columns.names = ['state', 'color']
# Both the row and the column index levels can be given names
frame
| | state | Ohio | Ohio | Colorado |
|---|---|---|---|---|
| | color | Green | Red | Green |
| key1 | key2 | | | |
| a | 1 | 0 | 1 | 2 |
| | 2 | 3 | 4 | 5 |
| b | 1 | 6 | 7 | 8 |
| | 2 | 9 | 10 | 11 |
Be careful to distinguish the index names 'state' and 'color' from the row labels.
With partial column indexing you can similarly select groups of columns:
frame['Ohio']
| | color | Green | Red |
|---|---|---|---|
| key1 | key2 | | |
| a | 1 | 0 | 1 |
| | 2 | 3 | 4 |
| b | 1 | 6 | 7 |
| | 2 | 9 | 10 |
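Beyond selecting a whole group of columns, a full tuple key picks out a single column or cell, and `xs` can select on the inner column level. The following is a small sketch that rebuilds the same `frame`; the variable names `ohio_red`, `cell` and `greens` are mine:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(12).reshape((4, 3)),
                     index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]],
                     columns=[['Ohio', 'Ohio', 'Colorado'],
                              ['Green', 'Red', 'Green']])
frame.index.names = ['key1', 'key2']
frame.columns.names = ['state', 'color']

# One column, addressed by a (state, color) tuple; the result is a Series
ohio_red = frame[('Ohio', 'Red')]

# One cell: a row tuple and a column tuple
cell = frame.loc[('b', 1), ('Colorado', 'Green')]

# All columns whose inner 'color' label is Green; that level is dropped
greens = frame.xs('Green', axis=1, level='color')

print(list(ohio_red))
print(cell)
print(list(greens.columns))
```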
A MultiIndex can be created by itself and then reused; the columns in the preceding DataFrame with level names could be created like this.
tmp = pd.MultiIndex.from_arrays([['Ohio', 'Ohio', 'Colorado'], ['Green', 'Red', 'Green']],
names=['state', 'color'])
tmp
MultiIndex(levels=[['Colorado', 'Ohio'], ['Green', 'Red']],
labels=[[1, 1, 0], [0, 1, 0]],
names=['state', 'color'])
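`from_arrays` is one of several constructors. As a hedged sketch, the same kind of index can be built from explicit tuples with `from_tuples`, or from every combination of labels with `from_product`; the variable names below are only for illustration:

```python
import numpy as np
import pandas as pd

# Every combination of the two label lists (a Cartesian product): 2 x 2 = 4 entries
product = pd.MultiIndex.from_product([['Ohio', 'Colorado'], ['Green', 'Red']],
                                     names=['state', 'color'])

# Explicit (state, color) pairs; the same three columns as the DataFrame above
pairs = pd.MultiIndex.from_tuples(
    [('Ohio', 'Green'), ('Ohio', 'Red'), ('Colorado', 'Green')],
    names=['state', 'color'])

# Reuse the index as the columns of a new DataFrame
reused = pd.DataFrame(np.zeros((2, 3)), columns=pairs)

print(list(product))
print(list(reused.columns.names))
```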
Reordering and sorting levels
At times you will need to rearrange the order of the levels on an axis or sort the data by the values in one specific level. The swaplevel method takes two level numbers or names and returns a new object with the levels interchanged (but the data is otherwise unaltered):
frame
| | state | Ohio | Ohio | Colorado |
|---|---|---|---|---|
| | color | Green | Red | Green |
| key1 | key2 | | | |
| a | 1 | 0 | 1 | 2 |
| | 2 | 3 | 4 | 5 |
| b | 1 | 6 | 7 | 8 |
| | 2 | 9 | 10 | 11 |
frame.swaplevel('key1', 'key2') # swap the two index levels
| | state | Ohio | Ohio | Colorado |
|---|---|---|---|---|
| | color | Green | Red | Green |
| key2 | key1 | | | |
| 1 | a | 0 | 1 | 2 |
| 2 | a | 3 | 4 | 5 |
| 1 | b | 6 | 7 | 8 |
| 2 | b | 9 | 10 | 11 |
sort_index, on the other hand, sorts the data using only the values in a single level. When swapping levels, it's not uncommon to also use sort_index so that the result is lexicographically sorted by the indicated level:
frame.sort_index(level=1)
| | state | Ohio | Ohio | Colorado |
|---|---|---|---|---|
| | color | Green | Red | Green |
| key1 | key2 | | | |
| a | 1 | 0 | 1 | 2 |
| b | 1 | 6 | 7 | 8 |
| a | 2 | 3 | 4 | 5 |
| b | 2 | 9 | 10 | 11 |
# cj
frame.sort_index(level=0)
| | state | Ohio | Ohio | Colorado |
|---|---|---|---|---|
| | color | Green | Red | Green |
| key1 | key2 | | | |
| a | 1 | 0 | 1 | 2 |
| | 2 | 3 | 4 | 5 |
| b | 1 | 6 | 7 | 8 |
| | 2 | 9 | 10 | 11 |
# Swap the index levels first, then sort by the new outer level (level 0)
frame.swaplevel(0, 1).sort_index(level=0)
| | state | Ohio | Ohio | Colorado |
|---|---|---|---|---|
| | color | Green | Red | Green |
| key2 | key1 | | | |
| 1 | a | 0 | 1 | 2 |
| | b | 6 | 7 | 8 |
| 2 | a | 3 | 4 | 5 |
| | b | 9 | 10 | 11 |
Data selection performance is much better on hierarchically indexed objects if the index is lexicographically sorted starting with the outermost level, that is, the result of calling sort_index(level=0) or sort_index().
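Whether an index is already sorted in this sense can be checked before relying on it. The sketch below uses `is_monotonic_increasing` for that check; choosing this attribute is my assumption, not something the text above prescribes. The frame is rebuilt with default column labels to keep the example short:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(12).reshape((4, 3)),
                     index=pd.MultiIndex.from_arrays(
                         [['a', 'a', 'b', 'b'], [1, 2, 1, 2]],
                         names=['key1', 'key2']))

# swaplevel only relabels the rows; their order is unchanged, so the
# swapped index (1, a), (2, a), (1, b), (2, b) is no longer sorted
swapped = frame.swaplevel('key1', 'key2')

# sort_index() sorts by key2 first, then key1
resorted = swapped.sort_index()

print(swapped.index.is_monotonic_increasing)    # False
print(resorted.index.is_monotonic_increasing)   # True
```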
Summary statistics by level
Many descriptive and summary statistics on DataFrame and Series have a level option with which you can specify the level you want to aggregate by on a particular axis (this option belongs to older pandas releases; it was deprecated in 1.3 and removed in 2.0). Consider the above DataFrame; we can aggregate by level on either the rows or columns like so:
frame
frame.sum(level='key2')
| | state | Ohio | Ohio | Colorado |
|---|---|---|---|---|
| | color | Green | Red | Green |
| key1 | key2 | | | |
| a | 1 | 0 | 1 | 2 |
| | 2 | 3 | 4 | 5 |
| b | 1 | 6 | 7 | 8 |
| | 2 | 9 | 10 | 11 |
| state | Ohio | Ohio | Colorado |
|---|---|---|---|
| color | Green | Red | Green |
| key2 | | | |
| 1 | 6 | 8 | 10 |
| 2 | 12 | 14 | 16 |
frame.sum(level='color', axis=1)
| | color | Green | Red |
|---|---|---|---|
| key1 | key2 | | |
| a | 1 | 2 | 1 |
| | 2 | 8 | 4 |
| b | 1 | 14 | 7 |
| | 2 | 20 | 10 |
Under the hood, this utilizes pandas's groupby machinery, which will be discussed in more detail later in the book.
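The same aggregations can be written with groupby directly, which is also the spelling to use on pandas versions where `sum` no longer accepts a `level` argument. This is a sketch rather than the only way to do it; for the column case it transposes, groups the rows, and transposes back, which is one of several possible approaches:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(12).reshape((4, 3)),
                     index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]],
                     columns=[['Ohio', 'Ohio', 'Colorado'],
                              ['Green', 'Red', 'Green']])
frame.index.names = ['key1', 'key2']
frame.columns.names = ['state', 'color']

# Row aggregation: sum all rows that share the same key2 label
by_key2 = frame.groupby(level='key2').sum()

# Column aggregation: transpose so the column levels become row levels,
# group on 'color', then transpose back
by_color = frame.T.groupby(level='color').sum().T

print(by_key2)
print(by_color)
```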
Indexing with a DataFrame's columns
It's not unusual to want to use one or more columns from a DataFrame as the row index; alternatively, you may wish to move the row index into the DataFrame's columns. Here's an example DataFrame:
frame = pd.DataFrame({
'a': range(7),
'b': range(7, 0, -1),
'c':"one,one,one,two,two,two,two".split(','), # cj
'd':[0, 1, 2, 0, 1, 2, 3]
})
frame
| | a | b | c | d |
|---|---|---|---|---|
| 0 | 0 | 7 | one | 0 |
| 1 | 1 | 6 | one | 1 |
| 2 | 2 | 5 | one | 2 |
| 3 | 3 | 4 | two | 0 |
| 4 | 4 | 3 | two | 1 |
| 5 | 5 | 2 | two | 2 |
| 6 | 6 | 1 | two | 3 |
DataFrame's set_index function will create a new DataFrame using one or more of its columns as the index:
# Use columns c and d as the index; they are removed from the columns
frame2 = frame.set_index(['c', 'd'])
frame2
| | | a | b |
|---|---|---|---|
| c | d | | |
| one | 0 | 0 | 7 |
| | 1 | 1 | 6 |
| | 2 | 2 | 5 |
| two | 0 | 3 | 4 |
| | 1 | 4 | 3 |
| | 2 | 5 | 2 |
| | 3 | 6 | 1 |
By default the columns are removed from the DataFrame, though you can leave them in:
frame.set_index(['c', 'd'], drop=False)
| | | a | b | c | d |
|---|---|---|---|---|---|
| c | d | | | | |
| one | 0 | 0 | 7 | one | 0 |
| | 1 | 1 | 6 | one | 1 |
| | 2 | 2 | 5 | one | 2 |
| two | 0 | 3 | 4 | two | 0 |
| | 1 | 4 | 3 | two | 1 |
| | 2 | 5 | 2 | two | 2 |
| | 3 | 6 | 1 | two | 3 |
reset_index, on the other hand, does the opposite of set_index; the hierarchical index levels are moved into the columns:
frame2
| | | a | b |
|---|---|---|---|
| c | d | | |
| one | 0 | 0 | 7 |
| | 1 | 1 | 6 |
| | 2 | 2 | 5 |
| two | 0 | 3 | 4 |
| | 1 | 4 | 3 |
| | 2 | 5 | 2 |
| | 3 | 6 | 1 |
# Move the hierarchical index levels back into the columns
frame2.reset_index()
| | c | d | a | b |
|---|---|---|---|---|
| 0 | one | 0 | 0 | 7 |
| 1 | one | 1 | 1 | 6 |
| 2 | one | 2 | 2 | 5 |
| 3 | two | 0 | 3 | 4 |
| 4 | two | 1 | 4 | 3 |
| 5 | two | 2 | 5 | 2 |
| 6 | two | 3 | 6 | 1 |
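reset_index does not have to move every level at once. The sketch below rebuilds the example DataFrame (writing the `c` column out as a list instead of splitting a string) and shows a partial reset next to the full round trip; the names `indexed`, `partial` and `restored` are mine:

```python
import pandas as pd

frame = pd.DataFrame({'a': range(7),
                      'b': range(7, 0, -1),
                      'c': ['one'] * 3 + ['two'] * 4,
                      'd': [0, 1, 2, 0, 1, 2, 3]})

indexed = frame.set_index(['c', 'd'])

# Move only level 'd' back into the columns; 'c' stays as the row index
partial = indexed.reset_index(level='d')

# Full round trip: the same data as frame, with c and d now the first columns
restored = indexed.reset_index()

print(list(partial.columns), partial.index.name)
print(list(restored.columns))
```

The round trip preserves the values but not the original column order, so reselect the columns (for example `restored[['a', 'b', 'c', 'd']]`) if the order matters.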