pandas 之 多层索引
In many applications, data may be spread across a number of files or datasets or be arranged in a form that is not easy to analyze. This chapter focuses on tools to help combine, and rearrange data.
(在许多应用中,数据可以分布在多个文件或数据集中,或者以不易分析的形式排列。 本章重点介绍帮助组合和重新排列数据的工具.)
import numpy as np
import pandas as pd
多层索引
Hierarchical indexing is an important featuer of pandas that enables you to have multiple(two or more) index levels on an axis. Somewhat abstractly, it provides a way for you to to work with higher dimensional data in a lower dimensional form.(通过多层索引的方式去从低维看待高维数据). Let's start with a simple example; create a Series with a list of lists(or arrays) as the index:
data = pd.Series(np.random.randn(9),
index=['a,a,a,b,b,c,c,d,d'.split(','),
[1,2,3,1,3,1,2,2,3]])
data
a 1 0.874880
2 1.424326
3 -2.028509
b 1 -1.081833
3 -0.072116
c 1 0.575918
2 -1.246831
d 2 -1.008064
3 0.988234
dtype: float64
What you're seeing is a prettified view of a Series with a MultiIndex as its index. The 'gaps' in the index display mean "use the lable directly above":
data.index
MultiIndex(levels=[['a', 'b', 'c', 'd'], [1, 2, 3]],
labels=[[0, 0, 0, 1, 1, 2, 2, 3, 3], [0, 1, 2, 0, 2, 0, 1, 1, 2]])
With a hierarchically indexed object(分层索引对象), so-called partial indexing is possible, enabling you to concisely(便捷地) select subsets of the data.
data['b'] # 1 3
1 -1.081833
3 -0.072116
dtype: float64
data['b':'c'] # 1 3 1 2
b 1 -1.081833
3 -0.072116
c 1 0.575918
2 -1.246831
dtype: float64
data.loc[['b', 'd']] # loc 通常按名字取, iloc 按下标取
b 1 -1.081833
3 -0.072116
d 2 -1.008064
3 0.988234
dtype: float64
"Selection is even possible from an inner level"
data.loc[:, 2]
'Selection is even possible from an inner level'
a 1.424326
c -1.246831
d -1.008064
dtype: float64
Hierarchical indexing plays an important role in reshapeing data and group-based operations like forming a pivot table. For example, you could rearrange the data into a DataFrame using its unstack method:
data.unstack()
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
| 1 | 2 | 3 | |
|---|---|---|---|
| a | 0.874880 | 1.424326 | -2.028509 |
| b | -1.081833 | NaN | -0.072116 |
| c | 0.575918 | -1.246831 | NaN |
| d | NaN | -1.008064 | 0.988234 |
The inverse operation of unstack is stack:
data.unstack().stack() # 相当于没变
a 1 0.874880
2 1.424326
3 -2.028509
b 1 -1.081833
3 -0.072116
c 1 0.575918
2 -1.246831
d 2 -1.008064
3 0.988234
dtype: float64
stack and unstack will be explored more detail later in this chapter.
With a DataFrame, either axis can have a hierarchical index:
frame = pd.DataFrame(np.arange(12).reshape((4,3)),
index=[['a','a','b','b'], [1,2,1,2]],
columns=[['Ohio', 'Ohio', 'Colorado'],
['Green', 'Red', 'Green']]
)
frame
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead tr th {
text-align: left;
}
| Ohio | Colorado | |||
|---|---|---|---|---|
| Green | Red | Green | ||
| a | 1 | 0 | 1 | 2 |
| 2 | 3 | 4 | 5 | |
| b | 1 | 6 | 7 | 8 |
| 2 | 9 | 10 | 11 | |
The hierarchical levels can have names(as strings or any Python objects). If so, these will show up in the console output:
frame.index.names = ['key1', 'key2']
frame.columns.names = ['state', 'color']
"可设置行列索引的名字呢"
frame
'可设置行列索引的名字呢'
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead tr th {
text-align: left;
}
.dataframe thead tr:last-of-type th {
text-align: right;
}
| state | Ohio | Colorado | ||
|---|---|---|---|---|
| color | Green | Red | Green | |
| key1 | key2 | |||
| a | 1 | 0 | 1 | 2 |
| 2 | 3 | 4 | 5 | |
| b | 1 | 6 | 7 | 8 |
| 2 | 9 | 10 | 11 | |
Be careful to distinguish(分辨) the index names 'state' and 'color'
Wiht partial column indexing you can similarly select groups of columns:
(使用部分列索引, 可以相应地使用列组)
frame['Ohio']
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
| color | Green | Red | |
|---|---|---|---|
| key1 | key2 | ||
| a | 1 | 0 | 1 |
| 2 | 3 | 4 | |
| b | 1 | 6 | 7 |
| 2 | 9 | 10 |
A MultiIndex can be created by itself and then reused; the columns in the preceding DataFrame with level names could be created like this.
tmp = pd.MultiIndex.from_arrays([['Ohio', 'Ohio', 'Colorado'], ['Green', 'Red', 'Green']],
names=['state', 'color'])
tmp
MultiIndex(levels=[['Colorado', 'Ohio'], ['Green', 'Red']],
labels=[[1, 1, 0], [0, 1, 0]],
names=['state', 'color'])
重排列和Level排序
At times you will need to rearange the order of the levels on an axis or sort the data by the value in one specific level. The swaplevel takes two levle numbers or names and return a new object with the levels interchanged(but the data is otherwise unaltered):
frame
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead tr th {
text-align: left;
}
.dataframe thead tr:last-of-type th {
text-align: right;
}
| state | Ohio | Colorado | ||
|---|---|---|---|---|
| color | Green | Red | Green | |
| key1 | key2 | |||
| a | 1 | 0 | 1 | 2 |
| 2 | 3 | 4 | 5 | |
| b | 1 | 6 | 7 | 8 |
| 2 | 9 | 10 | 11 | |
frame.swaplevel('key1', 'key2') # 交换索引level
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead tr th {
text-align: left;
}
.dataframe thead tr:last-of-type th {
text-align: right;
}
| state | Ohio | Colorado | ||
|---|---|---|---|---|
| color | Green | Red | Green | |
| key2 | key1 | |||
| 1 | a | 0 | 1 | 2 |
| 2 | a | 3 | 4 | 5 |
| 1 | b | 6 | 7 | 8 |
| 2 | b | 9 | 10 | 11 |
sort_index, on the other hand, sorts the data using only the values in a single level. When swapping levels, it's not uncommon to also use sort_index so that the result is lexicographically(词典的) sorted by the indicated level:
frame.sort_index(level=1)
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead tr th {
text-align: left;
}
.dataframe thead tr:last-of-type th {
text-align: right;
}
| state | Ohio | Colorado | ||
|---|---|---|---|---|
| color | Green | Red | Green | |
| key1 | key2 | |||
| a | 1 | 0 | 1 | 2 |
| b | 1 | 6 | 7 | 8 |
| a | 2 | 3 | 4 | 5 |
| b | 2 | 9 | 10 | 11 |
# cj
frame.sort_index(level=0)
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead tr th {
text-align: left;
}
.dataframe thead tr:last-of-type th {
text-align: right;
}
| state | Ohio | Colorado | ||
|---|---|---|---|---|
| color | Green | Red | Green | |
| key1 | key2 | |||
| a | 1 | 0 | 1 | 2 |
| 2 | 3 | 4 | 5 | |
| b | 1 | 6 | 7 | 8 |
| 2 | 9 | 10 | 11 | |
"先交换轴索引, 再按照轴0排序"
frame.swaplevel(0, 1).sort_index(level=0)
'先交换轴索引, 再按照轴0排序'
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead tr th {
text-align: left;
}
.dataframe thead tr:last-of-type th {
text-align: right;
}
| state | Ohio | Colorado | ||
|---|---|---|---|---|
| color | Green | Red | Green | |
| key2 | key1 | |||
| 1 | a | 0 | 1 | 2 |
| b | 6 | 7 | 8 | |
| 2 | a | 3 | 4 | 5 |
| b | 9 | 10 | 11 | |
Data selection performance is much better on hierarchically indexed if the index is lexicographically sorted starting with the outermost level-that is the result of calling sort_index()
如果索引从最外层开始按字典顺序排序,则在分层索引上,>数据选择性能要好得多——这是调用sort index()的结果
按level描述性统计
Many descriptive and summary statistic on DataFrame and Series have a level option in which you can specify the level you want to aggregate by on a particular axis. Consider the above DataFrame; we can aggregate by level on either the rows or columns like so:
frame
frame.sum(level='key2')
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead tr th {
text-align: left;
}
.dataframe thead tr:last-of-type th {
text-align: right;
}
| state | Ohio | Colorado | ||
|---|---|---|---|---|
| color | Green | Red | Green | |
| key1 | key2 | |||
| a | 1 | 0 | 1 | 2 |
| 2 | 3 | 4 | 5 | |
| b | 1 | 6 | 7 | 8 |
| 2 | 9 | 10 | 11 | |
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead tr th {
text-align: left;
}
.dataframe thead tr:last-of-type th {
text-align: right;
}
| state | Ohio | Colorado | |
|---|---|---|---|
| color | Green | Red | Green |
| key2 | |||
| 1 | 6 | 8 | 10 |
| 2 | 12 | 14 | 16 |
frame.sum(level='color', axis=1)
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
| color | Green | Red | |
|---|---|---|---|
| key1 | key2 | ||
| a | 1 | 2 | 1 |
| 2 | 8 | 4 | |
| b | 1 | 14 | 7 |
| 2 | 20 | 10 |
Under the hood, this utilizes(利用) pandas's groupby machinery, which will be discussed in more detail later in the book.
将DF某列值作为行索引
It's not unusual(不寻常的) to want to use one or more columns from a DataFrame as the row index; alternatively, you may wish to move the row index into the DataFrame's columns. Here' an example DataFrame:
想要使用DataFrame中的一个或多个列作为行索引并不罕见; 或者,您可能希望将行索引移动到DataFrame的列中。 这是一个示例DataFrame:
frame = pd.DataFrame({
'a': range(7),
'b': range(7, 0, -1),
'c':"one,one,one,two,two,two,two".split(','), # cj
'd':[0, 1, 2, 0, 1, 2, 3]
})
frame
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
| a | b | c | d | |
|---|---|---|---|---|
| 0 | 0 | 7 | one | 0 |
| 1 | 1 | 6 | one | 1 |
| 2 | 2 | 5 | one | 2 |
| 3 | 3 | 4 | two | 0 |
| 4 | 4 | 3 | two | 1 |
| 5 | 5 | 2 | two | 2 |
| 6 | 6 | 1 | two | 3 |
DataFrame's set_index function will create a new DataFrame using one or more of its columns as the index:
"将 c, d 列作为index, 同时去掉c, d"
frame2 = frame.set_index(['c', 'd'])
frame2
'将 c, d 列作为index, 同时去掉c, d'
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
| a | b | ||
|---|---|---|---|
| c | d | ||
| one | 0 | 0 | 7 |
| 1 | 1 | 6 | |
| 2 | 2 | 5 | |
| two | 0 | 3 | 4 |
| 1 | 4 | 3 | |
| 2 | 5 | 2 | |
| 3 | 6 | 1 |
By default the columns are removed from the DataFrame, though you can leave them in:
frame.set_index(['c', 'd'], drop=False)
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
| a | b | c | d | ||
|---|---|---|---|---|---|
| c | d | ||||
| one | 0 | 0 | 7 | one | 0 |
| 1 | 1 | 6 | one | 1 | |
| 2 | 2 | 5 | one | 2 | |
| two | 0 | 3 | 4 | two | 0 |
| 1 | 4 | 3 | two | 1 | |
| 2 | 5 | 2 | two | 2 | |
| 3 | 6 | 1 | two | 3 |
reset_index, on the other hand, does the opposite of set_index; the hierachical index levels are moved into the columns:
frame2
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
| a | b | ||
|---|---|---|---|
| c | d | ||
| one | 0 | 0 | 7 |
| 1 | 1 | 6 | |
| 2 | 2 | 5 | |
| two | 0 | 3 | 4 |
| 1 | 4 | 3 | |
| 2 | 5 | 2 | |
| 3 | 6 | 1 |
"将多层index给还原到列去..."
frame2.reset_index()
'将多层index给还原到列去...'
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
| c | d | a | b | |
|---|---|---|---|---|
| 0 | one | 0 | 0 | 7 |
| 1 | one | 1 | 1 | 6 |
| 2 | one | 2 | 2 | 5 |
| 3 | two | 0 | 3 | 4 |
| 4 | two | 1 | 4 | 3 |
| 5 | two | 2 | 5 | 2 |
| 6 | two | 3 | 6 | 1 |
# cj test
time.clock()
6e-07
def f(x, l=[]):
for i in range(x):
l.append(i*i)
print(l)
f(2)
f(3, [3,2,1])
f(3)
[0, 1]
[3, 2, 1, 0, 1, 4]
[0, 1, 0, 1, 4]
pandas 之 多层索引的更多相关文章
- pandas:多层索引
多层索引是指在行或者列轴上有两个及以上级别的索引,一般表示一个数据的几个分项. 1.创建多层索引 1.1通过分组产生多层索引 1.2由序列创建 1.3由元组创建 1.4可迭代对象的笛卡尔积 1.5将D ...
- pandas学习(创建多层索引、数据重塑与轴向旋转)
pandas学习(创建多层索引.数据重塑与轴向旋转) 目录 创建多层索引 数据重塑与轴向旋转 创建多层索引 隐式构造 Series 最常见的方法是给DataFrame构造函数的index参数传递两个或 ...
- 8 pandas模块,多层索引
1 创建多层索引 1)隐式构造 最常见的方法是给DataFrame构造函数的index参数传递两个或更多的数组 · Series也可以创建多层索引 ...
- pandas中层次化索引与切片
Pandas层次化索引 1. 创建多层索引 隐式索引: 常见的方式是给dataframe构造函数的index参数传递两个或是多个数组 Series也可以创建多层索引 Series多层索引 B =Ser ...
- pandas基础用法——索引
# -*- coding: utf-8 -*- # Time : 2016/11/28 15:14 # Author : XiaoDeng # version : python3.5 # Softwa ...
- pandas 之 时间序列索引
import numpy as np import pandas as pd 引入 A basic kind of time series object in pandas is a Series i ...
- Pandas | 08 重建索引
重新索引会更改DataFrame的行标签和列标签. 可以通过索引来实现多个操作: 重新排序现有数据以匹配一组新的标签. 在没有标签数据的标签位置插入缺失值(NA)标记. import pandas a ...
- numpy和pandas的基础索引切片
Numpy的索引切片 索引 In [72]: arr = np.array([[[1,1,1],[2,2,2]],[[3,3,3],[4,4,4]]]) In [73]: arr Out[73]: a ...
- Lesson8——Pandas reindex重置索引
pandas目录 1 简介 重置索引(reindex)可以更改原 DataFrame 的行标签或列标签,并使更改后的行.列标签与 DataFrame 中的数据逐一匹配.通过重置索引操作,您可以完成对现 ...
随机推荐
- ESA2GJK1DH1K基础篇: 来吧! 彻底了解一下MQTT
首先你需要知道MQTT并不是什么高大上的事物,它只是一个软件,对就是一个软件.其实就是个TCP服务器 一,既然是TCP服务器,这个TCP服务器和咱平时做的有什么不一样呢. 首先,平时的时候咱做的TCP ...
- CF264D - Colorful Stones 题解
题面 官方题解 模拟赛题解 题解概述: 定义符号A~B表示序列A是序列B的子序列,A!~B反之. 设操作序列为I,则有A~I,B!~I,C~I,D!~I. 可得出条件①B!~C且D!~A,所以我们只要 ...
- react + node + express + ant + mongodb 的简洁兼时尚的博客网站
前言 此项目是用于构建博客网站的,由三部分组成,包含前台展示.管理后台和后端. 此项目是基于 react + node + express + ant + mongodb 的,项目已经开源,项目地址在 ...
- Windows安装gmpy2
我在终端用python2的pip安装gmpy2时显示缺少Visual C++ 9.0 按照其要求,访问他给的网址安装一下 https://pypi.org/project/gmpy2/#files 进 ...
- 【医学】三分钟看懂乳腺BI-RADS分级
“BI-RADS”是指美国放射学会的乳腺影像报告和数据系统(Breast Imaging Reporting and Data System)的缩写.BI-RADS分级法将乳腺病变分为0-6级,用来评 ...
- 7种 JVM 垃圾收集器特点、优劣势及使用场景(多图)
7种 JVM 垃圾收集器特点.优劣势及使用场景(多图) mp.weixin.qq.com 点击上方"IT牧场",选择"设为星标"技术干货每日送达! 一.常见垃 ...
- k8s+Jenkins+GitLab-自动化部署项目
0.目录 整体架构目录:ASP.NET Core分布式项目实战-目录 k8s架构目录:Kubernetes(k8s)集群部署(k8s企业级Docker容器集群管理)系列目录 此文阅读目录: 1.闲聊 ...
- Scala Operators, File & RegExp
Operators Thread.`yield`() 反引号除了用于命名标识符,还可以在调用方法时避免冲突(yield 为 Scala 关键字,但也是 Thread 的方法) 中缀运算符(infix ...
- Python变量问题:命名、类型、赋值
1.变量问题 1.1变量命名: (1)变量名只能包含字母.数字和下划线.变量名可以字母或下划线开头,但不能以数字开头.例如,name1_,_name1都是合法的,但1name就不行(2)变量名中间不能 ...
- .NET Core程序中,如何获取和设置操作系统环境变量的值
有时候我们在.NET Core程序中需要获取和设置操作系统环境变量的值.本文演示如何使用Environment.GetEnvironmentVariable和Environment.SetEnviro ...