pandas: Hierarchical Indexing (MultiIndex)
In many applications, data may be spread across a number of files or datasets, or be arranged in a form that is not easy to analyze. This chapter focuses on tools to help combine and rearrange data.
import numpy as np
import pandas as pd
Hierarchical Indexing
Hierarchical indexing is an important feature of pandas that enables you to have multiple (two or more) index levels on an axis. Somewhat abstractly, it provides a way for you to work with higher dimensional data in a lower dimensional form. Let's start with a simple example; create a Series with a list of lists (or arrays) as the index:
data = pd.Series(np.random.randn(9),
                index=['a,a,a,b,b,c,c,d,d'.split(','),
                      [1,2,3,1,3,1,2,2,3]])
data
a  1    0.874880
   2    1.424326
   3   -2.028509
b  1   -1.081833
   3   -0.072116
c  1    0.575918
   2   -1.246831
d  2   -1.008064
   3    0.988234
dtype: float64
What you're seeing is a prettified view of a Series with a MultiIndex as its index. The "gaps" in the index display mean "use the label directly above":
data.index
MultiIndex(levels=[['a', 'b', 'c', 'd'], [1, 2, 3]],
           labels=[[0, 0, 0, 1, 1, 2, 2, 3, 3], [0, 1, 2, 0, 2, 0, 1, 1, 2]])
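The repr above can be read as a lookup table: each level stores its unique labels once, and each row stores an integer position into every level. The following is a minimal sketch of that layout, assuming pandas 0.24 or later, where the attribute shown as `labels` above is named `codes`; the values in the Series are placeholders rather than the random numbers used in this post.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [list("aaabbccdd"), [1, 2, 3, 1, 3, 1, 2, 2, 3]]
)
data = pd.Series(np.arange(9.0), index=index)  # placeholder values

# Each level keeps its unique labels once ...
print(index.levels[0].tolist())  # ['a', 'b', 'c', 'd']
print(index.levels[1].tolist())  # [1, 2, 3]

# ... and every row is a pair of integer positions into those levels.
# (`codes` is the modern name; older pandas printed it as `labels`.)
print(index.codes[0].tolist())   # [0, 0, 0, 1, 1, 2, 2, 3, 3]
print(index.codes[1].tolist())   # [0, 1, 2, 0, 2, 0, 1, 1, 2]
```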
With a hierarchically indexed object, so-called partial indexing is possible, enabling you to concisely select subsets of the data:
data['b']  # 1 3
1   -1.081833
3   -0.072116
dtype: float64
data['b':'c']  # 1 3 1 2
b  1   -1.081833
   3   -0.072116
c  1    0.575918
   2   -1.246831
dtype: float64
data.loc[['b', 'd']]  # loc selects by label; iloc selects by integer position
b  1   -1.081833
   3   -0.072116
d  2   -1.008064
   3    0.988234
dtype: float64
Selection is even possible from an inner level:
data.loc[:, 2]
a    1.424326
c   -1.246831
d   -1.008064
dtype: float64
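The selections above can be collected into one small runnable sketch. It uses fixed placeholder values (0.0 to 8.0) instead of random numbers so the results are reproducible, and it adds `Series.xs` as an alternative spelling for inner-level selection; that call is not used elsewhere in this post.

```python
import numpy as np
import pandas as pd

data = pd.Series(
    np.arange(9.0),  # placeholder values 0.0 .. 8.0
    index=[list("aaabbccdd"), [1, 2, 3, 1, 3, 1, 2, 2, 3]],
)

outer = data.loc["b"]            # partial indexing on the outer level
inner = data.loc[:, 2]           # every outer label where the inner label is 2
inner_xs = data.xs(2, level=1)   # the same selection written with xs
single = data.loc[("b", 3)]      # a full key returns a single value

print(outer.index.tolist())      # [1, 3]
print(inner.index.tolist())      # ['a', 'c', 'd']
print(single)                    # 4.0
```

Label `b` has no inner label 2, so it is simply absent from `inner`.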
Hierarchical indexing plays an important role in reshaping data and in group-based operations like forming a pivot table. For example, you could rearrange the data into a DataFrame using its unstack method:
data.unstack()
| | 1 | 2 | 3 |
|---|---|---|---|
| a | 0.874880 | 1.424326 | -2.028509 |
| b | -1.081833 | NaN | -0.072116 |
| c | 0.575918 | -1.246831 | NaN |
| d | NaN | -1.008064 | 0.988234 |
The inverse operation of unstack is stack:
data.unstack().stack()  # round trip: gives back the original Series
a  1    0.874880
   2    1.424326
   3   -2.028509
b  1   -1.081833
   3   -0.072116
c  1    0.575918
   2   -1.246831
d  2   -1.008064
   3    0.988234
dtype: float64
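A minimal sketch of the round trip, using placeholder values so the result can be checked. The explicit `dropna()` is an assumption added for safety: the output above shows `stack` discarding the NaN cells by itself, but whether it does so may depend on the pandas version.

```python
import numpy as np
import pandas as pd

data = pd.Series(
    np.arange(9.0),  # placeholder values 0.0 .. 8.0
    index=[list("aaabbccdd"), [1, 2, 3, 1, 3, 1, 2, 2, 3]],
)

wide = data.unstack()                # inner level becomes the columns
print(wide.shape)                    # (4, 3)
print(int(wide.isna().sum().sum()))  # 3 label combinations are missing

# Going back: stack the columns into the inner level again and drop
# the NaN cells that unstack introduced.
long = wide.stack().dropna()
print(long.tolist() == data.tolist())              # True
print(long.index.tolist() == data.index.tolist())  # True
```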
stack and unstack will be explored in more detail later in this chapter.
With a DataFrame, either axis can have a hierarchical index:
frame = pd.DataFrame(np.arange(12).reshape((4,3)),
                     index=[['a','a','b','b'], [1,2,1,2]],
                     columns=[['Ohio', 'Ohio', 'Colorado'],
                             ['Green', 'Red', 'Green']]
                    )
frame
| | | Ohio | Ohio | Colorado |
|---|---|---|---|---|
| | | Green | Red | Green |
| a | 1 | 0 | 1 | 2 |
| | 2 | 3 | 4 | 5 |
| b | 1 | 6 | 7 | 8 |
| | 2 | 9 | 10 | 11 |
The hierarchical levels can have names (as strings or any Python objects). If so, these will show up in the console output:
frame.index.names = ['key1', 'key2']
frame.columns.names = ['state', 'color']
frame  # both the row and the column index levels now carry names
| | state | Ohio | Ohio | Colorado |
|---|---|---|---|---|
| | color | Green | Red | Green |
| key1 | key2 | | | |
| a | 1 | 0 | 1 | 2 |
| | 2 | 3 | 4 | 5 |
| b | 1 | 6 | 7 | 8 |
| | 2 | 9 | 10 | 11 |
Be careful to distinguish the index names 'state' and 'color' from the row labels.
With partial column indexing you can similarly select groups of columns:
frame['Ohio']
| | color | Green | Red |
|---|---|---|---|
| key1 | key2 | | |
| a | 1 | 0 | 1 |
| | 2 | 3 | 4 |
| b | 1 | 6 | 7 |
| | 2 | 9 | 10 |
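Partial column indexing picks a whole group by its outer label. As a sketch of two related selections that this post does not cover, a full `(state, color)` tuple picks one column, and `DataFrame.xs` with `axis=1` selects on the inner column level. The frame is rebuilt here so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    np.arange(12).reshape((4, 3)),
    index=[["a", "a", "b", "b"], [1, 2, 1, 2]],
    columns=[["Ohio", "Ohio", "Colorado"], ["Green", "Red", "Green"]],
)
frame.index.names = ["key1", "key2"]
frame.columns.names = ["state", "color"]

ohio = frame["Ohio"]                              # outer level: a group of columns
red = frame.loc[:, ("Ohio", "Red")]               # full key: a single column
green = frame.xs("Green", axis=1, level="color")  # inner level: all Green columns

print(ohio.columns.tolist())    # ['Green', 'Red']
print(red.tolist())             # [1, 4, 7, 10]
print(green.columns.tolist())   # ['Ohio', 'Colorado']
```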
A MultiIndex can be created by itself and then reused; the columns in the preceding DataFrame, with level names, could be created like this:
tmp = pd.MultiIndex.from_arrays([['Ohio', 'Ohio', 'Colorado'],
                                 ['Green', 'Red', 'Green']],
                                names=['state', 'color'])
tmp
MultiIndex(levels=[['Colorado', 'Ohio'], ['Green', 'Red']],
           labels=[[1, 1, 0], [0, 1, 0]],
           names=['state', 'color'])
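To illustrate the "create once, then reuse" idea, here is a small sketch that builds the same column index two ways and pairs it with a row index made by `MultiIndex.from_product`. The constructors `from_tuples` and `from_product` are standard pandas, but they are my addition; the post itself only uses `from_arrays`.

```python
import numpy as np
import pandas as pd

# The same column index, built from parallel arrays and from tuples.
columns = pd.MultiIndex.from_arrays(
    [["Ohio", "Ohio", "Colorado"], ["Green", "Red", "Green"]],
    names=["state", "color"],
)
same_columns = pd.MultiIndex.from_tuples(
    [("Ohio", "Green"), ("Ohio", "Red"), ("Colorado", "Green")],
    names=["state", "color"],
)
print(columns.equals(same_columns))  # True

# A row index covering every combination of the two key lists.
rows = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=["key1", "key2"])
print(rows.tolist())  # [('a', 1), ('a', 2), ('b', 1), ('b', 2)]

# Reuse both indexes when constructing the DataFrame.
frame = pd.DataFrame(np.arange(12).reshape((4, 3)), index=rows, columns=columns)
print(frame.loc[("b", 1)].tolist())  # [6, 7, 8]
```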
Reordering and Sorting Levels
At times you will need to rearrange the order of the levels on an axis or sort the data by the values in one specific level. The swaplevel method takes two level numbers or names and returns a new object with the levels interchanged (but the data is otherwise unaltered):
frame
| | state | Ohio | Ohio | Colorado |
|---|---|---|---|---|
| | color | Green | Red | Green |
| key1 | key2 | | | |
| a | 1 | 0 | 1 | 2 |
| | 2 | 3 | 4 | 5 |
| b | 1 | 6 | 7 | 8 |
| | 2 | 9 | 10 | 11 |
frame.swaplevel('key1', 'key2')  # swap the two row index levels
| | state | Ohio | Ohio | Colorado |
|---|---|---|---|---|
| | color | Green | Red | Green |
| key2 | key1 | | | |
| 1 | a | 0 | 1 | 2 |
| 2 | a | 3 | 4 | 5 |
| 1 | b | 6 | 7 | 8 |
| 2 | b | 9 | 10 | 11 |
sort_index, on the other hand, sorts the data using only the values in a single level. When swapping levels, it's not uncommon to also use sort_index so that the result is lexicographically sorted by the indicated level:
frame.sort_index(level=1)
| | state | Ohio | Ohio | Colorado |
|---|---|---|---|---|
| | color | Green | Red | Green |
| key1 | key2 | | | |
| a | 1 | 0 | 1 | 2 |
| b | 1 | 6 | 7 | 8 |
| a | 2 | 3 | 4 | 5 |
| b | 2 | 9 | 10 | 11 |
# cj: for comparison, sort by the outermost level instead
frame.sort_index(level=0)
| | state | Ohio | Ohio | Colorado |
|---|---|---|---|---|
| | color | Green | Red | Green |
| key1 | key2 | | | |
| a | 1 | 0 | 1 | 2 |
| | 2 | 3 | 4 | 5 |
| b | 1 | 6 | 7 | 8 |
| | 2 | 9 | 10 | 11 |
frame.swaplevel(0, 1).sort_index(level=0)  # swap the index levels first, then sort by level 0
| | state | Ohio | Ohio | Colorado |
|---|---|---|---|---|
| | color | Green | Red | Green |
| key2 | key1 | | | |
| 1 | a | 0 | 1 | 2 |
| | b | 6 | 7 | 8 |
| 2 | a | 3 | 4 | 5 |
| | b | 9 | 10 | 11 |
Data selection performance is much better on hierarchically indexed objects if the index is lexicographically sorted starting with the outermost level, that is, the result of calling sort_index().
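One way to check whether an index is in that sorted state is the `is_monotonic_increasing` property, which I believe reports lexicographic order for a MultiIndex. The sketch below uses a small Series with hypothetical values to show that swapping levels breaks the ordering and that `sort_index()` restores it.

```python
import pandas as pd

rows = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=["key1", "key2"])
s = pd.Series(range(4), index=rows)  # hypothetical values 0..3

swapped = s.swaplevel("key1", "key2")
# Row order is now (1, a), (2, a), (1, b), (2, b): not sorted.
print(swapped.index.is_monotonic_increasing)  # False

sorted_ = swapped.sort_index()
# Row order is now (1, a), (1, b), (2, a), (2, b): sorted from the outer level.
print(sorted_.index.is_monotonic_increasing)  # True

# Partial selection on the sorted object.
print(sorted_.loc[1].tolist())  # [0, 2]
```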
Summary Statistics by Level
Many descriptive and summary statistics on DataFrame and Series have a level option in which you can specify the level you want to aggregate by on a particular axis. Consider the above DataFrame; we can aggregate by level on either the rows or columns like so:
frame
frame.sum(level='key2')
| | state | Ohio | Ohio | Colorado |
|---|---|---|---|---|
| | color | Green | Red | Green |
| key1 | key2 | | | |
| a | 1 | 0 | 1 | 2 |
| | 2 | 3 | 4 | 5 |
| b | 1 | 6 | 7 | 8 |
| | 2 | 9 | 10 | 11 |

| state | Ohio | Ohio | Colorado |
|---|---|---|---|
| color | Green | Red | Green |
| key2 | | | |
| 1 | 6 | 8 | 10 |
| 2 | 12 | 14 | 16 |
frame.sum(level='color', axis=1)
| | color | Green | Red |
|---|---|---|---|
| key1 | key2 | | |
| a | 1 | 2 | 1 |
| | 2 | 8 | 4 |
| b | 1 | 14 | 7 |
| | 2 | 20 | 10 |
Under the hood, this uses pandas's groupby machinery, which will be discussed in more detail later in the book.
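Because the level option is a thin wrapper over groupby, the same aggregations can be written with groupby directly. This matters on newer pandas: to my knowledge the `level` keyword of `sum` was deprecated in pandas 1.3 and removed in 2.0, so the calls above only run on older versions. A sketch of the equivalent spelling, with the column-wise case done by transposing (one workaround among several):

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    np.arange(12).reshape((4, 3)),
    index=[["a", "a", "b", "b"], [1, 2, 1, 2]],
    columns=[["Ohio", "Ohio", "Colorado"], ["Green", "Red", "Green"]],
)
frame.index.names = ["key1", "key2"]
frame.columns.names = ["state", "color"]

# Equivalent of frame.sum(level='key2'): group the rows by one index level.
by_key2 = frame.groupby(level="key2").sum()
print(by_key2.values.tolist())     # [[6, 8, 10], [12, 14, 16]]

# Equivalent of frame.sum(level='color', axis=1): transpose, group, transpose back.
by_color = frame.T.groupby(level="color").sum().T
print(by_color["Green"].tolist())  # [2, 8, 14, 20]
print(by_color["Red"].tolist())    # [1, 4, 7, 10]
```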
Indexing with a DataFrame's Columns
It's not unusual to want to use one or more columns from a DataFrame as the row index; alternatively, you may wish to move the row index into the DataFrame's columns. Here's an example DataFrame:
frame = pd.DataFrame({
    'a': range(7),
    'b': range(7, 0, -1),
    'c': 'one,one,one,two,two,two,two'.split(','),  # cj: build the labels by splitting a string
    'd': [0, 1, 2, 0, 1, 2, 3]
})
frame
| | a | b | c | d |
|---|---|---|---|---|
| 0 | 0 | 7 | one | 0 |
| 1 | 1 | 6 | one | 1 |
| 2 | 2 | 5 | one | 2 |
| 3 | 3 | 4 | two | 0 |
| 4 | 4 | 3 | two | 1 |
| 5 | 5 | 2 | two | 2 |
| 6 | 6 | 1 | two | 3 |
DataFrame's set_index function will create a new DataFrame using one or more of its columns as the index:
frame2 = frame.set_index(['c', 'd'])  # columns c and d become the index and are dropped from the columns
frame2
| | | a | b |
|---|---|---|---|
| c | d | | |
| one | 0 | 0 | 7 |
| | 1 | 1 | 6 |
| | 2 | 2 | 5 |
| two | 0 | 3 | 4 |
| | 1 | 4 | 3 |
| | 2 | 5 | 2 |
| | 3 | 6 | 1 |
By default the columns are removed from the DataFrame, though you can leave them in:
frame.set_index(['c', 'd'], drop=False)
| | | a | b | c | d |
|---|---|---|---|---|---|
| c | d | | | | |
| one | 0 | 0 | 7 | one | 0 |
| | 1 | 1 | 6 | one | 1 |
| | 2 | 2 | 5 | one | 2 |
| two | 0 | 3 | 4 | two | 0 |
| | 1 | 4 | 3 | two | 1 |
| | 2 | 5 | 2 | two | 2 |
| | 3 | 6 | 1 | two | 3 |
reset_index, on the other hand, does the opposite of set_index; the hierarchical index levels are moved into the columns:
frame2
| | | a | b |
|---|---|---|---|
| c | d | | |
| one | 0 | 0 | 7 |
| | 1 | 1 | 6 |
| | 2 | 2 | 5 |
| two | 0 | 3 | 4 |
| | 1 | 4 | 3 |
| | 2 | 5 | 2 |
| | 3 | 6 | 1 |
frame2.reset_index()  # move the index levels back into ordinary columns
| | c | d | a | b |
|---|---|---|---|---|
| 0 | one | 0 | 0 | 7 |
| 1 | one | 1 | 1 | 6 |
| 2 | one | 2 | 2 | 5 |
| 3 | two | 0 | 3 | 4 |
| 4 | two | 1 | 4 | 3 |
| 5 | two | 2 | 5 | 2 |
| 6 | two | 3 | 6 | 1 |
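As a quick check that the two methods really are inverses, the sketch below rebuilds the example frame, runs it through `set_index` and `reset_index`, and compares the result with the original after restoring the column order. It also shows `reset_index(level=...)`, which moves only one level back; that variant is my addition and does not appear in this post.

```python
import pandas as pd

frame = pd.DataFrame({
    "a": range(7),
    "b": range(7, 0, -1),
    "c": ["one", "one", "one", "two", "two", "two", "two"],
    "d": [0, 1, 2, 0, 1, 2, 3],
})

indexed = frame.set_index(["c", "d"])  # c and d become index levels
restored = indexed.reset_index()       # ... and come back as leading columns
print(restored.columns.tolist())       # ['c', 'd', 'a', 'b']

# Same data as the original once the columns are put back in order.
print(restored[["a", "b", "c", "d"]].equals(frame))  # True

# Move only the inner level back; 'c' stays as the index.
partial = indexed.reset_index(level="d")
print(partial.index.name)              # c
print(partial.columns.tolist())        # ['d', 'a', 'b']
```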
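# cj test (scratch cells, unrelated to hierarchical indexing)
import time
time.perf_counter()  # time.clock() was removed in Python 3.8; perf_counter() is the replacement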
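def f(x, l=[]):
    # The default list is created once, at definition time, and shared across calls.
    for i in range(x):
        l.append(i*i)
    print(l)
f(2)             # uses the shared default list
f(3, [3,2,1])    # uses the list that was passed in
f(3)             # the shared default still holds [0, 1] from the first call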
[0, 1]
[3, 2, 1, 0, 1, 4]
[0, 1, 0, 1, 4]