Pandas选择与索引
Series和DataFrame对象与Numpy数组和标准Python字典的数据索引和选择模式一样。
字典形式选择和索引
Series
In [1]: import pandas as pd
In [2]: data = pd.Series([0.25,0.5,0.75,1.0],index=['a','b','c','d'])
In [3]: data
Out[3]:
a 0.25
b 0.50
c 0.75
d 1.00
dtype: float64
#用判断字典键的方法,判断某字段是否包含在Series的Index对象内
In [4]: 'a' in data
Out[4]: True
#用获取字典键的方法,获取Index对象
In [6]: data.keys()
Out[6]: Index(['a', 'b', 'c', 'd'], dtype='object')
#用获取字典键值对象的方法,获取Index/value映射对象
In [9]: list(data.items())
Out[9]: [('a', 0.25), ('b', 0.5), ('c', 0.75), ('d', 1.0)]
#用添加字典键值对象的方法,添加Series数据
In [10]: data['e'] = 2
In [11]: data
Out[11]:
a 0.25
b 0.50
c 0.75
d 1.00
e 2.00
dtype: float64
DataFrame:相当于若干个Series对象构成的字典
In [13]: area = pd.Series({'California':43231,'Texas':53432,'New York':67632,
'Florida':89734,'Illinois':63321})
In [14]: pop = pd.Series({'California':453231,'Texas':533432,'New York':676732,
'Florida':897534,'Illinois':633241})
#以Series对象为字典对象进行创建DataFrame对象
In [15]: data = pd.DataFrame({'area':area,'pop':pop})
In [16]: data
Out[16]:
area pop
California 43231 453231
Texas 53432 533432
New York 67632 676732
Florida 89734 897534
Illinois 63321 633241
#获取某列(相当于获取字典某个键值)
In [17]: data['area']
Out[17]:
California 43231
Texas 53432
New York 67632
Florida 89734
Illinois 63321
Name: area, dtype: int64
#以属性形式获取某列
In [18]: data.area
Out[18]:
California 43231
Texas 53432
New York 67632
Florida 89734
Illinois 63321
Name: area, dtype: int64
#以字典形式赋值一个新列
In [19]: data['density'] = data['pop']/data['area']
In [20]: data
Out[20]:
area pop density
California 43231 453231 10.483935
Texas 53432 533432 9.983381
New York 67632 676732 10.006092
Florida 89734 897534 10.002162
Illinois 63321 633241 10.000490
数组形式选择和索引
Series相当于Numpy的一维数组
In [48]: data
Out[48]:
a 0.25
b 0.50
c 0.75
d 1.00
dtype: float64
#以显式索引进行切片,包含最后一个索引对应的值
In [49]: data['a':'c']
Out[49]:
a 0.25
b 0.50
c 0.75
dtype: float64
#以隐式索引进行切片,不包含最后一个索引对应的值
In [50]: data[0:2]
Out[50]:
a 0.25
b 0.50
dtype: float64
#利用布尔索引进行取值
In [51]: data[(data > 0.3) & (data < 0.8)]
Out[51]:
b 0.50
c 0.75
dtype: float64
#利用花哨索引进行取值
In [52]: data[['a','d']]
Out[52]:
a 0.25
d 1.00
dtype: float64
DataFrame相当于Numpy的二维数组
In [22]: data
Out[22]:
area pop density
California 43231 453231 10.483935
Texas 53432 533432 9.983381
New York 67632 676732 10.006092
Florida 89734 897534 10.002162
Illinois 63321 633241 10.000490
#获取DataFrame的数组值,返回的是Numpy二维数组
In [23]: data.values
Out[23]:
array([[4.32310000e+04, 4.53231000e+05, 1.04839351e+01],
[5.34320000e+04, 5.33432000e+05, 9.98338075e+00],
[6.76320000e+04, 6.76732000e+05, 1.00060918e+01],
[8.97340000e+04, 8.97534000e+05, 1.00021619e+01],
[6.33210000e+04, 6.33241000e+05, 1.00004896e+01]])
#对DataFrame的数组进行装置,相当于Numpy二维数组的转置
In [24]: data.T
Out[24]:
California Texas New York Florida Illinois
area 43231.000000 53432.000000 67632.000000 89734.000000 63321.00000
pop 453231.000000 533432.000000 676732.000000 897534.000000 633241.00000
density 10.483935 9.983381 10.006092 10.002162 10.00049
#获得某一行的数组(方法同Numpy数组)
In [25]: data.values[0]
Out[25]: array([4.32310000e+04, 4.53231000e+05, 1.04839351e+01])
#布尔索引
In [27]: data[data > 10]
Out[27]:
area pop density
California 43231 453231 10.483935
Texas 53432 533432 NaN
New York 67632 676732 10.006092
Florida 89734 897534 10.002162
Illinois 63321 633241 10.000490
In [28]: data[data['density']>10]
Out[28]:
area pop density
California 43231 453231 10.483935
New York 67632 676732 10.006092
Florida 89734 897534 10.002162
Illinois 63321 633241 10.000490
索引选择器(iloc 和 loc)
为了避免在Index和Columns为整数时,造成切片和取值时的混乱,Panda提供了一些索引器(indexer)属性来进行取值。
iloc:隐私索引(顺序整数)
#Series
In [53]: data
Out[53]:
a 0.25
b 0.50
c 0.75
d 1.00
dtype: float64
In [54]: data.iloc[0:2]
Out[54]:
a 0.25
b 0.50
dtype: float64
In [55]: data.iloc[2]
Out[55]: 0.75
#DataFrame
In [22]: data
Out[22]:
area pop density
California 43231 453231 10.483935
Texas 53432 533432 9.983381
New York 67632 676732 10.006092
Florida 89734 897534 10.002162
Illinois 63321 633241 10.000490
In [31]: data.iloc[:3,:2]
Out[31]:
area pop
California 43231 453231
Texas 53432 533432
New York 67632 676732
In [32]: data.iloc[0,2] = 100
In [33]: data
Out[33]:
area pop density
California 43231 453231 100.000000
Texas 53432 533432 9.983381
New York 67632 676732 10.006092
Florida 89734 897534 10.002162
Illinois 63321 633241 10.000490
In [34]: data.iloc[3,2]
Out[34]: 10.002161945305012
In [35]: data.iloc[:3,2]
Out[35]:
California 100.000000
Texas 9.983381
New York 10.006092
Name: density, dtype: float64
In [36]: data.iloc[3,:2]
Out[36]:
area 89734.0
pop 897534.0
Name: Florida, dtype: float64
#切出的是DataFrame结构
In [37]: data.iloc[2:3,:2]
Out[37]:
area pop
New York 67632 676732
loc:显示索引(自定义)
#Series
In [53]: data
Out[53]:
a 0.25
b 0.50
c 0.75
d 1.00
dtype: float64
In [56]: data.loc['b']
Out[56]: 0.5
In [57]: data.loc['a':'b']
Out[57]:
a 0.25
b 0.50
dtype: float64
In [58]: data.loc[['a','c']]
Out[58]:
a 0.25
c 0.75
dtype: float64
#DataFrame
In [38]: data
Out[38]:
area pop density
California 43231 453231 100.000000
Texas 53432 533432 9.983381
New York 67632 676732 10.006092
Florida 89734 897534 10.002162
Illinois 63321 633241 10.000490
#显性索引切片时与整数切片不同,最后一个切片位置也要包含在内
In [39]: data['California':'Texas']
Out[39]:
area pop density
California 43231 453231 100.000000
Texas 53432 533432 9.983381
In [40]: data.loc['California':'Texas']
Out[40]:
area pop density
California 43231 453231 100.000000
Texas 53432 533432 9.983381
In [41]: data.loc[data['density']>10,['area','density']]
Out[41]:
area density
California 43231 100.000000
New York 67632 10.006092
Florida 89734 10.002162
Illinois 63321 10.000490
In [44]: data.loc[:,['pop']]
Out[44]:
pop
California 453231
Texas 533432
New York 676732
Florida 897534
Illinois 633241
多级索引应用(Hierarchical Indexing)
为解决高维数组表达的需求而引入多级索引(Hierarchical indexing)。Pandas的MultiIndex类型提供了丰富的操作手法。
多级索引创建
#直接创建,index写成多级模式,MultiIndex在后台自动创建
In [3]: data = pd.DataFrame(np.random.rand(4,2),
index = [['a','a','b','b'],[1,2,1,2]],
columns=['data1','data2'])
In [4]: data
Out[4]:
data1 data2
a 1 0.154166 0.773928
2 0.859236 0.777436
b 1 0.468589 0.010060
2 0.849230 0.929585
#使用元组字典进行创建,自动把元组元素分割转换成MultiIndex
In [5]: data = {('California',2000):3334323,('California',2010):33444332,
('Texas',2000):5555999,('Texas',2010):7778839}
In [6]: pd.Series(data)
Out[6]:
California 2000 3334323
2010 33444332
Texas 2000 5555999
2010 7778839
dtype: int64
#使用列表进行创建
In [59]: pd.MultiIndex.from_arrays([['a','a','b','b'],[1,2,1,2]])
Out[59]:
MultiIndex([('a', 1),
('a', 2),
('b', 1),
('b', 2)],)
#使用元组进行创建
In [60]: pd.MultiIndex.from_tuples([('a',1),('a',2),('b',1),('b',2)])
Out[60]:
MultiIndex([('a', 1),
('a', 2),
('b', 1),
('b', 2)],
)
#使用笛卡尔积进行创建
In [62]: pd.MultiIndex.from_product([['a','b'],[1,2]])
Out[62]:
MultiIndex([('a', 1),
('a', 2),
('b', 1),
('b', 2)],
)
#直接利用levels和labels(codes)进行创建
In [63]: pd.MultiIndex(levels=[['a','b'],[1,2]],labels=[[0,0,1,1],[0,1,0,1]])
D:\Anaconda\install\Scripts\ipython:1: FutureWarning: the 'labels' keyword is deprecated, use 'codes' instead
Out[63]:
MultiIndex([('a', 1),
('a', 2),
('b', 1),
('b', 2)],
)
In [64]: pd.MultiIndex(levels=[['a','b'],[1,2]],codes=[[0,0,1,1],[0,1,0,1]])
Out[64]:
MultiIndex([('a', 1),
('a', 2),
('b', 1),
('b', 2)],
)
多级索引取值与切片
Series
In [1]: import pandas as pd
In [2]: pop_index = pd.MultiIndex.from_product([['California','Texas','New York'],
['2000','2010']])
In [3]: data = [23343,33453,78893,34569,56493,89392]
In [4]: pop = pd.Series(data, index=pop_index, name='population')
In [5]: pop
Out[5]:
California 2000 23343
2010 33453
Texas 2000 78893
2010 34569
New York 2000 56493
2010 89392
Name: population, dtype: int64
#添加索引名称
In [6]: pop.index.names = ['state','year']
In [7]: pop
Out[7]:
state year
California 2000 23343
2010 33453
Texas 2000 78893
2010 34569
New York 2000 56493
2010 89392
Name: population, dtype: int64
In [23]: pop
Out[23]:
state year
California 2000 23343
2010 33453
Texas 2000 78893
2010 34569
New York 2000 56493
2010 89392
Name: population, dtype: int64
#利用显性索引进行局部切片时,需要MultiIndex按顺序排列的(利用sort_index函数实现)
In [25]: pop
Out[25]:
state year
California 2000 23343
2010 33453
Texas 2000 78893
2010 34569
New York 2000 56493
2010 89392
Name: population, dtype: int64
In [26]: pop = pop.sort_index()
In [27]: pop
Out[27]:
state year
California 2000 23343
2010 33453
New York 2000 56493
2010 89392
Texas 2000 78893
2010 34569
Name: population, dtype: int64
#显性索引局部切片
In [28]: pop.loc['California':'New York']
Out[28]:
state year
California 2000 23343
2010 33453
New York 2000 56493
2010 89392
Name: population, dtype: int64
In [29]: pop.loc['California':'New York','2000']
Out[29]:
state year
California 2000 23343
New York 2000 56493
Name: population, dtype: int64
#显性+花哨索引取值
In [30]: pop.loc[['California','Texas']]
Out[30]:
state year
California 2000 23343
2010 33453
Texas 2000 78893
2010 34569
Name: population, dtype: int64
#隐性索引局部切片
In [32]: pop.iloc[:2]
Out[32]:
state year
California 2000 23343
2010 33453
Name: population, dtype: int64
#布尔索引取值
In [33]: pop[pop > 30000]
Out[33]:
state year
California 2010 33453
New York 2000 56493
2010 89392
Texas 2000 78893
2010 34569
Name: population, dtype: int64
DataFrame
In [1]: import pandas as pd
In [4]: import numpy as np
#行标签
In [2]: index = pd.MultiIndex.from_product([[2019,2020],[1,2,3,4]],
names=['year','quarter'])
#列标签
In [3]: columns = pd.MultiIndex.from_product([['Bill','Andy','Coco'],
['Temp','HR']],names=['name','type'])
#获取模拟数据
In [12]: data = np.round(np.random.randn(8,6),1)
In [15]: data[:,::2] *10
Out[15]:
array([[ 8., -5., -10.],
[ 4., 22., 4.],
[ 14., -19., -18.],
[ 8., -1., -9.],
[ 2., -5., 4.],
[ 17., 7., 4.],
[ 0., 3., 2.],
[ 3., 11., -11.]])
In [17]: data += 37
In [18]: data
Out[18]:
array([[45. , 35.4, 32. , 38.1, 27. , 37.7],
[41. , 36.9, 59. , 35.4, 41. , 37.1],
[51. , 37.1, 18. , 37.5, 19. , 36.7],
[45. , 37.6, 36. , 37.6, 28. , 37.5],
[39. , 35.9, 32. , 37.2, 41. , 38. ],
[54. , 36.8, 44. , 38.7, 41. , 36.9],
[37. , 37. , 40. , 35.1, 39. , 37.9],
[40. , 37.5, 48. , 37.5, 26. , 35.7]])
#创建带行列标签的DataFrame
In [19]: health_data = pd.DataFrame(data, index=index, columns=columns)
In [20]: health_data
Out[20]:
name Bill Andy Coco
type Temp HR Temp HR Temp HR
year quarter
2019 1 45.0 35.4 32.0 38.1 27.0 37.7
2 41.0 36.9 59.0 35.4 41.0 37.1
3 51.0 37.1 18.0 37.5 19.0 36.7
4 45.0 37.6 36.0 37.6 28.0 37.5
2020 1 39.0 35.9 32.0 37.2 41.0 38.0
2 54.0 36.8 44.0 38.7 41.0 36.9
3 37.0 37.0 40.0 35.1 39.0 37.9
4 40.0 37.5 48.0 37.5 26.0 35.7
#直接根据标签名称索引获取值
In [21]: health_data['Bill']
Out[21]:
type Temp HR
year quarter
2019 1 45.0 35.4
2 41.0 36.9
3 51.0 37.1
4 45.0 37.6
2020 1 39.0 35.9
2 54.0 36.8
3 37.0 37.0
4 40.0 37.5
In [22]: health_data['Bill', 'HR']
Out[22]:
year quarter
2019 1 35.4
2 36.9
3 37.1
4 37.6
2020 1 35.9
2 36.8
3 37.0
4 37.5
Name: (Bill, HR), dtype: float64
#隐性索引获取值
In [24]: health_data.iloc[:2,:2]
Out[24]:
name Bill
type Temp HR
year quarter
2019 1 45.0 35.4
2 41.0 36.9
#显性索引获取值
In [23]: health_data.loc[:,('Bill','HR')]
Out[23]:
year quarter
2019 1 35.4
2 36.9
3 37.1
4 37.6
2020 1 35.9
2 36.8
3 37.0
4 37.5
Name: (Bill, HR), dtype: float64
#使用IndexSlice对象对显性索引进行分割
In [25]: idx = pd.IndexSlice
In [28]: health_data.loc[idx[:,1],idx[:,'HR']]
Out[28]:
name Bill Andy Coco
type HR HR HR
year quarter
2019 1 35.4 38.1 37.7
2020 1 35.9 37.2 38.0
In [29]: health_data.loc[idx[:,2],idx[:,'HR']]
Out[29]:
name Bill Andy Coco
type HR HR HR
year quarter
2019 2 36.9 35.4 37.1
2020 2 36.8 38.7 36.9
多级索引排序和转换
In [18]: frame = pd.DataFrame(np.arange(12).reshape((4,3)),
...: index=[list('aabb'),list('1212')],
...: columns = [['Ohio','Ohio','Colorado'],
...: ['Green','Red','Green']])
In [19]: frame
Out[19]:
Ohio Colorado
Green Red Green
a 1 0 1 2
2 3 4 5
b 1 6 7 8
2 9 10 11
In [20]: frame.index.names = ['key1','key2']
In [21]: frame.columns.names = ['state','color']
In [22]: frame
Out[22]:
state Ohio Colorado
color Green Red Green
key1 key2
a 1 0 1 2
2 3 4 5
b 1 6 7 8
2 9 10 11
#交换Index
In [26]: frame.swaplevel('key1','key2')
Out[26]:
state Ohio Colorado
color Green Red Green
key2 key1
1 a 0 1 2
2 a 3 4 5
1 b 6 7 8
2 b 9 10 11
In [27]: frame
Out[27]:
state Ohio Colorado
color Green Red Green
key1 key2
a 1 0 1 2
2 3 4 5
b 1 6 7 8
2 9 10 11
In [28]: frame.sort_index(level=1)
Out[28]:
state Ohio Colorado
color Green Red Green
key1 key2
a 1 0 1 2
b 1 6 7 8
a 2 3 4 5
b 2 9 10 11
In [29]: frame.swaplevel(0,1).sort_index(level=0)
Out[29]:
state Ohio Colorado
color Green Red Green
key2 key1
1 a 0 1 2
b 6 7 8
2 a 3 4 5
b 9 10 11
多级索引行列转换和重置(stack、unstack、set_index、reset_index)
Series
In [6]: pop
Out[6]:
state year
California 2000 23343
2010 33453
Texas 2000 78893
2010 34569
New York 2000 56493
2010 89392
Name: population, dtype: int64
#通过level参数设置转换索引的层级
In [7]: pop.unstack(level=0)
Out[7]:
state California New York Texas
year
2000 23343 56493 78893
2010 33453 89392 34569
In [8]: pop.unstack(level=1)
Out[8]:
year 2000 2010
state
California 23343 33453
New York 56493 89392
Texas 78893 34569
In [9]: pop.unstack(level='year')
Out[9]:
year 2000 2010
state
California 23343 33453
New York 56493 89392
Texas 78893 34569
In [10]: pop.unstack(level='year').stack(level='year')
Out[10]:
state year
California 2000 23343
2010 33453
New York 2000 56493
2010 89392
Texas 2000 78893
2010 34569
dtype: int64
#reset_index()把层级索引变成列标签
In [11]: pop_flat = pop.reset_index(name='population')
In [12]: pop_flat
Out[12]:
state year population
0 California 2000 23343
1 California 2010 33453
2 Texas 2000 78893
3 Texas 2010 34569
4 New York 2000 56493
5 New York 2010 89392
#set_index()把列标签设置成层级索引
In [13]: pop_flat.set_index(['state','year'])
Out[13]:
population
state year
California 2000 23343
2010 33453
Texas 2000 78893
2010 34569
New York 2000 56493
2010 89392
DataFrame
In [30]: health_data
Out[30]:
name Bill Andy Coco
type Temp HR Temp HR Temp HR
year quarter
2019 1 45.0 35.4 32.0 38.1 27.0 37.7
2 41.0 36.9 59.0 35.4 41.0 37.1
3 51.0 37.1 18.0 37.5 19.0 36.7
4 45.0 37.6 36.0 37.6 28.0 37.5
2020 1 39.0 35.9 32.0 37.2 41.0 38.0
2 54.0 36.8 44.0 38.7 41.0 36.9
3 37.0 37.0 40.0 35.1 39.0 37.9
4 40.0 37.5 48.0 37.5 26.0 35.7
#从列堆积到行:stack(level=*),*表示索引的序号,用隐性索引则自上而下从0开始索引,用显性索引可直接赋值索引名称
#默认level=1
In [31]: health_data.stack()
Out[31]:
name Andy Bill Coco
year quarter type
2019 1 HR 38.1 35.4 37.7
Temp 32.0 45.0 27.0
2 HR 35.4 36.9 37.1
Temp 59.0 41.0 41.0
3 HR 37.5 37.1 36.7
Temp 18.0 51.0 19.0
4 HR 37.6 37.6 37.5
Temp 36.0 45.0 28.0
2020 1 HR 37.2 35.9 38.0
Temp 32.0 39.0 41.0
2 HR 38.7 36.8 36.9
Temp 44.0 54.0 41.0
3 HR 35.1 37.0 37.9
Temp 40.0 37.0 39.0
4 HR 37.5 37.5 35.7
Temp 48.0 40.0 26.0
In [33]: health_data.stack(level=1)
Out[33]:
name Andy Bill Coco
year quarter type
2019 1 HR 38.1 35.4 37.7
Temp 32.0 45.0 27.0
2 HR 35.4 36.9 37.1
Temp 59.0 41.0 41.0
3 HR 37.5 37.1 36.7
Temp 18.0 51.0 19.0
4 HR 37.6 37.6 37.5
Temp 36.0 45.0 28.0
2020 1 HR 37.2 35.9 38.0
Temp 32.0 39.0 41.0
2 HR 38.7 36.8 36.9
Temp 44.0 54.0 41.0
3 HR 35.1 37.0 37.9
Temp 40.0 37.0 39.0
4 HR 37.5 37.5 35.7
Temp 48.0 40.0 26.0
In [32]: health_data.stack(level=0)
Out[32]:
type HR Temp
year quarter name
2019 1 Andy 38.1 32.0
Bill 35.4 45.0
Coco 37.7 27.0
2 Andy 35.4 59.0
Bill 36.9 41.0
Coco 37.1 41.0
3 Andy 37.5 18.0
Bill 37.1 51.0
Coco 36.7 19.0
4 Andy 37.6 36.0
Bill 37.6 45.0
Coco 37.5 28.0
2020 1 Andy 37.2 32.0
Bill 35.9 39.0
Coco 38.0 41.0
2 Andy 38.7 44.0
Bill 36.8 54.0
Coco 36.9 41.0
3 Andy 35.1 40.0
Bill 37.0 37.0
Coco 37.9 39.0
4 Andy 37.5 48.0
Bill 37.5 40.0
Coco 35.7 26.0
In [38]: health_data.stack(level='name')
Out[38]:
type HR Temp
year quarter name
2019 1 Andy 38.1 32.0
Bill 35.4 45.0
Coco 37.7 27.0
2 Andy 35.4 59.0
Bill 36.9 41.0
Coco 37.1 41.0
3 Andy 37.5 18.0
Bill 37.1 51.0
Coco 36.7 19.0
4 Andy 37.6 36.0
Bill 37.6 45.0
Coco 37.5 28.0
2020 1 Andy 37.2 32.0
Bill 35.9 39.0
Coco 38.0 41.0
2 Andy 38.7 44.0
Bill 36.8 54.0
Coco 36.9 41.0
3 Andy 35.1 40.0
Bill 37.0 37.0
Coco 37.9 39.0
4 Andy 37.5 48.0
Bill 37.5 40.0
Coco 35.7 26.0
#从行散开到列:unstack(level=*),*表示索引的序号,用隐性索引则自左而右从0开始索引,用显性索引可直接赋值索引名称
#默认level=1
In [35]: health_data.unstack(level=0)
Out[35]:
name Bill Andy Coco
type Temp HR Temp HR Temp HR
year 2019 2020 2019 2020 2019 2020 2019 2020 2019 2020 2019 2020
quarter
1 45.0 39.0 35.4 35.9 32.0 32.0 38.1 37.2 27.0 41.0 37.7 38.0
2 41.0 54.0 36.9 36.8 59.0 44.0 35.4 38.7 41.0 41.0 37.1 36.9
3 51.0 37.0 37.1 37.0 18.0 40.0 37.5 35.1 19.0 39.0 36.7 37.9
4 45.0 40.0 37.6 37.5 36.0 48.0 37.6 37.5 28.0 26.0 37.5 35.7
In [36]: health_data.unstack(level=1)
Out[36]:
name Bill Andy ... Coco
type Temp HR Temp ... HR Temp HR
quarter 1 2 3 4 1 2 3 4 1 ... 4 1 2 3 4 1 2 3 4
year ...
2019 45.0 41.0 51.0 45.0 35.4 36.9 37.1 37.6 32.0 ... 37.6 27.0 41.0 19.0 28.0 37.7 37.1 36.7 37.5
2020 39.0 54.0 37.0 40.0 35.9 36.8 37.0 37.5 32.0 ... 37.5 41.0 41.0 39.0 26.0 38.0 36.9 37.9 35.7
[2 rows x 24 columns]
In [39]: health_data.unstack(level='year')
Out[39]:
name Bill Andy Coco
type Temp HR Temp HR Temp HR
year 2019 2020 2019 2020 2019 2020 2019 2020 2019 2020 2019 2020
quarter
1 45.0 39.0 35.4 35.9 32.0 32.0 38.1 37.2 27.0 41.0 37.7 38.0
2 41.0 54.0 36.9 36.8 59.0 44.0 35.4 38.7 41.0 41.0 37.1 36.9
3 51.0 37.0 37.1 37.0 18.0 40.0 37.5 35.1 19.0 39.0 36.7 37.9
4 45.0 40.0 37.6 37.5 36.0 48.0 37.6 37.5 28.0 26.0 37.5 35.7
多级索引数据统计方法
In [23]: health_data
Out[23]:
name Bill Andy Coco
type Temp HR Temp HR Temp HR
year quarter
2019 1 37.1 35.3 37.2 37.5 36.2 37.6
2 36.9 37.0 37.8 35.9 37.5 36.6
3 36.8 34.7 35.4 37.5 37.0 38.2
4 36.8 37.6 36.2 37.4 36.4 36.7
2020 1 37.5 34.8 36.8 36.5 36.3 37.4
2 36.2 36.7 37.3 37.2 36.3 36.4
3 37.7 38.1 36.9 37.7 36.6 36.1
4 36.5 36.3 37.6 36.3 35.5 37.3
#level参数设置累计的索引层级
In [24]: data_mean = health_data.mean(level='year')
In [25]: data_mean
Out[25]:
name Bill Andy Coco
type Temp HR Temp HR Temp HR
year
2019 36.900 36.150 36.65 37.075 36.775 37.275
2020 36.975 36.475 37.15 36.925 36.175 36.800
#axis参数设置累计的轴
In [26]: data_mean.mean(axis=1,level='type')
Out[26]:
type Temp HR
year
2019 36.775000 36.833333
2020 36.766667 36.733333
Pandas选择与索引的更多相关文章
- 如何选择普通索引和唯一索引《死磕MySQL系列 五》
系列文章 一.原来一条select语句在MySQL是这样执行的<死磕MySQL系列 一> 二.一生挚友redo log.binlog<死磕MySQL系列 二> 三.MySQL强 ...
- pandas选择数据-【老鱼学pandas】
选择列 根据列名来选择某列的数据 import pandas as pd import numpy as np dates = pd.date_range("2017-01-08" ...
- pandas选择单元格,选择行列
首先创建示例df: df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=list('ABCD'), index=list('5678')) d ...
- pandas 之 多层索引
In many applications, data may be spread across a number of files or datasets or be arranged in a fo ...
- numpy和pandas的基础索引切片
Numpy的索引切片 索引 In [72]: arr = np.array([[[1,1,1],[2,2,2]],[[3,3,3],[4,4,4]]]) In [73]: arr Out[73]: a ...
- 利用Python进行数据分析(11) pandas基础: 层次化索引
层次化索引 层次化索引指你能在一个数组上拥有多个索引,例如: 有点像Excel里的合并单元格对么? 根据索引选择数据子集 以外层索引的方式选择数据子集: 以内层索引的方式选择数据: 多重索引S ...
- 深入学习jQuery选择器系列第三篇——过滤选择器之索引选择器
× 目录 [1]通用形式 [2]首尾索引 [3]奇偶索引[4]范围索引 前面的话 上一篇介绍了过滤选择器中的子元素选择器部分,本文开始介绍极易与之混淆的索引选择器 通用形式 $(':eq(index) ...
- 但从谈论性能点SQL Server选择聚集索引键
简单介绍 在SQL Server中,数据是按页进行存放的.而为表加上聚集索引后,SQL Server对于数据的查找就是依照聚集索引的列作为keyword进行了. 因此对于聚集索引的选择对性能的影响就变 ...
- pandas 选择列或者添加列生成新的DataFrame
选择某些列 import pandas as pd # 从Excel中读取数据,生成DataFrame数据 # 导入Excel路径和sheet name df = pd.read_excel(exce ...
- pandas基础用法——索引
# -*- coding: utf-8 -*- # Time : 2016/11/28 15:14 # Author : XiaoDeng # version : python3.5 # Softwa ...
随机推荐
- 【MongoDB】C# .Net MongoDB常用语法
1.1.驱动安装 使用NuGet包管理器安装MongoDB C#驱动:MongoDB.Driver 1.2. C#连接MongoDB //获取MongoDB连接客户端 MongoClient clie ...
- Java面向对象之内部类的几类使用场景
介绍 Java内部类是一种特殊的类,它定义在另一个类的内部.内部类提供了许多有用的特性,包括访问外部类的私有成员.隐藏实现细节以及实现回调接口等.以下是Java内部类的一些常用场景及其举例说明: 回调 ...
- 【Azure 应用程序见解】通过Azure Funciton的门户启用Application Insights后,Application Insights无法收到监控数据
问题描述 比较早期创建的Azure Funciton服务,近期发现在门户中已经启用了Application Insights功能,但是正确配置Applicaiton Insights后,却无法获取关联 ...
- 【Azure Redis 缓存】Redis性能指标之Server Load
Server Load描述 在Redis的官方介绍中,Server Load指标是Redis 服务器忙于处理消息并且非空闲等待消息的周期百分比. 如果此计数器达到 100,则意味着 Redis 服务器 ...
- HashMap很美好,但线程不安全怎么办?ConcurrentHashMap告诉你答案!
写在开头 在<耗时2天,写完HashMap>这篇文章中,我们提到关于HashMap线程不安全的问题,主要存在如下3点风险: 风险1: put的时候导致元素丢失:如两个线程同时put,且ke ...
- Redis项目常见解决方案
## 1. 缓存预热 在项目启动,或者服务器重启后, 因为请求量较大, 此时对关系型数据库的访问量就有可能超标,导致服务卡顿,宕机, 所以在启动前应该对缓存进行预热: 前置准备工作: 日常例行统计数据 ...
- Java static 关键字的使用 小练习
1 package com.bytezreo.statictest2; 2 3 /** 4 * 5 * @Description static 关键字的使用 小练习 6 * @author Bytez ...
- 将本地文件上传到github仓库
将本地文件上传到github空仓库 本地使用git上传文件: 第一步:在需要的文件夹(文件夹里已经放了需要提交的内容)右击git bash,输入git init 第二步:将本地文件上传到本地git仓库 ...
- [MAUI 项目实战] 音乐播放器(二):播放内核
播放控制服务 IMusicControlService: 播放控制类,用于当前平台播放器对象的操作,对当前所播放曲目的暂停/播放,下一首/上一首,快进快退(寻迹),随机.单曲模式等功能的控制. 播放控 ...
- SoftCnKiller 更新程序 bat 调用vbs 更新,下载gitee文件 更新自身数据
bat @echo off title 更新流氓软件黑名单 cd /d "%~dp0" echo 请选择更新源,默认使用1.GitCode更新. echo.&choice ...