pandas-09: How to use DataFrame.groupby()

groupby in pandas works much like GROUP BY in SQL. That is no surprise: a relational database also stores its data as tables, which is the same shape as a DataFrame.
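To make the SQL comparison concrete, here is a minimal sketch that runs the same aggregation both ways. The table name weather, the four sample rows, and the variable names are made up for this illustration; they are not the CSV used in the rest of the post.

```python
import sqlite3

import pandas as pd

# Hypothetical miniature data set, invented for this sketch
rows = [("BJ", 8), ("BJ", 12), ("SH", -4), ("SH", 20)]
df_demo = pd.DataFrame(rows, columns=["city", "temperature"])

# SQL: group by city and average the temperature
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weather (city TEXT, temperature INTEGER)")
conn.executemany("INSERT INTO weather VALUES (?, ?)", rows)
sql_result = dict(
    conn.execute(
        "SELECT city, AVG(temperature) FROM weather GROUP BY city"
    ).fetchall()
)
conn.close()

# pandas: the same grouping and aggregation
pandas_result = df_demo.groupby("city")["temperature"].mean().to_dict()

print(sql_result)
print(pandas_result)
```

Both dictionaries map each city to its average temperature.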

import numpy as np

import pandas as pd

from pandas import Series, DataFrame

df = pd.read_csv('./city_weather.csv')

print(df)

'''

          date city  temperature  wind

0   03/01/2016   BJ            8     5

1   17/01/2016   BJ           12     2

2   31/01/2016   BJ           19     2

3   14/02/2016   BJ           -3     3

4   28/02/2016   BJ           19     2

5   13/03/2016   BJ            5     3

6   27/03/2016   SH           -4     4

7   10/04/2016   SH           19     3

8   24/04/2016   SH           20     3

9   08/05/2016   SH           17     3

10  22/05/2016   SH            4     2

11  05/06/2016   SH          -10     4

12  19/06/2016   SH            0     5

13  03/07/2016   SH           -9     5

14  17/07/2016   GZ           10     2

15  31/07/2016   GZ           -1     5

16  14/08/2016   GZ            1     5

17  28/08/2016   GZ           25     4

18  11/09/2016   SZ           20     1

19  25/09/2016   SZ          -10     4

'''
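The CSV file is not included with this post. As a rough sketch, the same table can be rebuilt in memory from the printed output above. The name df_inline is only for illustration, and the date column is generated on the assumption (which matches the listing) that the rows are exactly 14 days apart.

```python
import pandas as pd

# Values copied from the printed table above, so './city_weather.csv' is not needed
cities = ["BJ"] * 6 + ["SH"] * 8 + ["GZ"] * 4 + ["SZ"] * 2
temperature = [8, 12, 19, -3, 19, 5,
               -4, 19, 20, 17, 4, -10, 0, -9,
               10, -1, 1, 25,
               20, -10]
wind = [5, 2, 2, 3, 2, 3,
        4, 3, 3, 3, 2, 4, 5, 5,
        2, 5, 5, 4,
        1, 4]

# Assumption: dates are two weeks apart, starting on 3 January 2016,
# and are stored as day/month/year text exactly like in the CSV
dates = pd.date_range("2016-01-03", periods=20, freq="14D").strftime("%d/%m/%Y")

df_inline = pd.DataFrame(
    {"date": dates, "city": cities, "temperature": temperature, "wind": wind}
)
print(df_inline)
```

Assigning df = df_inline should let the remaining steps run without the file.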

g = df.groupby(df['city'])

# g is a DataFrameGroupBy object; print(g) shows something like
# <pandas.core.groupby.groupby.DataFrameGroupBy object at 0x7f10450e12e8>

print(g.groups)

# {'BJ': Int64Index([0, 1, 2, 3, 4, 5], dtype='int64'),

# 'GZ': Int64Index([14, 15, 16, 17], dtype='int64'),

# 'SZ': Int64Index([18, 19], dtype='int64'),

# 'SH': Int64Index([6, 7, 8, 9, 10, 11, 12, 13], dtype='int64')}
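The grouping key can also be given as a column label instead of the column itself. The sketch below, on a small made-up frame (demo is a hypothetical name), checks that both spellings produce the same groups.

```python
import pandas as pd

# Hypothetical sample data, invented for this sketch
demo = pd.DataFrame(
    {"city": ["BJ", "BJ", "SH", "GZ"], "temperature": [8, 12, -4, 10]}
)

by_series = demo.groupby(demo["city"])  # pass the column itself, as in the post
by_name = demo.groupby("city")          # pass the column label

# Convert each group's row index to a plain list so the dicts compare cleanly
groups_a = {key: list(idx) for key, idx in by_series.groups.items()}
groups_b = {key: list(idx) for key, idx in by_name.groups.items()}

print(groups_a == groups_b)
print(by_name.ngroups)  # number of distinct groups
```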

print(g.size()) # g.size() counts the number of rows in each group

'''

city

BJ    6

GZ    4

SH    8

SZ    2

dtype: int64

'''
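size() counts rows, which is not always the same as count(). A small sketch on made-up data with one missing value (the frame demo and its values are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one missing wind reading
demo = pd.DataFrame({"city": ["BJ", "BJ", "SH"], "wind": [5, np.nan, 4]})
grouped = demo.groupby("city")

sizes = grouped.size()            # rows per group, missing values included
counts = grouped["wind"].count()  # non-missing values per group

print(sizes)
print(counts)
```

For the weather data in this post the two agree, because no value is missing.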

print(g.get_group('BJ')) # get a single group as a DataFrame

'''

         date city  temperature  wind

0  03/01/2016   BJ            8     5

1  17/01/2016   BJ           12     2

2  31/01/2016   BJ           19     2

3  14/02/2016   BJ           -3     3

4  28/02/2016   BJ           19     2

5  13/03/2016   BJ            5     3

'''

df_bj = g.get_group('BJ')

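print(df_bj.mean(numeric_only=True)) # mean of this group; numeric_only=True skips the text columns, which recent pandas versions would otherwise reject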

'''

temperature    10.000000

wind            2.833333

dtype: float64

'''

# Compute the mean directly on the g object

print(g.mean(numeric_only=True)) # mean of every numeric column, for each group

'''

      temperature      wind

city

BJ         10.000  2.833333

GZ          8.750  4.000000

SH          4.625  3.625000

SZ          5.000  2.500000

'''
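When several statistics are wanted for one column, agg() can compute them in a single call. A minimal sketch on made-up data (demo and summary are hypothetical names):

```python
import pandas as pd

# Hypothetical sample data, invented for this sketch
demo = pd.DataFrame(
    {
        "city": ["BJ", "BJ", "SH", "SH"],
        "temperature": [8, 12, -4, 20],
        "wind": [5, 2, 4, 3],
    }
)

# Select one column of the grouped object, then apply several aggregations
summary = demo.groupby("city")["temperature"].agg(["mean", "max", "min"])
print(summary)
```

The result has one row per city and one column per aggregation.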

print(g.max())

'''

            date  temperature  wind

city

BJ    31/01/2016           19     5

GZ    31/07/2016           25     5

SH    27/03/2016           20     5

SZ    25/09/2016           20     4

'''

print(g.min())

'''

            date  temperature  wind

city

BJ    03/01/2016           -3     2

GZ    14/08/2016           -1     2

SH    03/07/2016          -10     2

SZ    11/09/2016          -10     1

'''
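Note that the date column in the max() and min() results is compared as text, not as dates: for BJ the maximum is 31/01/2016 even though 13/03/2016 is later. One possible fix, sketched on three of the BJ dates, is to parse the column first. The column name date_parsed is an assumption for this example.

```python
import pandas as pd

# Three of the BJ dates from the table, still stored as day/month/year text
demo = pd.DataFrame(
    {
        "city": ["BJ", "BJ", "BJ"],
        "date": ["31/01/2016", "14/02/2016", "13/03/2016"],
    }
)

# Text comparison: '31/...' sorts after '14/...' and '13/...'
as_text = demo.groupby("city")["date"].max()

# Parse to real timestamps, then take the maximum again
demo["date_parsed"] = pd.to_datetime(demo["date"], format="%d/%m/%Y")
as_dates = demo.groupby("city")["date_parsed"].max()

print(as_text)
print(as_dates)
```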

# The g object can also be iterated with a for loop; each item is a (group label, DataFrame) pair

for name, group in g:

    print(name)

    print(group)
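The loop body can do more than print. As a small sketch on made-up data, the following collects the temperature range of each group; the names demo and ranges are hypothetical.

```python
import pandas as pd

# Hypothetical sample data, invented for this sketch
demo = pd.DataFrame(
    {"city": ["BJ", "BJ", "SH", "SH"], "temperature": [8, 12, -4, 20]}
)

ranges = {}
for name, group in demo.groupby("city"):
    # group is an ordinary DataFrame holding only the rows of one city
    spread = group["temperature"].max() - group["temperature"].min()
    ranges[name] = int(spread)

print(ranges)
```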

# g can be converted to a list or a dict

print(list(g)) # a list of tuples: the first element is the group label, the second is that group's DataFrame

'''

[('BJ',          date city  temperature  wind

0  03/01/2016   BJ            8     5

1  17/01/2016   BJ           12     2

2  31/01/2016   BJ           19     2

3  14/02/2016   BJ           -3     3

4  28/02/2016   BJ           19     2

5  13/03/2016   BJ            5     3),

('GZ',           date city  temperature  wind

14  17/07/2016   GZ           10     2

15  31/07/2016   GZ           -1     5

16  14/08/2016   GZ            1     5

17  28/08/2016   GZ           25     4),

('SH',           date city  temperature  wind

6   27/03/2016   SH           -4     4

7   10/04/2016   SH           19     3

8   24/04/2016   SH           20     3

9   08/05/2016   SH           17     3

10  22/05/2016   SH            4     2

11  05/06/2016   SH          -10     4

12  19/06/2016   SH            0     5

13  03/07/2016   SH           -9     5),

('SZ',           date city  temperature  wind

18  11/09/2016   SZ           20     1

19  25/09/2016   SZ          -10     4)]

'''

print(dict(list(g))) # a dict keyed by group label; each value is a DataFrame

'''

{'SH':           date city  temperature  wind

6   27/03/2016   SH           -4     4

7   10/04/2016   SH           19     3

8   24/04/2016   SH           20     3

9   08/05/2016   SH           17     3

10  22/05/2016   SH            4     2

11  05/06/2016   SH          -10     4

12  19/06/2016   SH            0     5

13  03/07/2016   SH           -9     5,

'SZ':           date city  temperature  wind

18  11/09/2016   SZ           20     1

19  25/09/2016   SZ          -10     4,

'GZ':           date city  temperature  wind

14  17/07/2016   GZ           10     2

15  31/07/2016   GZ           -1     5

16  14/08/2016   GZ            1     5

17  28/08/2016   GZ           25     4,

'BJ':          date city  temperature  wind

0  03/01/2016   BJ            8     5

1  17/01/2016   BJ           12     2

2  31/01/2016   BJ           19     2

3  14/02/2016   BJ           -3     3

4  28/02/2016   BJ           19     2

5  13/03/2016   BJ            5     3}

'''
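The same mapping can also be written as a dict comprehension over the grouped object, which avoids building the intermediate list. A minimal sketch on made-up data (demo and frames are hypothetical names):

```python
import pandas as pd

# Hypothetical sample data, invented for this sketch
demo = pd.DataFrame(
    {"city": ["BJ", "BJ", "SH"], "temperature": [8, 12, -4]}
)

# One entry per group: label -> DataFrame with that group's rows
frames = {name: group for name, group in demo.groupby("city")}

print(sorted(frames))
print(frames["BJ"])
```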
