pandas groupby and aggregation: a case study

Analyzing political contribution data for the 2012 US presidential candidates

Import the packages

import numpy as np

import pandas as pd

from pandas import Series,DataFrame

For convenience, define lookups for the month abbreviations, the candidates of interest, and each candidate's party

months = {'JAN' : 1, 'FEB' : 2, 'MAR' : 3, 'APR' : 4, 'MAY' : 5, 'JUN' : 6,

          'JUL' : 7, 'AUG' : 8, 'SEP' : 9, 'OCT': 10, 'NOV': 11, 'DEC' : 12}

of_interest = ['Obama, Barack', 'Romney, Mitt', 'Santorum, Rick',

               'Paul, Ron', 'Gingrich, Newt']

parties = {

  'Bachmann, Michelle': 'Republican',

  'Romney, Mitt': 'Republican',

  'Obama, Barack': 'Democrat',

  "Roemer, Charles E. 'Buddy' III": 'Reform',

  'Pawlenty, Timothy': 'Republican',

  'Johnson, Gary Earl': 'Libertarian',

  'Paul, Ron': 'Republican',

  'Santorum, Rick': 'Republican',

  'Cain, Herman': 'Republican',

  'Gingrich, Newt': 'Republican',

  'McCotter, Thaddeus G': 'Republican',

  'Huntsman, Jon': 'Republican',

  'Perry, Rick': 'Republican'

 }

df = pd.read_csv('./data/usa_election.txt')

df.head()

C:\anaconda3\lib\site-packages\IPython\core\interactiveshell.py:2728: DtypeWarning: Columns (6) have mixed types. Specify dtype option on import or set low_memory=False.

  interactivity=interactivity, compiler=compiler, result=result)
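The DtypeWarning above says that column 6 (counting from 0, that is contbr_zip) contains mixed types. One way to avoid it is to tell read_csv which type to use. A minimal sketch, assuming ZIP codes are best kept as strings; the two in-memory rows below are a made-up stand-in for the real file:

```python
import io
import pandas as pd

# Hypothetical two-row sample standing in for ./data/usa_election.txt
sample = io.StringIO(
    "cand_nm,contbr_zip,contb_receipt_amt\n"
    '"Bachmann, Michelle",366010290,250.0\n'
    '"Obama, Barack",60680,1944042.43\n'
)

# Reading ZIP codes as strings keeps the whole column a single type
zip_df = pd.read_csv(sample, dtype={'contbr_zip': str})
print(zip_df['contbr_zip'].tolist())
```

On the real file the same idea would be pd.read_csv('./data/usa_election.txt', dtype={'contbr_zip': str}); passing low_memory=False, as the warning suggests, is the other option.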


	cmte_id	cand_id	cand_nm	contbr_nm	contbr_city	contbr_st	contbr_zip	contbr_employer	contbr_occupation	contb_receipt_amt	contb_receipt_dt	receipt_desc	memo_cd	memo_text	form_tp	file_num
0	C00410118	P20002978	Bachmann, Michelle	HARVEY, WILLIAM	MOBILE	AL	3.6601e+08	RETIRED	RETIRED	250.0	20-JUN-11	NaN	NaN	NaN	SA17A	736166
1	C00410118	P20002978	Bachmann, Michelle	HARVEY, WILLIAM	MOBILE	AL	3.6601e+08	RETIRED	RETIRED	50.0	23-JUN-11	NaN	NaN	NaN	SA17A	736166
2	C00410118	P20002978	Bachmann, Michelle	SMITH, LANIER	LANETT	AL	3.68633e+08	INFORMATION REQUESTED	INFORMATION REQUESTED	250.0	05-JUL-11	NaN	NaN	NaN	SA17A	749073
3	C00410118	P20002978	Bachmann, Michelle	BLEVINS, DARONDA	PIGGOTT	AR	7.24548e+08	NONE	RETIRED	250.0	01-AUG-11	NaN	NaN	NaN	SA17A	749073
4	C00410118	P20002978	Bachmann, Michelle	WARDENBURG, HAROLD	HOT SPRINGS NATION	AR	7.19016e+08	NONE	RETIRED	300.0	20-JUN-11	NaN	NaN	NaN	SA17A	736166

# Add a new column, party, holding each candidate's party

df['party'] = df['cand_nm'].map(parties)

df.head()


	cmte_id	cand_id	cand_nm	contbr_nm	contbr_city	contbr_st	contbr_zip	contbr_employer	contbr_occupation	contb_receipt_amt	contb_receipt_dt	receipt_desc	memo_cd	memo_text	form_tp	file_num	party
0	C00410118	P20002978	Bachmann, Michelle	HARVEY, WILLIAM	MOBILE	AL	3.6601e+08	RETIRED	RETIRED	250.0	20-JUN-11	NaN	NaN	NaN	SA17A	736166	Republican
1	C00410118	P20002978	Bachmann, Michelle	HARVEY, WILLIAM	MOBILE	AL	3.6601e+08	RETIRED	RETIRED	50.0	23-JUN-11	NaN	NaN	NaN	SA17A	736166	Republican
2	C00410118	P20002978	Bachmann, Michelle	SMITH, LANIER	LANETT	AL	3.68633e+08	INFORMATION REQUESTED	INFORMATION REQUESTED	250.0	05-JUL-11	NaN	NaN	NaN	SA17A	749073	Republican
3	C00410118	P20002978	Bachmann, Michelle	BLEVINS, DARONDA	PIGGOTT	AR	7.24548e+08	NONE	RETIRED	250.0	01-AUG-11	NaN	NaN	NaN	SA17A	749073	Republican
4	C00410118	P20002978	Bachmann, Michelle	WARDENBURG, HAROLD	HOT SPRINGS NATION	AR	7.19016e+08	NONE	RETIRED	300.0	20-JUN-11	NaN	NaN	NaN	SA17A	736166	Republican
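Series.map with a dict returns NaN for any value that is not a key, so it can be worth checking that every candidate name was found in the mapping. A small sketch of that check; the shortened mapping and the name 'Doe, Jane' are made up for illustration:

```python
import pandas as pd

# Hypothetical sample; 'Doe, Jane' is a made-up name missing from the mapping
party_map = {'Obama, Barack': 'Democrat', 'Romney, Mitt': 'Republican'}
names = pd.Series(['Obama, Barack', 'Romney, Mitt', 'Doe, Jane'])

mapped = names.map(party_map)

# Names that did not match any key come back as NaN
unmapped = names[mapped.isna()].unique()
print(unmapped)
```

On the real data the equivalent check would be df.loc[df['party'].isna(), 'cand_nm'].unique().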

# Which distinct values appear in the party column

df['party'].unique()

array(['Republican', 'Democrat', 'Reform', 'Libertarian'], dtype=object)

# Count how many times each value appears in the party column

df['party'].value_counts()

Democrat       292400

Republican     237575

Reform           5364

Libertarian       702

Name: party, dtype: int64
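value_counts can also report shares instead of raw counts. A minimal sketch on a made-up four-row Series rather than the real party column:

```python
import pandas as pd

# Toy stand-in for df['party']
party = pd.Series(['Democrat', 'Democrat', 'Republican', 'Reform'])

# normalize=True returns each value's fraction of the total
shares = party.value_counts(normalize=True)
print(shares)
```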

# Total contributions (contb_receipt_amt) received by each party

df.groupby(by='party')['contb_receipt_amt'].sum()

party

Democrat       8.105758e+07

Libertarian    4.132769e+05

Reform         3.390338e+05

Republican     1.192255e+08

Name: contb_receipt_amt, dtype: float64
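The same grouping can return several statistics at once and be sorted by total. A sketch using agg on toy data; the amounts below are invented and are not taken from the dataset:

```python
import pandas as pd

# Toy data with invented amounts
toy = pd.DataFrame({
    'party': ['Democrat', 'Republican', 'Democrat', 'Reform'],
    'contb_receipt_amt': [100.0, 250.0, 50.0, 20.0],
})

# One row per party with total, number of contributions, and mean amount
summary = (
    toy.groupby('party')['contb_receipt_amt']
       .agg(['sum', 'count', 'mean'])
       .sort_values('sum', ascending=False)
)
print(summary)
```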

# Total contributions (contb_receipt_amt) received by each party on each day

df.groupby(by=['contb_receipt_dt','party'])['contb_receipt_amt'].sum()

contb_receipt_dt  party

01-APR-11         Reform              50.00

                  Republican       12635.00

01-AUG-11         Democrat        175281.00

                  Libertarian       1000.00

                  Reform            1847.00

                  Republican      234598.46

01-DEC-11         Democrat        651532.82

                  Libertarian        725.00

                  Reform             875.00

                  Republican      486405.96

01-FEB-11         Republican         250.00

01-JAN-11         Republican        8600.00

01-JAN-12         Democrat         58098.80

                  Reform             515.00

                  Republican       75704.72

01-JUL-11         Democrat        165961.00

                  Libertarian       2000.00

                  Reform             100.00

                  Republican      115848.72

01-JUN-11         Democrat        145459.00

                  Libertarian        500.00

                  Reform              50.00

                  Republican      433109.20

01-MAR-11         Republican        1000.00

01-MAY-11         Democrat         82644.00

                  Reform             480.00

                  Republican       28663.87

01-NOV-11         Democrat        122529.87

                  Libertarian       3000.00

                  Reform            1792.00

                                    ...

30-OCT-11         Reform            3910.00

                  Republican       43913.16

30-SEP-11         Democrat       3373517.24

                  Libertarian        550.00

                  Reform            2050.00

                  Republican     4886331.76

31-AUG-11         Democrat        374387.44

                  Libertarian      10750.00

                  Reform             450.00

                  Republican     1017735.02

31-DEC-11         Democrat       3553072.57

                  Reform             695.00

                  Republican     1094376.72

31-JAN-11         Republican        6000.00

31-JAN-12         Democrat       1418410.31

                  Reform             150.00

                  Republican      869890.41

31-JUL-11         Democrat         20305.00

                  Reform             966.00

                  Republican       12781.02

31-MAR-11         Reform             200.00

                  Republican       62475.00

31-MAY-11         Democrat        351705.66

                  Libertarian        250.00

                  Reform             100.00

                  Republican      301339.80

31-OCT-11         Democrat        204996.87

                  Libertarian       4250.00

                  Reform            3105.00

                  Republican      734601.83

Name: contb_receipt_amt, Length: 1183, dtype: float64
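Grouping by two keys gives a Series with a two-level index, and the dates above sort as strings (01-APR-11 comes before 01-AUG-11) rather than chronologically. For reading, the party level can be pivoted into columns with unstack. A sketch on a toy subset of rows; only four of the values shown above are reused:

```python
import pandas as pd

# Toy subset: four of the (date, party) totals from the output above
toy = pd.DataFrame({
    'contb_receipt_dt': ['01-APR-11', '01-APR-11', '01-AUG-11', '01-AUG-11'],
    'party': ['Reform', 'Republican', 'Democrat', 'Republican'],
    'contb_receipt_amt': [50.0, 12635.0, 175281.0, 234598.46],
})

daily = toy.groupby(['contb_receipt_dt', 'party'])['contb_receipt_amt'].sum()

# Move the party level into columns; missing combinations become 0
wide = daily.unstack('party', fill_value=0)
print(wide)
```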

# Convert the dates from 'dd-MON-yy' to 'yyyy-m-dd' strings (note that the month is not zero-padded)

def transformDate(d):

    day,month,year = d.split('-')

    month = months[month]

    return '20'+year+'-'+str(month)+'-'+day

df['contb_receipt_dt'] = df['contb_receipt_dt'].apply(transformDate)

df.head()


	cmte_id	cand_id	cand_nm	contbr_nm	contbr_city	contbr_st	contbr_zip	contbr_employer	contbr_occupation	contb_receipt_amt	contb_receipt_dt	receipt_desc	memo_cd	memo_text	form_tp	file_num	party
0	C00410118	P20002978	Bachmann, Michelle	HARVEY, WILLIAM	MOBILE	AL	3.6601e+08	RETIRED	RETIRED	250.0	2011-6-20	NaN	NaN	NaN	SA17A	736166	Republican
1	C00410118	P20002978	Bachmann, Michelle	HARVEY, WILLIAM	MOBILE	AL	3.6601e+08	RETIRED	RETIRED	50.0	2011-6-23	NaN	NaN	NaN	SA17A	736166	Republican
2	C00410118	P20002978	Bachmann, Michelle	SMITH, LANIER	LANETT	AL	3.68633e+08	INFORMATION REQUESTED	INFORMATION REQUESTED	250.0	2011-7-05	NaN	NaN	NaN	SA17A	749073	Republican
3	C00410118	P20002978	Bachmann, Michelle	BLEVINS, DARONDA	PIGGOTT	AR	7.24548e+08	NONE	RETIRED	250.0	2011-8-01	NaN	NaN	NaN	SA17A	749073	Republican
4	C00410118	P20002978	Bachmann, Michelle	WARDENBURG, HAROLD	HOT SPRINGS NATION	AR	7.19016e+08	NONE	RETIRED	300.0	2011-6-20	NaN	NaN	NaN	SA17A	736166	Republican
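The hand-written conversion above produces strings such as 2011-6-20, which do not sort chronologically. An alternative is to parse the original 'dd-MON-yy' values into real datetimes with pd.to_datetime. A sketch on three sample strings; it assumes an English locale for the month abbreviations matched by %b:

```python
import pandas as pd

# Sample values in the file's original 'dd-MON-yy' form
raw = pd.Series(['20-JUN-11', '05-JUL-11', '31-JAN-12'])

# %d = day, %b = abbreviated month name, %y = two-digit year
dates = pd.to_datetime(raw, format='%d-%b-%y')

# Zero-padded ISO strings, if text is still wanted
iso = dates.dt.strftime('%Y-%m-%d')
print(iso.tolist())
```

With a datetime column, grouping by day sorts chronologically and accessors such as .dt.month become available.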

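# Whom do veterans (by contributor occupation) mainly support? Find the candidate who received the most money from them

# 1. Select the veterans' rows (only the exact occupation 'DISABLED VETERAN' is matched here)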

df['contbr_occupation'] == 'DISABLED VETERAN'

old_bing = df.loc[df['contbr_occupation'] == 'DISABLED VETERAN']

# 2. Group by candidate

old_bing.groupby(by='cand_nm')['contb_receipt_amt'].sum()

cand_nm

Cain, Herman       300.00

Obama, Barack     4205.00

Paul, Ron         2425.49

Santorum, Rick     250.00

Name: contb_receipt_amt, dtype: float64
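To answer the question directly, the per-candidate totals can be sorted or reduced with idxmax. A sketch that rebuilds the four totals printed above as a Series instead of recomputing them from the file:

```python
import pandas as pd

# The four totals from the output above, re-entered by hand
totals = pd.Series(
    {'Cain, Herman': 300.00, 'Obama, Barack': 4205.00,
     'Paul, Ron': 2425.49, 'Santorum, Rick': 250.00},
    name='contb_receipt_amt',
)

ranked = totals.sort_values(ascending=False)  # largest total first
top = totals.idxmax()                         # index label of the largest total
print(top)
```

On the real data this corresponds to appending .sort_values(ascending=False) or .idxmax() to the groupby expression.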

df['contb_receipt_amt'].max()

1944042.43

# Occupation and amount of the largest single contribution: look up the matching row with query()

df.query('contb_receipt_amt == 1944042.43')


	cmte_id	cand_id	cand_nm	contbr_nm	contbr_city	contbr_st	contbr_zip	contbr_employer	contbr_occupation	contb_receipt_amt	contb_receipt_dt	receipt_desc	memo_cd	memo_text	form_tp	file_num	party
176127	C00431445	P80003338	Obama, Barack	OBAMA VICTORY FUND 2012 - UNITEMIZED	CHICAGO	IL	60680	NaN	NaN	1944042.43	2011-12-31	NaN	X	*	SA18	763233	Democrat
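Copying the maximum into the query string as a float literal works here, but the row can also be found without retyping the number. A sketch on a two-row toy frame; the first row is invented, and the second mirrors the row shown above, whose occupation is NaN because it is an unitemized committee entry:

```python
import pandas as pd

# Toy frame: one invented row plus the largest row from the output above
toy = pd.DataFrame({
    'contbr_nm': ['HARVEY, WILLIAM', 'OBAMA VICTORY FUND 2012 - UNITEMIZED'],
    'contbr_occupation': ['RETIRED', None],
    'contb_receipt_amt': [250.0, 1944042.43],
})

# idxmax returns the index label of the largest amount
top_row = toy.loc[toy['contb_receipt_amt'].idxmax()]

# A boolean mask against max() keeps every row tied for the maximum
tied = toy[toy['contb_receipt_amt'] == toy['contb_receipt_amt'].max()]
print(top_row[['contbr_nm', 'contbr_occupation', 'contb_receipt_amt']])
```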
