Python数据分析(三)pandas resample 重采样
下方是pandas中resample方法的定义,帮助文档http://pandas.pydata.org/pandas-docs/stable/timeseries.html#resampling中有更加详细的解释。
def resample(self, rule, how=None, axis=0, fill_method=None, closed=None,
label=None, convention='start', kind=None, loffset=None,
limit=None, base=0, on=None, level=None):
"""
Convenience method for frequency conversion and resampling of time
series. Object must have a datetime-like index (DatetimeIndex,
PeriodIndex, or TimedeltaIndex), or pass datetime-like values
to the on or level keyword.(数据重采样和频率转换,数据必须有时间类型的索引列) Parameters
----------
rule : string
the offset string or object representing target conversion(代表目标转换的偏移量)
axis : int, optional, default 0(操作的轴信息)
closed : {'right', 'left'}
Which side of bin interval is closed. The default is 'left'
for all frequency offsets except for 'M', 'A', 'Q', 'BM',
'BA', 'BQ', and 'W' which all have a default of 'right'.(哪一个方向的间隔是关闭的,)
label : {'right', 'left'}
Which bin edge label to label bucket with. The default is 'left'
for all frequency offsets except for 'M', 'A', 'Q', 'BM',
'BA', 'BQ', and 'W' which all have a default of 'right'.(区间的哪一个方向的边界标签保留)
convention : {'start', 'end', 's', 'e'}
For PeriodIndex only, controls whether to use the start or end of
`rule`
kind: {'timestamp', 'period'}, optional
Pass 'timestamp' to convert the resulting index to a
``DateTimeIndex`` or 'period' to convert it to a ``PeriodIndex``.
By default the input representation is retained.
loffset : timedelta
Adjust the resampled time labels
base : int, default 0
For frequencies that evenly subdivide 1 day, the "origin" of the
aggregated intervals. For example, for '5min' frequency, base could
range from 0 through 4. Defaults to 0
on : string, optional
For a DataFrame, column to use instead of index for resampling.
Column must be datetime-like. .. versionadded:: 0.19.0 level : string or int, optional
For a MultiIndex, level (name or number) to use for
resampling. Level must be datetime-like. .. versionadded:: 0.19.0 Returns
-------
Resampler object Notes
-----
See the `user guide
<http://pandas.pydata.org/pandas-docs/stable/timeseries.html#resampling>`_
for more. To learn more about the offset strings, please see `this link
<http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases>`__. Examples
-------- Start by creating a series with 9 one minute timestamps.(新建频率为1min的时间序列) >>> index = pd.date_range('1/1/2000', periods=9, freq='T')
>>> series = pd.Series(range(9), index=index)
>>> series
2000-01-01 00:00:00 0
2000-01-01 00:01:00 1
2000-01-01 00:02:00 2
2000-01-01 00:03:00 3
2000-01-01 00:04:00 4
2000-01-01 00:05:00 5
2000-01-01 00:06:00 6
2000-01-01 00:07:00 7
2000-01-01 00:08:00 8
Freq: T, dtype: int64 Downsample the series into 3 minute bins and sum the values
of the timestamps falling into a bin.(下采样为三分钟) >>> series.resample('3T').sum()
2000-01-01 00:00:00 3
2000-01-01 00:03:00 12
2000-01-01 00:06:00 21
Freq: 3T, dtype: int64 Downsample the series into 3 minute bins as above, but label each
bin using the right edge instead of the left. Please note that the
value in the bucket used as the label is not included in the bucket,
which it labels. For example, in the original series the
bucket ``2000-01-01 00:03:00`` contains the value 3, but the summed
value in the resampled bucket with the label ``2000-01-01 00:03:00``
does not include 3 (if it did, the summed value would be 6, not 3).
To include this value close the right side of the bin interval as
illustrated in the example below this one. >>> series.resample('3T', label='right').sum()(保留间隔的右侧标签,上一个结果是左侧标签)
2000-01-01 00:03:00 3
2000-01-01 00:06:00 12
2000-01-01 00:09:00 21
Freq: 3T, dtype: int64 Downsample the series into 3 minute bins as above, but close the right
side of the bin interval.(降采样为3分钟) >>> series.resample('3T', label='right', closed='right').sum()
2000-01-01 00:00:00 0
2000-01-01 00:03:00 6
2000-01-01 00:06:00 15
2000-01-01 00:09:00 15
Freq: 3T, dtype: int64 Upsample the series into 30 second bins.(生采样为30秒) >>> series.resample('30S').asfreq()[0:5] #select first 5 rows
2000-01-01 00:00:00 0.0
2000-01-01 00:00:30 NaN
2000-01-01 00:01:00 1.0
2000-01-01 00:01:30 NaN
2000-01-01 00:02:00 2.0
Freq: 30S, dtype: float64 Upsample the series into 30 second bins and fill the ``NaN``
values using the ``pad`` method.(向前0阶保持)
pad/ffill:用前一个非缺失值去填充该缺失值
backfill/bfill:用下一个非缺失值填充该缺失值
>>> series.resample('30S').pad()[0:5]
2000-01-01 00:00:00 0
2000-01-01 00:00:30 0
2000-01-01 00:01:00 1
2000-01-01 00:01:30 1
2000-01-01 00:02:00 2
Freq: 30S, dtype: int64
Upsample the series into 30 second bins and fill the
``NaN`` values using the ``bfill`` method.(向后0阶保持)
>>> series.resample('30S').bfill()[0:5]
2000-01-01 00:00:00 0
2000-01-01 00:00:30 1
2000-01-01 00:01:00 1
2000-01-01 00:01:30 2
2000-01-01 00:02:00 2
Freq: 30S, dtype: int64
Pass a custom function via ``apply``
>>> def custom_resampler(array_like):
... return np.sum(array_like)+5
>>> series.resample('3T').apply(custom_resampler)
2000-01-01 00:00:00 8
2000-01-01 00:03:00 17
2000-01-01 00:06:00 26
Freq: 3T, dtype: int64
For a Series with a PeriodIndex, the keyword `convention` can be
used to control whether to use the start or end of `rule`.
>>> s = pd.Series([1, 2], index=pd.period_range('2012-01-01',
freq='A',
periods=2))
>>> s
2012 1
2013 2
Freq: A-DEC, dtype: int64
Resample by month using 'start' `convention`. Values are assigned to
the first month of the period.
>>> s.resample('M', convention='start').asfreq().head()
2012-01 1.0
2012-02 NaN
2012-03 NaN
2012-04 NaN
2012-05 NaN
Freq: M, dtype: float64
Resample by month using 'end' `convention`. Values are assigned to
the last month of the period.
>>> s.resample('M', convention='end').asfreq()
2012-12 1.0
2013-01 NaN
2013-02 NaN
2013-03 NaN
2013-04 NaN
2013-05 NaN
2013-06 NaN
2013-07 NaN
2013-08 NaN
2013-09 NaN
2013-10 NaN
2013-11 NaN
2013-12 2.0
Freq: M, dtype: float64
For DataFrame objects, the keyword ``on`` can be used to specify the
column instead of the index for resampling.
>>> df = pd.DataFrame(data=9*[range(4)], columns=['a', 'b', 'c', 'd'])
>>> df['time'] = pd.date_range('1/1/2000', periods=9, freq='T')
>>> df.resample('3T', on='time').sum()
a b c d
time
2000-01-01 00:00:00 0 3 6 9
2000-01-01 00:03:00 0 3 6 9
2000-01-01 00:06:00 0 3 6 9
For a DataFrame with MultiIndex, the keyword ``level`` can be used to
specify on level the resampling needs to take place.
>>> time = pd.date_range('1/1/2000', periods=5, freq='T')
>>> df2 = pd.DataFrame(data=10*[range(4)],
columns=['a', 'b', 'c', 'd'],
index=pd.MultiIndex.from_product([time, [1, 2]])
)
>>> df2.resample('3T', level=0).sum()
a b c d
2000-01-01 00:00:00 0 6 12 18
2000-01-01 00:03:00 0 4 8 12
Python数据分析(三)pandas resample 重采样的更多相关文章
- Python数据分析库pandas基本操作
Python数据分析库pandas基本操作2017年02月20日 17:09:06 birdlove1987 阅读数:22631 标签: python 数据分析 pandas 更多 个人分类: Pyt ...
- Python数据分析之pandas基本数据结构:Series、DataFrame
1引言 本文总结Pandas中两种常用的数据类型: (1)Series是一种一维的带标签数组对象. (2)DataFrame,二维,Series容器 2 Series数组 2.1 Series数组构成 ...
- Python 数据分析:Pandas 缺省值的判断
Python 数据分析:Pandas 缺省值的判断 背景 我们从数据库中取出数据存入 Pandas None 转换成 NaN 或 NaT.但是,我们将 Pandas 数据写入数据库时又需要转换成 No ...
- Python数据分析之Pandas操作大全
从头到尾都是手码的,文中的所有示例也都是在Pycharm中运行过的,自己整理笔记的最大好处在于可以按照自己的思路来构建矿建,等到将来在需要的时候能够以最快的速度看懂并应用=_= 注:为方便表述,本章设 ...
- Python数据分析之pandas学习
Python中的pandas模块进行数据分析. 接下来pandas介绍中将学习到如下8块内容:1.数据结构简介:DataFrame和Series2.数据索引index3.利用pandas查询数据4.利 ...
- python数据分析之pandas数据选取:df[] df.loc[] df.iloc[] df.ix[] df.at[] df.iat[]
1 引言 Pandas是作为Python数据分析著名的工具包,提供了多种数据选取的方法,方便实用.本文主要介绍Pandas的几种数据选取的方法. Pandas中,数据主要保存为Dataframe和Se ...
- Python数据分析之pandas
Python中的pandas模块进行数据分析. 接下来pandas介绍中将学习到如下8块内容:1.数据结构简介:DataFrame和Series2.数据索引index3.利用pandas查询数据4.利 ...
- Python数据分析之pandas学习(基础操作)
一.pandas数据结构介绍 在pandas中有两类非常重要的数据结构,即序列Series和数据框DataFrame.Series类似于numpy中的一维数组,除了通吃一维数组可用的函数或方法,而且其 ...
- python数据分析三个重要方法之:numpy和pandas
关于数据分析的组件之一:numpy ndarray的属性 4个必记参数:ndim:维度shape:形状(各维度的长度)size:总长度dtype:元素类型 一:np.array()产生n维 ...
随机推荐
- jenkins部署记录
环境规划 主机分配 192.168.2.139 : gitlab 192.168.2.141 : jenkins 192.168.2.142 : haproxy01 192.168.2.143 :ha ...
- Eclipse中各种文件的注释与取消注释的快捷键
Eclipse中各种文件的注释与取消注释的快捷键 Java文件: 注释和取消注释的快捷键都是:CTRL + / 或 Shift+Ctrl+C JS文件: 注释和取消注释的快捷键都是:CTRL + / ...
- jquery添加html代码的几种方法
经常用jq来DOM添加html代码 就总结了jq里面最常用的动态添加html代码的方法 append在元素内部的尾部加上元素 prepend在元素内部的前部加上元素 after在元素外部的尾部加上元素 ...
- YII2.0 表单验证手机号是否合法且唯一
[['phone'], 'unique'], ['phone','match','pattern'=>'/^1[345678]{1}\d{9}$/','message'=>'{attrib ...
- Hadoop(8)-HDFS的读写数据流程以及机架感知
1. HDFS的写数据流程 1.客户端通过fs模块向NameNode申请文件上传,NameNode检查请求是否合法,如用户权限,目标文件是否已存在,父目录是否存在等等 2.NameNode返回是否可以 ...
- python应用:爬虫实例(静态网页)
爬取起点中文网某本小说实例: # -*-coding:utf8-*- import requests import urllib import urllib2 from bs4 import Beau ...
- Python3 模块、包调用&路径
''' 以下代码均为讲解,不能实际操作 ''' ''' 博客园 Infi_chu ''' ''' 模块的优点: 1.高可维护性 2.可以大大减少编写的代码量 模块一共有三种: 1.Python标准库 ...
- UVA - 1606 Amphiphilic Carbon Molecules 极角扫描法
题目:点击查看题目 思路:这道题的解决思路是极角扫描法.极角扫描法的思想主要是先选择一个点作为基准点,然后求出各点对于该点的相对坐标,同时求出该坐标系下的极角,按照极角对点进行排序.然后选取点与基准点 ...
- 三张照片解决--win10系统的edge浏览器设置为浏览器IE8,IE7,IE9---完美解决 费元星
主要思想: 第二种方法: 参考文档: 1.可以在系统盘的C:\Program Files\Internet Explorer中找到iexplore.exe,然后将其发送到桌 ...
- iOS下原生与JS交互(总结)
iOS开发免不了要与UIWebView打交道,然后就要涉及到JS与原生OC交互,今天总结一下JS与原生OC交互的两种方式. JS调用原生OC篇(我自己用的方式二,简单方便) 方式一 第一种方式是用JS ...