Python Data Analysis (3): pandas resample

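Below is the definition of the resample method in pandas; the documentation at http://pandas.pydata.org/pandas-docs/stable/timeseries.html#resampling explains it in more detail. Note that this signature comes from an older pandas release (the docstring marks `on` and `level` as added in 0.19.0), and newer releases have changed or removed some of these parameters.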

    def resample(self, rule, how=None, axis=0, fill_method=None, closed=None,
                 label=None, convention='start', kind=None, loffset=None,
                 limit=None, base=0, on=None, level=None):
        """
        Convenience method for frequency conversion and resampling of time
        series.  Object must have a datetime-like index (DatetimeIndex,
        PeriodIndex, or TimedeltaIndex), or pass datetime-like values
        to the on or level keyword.
        (Note: in short, the data being resampled needs a time-typed index
        or column.)

        Parameters
        ----------
        rule : string
            the offset string or object representing target conversion
            (Note: the target frequency, e.g. '3T' for three minutes.)
        axis : int, optional, default 0
            (Note: the axis to operate on.)
        closed : {'right', 'left'}
            Which side of bin interval is closed. The default is 'left'
            for all frequency offsets except for 'M', 'A', 'Q', 'BM',
            'BA', 'BQ', and 'W' which all have a default of 'right'.
            (Note: i.e. which endpoint of each interval is included in it.)
        label : {'right', 'left'}
            Which bin edge label to label bucket with. The default is 'left'
            for all frequency offsets except for 'M', 'A', 'Q', 'BM',
            'BA', 'BQ', and 'W' which all have a default of 'right'.
            (Note: i.e. which edge of the interval is kept as its label.)
        convention : {'start', 'end', 's', 'e'}
            For PeriodIndex only, controls whether to use the start or end of
            `rule`
        kind : {'timestamp', 'period'}, optional
            Pass 'timestamp' to convert the resulting index to a
            ``DatetimeIndex`` or 'period' to convert it to a ``PeriodIndex``.
            By default the input representation is retained.
        loffset : timedelta
            Adjust the resampled time labels
        base : int, default 0
            For frequencies that evenly subdivide 1 day, the "origin" of the
            aggregated intervals. For example, for '5min' frequency, base could
            range from 0 through 4. Defaults to 0
        on : string, optional
            For a DataFrame, column to use instead of index for resampling.
            Column must be datetime-like.
            .. versionadded:: 0.19.0
        level : string or int, optional
            For a MultiIndex, level (name or number) to use for
            resampling.  Level must be datetime-like.
            .. versionadded:: 0.19.0

        Returns
        -------
        Resampler object

        Notes
        -----
        See the `user guide
        <http://pandas.pydata.org/pandas-docs/stable/timeseries.html#resampling>`_
        for more.

        To learn more about the offset strings, please see `this link
        <http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases>`__.

        Examples
        --------

        Start by creating a series with 9 one minute timestamps.
        (Note: i.e. a time series with a 1-minute frequency.)

        >>> index = pd.date_range('1/1/2000', periods=9, freq='T')
        >>> series = pd.Series(range(9), index=index)
        >>> series
        2000-01-01 00:00:00    0
        2000-01-01 00:01:00    1
        2000-01-01 00:02:00    2
        2000-01-01 00:03:00    3
        2000-01-01 00:04:00    4
        2000-01-01 00:05:00    5
        2000-01-01 00:06:00    6
        2000-01-01 00:07:00    7
        2000-01-01 00:08:00    8
        Freq: T, dtype: int64

        Downsample the series into 3 minute bins and sum the values
        of the timestamps falling into a bin.
        (Note: downsampling to 3-minute bins.)

        >>> series.resample('3T').sum()
        2000-01-01 00:00:00     3
        2000-01-01 00:03:00    12
        2000-01-01 00:06:00    21
        Freq: 3T, dtype: int64

        Downsample the series into 3 minute bins as above, but label each
        bin using the right edge instead of the left. Please note that the
        value in the bucket used as the label is not included in the bucket,
        which it labels. For example, in the original series the
        bucket ``2000-01-01 00:03:00`` contains the value 3, but the summed
        value in the resampled bucket with the label ``2000-01-01 00:03:00``
        does not include 3 (if it did, the summed value would be 6, not 3).
        To include this value close the right side of the bin interval as
        illustrated in the example below this one.
        (Note: this keeps the right-edge label of each interval; the previous
        result used the left-edge label.)

        >>> series.resample('3T', label='right').sum()
        2000-01-01 00:03:00     3
        2000-01-01 00:06:00    12
        2000-01-01 00:09:00    21
        Freq: 3T, dtype: int64

        Downsample the series into 3 minute bins as above, but close the right
        side of the bin interval.
        (Note: downsampling to 3-minute bins again.)

        >>> series.resample('3T', label='right', closed='right').sum()
        2000-01-01 00:00:00     0
        2000-01-01 00:03:00     6
        2000-01-01 00:06:00    15
        2000-01-01 00:09:00    15
        Freq: 3T, dtype: int64

        Upsample the series into 30 second bins.
        (Note: upsampling to 30 seconds.)

        >>> series.resample('30S').asfreq()[0:5]  # select first 5 rows
        2000-01-01 00:00:00   0.0
        2000-01-01 00:00:30   NaN
        2000-01-01 00:01:00   1.0
        2000-01-01 00:01:30   NaN
        2000-01-01 00:02:00   2.0
        Freq: 30S, dtype: float64

        Upsample the series into 30 second bins and fill the ``NaN``
        values using the ``pad`` method.
        (Note: this is a forward fill, i.e. a zero-order hold.
        pad/ffill: fill each missing value with the previous non-missing value.
        backfill/bfill: fill each missing value with the next non-missing value.)

        >>> series.resample('30S').pad()[0:5]
        2000-01-01 00:00:00    0
        2000-01-01 00:00:30    0
        2000-01-01 00:01:00    1
        2000-01-01 00:01:30    1
        2000-01-01 00:02:00    2
        Freq: 30S, dtype: int64

        Upsample the series into 30 second bins and fill the
        ``NaN`` values using the ``bfill`` method.
        (Note: this is a backward fill, i.e. a backward zero-order hold.)

        >>> series.resample('30S').bfill()[0:5]
        2000-01-01 00:00:00    0
        2000-01-01 00:00:30    1
        2000-01-01 00:01:00    1
        2000-01-01 00:01:30    2
        2000-01-01 00:02:00    2
        Freq: 30S, dtype: int64

        Pass a custom function via ``apply``

        >>> def custom_resampler(array_like):
        ...     return np.sum(array_like) + 5

        >>> series.resample('3T').apply(custom_resampler)
        2000-01-01 00:00:00     8
        2000-01-01 00:03:00    17
        2000-01-01 00:06:00    26
        Freq: 3T, dtype: int64

        For a Series with a PeriodIndex, the keyword `convention` can be
        used to control whether to use the start or end of `rule`.

        >>> s = pd.Series([1, 2], index=pd.period_range('2012-01-01',
        ...                                             freq='A',
        ...                                             periods=2))
        >>> s
        2012    1
        2013    2
        Freq: A-DEC, dtype: int64

        Resample by month using 'start' `convention`. Values are assigned to
        the first month of the period.

        >>> s.resample('M', convention='start').asfreq().head()
        2012-01    1.0
        2012-02    NaN
        2012-03    NaN
        2012-04    NaN
        2012-05    NaN
        Freq: M, dtype: float64

        Resample by month using 'end' `convention`. Values are assigned to
        the last month of the period.

        >>> s.resample('M', convention='end').asfreq()
        2012-12    1.0
        2013-01    NaN
        2013-02    NaN
        2013-03    NaN
        2013-04    NaN
        2013-05    NaN
        2013-06    NaN
        2013-07    NaN
        2013-08    NaN
        2013-09    NaN
        2013-10    NaN
        2013-11    NaN
        2013-12    2.0
        Freq: M, dtype: float64

        For DataFrame objects, the keyword ``on`` can be used to specify the
        column instead of the index for resampling.

        >>> df = pd.DataFrame(data=9*[range(4)], columns=['a', 'b', 'c', 'd'])
        >>> df['time'] = pd.date_range('1/1/2000', periods=9, freq='T')
        >>> df.resample('3T', on='time').sum()
                             a  b  c  d
        time
        2000-01-01 00:00:00  0  3  6  9
        2000-01-01 00:03:00  0  3  6  9
        2000-01-01 00:06:00  0  3  6  9

        For a DataFrame with MultiIndex, the keyword ``level`` can be used to
        specify on which level the resampling needs to take place.

        >>> time = pd.date_range('1/1/2000', periods=5, freq='T')
        >>> df2 = pd.DataFrame(data=10*[range(4)],
        ...                    columns=['a', 'b', 'c', 'd'],
        ...                    index=pd.MultiIndex.from_product([time, [1, 2]])
        ...                    )
        >>> df2.resample('3T', level=0).sum()
                             a  b   c   d
        2000-01-01 00:00:00  0  6  12  18
        2000-01-01 00:03:00  0  4   8  12
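
The docstring examples above use the `'T'` and `'S'` aliases and `Resampler.pad()`, which newer pandas releases deprecate or no longer accept. As a minimal sketch, the same downsampling and upsampling steps can be reproduced as a standalone script. The `'min'` and `'s'` spellings and the use of `ffill()` in place of `pad()` are substitutions made here, not part of the quoted docstring.

```python
import pandas as pd

# Nine one-minute timestamps, as in the docstring examples.
index = pd.date_range("2000-01-01", periods=9, freq="min")
series = pd.Series(range(9), index=index)

# Default for minute frequencies: left-closed bins, labelled by the left edge.
left = series.resample("3min").sum()

# Right-closed bins, labelled by the right edge.
right = series.resample("3min", label="right", closed="right").sum()

# Upsample to 30 seconds and forward-fill the gaps (ffill replaces pad here).
filled = series.resample("30s").ffill()

print(left.tolist())             # [3, 12, 21]
print(right.tolist())            # [0, 6, 15, 15]
print(filled.iloc[:5].tolist())  # [0, 0, 1, 1, 2]
```

With right-closed bins the value at 00:00 falls into its own interval, which is why that result has four rows instead of three.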

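The `base` and `loffset` parameters listed in the signature above were deprecated in pandas 1.1 and are not accepted by pandas 2.x. The sketch below shows one way to get similar behaviour, assuming pandas 1.1 or newer: the `offset` argument shifts the bin edges (roughly the role `base` played), and adding a `Timedelta` to the result index shifts the labels (roughly the role of `loffset`). The one-minute and 30-second values are chosen only for illustration.

```python
import pandas as pd

index = pd.date_range("2000-01-01", periods=9, freq="min")
series = pd.Series(range(9), index=index)

# Shift the bin edges by one minute (roughly what base=1 did for '3T'):
# bins now start at 23:58, 00:01, 00:04 and 00:07.
shifted = series.resample("3min", offset="1min").sum()

# Shift only the labels by 30 seconds (roughly what loffset did);
# the contents of each bin are unchanged.
relabelled = series.resample("3min").sum()
relabelled.index = relabelled.index + pd.Timedelta(seconds=30)

print(shifted)
print(relabelled)
```
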