import numpy as np
import pandas as pd

Introduction

A basic kind of time series object in pandas is a Series indexed by timestamps, which are often represented outside of pandas as Python strings or datetime objects:

from datetime import datetime
dates = [
    datetime(2011, 1, 2),
    datetime(2011, 1, 5),
    datetime(2011, 1, 7),
    datetime(2011, 1, 8),
    datetime(2011, 1, 10),
    datetime(2011, 1, 12),
]
ts = pd.Series(np.random.randn(6), index=dates)
ts
2011-01-02    0.825502
2011-01-05    0.453766
2011-01-07    0.077024
2011-01-08   -1.320742
2011-01-10   -1.109912
2011-01-12   -0.469907
dtype: float64

Under the hood, these datetime objects have been put in a DatetimeIndex:

ts.index
DatetimeIndex(['2011-01-02', '2011-01-05', '2011-01-07', '2011-01-08',
               '2011-01-10', '2011-01-12'],
              dtype='datetime64[ns]', freq=None)
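A minimal sketch, assuming a recent pandas release, of how the two common external representations end up in the index: datetime objects are converted to a DatetimeIndex automatically, while strings stay plain labels unless passed through pd.to_datetime. The variable names are illustrative only.

```python
from datetime import datetime

import numpy as np
import pandas as pd

# The same three dates, once as datetime objects and once as strings
as_datetimes = [datetime(2011, 1, 2), datetime(2011, 1, 5), datetime(2011, 1, 7)]
as_strings = ["2011-01-02", "2011-01-05", "2011-01-07"]
values = np.arange(3.0)

# datetime objects are converted to a DatetimeIndex automatically
ts_from_dt = pd.Series(values, index=as_datetimes)

# strings are kept as plain labels unless converted with pd.to_datetime
ts_plain_str = pd.Series(values, index=as_strings)
ts_from_str = pd.Series(values, index=pd.to_datetime(as_strings))

print(type(ts_from_dt.index).__name__)    # DatetimeIndex
print(type(ts_plain_str.index).__name__)  # Index
print(list(ts_from_dt.index) == list(ts_from_str.index))  # True
```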

Like other Series, arithmetic operations between differently indexed time series automatically align on the dates:

ts + ts[::2]
2011-01-02    1.651004
2011-01-05         NaN
2011-01-07    0.154049
2011-01-08         NaN
2011-01-10   -2.219823
2011-01-12         NaN
dtype: float64
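To see the alignment without random numbers, here is a small sketch with fixed, made-up values: dates present in both operands are summed, and dates present in only one of them become NaN.

```python
import pandas as pd

idx = pd.to_datetime(["2011-01-02", "2011-01-05", "2011-01-07", "2011-01-08"])
ts = pd.Series([1.0, 2.0, 3.0, 4.0], index=idx)

# ts[::2] keeps positions 0 and 2, i.e. 2011-01-02 and 2011-01-07
subset = ts[::2]

# Addition aligns on the dates; dates missing from one operand give NaN
result = ts + subset
print(result)
```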

Recall that ts[::2] selects every second element in ts.

pandas stores timestamps using NumPy's datetime64 data type at nanosecond resolution:

ts.index.dtype
dtype('<M8[ns]')
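One way to make "nanosecond resolution" concrete, as a sketch: a datetime64[ns] value is an integer count of nanoseconds since the Unix epoch, which Timestamp exposes through its value attribute. A side effect is that the representable range of nanosecond timestamps is bounded, roughly 1677 to 2262.

```python
import numpy as np
import pandas as pd

# One nanosecond after the Unix epoch
stamp = pd.Timestamp("1970-01-01 00:00:00.000000001")
print(stamp.value)  # 1, i.e. nanoseconds since 1970-01-01

# The same integer encoding in plain NumPy
ns = np.datetime64("2011-01-02", "ns").astype("int64")
print(ns)  # 1293926400000000000

# A 64-bit nanosecond count limits the representable range
print(pd.Timestamp.max)  # 2262-04-11 23:47:16.854775807
```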

Scalar values from a DatetimeIndex are pandas Timestamp objects:

stamp = ts.index[0]

stamp
Timestamp('2011-01-02 00:00:00')

A Timestamp can be substituted anywhere you would use a datetime object. Additionally, it can store frequency information (if any; note that the Timestamp.freq attribute was removed in pandas 2.0) and understands how to do time zone conversions and other kinds of manipulations. More on both of these things later.

(That is, the various conversion operations for time series.)
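A short sketch of both claims, assuming time zone data is available to pandas (it normally is): Timestamp is a subclass of datetime.datetime, so it mixes freely with datetime objects, and it can be localized and converted between zones. The zone name "America/New_York" is just an example choice.

```python
from datetime import datetime

import pandas as pd

stamp = pd.Timestamp("2011-01-02 12:00")

# Timestamp subclasses datetime, so it works wherever a datetime is expected
print(isinstance(stamp, datetime))   # True
print(stamp - datetime(2011, 1, 1))  # 1 days 12:00:00

# Attach UTC, then convert to US Eastern time (UTC-5 in January)
eastern = stamp.tz_localize("UTC").tz_convert("America/New_York")
print(eastern)  # 2011-01-02 07:00:00-05:00
```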

Indexing and slicing

A time series behaves like any other pandas Series when you index and select data based on labels:

stamp = ts.index[2]

ts[stamp]
0.0770243257021936

As a convenience, you can also pass a string that is interpretable as a date:

ts['1/10/2011']
-1.109911691867437
ts['20110110']
-1.109911691867437
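As a sketch with fixed, made-up values, the label forms below all refer to the same entry. Note that '1/10/2011' is read month-first (US style), since day-first parsing is not the default.

```python
from datetime import datetime

import pandas as pd

idx = pd.to_datetime(["2011-01-07", "2011-01-10", "2011-01-12"])
ts = pd.Series([0.5, -1.1, 2.0], index=idx)

# Four spellings of January 10, 2011
by_timestamp = ts[pd.Timestamp("2011-01-10")]
by_datetime = ts[datetime(2011, 1, 10)]
by_us_string = ts["1/10/2011"]  # month/day/year
by_compact = ts["20110110"]     # YYYYMMDD
print(by_timestamp, by_datetime, by_us_string, by_compact)
```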

For longer time series, you can pass a year, or a year and month, to easily select slices of data:

longer_ts = pd.Series(np.random.randn(1000),
                      index=pd.date_range('1/1/2000', periods=1000))
longer_ts[:5]
2000-01-01    0.401394
2000-01-02    0.720214
2000-01-03    0.488505
2000-01-04    0.446179
2000-01-05   -2.129299
Freq: D, dtype: float64
longer_ts['2001'][:5]
2001-01-01    0.315472
2001-01-02    0.796386
2001-01-03    0.611503
2001-01-04    0.980799
2001-01-05    0.184401
Freq: D, dtype: float64
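A quick sanity check of partial-string selection, as a sketch: using positions instead of random values makes it easy to confirm that a year string selects every day of that year and a year-month string every day of that month. The .loc form is used here; plain [] behaves the same way for a Series.

```python
import numpy as np
import pandas as pd

# Deterministic values: each entry holds its own position (0..999)
longer_ts = pd.Series(np.arange(1000),
                      index=pd.date_range("1/1/2000", periods=1000))

year_2001 = longer_ts.loc["2001"]    # every day of 2001
may_2001 = longer_ts.loc["2001-05"]  # every day of May 2001

print(len(year_2001), len(may_2001))  # 365 31
print(year_2001.iloc[0])              # 366, because 2000 is a leap year (positions 0..365)
```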

Here, the string '2001' is interpreted as a year and selects that time period. This also works if you specify the month:

longer_ts['2001-05'][:5]
2001-05-01    0.439009
2001-05-02   -0.304236
2001-05-03    0.603268
2001-05-04   -0.726460
2001-05-05   -0.521669
Freq: D, dtype: float64

Slicing with datetime objects works as well:

ts[datetime(2011, 1, 7):]
2011-01-07    0.077024
2011-01-08   -1.320742
2011-01-10   -1.109912
2011-01-12   -0.469907
dtype: float64

Because most time series data is ordered chronologically, you can slice with timestamps not contained in a time series to perform a range query:

ts
2011-01-02    0.825502
2011-01-05    0.453766
2011-01-07    0.077024
2011-01-08   -1.320742
2011-01-10   -1.109912
2011-01-12   -0.469907
dtype: float64
ts['1/6/2011':'1/11/2011']
2011-01-07    0.077024
2011-01-08   -1.320742
2011-01-10   -1.109912
dtype: float64
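A sketch of the range-query behavior on a sorted index (if your index is not sorted, calling sort_index() first is the safe choice): endpoints missing from the index act as bounds, and label-based slices include both endpoints when they are present, unlike positional slicing.

```python
import pandas as pd

idx = pd.to_datetime(["2011-01-02", "2011-01-05", "2011-01-07",
                      "2011-01-08", "2011-01-10", "2011-01-12"])
ts = pd.Series(range(6), index=idx)

# Neither 2011-01-06 nor 2011-01-11 is in the index; they act as bounds
window = ts.loc["2011-01-06":"2011-01-11"]
print(window.index.strftime("%Y-%m-%d").tolist())
# ['2011-01-07', '2011-01-08', '2011-01-10']

# Endpoints that are in the index are both included
inclusive = ts.loc["2011-01-05":"2011-01-08"]
print(len(inclusive))  # 3
```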

As before, you can pass a string date, a datetime, or a Timestamp. Remember that slicing in this manner produces views on the source time series, like slicing NumPy arrays. This means that no data is copied and modifications on the slice will be reflected in the original data (this no longer holds when pandas' Copy-on-Write mode is enabled, which is the default starting with pandas 3.0).

There is an equivalent instance method, truncate, that slices a Series between two dates:

ts.truncate(after='1/9/2011')
2011-01-02    0.825502
2011-01-05    0.453766
2011-01-07    0.077024
2011-01-08   -1.320742
dtype: float64
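A sketch of truncate with both of its bounds, using made-up values: before and after are each optional, both are inclusive, and the two-sided call matches the equivalent .loc slice.

```python
import pandas as pd

idx = pd.to_datetime(["2011-01-02", "2011-01-05", "2011-01-07",
                      "2011-01-08", "2011-01-10", "2011-01-12"])
ts = pd.Series(range(6), index=idx)

# Only an upper bound: everything up to and including 2011-01-09
head = ts.truncate(after="1/9/2011")

# Both bounds; the result equals the label slice with the same endpoints
middle = ts.truncate(before="2011-01-05", after="2011-01-10")
print(middle.equals(ts.loc["2011-01-05":"2011-01-10"]))  # True
```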

All of this holds true for DataFrame as well, indexing on its rows:

# periods: how many dates to generate; freq: the spacing between them (weekly, on Wednesdays)
dates = pd.date_range('1/1/2000', periods=100, freq='W-WED')
long_df = pd.DataFrame(np.random.randn(100, 4),
                       index=dates,
                       columns=['Colorado', 'Texas', 'New York', 'Ohio'])
long_df.loc['5-2001']

             Colorado      Texas   New York      Ohio
2001-05-02   0.972317   0.407519   0.628906  1.995901
2001-05-09   0.299961  -1.208505   1.019247  2.244728
2001-05-16   0.628163  -0.716498   0.621912  1.257635
2001-05-23   0.508852   0.753517  -0.793127  0.273496
2001-05-30  -1.443141  -0.878143  -0.680227  0.455401
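A sketch that checks what 'W-WED' produces and how month selection works on DataFrame rows, using deterministic values instead of random ones. January 1, 2000 was a Saturday, so the first anchored Wednesday is January 5.

```python
import numpy as np
import pandas as pd

# 100 consecutive Wednesdays, starting from the first Wednesday on or after 2000-01-01
dates = pd.date_range("1/1/2000", periods=100, freq="W-WED")
long_df = pd.DataFrame(np.arange(400).reshape(100, 4), index=dates,
                       columns=["Colorado", "Texas", "New York", "Ohio"])

print(dates[0])                             # 2000-01-05 00:00:00
print(dates.dayofweek.unique().tolist())    # [2], where Monday is 0

# Row selection by month goes through .loc
may_2001 = long_df.loc["2001-05"]
print(may_2001.index.strftime("%Y-%m-%d").tolist())
```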

Time series with duplicate indices

  • ts.is_unique
  • ts.groupby(level=0)

In some applications, there may be multiple data observations falling on a particular timestamp. Here is an example:

dates = pd.DatetimeIndex(['1/1/2000', '1/2/2000', '1/2/2000',
                          '1/2/2000', '1/3/2000'])
dup_ts = pd.Series(np.arange(5), index=dates)
dup_ts
2000-01-01    0
2000-01-02    1
2000-01-02    2
2000-01-02    3
2000-01-03    4
dtype: int32

We can tell that the index is not unique by checking its is_unique property:

dup_ts.index.is_unique
False

Indexing into this time series will now produce either a scalar value or a slice, depending on whether the timestamp is duplicated:

dup_ts['1/3/2000']  # not duplicated
4
dup_ts['1/2/2000']  # duplicated
2000-01-02    1
2000-01-02    2
2000-01-02    3
dtype: int32
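Because the return type depends on the data, code that always expects a Series can pass a list of labels instead of a single label. A minimal sketch of both behaviors, using the same duplicated index as above:

```python
import numpy as np
import pandas as pd

dates = pd.DatetimeIndex(["2000-01-01", "2000-01-02", "2000-01-02",
                          "2000-01-02", "2000-01-03"])
dup_ts = pd.Series(np.arange(5), index=dates)

single = dup_ts.loc["2000-01-03"]   # label occurs once -> scalar
several = dup_ts.loc["2000-01-02"]  # label occurs three times -> Series

# A list of labels always yields a Series, even for a unique label
always_series = dup_ts.loc[[pd.Timestamp("2000-01-03")]]

print(single)                                            # 4
print(len(several))                                      # 3
print(type(always_series).__name__, len(always_series))  # Series 1
```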

Suppose you wanted to aggregate the data having non-unique timestamps. One way to do this is to use groupby and pass level=0:

grouped = dup_ts.groupby(level=0)  # without level (default None) and without by, this raises an error
grouped.mean()
2000-01-01    0
2000-01-02    2
2000-01-03    4
dtype: int32
grouped.count()
2000-01-01    1
2000-01-02    3
2000-01-03    1
dtype: int64
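A sketch of two follow-ups, not taken from the original notes: computing several statistics per unique timestamp in one pass with agg, and, as a different choice, simply keeping the first observation per timestamp via Index.duplicated.

```python
import numpy as np
import pandas as pd

dates = pd.DatetimeIndex(["2000-01-01", "2000-01-02", "2000-01-02",
                          "2000-01-02", "2000-01-03"])
dup_ts = pd.Series(np.arange(5), index=dates)

# One row per unique timestamp, several statistics per row
summary = dup_ts.groupby(level=0).agg(["mean", "count", "first", "last"])
print(summary)

# Alternative: drop repeats, keeping the first observation for each timestamp
deduped = dup_ts[~dup_ts.index.duplicated(keep="first")]
print(deduped.tolist())  # [0, 1, 4]
```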
