Python for Data Analysis: pandas

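Note: this series of articles is a set of notes I compiled while studying the book Python for Data Analysis, so that I can review the material more easily later.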

First, import Series and DataFrame from the pandas library

In [21]: from pandas import Series,DataFrame

In [22]: import pandas as pd

Series

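A Series is a one-dimensional, array-like object that pairs a sequence of values with an index. If no index is specified, a default integer index (0 to N-1) is added.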

In [23]: obj = Series([4,3,5,7,8,1,2])

In [24]: obj

Out[24]:

0    4

1    3

2    5

3    7

4    8

5    1

6    2

dtype: int64

Custom index

In [28]: obj = Series([4,3,2,1],index=['a','b','c','d'])

In [29]: obj

Out[29]:

a    4

b    3

c    2

d    1

dtype: int64

Getting the values and the index

In [30]: obj.index

Out[30]: Index(['a', 'b', 'c', 'd'], dtype='object')

In [31]: obj.values

Out[31]: array([4, 3, 2, 1], dtype=int64)

Accessing a Series element by its index label

In [32]: obj['c']

Out[32]: 2

A Series can also be used like a dict: the in operator checks the index labels

In [33]: if 'a' in obj:

    ...:     print("a在对象里！")

    ...:

a在对象里！
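
The dict analogy has one catch worth spelling out: membership is tested against the index labels, not the stored values. A minimal sketch, reusing the Series from the session above (the label 'z' is made up to show the default of get):

```python
import pandas as pd

obj = pd.Series([4, 3, 2, 1], index=['a', 'b', 'c', 'd'])

label_found = 'a' in obj          # membership looks at the index labels
value_as_label = 4 in obj         # 4 is a value, not a label
value_found = 4 in obj.values     # check the values explicitly instead
print(label_found, value_as_label, value_found)

# Dict-style helpers are available too.
fallback = obj.get('z', 0)        # default when the label is absent
as_dict = obj.to_dict()
print(fallback, as_dict)
```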

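A dict can also be converted to a Series. In the pandas version used here the resulting index comes out sorted by key; newer versions keep the dict's insertion order instead.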

In [56]: data = {'a':1,'b':2,'c':3,'d':4}

In [57]: obj = Series(data)

In [58]: obj
Out[58]:
a    1
b    2
c    3
d    4
dtype: int64

In [59]: data = {'a':1,'b':2,'d':3,'c':4}

In [60]: obj = Series(data)

In [61]: obj
Out[61]:
a    1
b    2
c    4
d    3
dtype: int64
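
If a particular label order matters, it can be requested explicitly rather than left to the constructor. A minimal sketch (the names by_label and pinned are made up for the example):

```python
import pandas as pd

data = {'a': 1, 'b': 2, 'd': 3, 'c': 4}
obj = pd.Series(data)

# sort_index returns a new Series ordered by label, whatever order
# the constructor happened to use.
by_label = obj.sort_index()
print(by_label)

# Passing an explicit index list is another way to pin the order.
pinned = pd.Series(data, index=['a', 'b', 'c', 'd'])
print(pinned)
```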

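What happens if an index is passed along with the dict data? The labels are matched against the dict's keys, and a label with no matching key ('e' here) gets NaN. The index is written as a list below, since a set has no defined order.

In [72]: datas = ['c','e','b','d','a']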

In [73]: objs = Series(data,index=datas)

In [74]: objs

Out[74]:

c    4.0

e    NaN

b    2.0

d    3.0

a    1.0

dtype: float64
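
The values above print as 4.0, 2.0 and so on because NaN is a floating-point value, so a single missing label turns the whole Series into float64. A small sketch of that effect (the names labels, aligned and complete are made up for the example):

```python
import numpy as np
import pandas as pd

data = {'a': 1, 'b': 2, 'd': 3, 'c': 4}
labels = ['a', 'b', 'c', 'd', 'e']   # 'e' is not a key in data

aligned = pd.Series(data, index=labels)
print(aligned.dtype)

# With no missing label, the integer dtype is preserved.
complete = pd.Series(data, index=['a', 'b', 'c', 'd'])
print(complete.dtype)
```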

isnull detects missing values

In [75]: pd.isnull(objs)

Out[75]:

c    False

e     True

b    False

d    False

a    False

dtype: bool

notnull detects non-missing values

In [76]: pd.notnull(objs)

Out[76]:

c     True

e    False

b     True

d     True

a     True

dtype: bool

The same checks are available as Series methods

In [78]: objs.isnull()

Out[78]:

c    False

e     True

b    False

d    False

a    False

dtype: bool

In [79]: objs.notnull()

Out[79]:

c     True

e    False

b     True

d     True

a     True

dtype: bool
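
The boolean results above are mostly useful as masks. A brief sketch of typical follow-up steps, assuming the same values as objs in the session (rebuilt here so the block runs on its own):

```python
import numpy as np
import pandas as pd

objs = pd.Series({'a': 1.0, 'b': 2.0, 'c': 4.0, 'd': 3.0, 'e': np.nan})

missing_count = objs.isnull().sum()   # each True counts as 1
present = objs[objs.notnull()]        # boolean mask keeps non-missing entries
dropped = objs.dropna()               # built-in shortcut for the same filter
filled = objs.fillna(0)               # or replace NaN with a default value

print(missing_count)
print(present)
print(filled)
```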

DataFrame

A DataFrame is a tabular data structure that holds an ordered collection of columns.

In [86]: data = {'class':['语文','数学','英语'],'score':[120,130,140]}

In [87]: frame = DataFrame(data)

In [88]: frame

Out[88]:

  class  score

0    语文    120

1    数学    130

2    英语    140

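Arranging the columns in a specified order (the column names are passed as a list, since a set has no defined order)

In [98]: DataFrame(data,columns=['score','class'])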

Out[98]:

   score class

0    120    语文

1    130    数学

2    140    英语

A requested column that is not in data is filled with NaN

In [99]: DataFrame(data,columns=['score','class','teacher'])

Out[99]:

   score class teacher

0    120    语文     NaN

1    130    数学     NaN

2    140    英语     NaN
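
A short sketch of both behaviours with the column names given as a list, plus one way to reorder the columns of a frame that already exists (the name reordered is made up for the example):

```python
import pandas as pd

data = {'class': ['语文', '数学', '英语'], 'score': [120, 130, 140]}

# A list keeps the column order exactly as written; 'teacher' is not
# a key in data, so that column is created and filled with NaN.
frame = pd.DataFrame(data, columns=['score', 'class', 'teacher'])
print(frame)

# Columns of an existing frame can be reordered by selecting with a list.
reordered = frame[['class', 'score']]
print(reordered)
```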

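Assigning one value to the whole NaN column (frame is assumed here to hold the three-column DataFrame from the previous step)

Method 1: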

In [107]: frame['teacher'] = '周老师'

In [108]: frame

Out[108]:

   score class teacher

0    120    语文     周老师

1    130    数学     周老师

2    140    英语     周老师

Method 2 (attribute access; this works only because the teacher column already exists):

In [110]: frame.teacher = '应老师'

In [111]: frame

Out[111]:

   score class teacher

0    120    语文     应老师

1    130    数学     应老师

2    140    英语     应老师
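
The two methods are not fully interchangeable. As a hedged illustration: attribute assignment updates a column that already exists, but with a new name it only sets a plain Python attribute and adds no column. The column name 'grade' below is made up for the example.

```python
import pandas as pd

frame = pd.DataFrame({'score': [120, 130, 140],
                      'class': ['语文', '数学', '英语'],
                      'teacher': [None, None, None]})

# Existing column: attribute assignment updates every row.
frame.teacher = '应老师'

# New name: this only sets an attribute on the object, not a column.
frame.grade = 'A'
created_by_attribute = 'grade' in frame.columns

# Bracket assignment is what actually creates the column.
frame['grade'] = 'A'
created_by_brackets = 'grade' in frame.columns

print(created_by_attribute, created_by_brackets)
```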

A DataFrame column can be retrieved as a Series, either with dict-like notation (frame['teacher']) or by attribute (frame.teacher)

In [112]: frame.teacher

Out[112]:

0    应老师

1    应老师

2    应老师

Name: teacher, dtype: object

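Assigning a list, array, or Series to a DataFrame column (a list or array must match the number of rows; a Series is aligned by index)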

In [114]: val = Series(['周老师','应老师','小周周'],index=[0,1,2])

In [115]: frame['teacher'] = val

In [116]: frame

Out[116]:

   score class teacher

0    120    语文     周老师

1    130    数学     应老师

2    140    英语     小周周
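
The session above uses a Series whose index covers every row. A sketch of what happens otherwise, assuming a Series that skips row 1 and a list that is too short (the names partial and length_error are made up for the example):

```python
import pandas as pd

frame = pd.DataFrame({'score': [120, 130, 140],
                      'class': ['语文', '数学', '英语']})

# A Series is matched to rows by index label; rows it omits get NaN.
partial = pd.Series(['周老师', '小周周'], index=[0, 2])
frame['teacher'] = partial
print(frame)

# A plain list carries no labels, so its length must equal the row count.
try:
    frame['teacher'] = ['周老师', '应老师']
    length_error = False
except ValueError:
    length_error = True
print(length_error)
```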

Creating a new column in the DataFrame

In [125]: frame['yesorno'] =0

In [126]: frame

Out[126]:

   score class teacher  yesorno

0    120    语文     周老师        0

1    130    数学     应老师        0

2    140    英语     小周周        0

Creating a new column and assigning it a boolean Series

In [119]: frame['yesorno'] = frame.teacher == '应老师'

In [120]: frame

Out[120]:

   score class teacher  yesorno

0    120    语文     周老师    False

1    130    数学     应老师     True

2    140    英语     小周周    False
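
The comparison on the right-hand side is evaluated element by element and yields a boolean Series, which is also what row filtering expects. A minimal sketch (the names mask and matched are made up for the example):

```python
import pandas as pd

frame = pd.DataFrame({'score': [120, 130, 140],
                      'class': ['语文', '数学', '英语'],
                      'teacher': ['周老师', '应老师', '小周周']})

mask = frame['teacher'] == '应老师'   # element-wise comparison, bool Series
frame['yesorno'] = mask

# The same mask can select the matching rows.
matched = frame[mask]
print(matched)
```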

Deleting a column from the DataFrame

In [122]: del frame['yesorno']

In [123]: frame

Out[123]:

   score class teacher

0    120    语文     周老师

1    130    数学     应老师

2    140    英语     小周周
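
del changes the frame in place. When the original should be kept, drop is an alternative that returns a new DataFrame. A brief sketch (the names trimmed and original_cols are made up for the example):

```python
import pandas as pd

frame = pd.DataFrame({'score': [120, 130, 140],
                      'class': ['语文', '数学', '英语'],
                      'yesorno': [False, True, False]})

# drop returns a new DataFrame and leaves the original untouched.
trimmed = frame.drop(columns=['yesorno'])
original_cols = list(frame.columns)   # still includes 'yesorno'
print(list(trimmed.columns), original_cols)

# del modifies the frame itself.
del frame['yesorno']
print(list(frame.columns))
```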

Nested dicts

The outer dict's keys become the DataFrame's columns, and the inner keys become the row index.

In [10]: from pandas import DataFrame,Series

In [11]: data = {'a':{'aa':2,'aaa':3},'b':{'bb':4,'bbb':5}}

In [12]: frame = DataFrame(data)

In [13]: frame

Out[13]:

       a    b

aa   2.0  NaN

aaa  3.0  NaN

bb   NaN  4.0

bbb  NaN  5.0
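
The NaN cells above appear because the inner keys of all the dicts are merged into one row index, and each column only has values for its own keys. A small sketch that checks individual cells and shows the transposed view, where the outer keys become the rows:

```python
import pandas as pd

data = {'a': {'aa': 2, 'aaa': 3}, 'b': {'bb': 4, 'bbb': 5}}
frame = pd.DataFrame(data)

# 'aa' exists only in the inner dict of 'a', so column 'b' is NaN there.
print(frame.loc['aa', 'a'])
print(pd.isnull(frame.loc['aa', 'b']))

# Transposing swaps the roles of the outer and inner keys.
transposed = frame.T
print(transposed)
```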

索引对象

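The index in pandas is itself an object. Classes derived from Index include MultiIndex, DatetimeIndex, and PeriodIndex; older pandas versions also had Int64Index, which has since been removed.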

In [31]: frame.index

Out[31]: Index(['aa', 'aaa', 'bb', 'bbb'], dtype='object')

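Index objects provide a number of attributes and methods.

An example using insert(loc, item), which returns a new Index rather than changing the existing one: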

In [31]: frame.index

Out[31]: Index(['aa', 'aaa', 'bb', 'bbb'], dtype='object')

In [32]: frame.index.insert(4,'fff')

Out[32]: Index(['aa', 'aaa', 'bb', 'bbb', 'fff'], dtype='object')
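
Note that frame.index itself is unchanged after the call above, because an Index is immutable. A minimal sketch on a standalone Index with the same labels (the names idx, longer and mutable are made up for the example):

```python
import pandas as pd

idx = pd.Index(['aa', 'aaa', 'bb', 'bbb'])

# insert builds a new Index; position len(idx) appends at the end.
longer = idx.insert(len(idx), 'fff')
print(list(longer))
print(list(idx))   # the original still has four labels

# In-place item assignment is not supported on an Index.
try:
    idx[0] = 'zz'
    mutable = True
except TypeError:
    mutable = False
print(mutable)
```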