Python pandas: Series, DataFrame, set_index and reset_index

Reference: the pandas Cookbook: http://pandas.pydata.org/pandas-docs/stable/cookbook.html
pandas set_index & reset_index

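pandas is a Python module for importing and cleaning data. It is very useful for the data preparation that precedes data mining, so these basics are worth learning properly. pandas has two main data structures: 1. Series; 2. DataFrame.
Let's look at the Series structure first.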

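a. Creation
a.1. pd.Series([list], index=[list]) // the data argument is a list; index is an optional argument. If it is omitted, the index defaults to integers starting from 0; if it is supplied, it must have the same length as the values.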

import pandas as pd

s=pd.Series([1,2,3,4,5],index= ['a','b','c','f','e'])

print(s)

a    1

b    2

c    3

f    4

e    5

dtype: int64

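s=pd.Series({'a':3,'b':4,'c':5,'f':6,'e':8}) # a dict also works: its keys become the index (older pandas sorts the keys, as in the output below; pandas 0.23+ on Python 3.6+ keeps insertion order)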

print(s)

a    3

b    4

c    5

e    8

f    6

dtype: int64

import numpy as np

v=np.random.random_sample(50)

s=pd.Series(v)

print (s.head())

print (s.tail(3))

0    0.785486

1    0.272487

2    0.182683

3    0.196650

4    0.654694

dtype: float64

47    0.701705

48    0.897344

49    0.478941

dtype: float64

A Series is similar to a one-dimensional numpy.array, with an index label attached to each value.

The isnull and notnull functions in pandas can be used to detect missing data.
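As a minimal sketch of the two functions just mentioned (the Series and its values are made up for illustration), pd.isnull returns a boolean mask that is True at missing entries, and pd.notnull returns the opposite mask:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the value at label 'b' is missing.
s = pd.Series([1.0, np.nan, 3.0], index=['a', 'b', 'c'])

missing = pd.isnull(s)    # True where the value is NaN
present = pd.notnull(s)   # element-wise opposite of isnull

print(missing)
print(s[present])         # boolean mask keeps only the non-missing values
```

The same checks are also available as methods, s.isnull() and s.notnull().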

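One of the most important features of a Series is that arithmetic operations automatically align data by index label, even when the two operands have different indexes.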
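A small sketch of this alignment, using two made-up Series whose indexes only partly overlap. Labels present in just one operand produce NaN; Series.add with fill_value is one way to treat the missing side as 0 instead:

```python
import pandas as pd

# Hypothetical data: the labels 'b' and 'c' are shared, 'a' and 'd' are not.
s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

total = s1 + s2                     # values are matched by label, not by position
print(total)                        # 'a' and 'd' become NaN

filled = s1.add(s2, fill_value=0)   # treat a label missing on one side as 0
print(filled)
```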

Both the Series object and its index have a name attribute, which is closely tied to other key pandas functionality.
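One place where the two name attributes show up is when a Series is turned into a DataFrame. The sketch below uses the illustrative names 'score' and 'id', which are assumptions, not names from the original text:

```python
import pandas as pd

# 'score' and 'id' are hypothetical names chosen for this example.
s = pd.Series([3, 4, 5], index=['a', 'b', 'c'], name='score')
s.index.name = 'id'

df = s.reset_index()        # the index becomes a column named after index.name
print(df.columns.tolist())  # the value column is named after s.name
```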

A DataFrame is like a table, with row labels (the index) and column labels (the columns).

a=pd.DataFrame(np.random.rand(4,5),index=list("ABCD"),columns=list('abcde'))

print (a)

          a         b         c         d         e

A  0.484914  0.256542  0.702622  0.997324  0.834293

B  0.802564  0.660622  0.246160  0.936310  0.841891

C  0.073188  0.369238  0.631770  0.967714  0.950021

D  0.136728  0.270609  0.102326  0.343002  0.789243

# add or modify a column

a['f']=[1,2,3,4]

a['e']=10

print(a)

          a         b         c         d   e  f

A  0.484914  0.256542  0.702622  0.997324  10  1

B  0.802564  0.660622  0.246160  0.936310  10  2

C  0.073188  0.369238  0.631770  0.967714  10  3

D  0.136728  0.270609  0.102326  0.343002  10  4

# add or modify a row

a.loc['D']=10 # use .loc for label-based assignment; .ix is deprecated and was removed in pandas 1.0

print(a)

           a          b          c          d   e   f

A   0.484914   0.256542   0.702622   0.997324  10   1

B   0.802564   0.660622   0.246160   0.936310  10   2

C   0.073188   0.369238   0.631770   0.967714  10   3

D  10.000000  10.000000  10.000000  10.000000  10  10

print (a[['b','e']]) # select columns 'b' and 'e'

print (a.loc['A':'D',['a','c','f']]) # select rows 'A' through 'D', columns 'a','c','f'

           b   e

A   0.256542  10

B   0.660622  10

C   0.369238  10

D  10.000000  10

           a          c   f

A   0.484914   0.702622   1

B   0.802564   0.246160   2

C   0.073188   0.631770   3

D  10.000000  10.000000  10

# remove rows or columns

a=a.drop(['C','D']) # drop rows 'C' and 'D'

print (a)

a=a.drop('a',axis=1) # drop column 'a'; axis=0 means rows, axis=1 means columns

print(a)

          a         b         c         d   e  f

A  0.484914  0.256542  0.702622  0.997324  10  1

B  0.802564  0.660622  0.246160  0.936310  10  2

          b         c         d   e  f

A  0.256542  0.702622  0.997324  10  1

B  0.660622  0.246160  0.936310  10  2

# missing values

a=pd.DataFrame(np.random.rand(4,6),index=list('EFGH'),columns=list('abcdef'))

print(a)

a.iloc[2,3]=None # set row 3, column 4 to None (stored as NaN)

a.iloc[3,0]=None # set row 4, column 1 to None (stored as NaN)

print(a)

          a         b         c         d         e         f

E  0.559810  0.470429  0.966709  0.096261  0.220432  0.878908

F  0.567841  0.237288  0.117921  0.604651  0.055591  0.272852

G  0.267982  0.053754  0.410986  0.310045  0.058950  0.773051

H  0.595787  0.932286  0.839897  0.757793  0.554378  0.417178

          a         b         c         d         e         f

E  0.559810  0.470429  0.966709  0.096261  0.220432  0.878908

F  0.567841  0.237288  0.117921  0.604651  0.055591  0.272852

G  0.267982  0.053754  0.410986       NaN  0.058950  0.773051

H       NaN  0.932286  0.839897  0.757793  0.554378  0.417178

# filling missing values

a=a.fillna(5)  # fill the missing values (the NaN cells) with 5

print (a)

# dropping missing values: any row that contains a missing value is removed entirely

a.iloc[2,3]=None

a.iloc[3,0]=None

print (a)

a=a.dropna() # drop the rows that contain NaN

print (a)

          a         b         c         d         e         f

E  0.559810  0.470429  0.966709  0.096261  0.220432  0.878908

F  0.567841  0.237288  0.117921  0.604651  0.055591  0.272852

G  0.267982  0.053754  0.410986  5.000000  0.058950  0.773051

H  5.000000  0.932286  0.839897  0.757793  0.554378  0.417178

          a         b         c         d         e         f

E  0.559810  0.470429  0.966709  0.096261  0.220432  0.878908

F  0.567841  0.237288  0.117921  0.604651  0.055591  0.272852

G  0.267982  0.053754  0.410986       NaN  0.058950  0.773051

H       NaN  0.932286  0.839897  0.757793  0.554378  0.417178

          a         b         c         d         e         f

E  0.559810  0.470429  0.966709  0.096261  0.220432  0.878908

F  0.567841  0.237288  0.117921  0.604651  0.055591  0.272852
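Beyond the defaults used above, dropna and fillna take a few options that are often handy. The following is only a sketch on a made-up 4x6 frame (deterministic values instead of random ones, so the results are easy to check); it shows dropping columns instead of rows, restricting the check to one column, and filling with per-column means:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with the same shape and labels as above, but fixed values.
a = pd.DataFrame(np.arange(24, dtype=float).reshape(4, 6),
                 index=list('EFGH'), columns=list('abcdef'))
a.iloc[2, 3] = np.nan   # row 'G', column 'd'
a.iloc[3, 0] = np.nan   # row 'H', column 'a'

rows_kept = a.dropna()                # default: drop rows containing any NaN
cols_kept = a.dropna(axis=1)          # drop columns containing any NaN instead
subset_kept = a.dropna(subset=['a'])  # only look at column 'a' when deciding
filled = a.fillna(a.mean())           # fill each NaN with its column's mean

print(rows_kept.index.tolist())
print(cols_kept.columns.tolist())
print(subset_kept.index.tolist())
print(filled)
```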

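To select or modify data in a pandas.DataFrame, it is best to use .loc, .iloc, or (in old pandas versions only) .ix:

So how do you choose among these three ways of selecting data?

1. When every column already has a name, df['a'] selects an entire column. If you know both the column names and the index labels, and both are easy to type, use .loc: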

df.loc[0, 'a']
df.loc[0:3, ['a', 'b']]
df.loc[[1, 5], ['b', 'c']]

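Since we did not name the index here, the DataFrame assigned one automatically: the integers 0-9.

2. If the column names are too long to type conveniently, or the index is a time series that is even harder to type, use .iloc. The i stands for integer position: .iloc selects by position rather than by label.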

df.iloc[1,1]
df.iloc[0:3, [0,1]]
df.iloc[[0, 3, 5], 0:2]

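iloc lets us select columns with a positional slice, such as 0:2 above. Positional slices exclude the end position, whereas .loc label slices include both endpoints.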
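To make the difference between the two indexers concrete, here is a small sketch. The frame is an assumption built to match the description above (a default integer index 0-9 and columns a, b, c); its values are arbitrary:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: default integer index 0-9, columns 'a', 'b', 'c'.
df = pd.DataFrame(np.arange(30).reshape(10, 3), columns=['a', 'b', 'c'])

by_label = df.loc[0:3, ['a', 'b']]     # label slice: rows 0, 1, 2, 3 (end included)
by_position = df.iloc[0:3, [0, 1]]     # positional slice: rows 0, 1, 2 (end excluded)
print(by_label.shape, by_position.shape)

# Relabel the rows so that labels and positions no longer coincide.
df2 = df.copy()
df2.index = list('jihgfedcba')
print(df2.loc['j', 'a'])    # select by label: 'j' is the first row
print(df2.iloc[0, 0])       # select by position: the same cell
```

With the default integer index the two look almost interchangeable, which is why the slice endpoint is the easiest place to get caught out.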

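3. .ix was more permissive: it allowed mixing positions and labels in one selection, so it covered all of the usages above. In most cases you could replace the earlier calls with df.ix and they would still work, with one caveat: in df.ix[[..1..], [..2..]], each bracket had to be uniform, either all positions or all labels. Because of this ambiguity .ix was deprecated in pandas 0.20 and removed in pandas 1.0, so the example below uses .loc instead.

df=pd.DataFrame({"A":[1,2,3,4],"B":[5,6,7,8],"C":[1,1,1,1]})

df.loc[df.A>1,'B']= -1 # originally written with df.ix; sets B to -1 wherever A > 1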

print (df)

   A  B  C

0  1  5  1

1  2 -1  1

2  3 -1  1

3  4 -1  1

df=pd.DataFrame({"A":[1,2,3,4],"B":[5,6,7,8],"C":[1,1,1,1]})

df["then"]=np.where(df.A<3,1,0)

print (df)

   A  B  C  then

0  1  5  1     1

1  2  6  1     1

2  3  7  1     0

3  4  8  1     0

df=pd.DataFrame({"A":[1,2,3,4],"B":[5,6,7,8],"C":[1,1,1,1]})

df=df.loc[df.A>2]

print (df)

   A  B  C

2  3  7  1

3  4  8  1

A DataFrame's set_index method can set either a single index or a composite (multi-level) index.

reset_index undoes this, restoring the default integer index.
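A short sketch of both methods follows. The column names city, year and value and the data are invented for illustration:

```python
import pandas as pd

# Hypothetical data; the column names are assumptions made for this example.
df = pd.DataFrame({
    'city': ['Beijing', 'Beijing', 'Shanghai'],
    'year': [2016, 2017, 2016],
    'value': [1, 2, 3],
})

single = df.set_index('city')            # single index from one column
multi = df.set_index(['city', 'year'])   # composite index (a MultiIndex)
print(multi.loc[('Beijing', 2017), 'value'])

restored = multi.reset_index()           # index levels move back into columns
dropped = multi.reset_index(drop=True)   # discard the index instead of keeping it
print(restored)
print(dropped)
```

By default both methods return a new DataFrame and leave the original untouched, so the result has to be assigned back if it should be kept.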
