pandas Notes (Data Structures)

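1. Series

A Series is a one-dimensional array with the index shown on the left and the values on the right. The sessions below assume that Series and DataFrame have been imported from pandas, that pandas is imported as pd, and that numpy is imported as np: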

In [3]: obj = Series([1,2,3,4,5])

In [4]: obj

Out[4]:

0    1

1    2

2    3

3    4

4    5

dtype: int64

In [5]: obj.values

Out[5]: array([1, 2, 3, 4, 5], dtype=int64)

In [6]: obj.index

Out[6]: RangeIndex(start=0, stop=5, step=1)
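The session above can be reproduced as a standalone script. This is a minimal sketch; the variable names `values` and `index` are illustrative, and `to_numpy()` is used here as an alternative to the `.values` attribute shown in the transcript.

```python
import pandas as pd

# Build a Series from a plain list; pandas supplies a default integer index.
obj = pd.Series([1, 2, 3, 4, 5])

values = obj.to_numpy()  # the underlying NumPy array of values
index = obj.index        # a RangeIndex, because no index was given

print(values.tolist())
print(list(index))
```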

Create a Series whose index labels each data point:

In [7]: obj2 = Series([4,1,9,7], index=["a","c","e","ff"])

In [8]: obj2

Out[8]:

a     4

c     1

e     9

ff    7

dtype: int64

In [9]: obj2.index

Out[9]: Index(['a', 'c', 'e', 'ff'], dtype='object')

Select a single value or a set of values by label:

In [10]: obj2["c"]

Out[10]: 1

In [11]: obj2[["c","e"]]

Out[11]:

c    1

e    9

dtype: int64
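Label lookups can also be written with `.loc`, which makes it explicit that the selection is by label rather than by position. A short sketch, with `single` and `subset` as illustrative names:

```python
import pandas as pd

obj2 = pd.Series([4, 1, 9, 7], index=["a", "c", "e", "ff"])

single = obj2.loc["c"]          # one label returns a scalar
subset = obj2.loc[["c", "e"]]   # a list of labels returns a Series

print(single)
print(subset)
```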

Array operations such as boolean filtering keep the index attached to the values:

In [12]: obj2[obj2>]

Out[12]:

a     4

e     9

ff    7

dtype: int64
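A runnable sketch of index-preserving array operations follows. The threshold of 3 is an assumption chosen for illustration; it is not taken from the session above.

```python
import pandas as pd

obj2 = pd.Series([4, 1, 9, 7], index=["a", "c", "e", "ff"])

threshold = 3  # assumed value, for illustration only

# The comparison yields a boolean Series; indexing with it keeps matching labels.
filtered = obj2[obj2 > threshold]

# Element-wise arithmetic also carries the labels along.
doubled = obj2 * 2

print(filtered)
print(doubled)
```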

A Series can also be treated as an ordered dict, so many dict operations work on it:

In [13]: "c" in obj2

Out[13]: True

Create a Series directly from a dict:

In [14]: data = {"name":"liu","year":18,"sex":"man"}

In [15]: obj3 = Series(data)

In [16]: obj3

Out[16]:

name    liu

year     18

sex     man

dtype: object

Create a Series from a dict together with a list of index labels:

In [17]: list1 = ["name","year","mobile"]

In [18]: obj4 = Series(data,index=list1)

In [19]: obj4

Out[19]:

name      liu

year       18

mobile    NaN

dtype: object

Note: the data dict has no mobile key, so that entry is NaN. The sex key is dropped because it is not in the index list.
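The dict-based constructors and the dict-like membership test can be sketched together. This follows the sessions above, with `pd.isna` used to check the missing entry.

```python
import pandas as pd

data = {"name": "liu", "year": 18, "sex": "man"}

# Keys become the index, values become the data.
obj3 = pd.Series(data)

# With an explicit index, only the listed labels are kept, in that order;
# labels missing from the dict are filled with NaN.
obj4 = pd.Series(data, index=["name", "year", "mobile"])

print("sex" in obj3)    # membership tests check the index, like dict keys
print("sex" in obj4)
print(pd.isna(obj4["mobile"]))
```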

Detect missing data, using either the top-level functions or the equivalent Series methods:

In [20]: pd.isnull(obj4)

Out[20]:

name      False

year      False

mobile     True

dtype: bool

In [21]: pd.notnull(obj4)

Out[21]:

name       True

year       True

mobile    False

dtype: bool

In [22]: obj4.isnull()

Out[22]:

name      False

year      False

mobile     True

dtype: bool

In [23]: obj4.notnull()

Out[23]:

name       True

year       True

mobile    False

dtype: bool
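The boolean mask is usually a starting point for counting or dropping missing entries. A small sketch using `isna()` (an alias of `isnull()`) and `dropna()`; the names `missing_mask`, `n_missing`, and `present` are illustrative.

```python
import pandas as pd

obj4 = pd.Series({"name": "liu", "year": 18}, index=["name", "year", "mobile"])

missing_mask = obj4.isna()          # same result as obj4.isnull()
n_missing = int(missing_mask.sum()) # True counts as 1
present = obj4.dropna()             # new Series without the missing entries

print(n_missing)
print(present)
```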

Both the Series itself and its index have a name attribute:

In [7]: obj4.name = "hahaha"

In [8]: obj4.index.name = "state"

In [9]: obj4

Out[9]:

state

name      liu

year       18

mobile    NaN

Name: hahaha, dtype: object
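One place these names become visible is when the Series is converted to a DataFrame. A brief sketch, assuming the same labels as above and using `to_frame()`:

```python
import pandas as pd

obj4 = pd.Series(
    {"name": "liu", "year": 18},
    index=["name", "year", "mobile"],
    name="hahaha",  # the name can also be passed to the constructor
)
obj4.index.name = "state"

# The Series name becomes the column label; the index name is carried over.
as_frame = obj4.to_frame()

print(as_frame)
```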

2. DataFrame

Build a DataFrame from a dict of equal-length lists:

In [13]: data = {

"state":[1,1,2,1,1],

"year":[2000,2001,2002,2004,2005],

"pop":[1.5,1.7,3.6,2.4,2.9]

}

In [14]: frame = DataFrame(data)

In [15]: frame

Out[15]:

   state  year  pop

0      1  2000  1.5

1      1  2001  1.7

2      2  2002  3.6

3      1  2004  2.4

4      1  2005  2.9
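The same construction as a standalone script, with a few checks on the result. This is a sketch only; `shape` and `dtypes` are shown as convenient ways to inspect the frame.

```python
import pandas as pd

data = {
    "state": [1, 1, 2, 1, 1],
    "year": [2000, 2001, 2002, 2004, 2005],
    "pop": [1.5, 1.7, 3.6, 2.4, 2.9],
}

# Each dict key becomes a column; rows get a default integer index.
frame = pd.DataFrame(data)

print(frame.shape)    # (rows, columns)
print(frame.dtypes)   # one dtype per column
```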

Specify the column order and the row labels. A requested column that is not in the data is filled with NA values:

In [18]: frame2 = DataFrame(

data,

columns=["year","state","pop","debt"],

index=["one","two","three","four","five"]

)

In [19]: frame2

Out[19]:

       year  state  pop debt

one    2000      1  1.5  NaN

two    2001      1  1.7  NaN

three  2002      2  3.6  NaN

four   2004      1  2.4  NaN

five   2005      1  2.9  NaN

Retrieve a DataFrame column as a Series:

In [7]: frame2.year

Out[7]:

one      2000

two      2001

three    2002

four     2004

five     2005

Name: year, dtype: int64

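Note: the returned Series keeps the DataFrame's index, and its name attribute is set to the column name.

Retrieve a row by label: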

In [11]: frame2.loc["three"]

Out[11]:

year     2002

state       2

pop       3.6

debt      NaN

Name: three, dtype: object
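Column and row access can be sketched side by side. Bracket access is used for the column because attribute access only works when the column name does not collide with an existing DataFrame attribute: here `frame2.pop` refers to the `DataFrame.pop` method, not the pop column.

```python
import pandas as pd

data = {
    "state": [1, 1, 2, 1, 1],
    "year": [2000, 2001, 2002, 2004, 2005],
    "pop": [1.5, 1.7, 3.6, 2.4, 2.9],
}
frame2 = pd.DataFrame(
    data,
    columns=["year", "state", "pop", "debt"],
    index=["one", "two", "three", "four", "five"],
)

year_col = frame2["year"]      # column as a Series, named "year"
pop_col = frame2["pop"]        # brackets are required for this column
row = frame2.loc["three"]      # row by label
same_row = frame2.iloc[2]      # the same row by integer position

print(year_col.name, row.name, same_row.name)
```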

Assign to a column. A scalar is broadcast to every row:

In [12]: frame2['debt'] = 16.5

In [13]: frame2

Out[13]:

       year  state  pop  debt

one    2000      1  1.5  16.5

two    2001      1  1.7  16.5

three  2002      2  3.6  16.5

four   2004      1  2.4  16.5

five   2005      1  2.9  16.5

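When assigning a list or array, its length must equal the number of rows. When assigning a Series, its labels are matched exactly against the DataFrame's index, and rows with no matching label become NaN: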

In [17]: val = Series([1.2,1.5,1.7], index=["two","four","five"])

In [18]: frame2['debt'] = val

In [19]: frame2

Out[19]:

       year  state  pop  debt

one    2000      1  1.5   NaN

two    2001      1  1.7   1.2

three  2002      2  3.6   NaN

four   2004      1  2.4   1.5

five   2005      1  2.9   1.7
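Both rules can be checked in a short script. This sketch uses a reduced frame with only a year column; the column name `bad` and the variable `error` are hypothetical, introduced only to show the length check.

```python
import pandas as pd

frame2 = pd.DataFrame(
    {"year": [2000, 2001, 2002, 2004, 2005]},
    index=["one", "two", "three", "four", "five"],
)

# Assigning a Series aligns on labels; unmatched rows become NaN.
val = pd.Series([1.2, 1.5, 1.7], index=["two", "four", "five"])
frame2["debt"] = val

# Assigning a list of the wrong length raises ValueError.
error = None
try:
    frame2["bad"] = [1, 2, 3]
except ValueError as exc:
    error = str(exc)

print(frame2)
print(error)
```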

Assigning to a column that does not exist creates it:

In [21]: frame2["eastern"] = frame2.state == 1

In [22]: frame2

Out[22]:

       year  state  pop  debt  eastern

one    2000      1  1.5   NaN     True

two    2001      1  1.7   1.2     True

three  2002      2  3.6   NaN    False

four   2004      1  2.4   1.5     True

five   2005      1  2.9   1.7     True
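A sketch of creating and then removing a column follows. New columns have to be created with bracket syntax; assigning to a new attribute name such as `frame.eastern = ...` does not add a column. The `del` statement is shown as one way to drop the column again.

```python
import pandas as pd

frame = pd.DataFrame(
    {"state": [1, 1, 2, 1, 1]},
    index=["one", "two", "three", "four", "five"],
)

# The comparison produces a boolean Series, stored as a new column.
frame["eastern"] = frame["state"] == 1
eastern_values = frame["eastern"].tolist()

# Remove the column in place.
del frame["eastern"]

print(eastern_values)
print(list(frame.columns))
```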

For a nested dict, DataFrame treats the outer keys as columns and the inner keys as row labels:

In [23]: dic = {"name":{"one":"liu","two":"rui"},"year":{"one":"23","two":"22"}}

In [24]: frame3 = DataFrame(dic)

In [25]: frame3

Out[25]:

    name year

one  liu   23

two  rui   22
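The outer-keys-as-columns rule can be verified directly, and transposing swaps the two roles. In this sketch the year values 23 and 22 are taken from the displayed output and stored as strings, which is an assumption about the original input.

```python
import pandas as pd

dic = {
    "name": {"one": "liu", "two": "rui"},
    "year": {"one": "23", "two": "22"},  # assumed string values
}

frame3 = pd.DataFrame(dic)   # outer keys -> columns, inner keys -> rows
transposed = frame3.T        # rows and columns swapped

print(frame3)
print(transposed)
```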

Set names for the index and the columns; both are shown in the output:

In [26]: frame3.index.name = "index"

In [27]: frame3.columns.name = "state"

In [28]: frame3

Out[28]:

state name year

index

one    liu   23

two    rui   22

Return the data as a two-dimensional ndarray:

In [29]: frame3.values

Out[29]:

array([['liu', '23'],

       ['rui', '22']], dtype=object)
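The dtype of the resulting array depends on the columns: it is chosen so that every column fits. A sketch with two small, made-up frames (`numeric` and `mixed` are illustrative names), using `to_numpy()`:

```python
import pandas as pd

# Integer and float columns share a common numeric dtype.
numeric = pd.DataFrame({"year": [2000, 2001], "pop": [1.5, 1.7]})
numeric_arr = numeric.to_numpy()

# Text and integer columns have no common numeric dtype, so object is used.
mixed = pd.DataFrame({"name": ["liu", "rui"], "year": [23, 22]})
mixed_arr = mixed.to_numpy()

print(numeric_arr.dtype)
print(mixed_arr.dtype)
```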

3. Index objects

In [30]: obj = Series(range(3),index=["a","b","c"])

In [31]: index = obj.index

In [32]: index

Out[32]: Index(['a', 'b', 'c'], dtype='object')

Index objects are immutable, which lets one Index be shared safely among several data structures:

In [35]: index = pd.Index(np.arange(3))

In [36]: obj2 = Series([1.5,0.5,2],index=index)

In [37]: obj2.index is index

Out[37]: True
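Immutability can be demonstrated by attempting an item assignment, which raises `TypeError`. Replacing the whole index of a Series is still allowed, since that swaps in a new Index rather than changing the old one. The names `mutation_failed` and the new labels are illustrative.

```python
import pandas as pd

obj = pd.Series(range(3), index=["a", "b", "c"])
index = obj.index

# Item assignment on an Index is rejected.
mutation_failed = False
try:
    index[1] = "d"
except TypeError:
    mutation_failed = True

# Assigning a whole new index to the Series works; the old Index is unchanged.
obj.index = ["x", "y", "z"]

print(mutation_failed)
print(list(index), list(obj.index))
```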
