pandas Basics: Series and DataFrame Operations

　　The pandas package

　　# Import the packages

　　import pandas as pd

　　import numpy as np

　　import matplotlib.pyplot as plt

　　Series

　　A Series is a one-dimensional labeled array that can hold any data type (integers, floats, strings, Python objects). The basic constructor is:

　　s = pd.Series(data, index=index)

　　Here index is a list of labels for the data, and data can be any of the following:

　　a Python dict

　　an ndarray object

　　a scalar value, such as 5
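The three kinds of input listed above can be sketched as follows. The variable names and sample values are made up for illustration and do not come from the original post.

```python
import numpy as np
import pandas as pd

# From a dict: the keys become the index labels.
s_dict = pd.Series({'a': 1, 'b': 2, 'c': 3})

# From an ndarray: the labels come from the explicit index argument.
s_array = pd.Series(np.array([10.0, 20.0, 30.0]), index=['x', 'y', 'z'])

# From a scalar: the value is repeated once per index label.
s_scalar = pd.Series(5, index=['a', 'b', 'c'])

print(s_dict['b'])          # 2
print(list(s_array.index))  # ['x', 'y', 'z']
print(s_scalar.tolist())    # [5, 5, 5]
```

When data is a scalar, the index is what determines the length of the resulting Series.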

　　Creating a Series

　　s = pd.Series([1, 3, 5, np.nan, 6, 8])

　　Creating a date range

　　# Generate six daily dates, from 2013-01-01 through 2013-01-06

　　dates = pd.date_range('20130101', periods=6)

　　# DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04','2013-01-05', '2013-01-06'], dtype='datetime64[ns]', freq='D')

　　Creating a Series from a list

　　# Produces a Series with the default index 0..10 and the values a, b, b, c, d, a, b, a, c, a, d

　　s = pd.Series(list('abbcdabacad'))

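　　# Return the distinct values

　　s.unique()

　　# Count how many times each value occurs

　　s.value_counts()

　　# Test, element by element, whether each value is in the given list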

　　s.isin(['a', 'b', 'c'])
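As a small check of what these three calls return for the same string, the sketch below captures each result in a variable (the names distinct, counts and mask are illustrative only).

```python
import pandas as pd

s = pd.Series(list('abbcdabacad'))

distinct = s.unique()            # distinct values, in order of first appearance
counts = s.value_counts()        # number of occurrences per value
mask = s.isin(['a', 'b', 'c'])   # boolean Series, one entry per element

print(list(distinct))    # ['a', 'b', 'c', 'd']
print(counts['a'])       # 4
print(int(mask.sum()))   # 9 (the two 'd' entries are excluded)
```

The boolean result of isin is typically used as a filter, for example s[mask].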

　　The Series index

　　# A Series with the labels a..e and five random numbers as values

　　s = pd.Series(np.random.rand(5), index=list('abcde'))

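　　# The index (labels) of s, an Index object

　　s.index

　　# Name the index 'alpha' (this labels the index; it does not add a row)

　　s.index.name = 'alpha'

　　# Return the value labeled 'a' (a Series if the label is duplicated)

　　s['a']

　　# Whether all index labels are unique

　　s.index.is_unique

　　# Return the distinct index labels

　　s.index.unique()

　　# Group by index label and sum each group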

　　s.groupby(s.index).sum()
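The uniqueness checks and the groupby above only become interesting when a label repeats. A minimal sketch, assuming a hypothetical Series in which the label 'a' appears twice:

```python
import pandas as pd

# Hypothetical data: the label 'a' appears twice.
s = pd.Series([1, 2, 3, 4], index=list('abac'))

print(s.index.is_unique)        # False
print(list(s.index.unique()))   # ['a', 'b', 'c']
print(s['a'].tolist())          # [1, 3] (a Series, because 'a' is duplicated)

# Collapse duplicate labels by summing the values that share a label.
totals = s.groupby(s.index).sum()
print(totals.to_dict())         # {'a': 4, 'b': 2, 'c': 4}
```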

　　DataFrame

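　　A DataFrame is a two-dimensional array with row labels and column labels. You can think of it as an Excel spreadsheet or a SQL table, or as a dict of Series objects. It is the most commonly used data structure in pandas.

　　Creating a DataFrame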

　　df = pd.DataFrame(np.random.randn(4, 6), index=list('ADFH'), columns=['one', 'two', 'three', 'four', 'five', 'six'])

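　　# Reindex the rows; labels with no existing data are filled with NaN

　　df2 = df.reindex(index=list('ABCDEFGH'))

　　# Reindex the columns (column headers)

　　df.reindex(columns=['one', 'three', 'five', 'seven'])

　　# Fill the newly created entries with 0 instead of NaN

　　df.reindex(columns=['one', 'three', 'five', 'seven'], fill_value=0)

　　# The fill method only took effect on rows in the pandas version this was written for; it requires sorted labels, so recent versions may raise an error for these unsorted columns

　　df.reindex(columns=['one', 'three', 'five', 'seven'], method='ffill')

　　# Reindex the rows, forward-filling each new row from the preceding label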

　　df.reindex(index=list('ABCDEFGH'), method='ffill')
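Because the reindex calls above run on random data, the effect of each option is hard to see. The sketch below repeats them on a small deterministic frame; the two-column layout and the names plain, filled and zeros are assumptions made for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(8).reshape(4, 2),
                  index=list('ADFH'), columns=['one', 'two'])

# New row labels get NaN by default.
plain = df.reindex(index=list('ABCDEFGH'))

# With ffill, each new label copies the closest preceding existing row.
filled = df.reindex(index=list('ABCDEFGH'), method='ffill')

# fill_value replaces NaN in the newly created column.
zeros = df.reindex(columns=['one', 'seven'], fill_value=0)

print(plain.loc['B'].isna().all())   # True
print(filled.loc['C'].tolist())      # [0, 1], copied from row 'A'
print(zeros['seven'].tolist())       # [0, 0, 0, 0]
```

Forward filling works here because the row labels A, D, F, H are already in sorted order.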

　　DataFrame operations

　　df = pd.DataFrame(np.random.randn(4, 6), index=list('ADFH'), columns=['one', 'two', 'three', 'four', 'five', 'six'])

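　　# Set the value at row 'A', column 'one' to 100 (a single .loc call avoids chained assignment)

　　df.loc['A', 'one'] = 100

　　# Drop the row labeled 'A' (returns a new DataFrame; df is unchanged)

　　df.drop('A')

　　# Drop the columns 'two' and 'four'

　　df2 = df.drop(['two', 'four'], axis=1)

　　# df2 is a copy, so this change to df does not affect it

　　df.iloc[0, 0] = 100

　　# Select the row labeled 'A' ('one' is a column label here, so df.loc['one'] would raise a KeyError)

　　df.loc['A']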
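The following sketch shows, on a hypothetical frame of zeros, that drop returns a new object and that a later write to the original does not reach the copy. The name without_a is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.zeros((4, 3)),
                  index=list('ADFH'), columns=['one', 'two', 'three'])

# A single .loc call with (row label, column label) writes into df itself.
df.loc['A', 'one'] = 100

# drop returns a new DataFrame; the original keeps row 'A' and column 'two'.
without_a = df.drop('A')
df2 = df.drop(['two'], axis=1)

# A later change to df is not visible in df2.
df.iloc[0, 0] = -1

print(df.loc['A', 'one'])                        # -1.0
print(df2.loc['A', 'one'])                       # 100.0
print('A' in without_a.index, 'A' in df.index)   # False True
```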

　　DataFrame computation

　　df = pd.DataFrame(np.arange(12).reshape(4, 3), index=['one', 'two', 'three', 'four'], columns=list('ABC'))

　　# Each column is passed to the lambda as a Series

　　df.apply(lambda x: x.max() - x.min())

　　# Each row is passed to the lambda as a Series

　　df.apply(lambda x: x.max() - x.min(), axis=1)

　　# Return a Series made of several values

　　def min_max(x):

　　    return pd.Series([x.min(), x.max()], index=['min', 'max'])

　　df.apply(min_max, axis=1)

　　# applymap works element by element; here each value is formatted with two decimals (pandas 2.1 deprecated applymap in favor of DataFrame.map)

　　formatter = '{0:.02f}'.format

　　df.applymap(formatter)
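Since the frame in this section is built from np.arange, the results can be checked by hand. The sketch below captures them in variables (names are illustrative). For the element-wise formatting step it uses apply with Series.map instead of applymap, which is a substitution made here so the example runs across pandas versions, not the original author's call.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(4, 3),
                  index=['one', 'two', 'three', 'four'], columns=list('ABC'))

# Column-wise (default) and row-wise (axis=1) range.
col_range = df.apply(lambda x: x.max() - x.min())
row_range = df.apply(lambda x: x.max() - x.min(), axis=1)

def min_max(x):
    """Return the minimum and maximum of a row as a labeled Series."""
    return pd.Series([x.min(), x.max()], index=['min', 'max'])

# Returning a Series per row yields a DataFrame with 'min' and 'max' columns.
summary = df.apply(min_max, axis=1)

# Element-wise formatting, column by column, via Series.map.
formatted = df.apply(lambda col: col.map('{:.2f}'.format))

print(col_range.tolist())           # [9, 9, 9]
print(row_range.tolist())           # [2, 2, 2, 2]
print(summary.loc['two'].tolist())  # [3, 5]
print(formatted.loc['one', 'B'])    # 1.00
```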

　　Selecting, adding and deleting DataFrame columns

　　df = pd.DataFrame(np.random.randn(6, 4), columns=['one', 'two', 'three', 'four'])

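　　# Set column 'three' to the sum of columns 'one' and 'two'

　　df['three'] = df['one'] + df['two']

　　# Add a boolean column 'flag': True where 'one' is greater than 0, otherwise False

　　df['flag'] = df['one'] > 0

　　# Delete the column 'three'

　　del df['three']

　　# Remove the column 'four' and return it ('three' was already deleted above)

　　four = df.pop('four')

　　# Add a column 'five' with the scalar 5 repeated in every row

　　df['five'] = 5

　　# Assign only the first two values of 'one'; the remaining rows become NaN after index alignment

　　df['one_trunc'] = df['one'][:2]

　　# Insert a column at a given position

　　df.insert(1, 'bar', df['one'])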
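The column operations above can be traced on fixed values. This is a sketch on a hypothetical three-row frame; the numbers are chosen only so the results are easy to verify.

```python
import pandas as pd

df = pd.DataFrame({'one': [1.0, -2.0, 3.0], 'two': [10.0, 20.0, 30.0]})

df['three'] = df['one'] + df['two']   # element-wise sum
df['flag'] = df['one'] > 0            # boolean column
df['one_trunc'] = df['one'][:2]       # row 2 has no matching label, so it is NaN

popped = df.pop('three')              # removes the column and returns it
df.insert(1, 'bar', df['one'])        # new column at position 1

print(list(df.columns))                  # ['one', 'bar', 'two', 'flag', 'one_trunc']
print(popped.tolist())                   # [11.0, 18.0, 33.0]
print(df['one_trunc'].isna().tolist())   # [False, False, True]
```

Note that del and pop both modify df in place, while pop additionally hands back the removed column.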

　　Adding new columns with assign()

　　df = pd.DataFrame(np.random.randint(1, 5, (6, 4)), columns=list('ABCD'))

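　　# New column Ratio with the value df['A'] / df['B'] (assign returns a new DataFrame; df is unchanged)

　　df.assign(Ratio = df['A'] / df['B'])

　　# New columns AB_Ratio and CD_Ratio computed by lambdas that receive the DataFrame

　　df.assign(AB_Ratio = lambda x: x.A / x.B, CD_Ratio = lambda x: x.C / x.D)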
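A point that is easy to miss is that assign leaves the original frame untouched. The sketch below shows this on hypothetical fixed values, and also shows that a lambda passed to a second assign call can use a column created by the first; the column names Total and Share are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 4], 'B': [2, 2, 1]})

# assign returns a new DataFrame with the extra column.
with_ratio = df.assign(Ratio=df['A'] / df['B'])

# Each lambda receives the frame as it stands at that point in the chain.
chained = (df.assign(Total=lambda x: x.A + x.B)
             .assign(Share=lambda x: x.A / x.Total))

print(list(df.columns))              # ['A', 'B']
print(with_ratio['Ratio'].tolist())  # [0.5, 1.0, 4.0]
print(list(chained.columns))         # ['A', 'B', 'Total', 'Share']
```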

　　Sorting a DataFrame

　　df = pd.DataFrame(np.random.randint(1, 10, (4, 3)), index=list('ABCD'), columns=['one', 'two', 'three'])

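　　# Sort the rows by the values in column 'one'

　　df.sort_values(by='one')

　　# Rank the values of the Series s created in the Series index section (ties share the average rank)

　　s.rank()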
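Sorting and ranking are easier to follow on fixed values than on random ones. The frame and Series below are hypothetical, and the ascending=False and sort_index calls are extra variations not shown in the original.

```python
import pandas as pd

df = pd.DataFrame({'one': [3, 1, 2], 'two': [9, 8, 7]}, index=list('ABC'))

by_one = df.sort_values(by='one')                        # ascending by column 'one'
by_one_desc = df.sort_values(by='one', ascending=False)  # descending
by_label = by_one.sort_index()                           # back to label order

# rank assigns 1 to the smallest value; ties get the average of their ranks.
s = pd.Series([30, 10, 20, 10], index=list('abcd'))
ranks = s.rank()

print(list(by_one.index))       # ['B', 'C', 'A']
print(list(by_one_desc.index))  # ['A', 'C', 'B']
print(list(by_label.index))     # ['A', 'B', 'C']
print(ranks.tolist())           # [4.0, 1.5, 3.0, 1.5]
```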

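　　DataFrame arithmetic

　　When DataFrames are combined in a calculation, pandas aligns them automatically on both row and column labels. The result covers the union of the labels of the two DataFrames; any position missing from either operand is NaN.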

　　df1 = pd.DataFrame(np.random.randn(10, 4), index=list('abcdefghij'), columns=['A', 'B', 'C', 'D'])

　　df2 = pd.DataFrame(np.random.randn(7, 3), index=list('cdefghi'), columns=['A', 'B', 'C'])

　　df1 + df2

　　df1 - df1.iloc[0]
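The alignment behavior can be verified on small frames of ones. This is a sketch with hypothetical shapes; the add call with fill_value is an extra option not used in the original, shown here as one way to avoid NaN where only one operand has a value.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.ones((3, 2)), index=list('abc'), columns=['A', 'B'])
df2 = pd.DataFrame(np.ones((2, 1)), index=list('bc'), columns=['A'])

# Labels are aligned first; the result spans the union of rows and columns.
total = df1 + df2
print(total.shape)                  # (3, 2)
print(total.loc['b', 'A'])          # 2.0
print(total.loc['a'].isna().all())  # True (row 'a' is missing from df2)
print(total['B'].isna().all())      # True (column 'B' is missing from df2)

# fill_value treats a value missing from one operand as 0.
filled = df1.add(df2, fill_value=0)
print(filled.loc['a', 'A'])         # 1.0

# Subtracting a row (a Series) aligns it on the column labels and
# applies it to every row.
centered = df1 - df1.iloc[0]
print(centered.values.tolist())     # [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
```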
