Summary of Indexing operation in DataFrame of Pandas
For new users of pandas, indexing a DataFrame can be confusing. In this post I go through each indexing form in detail and close with a summary of what the exploration showed.
import pandas as pd
import numpy as np
df=pd.DataFrame(np.arange(16).reshape(4,4),index=['Ohio','Colorado','Utah','New York'],columns=['one','two','three','four']);df
| | one | two | three | four |
---|---|---|---|---|
Ohio | 0 | 1 | 2 | 3 |
Colorado | 4 | 5 | 6 | 7 |
Utah | 8 | 9 | 10 | 11 |
New York | 12 | 13 | 14 | 15 |
(1) df[val]
- when val is a single column label, df[val] selects that column from the DataFrame and returns a Series.
df['one']
Ohio 0
Colorado 4
Utah 8
New York 12
Name: one, dtype: int32
- when val is a list of column labels, df[val] selects those columns from the DataFrame and returns a DataFrame.
df[['one','two']]
| | one | two |
---|---|---|
Ohio | 0 | 1 |
Colorado | 4 | 5 |
Utah | 8 | 9 |
New York | 12 | 13 |
- when val is a slice such as :num, df[val] selects rows. This is a convenience shortcut, equivalent to df.iloc[:num], which is the accessor dedicated to selection by position.
df[:2]
| | one | two | three | four |
---|---|---|---|---|
Ohio | 0 | 1 | 2 | 3 |
Colorado | 4 | 5 | 6 | 7 |
- when val is a boolean pd.Series with the same index as df, df[val] returns the rows whose value in the Series is True. pd.DataFrame.any and pd.DataFrame.all typically produce this kind of Series, which can then be passed as val in df[val] to filter rows.
df.iloc[:2] # same result as df[:2] above
| | one | two | three | four |
---|---|---|---|---|
Ohio | 0 | 1 | 2 | 3 |
Colorado | 4 | 5 | 6 | 7 |
df[1:3]
| | one | two | three | four |
---|---|---|---|---|
Colorado | 4 | 5 | 6 | 7 |
Utah | 8 | 9 | 10 | 11 |
df.iloc[1:3]
| | one | two | three | four |
---|---|---|---|---|
Colorado | 4 | 5 | 6 | 7 |
Utah | 8 | 9 | 10 | 11 |
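The boolean-Series form of df[val] described earlier in this section has no example in the post, so here is a minimal sketch of it. The names demo, mask and row_has_small are made up for illustration; demo is rebuilt with the original values so that the df used in the post is left untouched.

```python
import numpy as np
import pandas as pd

# Same layout as the df in this post, rebuilt under a different (hypothetical) name.
demo = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=['Ohio', 'Colorado', 'Utah', 'New York'],
    columns=['one', 'two', 'three', 'four'],
)

# A boolean Series indexed like demo: True where column 'two' is greater than 5.
mask = demo['two'] > 5
filtered = demo[mask]  # keeps the rows Utah and New York

# any(axis=1) also yields one boolean per row: True if any value in that row is below 5.
row_has_small = (demo < 5).any(axis=1)
small_rows = demo[row_has_small]  # keeps the rows Ohio and Colorado

print(filtered)
print(small_rows)
```

Passing axis=1 matters here: without it, any() reduces over rows and returns one boolean per column, which does not line up with the row index.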
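- when val is a boolean DataFrame, df[val] keeps the values where the mask is True and fills the rest with NaN. On the left-hand side of an assignment, it sets only the values where the mask is True.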
df<5
| | one | two | three | four |
---|---|---|---|---|
Ohio | True | True | True | True |
Colorado | True | False | False | False |
Utah | False | False | False | False |
New York | False | False | False | False |
df[df<5]
| | one | two | three | four |
---|---|---|---|---|
Ohio | 0.0 | 1.0 | 2.0 | 3.0 |
Colorado | 4.0 | NaN | NaN | NaN |
Utah | NaN | NaN | NaN | NaN |
New York | NaN | NaN | NaN | NaN |
df[df<5]=0;df
| | one | two | three | four |
---|---|---|---|---|
Ohio | 0 | 0 | 0 | 0 |
Colorado | 0 | 5 | 6 | 7 |
Utah | 8 | 9 | 10 | 11 |
New York | 12 | 13 | 14 | 15 |
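As a rough cross-check of the boolean-DataFrame case, the sketch below compares df[mask] with DataFrame.where and does the assignment on a copy. The names demo, masked and zeroed are hypothetical; demo is rebuilt with the original values, before any were set to 0.

```python
import numpy as np
import pandas as pd

# Fresh copy of the original data, under a hypothetical name.
demo = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=['Ohio', 'Colorado', 'Utah', 'New York'],
    columns=['one', 'two', 'three', 'four'],
)

# Selecting with a boolean DataFrame keeps values where the mask is True, NaN elsewhere.
masked = demo[demo < 5]
same_as_where = masked.equals(demo.where(demo < 5))

# Assigning through the mask changes only the True positions; work on a copy
# so the original frame keeps its values.
zeroed = demo.copy()
zeroed[zeroed < 5] = 0

print(same_as_where)
print(zeroed)
```

Working on a copy is a choice made for this sketch; the post itself assigns into df directly, which is why the later outputs show zeros in the first row.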
(2) df.loc[val]
- when val is a single index label, df.loc[val] selects the corresponding row and returns a Series; when val is a list of index labels, it selects the corresponding rows and returns a DataFrame.
df.loc['Colorado']
one 0
two 5
three 6
four 7
Name: Colorado, dtype: int32
df.loc[['Colorado','New York']]
| | one | two | three | four |
---|---|---|---|---|
Colorado | 0 | 5 | 6 | 7 |
New York | 12 | 13 | 14 | 15 |
(3) df.loc[:,val]
- when val is a single column label, df.loc[:,val] selects the corresponding column and returns a Series; when val is a list of column labels, it selects the corresponding columns and returns a DataFrame.
df.loc[:,'two']
Ohio 0
Colorado 5
Utah 9
New York 13
Name: two, dtype: int32
df.loc[:,['two']] # Note: whenever val is a list, even one with a single element, the result is a DataFrame.
| | two |
---|---|
Ohio | 0 |
Colorado | 5 |
Utah | 9 |
New York | 13 |
df.loc[:,['one','two']]
| | one | two |
---|---|---|
Ohio | 0 | 0 |
Colorado | 0 | 5 |
Utah | 8 | 9 |
New York | 12 | 13 |
df[['one','two']] # same result as df.loc[:,['one','two']] above
| | one | two |
---|---|---|
Ohio | 0 | 0 |
Colorado | 0 | 5 |
Utah | 8 | 9 |
New York | 12 | 13 |
(3) df.loc[val1,val2]
- val1 may be a single index label or a list of index labels, and val2 may be a single column label or a list of column labels; df.loc[val1,val2] selects the data at the intersection of val1 and val2. Either val1 or val2 (or both) can also be : to select everything along that axis.
df.loc['Ohio','one']
0
df.loc[['Ohio','Utah'],'one']
Ohio 0
Utah 8
Name: one, dtype: int32
df.loc['Ohio',['one','two']]
one 0
two 0
Name: Ohio, dtype: int32
df.loc[['Ohio','Utah'],['one','two']]
| | one | two |
---|---|---|
Ohio | 0 | 0 |
Utah | 8 | 9 |
df.loc[:,:]
| | one | two | three | four |
---|---|---|---|---|
Ohio | 0 | 0 | 0 | 0 |
Colorado | 0 | 5 | 6 | 7 |
Utah | 8 | 9 | 10 | 11 |
New York | 12 | 13 | 14 | 15 |
df.loc['Ohio',:]
one 0
two 0
three 0
four 0
Name: Ohio, dtype: int32
df.loc[:,'two']
Ohio 0
Colorado 5
Utah 9
New York 13
Name: two, dtype: int32
df.loc[:,['one','two']]
| | one | two |
---|---|---|
Ohio | 0 | 0 |
Colorado | 0 | 5 |
Utah | 8 | 9 |
New York | 12 | 13 |
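Besides a bare :, df.loc also accepts label slices, and those include both endpoints, unlike the positional slices used with df[:num]. The sketch below illustrates the difference on a rebuilt frame; demo, by_label and by_position are hypothetical names.

```python
import numpy as np
import pandas as pd

# Original data rebuilt under a hypothetical name.
demo = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=['Ohio', 'Colorado', 'Utah', 'New York'],
    columns=['one', 'two', 'three', 'four'],
)

# Label slices in .loc include the stop label on both axes.
by_label = demo.loc['Ohio':'Utah', 'one':'two']

# A positional slice excludes the stop position, as with Python lists.
by_position = demo[0:2]

print(by_label.shape)
print(by_position.shape)
```

The inclusive end is easy to overlook when switching between df.loc and positional slicing, so it is worth checking the shape of the result.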
(4) df.iloc[val]
- Works like df.loc[val], except that val must be an integer position or a list of integer positions instead of index labels.
df.iloc[1]
one 0
two 5
three 6
four 7
Name: Colorado, dtype: int32
df.iloc[[1,3]]
| | one | two | three | four |
---|---|---|---|---|
Colorado | 0 | 5 | 6 | 7 |
New York | 12 | 13 | 14 | 15 |
(5) df.iloc[:,val]
- Works like df.loc[:,val], except that val must be an integer position or a list of integer positions.
df
| | one | two | three | four |
---|---|---|---|---|
Ohio | 0 | 0 | 0 | 0 |
Colorado | 0 | 5 | 6 | 7 |
Utah | 8 | 9 | 10 | 11 |
New York | 12 | 13 | 14 | 15 |
df.iloc[:,1]
Ohio 0
Colorado 5
Utah 9
New York 13
Name: two, dtype: int32
df.iloc[:,[1,3]]
| | two | four |
---|---|---|
Ohio | 0 | 0 |
Colorado | 5 | 7 |
Utah | 9 | 11 |
New York | 13 | 15 |
(6) df.iloc[val1,val2]
- Works like df.loc[val1,val2], except that val1 and val2 must each be an integer position or a list of integer positions.
df.iloc[1,2]
6
df.iloc[1,[1,2,3]]
two 5
three 6
four 7
Name: Colorado, dtype: int32
df.iloc[[1,2],2]
Colorado 6
Utah 10
Name: three, dtype: int32
df.iloc[[1,2],[1,2]]
| | two | three |
---|---|---|
Colorado | 5 | 6 |
Utah | 9 | 10 |
df.iloc[:,[1,2]]
| | two | three |
---|---|---|
Ohio | 0 | 0 |
Colorado | 5 | 6 |
Utah | 9 | 10 |
New York | 13 | 14 |
df.iloc[[1,2],:]
| | one | two | three | four |
---|---|---|---|---|
Colorado | 0 | 5 | 6 | 7 |
Utah | 8 | 9 | 10 | 11 |
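To make the relationship between df.iloc and df.loc concrete, the following sketch translates integer positions into labels and checks that both accessors pick the same cells. It is only an illustration; demo, by_position and by_label are hypothetical names.

```python
import numpy as np
import pandas as pd

# Original data rebuilt under a hypothetical name.
demo = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=['Ohio', 'Colorado', 'Utah', 'New York'],
    columns=['one', 'two', 'three', 'four'],
)

# Select rows 1-2 and columns 1-2 by position.
by_position = demo.iloc[[1, 2], [1, 2]]

# Look up the labels sitting at those positions, then select by label.
row_labels = demo.index[[1, 2]]       # Colorado, Utah
col_labels = demo.columns[[1, 2]]     # two, three
by_label = demo.loc[row_labels, col_labels]

# Negative positions count from the end, as in Python sequences.
last_row = demo.iloc[-1]

print(by_position.equals(by_label))
print(last_row.name)
```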
(7) df.at[val1,val2]
- val1 must be a single index label and val2 must be a single column label; df.at returns the single value at that position.
df.at['Utah','one']
8
df.loc['Utah','one'] # same result as df.at['Utah','one'] above
8
df.at[['Utah','Colorado'],'one'] # Raises an exception: df.at accepts only a single row label and a single column label
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
D:\Anaconda\lib\site-packages\pandas\core\frame.py in _get_value(self, index, col, takeable)
2538 try:
-> 2539 return engine.get_value(series._values, index)
2540 except (TypeError, ValueError):
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
TypeError: '['Utah', 'Colorado']' is an invalid key
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
<ipython-input-77-c52a9db91739> in <module>()
----> 1 df.at[['Utah','Colorado'],'one']
D:\Anaconda\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key)
2140
2141 key = self._convert_key(key)
-> 2142 return self.obj._get_value(*key, takeable=self._takeable)
2143
2144 def __setitem__(self, key, value):
D:\Anaconda\lib\site-packages\pandas\core\frame.py in _get_value(self, index, col, takeable)
2543 # use positional
2544 col = self.columns.get_loc(col)
-> 2545 index = self.index.get_loc(index)
2546 return self._get_value(index, col, takeable=True)
2547 _get_value.__doc__ = get_value.__doc__
D:\Anaconda\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
3076 'backfill or nearest lookups')
3077 try:
-> 3078 return self._engine.get_loc(key)
3079 except KeyError:
3080 return self._engine.get_loc(self._maybe_cast_indexer(key))
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
TypeError: '['Utah', 'Colorado']' is an invalid key
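The traceback above comes from one particular pandas installation. The exception type and message may differ in other versions, so the sketch below catches a broad Exception and simply reports what was raised, then shows df.loc as the accessor to use when a list of labels is needed. demo, raised and subset are hypothetical names.

```python
import numpy as np
import pandas as pd

# Original data rebuilt under a hypothetical name.
demo = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=['Ohio', 'Colorado', 'Utah', 'New York'],
    columns=['one', 'two', 'three', 'four'],
)

# .at is meant for one scalar row label and one scalar column label.
try:
    demo.at[['Utah', 'Colorado'], 'one']
    raised = False
except Exception as exc:  # the concrete type is version-dependent (assumption)
    raised = True
    print(type(exc).__name__)

# For several labels, .loc is the appropriate accessor.
subset = demo.loc[['Utah', 'Colorado'], 'one']
print(subset)
```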
(8) df.iat[val1,val2]
- Works like df.at, except that val1 and val2 must both be single integer positions.
df.iat[2,2]
10
df
| | one | two | three | four |
---|---|---|---|---|
Ohio | 0 | 0 | 0 | 0 |
Colorado | 0 | 5 | 6 | 7 |
Utah | 8 | 9 | 10 | 11 |
New York | 12 | 13 | 14 | 15 |
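df.at and df.iat can also be used on the left-hand side of an assignment to set a single value. A minimal sketch, using a copy so the source frame is left as it is; demo and scratch are hypothetical names.

```python
import numpy as np
import pandas as pd

# Original data rebuilt under a hypothetical name.
demo = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=['Ohio', 'Colorado', 'Utah', 'New York'],
    columns=['one', 'two', 'three', 'four'],
)

# Reading: label-based and position-based scalar access reach the same cell.
value_by_label = demo.at['Utah', 'one']
value_by_position = demo.iat[2, 0]

# Writing: set one cell by label and another by position, on a copy.
scratch = demo.copy()
scratch.at['Utah', 'one'] = 100
scratch.iat[0, 3] = -1

print(value_by_label, value_by_position)
print(scratch)
```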
Conclusion
- In df[val], val can be a column label or a list of column labels, which selects whole columns. As a special case, val can be a slice such as :num, which selects the corresponding rows. val can also be a boolean Series to filter rows, or a boolean DataFrame to select or set values.
- df.loc[val] is mainly used to select rows, or a combination of rows and columns. val can be a single row label, a list of row labels, or a pair val1,val2, where each of val1 and val2 is a single label, a list of labels, or :. In the pair form it selects the intersection of the rows val1 and the columns val2.
- df.iloc[val] works the same way as df.loc[val], except that val must consist of integer positions, either single integers or lists of integers.
- df.at[val1,val2] accepts only a single row label and a single column label, and df.iat[val1,val2] accepts only a single row position and a single column position.
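One way to double-check the return types summarised in this conclusion is to evaluate each form and record what comes back. This is only a sketch; demo, results and kinds are hypothetical names, and the frame is rebuilt with the original values.

```python
import numpy as np
import pandas as pd

# Original data rebuilt under a hypothetical name.
demo = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=['Ohio', 'Colorado', 'Utah', 'New York'],
    columns=['one', 'two', 'three', 'four'],
)

# One representative expression per indexing form covered in the post.
results = {
    "demo['one']": demo['one'],
    "demo[['one']]": demo[['one']],
    "demo[:2]": demo[:2],
    "demo.loc['Ohio']": demo.loc['Ohio'],
    "demo.loc[['Ohio']]": demo.loc[['Ohio']],
    "demo.iloc[1]": demo.iloc[1],
    "demo.iloc[[1]]": demo.iloc[[1]],
    "demo.at['Ohio', 'one']": demo.at['Ohio', 'one'],
    "demo.iat[0, 0]": demo.iat[0, 0],
}

# Classify each result as DataFrame, Series, or scalar.
kinds = {}
for expr, value in results.items():
    if isinstance(value, pd.DataFrame):
        kinds[expr] = 'DataFrame'
    elif isinstance(value, pd.Series):
        kinds[expr] = 'Series'
    else:
        kinds[expr] = 'scalar'
    print(f"{expr:<24} -> {kinds[expr]}")
```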