Summary of Indexing Operations on a pandas DataFrame

For new users of pandas, DataFrame indexing can seem confusing, so I list each form of indexing in detail below and close with a summary of what the exploration showed.

import pandas as pd
import numpy as np
df=pd.DataFrame(np.arange(16).reshape(4,4),index=['Ohio','Colorado','Utah','New York'],columns=['one','two','three','four']);df
          one  two  three  four
Ohio        0    1      2     3
Colorado    4    5      6     7
Utah        8    9     10    11
New York   12   13     14    15

(1) df[val]

  • when val is a single column label, df[val] selects that column from the DataFrame and returns a Series.
df['one']
Ohio         0
Colorado     4
Utah         8
New York    12
Name: one, dtype: int32
  • when val is a list of column labels, df[val] selects those columns from the DataFrame and returns a DataFrame.
df[['one','two']]
          one  two
Ohio        0    1
Colorado    4    5
Utah        8    9
New York   12   13
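The difference between a single label and a list of labels can be checked directly. The sketch below rebuilds the original frame under a hypothetical name, demo, so it runs on its own; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Rebuild the original 4x4 frame under a separate, hypothetical name.
demo = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

single = demo["one"]          # a single column label -> Series
as_frame = demo[["one"]]      # a one-element list -> DataFrame with one column
pair = demo[["one", "two"]]   # a list of labels -> DataFrame

print(type(single).__name__, single.shape)      # Series (4,)
print(type(as_frame).__name__, as_frame.shape)  # DataFrame (4, 1)
print(type(pair).__name__, pair.shape)          # DataFrame (4, 2)
```

Wrapping even one label in a list is a simple way to keep the result two-dimensional.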
  • when val is a slice such as :num, df[val] selects rows; this is a convenience shorthand. With an integer slice it is equivalent to df.iloc[:num], which is dedicated to selecting rows by position.
df[:2]
          one  two  three  four
Ohio        0    1      2     3
Colorado    4    5      6     7
  • when val is a boolean pd.Series with the same index as df, df[val] returns the rows whose value in the Series is True. pd.DataFrame.any and pd.DataFrame.all applied along axis=1 return this kind of Series, so they are often used to build val for filtering.
df.iloc[:2] # same result as df[:2] above
          one  two  three  four
Ohio        0    1      2     3
Colorado    4    5      6     7
df[1:3]
          one  two  three  four
Colorado    4    5      6     7
Utah        8    9     10    11
df.iloc[1:3]
          one  two  three  four
Colorado    4    5      6     7
Utah        8    9     10    11
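The boolean-Series case described earlier has no example of its own, so here is a minimal sketch. It rebuilds the original frame under a hypothetical name, demo; the thresholds are arbitrary and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the original 4x4 frame under a separate, hypothetical name.
demo = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

# A comparison on one column gives a boolean Series aligned with demo.index.
mask = demo["three"] > 5
rows = demo[mask]                    # keeps Colorado, Utah and New York

# any/all along axis=1 collapse a boolean DataFrame to one flag per row.
any_mask = (demo > 12).any(axis=1)   # rows with at least one value above 12
all_mask = (demo > 3).all(axis=1)    # rows where every value is above 3

print(rows.index.tolist())
print(demo[any_mask].index.tolist())
print(demo[all_mask].index.tolist())
```

Without axis=1, any and all reduce over rows and return a Series indexed by column labels, which does not line up with the row index and so is not suitable for this kind of row filtering.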
  • when val is a boolean DataFrame, df[val] keeps the values where val is True and puts NaN elsewhere; assigning to df[val] sets only the values where val is True.
df<5
            one    two  three   four
Ohio       True   True   True   True
Colorado   True  False  False  False
Utah      False  False  False  False
New York  False  False  False  False
df[df<5]
          one  two  three  four
Ohio      0.0  1.0    2.0   3.0
Colorado  4.0  NaN    NaN   NaN
Utah      NaN  NaN    NaN   NaN
New York  NaN  NaN    NaN   NaN
df[df<5]=0;df
          one  two  three  four
Ohio        0    0      0     0
Colorado    0    5      6     7
Utah        8    9     10    11
New York   12   13     14    15
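Note that the assignment above changes df in place, so every later output in this article shows zeros where the values were below 5. The sketch below shows the same masking on a copy, and shows that reading with a boolean DataFrame gives the same result as DataFrame.where. The names demo and zeroed are hypothetical.

```python
import numpy as np
import pandas as pd

# Rebuild the original 4x4 frame under a separate, hypothetical name.
demo = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

mask = demo < 5

# Reading with a boolean DataFrame keeps the shape and fills NaN where False.
masked = demo[mask]
same = demo.where(mask)
print(masked.equals(same))  # True

# Assigning through the mask on a copy leaves the original frame untouched.
zeroed = demo.copy()
zeroed[mask] = 0
print(zeroed.loc["Colorado"].tolist())  # [0, 5, 6, 7]
print(demo.loc["Colorado"].tolist())    # [4, 5, 6, 7]
```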

(2)df.loc[val]

  • when val is a single index label, df.loc[val] selects the corresponding row and returns a Series; when val is a list of index labels, it selects the corresponding rows and returns a DataFrame.
df.loc['Colorado']
one      0
two      5
three    6
four     7
Name: Colorado, dtype: int32
df.loc[['Colorado','New York']]
          one  two  three  four
Colorado    0    5      6     7
New York   12   13     14    15
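A common stumbling block is passing a row label to plain df[val], which looks the label up among the columns. The sketch below contrasts the two on a fresh copy of the original frame, held under the hypothetical name demo.

```python
import numpy as np
import pandas as pd

# Rebuild the original 4x4 frame under a separate, hypothetical name.
demo = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

row = demo.loc["Colorado"]                 # one row label -> Series
rows = demo.loc[["Colorado", "New York"]]  # list of row labels -> DataFrame

# demo["Colorado"] searches the column labels, so a row label fails there.
raised = False
try:
    demo["Colorado"]
except KeyError as exc:
    raised = True
    print("KeyError:", exc)

print(row.tolist())  # [4, 5, 6, 7]
print(rows.shape)    # (2, 4)
```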

(3)df.loc[:,val]

  • when val is a single column label, df.loc[:,val] selects the corresponding column and returns a Series; when val is a list of column labels, it selects the corresponding columns and returns a DataFrame.
df.loc[:,'two']
Ohio         0
Colorado     5
Utah         9
New York    13
Name: two, dtype: int32
df.loc[:,['two']] # Note: whenever val is a list, even one with a single element, the result is a DataFrame.
          two
Ohio        0
Colorado    5
Utah        9
New York   13
df.loc[:,['one','two']]
          one  two
Ohio        0    0
Colorado    0    5
Utah        8    9
New York   12   13
df[['one','two']] # same result as df.loc[:,['one','two']] above
          one  two
Ohio        0    0
Colorado    0    5
Utah        8    9
New York   12   13
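The equivalence between df[val] and df.loc[:,val] for column selection can be confirmed with DataFrame.equals and Series.equals. This is a small sketch on a fresh copy of the original frame, held under the hypothetical name demo.

```python
import numpy as np
import pandas as pd

# Rebuild the original 4x4 frame under a separate, hypothetical name.
demo = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

cols = ["one", "two"]

# Column selection by list: both spellings give the same DataFrame.
frames_match = demo[cols].equals(demo.loc[:, cols])

# Column selection by single label: both spellings give the same Series.
series_match = demo["two"].equals(demo.loc[:, "two"])

print(frames_match, series_match)  # True True
```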

(3)df.loc[val1,val2]

  • val1 may be a single index label or a list of index labels, and val2 may be a single column label or a list of column labels; df.loc[val1,val2] selects the data at the intersection of val1 and val2. Either val1 or val2 (or both) can also be : to select everything along that axis.
df.loc['Ohio','one']
0
df.loc[['Ohio','Utah'],'one']
Ohio    0
Utah    8
Name: one, dtype: int32
df.loc['Ohio',['one','two']]
one    0
two    0
Name: Ohio, dtype: int32
df.loc[['Ohio','Utah'],['one','two']]
      one  two
Ohio    0    0
Utah    8    9
df.loc[:,:]
          one  two  three  four
Ohio        0    0      0     0
Colorado    0    5      6     7
Utah        8    9     10    11
New York   12   13     14    15
df.loc['Ohio',:]
one      0
two      0
three    0
four     0
Name: Ohio, dtype: int32
df.loc[:,'two']
Ohio         0
Colorado     5
Utah         9
New York    13
Name: two, dtype: int32
df.loc[:,['one','two']]
          one  two
Ohio        0    0
Colorado    0    5
Utah        8    9
New York   12   13
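The result type of df.loc[val1,val2] depends on whether each argument is a single label or a list. The sketch below tabulates the four combinations on a fresh copy of the original frame; demo and cases are hypothetical names.

```python
import numpy as np
import pandas as pd

# Rebuild the original 4x4 frame under a separate, hypothetical name.
demo = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

# Each key describes the shape of (val1, val2); each value is the selection.
cases = {
    "label, label": demo.loc["Ohio", "one"],
    "list, label": demo.loc[["Ohio", "Utah"], "one"],
    "label, list": demo.loc["Ohio", ["one", "two"]],
    "list, list": demo.loc[["Ohio", "Utah"], ["one", "two"]],
}

for shape, result in cases.items():
    # np.ndim is 0 for a single value, 1 for a Series, 2 for a DataFrame.
    print(f"{shape:<14} -> {np.ndim(result)} dimension(s)")
```

Two single labels give one value, exactly one list gives a Series, and two lists give a DataFrame.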

(4) df.iloc[val]

  • Works like df.loc[val], except that val must be an integer or a list of integers giving row positions rather than index labels.
df.iloc[1]
one      0
two      5
three    6
four     7
Name: Colorado, dtype: int32
df.iloc[[1,3]]
          one  two  three  four
Colorado    0    5      6     7
New York   12   13     14    15

(5)df.iloc[:,val]

  • Works like df.loc[:,val], except that val must be an integer or a list of integers giving column positions.
df
          one  two  three  four
Ohio        0    0      0     0
Colorado    0    5      6     7
Utah        8    9     10    11
New York   12   13     14    15
df.iloc[:,1]
Ohio         0
Colorado     5
Utah         9
New York    13
Name: two, dtype: int32
df.iloc[:,[1,3]]
          two  four
Ohio        0     0
Colorado    5     7
Utah        9    11
New York   13    15

(6)df.iloc[val1,val2]

  • Works like df.loc[val1,val2], except that val1 and val2 must be integers or lists of integers (or :).
df.iloc[1,2]
6
df.iloc[1,[1,2,3]]
two      5
three    6
four     7
Name: Colorado, dtype: int32
df.iloc[[1,2],2]
Colorado     6
Utah        10
Name: three, dtype: int32
df.iloc[[1,2],[1,2]]
          two  three
Colorado    5      6
Utah        9     10
df.iloc[:,[1,2]]
          two  three
Ohio        0      0
Colorado    5      6
Utah        9     10
New York   13     14
df.iloc[[1,2],:]
          one  two  three  four
Colorado    0    5      6     7
Utah        8    9     10    11
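One difference between the two accessors that the examples above do not show is slicing: a label slice with df.loc includes both endpoints, while a position slice with df.iloc excludes the stop, as in ordinary Python. The sketch below compares them on a fresh copy of the original frame (hypothetical name demo) and uses Index.get_loc to translate a label into its position.

```python
import numpy as np
import pandas as pd

# Rebuild the original 4x4 frame under a separate, hypothetical name.
demo = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

by_label = demo.loc["Colorado":"Utah", "two":"three"]  # both endpoints included
by_position = demo.iloc[1:3, 1:3]                      # stop position excluded
print(by_label.equals(by_position))  # True

# Translate labels to positions when mixing the two styles.
row_pos = demo.index.get_loc("Utah")
col_pos = demo.columns.get_loc("three")
print(row_pos, col_pos, demo.iloc[row_pos, col_pos])  # 2 2 10
```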

(7)df.at[val1,val2]

  • val1 must be a single index label and val2 must be a single column label; df.at returns a single value.
df.at['Utah','one']
8
df.loc['Utah','one'] # same result as above
8
df.at[['Utah','Colorado'],'one'] # raises an exception because val1 is a list
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
D:\Anaconda\lib\site-packages\pandas\core\frame.py in _get_value(self, index, col, takeable)
2538 try:
-> 2539 return engine.get_value(series._values, index)
2540 except (TypeError, ValueError):

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

TypeError: '['Utah', 'Colorado']' is an invalid key

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-77-c52a9db91739> in <module>()
----> 1 df.at[['Utah','Colorado'],'one']

D:\Anaconda\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key)
2140
2141 key = self._convert_key(key)
-> 2142 return self.obj._get_value(*key, takeable=self._takeable)
2143
2144 def __setitem__(self, key, value):

D:\Anaconda\lib\site-packages\pandas\core\frame.py in _get_value(self, index, col, takeable)
2543 # use positional
2544 col = self.columns.get_loc(col)
-> 2545 index = self.index.get_loc(index)
2546 return self._get_value(index, col, takeable=True)
2547 _get_value.__doc__ = get_value.__doc__

D:\Anaconda\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
3076 'backfill or nearest lookups')
3077 try:
-> 3078 return self._engine.get_loc(key)
3079 except KeyError:
3080 return self._engine.get_loc(self._maybe_cast_indexer(key))

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

TypeError: '['Utah', 'Colorado']' is an invalid key
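The traceback above comes from the pandas version used when this was written. The exception class raised for a list key may differ in other pandas versions (that is an assumption, not something verified here), so the sketch below catches Exception broadly and then falls back to df.loc, which does accept a list. The frame is rebuilt under the hypothetical name demo.

```python
import numpy as np
import pandas as pd

# Rebuild the original 4x4 frame under a separate, hypothetical name.
demo = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

# .at only accepts one row label and one column label.
error_name = None
try:
    demo.at[["Utah", "Colorado"], "one"]
except Exception as exc:  # broad on purpose: the class may vary by version
    error_name = type(exc).__name__
    print("at rejected the list key with", error_name)

# .loc is the accessor to use when several rows are wanted.
subset = demo.loc[["Utah", "Colorado"], "one"]
print(subset.tolist())  # [8, 4]
```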

(8) df.iat[val1,val2]

  • Works like df.at, except that val1 and val2 must both be integers (positions).
df.iat[2,2]
10
df
          one  two  three  four
Ohio        0    0      0     0
Colorado    0    5      6     7
Utah        8    9     10    11
New York   12   13     14    15
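Both scalar accessors can also write a single cell, which the examples above do not show. A minimal sketch, working on a copy so the source frame stays intact; demo and scratch are hypothetical names.

```python
import numpy as np
import pandas as pd

# Rebuild the original 4x4 frame under a separate, hypothetical name.
demo = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

# Reading: the same cell by label and by position.
print(demo.at["Utah", "three"], demo.iat[2, 2])  # 10 10

# Writing: each accessor sets exactly one cell.
scratch = demo.copy()
scratch.at["Utah", "one"] = -1  # by labels
scratch.iat[0, 3] = 99          # by positions: row 0 is Ohio, column 3 is four

print(scratch.loc["Utah", "one"], scratch.loc["Ohio", "four"])  # -1 99
```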

Conclusion

  • In df[val], val can be a column label or a list of column labels, which selects whole columns; it can be a slice such as :num, which selects the corresponding rows; it can be a boolean Series, which filters rows; and it can be a boolean DataFrame, which masks values or, on assignment, sets them.
  • df.loc[val] is mainly used to select rows, or a combination of rows and columns, by label. val can be a single row label, a list of row labels, or a pair val1,val2, where each of val1 and val2 is a single label, a list of labels, or :. The pair form selects the data at the intersection of index labels val1 and column labels val2.
  • df.iloc[val] works like df.loc[val], except that it takes integer positions: a single integer, a list of integers, or :.
  • df.at[val1,val2] accepts only a single index label and a single column label and returns one value; df.iat[val1,val2] is the same with integer positions.
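As a compact recap of the points above, the sketch below runs one expression per indexing form and reports what kind of object comes back. The frame is rebuilt under the hypothetical name demo, and kind is an illustrative helper, not part of pandas.

```python
import numpy as np
import pandas as pd

# Rebuild the original 4x4 frame under a separate, hypothetical name.
demo = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)


def kind(result):
    """Classify a selection result (illustrative helper, not a pandas API)."""
    if isinstance(result, pd.DataFrame):
        return "DataFrame"
    if isinstance(result, pd.Series):
        return "Series"
    return "scalar"


examples = {
    'demo["one"]': demo["one"],
    'demo[["one", "two"]]': demo[["one", "two"]],
    "demo[:2]": demo[:2],
    'demo[demo["one"] > 4]': demo[demo["one"] > 4],
    'demo.loc["Utah"]': demo.loc["Utah"],
    'demo.loc[["Utah"], ["one"]]': demo.loc[["Utah"], ["one"]],
    "demo.iloc[1]": demo.iloc[1],
    "demo.iloc[[1, 2], [0, 1]]": demo.iloc[[1, 2], [0, 1]],
    'demo.at["Utah", "one"]': demo.at["Utah", "one"],
    "demo.iat[2, 0]": demo.iat[2, 0],
}

for expression, result in examples.items():
    print(f"{expression:<30} -> {kind(result)}")
```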
