Summary of Indexing operation in DataFrame of Pandas
For new pandas users, indexing a DataFrame can be confusing. This post lists each indexing form in detail and closes with a summary of what the exploration shows.
import pandas as pd
import numpy as np
df=pd.DataFrame(np.arange(16).reshape(4,4),index=['Ohio','Colorado','Utah','New York'],columns=['one','two','three','four']);df
| | one | two | three | four |
|---|---|---|---|---|
| Ohio | 0 | 1 | 2 | 3 |
| Colorado | 4 | 5 | 6 | 7 |
| Utah | 8 | 9 | 10 | 11 |
| New York | 12 | 13 | 14 | 15 |
(1) df[val]
- When val is a single column label, df[val] selects one column from the DataFrame and returns a Series.
df['one']
Ohio 0
Colorado 4
Utah 8
New York 12
Name: one, dtype: int32
- When val is a list of column labels, df[val] selects those columns from the DataFrame and returns a DataFrame.
df[['one','two']]
| | one | two |
|---|---|---|
| Ohio | 0 | 1 |
| Colorado | 4 | 5 |
| Utah | 8 | 9 |
| New York | 12 | 13 |
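The difference between passing a single label and passing a list can be checked directly. The following is a minimal sketch that rebuilds the same df as above; the variable names single and listed are only for illustration.

```python
import numpy as np
import pandas as pd

# Same example frame as in the post.
df = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

single = df["one"]    # a single label gives a Series
listed = df[["one"]]  # a list, even with one label, gives a DataFrame

print(type(single).__name__)       # Series
print(type(listed).__name__)       # DataFrame
print(single.shape, listed.shape)  # (4,) (4, 1)
```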
- When val is a slice such as :num, df[val] selects rows instead of columns. This is a convenience shortcut and is equivalent to df.iloc[:num], the indexer meant for position-based row selection.
df[:2]
| | one | two | three | four |
|---|---|---|---|---|
| Ohio | 0 | 1 | 2 | 3 |
| Colorado | 4 | 5 | 6 | 7 |
- When val is a boolean pd.Series whose index matches that of df, df[val] returns the rows whose value in the Series is True. pd.DataFrame.any and pd.DataFrame.all typically produce this kind of Series, which can then be passed as val in df[val] for filtering.
df.iloc[:2] # same result as df[:2]
| | one | two | three | four |
|---|---|---|---|---|
| Ohio | 0 | 1 | 2 | 3 |
| Colorado | 4 | 5 | 6 | 7 |
df[1:3]
| | one | two | three | four |
|---|---|---|---|---|
| Colorado | 4 | 5 | 6 | 7 |
| Utah | 8 | 9 | 10 | 11 |
df.iloc[1:3]
| | one | two | three | four |
|---|---|---|---|---|
| Colorado | 4 | 5 | 6 | 7 |
| Utah | 8 | 9 | 10 | 11 |
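The boolean Series case mentioned above has no example in the post, so here is a small sketch of it. It rebuilds the original df; the thresholds 9 and 3 and the names row_mask and all_mask are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

# any(axis=1) reduces each row to one boolean: True if any value in the row is > 9.
row_mask = (df > 9).any(axis=1)
print(df[row_mask])  # keeps the rows Utah and New York

# all(axis=1) is True only when every value in the row is > 3.
all_mask = (df > 3).all(axis=1)
print(df[all_mask])  # keeps the rows Colorado, Utah and New York
```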
- When val is a boolean DataFrame, df[val] keeps the values where val is True and fills the rest with NaN; assigning to df[val] sets the values where val is True.
df<5
| | one | two | three | four |
|---|---|---|---|---|
| Ohio | True | True | True | True |
| Colorado | True | False | False | False |
| Utah | False | False | False | False |
| New York | False | False | False | False |
df[df<5]
| | one | two | three | four |
|---|---|---|---|---|
| Ohio | 0.0 | 1.0 | 2.0 | 3.0 |
| Colorado | 4.0 | NaN | NaN | NaN |
| Utah | NaN | NaN | NaN | NaN |
| New York | NaN | NaN | NaN | NaN |
df[df<5]=0;df
| | one | two | three | four |
|---|---|---|---|---|
| Ohio | 0 | 0 | 0 | 0 |
| Colorado | 0 | 5 | 6 | 7 |
| Utah | 8 | 9 | 10 | 11 |
| New York | 12 | 13 | 14 | 15 |
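For comparison, the same two operations can also be written with DataFrame.where and DataFrame.mask, which return new objects instead of modifying df. This is a sketch on a fresh copy of the original df, not something the steps above rely on.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

mask = df < 5

# Selection: df[mask] keeps values where mask is True, NaN elsewhere,
# which matches df.where(mask).
selected = df[mask]
print(selected.equals(df.where(mask)))

# Replacement without touching df: mask() replaces values where mask is True.
replaced = df.mask(mask, 0)

# In-place assignment, as in the post.
df[mask] = 0
print((replaced.values == df.values).all())
```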
(2) df.loc[val]
- When val is a single index label, df.loc[val] selects the corresponding row and returns a Series; when val is a list of index labels, it selects the corresponding rows and returns a DataFrame.
df.loc['Colorado']
one 0
two 5
three 6
four 7
Name: Colorado, dtype: int32
df.loc[['Colorado','New York']]
| | one | two | three | four |
|---|---|---|---|---|
| Colorado | 0 | 5 | 6 | 7 |
| New York | 12 | 13 | 14 | 15 |
(3) df.loc[:,val]
- When val is a single column label, df.loc[:,val] selects the corresponding column and returns a Series; when val is a list of column labels, it selects the corresponding columns and returns a DataFrame.
df.loc[:,'two']
Ohio 0
Colorado 5
Utah 9
New York 13
Name: two, dtype: int32
df.loc[:,['two']] # Note: whenever val is a list, even one with a single element, the result is a DataFrame.
| | two |
|---|---|
| Ohio | 0 |
| Colorado | 5 |
| Utah | 9 |
| New York | 13 |
df.loc[:,['one','two']]
| | one | two |
|---|---|---|
| Ohio | 0 | 0 |
| Colorado | 0 | 5 |
| Utah | 8 | 9 |
| New York | 12 | 13 |
df[['one','two']] # same result as df.loc[:,['one','two']]
| | one | two |
|---|---|---|
| Ohio | 0 | 0 |
| Colorado | 0 | 5 |
| Utah | 8 | 9 |
| New York | 12 | 13 |
(3) df.loc[val1,val2]
- val1 may be a single index label or a list of index labels, and val2 may be a single column label or a list of column labels; df.loc[val1,val2] selects the data at their intersection. Either val1 or val2 (or both) can also be : to select everything along that axis.
df.loc['Ohio','one']
0
df.loc[['Ohio','Utah'],'one']
Ohio 0
Utah 8
Name: one, dtype: int32
df.loc['Ohio',['one','two']]
one 0
two 0
Name: Ohio, dtype: int32
df.loc[['Ohio','Utah'],['one','two']]
| | one | two |
|---|---|---|
| Ohio | 0 | 0 |
| Utah | 8 | 9 |
df.loc[:,:]
| | one | two | three | four |
|---|---|---|---|---|
| Ohio | 0 | 0 | 0 | 0 |
| Colorado | 0 | 5 | 6 | 7 |
| Utah | 8 | 9 | 10 | 11 |
| New York | 12 | 13 | 14 | 15 |
df.loc['Ohio',:]
one 0
two 0
three 0
four 0
Name: Ohio, dtype: int32
df.loc[:,'two']
Ohio 0
Colorado 5
Utah 9
New York 13
Name: two, dtype: int32
df.loc[:,['one','two']]
| | one | two |
|---|---|---|
| Ohio | 0 | 0 |
| Colorado | 0 | 5 |
| Utah | 8 | 9 |
| New York | 12 | 13 |
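The return type of df.loc[val1,val2] follows from whether each selector is a single label or a list. The sketch below checks this on a fresh copy of the original df; the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

cell = df.loc["Ohio", "one"]                       # label, label -> scalar
col_part = df.loc[["Ohio", "Utah"], "one"]         # list, label  -> Series
row_part = df.loc["Ohio", ["one", "two"]]          # label, list  -> Series
block = df.loc[["Ohio", "Utah"], ["one", "two"]]   # list, list   -> DataFrame
one_cell_frame = df.loc[["Ohio"], ["one"]]         # still a DataFrame, shape (1, 1)

for name, obj in [("col_part", col_part), ("row_part", row_part),
                  ("block", block), ("one_cell_frame", one_cell_frame)]:
    print(name, type(obj).__name__, obj.shape)
```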
(4) df.iloc[val]
- Works like df.loc[val], except that val must be an integer or a list of integers giving row positions rather than index labels.
df.iloc[1]
one 0
two 5
three 6
four 7
Name: Colorado, dtype: int32
df.iloc[[1,3]]
| | one | two | three | four |
|---|---|---|---|---|
| Colorado | 0 | 5 | 6 | 7 |
| New York | 12 | 13 | 14 | 15 |
(5) df.iloc[:,val]
- The same as df.loc[:,val], except that val must be an integer or a list of integers.
df
| | one | two | three | four |
|---|---|---|---|---|
| Ohio | 0 | 0 | 0 | 0 |
| Colorado | 0 | 5 | 6 | 7 |
| Utah | 8 | 9 | 10 | 11 |
| New York | 12 | 13 | 14 | 15 |
df.iloc[:,1]
Ohio 0
Colorado 5
Utah 9
New York 13
Name: two, dtype: int32
df.iloc[:,[1,3]]
| | two | four |
|---|---|---|
| Ohio | 0 | 0 |
| Colorado | 5 | 7 |
| Utah | 9 | 11 |
| New York | 13 | 15 |
(6) df.iloc[val1,val2]
- The same as df.loc[val1,val2], except that val1 and val2 must be integers or lists of integers.
df.iloc[1,2]
6
df.iloc[1,[1,2,3]]
two 5
three 6
four 7
Name: Colorado, dtype: int32
df.iloc[[1,2],2]
Colorado 6
Utah 10
Name: three, dtype: int32
df.iloc[[1,2],[1,2]]
| | two | three |
|---|---|---|
| Colorado | 5 | 6 |
| Utah | 9 | 10 |
df.iloc[:,[1,2]]
| | two | three |
|---|---|---|
| Ohio | 0 | 0 |
| Colorado | 5 | 6 |
| Utah | 9 | 10 |
| New York | 13 | 14 |
df.iloc[[1,2],:]
| | one | two | three | four |
|---|---|---|---|---|
| Colorado | 0 | 5 | 6 | 7 |
| Utah | 8 | 9 | 10 | 11 |
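To make the correspondence between df.loc and df.iloc concrete, labels can be translated to positions with Index.get_indexer. The sketch below does this on a fresh copy of the original df. It also shows one difference that the sections above do not cover and that is added here as a side note: a label slice in df.loc includes its end label, while an integer slice in df.iloc excludes its end position.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

# Translate labels into integer positions.
rows = df.index.get_indexer(["Colorado", "Utah"])  # positions 1 and 2
cols = df.columns.get_indexer(["two", "three"])    # positions 1 and 2

by_label = df.loc[["Colorado", "Utah"], ["two", "three"]]
by_position = df.iloc[rows, cols]
print(by_label.equals(by_position))  # True

# Slices behave differently at the end point.
print(len(df.loc["Ohio":"Utah"]))  # 3 rows: the end label is included
print(len(df.iloc[0:2]))           # 2 rows: the end position is excluded
```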
(7) df.at[val1,val2]
- val1 must be a single index label and val2 a single column label.
df.at['Utah','one']
8
df.loc['Utah','one'] # same result as df.at['Utah','one']
8
df.at[['Utah','Colorado'],'one'] # raises an exception because val1 is a list
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
D:\Anaconda\lib\site-packages\pandas\core\frame.py in _get_value(self, index, col, takeable)
2538 try:
-> 2539 return engine.get_value(series._values, index)
2540 except (TypeError, ValueError):
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
TypeError: '['Utah', 'Colorado']' is an invalid key
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
<ipython-input-77-c52a9db91739> in <module>()
----> 1 df.at[['Utah','Colorado'],'one']
D:\Anaconda\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key)
2140
2141 key = self._convert_key(key)
-> 2142 return self.obj._get_value(*key, takeable=self._takeable)
2143
2144 def __setitem__(self, key, value):
D:\Anaconda\lib\site-packages\pandas\core\frame.py in _get_value(self, index, col, takeable)
2543 # use positional
2544 col = self.columns.get_loc(col)
-> 2545 index = self.index.get_loc(index)
2546 return self._get_value(index, col, takeable=True)
2547 _get_value.__doc__ = get_value.__doc__
D:\Anaconda\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
3076 'backfill or nearest lookups')
3077 try:
-> 3078 return self._engine.get_loc(key)
3079 except KeyError:
3080 return self._engine.get_loc(self._maybe_cast_indexer(key))
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
TypeError: '['Utah', 'Colorado']' is an invalid key
(8) df.iat[val1,val2]
- The same as df.at, except that val1 and val2 must both be integers.
df.iat[2,2]
10
df
| | one | two | three | four |
|---|---|---|---|---|
| Ohio | 0 | 0 | 0 | 0 |
| Colorado | 0 | 5 | 6 | 7 |
| Utah | 8 | 9 | 10 | 11 |
| New York | 12 | 13 | 14 | 15 |
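The scalar accessors can be exercised together in a short sketch on a fresh copy of the original df. The exact exception type raised for a list key depends on the pandas version (the traceback above shows TypeError), so the sketch catches Exception broadly and only prints the type name. The value 100 is an arbitrary illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

value_at = df.at["Utah", "one"]  # label-based scalar access
value_iat = df.iat[2, 0]         # position-based scalar access, same cell
print(value_at, value_iat)       # 8 8

# A list is not a valid key for df.at; the exception type varies by version.
try:
    df.at[["Utah", "Colorado"], "one"]
    raised = False
except Exception as exc:
    raised = True
    print(type(exc).__name__)

# df.at can also assign a single cell.
df.at["Utah", "one"] = 100
print(df.loc["Utah", "one"])     # 100
```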
Conclusion
- In df[val], val can be a column label or a list of column labels, in which case whole columns are selected. As a special case, val can be a slice such as :val, which selects the corresponding rows. val can also be a boolean DataFrame, which is mainly used to set values.
- Generally speaking, df.loc is mainly used to select rows or a combination of rows and columns, so its argument takes these forms: a single row label, a list of row labels, or val1,val2, where val1 and val2 can each be a single value, a list of values, or :. In the last form it selects the intersection of index labels val1 and column labels val2.
- df.iloc works like df.loc, except that it requires integers, either a single integer or a list of integers.
- df.at[val1,val2] accepts only a single value for each of val1 and val2, and the same applies to df.iat[val1,val2].