Pandas中DataFrame数据合并、连接(concat、merge、join)之join
pandas.DataFrame.join
自己弄了很久,一看官网。感觉自己宛如智障。不要脸了,直接抄
DataFrame.join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False)-
Join columns with other DataFrame either on index or on a key column. Efficiently Join multiple DataFrame objects by index at once by passing a list.
Parameters: other : DataFrame, Series with name field set, or list of DataFrame
Index should be similar to one of the columns in this one. If a Series is passed, its name attribute must be set, and that will be used as the column name in the resulting joined DataFrame
on : column name, tuple/list of column names, or array-like
Column(s) in the caller to join on the index in other, otherwise joins index-on-index. If multiples columns given, the passed DataFrame must have a MultiIndex. Can pass an array as the join key if not already contained in the calling DataFrame. Like an Excel VLOOKUP operation
how : {‘left’, ‘right’, ‘outer’, ‘inner’}, default: ‘left’
How to handle the operation of the two objects.
left: use calling frame’s index (or column if on is specified)
right: use other frame’s index
- outer: form union of calling frame’s index (or column if on is
-
specified) with other frame’s index
- inner: form intersection of calling frame’s index (or column if
-
on is specified) with other frame’s index
lsuffix : string
Suffix to use from left frame’s overlapping columns
rsuffix : string
Suffix to use from right frame’s overlapping columns
sort : boolean, default False
Order result DataFrame lexicographically by the join key. If False, preserves the index order of the calling (left) DataFrame
Returns: joined : DataFrame
See also
DataFrame.merge- For column(s)-on-columns(s) operations
Notes
on, lsuffix, and rsuffix options are not supported when passing a list of DataFrame objects
Examples
>>> caller = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3', 'K4', 'K5'],
... 'A': ['A0', 'A1', 'A2', 'A3', 'A4', 'A5']})>>> caller
A key
0 A0 K0
1 A1 K1
2 A2 K2
3 A3 K3
4 A4 K4
5 A5 K5>>> other = pd.DataFrame({'key': ['K0', 'K1', 'K2'],
... 'B': ['B0', 'B1', 'B2']})>>> other
B key
0 B0 K0
1 B1 K1
2 B2 K2Join DataFrames using their indexes.==》join on indexes
>>> caller.join(other, lsuffix='_caller', rsuffix='_other')
>>> A key_caller B key_other
0 A0 K0 B0 K0
1 A1 K1 B1 K1
2 A2 K2 B2 K2
3 A3 K3 NaN NaN
4 A4 K4 NaN NaN
5 A5 K5 NaN NaNIf we want to join using the key columns, we need to set key to be the index in both caller and other. The joined DataFrame will have key as its index.
>>> caller.set_index('key').join(other.set_index('key'))>>> A B
key
K0 A0 B0
K1 A1 B1
K2 A2 B2
K3 A3 NaN
K4 A4 NaN
K5 A5 NaNAnother option to join using the key columns is to use the on parameter. DataFrame.join always uses other’s index but we can use any column in the caller. This method preserves the original caller’s index in the result.
>>> caller.join(other.set_index('key'), on='key')>>> A key B
0 A0 K0 B0
1 A1 K1 B1
2 A2 K2 B2
3 A3 K3 NaN
4 A4 K4 NaN
5 A5 K5 NaN
Pandas中DataFrame数据合并、连接(concat、merge、join)之join的更多相关文章
- Pandas中DataFrame数据合并、连接(concat、merge、join)之concat
一.concat:沿着一条轴,将多个对象堆叠到一起 concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False, key ...
- Pandas中DataFrame数据合并、连接(concat、merge、join)之merge
二.merge:通过键拼接列 类似于关系型数据库的连接方式,可以根据一个或多个键将不同的DatFrame连接起来. 该函数的典型应用场景是,针对同一个主键存在两张不同字段的表,根据主键整合到一张表里面 ...
- Python基础 | pandas中dataframe的整合与形变(merge & reshape)
目录 行的union pd.concat df.append 列的join pd.concat pd.merge df.join 行列转置 pivot stack & unstack melt ...
- Spark与Pandas中DataFrame对比
Pandas Spark 工作方式 单机single machine tool,没有并行机制parallelism不支持Hadoop,处理大量数据有瓶颈 分布式并行计算框架,内建并行机制paral ...
- Spark与Pandas中DataFrame对比(详细)
Pandas Spark 工作方式 单机single machine tool,没有并行机制parallelism不支持Hadoop,处理大量数据有瓶颈 分布式并行计算框架,内建并行机制paral ...
- 将pandas的DataFrame数据写入MySQL数据库 + sqlalchemy
将pandas的DataFrame数据写入MySQL数据库 + sqlalchemy import pandas as pd from sqlalchemy import create_engine ...
- 排序合并连接(sort merge join)的原理
排序合并连接(sort merge join)的原理 排序合并连接(sort merge join)的原理 排序合并连接(sort merge join) 访问次数:两张表都只会访 ...
- Python3 Pandas的DataFrame数据的增、删、改、查
Python3 Pandas的DataFrame数据的增.删.改.查 一.DataFrame数据准备 增.删.改.查的方法有很多很多种,这里只展示出常用的几种. 参数inplace默认为False,只 ...
- Pandas中DataFrame修改列名
Pandas中DataFrame修改列名:使用 rename df = pd.read_csv('I:/Papers/consumer/codeandpaper/TmallData/result01- ...
随机推荐
- Springboot使用launch.script打包后解压缩
今天拿到一个SpringBoot的jar需要反编译看一下.直接用工具试了下提示报错. 于是用文本工具打开看了下,发现此JAR包在打包时候引入了启动脚本.如下图: 为了反编译,可以直接将所有启动脚本相关 ...
- 本地代码库,提交远程git
1.在git上新建项目,并填好相关信息 2.新建成功后,复制项目地址 3.idea新建本地仓库 4.Add所有文件,然后提交(commit) 5.先打开push界面,设置git远程地址,然后关掉,先p ...
- BZOJ 4835: 遗忘之树
传送门 首先设 $f[x]$ 表示点分树上 $x$ 的子树内的方案数 发现对于 $x$ 的每个儿子 $v$ ,$x$ 似乎可以向 $v$ 子树内的每个节点连边,因为不管怎么连重心都不会变 显然是错的, ...
- django 项目开发及部署遇到的坑
1.django 连接oracle数据库遇到的坑 需求:通过plsql建立的oracle数据表,想要django操作这几个表 python manage.py inspectdb table_name ...
- nnginx配置代理服务器
因为有些服务有ip白名单的限制,部署多节点后ip很容易就不够用了,所以可以将这些服务部署到其中的一些机器上, 并且部署代理服务器,然后其余机器以代理的方式访问服务.开始是以tinyproxy作为代理服 ...
- Flink的时间类型和watermark机制
一FlinkTime类型 有3类时间,分别是数据本身的产生时间.进入Flink系统的时间和被处理的时间,在Flink系统中的数据可以有三种时间属性: Event Time 是每条数据在其生产设备上发生 ...
- java实现spark常用算子之Take
import org.apache.spark.SparkConf;import org.apache.spark.api.java.JavaRDD;import org.apache.spark.a ...
- Presto基础知识
背景 MapReduce不能满足大数据快速实时adhoc查询计算的性能要求. Facebook的数据仓库存储在少量大型Hadoop/HDFS集群.Hive是Facebook在几年前专为Hadoop打造 ...
- 微信小程序onLoad、onShow、onHide、onUnload区别
onLoad:页面第一次加载时触发,从跳转页面返回时不能触发,可以传递参数 onShow:页面显示或从后台跳回小程序时显示此页面时触发,从跳转页面返回时触发,不能传递参数 onHide:页面隐藏,例如 ...
- python中进制转换
使用Python内置函数:bin().oct().int().hex()可实现进制转换. 先看Python官方文档中对这几个内置函数的描述: bin(x)Convert an integer numb ...