(4) Combining data in pandas
# Key topic
pandas offers two families of combining operations:
- Concatenation: pd.concat, DataFrame.append
- Merging: pd.merge, DataFrame.join
0. Review: concatenation in NumPy
import numpy as np
import pandas as pd
from pandas import Series,DataFrame
============================================
Exercise 12:
- Create two 3*3 matrices and concatenate them along each of the two axes
============================================
nd1 =np.array([1,2,3])
nd2 =np.array([-1,-2,-3,-4])
np.concatenate([nd1,nd2])
array([ 1, 2, 3, -1, -2, -3, -4])
nd3 = np.array([[-1,-2,-3],[0,2,4]])
nd1 + nd3
array([[0, 0, 0],
[1, 4, 7]])
nd1.shape
(3,)
nd3.shape
(2, 3)
nd1 + nd2
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-10-cffcceec071c> in <module>()
----> 1 nd1 + nd2
ValueError: operands could not be broadcast together with shapes (3,) (4,)
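np.concatenate also works on 2-D arrays, where the axis argument picks the direction. The sketch below uses two made-up 3*3 matrices (m1 and m2 are illustrative names, not part of the original notebook) and is one possible starting point for the exercise above.

```python
import numpy as np

# Two 3x3 matrices with easily recognisable values.
m1 = np.arange(9).reshape(3, 3)
m2 = np.arange(9, 18).reshape(3, 3)

# axis=0 stacks the rows: 6 rows, 3 columns.
rows = np.concatenate([m1, m2], axis=0)

# axis=1 stacks the columns: 3 rows, 6 columns.
cols = np.concatenate([m1, m2], axis=1)

print(rows.shape, cols.shape)
```

Unlike the element-wise + shown above, concatenation only requires the shapes to agree on the axes that are not being joined.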
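To make the examples easier to follow, first define a helper that builds a DataFrame:
def make_df(cols, inds):
    # Each cell is the column name followed by the row label
    data = {c: [c + str(i) for i in inds] for c in cols}
    return DataFrame(data, index=inds)
# when c = "a", the column holds a1 a2 a3
# when c = "b", the column holds b1 b2 b3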
df1 = make_df(list("abc"),[1,2,3])
df1
| | a | b | c |
|---|---|---|---|
| 1 | a1 | b1 | c1 |
| 2 | a2 | b2 | c2 |
| 3 | a3 | b3 | c3 |
df2 = make_df(list('abc'),[4,5,6])
df2
| | a | b | c |
|---|---|---|---|
| 4 | a4 | b4 | c4 |
| 5 | a5 | b5 | c5 |
| 6 | a6 | b6 | c6 |
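1. Concatenating with pd.concat()
pd.concat works much like np.concatenate but takes more parameters (the signature below is from an older pandas release; join_axes was removed in pandas 1.0):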
pd.concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False,
keys=None, levels=None, names=None, verify_integrity=False,
copy=True)
1) Simple concatenation
As with np.concatenate, rows are added by default (axis=0)
pd.concat([df1,df2])
# When concatenating, always pay attention to the axis!
| | a | b | c |
|---|---|---|---|
| 1 | a1 | b1 | c1 |
| 2 | a2 | b2 | c2 |
| 3 | a3 | b3 | c3 |
| 4 | a4 | b4 | c4 |
| 5 | a5 | b5 | c5 |
| 6 | a6 | b6 | c6 |
df1
| | a | b | c |
|---|---|---|---|
| 1 | a1 | b1 | c1 |
| 2 | a2 | b2 | c2 |
| 3 | a3 | b3 | c3 |
df3 =make_df(list("def"),[1,2,3])
df3
| | d | e | f |
|---|---|---|---|
| 1 | d1 | e1 | f1 |
| 2 | d2 | e2 | f2 |
| 3 | d3 | e3 | f3 |
df1 + df3
| | a | b | c | d | e | f |
|---|---|---|---|---|---|---|
| 1 | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | NaN | NaN | NaN | NaN | NaN | NaN |
pd.concat([df1, df3], axis = 1)
| | a | b | c | d | e | f |
|---|---|---|---|---|---|---|
| 1 | a1 | b1 | c1 | d1 | e1 | f1 |
| 2 | a2 | b2 | c2 | d2 | e2 | f2 |
| 3 | a3 | b3 | c3 | d3 | e3 | f3 |
pd.concat([df1,df2],axis = 1)
| | a | b | c | a | b | c |
|---|---|---|---|---|---|---|
| 1 | a1 | b1 | c1 | NaN | NaN | NaN |
| 2 | a2 | b2 | c2 | NaN | NaN | NaN |
| 3 | a3 | b3 | c3 | NaN | NaN | NaN |
| 4 | NaN | NaN | NaN | a4 | b4 | c4 |
| 5 | NaN | NaN | NaN | a5 | b5 | c5 |
| 6 | NaN | NaN | NaN | a6 | b6 | c6 |
Set axis to change the direction of concatenation
Note that index labels may end up duplicated after concatenation
df1
| | a | b | c |
|---|---|---|---|
| 1 | a1 | b1 | c1 |
| 2 | a2 | b2 | c2 |
| 3 | a3 | b3 | c3 |
df4 = make_df(list('abc'),[2,3,4])
df4
| | a | b | c |
|---|---|---|---|
| 2 | a2 | b2 | c2 |
| 3 | a3 | b3 | c3 |
| 4 | a4 | b4 | c4 |
pd.concat([df1,df4])
| | a | b | c |
|---|---|---|---|
| 1 | a1 | b1 | c1 |
| 2 | a2 | b2 | c2 |
| 3 | a3 | b3 | c3 |
| 2 | a2 | b2 | c2 |
| 3 | a3 | b3 | c3 |
| 4 | a4 | b4 | c4 |
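If duplicate labels are not acceptable, the verify_integrity parameter from the signature above can be used to reject them. A minimal sketch, using two small hand-made frames (left and right are illustrative names) whose indexes overlap on 2 and 3:

```python
import pandas as pd

left = pd.DataFrame({"a": ["a1", "a2", "a3"]}, index=[1, 2, 3])
right = pd.DataFrame({"a": ["a2", "a3", "a4"]}, index=[2, 3, 4])

# verify_integrity=True makes concat raise instead of silently
# producing a result with repeated index labels.
try:
    pd.concat([left, right], verify_integrity=True)
    overlap_detected = False
except ValueError:
    overlap_detected = True

print(overlap_detected)
```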
Alternatively, pass ignore_index=True to discard the original labels and renumber the rows
pd.concat([df1,df4],ignore_index=True)
| | a | b | c |
|---|---|---|---|
| 0 | a1 | b1 | c1 |
| 1 | a2 | b2 | c2 |
| 2 | a3 | b3 | c3 |
| 3 | a2 | b2 | c2 |
| 4 | a3 | b3 | c3 |
| 5 | a4 | b4 | c4 |
Or use keys to build a multi-level index
# General form: pd.concat([x, y], keys=['x', 'y'])
pd.concat([df1,df4],keys = ["三班","四班"])
| | | a | b | c |
|---|---|---|---|---|
| 三班 | 1 | a1 | b1 | c1 |
| | 2 | a2 | b2 | c2 |
| | 3 | a3 | b3 | c3 |
| 四班 | 2 | a2 | b2 | c2 |
| | 3 | a3 | b3 | c3 |
| | 4 | a4 | b4 | c4 |
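With keys, the outer index level records which input each row came from, so repeated inner labels stay distinguishable. The sketch below uses two small hypothetical frames x and y to show how rows can be read back from such a result.

```python
import pandas as pd

x = pd.DataFrame({"a": ["a1", "a2"]}, index=[1, 2])
y = pd.DataFrame({"a": ["a2", "a3"]}, index=[2, 3])

both = pd.concat([x, y], keys=["x", "y"])

# Selecting on the outer level returns the rows that came from y.
from_y = both.loc["y"]

# A (key, original label) tuple addresses a single row unambiguously,
# even though the label 2 occurs in both inputs.
cell = both.loc[("y", 2), "a"]

print(from_y)
print(cell)
```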
============================================
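Exercise 13:
Think about where concatenation is useful.
Using what you learned yesterday, build a midterm score table ddd for Zhang San and Li Si
Suppose a new exam subject, "Computer Science", is added. How would you add it?
How would you add the scores of a new student, Wang Laowu?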
============================================
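2) Concatenating mismatched tables
"Mismatched" means the labels on the other axis differ: the column labels differ when concatenating vertically, or the row labels differ when concatenating horizontally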
df1
| | a | b | c |
|---|---|---|---|
| 1 | a1 | b1 | c1 |
| 2 | a2 | b2 | c2 |
| 3 | a3 | b3 | c3 |
df5 = make_df(list("abcd"),[3,4,5,6])
df5
| | a | b | c | d |
|---|---|---|---|---|
| 3 | a3 | b3 | c3 | d3 |
| 4 | a4 | b4 | c4 | d4 |
| 5 | a5 | b5 | c5 | d5 |
| 6 | a6 | b6 | c6 | d6 |
pd.concat([df1,df5])
| | a | b | c | d |
|---|---|---|---|---|
| 1 | a1 | b1 | c1 | NaN |
| 2 | a2 | b2 | c2 | NaN |
| 3 | a3 | b3 | c3 | NaN |
| 3 | a3 | b3 | c3 | d3 |
| 4 | a4 | b4 | c4 | d4 |
| 5 | a5 | b5 | c5 | d5 |
| 6 | a6 | b6 | c6 | d6 |
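There are 3 ways to join:
- Outer join: fill the gaps with NaN (the default)
# This is the case shown above, the default behaviour
# join='outer'
- Inner join: keep only the labels that match
pd.concat([df1,df5],join = "inner")
# Only the columns present in every input are kept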
| | a | b | c |
|---|---|---|---|
| 1 | a1 | b1 | c1 |
| 2 | a2 | b2 | c2 |
| 3 | a3 | b3 | c3 |
| 3 | a3 | b3 | c3 |
| 4 | a4 | b4 | c4 |
| 5 | a5 | b5 | c5 |
| 6 | a6 | b6 | c6 |
- Join on a specified axis: join_axes (pandas < 1.0 only)
df6 = make_df(list("abcz"), [3,4,7,8])
df6
| | a | b | c | z |
|---|---|---|---|---|
| 3 | a3 | b3 | c3 | z3 |
| 4 | a4 | b4 | c4 | z4 |
| 7 | a7 | b7 | c7 | z7 |
| 8 | a8 | b8 | c8 | z8 |
type(df6.columns)
pandas.core.indexes.base.Index
df6.columns
Index(['a', 'b', 'c', 'z'], dtype='object')
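pd.concat([df6,df5,df2,df1]).reindex(columns=df6.columns)
# axis: one axis; axes: several axes
# In pandas < 1.0 this was written pd.concat([...], join_axes=[df6.columns]), where join_axes is a list of Index objects
# join_axes was removed in pandas 1.0, so the result is reindexed on the wanted columns instead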
| | a | b | c | z |
|---|---|---|---|---|
| 3 | a3 | b3 | c3 | z3 |
| 4 | a4 | b4 | c4 | z4 |
| 7 | a7 | b7 | c7 | z7 |
| 8 | a8 | b8 | c8 | z8 |
| 3 | a3 | b3 | c3 | NaN |
| 4 | a4 | b4 | c4 | NaN |
| 5 | a5 | b5 | c5 | NaN |
| 6 | a6 | b6 | c6 | NaN |
| 4 | a4 | b4 | c4 | NaN |
| 5 | a5 | b5 | c5 | NaN |
| 6 | a6 | b6 | c6 | NaN |
| 1 | a1 | b1 | c1 | NaN |
| 2 | a2 | b2 | c2 | NaN |
| 3 | a3 | b3 | c3 | NaN |
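The three ways of handling mismatched columns can be compared side by side. This is a minimal sketch: it redefines the make_df helper so that it runs on its own, the names top and bottom are illustrative, and the reindex call is the replacement commonly used on pandas 1.0 and later, where join_axes no longer exists.

```python
import pandas as pd

def make_df(cols, inds):
    # Same helper as earlier: each cell is column name + row label.
    data = {c: [c + str(i) for i in inds] for c in cols}
    return pd.DataFrame(data, index=inds)

top = make_df("abc", [1, 2, 3])
bottom = make_df("abcd", [3, 4, 5, 6])

# Outer join (default): union of the columns, NaN where a value is missing.
outer = pd.concat([top, bottom])

# Inner join: only the columns shared by every input.
inner = pd.concat([top, bottom], join="inner")

# Fixed set of columns: concatenate, then keep exactly the columns wanted.
fixed = pd.concat([top, bottom]).reindex(columns=["a", "b", "d"])

print(list(outer.columns), list(inner.columns), list(fixed.columns))
```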
============================================
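Exercise 14:
Suppose the [final] exam table ddd2 has no scores for Zhang San, only for Li Si, Wang Laowu and Zhao Xiaoliu. Concatenate using several different methods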
============================================
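3) Appending with append()
Adding rows at the end is so common that DataFrame had a dedicated append method for it (deprecated in pandas 1.4 and removed in 2.0; on current versions use pd.concat instead)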
s1 = ["123"]
s1.append('456')
s1
['123', '456']
# append behaves much like concat
df1.append(df2)
| | a | b | c |
|---|---|---|---|
| 1 | a1 | b1 | c1 |
| 2 | a2 | b2 | c2 |
| 3 | a3 | b3 | c3 |
| 4 | a4 | b4 | c4 |
| 5 | a5 | b5 | c5 |
| 6 | a6 | b6 | c6 |
df5
| | a | b | c | d |
|---|---|---|---|---|
| 3 | a3 | b3 | c3 | d3 |
| 4 | a4 | b4 | c4 | d4 |
| 5 | a5 | b5 | c5 | d5 |
| 6 | a6 | b6 | c6 | d6 |
df5.append(df1)
| | a | b | c | d |
|---|---|---|---|---|
| 3 | a3 | b3 | c3 | d3 |
| 4 | a4 | b4 | c4 | d4 |
| 5 | a5 | b5 | c5 | d5 |
| 6 | a6 | b6 | c6 | d6 |
| 1 | a1 | b1 | c1 | NaN |
| 2 | a2 | b2 | c2 | NaN |
| 3 | a3 | b3 | c3 | NaN |
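On pandas versions that no longer have DataFrame.append, the same results can be obtained with pd.concat. A minimal sketch with two hand-made frames (first and second are illustrative names):

```python
import pandas as pd

first = pd.DataFrame({"a": ["a1", "a2"], "b": ["b1", "b2"]}, index=[1, 2])
second = pd.DataFrame({"a": ["a3"], "b": ["b3"]}, index=[3])

# Plays the role of first.append(second).
combined = pd.concat([first, second])

# When adding many pieces, collect them in a list and concatenate once
# rather than growing a DataFrame step by step.
pieces = [first, second, second]
stacked = pd.concat(pieces, ignore_index=True)

print(combined)
print(stacked)
```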
============================================
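Exercise 15:
Create a final-exam score sheet ddd3 containing only Zhang San, Li Si and Wang Laowu, and use append() to concatenate it with the midterm table ddd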
============================================
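2. Merging with pd.merge()
# Key topic
# The two DataFrames must have a column in common before they can be merged this way
merge differs from concat in that it combines the tables by matching values in a shared row or column
By default, pd.merge() uses the column whose name appears in both DataFrames as the key.
Note that the values in each column do not need to be in the same order
1) One-to-one merge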
df1 = DataFrame({"age":[30,22,36],"work":['tech',"accounting","sell"],"sex":["男","女","女"]}, index = list("abc"))
df1
| | age | sex | work |
|---|---|---|---|
| a | 30 | 男 | tech |
| b | 22 | 女 | accounting |
| c | 36 | 女 | sell |
df2 = DataFrame({"home":["上海","安徽","山东"],"work":['tech',"accounting","sell"],"weight":[60,50,45]},
index = list("abc"))
df2
| | home | weight | work |
|---|---|---|---|
| a | 上海 | 60 | tech |
| b | 安徽 | 50 | accounting |
| c | 山东 | 45 | sell |
pd.concat([df1,df2],axis = 1)
| | age | sex | work | home | weight | work |
|---|---|---|---|---|---|---|
| a | 30 | 男 | tech | 上海 | 60 | tech |
| b | 22 | 女 | accounting | 安徽 | 50 | accounting |
| c | 36 | 女 | sell | 山东 | 45 | sell |
df1.merge(df2)
| | age | sex | work | home | weight |
|---|---|---|---|---|---|
| 0 | 30 | 男 | tech | 上海 | 60 |
| 1 | 22 | 女 | accounting | 安徽 | 50 |
| 2 | 36 | 女 | sell | 山东 | 45 |
2) Many-to-one merge
df1
| | age | sex | work |
|---|---|---|---|
| a | 30 | 男 | tech |
| b | 22 | 女 | accounting |
| c | 36 | 女 | sell |
df3 = DataFrame({"home":["深圳","北京","上海","安徽","山东"],
"work":["tech","tech","tech","accounting","sell"],
"weight":[60,75,80,54,63]},index = list("abcde"))
df3
| | home | weight | work |
|---|---|---|---|
| a | 深圳 | 60 | tech |
| b | 北京 | 75 | tech |
| c | 上海 | 80 | tech |
| d | 安徽 | 54 | accounting |
| e | 山东 | 63 | sell |
df1.merge(df3)
| | age | sex | work | home | weight |
|---|---|---|---|---|---|
| 0 | 30 | 男 | tech | 深圳 | 60 |
| 1 | 30 | 男 | tech | 北京 | 75 |
| 2 | 30 | 男 | tech | 上海 | 80 |
| 3 | 22 | 女 | accounting | 安徽 | 54 |
| 4 | 36 | 女 | sell | 山东 | 63 |
3) Many-to-many merge
df5 = DataFrame({"age":[28,30,22,36], "work":['tech',"tech","accounting","sell"],"sex":["女","男","女","女"]}, index = list("abce"))
df5
| | age | sex | work |
|---|---|---|---|
| a | 28 | 女 | tech |
| b | 30 | 男 | tech |
| c | 22 | 女 | accounting |
| e | 36 | 女 | sell |
df3
| | home | weight | work |
|---|---|---|---|
| a | 深圳 | 60 | tech |
| b | 北京 | 75 | tech |
| c | 上海 | 80 | tech |
| d | 安徽 | 54 | accounting |
| e | 山东 | 63 | sell |
df3.merge(df5)
| | home | weight | work | age | sex |
|---|---|---|---|---|---|
| 0 | 深圳 | 60 | tech | 28 | 女 |
| 1 | 深圳 | 60 | tech | 30 | 男 |
| 2 | 北京 | 75 | tech | 28 | 女 |
| 3 | 北京 | 75 | tech | 30 | 男 |
| 4 | 上海 | 80 | tech | 28 | 女 |
| 5 | 上海 | 80 | tech | 30 | 男 |
| 6 | 安徽 | 54 | accounting | 22 | 女 |
| 7 | 山东 | 63 | sell | 36 | 女 |
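In a many-to-many merge, every matching left row is paired with every matching right row, so the row count for a key is the product of its counts on each side. The sketch below uses small hypothetical frames (homes and people) and also shows the validate parameter of merge, which can be used to reject an unexpected many-to-many relationship.

```python
import pandas as pd

homes = pd.DataFrame({"work": ["tech", "tech", "tech", "sell"],
                      "home": ["Shenzhen", "Beijing", "Shanghai", "Shandong"]})
people = pd.DataFrame({"work": ["tech", "tech", "sell"],
                       "age": [28, 30, 36]})

# "tech" occurs 3 times on the left and 2 times on the right: 3 * 2 = 6 rows.
# "sell" occurs once on each side: 1 row. Total: 7 rows.
merged = homes.merge(people)

# validate="one_to_many" requires the keys on the left side to be unique,
# which is not the case here, so pandas raises MergeError.
try:
    homes.merge(people, validate="one_to_many")
    rejected = False
except pd.errors.MergeError:
    rejected = True

print(len(merged), rejected)
```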
4) Normalizing the key
- Use on= to state explicitly which column is the key; needed when the two tables share more than one column name
df5
| | age | sex | work |
|---|---|---|---|
| a | 28 | 女 | tech |
| b | 30 | 男 | tech |
| c | 22 | 女 | accounting |
| e | 36 | 女 | sell |
df6 = DataFrame({"age":[30,27,36],"work":["tech","leader","sell"],"hoppy":["sixdog","diaofish","playcat"]}, index = list("abc"))
df6
| | age | hoppy | work |
|---|---|---|---|
| a | 30 | sixdog | tech |
| b | 27 | diaofish | leader |
| c | 36 | playcat | sell |
df5.merge(df6, on = "age", suffixes=["_总部","_分部"])
| | age | sex | work_总部 | hoppy | work_分部 |
|---|---|---|---|---|---|
| 0 | 30 | 男 | tech | sixdog | tech |
| 1 | 36 | 女 | sell | playcat | sell |
df5.merge(df6,on = "work")
| | age_x | sex | work | age_y | hoppy |
|---|---|---|---|---|---|
| 0 | 28 | 女 | tech | 30 | sixdog |
| 1 | 30 | 男 | tech | 30 | sixdog |
| 2 | 36 | 女 | sell | 36 | playcat |
- Use left_on and right_on to name the key column on each side; needed when the key columns have different names in the two tables
df5
| | age | sex | work |
|---|---|---|---|
| a | 28 | 女 | tech |
| b | 30 | 男 | tech |
| c | 22 | 女 | accounting |
| e | 36 | 女 | sell |
df7 = DataFrame({"年龄":[30,22,36],"工作":["tech","accounting","sell"],"性别":["男","女","女"]},index = list("abc"))
df7
| | 工作 | 年龄 | 性别 |
|---|---|---|---|
| a | tech | 30 | 男 |
| b | accounting | 22 | 女 |
| c | sell | 36 | 女 |
df5.merge(df7,left_on = "work", right_on = "工作")
| | age | sex | work | 工作 | 年龄 | 性别 |
|---|---|---|---|---|---|---|
| 0 | 28 | 女 | tech | tech | 30 | 男 |
| 1 | 30 | 男 | tech | tech | 30 | 男 |
| 2 | 22 | 女 | accounting | accounting | 22 | 女 |
| 3 | 36 | 女 | sell | sell | 36 | 女 |
df5
| | age | sex | work |
|---|---|---|---|
| a | 28 | 女 | tech |
| b | 30 | 男 | tech |
| c | 22 | 女 | accounting |
| e | 36 | 女 | sell |
s = df5[["age"]]*1000
s.columns = ["salary"]
s
# Column names can be changed by assigning to .columns
| | salary |
|---|---|
| a | 28000 |
| b | 30000 |
| c | 22000 |
| e | 36000 |
df5.merge(s, left_index = True,right_index=True)
| | age | sex | work | salary |
|---|---|---|---|---|
| a | 28 | 女 | tech | 28000 |
| b | 30 | 男 | tech | 30000 |
| c | 22 | 女 | accounting | 22000 |
| e | 36 | 女 | sell | 36000 |
pd.concat([df5,s],axis = 1)
| | age | sex | work | salary |
|---|---|---|---|---|
| a | 28 | 女 | tech | 28000 |
| b | 30 | 男 | tech | 30000 |
| c | 22 | 女 | accounting | 22000 |
| e | 36 | 女 | sell | 36000 |
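Merging on the index of both tables is what DataFrame.join does by default, so it can serve as a shorthand for merge with left_index=True and right_index=True. A minimal sketch with hypothetical frames staff and pay:

```python
import pandas as pd

staff = pd.DataFrame({"age": [28, 30], "work": ["tech", "tech"]},
                     index=["a", "b"])
pay = pd.DataFrame({"salary": [28000, 30000, 22000]},
                   index=["a", "b", "c"])

# join aligns on the index and performs a left join by default,
# so the extra row "c" in pay is not carried over.
joined = staff.join(pay)

# The equivalent spelled out with merge.
via_merge = staff.merge(pay, left_index=True, right_index=True, how="left")

print(joined)
```

Note that join defaults to how="left", while merge defaults to how="inner", so the two only agree when the how argument is matched as above.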
============================================
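Exercise 16:
Suppose there are two score sheets: ddd covers Zhang San, Li Si and Wang Laowu, and ddd4 covers Zhang San and Zhao Xiaoliu. How would you merge them?
What if Zhang San's name was mistyped in ddd4 as Zhang Shisan?
Practice the many-to-one and many-to-many cases on your own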
============================================
5) Inner and outer merges
- Inner merge: keep only the keys present in both tables (the default)
df3
| | home | weight | work |
|---|---|---|---|
| a | 深圳 | 60 | tech |
| b | 北京 | 75 | tech |
| c | 上海 | 80 | tech |
| d | 安徽 | 54 | accounting |
| e | 山东 | 63 | sell |
df5
| | age | sex | work |
|---|---|---|---|
| a | 28 | 女 | tech |
| b | 30 | 男 | tech |
| c | 22 | 女 | accounting |
| e | 36 | 女 | sell |
df6
| | age | hoppy | work |
|---|---|---|---|
| a | 30 | sixdog | tech |
| b | 27 | diaofish | leader |
| c | 36 | playcat | sell |
df3
| | home | weight | work |
|---|---|---|---|
| a | 深圳 | 60 | tech |
| b | 北京 | 75 | tech |
| c | 上海 | 80 | tech |
| d | 安徽 | 54 | accounting |
| e | 山东 | 63 | sell |
df3.merge(df6)
| | home | weight | work | age | hoppy |
|---|---|---|---|---|---|
| 0 | 深圳 | 60 | tech | 30 | sixdog |
| 1 | 北京 | 75 | tech | 30 | sixdog |
| 2 | 上海 | 80 | tech | 30 | sixdog |
| 3 | 山东 | 63 | sell | 36 | playcat |
- Outer merge, how='outer': keep every key and fill the gaps with NaN
df3.merge(df6,how = "outer")
| | home | weight | work | age | hoppy |
|---|---|---|---|---|---|
| 0 | 深圳 | 60.0 | tech | 30.0 | sixdog |
| 1 | 北京 | 75.0 | tech | 30.0 | sixdog |
| 2 | 上海 | 80.0 | tech | 30.0 | sixdog |
| 3 | 安徽 | 54.0 | accounting | NaN | NaN |
| 4 | 山东 | 63.0 | sell | 36.0 | playcat |
| 5 | NaN | NaN | leader | 27.0 | diaofish |
- Left and right merges: how='left', how='right'
df3
| | home | weight | work |
|---|---|---|---|
| a | 深圳 | 60 | tech |
| b | 北京 | 75 | tech |
| c | 上海 | 80 | tech |
| d | 安徽 | 54 | accounting |
| e | 山东 | 63 | sell |
df6
| | age | hoppy | work |
|---|---|---|---|
| a | 30 | sixdog | tech |
| b | 27 | diaofish | leader |
| c | 36 | playcat | sell |
df3.merge(df6, how = "left")
| | home | weight | work | age | hoppy |
|---|---|---|---|---|---|
| 0 | 深圳 | 60 | tech | 30.0 | sixdog |
| 1 | 北京 | 75 | tech | 30.0 | sixdog |
| 2 | 上海 | 80 | tech | 30.0 | sixdog |
| 3 | 安徽 | 54 | accounting | NaN | NaN |
| 4 | 山东 | 63 | sell | 36.0 | playcat |
df3.merge(df6, how = "right")
| | home | weight | work | age | hoppy |
|---|---|---|---|---|---|
| 0 | 深圳 | 60.0 | tech | 30 | sixdog |
| 1 | 北京 | 75.0 | tech | 30 | sixdog |
| 2 | 上海 | 80.0 | tech | 30 | sixdog |
| 3 | 山东 | 63.0 | sell | 36 | playcat |
| 4 | NaN | NaN | leader | 27 | diaofish |
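To see which side each row of an outer, left or right merge came from, merge accepts indicator=True, which adds a _merge column. A minimal sketch on two small hypothetical frames that share only the work column:

```python
import pandas as pd

left = pd.DataFrame({"work": ["tech", "accounting", "sell"],
                     "weight": [60, 54, 63]})
right = pd.DataFrame({"work": ["tech", "leader", "sell"],
                      "age": [30, 27, 36]})

# _merge takes the values "both", "left_only" or "right_only".
report = left.merge(right, how="outer", indicator=True)

# Map each key to where it was found.
origin = dict(zip(report["work"], report["_merge"].astype(str)))

print(report)
```

Filtering on _merge == "left_only" or "right_only" is a quick way to list the keys that failed to match.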
============================================
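Exercise 17:
If you only have the Chinese, math and English scores of Zhang San and Zhao Xiaoliu, how would you merge?
Think about realistic use cases and merge ddd and ddd4 in several different ways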
============================================
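6) Resolving column conflicts
When columns conflict, that is, when the two tables share more than one column name, use on= to choose which column is the key; the remaining same-named columns are told apart by suffixes
Use suffixes= to supply your own suffixes (the defaults are _x and _y)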
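A minimal sketch of a column conflict, using two hypothetical score tables (mid and final) that both have a name column and a score column:

```python
import pandas as pd

mid = pd.DataFrame({"name": ["Zhang San", "Li Si"], "score": [90, 85]})
final = pd.DataFrame({"name": ["Zhang San", "Li Si"], "score": [88, 92]})

# name is the key; the two score columns collide and get the
# default suffixes _x (left table) and _y (right table).
default = mid.merge(final, on="name")

# Custom suffixes make the origin of each column explicit.
named = mid.merge(final, on="name", suffixes=("_mid", "_final"))

print(list(default.columns))
print(list(named.columns))
```

Without on="name", merge would use both shared columns as the key and match only rows whose scores are identical in the two tables.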
============================================
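Exercise 18:
Suppose two students are both named Li Si, and ddd5 and ddd6 are both score tables for Zhang San and Li Si. How would you merge them?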
============================================
Homework
3. Case study: US state population data
First load the files and look at a sample of the data
pop = pd.read_csv("./state-population.csv")
pop.head(20)
| | state/region | ages | year | population |
|---|---|---|---|---|
| 0 | AL | under18 | 2012 | 1117489.0 |
| 1 | AL | total | 2012 | 4817528.0 |
| 2 | AL | under18 | 2010 | 1130966.0 |
| 3 | AL | total | 2010 | 4785570.0 |
| 4 | AL | under18 | 2011 | 1125763.0 |
| 5 | AL | total | 2011 | 4801627.0 |
| 6 | AL | total | 2009 | 4757938.0 |
| 7 | AL | under18 | 2009 | 1134192.0 |
| 8 | AL | under18 | 2013 | 1111481.0 |
| 9 | AL | total | 2013 | 4833722.0 |
| 10 | AL | total | 2007 | 4672840.0 |
| 11 | AL | under18 | 2007 | 1132296.0 |
| 12 | AL | total | 2008 | 4718206.0 |
| 13 | AL | under18 | 2008 | 1134927.0 |
| 14 | AL | total | 2005 | 4569805.0 |
| 15 | AL | under18 | 2005 | 1117229.0 |
| 16 | AL | total | 2006 | 4628981.0 |
| 17 | AL | under18 | 2006 | 1126798.0 |
| 18 | AL | total | 2004 | 4530729.0 |
| 19 | AL | under18 | 2004 | 1113662.0 |
pop.shape
(2544, 4)
areas = pd.read_csv("./state-areas.csv")
areas
| | state | area (sq. mi) |
|---|---|---|
| 0 | Alabama | 52423 |
| 1 | Alaska | 656425 |
| 2 | Arizona | 114006 |
| 3 | Arkansas | 53182 |
| 4 | California | 163707 |
| 5 | Colorado | 104100 |
| 6 | Connecticut | 5544 |
| 7 | Delaware | 1954 |
| 8 | Florida | 65758 |
| 9 | Georgia | 59441 |
| 10 | Hawaii | 10932 |
| 11 | Idaho | 83574 |
| 12 | Illinois | 57918 |
| 13 | Indiana | 36420 |
| 14 | Iowa | 56276 |
| 15 | Kansas | 82282 |
| 16 | Kentucky | 40411 |
| 17 | Louisiana | 51843 |
| 18 | Maine | 35387 |
| 19 | Maryland | 12407 |
| 20 | Massachusetts | 10555 |
| 21 | Michigan | 96810 |
| 22 | Minnesota | 86943 |
| 23 | Mississippi | 48434 |
| 24 | Missouri | 69709 |
| 25 | Montana | 147046 |
| 26 | Nebraska | 77358 |
| 27 | Nevada | 110567 |
| 28 | New Hampshire | 9351 |
| 29 | New Jersey | 8722 |
| 30 | New Mexico | 121593 |
| 31 | New York | 54475 |
| 32 | North Carolina | 53821 |
| 33 | North Dakota | 70704 |
| 34 | Ohio | 44828 |
| 35 | Oklahoma | 69903 |
| 36 | Oregon | 98386 |
| 37 | Pennsylvania | 46058 |
| 38 | Rhode Island | 1545 |
| 39 | South Carolina | 32007 |
| 40 | South Dakota | 77121 |
| 41 | Tennessee | 42146 |
| 42 | Texas | 268601 |
| 43 | Utah | 84904 |
| 44 | Vermont | 9615 |
| 45 | Virginia | 42769 |
| 46 | Washington | 71303 |
| 47 | West Virginia | 24231 |
| 48 | Wisconsin | 65503 |
| 49 | Wyoming | 97818 |
| 50 | District of Columbia | 68 |
| 51 | Puerto Rico | 3515 |
areas.shape
(52, 2)
abbr = pd.read_csv("./state-abbrevs.csv")
abbr.head()
| | state | abbreviation |
|---|---|---|
| 0 | Alabama | AL |
| 1 | Alaska | AK |
| 2 | Arizona | AZ |
| 3 | Arkansas | AR |
| 4 | California | CA |
abbr.shape
(51, 2)
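Merge the pop and abbr DataFrames, matching the state/region column of pop against the abbreviation column of abbr.
To keep every row of pop, including regions that have no entry in abbr, use a left merge (how="left").
# pop has 2544 rows; abbr has 51 rows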
pop2 = pop.merge(abbr,left_on = "state/region", right_on = "abbreviation", how = "left")
pop2.head()
| | state/region | ages | year | population | state | abbreviation |
|---|---|---|---|---|---|---|
| 0 | AL | under18 | 2012 | 1117489.0 | Alabama | AL |
| 1 | AL | total | 2012 | 4817528.0 | Alabama | AL |
| 2 | AL | under18 | 2010 | 1130966.0 | Alabama | AL |
| 3 | AL | total | 2010 | 4785570.0 | Alabama | AL |
| 4 | AL | under18 | 2011 | 1125763.0 | Alabama | AL |
Drop the abbreviation column (axis=1)
pop2.drop("abbreviation", axis = 1,inplace=True)
pop2
| | state/region | ages | year | population | state |
|---|---|---|---|---|---|
| 0 | AL | under18 | 2012 | 1117489.0 | Alabama |
| 1 | AL | total | 2012 | 4817528.0 | Alabama |
| 2 | AL | under18 | 2010 | 1130966.0 | Alabama |
| 3 | AL | total | 2010 | 4785570.0 | Alabama |
| 4 | AL | under18 | 2011 | 1125763.0 | Alabama |
| 5 | AL | total | 2011 | 4801627.0 | Alabama |
| 6 | AL | total | 2009 | 4757938.0 | Alabama |
| 7 | AL | under18 | 2009 | 1134192.0 | Alabama |
| 8 | AL | under18 | 2013 | 1111481.0 | Alabama |
| 9 | AL | total | 2013 | 4833722.0 | Alabama |
| 10 | AL | total | 2007 | 4672840.0 | Alabama |
| 11 | AL | under18 | 2007 | 1132296.0 | Alabama |
| 12 | AL | total | 2008 | 4718206.0 | Alabama |
| 13 | AL | under18 | 2008 | 1134927.0 | Alabama |
| 14 | AL | total | 2005 | 4569805.0 | Alabama |
| 15 | AL | under18 | 2005 | 1117229.0 | Alabama |
| 16 | AL | total | 2006 | 4628981.0 | Alabama |
| 17 | AL | under18 | 2006 | 1126798.0 | Alabama |
| 18 | AL | total | 2004 | 4530729.0 | Alabama |
| 19 | AL | under18 | 2004 | 1113662.0 | Alabama |
| 20 | AL | total | 2003 | 4503491.0 | Alabama |
| 21 | AL | under18 | 2003 | 1113083.0 | Alabama |
| 22 | AL | total | 2001 | 4467634.0 | Alabama |
| 23 | AL | under18 | 2001 | 1120409.0 | Alabama |
| 24 | AL | total | 2002 | 4480089.0 | Alabama |
| 25 | AL | under18 | 2002 | 1116590.0 | Alabama |
| 26 | AL | under18 | 1999 | 1121287.0 | Alabama |
| 27 | AL | total | 1999 | 4430141.0 | Alabama |
| 28 | AL | total | 2000 | 4452173.0 | Alabama |
| 29 | AL | under18 | 2000 | 1122273.0 | Alabama |
| ... | ... | ... | ... | ... | ... |
| 2514 | USA | under18 | 1999 | 71946051.0 | NaN |
| 2515 | USA | total | 2000 | 282162411.0 | NaN |
| 2516 | USA | under18 | 2000 | 72376189.0 | NaN |
| 2517 | USA | total | 1999 | 279040181.0 | NaN |
| 2518 | USA | total | 2001 | 284968955.0 | NaN |
| 2519 | USA | under18 | 2001 | 72671175.0 | NaN |
| 2520 | USA | total | 2002 | 287625193.0 | NaN |
| 2521 | USA | under18 | 2002 | 72936457.0 | NaN |
| 2522 | USA | total | 2003 | 290107933.0 | NaN |
| 2523 | USA | under18 | 2003 | 73100758.0 | NaN |
| 2524 | USA | total | 2004 | 292805298.0 | NaN |
| 2525 | USA | under18 | 2004 | 73297735.0 | NaN |
| 2526 | USA | total | 2005 | 295516599.0 | NaN |
| 2527 | USA | under18 | 2005 | 73523669.0 | NaN |
| 2528 | USA | total | 2006 | 298379912.0 | NaN |
| 2529 | USA | under18 | 2006 | 73757714.0 | NaN |
| 2530 | USA | total | 2007 | 301231207.0 | NaN |
| 2531 | USA | under18 | 2007 | 74019405.0 | NaN |
| 2532 | USA | total | 2008 | 304093966.0 | NaN |
| 2533 | USA | under18 | 2008 | 74104602.0 | NaN |
| 2534 | USA | under18 | 2013 | 73585872.0 | NaN |
| 2535 | USA | total | 2013 | 316128839.0 | NaN |
| 2536 | USA | total | 2009 | 306771529.0 | NaN |
| 2537 | USA | under18 | 2009 | 74134167.0 | NaN |
| 2538 | USA | under18 | 2010 | 74119556.0 | NaN |
| 2539 | USA | total | 2010 | 309326295.0 | NaN |
| 2540 | USA | under18 | 2011 | 73902222.0 | NaN |
| 2541 | USA | total | 2011 | 311582564.0 | NaN |
| 2542 | USA | under18 | 2012 | 73708179.0 | NaN |
| 2543 | USA | total | 2012 | 313873685.0 | NaN |
2544 rows × 5 columns
Find the rows that contain missing data.
With .isnull().any(axis=1), a row is flagged True as soon as any one of its columns is missing.
cond = pop2.isnull().any(axis = 1)
pop2[cond]
| | state/region | ages | year | population | state |
|---|---|---|---|---|---|
| 2448 | PR | under18 | 1990 | NaN | NaN |
| 2449 | PR | total | 1990 | NaN | NaN |
| 2450 | PR | total | 1991 | NaN | NaN |
| 2451 | PR | under18 | 1991 | NaN | NaN |
| 2452 | PR | total | 1993 | NaN | NaN |
| 2453 | PR | under18 | 1993 | NaN | NaN |
| 2454 | PR | under18 | 1992 | NaN | NaN |
| 2455 | PR | total | 1992 | NaN | NaN |
| 2456 | PR | under18 | 1994 | NaN | NaN |
| 2457 | PR | total | 1994 | NaN | NaN |
| 2458 | PR | total | 1995 | NaN | NaN |
| 2459 | PR | under18 | 1995 | NaN | NaN |
| 2460 | PR | under18 | 1996 | NaN | NaN |
| 2461 | PR | total | 1996 | NaN | NaN |
| 2462 | PR | under18 | 1998 | NaN | NaN |
| 2463 | PR | total | 1998 | NaN | NaN |
| 2464 | PR | total | 1997 | NaN | NaN |
| 2465 | PR | under18 | 1997 | NaN | NaN |
| 2466 | PR | total | 1999 | NaN | NaN |
| 2467 | PR | under18 | 1999 | NaN | NaN |
| 2468 | PR | total | 2000 | 3810605.0 | NaN |
| 2469 | PR | under18 | 2000 | 1089063.0 | NaN |
| 2470 | PR | total | 2001 | 3818774.0 | NaN |
| 2471 | PR | under18 | 2001 | 1077566.0 | NaN |
| 2472 | PR | total | 2002 | 3823701.0 | NaN |
| 2473 | PR | under18 | 2002 | 1065051.0 | NaN |
| 2474 | PR | total | 2004 | 3826878.0 | NaN |
| 2475 | PR | under18 | 2004 | 1035919.0 | NaN |
| 2476 | PR | total | 2003 | 3826095.0 | NaN |
| 2477 | PR | under18 | 2003 | 1050615.0 | NaN |
| ... | ... | ... | ... | ... | ... |
| 2514 | USA | under18 | 1999 | 71946051.0 | NaN |
| 2515 | USA | total | 2000 | 282162411.0 | NaN |
| 2516 | USA | under18 | 2000 | 72376189.0 | NaN |
| 2517 | USA | total | 1999 | 279040181.0 | NaN |
| 2518 | USA | total | 2001 | 284968955.0 | NaN |
| 2519 | USA | under18 | 2001 | 72671175.0 | NaN |
| 2520 | USA | total | 2002 | 287625193.0 | NaN |
| 2521 | USA | under18 | 2002 | 72936457.0 | NaN |
| 2522 | USA | total | 2003 | 290107933.0 | NaN |
| 2523 | USA | under18 | 2003 | 73100758.0 | NaN |
| 2524 | USA | total | 2004 | 292805298.0 | NaN |
| 2525 | USA | under18 | 2004 | 73297735.0 | NaN |
| 2526 | USA | total | 2005 | 295516599.0 | NaN |
| 2527 | USA | under18 | 2005 | 73523669.0 | NaN |
| 2528 | USA | total | 2006 | 298379912.0 | NaN |
| 2529 | USA | under18 | 2006 | 73757714.0 | NaN |
| 2530 | USA | total | 2007 | 301231207.0 | NaN |
| 2531 | USA | under18 | 2007 | 74019405.0 | NaN |
| 2532 | USA | total | 2008 | 304093966.0 | NaN |
| 2533 | USA | under18 | 2008 | 74104602.0 | NaN |
| 2534 | USA | under18 | 2013 | 73585872.0 | NaN |
| 2535 | USA | total | 2013 | 316128839.0 | NaN |
| 2536 | USA | total | 2009 | 306771529.0 | NaN |
| 2537 | USA | under18 | 2009 | 74134167.0 | NaN |
| 2538 | USA | under18 | 2010 | 74119556.0 | NaN |
| 2539 | USA | total | 2010 | 309326295.0 | NaN |
| 2540 | USA | under18 | 2011 | 73902222.0 | NaN |
| 2541 | USA | total | 2011 | 311582564.0 | NaN |
| 2542 | USA | under18 | 2012 | 73708179.0 | NaN |
| 2543 | USA | total | 2012 | 313873685.0 | NaN |
96 rows × 5 columns
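Inspecting the missing data
Display rows according to whether a value is missing: a row is shown when its mask value is True.
Find which state/region values have NaN in the state column, using unique() to list the distinct values.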
pop2.head()
| | state/region | ages | year | population | state |
|---|---|---|---|---|---|
| 0 | AL | under18 | 2012 | 1117489.0 | Alabama |
| 1 | AL | total | 2012 | 4817528.0 | Alabama |
| 2 | AL | under18 | 2010 | 1130966.0 | Alabama |
| 3 | AL | total | 2010 | 4785570.0 | Alabama |
| 4 | AL | under18 | 2011 | 1125763.0 | Alabama |
# Find the state/region abbreviations of the rows whose state value is null
cond_state = pop2["state"].isnull()
cond_state
0 False
1 False
2 False
3 False
4 False
5 False
6 False
7 False
8 False
9 False
10 False
11 False
12 False
13 False
14 False
15 False
16 False
17 False
18 False
19 False
20 False
21 False
22 False
23 False
24 False
25 False
26 False
27 False
28 False
29 False
...
2514 True
2515 True
2516 True
2517 True
2518 True
2519 True
2520 True
2521 True
2522 True
2523 True
2524 True
2525 True
2526 True
2527 True
2528 True
2529 True
2530 True
2531 True
2532 True
2533 True
2534 True
2535 True
2536 True
2537 True
2538 True
2539 True
2540 True
2541 True
2542 True
2543 True
Name: state, Length: 2544, dtype: bool
pop2[cond_state]["state/region"].unique()
array(['PR', 'USA'], dtype=object)
Fill in the correct state value for each state/region found above, so that no NaN remains in the state column.
Remember this way of clearing missing NaN values!
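One possible way to do the fill is sketched below. It assumes a hypothetical miniature of pop2 (the real table has 2544 rows) and assumes that the intended full names are "Puerto Rico" for PR and "United States" for USA; neither string appears in the original notebook output.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of pop2: 'PR' and 'USA' have no full state name yet.
pop2 = pd.DataFrame({
    "state/region": ["AL", "PR", "PR", "USA"],
    "ages": ["total", "total", "under18", "total"],
    "year": [2012, 2000, 2000, 2000],
    "population": [4817528.0, 3810605.0, 1089063.0, 282162411.0],
    "state": ["Alabama", np.nan, np.nan, np.nan],
})

# Select rows with a boolean mask and assign the missing name through .loc.
# The full names are assumptions, not values shown in the original output.
pop2.loc[pop2["state/region"] == "PR", "state"] = "Puerto Rico"
pop2.loc[pop2["state/region"] == "USA", "state"] = "United States"

# Check whether any NaN is left in the state column.
print(pop2["state"].isnull().any())
```

Assigning through .loc with a row mask and a column label writes into pop2 itself, which avoids the chained-indexing pitfall of pop2[mask]["state"] = ....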
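Merge in the per-state area data, areas, using a left merge.
Think about why a left or outer merge is used here rather than an inner merge.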
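The difference between a left and an inner merge can be seen on a small example. This is only a sketch: the two tables are hypothetical miniatures, the column name "area (sq. mi)" is an assumption about how the areas table is labelled, and the area numbers are illustrative.

```python
import pandas as pd

# Hypothetical miniature of the population table (state names already filled in).
pop2 = pd.DataFrame({
    "state/region": ["AL", "PR", "USA"],
    "year": [2010, 2000, 2010],
    "population": [4785570.0, 3810605.0, 309326295.0],
    "state": ["Alabama", "Puerto Rico", "United States"],
})

# Hypothetical miniature of areas; it has no row for the whole United States.
areas = pd.DataFrame({
    "state": ["Alabama", "Puerto Rico"],
    "area (sq. mi)": [52423, 3515],  # illustrative values
})

# Left merge: every population row survives; the unmatched area becomes NaN.
left = pd.merge(pop2, areas, on="state", how="left")

# Inner merge: the unmatched 'United States' row is silently dropped.
inner = pd.merge(pop2, areas, on="state", how="inner")

print(len(left), len(inner))
```

With the left merge the missing area shows up as a NaN that can be investigated afterwards, whereas the inner merge would hide the problem by removing the row.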
Keep looking for columns with missing data.
The area(sq.mi) column turns out to have missing values; to locate the affected rows, find which state has no area data.
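A minimal sketch of this search, assuming a hypothetical merged table named pop3 and the assumed column name "area (sq. mi)" (the area values are illustrative):

```python
import numpy as np
import pandas as pd

# Hypothetical merged table; pop3 is an assumed name for the merge result.
pop3 = pd.DataFrame({
    "state/region": ["AL", "PR", "USA", "USA"],
    "state": ["Alabama", "Puerto Rico", "United States", "United States"],
    "population": [4785570.0, np.nan, 309326295.0, 311582564.0],
    "area (sq. mi)": [52423.0, 3515.0, np.nan, np.nan],
})

# One boolean per column: True if the column contains at least one NaN.
cols_with_nan = pop3.isnull().any()
print(cols_with_nan)

# Narrow down to the rows with a missing area, then list the distinct states.
missing_area = pop3["area (sq. mi)"].isnull()
print(pop3.loc[missing_area, "state"].unique())
```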
Drop the rows that contain missing data.
Check whether any data is still missing.
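These two steps might look like the following, again on a hypothetical miniature table (pop3 and the column name "area (sq. mi)" are assumptions, and the area values are illustrative):

```python
import numpy as np
import pandas as pd

# Hypothetical merged table with NaN in two different columns.
pop3 = pd.DataFrame({
    "state": ["Alabama", "Puerto Rico", "United States"],
    "population": [4785570.0, np.nan, 309326295.0],
    "area (sq. mi)": [52423.0, 3515.0, np.nan],
})

# dropna() removes every row that has a NaN in any column
# and returns a new DataFrame; pop3 itself is left unchanged.
clean = pop3.dropna()

# First any() reduces over rows (one flag per column),
# the second any() reduces those flags to a single boolean.
print(clean.isnull().any().any())
print(clean.shape)
```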
Select the 2010 total-population rows with df.query(query string).
Process the query result, using the state column as the new row index: set_index.
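A short sketch of the query and re-indexing steps, using the four Alabama rows shown in the pop2.head() output above; the variable names pop and pop_2010 are hypothetical:

```python
import pandas as pd

# The four Alabama rows from the pop2.head() output above.
pop = pd.DataFrame({
    "state/region": ["AL", "AL", "AL", "AL"],
    "ages": ["under18", "total", "under18", "total"],
    "year": [2012, 2012, 2010, 2010],
    "population": [1117489.0, 4817528.0, 1130966.0, 4785570.0],
    "state": ["Alabama"] * 4,
})

# query() takes a string expression that refers to columns by name.
pop_2010 = pop.query("year == 2010 and ages == 'total'")

# Use the state name as the row index so later arithmetic aligns by state.
pop_2010 = pop_2010.set_index("state")
print(pop_2010)
```

The same filter can be written with boolean masks, pop[(pop["year"] == 2010) & (pop["ages"] == "total")]; query() is mainly a readability choice.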
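Compute the population density. Note that this is Series / Series, and the result is again a Series.
Sort with sort_values() and find the five states with the highest population density.
Find the five states with the lowest population density.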
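The density computation and the two rankings can be sketched as follows. The table is a toy example with made-up state labels and numbers, and the column name "area (sq. mi)" is an assumption:

```python
import pandas as pd

# Toy 2010 table indexed by state; labels and numbers are illustrative only.
pop_2010 = pd.DataFrame(
    {"population": [100.0, 900.0, 400.0], "area (sq. mi)": [10.0, 30.0, 80.0]},
    index=pd.Index(["A", "B", "C"], name="state"),
)

# Series / Series divides element-wise, matching rows by index label.
density = pop_2010["population"] / pop_2010["area (sq. mi)"]

# Descending sort, then take the first five entries for the densest states.
highest = density.sort_values(ascending=False).head(5)

# Ascending sort (the default) for the least dense states.
lowest = density.sort_values().head(5)

print(highest)
print(lowest)
```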
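Key points:
- Index consistently with .loc[]
- Use .isnull().any() to find the columns that contain NaN
- Use .unique() to determine which keys in a column need attention
- Prefer outer or left merges, for one reason: it is better to have NaN in the merged column than to discard the information in the other columns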
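Review: how Series/DataFrame arithmetic differs from ndarray arithmetic
- Series and DataFrame arithmetic aligns on index labels instead of broadcasting by shape; where a label has no matching value the result is NaN, unless fill_value is passed to add() to supply the missing value
- ndarray arithmetic broadcasts, computing by repeating the existing values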
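The contrast can be checked with a small example. The Series s1 and s2 are hypothetical; nd1 and nd3 reuse the arrays from the NumPy review at the start of this article:

```python
import numpy as np
import pandas as pd

# Two Series whose indexes only partly overlap.
s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20], index=["b", "c"])

# Plain addition aligns on labels; 'a' has no partner in s2, so it becomes NaN.
aligned = s1 + s2

# add() with fill_value treats the missing partner as 0 instead.
filled = s1.add(s2, fill_value=0)

# ndarray addition broadcasts by shape: (3,) is repeated along the rows of (2, 3).
nd1 = np.array([1, 2, 3])
nd3 = np.array([[-1, -2, -3], [0, 2, 4]])
broadcast = nd1 + nd3

print(aligned)
print(filled)
print(broadcast)
```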