[Learning Pandas with Stack Overflow] - Adding new column to existing DataFrame in Python pandas

I have recently been writing a blog series on learning Pandas by following Stack Overflow.

I search Stack Overflow for questions tagged pandas, then sort the results by number of votes:

https://stackoverflow.com/questions/tagged/pandas?sort=votes&pageSize=15

Adding new column to existing DataFrame in Python pandas - adding a column in Pandas

https://stackoverflow.com/questions/12555323/adding-new-column-to-existing-dataframe-in-python-pandas

The official pandas documentation describes the available column operations; see:

http://pandas.pydata.org/pandas-docs/stable/dsintro.html#column-selection-addition-deletion

Data preparation

Randomly generate an 8x3 DataFrame df1, then keep the rows where column a is greater than 0.5 to form df2, which serves as our starting data.

import numpy as np
import pandas as pd

print(pd.__version__)
# 0.19.2
np.random.seed(0)
df1 = pd.DataFrame(np.random.randn(8, 3), columns=['a', 'b', 'c'])
print(df1)
#           a         b         c
# 0  1.764052  0.400157  0.978738
# 1  2.240893  1.867558 -0.977278
# 2  0.950088 -0.151357 -0.103219
# 3  0.410599  0.144044  1.454274
# 4  0.761038  0.121675  0.443863
# 5  0.333674  1.494079 -0.205158
# 6  0.313068 -0.854096 -2.552990
# 7  0.653619  0.864436 -0.742165

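df2 = df1[df1['a'] > 0.5]
# Independent copy for the assign section, so that columns added to df2 do not show up in df3
df3 = df2.copy()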

sLength = len(df2['a'])
d = pd.Series(np.random.randn(sLength))

Direct assignment

Assign the new column directly with df2['d'] = d or df2.loc[:, 'd'] = d.

print(df2)
#           a         b         c
# 0  1.764052  0.400157  0.978738
# 1  2.240893  1.867558 -0.977278
# 2  0.950088 -0.151357 -0.103219
# 4  0.761038  0.121675  0.443863
# 7  0.653619  0.864436 -0.742165

print(d)
# 0    2.269755
# 1   -1.454366
# 2    0.045759
# 3   -0.187184
# 4    1.532779

print(type(d))
# <class 'pandas.core.series.Series'>
# The following works, but raises a SettingWithCopyWarning
df2['d'] = d
# /Library/Python/2.7/site-packages/ipykernel/__main__.py:1: SettingWithCopyWarning:
# A value is trying to be set on a copy of a slice from a DataFrame.
# Try using .loc[row_indexer,col_indexer] = value instead

# See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
#   if __name__ == '__main__':
# To avoid the warning, we can assign through .loc instead
df2.loc[:, 'd'] = d
print(df2)
#           a         b         c         d
# 0  1.764052  0.400157  0.978738  2.269755
# 1  2.240893  1.867558 -0.977278 -1.454366
# 2  0.950088 -0.151357 -0.103219  0.045759
# 4  0.761038  0.121675  0.443863  1.532779
# 7  0.653619  0.864436 -0.742165       NaN

df2.loc[:, 'd1'] = d.tolist() # or d.values
# d.tolist() returns a list
# d.values (an attribute, not a method) returns a numpy.ndarray

print(df2)
#           a         b         c         d        d1
# 0  1.764052  0.400157  0.978738  2.269755  2.269755
# 1  2.240893  1.867558 -0.977278 -1.454366 -1.454366
# 2  0.950088 -0.151357 -0.103219  0.045759  0.045759
# 4  0.761038  0.121675  0.443863  1.532779 -0.187184
# 7  0.653619  0.864436 -0.742165       NaN  1.532779

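Notice that df2 has 5 rows and d has 5 values, yet after the assignment column d holds only 4 values. Looking closer, d is a Series, so df2['d'] = d assigns by index: only the 4 labels 0, 1, 2 and 4 have a counterpart in d, while label 7 has none and becomes NaN. The value at d's label 3 has no matching row in df2 and is dropped.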
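The alignment described above can be reproduced on a tiny frame. This is a minimal sketch with made-up data; the names frame, new_values, aligned and relabelled are illustrative and do not come from the original post.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the frame keeps labels 0, 1, 2, 4, 7, as after filtering.
frame = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0, 5.0]}, index=[0, 1, 2, 4, 7])
new_values = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0])  # default labels 0..4

# Assignment aligns on labels: label 4 receives 50.0, label 7 has no match (NaN),
# and the value stored under label 3 is silently dropped.
aligned = frame.copy()
aligned["d"] = new_values

# Giving the Series the frame's own index keeps the values in positional order.
relabelled = frame.copy()
relabelled["d"] = pd.Series(new_values.values, index=frame.index)

print(aligned)
print(relabelled)
```

Re-labelling the Series with index=frame.index is one way to keep working with a Series while still getting a row-by-row assignment.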

To ignore the index, assign d.tolist() or d.values instead.
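A short sketch of positional assignment, again on made-up data (frame and values are illustrative names). With no index to align on, the length of the list or array has to match the number of rows, otherwise pandas raises a ValueError.

```python
import pandas as pd

# Hypothetical toy data with a non-contiguous index.
frame = pd.DataFrame({"a": [1.0, 2.0, 3.0]}, index=[0, 2, 5])
values = pd.Series([7.0, 8.0, 9.0])  # default labels 0, 1, 2

frame["from_list"] = values.tolist()  # plain list, assigned by position
frame["from_array"] = values.values   # numpy.ndarray, assigned by position

# A list of the wrong length cannot be assigned by position.
try:
    frame["too_short"] = [1.0, 2.0]
    length_error = None
except ValueError as exc:
    length_error = exc

print(frame)
print(length_error)
```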

Also, in pandas 0.19.2, df2['d'] = d triggers a SettingWithCopyWarning, because df2 was obtained by filtering df1. Try to avoid that form and add the column with df2.loc[:, 'd'] = d instead.
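Another common way to sidestep the warning, not used in the original post, is to take an explicit copy of the filtered frame before adding columns. The sketch below assumes the same seed and shape as the data preparation step; source, subset and copy_warnings are illustrative names.

```python
import warnings

import numpy as np
import pandas as pd

np.random.seed(0)
source = pd.DataFrame(np.random.randn(8, 3), columns=["a", "b", "c"])

# .copy() makes the filtered frame independent of `source`,
# so adding a column to it is unambiguous.
subset = source[source["a"] > 0.5].copy()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    subset["d"] = np.arange(len(subset))

# Collect only warnings whose class name mentions SettingWithCopy.
copy_warnings = [w for w in caught if "SettingWithCopy" in w.category.__name__]
print(len(copy_warnings))
print(subset)
```

The copy costs some memory, but it makes explicit that later changes to subset are not meant to reach source.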

Assignment with assign

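The official documentation recommends assign for adding new columns to a DataFrame; it returns a new DataFrame rather than modifying the original.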

Official pandas reference:

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.assign.html

print(df3)
#           a         b         c
# 0  1.764052  0.400157  0.978738
# 1  2.240893  1.867558 -0.977278
# 2  0.950088 -0.151357 -0.103219
# 4  0.761038  0.121675  0.443863
# 7  0.653619  0.864436 -0.742165

print(d)
# 0    2.269755
# 1   -1.454366
# 2    0.045759
# 3   -0.187184
# 4    1.532779

# Assign d.values (a numpy.ndarray)
df3 = df3.assign(d=d.values)
print(df3)

#           a         b         c         d
# 0  1.764052  0.400157  0.978738  2.269755
# 1  2.240893  1.867558 -0.977278 -1.454366
# 2  0.950088 -0.151357 -0.103219  0.045759
# 4  0.761038  0.121675  0.443863 -0.187184
# 7  0.653619  0.864436 -0.742165  1.532779

# Assign d (a Series)
df4 = df3.assign(d=d)
print(df4)

#           a         b         c         d
# 0  1.764052  0.400157  0.978738  2.269755
# 1  2.240893  1.867558 -0.977278 -1.454366
# 2  0.950088 -0.151357 -0.103219  0.045759
# 4  0.761038  0.121675  0.443863  1.532779
# 7  0.653619  0.864436 -0.742165       NaN

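As shown, adding the column with assign gives the same results as direct assignment with loc; what differs is whether the assigned value is a Series (aligned by index) or a numpy.ndarray or list (assigned by position).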

assign also supports other kinds of operations, for example computing a column from existing ones:

df4 = df3.assign(ln_A=lambda x: np.log(x['a']))
print(df4)

#           a         b         c         d      ln_A
# 0  1.764052  0.400157  0.978738  2.269755  0.567614
# 1  2.240893  1.867558 -0.977278 -1.454366  0.806875
# 2  0.950088 -0.151357 -0.103219  0.045759 -0.051200
# 4  0.761038  0.121675  0.443863 -0.187184 -0.273072
# 7  0.653619  0.864436 -0.742165  1.532779 -0.425231
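As a further sketch of what assign accepts, the example below adds several columns in one call from a callable, a Series and a scalar. The data and the names frame, extended, a_plus_b, log2_a and flag are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data.
frame = pd.DataFrame({"a": [1.0, 2.0, 4.0], "b": [0.5, 1.5, 2.5]})

# assign returns a new DataFrame; `frame` itself is left untouched.
extended = frame.assign(
    a_plus_b=lambda x: x["a"] + x["b"],  # callable, evaluated on the frame
    log2_a=np.log2(frame["a"]),          # Series, aligned on the index
    flag=True,                           # scalar, broadcast to every row
)

print(extended)
print(list(frame.columns))
```

Because the original frame is not modified, assign fits naturally into method chains.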


