Python: a pandas trick (process every element of a DataFrame with apply instead of a for loop)

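Processing a pandas column with apply is far faster than doing it with a for loop: hundreds of times faster in the test below (about 30 seconds versus about 3 hours).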

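We have a DataFrame loaded with pandas. Its features column holds combinations of 0/1 features, but unfortunately each value is a str (for example "0,0,1,1,0,1,1"). We want to convert each value into a list of int 0s and 1s.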
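To make the starting point concrete, here is a minimal sketch of what such a DataFrame might look like. The row values are hypothetical; only the column name `features` and the comma-separated 0/1 string format come from the post.

```python
import pandas as pd

# Hypothetical sample rows; only the column name `features` comes from the post.
df = pd.DataFrame(
    {"features": ["0,0,1,1,0,1,1", "1,0,0,1,1,0,0", "0,1,1,0,0,0,1"]}
)

# Each cell is a str, not a list of ints.
print(type(df["features"][0]))

# The representation we want for the first row: a list of ints.
target = [int(v) for v in df["features"][0].split(",")]
print(target)
```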

(1) Using a for loop (took about 3 hours)

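from tqdm import tqdm  # progress bar (also reports elapsed time)

# Note: chained indexing such as df['features'][i] = ... can raise
# SettingWithCopyWarning and does not update df under pandas Copy-on-Write;
# df.at[i, 'features'] is the safer way to address a single cell.
for i in tqdm(range(df.shape[0])):
    df['features'][i] = df['features'][i].split(",")  # each row is a string such as "0,0,1,1,0,1,1"; split on commas to get a list
    for j in range(len(df['features'][i])):           # walk the list and convert each element to int
        df['features'][i][j] = int(df['features'][i][j])

print(type(df['features'][0]))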

(2) Recommended: use the apply method (took about 30 seconds)

from time import time

def func(x):
    l = x.split(",")          # "0,0,1" -> ["0", "0", "1"]
    for i in range(len(l)):
        l[i] = int(l[i])      # convert each element to int
    return l

stime = time()
df['new_features'] = df['features'].apply(func)
endtime = time()

print("time:" + str(endtime - stime) + "s")
# df.head()
print("over")
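The timings quoted above were measured on the author's own data and machine. As a rough way to try the comparison yourself, the sketch below runs both approaches on synthetic data. The row count, the sample string, and the helper name `to_int_list` are assumptions made for illustration. The loop here only reads each cell and collects the results in a Python list, so it is a simplified stand-in for the per-cell writes in (1), and the gap it shows will likely be smaller than the one reported in the post.

```python
from time import perf_counter

import pandas as pd


def to_int_list(x):
    """Turn a string such as "0,1,1" into [0, 1, 1]."""
    return [int(v) for v in x.split(",")]


# Synthetic data (an assumption): 20,000 rows of comma-separated 0/1 flags.
n_rows = 20_000
df = pd.DataFrame({"features": ["0,0,1,1,0,1,1"] * n_rows})

# Row-by-row loop: one scalar lookup through pandas per row.
start = perf_counter()
loop_result = []
for i in range(df.shape[0]):
    loop_result.append(to_int_list(df.at[i, "features"]))
loop_seconds = perf_counter() - start

# apply: pandas iterates over the column internally and builds the new column.
start = perf_counter()
df["new_features"] = df["features"].apply(to_int_list)
apply_seconds = perf_counter() - start

print(f"loop:  {loop_seconds:.3f}s")
print(f"apply: {apply_seconds:.3f}s")
```

Both paths should produce the same lists; only the time taken differs, and the exact numbers will vary with the pandas version and hardware.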
