High-Performance Python Programming with Zero-Copy and the Buffer Protocol

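Whatever your program does, it often has to deal with large amounts of data, most of it in the form of strings (or, for binary data, bytes objects). However, once you start copying, slicing and modifying such data in bulk, handling it this way becomes quite inefficient. Why?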

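Let's take the example of reading a large file of binary data and copying part of it into another file. To show how much memory the program uses, we use memory_profiler, an excellent Python package that lets us observe a program's memory usage line by line.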

@profile
def read_random():
    with open("/dev/urandom", "rb") as source:
        content = source.read(1024 * 10000)
        content_to_write = content[1024:]
    print(f"content length: {len(content)}, content to write length {len(content_to_write)}")
    with open("/dev/null", "wb") as target:
        target.write(content_to_write)

if __name__ == "__main__":
    read_random()

Running the program above with memory_profiler gives the following output:

$ python -m memory_profiler example.py
content length: 10240000, content to write length 10238976
Filename: example.py

Line #    Mem usage    Increment   Line Contents
================================================
     1   14.320 MiB   14.320 MiB   @profile
     2                             def read_random():
     3   14.320 MiB    0.000 MiB       with open("/dev/urandom", "rb") as source:
     4   24.117 MiB    9.797 MiB           content = source.read(1024 * 10000)
     5   33.914 MiB    9.797 MiB           content_to_write = content[1024:]
     6   33.914 MiB    0.000 MiB       print(f"content length: {len(content)}, content to write length {len(content_to_write)}")
     7   33.914 MiB    0.000 MiB       with open("/dev/null", "wb") as target:
     8   33.914 MiB    0.000 MiB           target.write(content_to_write)

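We load 10 MB of data from /dev/urandom with source.read. Python has to allocate about 10 MB of memory to store this data as a bytes object (1024 × 10000 = 10,240,000 bytes, about 9.77 MiB, which matches the 9.797 MiB increment). The next statement, content[1024:], copies the data while skipping the first kilobyte, and it also allocates about 10 MB.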

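What is interesting here is the roughly 10 MB increase in memory when content_to_write is built: the slice operation copies all of the data except the first kilobyte into a new bytes object.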

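Doing this kind of operation on large byte arrays is a disaster. If you have written C before, you know to be careful with memcpy(): copying memory is slow, both in terms of memory usage and overall performance.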
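To get a feel for that cost from Python itself, here is a small timing sketch using the standard timeit module. It assumes zero-filled bytes objects of a few sizes (rather than real /dev/urandom data) and times the same kind of content[1024:] slice as above; `time_slice_copy` is a hypothetical helper. Because every byte is copied, you should typically see the time per slice grow roughly in proportion to the size, but the absolute numbers depend entirely on the machine, so none are shown.

```python
import timeit


def time_slice_copy(size: int, repeat: int = 20) -> float:
    """Average seconds taken by data[1024:] on a zero-filled bytes object of `size` bytes."""
    data = bytes(size)  # stand-in for data read from a file
    # Each call builds a new bytes object and copies size - 1024 bytes into it.
    return timeit.timeit(lambda: data[1024:], number=repeat) / repeat


if __name__ == "__main__":
    for size in (1024 * 100, 1024 * 1000, 1024 * 10000):  # ~100 KB, ~1 MB, ~10 MB
        print(f"{size:>11,} bytes: {time_slice_copy(size) * 1000:.3f} ms per slice")
```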

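However, as a C programmer you also know that a string is just an array of characters, and that you can work on part of it without copying anything by using basic pointer arithmetic, as long as the whole string occupies one contiguous region of memory.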
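For readers coming from C, the pointer-arithmetic idea can be made visible in Python with the standard ctypes module. This is only an illustration (ctypes is not needed for zero-copy work): `ctypes.c_char.from_buffer(obj, offset)` wraps existing writable memory without copying it, so the address of the byte at offset 4 is exactly the base address plus 4, and a change to the original buffer is visible through that "pointer".

```python
import ctypes

buf = bytearray(b"abcdefgh")  # one contiguous block of 8 bytes

# from_buffer() wraps the existing memory of buf; nothing is copied.
base = ctypes.c_char.from_buffer(buf)      # roughly `char *p = buf;`
fifth = ctypes.c_char.from_buffer(buf, 4)  # roughly `char *q = buf + 4;`

offset = ctypes.addressof(fifth) - ctypes.addressof(base)
print(offset)       # 4

buf[4] = ord("E")   # modify the original buffer in place...
print(fifth.value)  # b'E'  ...and the "pointer" sees it, because it refers to the same memory
```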

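Python offers the same capability through the buffer protocol. The buffer protocol is defined in PEP 3118, which describes a C API that types such as bytes, bytearray and array.array implement to expose their underlying memory.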
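A quick way to check whether an object implements the buffer protocol is to try wrapping it in a memoryview, as in the minimal sketch below (`supports_buffer_protocol` is a hypothetical helper). Note that in Python 3 a text `str` does not expose its memory this way; it has to be encoded to bytes first.

```python
import array


def supports_buffer_protocol(obj) -> bool:
    """Return True if obj exposes its memory through the buffer protocol."""
    try:
        with memoryview(obj):  # release the view immediately; we only want to know it works
            pass
    except TypeError:
        return False
    return True


for obj in (b"abc", bytearray(b"abc"), array.array("i", [1, 2, 3]), "abc"):
    print(type(obj).__name__, supports_buffer_protocol(obj))
# bytes True
# bytearray True
# array True
# str False
```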

When an object implements this protocol, you can use the memoryview class to build a memoryview object that references the original object's memory.

>>> s = b"abcdefgh"
>>> view = memoryview(s)
>>> view[1]
98
>>> limited = view[1:3]
>>> limited
<memory at 0x7f6ff2df1108>
>>> bytes(view[1:3])
b'bc'

Note: 98 is the ASCII code of the character b.

In the example above, slicing a memoryview object also returns a memoryview object. This means no data is copied: the new object merely references part of the original data.
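Since bytes objects are immutable, the difference between a copy and a view is easiest to observe with a bytearray. In the sketch below, the ordinary slice is an independent copy, while the memoryview slice keeps pointing at the original memory, so it reflects an in-place change; its `obj` attribute still refers to the original object.

```python
data = bytearray(b"abcdefgh")  # mutable, so changes are easy to observe

copied = data[1:3]             # bytearray slice: a new, independent copy
view = memoryview(data)[1:3]   # memoryview slice: a window onto data's memory

data[1] = ord("X")             # modify the original buffer in place

print(bytes(copied))           # b'bc'  (the copy did not change)
print(bytes(view))             # b'Xc'  (the view reflects the change)
print(view.obj is data)        # True   (the view still refers to data)
```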

The diagram below illustrates what happens:

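We can therefore make the earlier program much more efficient: instead of creating a new bytes object, we reference the data we need through a memoryview object.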

@profile
def read_random():
    with open("/dev/urandom", "rb") as source:
        content = source.read(1024 * 10000)
        content_to_write = memoryview(content)[1024:]
    print(f"content length: {len(content)}, content to write length {len(content_to_write)}")
    with open("/dev/null", "wb") as target:
        target.write(content_to_write)

if __name__ == "__main__":
    read_random()

Let's run the program with memory_profiler again:

$ python -m memory_profiler example.py
content length: 10240000, content to write length 10238976
Filename: example.py

Line #    Mem usage    Increment   Line Contents
================================================
     1   14.219 MiB   14.219 MiB   @profile
     2                             def read_random():
     3   14.219 MiB    0.000 MiB       with open("/dev/urandom", "rb") as source:
     4   24.016 MiB    9.797 MiB           content = source.read(1024 * 10000)
     5   24.016 MiB    0.000 MiB           content_to_write = memoryview(content)[1024:]
     6   24.016 MiB    0.000 MiB       print(f"content length: {len(content)}, content to write length {len(content_to_write)}")
     7   24.016 MiB    0.000 MiB       with open("/dev/null", "wb") as target:
     8   24.016 MiB    0.000 MiB           target.write(content_to_write)

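In this version, source.read still allocates 10 MB of memory to read the file content. However, referencing part of that content through a memoryview does not allocate any additional memory.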

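Compared to the previous version, this roughly halves the memory allocated for the data: about 9.8 MiB on top of the 14 MiB baseline instead of about 19.6 MiB.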
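If you want to reproduce the comparison without installing memory_profiler, the standard tracemalloc module can measure the allocations made by the slicing step alone. The sketch below assumes a zero-filled bytes object instead of /dev/urandom data and uses a hypothetical helper, `bytes_allocated_by_slice`. tracemalloc counts bytes allocated by Python rather than looking at the process as a whole, so its numbers will not match the MiB table exactly; what matters is that the plain slice allocates about 10 MB while the memoryview slice only allocates a small view object.

```python
import tracemalloc

SIZE = 1024 * 10000  # ~10 MB, the same size as the profiled example


def bytes_allocated_by_slice(use_memoryview: bool) -> int:
    """Return the bytes tracemalloc sees allocated while taking content[1024:]."""
    content = bytes(SIZE)  # stand-in for source.read(), allocated before tracing starts
    tracemalloc.start()
    try:
        if use_memoryview:
            part = memoryview(content)[1024:]  # zero-copy: only a small view object
        else:
            part = content[1024:]              # copy: a new ~10 MB bytes object
        allocated, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    assert len(part) == SIZE - 1024
    return allocated


if __name__ == "__main__":
    print("bytes slice:     ", bytes_allocated_by_slice(False))
    print("memoryview slice:", bytes_allocated_by_slice(True))
```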

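This technique is extremely useful when working with sockets. When you send data over a socket, there is no guarantee that all of it will be sent in a single call.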

import socket

s = socket.socket(…)
s.connect(…)
# Build a bytes object with more than 100 million copies of the letter `a`
data = b"a" * (1024 * 100000)
while data:
    sent = s.send(data)
    # Drop the first `sent` bytes; this slice copies everything that remains
    data = data[sent:]

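With this implementation, the program copies the remaining data again and again until everything has been sent. With memoryview, the same job can be done with zero copy, and therefore with better performance: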

import socket

s = socket.socket(…)
s.connect(…)
# Build a bytes object with more than 100 million copies of the letter `a`
data = b"a" * (1024 * 100000)
mv = memoryview(data)
while mv:
    sent = s.send(mv)
    # Build a new memoryview object pointing to the data that remains to be sent
    mv = mv[sent:]

Here no copy happens at all, and no memory is allocated beyond the 100 MB initially allocated for data, however many send() calls are needed.
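Because the socket examples above leave the socket arguments elided, here is a self-contained sketch of the same memoryview loop. It assumes a connected pair created with `socket.socketpair()` instead of a real remote peer, a smaller payload (about 1 MB instead of 100 MB), and a background thread that drains the receiving end; `send_all_zero_copy`, `count_received` and `demo` are hypothetical names. In real code, `socket.sendall()` already keeps sending until all the data has gone out, so this loop is mainly useful as an illustration of the technique.

```python
import socket
import threading


def send_all_zero_copy(sock: socket.socket, data: bytes) -> int:
    """Send all of data, advancing a memoryview instead of re-slicing the bytes."""
    mv = memoryview(data)
    total = 0
    while mv:
        sent = sock.send(mv)  # may send only part of the buffer
        total += sent
        mv = mv[sent:]        # new view over the unsent tail; no bytes are copied
    return total


def count_received(sock: socket.socket, result: list) -> None:
    """Read until the peer shuts down its sending side, then record the byte count."""
    received = 0
    while chunk := sock.recv(65536):
        received += len(chunk)
    result.append(received)


def demo(size: int = 1024 * 1000) -> tuple:
    sender, receiver = socket.socketpair()  # connected pair standing in for a real connection
    result: list = []
    reader = threading.Thread(target=count_received, args=(receiver, result))
    reader.start()
    try:
        sent = send_all_zero_copy(sender, b"a" * size)
        sender.shutdown(socket.SHUT_WR)  # tell the reader no more data is coming
        reader.join()
    finally:
        sender.close()
        receiver.close()
    return sent, result[0]


if __name__ == "__main__":
    print(demo())  # (1024000, 1024000)
```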

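So far we have used memoryview objects to write data efficiently, but the same approach works for reading. Most I/O operations in Python know how to deal with objects that implement the buffer protocol. In this case we don't even need a memoryview object: we can ask an I/O function to write into an object we have preallocated: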

>>> ba = bytearray(8)
>>> ba
bytearray(b'\x00\x00\x00\x00\x00\x00\x00\x00')
>>> with open("/dev/urandom", "rb") as source:
...     source.readinto(ba)
...
8
>>> ba
bytearray(b'`m.z\x8d\x0fp\xa1')

With this mechanism, it is easy to preallocate a buffer (as you would in C to reduce the number of calls to malloc()) and fill it whenever you need to.
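A common way to use this is to allocate one buffer up front and reuse it for every read of a stream. The sketch below assumes an in-memory `io.BytesIO` stream instead of a real file and adds up the byte values as a stand-in for real processing (`sum_of_bytes` is a hypothetical helper). `readinto()` returns how many bytes it actually wrote, so only that prefix of the buffer is processed, through a memoryview slice that does not copy it.

```python
import io


def sum_of_bytes(stream, chunk_size: int = 4096) -> int:
    """Add up every byte in stream, reusing one preallocated buffer for all reads."""
    buf = bytearray(chunk_size)  # allocated once and reused
    view = memoryview(buf)
    total = 0
    while True:
        n = stream.readinto(buf)  # fills buf in place, returns the number of bytes read
        if not n:                 # 0 at end of stream
            break
        total += sum(view[:n])    # only the bytes actually read, without copying them
    return total


data = bytes(range(256)) * 10
print(sum_of_bytes(io.BytesIO(data), chunk_size=100))  # 326400
```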

Using memoryview, you can even place data at any position within the memory area:

>>> ba = bytearray(8)
>>> # Reference the bytearray from offset 4 to its end
>>> ba_at_4 = memoryview(ba)[4:]
>>> with open("/dev/urandom", "rb") as source:
...     # Write the content of /dev/urandom from offset 4 to the end of the
...     # bytearray, effectively reading 4 bytes only
...     source.readinto(ba_at_4)
...
4
>>> ba
bytearray(b'\x00\x00\x00\x00\x0b\x19\xae\xb2')
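Extending the example above, several memoryview slices of one bytearray can serve as independent write targets, which lets you assemble a message in place. This minimal sketch assumes `io.BytesIO` streams as stand-ins for files or sockets; slice assignment on a memoryview also writes straight into the underlying buffer, provided the lengths match.

```python
import io

buf = bytearray(8)
view = memoryview(buf)

# Each readinto() writes straight into a different region of the same bytearray.
io.BytesIO(b"HEAD").readinto(view[:4])  # bytes 0 to 3
io.BytesIO(b"body").readinto(view[4:])  # bytes 4 to 7

# Slice assignment on a memoryview also writes in place (the lengths must match).
view[2:4] = b"!!"

print(buf)  # bytearray(b'HE!!body')
```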

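The buffer protocol is fundamental to achieving low memory overhead and high performance. Because Python hides all memory allocation, developers tend to forget what happens under the hood, and their programs can pay dearly for it in speed.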

It is also worth looking at how the array and struct modules work with the buffer protocol: zero-copy operations are remarkably efficient.
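As a starting point, the sketch below shows both modules working directly on buffers. `struct.pack_into()` encodes values into an existing writable buffer and `struct.unpack_from()` decodes at an offset without slicing (and therefore copying) the buffer first; an `array.array` exposes its typed storage, so a memoryview over it has the array's item format and writes go straight into the array. Note that while such a view exists, the array cannot be resized.

```python
import array
import struct

# struct works directly on any object that supports the buffer protocol.
buf = bytearray(12)
struct.pack_into("<iii", buf, 0, 1, 2, 3)  # encode three little-endian ints into buf
pair = struct.unpack_from("<ii", buf, 4)   # decode from offset 4, no buf[4:] copy
print(pair)                                # (2, 3)

# array.array exposes its typed storage through the buffer protocol.
numbers = array.array("i", [10, 20, 30])
mv = memoryview(numbers)
print(mv.format, mv.itemsize == numbers.itemsize)  # i True
mv[1] = 99                                 # writes into the array's own memory
print(numbers)                             # array('i', [10, 99, 30])
```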
