Python Modules [1] -> struct -> Using struct in Network Programming

struct Module

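When communicating over a socket in network programming, the struct module comes up often, because most data sent over the network travels as a binary stream (binary data). Passing strings requires little extra work, but basic values such as int or char need a mechanism that packs a given structure layout into a binary byte string before transmission, and a matching mechanism on the receiving end that unpacks it back into the original data. The struct module provides exactly this mechanism: it converts between Python values and C struct types represented as Python strings, which are bytes objects in Python 3 (This module performs conversions between Python values and C structs represented as Python strings.).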

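Another use case is exchanging data between Python and other languages. For example, when a client written in C++ sends an int (4 bytes) to a Python server, the server has to parse the received bytes into an integer that Python understands, and this is where the struct module is used.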
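The C++ scenario above can be sketched as follows. The payload is a hypothetical stand-in for the 4 bytes read from the socket, and the sketch assumes the client sends the value in network (big-endian) byte order; a client that sends its native little-endian representation would need '<i' instead.

```python
import struct

# Hypothetical payload: the 4 bytes a C++ client might send for the
# 32-bit int 1024, assuming network (big-endian) byte order.
payload = b'\x00\x00\x04\x00'

# '!' selects network byte order, 'i' is a 4-byte signed int.
(value,) = struct.unpack('!i', payload)
print(value)  # 1024
```

unpack() always returns a tuple, so the single value is extracted by tuple unpacking.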

Functions

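pack() function

Call: bs = struct.pack(format, *argv)

Purpose: packs the given values into a bytes object according to the format

Parameters: format, *argv

format: str, the packing format

*argv: the values to pack, separated by ',' and matching the fields of format one to one

Returns: bs

bs: bytes, the packed binary data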
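A minimal sketch of pack(), assuming Python 3. The '<' prefix is used here only so that the output is identical on every platform (little-endian, standard sizes, no padding); it is not required by pack() itself.

```python
import struct

# '<' = little-endian with standard sizes: 'h' is 2 bytes, 'i' is 4 bytes.
packed = struct.pack('<hi', 1, 2)

print(packed)       # b'\x01\x00\x02\x00\x00\x00'
print(len(packed))  # 6
```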

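unpack() function

Call: data = struct.unpack(format, bstr)

Purpose: decodes values from a bytes object according to the format

Parameters: format, bstr

format: str, the unpacking format

bstr: bytes, the data to decode; it is a single bytes object whose length must equal struct.calcsize(format)

Returns: data

data: tuple, the decoded values (a tuple even when there is only one value)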
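A short sketch of unpack(), assuming Python 3 and the same illustrative '<hi' layout. It also shows the two points that most often surprise: the result is always a tuple, and the input length has to match the format exactly.

```python
import struct

data = b'\x01\x00\x02\x00\x00\x00'

# Two fields in the format give a 2-tuple.
fields = struct.unpack('<hi', data)
print(fields)  # (1, 2)

# A single field still comes back as a 1-tuple.
(count,) = struct.unpack('<H', b'\x10\x00')
print(count)   # 16

# A buffer of the wrong length raises struct.error.
try:
    struct.unpack('<hi', data[:5])
    length_error = False
except struct.error:
    length_error = True
print(length_error)  # True
```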

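calcsize() function

Call: size = struct.calcsize(format)

Purpose: computes the number of bytes occupied by data packed with format

Parameters: format

format: str, the pack/unpack format, such as 'H', 'h', 'I'

Returns: size

size: int, the number of bytes the formatted data occupies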
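The sketch below shows calcsize() on a few formats. One detail worth illustrating is that native mode (the default, or '@') may add alignment padding, while the standard modes ('<', '>', '!', '=') never do; the native result is platform dependent, so it is printed without a fixed expected value.

```python
import struct

# Standard modes: fixed sizes, no padding between fields.
standard_size = struct.calcsize('<hi')
network_size = struct.calcsize('!hi')
print(standard_size, network_size)  # 6 6

# Native mode follows the C compiler's layout and may insert padding
# after 'h' so that 'i' is aligned (commonly giving 8).
native_size = struct.calcsize('@hi')
print(native_size)

# For 's' the count is the byte length of the field.
string_size = struct.calcsize('<10s')
print(string_size)  # 10
```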

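pack_into() function

Call: struct.pack_into(format, buffer, offset, *argv)

Purpose: packs the values according to format and writes the result into buffer, starting at offset

Parameters: format, buffer, offset, *argv

format: str, the packing format

buffer: a writable buffer object, for example a bytearray or one created with ctypes.create_string_buffer(size), whose contents can be inspected through buf.raw

offset: int, the byte offset in buffer at which writing starts

*argv: the values to pack

Returns: None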
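A minimal pack_into() sketch, assuming Python 3. A bytearray is used as the writable buffer here instead of ctypes.create_string_buffer(); both work, the bytearray simply needs no extra import.

```python
import struct

buf = bytearray(8)  # preallocated, zero-filled

# Write two unsigned shorts starting at byte offset 2.
result = struct.pack_into('<HH', buf, 2, 1, 2)

print(result)  # None: the data is written into buf in place
print(buf)     # bytearray(b'\x00\x00\x01\x00\x02\x00\x00\x00')
```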

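unpack_from() function

Call: data = struct.unpack_from(format, buffer, offset=0)

Purpose: reads bytes from buffer starting at offset and unpacks them according to format

Parameters: format, buffer, offset

format: str, the unpacking format

buffer: a buffer object, for example bytes, a bytearray, or one created with ctypes.create_string_buffer(size), whose contents can be inspected through buf.raw

offset: int, the byte offset in buffer at which reading starts (0 by default)

Returns: data

data: tuple, the decoded values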
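A minimal unpack_from() sketch, assuming Python 3. Unlike unpack(), the buffer may be longer than the format requires; only the bytes from offset onward that the format needs are read. The byte values are made up for illustration.

```python
import struct

# Illustrative buffer: a 2-byte header followed by two unsigned shorts.
buf = b'\xff\xff\x01\x00\x02\x00'

# Skip the header and decode the two shorts.
body = struct.unpack_from('<HH', buf, 2)
print(body)    # (1, 2)

# offset defaults to 0; the trailing bytes are simply ignored.
header = struct.unpack_from('<H', buf)
print(header)  # (65535,)
```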

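Supplementary Notes

Using pack_into() and unpack_from()

Binary packing is mostly used in environments with high performance requirements. The pack() function described above allocates a new object for its return value every time it is called, which can be a significant waste. To address this, the struct module also provides pack_into() and unpack_from(), which fill or read the bytes of a buffer that has been allocated in advance instead of creating a new object for every call. Compared with pack(), pack_into() keeps operating on the same buffer object and avoids the extra allocations. Note also that pack_into() and unpack_from() both operate on a buffer object and take an offset parameter, which makes the processing more flexible. For example, several objects can be packed into one buffer and then unpacked individually by specifying different offsets.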
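The idea of packing several objects into one buffer and reading them back by offset can be sketched as follows. The record layout '<HI' and the sample values are hypothetical, chosen only for illustration.

```python
import struct

FMT = '<HI'                  # hypothetical record: 2-byte id, 4-byte value
SIZE = struct.calcsize(FMT)  # 6 bytes per record
records = [(1, 100), (2, 200), (3, 300)]

# Allocate the buffer once and reuse it for every record.
buf = bytearray(SIZE * len(records))
for i, rec in enumerate(records):
    struct.pack_into(FMT, buf, i * SIZE, *rec)

# Read back only the third record by giving its offset.
third = struct.unpack_from(FMT, buf, 2 * SIZE)
print(third)  # (3, 300)
```

When the same format is used many times, struct.Struct(FMT) offers the same pack_into() and unpack_from() methods on a precompiled format object.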

struct Format Character Reference Table

Format	C Type	Python type	Standard size	Notes
x	pad byte	no value
c	char	string of length 1	1
b	signed char	integer	1	(3)
B	unsigned char	integer	1	(3)
?	_Bool	bool	1	(1)
h	short	integer	2	(3)
H	unsigned short	integer	2	(3)
i	int	integer	4	(3)
I	unsigned int	integer	4	(3)
l	long	integer	4	(3)
L	unsigned long	integer	4	(3)
q	long long	integer	8	(2), (3)
Q	unsigned long long	integer	8	(2), (3)
f	float	float	4	(4)
d	double	float	8	(4)
s	char[]	string
p	char[]	string
P	void *	integer		(5), (3)

Notes:

1. The '?' conversion code corresponds to the _Bool type defined by C99. If this type is not available, it is simulated using a char. In standard mode, it is always represented by one byte.

New in version 2.6.

2. The 'q' and 'Q' conversion codes are available in native mode only if the platform C compiler supports C long long, or, on Windows, __int64. They are always available in standard modes.

New in version 2.2.

3. When attempting to pack a non-integer using any of the integer conversion codes, if the non-integer has a __index__() method then that method is called to convert the argument to an integer before packing. If no __index__() method exists, or the call to __index__() raises TypeError, then the __int__() method is tried. However, the use of __int__() is deprecated, and will raise DeprecationWarning.

Changed in version 2.7: Use of the __index__() method for non-integers is new in 2.7.

Changed in version 2.7: Prior to version 2.7, not all integer conversion codes would use the __int__() method to convert, and DeprecationWarning was raised only for float arguments.

4. For the 'f' and 'd' conversion codes, the packed representation uses the IEEE 754 binary32 (for 'f') or binary64 (for 'd') format, regardless of the floating-point format used by the platform.

5. The 'P' format character is only available for the native byte ordering (selected as the default or with the '@' byte order character). The byte order character '=' chooses to use little- or big-endian ordering based on the host system. The struct module does not interpret this as native ordering, so the 'P' format is not available.
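Notes 4 and 5 can be checked with a short sketch, assuming Python 3. The pointer size is platform dependent, so it is printed without a fixed expected value.

```python
import struct

# Note 4: 'f' is always IEEE 754 binary32, 'd' is always binary64.
float_size = struct.calcsize('<f')
double_size = struct.calcsize('<d')
print(float_size, double_size)  # 4 8

# Note 5: 'P' works in native mode only.
pointer_size = struct.calcsize('P')
print(pointer_size)  # size of a pointer on this platform

try:
    struct.calcsize('=P')
    p_in_standard_mode = True
except struct.error:
    p_in_standard_mode = False
print(p_in_standard_mode)  # False
```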

A format character may be preceded by an integral repeat count. For example, the format string '4h' means exactly the same as 'hhhh'.

Whitespace characters between formats are ignored; a count and its format must not contain whitespace though.

For the 's' format character, the count is interpreted as the size of the string, not a repeat count like for the other format characters; for example, '10s' means a single 10-byte string, while '10c' means 10 characters. For packing, the string is truncated or padded with null bytes as appropriate to make it fit. For unpacking, the resulting string always has exactly the specified number of bytes. As a special case, '0s' means a single, empty string (while '0c' means 0 characters).
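The 's' behaviour described above can be sketched as follows. This assumes Python 3, where 's' and 'c' take bytes objects rather than the Python 2 strings that the quoted documentation refers to.

```python
import struct

# Shorter input is padded with null bytes up to the count.
padded = struct.pack('5s', b'abc')
print(padded)     # b'abc\x00\x00'

# Longer input is truncated to the count.
truncated = struct.pack('2s', b'abc')
print(truncated)  # b'ab'

# Unpacking always returns exactly count bytes, padding included.
unpacked = struct.unpack('5s', padded)
print(unpacked)   # (b'abc\x00\x00',)

# With 'c' the count is a repeat count: three separate 1-byte values.
chars = struct.pack('3c', b'a', b'b', b'c')
print(chars)      # b'abc'
```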

The 'p' format character encodes a “Pascal string”, meaning a short variable-length string stored in a fixed number of bytes, given by the count. The first byte stored is the length of the string, or 255, whichever is smaller. The bytes of the string follow. If the string passed in to pack() is too long (longer than the count minus 1), only the leading count-1 bytes of the string are stored. If the string is shorter than count-1, it is padded with null bytes so that exactly count bytes in all are used. Note that for unpack(), the 'p' format character consumes count bytes, but that the string returned can never contain more than 255 characters.
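A small sketch of the Pascal string format, again assuming Python 3 and bytes input:

```python
import struct

# '5p': 1 length byte + up to 4 data bytes, padded with nulls.
short = struct.pack('5p', b'ab')
print(short)     # b'\x02ab\x00\x00'

# Input longer than count-1 is cut down to the leading count-1 bytes.
cut = struct.pack('3p', b'abcdef')
print(cut)       # b'\x02ab'

# Unpacking uses the length byte, so the padding is not returned.
restored = struct.unpack('5p', short)
print(restored)  # (b'ab',)
```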

For the 'P' format character, the return value is a Python integer or long integer, depending on the size needed to hold a pointer when it has been cast to an integer type. A NULL pointer will always be returned as the Python integer 0. When packing pointer-sized values, Python integer or long integer objects may be used. For example, the Alpha and Merced processors use 64-bit pointer values, meaning a Python long integer will be used to hold the pointer; other platforms use 32-bit pointers and will use a Python integer.

For the '?' format character, the return value is either True or False. When packing, the truth value of the argument object is used. Either 0 or 1 in the native or standard bool representation will be packed, and any non-zero value will be True when unpacking.
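The '?' behaviour can be sketched like this, assuming Python 3. The standard-mode prefix '<' is used so that the bool is always exactly one byte.

```python
import struct

# Packing uses the truth value of the argument, whatever its type.
empty_list = struct.pack('<?', [])
non_empty_str = struct.pack('<?', 'text')
print(empty_list)     # b'\x00'
print(non_empty_str)  # b'\x01'

# Any non-zero byte unpacks as True.
flag = struct.unpack('<?', b'\x05')
print(flag)           # (True,)
```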

Basic Usage of the struct Module Functions

    import struct
    from ctypes import create_string_buffer

    a = 20
    b = 400

    # Python data --> bytes
    def pack():
        s = struct.pack('ii', a, b)     # native byte order
        x = struct.pack('!ii', a, b)    # network (big-endian) byte order
        print('length:', len(s))
        print('pack without "!"', s)
        print(repr(s))
        print('pack with "!"', x)
        return s

    # bytes --> Python data
    def unpack():
        s = struct.unpack('ii', struct.pack('ii', a, b))
        # The unpack format must match the pack format, byte order included
        x = struct.unpack('!ii', struct.pack('!ii', a, b))
        print('length:', len(s))
        print('unpack without "!"', s)
        print(repr(s))
        print('unpack with "!"', x)
        return s

    # calculate the size of the corresponding format
    def cal():
        print("len: ", struct.calcsize('i'))       # len:  4
        print("len: ", struct.calcsize('ii'))      # len:  8
        print("len: ", struct.calcsize('f'))       # len:  4
        print("len: ", struct.calcsize('ff'))      # len:  8
        print("len: ", struct.calcsize('s'))       # len:  1
        print("len: ", struct.calcsize('ss'))      # len:  2
        print("len: ", struct.calcsize('d'))       # len:  8
        print("len: ", struct.calcsize('dd'))      # len:  16

    def _into():
        buf = create_string_buffer(12)
        # b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
        print(repr(buf.raw))
        # pack_into() returns None; it writes into buf in place
        struct.pack_into("iii", buf, 0, 1, 2, -1)
        print(repr(buf.raw))

    def _from():
        buf = create_string_buffer(12)
        # Fill the buffer first so that there is something to decode
        struct.pack_into("iii", buf, 0, 1, 2, -1)
        print(repr(buf.raw))
        print(struct.unpack_from("iii", buf, 0))   # (1, 2, -1)

    pack()
    unpack()
    cal()
    _into()
    _from()

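Note: the formats 'ii' and '2i' are equivalent, and both require 2 arguments. Only for 's' and 'p' is the count a byte length rather than a repeat count, so '2s' takes a single argument.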
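How the count in a format maps to the number of arguments can be verified directly. This is a small check assuming Python 3:

```python
import struct

# A repeat count is shorthand: '2i' and 'ii' pack identically.
same = struct.pack('2i', 1, 2) == struct.pack('ii', 1, 2)
print(same)  # True

# Passing only one value to '2i' is an error.
try:
    struct.pack('2i', 1)
    one_arg_accepted = True
except struct.error:
    one_arg_accepted = False
print(one_arg_accepted)  # False

# For 's' the count is a byte length, so '2s' takes one argument.
two_bytes = struct.pack('2s', b'ab')
print(two_bytes)  # b'ab'
```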

References

http://blog.csdn.net/ithomer/article/details/5974029

http://www.cnblogs.com/coser/archive/2011/12/17/2291160.html