Packing data with Python
Defining how a sequence of bytes sits in a memory buffer or on disk can be challenging from time to time. Since everything that you’ll work with is a byte, it makes sense that we have an intuitive way to work with this information agnostic of the overlying type restrictions that the language will enforce on us.
In today’s post, I’m going to run through Python’s byte string packing and unpacking using the struct package.
Basics
From the Python documentation:
This module performs conversions between Python values and C structs represented as Python bytes objects. This can be used in handling binary data stored in files or from network connections, among other sources. It uses Format Strings as compact descriptions of the layout of the C structs and the intended conversion to/from Python values.
When working with a byte string in Python, you prefix your literals with b.
>>> b'Hello'
'Hello'
The ord function call is used to convert a text character into its character code representation.
>>> ord(b'H')
72
>>> ord(b'e')
101
>>> ord(b'l')
108
We can use list to convert a whole string of byte literals into an array.
>>> list(b'Hello')
[72, 101, 108, 108, 111]
The compliment to the ord call is chr, which converts the byte-value back into a character.
Packing
Using the struct module, we’re offered the pack function call. This function takes in a format of data and then the data itself. The first parameter defines how the data supplied in the second parameter should be laid out. We get started:
>>> import struct
If we pack the string 'Hello' as single bytes:
>>> list(b'Hello')
[72, 101, 108, 108, 111]
>>> struct.pack(b'BBBBB', 72, 101, 108, 108, 111)
b'Hello'
The format string b'BBBBB' tells pack to pack the values supplied into a string of 5 unsigned values. If we were to use a lower case b in our format string, pack would expect the byte value to be signed.
>>> struct.pack(b'bbbbb', 72, 101, 108, 108, 111)
b'Hello'
This only gets interesting once we send a value that would make the request overflow:
>>> struct.pack(b'bbbbb', 72, 101, 108, 129, 111)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
struct.error: byte format requires -128 <= number <= 127
The following tables have been re-produced from the Python documentation.
Byte order, size and alignment
| Character | Byte order | Size | Alignment |
|---|---|---|---|
@ |
native | native | native |
= |
native | standard | none |
< |
little-endian | standard | none |
> |
big-endian | standard | none |
! |
network (= big-endian) | standard | none |
Types
| Format | C Type | Python type | Standard size | Notes |
|---|---|---|---|---|
x |
pad byte | no value | ||
c |
char | bytes of length 1 | 1 | |
b |
signed char | integer | 1 | (1),(3) |
B |
unsigned char | integer | 1 | (3) |
? |
_Bool | bool | 1 | (1) |
h |
short | integer | 2 | (3) |
H |
unsigned short | integer | 2 | (3) |
i |
int | integer | 4 | (3) |
I |
unsigned int | integer | 4 | (3) |
l |
long | integer | 4 | (3) |
L |
unsigned long | integer | 4 | (3) |
q |
long long | integer | 8 | (2), (3) |
Q |
unsigned long long | integer | 8 | (2), (3) |
n |
ssize_t | integer | (4) | |
N |
size_t | integer | (4) | |
f |
float | float | 4 | (5) |
d |
double | float | 8 | (5) |
s |
char[] | bytes | ||
p |
char[] | bytes | ||
P |
void * | integer | (6) |
Unpacking
The direct reverse process of packing bytes into an array, is unpacking them again into usable variables inside of your python code.
>>> struct.unpack(b'BBBBB', struct.pack(b'BBBBB', 72, 101, 108, 108, 111))
(72, 101, 108, 108, 111)
>>> struct.unpack(b'5s', struct.pack(b'BBBBB', 72, 101, 108, 108, 111))
(b'Hello',)
Packing data with Python的更多相关文章
- 使用Python对Twitter进行数据挖掘(Mining Twitter Data with Python)
目录 1.Collecting data 1.1 Register Your App 1.2 Accessing the Data 1.3 Streaming 2.Text Pre-processin ...
- python data analysis | python数据预处理(基于scikit-learn模块)
原文:http://www.jianshu.com/p/94516a58314d Dataset transformations| 数据转换 Combining estimators|组合学习器 Fe ...
- Mining Twitter Data with Python
目录 1.Collecting data 1.1 Register Your App 1.2 Accessing the Data 1.3 Streaming 2.Text Pre-processin ...
- Working with Binary Data in Python
http://www.devdungeon.com/content/working-binary-data-python
- 7 Tools for Data Visualization in R, Python, and Julia
7 Tools for Data Visualization in R, Python, and Julia Last week, some examples of creating visualiz ...
- 一句Python,一句R︱pandas模块——高级版data.frame
先学了R,最近刚刚上手python,所以想着将python和R结合起来互相对比来更好理解python.最好就是一句python,对应写一句R. pandas可谓如雷贯耳,数据处理神器. 以下符号: = ...
- Python - 2. Built-in Collection Data Types
From: http://interactivepython.org/courselib/static/pythonds/Introduction/GettingStartedwithData.htm ...
- A Complete Tutorial to Learn Data Science with Python from Scratch
A Complete Tutorial to Learn Data Science with Python from Scratch Introduction It happened few year ...
- python接口测试(post,get)-传参(data和json之间的区别)
python接口测试如何正确传参: POST 传data:data是python字典格式:传参data=json.dumps(data)是字符串类型传参 #!/usr/bin/env python3 ...
随机推荐
- CUDA Cudnn pytorch 安装及错误 RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED解决
看我结论,大家试试看最后装pytorch看行不行,不行就去冲了PyTorch /Doge ubuntu 20.04 下安装CUDA,参考这个博主写的,先看显卡支持的最高CUDA版本,之后找一个较新 ...
- RepVGG
RepVGG: Making VGG-style ConvNets Great Again 作者:elfin 资料来源:RepVGG论文解析 目录 1.摘要 2.背景介绍 3.相关工作 3.1 单 ...
- Django使用数据库(配置数据库,基本的增删改查a)
第一步在setting文件中配置DATABASES设置 然后更改__init__文件 打开APP中models文件,导入并创建数据库 最后打开终端执行以下命令 python manage.py mak ...
- DAOS 分布式异步对象存储|架构设计
分布式异步对象存储 (DAOS) 是一个开源的对象存储系统,专为大规模分布式非易失性内存 (NVM, Non-Volatile Memory) 设计,利用了SCM(Storage-Class Memo ...
- 第26 章 : 理解 CNI 和 CNI 插件
理解 CNI 和 CNI 插件 本文将主要分享以下几方面的内容: CNI 是什么? Kubernetes 中如何使用 CNI? 哪个 CNI 插件适合我? 如何开发自己的 CNI 插件? CNI 是什 ...
- Shell十三问更新总结版 -- 什么叫做 Shell?-- Shell十三问<第一问>
Shell十三问更新总结版 简介 ChinaUnix 论坛 Shell 版名为網中人的前辈于 2004 年发布的精华贴,最近回炉这块内容,觉得很多东西讲的实在透彻,非常感谢前辈網中人,同时我个人也对这 ...
- B1029/A1048 旧键盘损坏了,在输入一段文字时坏了的键不可以正常使用,现给出应输入的一段文字,和实际输出的文字,找出坏掉的键。
#include<cstdio> #include<cstring> const int maxn = 1000; bool HashTable[maxn] = { false ...
- Java后端进阶-网络编程(Netty线程模型)
前言 我们在使用Netty进行服务端开发的时候,一般来说会定义两个NioEventLoopGroup线程池,一个"bossGroup"线程池去负责处理客户端连接,一个"w ...
- Hadoop完整搭建过程(二):伪分布模式
1 伪分布模式 伪分布模式是运行在单个节点以及多个Java进程上的模式.相比起本地模式,需要进行更多配置文件的设置以及ssh.YARN相关设置. 2 Hadoop配置文件 修改Hadoop安装目录下的 ...
- GO-04-变量
GO变量 Go 语言的变量名由字母.数字.下画线组成,首个字符不能为数字: Go 语法规定,定义的局部变量若没有被调用会发生编译错误. 变量的声明 var 变量名 变量类型 批量声明变量 var ( ...