reference and transporting from: http://eli.thegreenplace.net/2012/01/30/the-bytesstr-dichotomy-in-python-3/

Arguably the most significant new feature of Python 3 is a much cleaner separation between text and binary data. Text is always Unicode and is represented by the str type, and binary data is represented by the bytes type. What makes the separation particularly clean is that str and bytes can't be mixed in Python 3 in any implicit way. You can't concatenate them, look for one inside another, and generally pass one to a function that expects the other. This is a good thing.

However, boundaries between strings and bytes are inevitable, and this is where the following diagram is always important to keep in mind:

Strings can be encoded to bytes, and bytes can be decoded back to strings.

>>> '€20'.encode('utf-8')
b'\xe2\x82\xac20'
>>> b'\xe2\x82\xac20'.decode('utf-8')
'€20'

Think of it this way: a string is an abstract representation of text. A string consists of characters, which are also abstract entities not tied to any particular binary representation. When manipulating strings, we're living in blissful ignorance. We can split and slice them, concatenate and search inside them. We don't care how they are represented internally and how many bytes it takes to hold each character in them. We only start caring about this when encoding strings into bytes (for example, in order to send them over a communication channel), or decoding strings from bytes (for the other direction).

The argument given to encode and decode is the encoding (or codec). The encoding is a way to represent abstract characters in binary data. There are many possible encodings. UTF-8, shown above, is one. Here's another:

>>> '€20'.encode('iso-8859-15')
b'\xa420'
>>> b'\xa420'.decode('iso-8859-15')
'€20'
>>> '你好啊,傻傻分不清'.encode('utf-8')
b'\xe4\xbd\xa0\xe5\xa5\xbd\xe5\x95\x8a\xef\xbc\x8c\xe5\x82\xbb\xe5\x82\xbb\xe5\x88\x86\xe4\xb8\x8d\xe6\xb8\x85'
>>> b'\xe4\xbd\xa0\xe5\xa5\xbd\xe5\x95\x8a\xef\xbc\x8c\xe5\x82\xbb\xe5\x82\xbb\xe5\x88\x86\xe4\xb8\x8d\xe6\xb8\x85'.decode('utf-8')
'你好啊,傻傻分不清'

The encoding is a crucial part of this translation process. Without the encoding, the bytes object b'\xa420' is just a bunch of bits. The encoding gives it meaning. Using a different encoding, this bunch of bits can have a different meaning:

>>> b'\xa420'.decode('windows-1255')
'₪20'

That's 80% of the money lost due to using the wrong encoding, so be careful.

The bytes/str dichotomy in Python 3 [transport]的更多相关文章

  1. The bytes/str dichotomy in Python 3

    The bytes/str dichotomy in Python 3 - Eli Bendersky's website https://eli.thegreenplace.net/2012/01/ ...

  2. 小白的Python之路 day1 Python3的bytes/str之别

    原文:The bytes/str dichotomy in Python 3 Python 3最重要的新特性大概要算是对文本和二进制数据作了更为清晰的区分.文本总是Unicode,由str类型表示,二 ...

  3. python2 与python3中最大的区别(编码问题bytes&str

    1,在python2.x 中是不区分bytes和str类型的,在python3中bytes和str中是区分开的,str的所有操作bytes都支持 python2 中 >>> s = ...

  4. Python3的bytes/str之别

    Python 3最重要的新特性大概要算是对文本和二进制数据作了更为清晰的区分.文本总是Unicode,由str类型表示,二进制数据则由bytes类型表示.Python 3不会以任意隐式的方式混用str ...

  5. python bytes/str

    http://eli.thegreenplace.net/2012/01/30/the-bytesstr-dichotomy-in-python-3/

  6. repr. str, ascii in Python

    repr和str a="Hello" print(str(a)) print(repr(a)) 结果: Hello 'Hello' 可以看出,repr的结果中多了左右两个引号. r ...

  7. python——TypeError: 'str' does not support the buffer interface

    import socket import sys port=51423 host="localhost" data=b"x"*10485760 #在字符串前加 ...

  8. Python学习之路-Day2-Python基础2

    Python学习之路第二天 学习内容: 1.模块初识 2.pyc是什么 3.python数据类型 4.数据运算 5.bytes/str之别 6.列表 7.元组 8.字典 9.字符串常用操作 1.模块初 ...

  9. Centos7 环境下 Python2.7 换成 Python3.7 运行 scrapy 应用所遇到的问题记录

    参考网友的安装过程 Linux系统Centos安装Python3.7 设置Python默认为Python3.7 mv /usr/bin/python /usr/bin/python.bak ln -s ...

随机推荐

  1. Poj 1125 Stockbroker Grapevine(Floyd算法求结点对的最短路径问题)

    一.Description Stockbrokers are known to overreact to rumours. You have been contracted to develop a ...

  2. SQL语法基础:DDL、DML

    一.DDL(Data Definition Language):数据定义语句 #常见的语句 1)CREATE TABLE/DATABASE:创建数据库 CREATE [TEMPORARY] TABLE ...

  3. 第三课 go语言基础语法

    http://www.runoob.com/go/go-basic-syntax.html 1 行分隔符 在 Go 程序中,一行代表一个语句结束.每个语句不需要像 C 家族中的其它语言一样以分号 ; ...

  4. ubuntu利用包管理器安装Node.JS

    步骤1:用curl获取源代码在我们用卷曲获取源代码之前,我们必须先升级操作系统,然后用卷发命令获取NodeSource添加到本地仓库. root@ubuntu-:~#apt-get update 安装 ...

  5. call apply bind 的区别

    1.call和apply都是对函数的直接调用,而bind方法返回的仍然是一个函数,因此后面还需要()来进行调用才可以 var xw={ name: "小王", gender: &q ...

  6. 8、泛型程序设计与c++标准模板库2、c++标准模板库中的容器

    顺序容器类以逻辑线性排列方式存储元素,在这些容器类型中的元素在逻辑上被认为是连续的存储空间中存储的.顺序容器可用于存储线性群体. 在关联容器类中,元素的存储和检索基于关键字和元素与其他元素之间的关系, ...

  7. 阶段4-独挡一面\项目-基于视频压缩的实时监控系统\Sprint1-基于Epoll架构的采集端程序框架设计\第2课-基于Epoll的采集端程序框架设计

    回顾之前的整个程序架构 把epoll机制应用到这个架构上去 下面主要去分析我们的系统中有没有需要等待的事件,先看看采集子系统 在采集子系统当中,摄像头有数据,摄像头采集到图像数据可以作为一个等待事件. ...

  8. chrome - Vimium 插件超级方便快捷键

    Vimium插件作用 安装后,可以用定义好的快捷键操作浏览器,好用到爆粗口 下载地址 https://chrome.google.com/webstore/detail/vimium/dbepggeo ...

  9. MYSQL MYSQLI PDO

    PHP的MySQL扩展(优缺点) 设计开发允许PHP应用与MySQL数据库交互的早期扩展.mysql扩展提供了一个面向过程 的接口: 并且是针对MySQL4.1.3或更早版本设计的.因此,这个扩展虽然 ...

  10. uva11491 奖品的价值(贪心)

    uva11491 奖品的价值(贪心) 给你一个n位的整数,请你删除其中的d个数字,使得整数尽可能大.1<=d<n<=1e5. 首先因为前面的数位更重要,所以从左往右将每一位数字加入栈 ...