The bytes/str dichotomy in Python 3 [transport]
reference and transporting from: http://eli.thegreenplace.net/2012/01/30/the-bytesstr-dichotomy-in-python-3/
Arguably the most significant new feature of Python 3 is a much cleaner separation between text and binary data. Text is always Unicode and is represented by the str type, and binary data is represented by the bytes type. What makes the separation particularly clean is that str and bytes can't be mixed in Python 3 in any implicit way. You can't concatenate them, look for one inside another, and generally pass one to a function that expects the other. This is a good thing.
However, boundaries between strings and bytes are inevitable, and this is where the following diagram is always important to keep in mind:

Strings can be encoded to bytes, and bytes can be decoded back to strings.
>>> '€20'.encode('utf-8')
b'\xe2\x82\xac20'
>>> b'\xe2\x82\xac20'.decode('utf-8')
'€20'
Think of it this way: a string is an abstract representation of text. A string consists of characters, which are also abstract entities not tied to any particular binary representation. When manipulating strings, we're living in blissful ignorance. We can split and slice them, concatenate and search inside them. We don't care how they are represented internally and how many bytes it takes to hold each character in them. We only start caring about this when encoding strings into bytes (for example, in order to send them over a communication channel), or decoding strings from bytes (for the other direction).
The argument given to encode and decode is the encoding (or codec). The encoding is a way to represent abstract characters in binary data. There are many possible encodings. UTF-8, shown above, is one. Here's another:
>>> '€20'.encode('iso-8859-15')
b'\xa420'
>>> b'\xa420'.decode('iso-8859-15')
'€20'
>>> '你好啊,傻傻分不清'.encode('utf-8')
b'\xe4\xbd\xa0\xe5\xa5\xbd\xe5\x95\x8a\xef\xbc\x8c\xe5\x82\xbb\xe5\x82\xbb\xe5\x88\x86\xe4\xb8\x8d\xe6\xb8\x85'
>>> b'\xe4\xbd\xa0\xe5\xa5\xbd\xe5\x95\x8a\xef\xbc\x8c\xe5\x82\xbb\xe5\x82\xbb\xe5\x88\x86\xe4\xb8\x8d\xe6\xb8\x85'.decode('utf-8')
'你好啊,傻傻分不清'
The encoding is a crucial part of this translation process. Without the encoding, the bytes object b'\xa420' is just a bunch of bits. The encoding gives it meaning. Using a different encoding, this bunch of bits can have a different meaning:
>>> b'\xa420'.decode('windows-1255')
'₪20'
That's 80% of the money lost due to using the wrong encoding, so be careful.
The bytes/str dichotomy in Python 3 [transport]的更多相关文章
- The bytes/str dichotomy in Python 3
The bytes/str dichotomy in Python 3 - Eli Bendersky's website https://eli.thegreenplace.net/2012/01/ ...
- 小白的Python之路 day1 Python3的bytes/str之别
原文:The bytes/str dichotomy in Python 3 Python 3最重要的新特性大概要算是对文本和二进制数据作了更为清晰的区分.文本总是Unicode,由str类型表示,二 ...
- python2 与python3中最大的区别(编码问题bytes&str
1,在python2.x 中是不区分bytes和str类型的,在python3中bytes和str中是区分开的,str的所有操作bytes都支持 python2 中 >>> s = ...
- Python3的bytes/str之别
Python 3最重要的新特性大概要算是对文本和二进制数据作了更为清晰的区分.文本总是Unicode,由str类型表示,二进制数据则由bytes类型表示.Python 3不会以任意隐式的方式混用str ...
- python bytes/str
http://eli.thegreenplace.net/2012/01/30/the-bytesstr-dichotomy-in-python-3/
- repr. str, ascii in Python
repr和str a="Hello" print(str(a)) print(repr(a)) 结果: Hello 'Hello' 可以看出,repr的结果中多了左右两个引号. r ...
- python——TypeError: 'str' does not support the buffer interface
import socket import sys port=51423 host="localhost" data=b"x"*10485760 #在字符串前加 ...
- Python学习之路-Day2-Python基础2
Python学习之路第二天 学习内容: 1.模块初识 2.pyc是什么 3.python数据类型 4.数据运算 5.bytes/str之别 6.列表 7.元组 8.字典 9.字符串常用操作 1.模块初 ...
- Centos7 环境下 Python2.7 换成 Python3.7 运行 scrapy 应用所遇到的问题记录
参考网友的安装过程 Linux系统Centos安装Python3.7 设置Python默认为Python3.7 mv /usr/bin/python /usr/bin/python.bak ln -s ...
随机推荐
- 【转】 Pro Android学习笔记(六三):Preferences(7):代码控制首选项
[-] 代码实现preference 利用preference保存状态 DialogPreference 代码实现preference View可以不通过xml进行设置,有代码直接进行设置,首选项pr ...
- art-template-loader:template
ylbtech-art-template-loader: 1.返回顶部 2.返回顶部 3.返回顶部 4.返回顶部 5.返回顶部 6.返回顶部 作者:ylbtech出处:ht ...
- go语言执行windows下命令行的方法
转自:http://www.jb51.net/article/61727.htm 在golang里执行windows下的命令行,例如在golang里面调用 del d:\a.txt 命令 packag ...
- js点滴知识(1) -- 获取DOM对象和编码
在今天的工作中发现了一些小的问题,在网上查了一下,才知道自己的js才是冰山一角,以后要虚心向他人学习,要虚怀若谷. 发现一:js获取DOM对象与jquery的区别 先前总以为,二者是一样的,最近才知道 ...
- web安全之同源策略
为什么使用同源策略?一个重要原因就是对cookie的保护,cookie 中存着sessionID .如果已经登录网站,同时又去了任意其他网站,该网站有恶意JS代码.如果没有同源策略,那么这个网站就能通 ...
- 菜鸟攻城狮3(Holle World)
1.创建一个HolleWorld.java文本文件 2.代码:public class HolleWorld { public static void main(String[] args) { Sy ...
- ansible案例-安装nginx
一.创建目录: mkidr -p playbook/{files,templates} 二.自定义index.html文件 $ vim playbook/templates/index.html. ...
- 笔试题: 数据库 已看1 一些关键的sql语句练习 和选择题 有用 sql语句练习 挺好
一. 选择题 1.SQL语言是( C )语言. A.层次数据库 B.网络数据库 C.关系数据库 D.非数据库 redis 是 3.如果在where子句中有两个条件要同时满足,应该用哪个 ...
- Apple Swift中文入门教程【转发】
1 简介 今天凌晨Apple刚刚发布了Swift编程语言,本文从其发布的书籍<The Swift Programming Language>中摘录和提取而成.希望对各位的iOS& ...
- Dijkstra 路径规划 C#
示例无向图如下:(起始点为v0) 邻接矩阵为: 注意:其中没有连接的边和自己到自己的点权值用10000表示. 代码: static void Main(string[] args) { , , , , ...