The bytes/str dichotomy in Python 3 [transport]
reference and transporting from: http://eli.thegreenplace.net/2012/01/30/the-bytesstr-dichotomy-in-python-3/
Arguably the most significant new feature of Python 3 is a much cleaner separation between text and binary data. Text is always Unicode and is represented by the str type, and binary data is represented by the bytes type. What makes the separation particularly clean is that str and bytes can't be mixed in Python 3 in any implicit way. You can't concatenate them, look for one inside another, and generally pass one to a function that expects the other. This is a good thing.
However, boundaries between strings and bytes are inevitable, and this is where the following diagram is always important to keep in mind:

Strings can be encoded to bytes, and bytes can be decoded back to strings.
>>> '€20'.encode('utf-8')
b'\xe2\x82\xac20'
>>> b'\xe2\x82\xac20'.decode('utf-8')
'€20'
Think of it this way: a string is an abstract representation of text. A string consists of characters, which are also abstract entities not tied to any particular binary representation. When manipulating strings, we're living in blissful ignorance. We can split and slice them, concatenate and search inside them. We don't care how they are represented internally and how many bytes it takes to hold each character in them. We only start caring about this when encoding strings into bytes (for example, in order to send them over a communication channel), or decoding strings from bytes (for the other direction).
The argument given to encode and decode is the encoding (or codec). The encoding is a way to represent abstract characters in binary data. There are many possible encodings. UTF-8, shown above, is one. Here's another:
>>> '€20'.encode('iso-8859-15')
b'\xa420'
>>> b'\xa420'.decode('iso-8859-15')
'€20'
>>> '你好啊,傻傻分不清'.encode('utf-8')
b'\xe4\xbd\xa0\xe5\xa5\xbd\xe5\x95\x8a\xef\xbc\x8c\xe5\x82\xbb\xe5\x82\xbb\xe5\x88\x86\xe4\xb8\x8d\xe6\xb8\x85'
>>> b'\xe4\xbd\xa0\xe5\xa5\xbd\xe5\x95\x8a\xef\xbc\x8c\xe5\x82\xbb\xe5\x82\xbb\xe5\x88\x86\xe4\xb8\x8d\xe6\xb8\x85'.decode('utf-8')
'你好啊,傻傻分不清'
The encoding is a crucial part of this translation process. Without the encoding, the bytes object b'\xa420' is just a bunch of bits. The encoding gives it meaning. Using a different encoding, this bunch of bits can have a different meaning:
>>> b'\xa420'.decode('windows-1255')
'₪20'
That's 80% of the money lost due to using the wrong encoding, so be careful.
The bytes/str dichotomy in Python 3 [transport]的更多相关文章
- The bytes/str dichotomy in Python 3
The bytes/str dichotomy in Python 3 - Eli Bendersky's website https://eli.thegreenplace.net/2012/01/ ...
- 小白的Python之路 day1 Python3的bytes/str之别
原文:The bytes/str dichotomy in Python 3 Python 3最重要的新特性大概要算是对文本和二进制数据作了更为清晰的区分.文本总是Unicode,由str类型表示,二 ...
- python2 与python3中最大的区别(编码问题bytes&str
1,在python2.x 中是不区分bytes和str类型的,在python3中bytes和str中是区分开的,str的所有操作bytes都支持 python2 中 >>> s = ...
- Python3的bytes/str之别
Python 3最重要的新特性大概要算是对文本和二进制数据作了更为清晰的区分.文本总是Unicode,由str类型表示,二进制数据则由bytes类型表示.Python 3不会以任意隐式的方式混用str ...
- python bytes/str
http://eli.thegreenplace.net/2012/01/30/the-bytesstr-dichotomy-in-python-3/
- repr. str, ascii in Python
repr和str a="Hello" print(str(a)) print(repr(a)) 结果: Hello 'Hello' 可以看出,repr的结果中多了左右两个引号. r ...
- python——TypeError: 'str' does not support the buffer interface
import socket import sys port=51423 host="localhost" data=b"x"*10485760 #在字符串前加 ...
- Python学习之路-Day2-Python基础2
Python学习之路第二天 学习内容: 1.模块初识 2.pyc是什么 3.python数据类型 4.数据运算 5.bytes/str之别 6.列表 7.元组 8.字典 9.字符串常用操作 1.模块初 ...
- Centos7 环境下 Python2.7 换成 Python3.7 运行 scrapy 应用所遇到的问题记录
参考网友的安装过程 Linux系统Centos安装Python3.7 设置Python默认为Python3.7 mv /usr/bin/python /usr/bin/python.bak ln -s ...
随机推荐
- JVM体系结构之二:类加载器之2:JVM 自定义的类加载器的实现和使用
一.回顾一下jdk自带的类加载器: 1.java虚拟机自带的加载器 根类加载器(Bootstrap,c++实现) 扩展类加载器(Extension,java实现) 应用类加载器 ...
- C#设计模式(10)——组合模式
一.概念 组合模式有时候又叫做部分-整体模式,它使我们树型结构的问题中,模糊了简单元素和复杂元素的概念,客户程序可以向处理简单元素一样来处理复杂元素,从而使得客户程序与复杂元素的内部结构解耦. 二.组 ...
- ubuntu14.04装完系统更新后桌面挂了
一开始是只显示个鼠标什么都没有,ctrl-alt-1切到控制台下,把lightdm重启下再进去,多了两个桌面图标,但是顶栏和侧栏都没有,也就是根本没法运行其它程序. 但是幸好桌面右键菜单里有一个“在控 ...
- Linux 静态库(.a)转换为动态库(.so)
Linux 静态库转换为动态库 参考 http://blog.csdn.net/moxuansheng/article/details/5812410 首先将.a文件转为.so文件是可以实现的 原因是 ...
- java之格式化输出
参考http://how2j.cn/k/number-string/number-string-foramt/320.html#nowhere 格式化输出 如果不使用格式化输出,就需要进行字符串连接, ...
- IDEA拷贝git上的最新项目资源
File->new ->project version control->git-> 进入项目git对应的网址,选择第一个backstop,复制url: 输入git用户名和密码 ...
- .Net开发中的@ 和 using 使用技巧
一.@符号的妙用 1.可以作为保留关键字的标识符 C#规范当中,不允许使用保留关键字(class.bool等)当作普通的标识符来命名,这时候@符号作用就体现 出来了,可以通过@符号前缀把这些保留关键字 ...
- php小块代码
//页面本身网址 "http://".$_SERVER["HTTP_HOST"].preg_replace("/[^\/]+$/",&quo ...
- 浅谈REST API
浅谈REST API 说明: 本文部分内容根据其它网络文章编写,如有版权问题请及时通知. 背景 发迹于互联网的REST,在国内国外混得可谓是风生水起,如今又进入电信行业的视野,连TMF都将其作为战略项 ...
- FormsAuthentication.RedirectFromLoginPage 登录
首先讲的这个东西是针对后台数据访问的比如我的后台是admin文件夹.那么我除非登陆成功才可以访问里面的东西.那么除了Session对象判断.和Cookies来判断还能用到FormsAuthentica ...