The bytes/str dichotomy in Python 3 [transport]
reference and transporting from: http://eli.thegreenplace.net/2012/01/30/the-bytesstr-dichotomy-in-python-3/
Arguably the most significant new feature of Python 3 is a much cleaner separation between text and binary data. Text is always Unicode and is represented by the str type, and binary data is represented by the bytes type. What makes the separation particularly clean is that str and bytes can't be mixed in Python 3 in any implicit way. You can't concatenate them, look for one inside another, and generally pass one to a function that expects the other. This is a good thing.
However, boundaries between strings and bytes are inevitable, and this is where the following diagram is always important to keep in mind:

Strings can be encoded to bytes, and bytes can be decoded back to strings.
>>> '€20'.encode('utf-8')
b'\xe2\x82\xac20'
>>> b'\xe2\x82\xac20'.decode('utf-8')
'€20'
Think of it this way: a string is an abstract representation of text. A string consists of characters, which are also abstract entities not tied to any particular binary representation. When manipulating strings, we're living in blissful ignorance. We can split and slice them, concatenate and search inside them. We don't care how they are represented internally and how many bytes it takes to hold each character in them. We only start caring about this when encoding strings into bytes (for example, in order to send them over a communication channel), or decoding strings from bytes (for the other direction).
The argument given to encode and decode is the encoding (or codec). The encoding is a way to represent abstract characters in binary data. There are many possible encodings. UTF-8, shown above, is one. Here's another:
>>> '€20'.encode('iso-8859-15')
b'\xa420'
>>> b'\xa420'.decode('iso-8859-15')
'€20'
>>> '你好啊,傻傻分不清'.encode('utf-8')
b'\xe4\xbd\xa0\xe5\xa5\xbd\xe5\x95\x8a\xef\xbc\x8c\xe5\x82\xbb\xe5\x82\xbb\xe5\x88\x86\xe4\xb8\x8d\xe6\xb8\x85'
>>> b'\xe4\xbd\xa0\xe5\xa5\xbd\xe5\x95\x8a\xef\xbc\x8c\xe5\x82\xbb\xe5\x82\xbb\xe5\x88\x86\xe4\xb8\x8d\xe6\xb8\x85'.decode('utf-8')
'你好啊,傻傻分不清'
The encoding is a crucial part of this translation process. Without the encoding, the bytes object b'\xa420' is just a bunch of bits. The encoding gives it meaning. Using a different encoding, this bunch of bits can have a different meaning:
>>> b'\xa420'.decode('windows-1255')
'₪20'
That's 80% of the money lost due to using the wrong encoding, so be careful.
The bytes/str dichotomy in Python 3 [transport]的更多相关文章
- The bytes/str dichotomy in Python 3
The bytes/str dichotomy in Python 3 - Eli Bendersky's website https://eli.thegreenplace.net/2012/01/ ...
- 小白的Python之路 day1 Python3的bytes/str之别
原文:The bytes/str dichotomy in Python 3 Python 3最重要的新特性大概要算是对文本和二进制数据作了更为清晰的区分.文本总是Unicode,由str类型表示,二 ...
- python2 与python3中最大的区别(编码问题bytes&str
1,在python2.x 中是不区分bytes和str类型的,在python3中bytes和str中是区分开的,str的所有操作bytes都支持 python2 中 >>> s = ...
- Python3的bytes/str之别
Python 3最重要的新特性大概要算是对文本和二进制数据作了更为清晰的区分.文本总是Unicode,由str类型表示,二进制数据则由bytes类型表示.Python 3不会以任意隐式的方式混用str ...
- python bytes/str
http://eli.thegreenplace.net/2012/01/30/the-bytesstr-dichotomy-in-python-3/
- repr. str, ascii in Python
repr和str a="Hello" print(str(a)) print(repr(a)) 结果: Hello 'Hello' 可以看出,repr的结果中多了左右两个引号. r ...
- python——TypeError: 'str' does not support the buffer interface
import socket import sys port=51423 host="localhost" data=b"x"*10485760 #在字符串前加 ...
- Python学习之路-Day2-Python基础2
Python学习之路第二天 学习内容: 1.模块初识 2.pyc是什么 3.python数据类型 4.数据运算 5.bytes/str之别 6.列表 7.元组 8.字典 9.字符串常用操作 1.模块初 ...
- Centos7 环境下 Python2.7 换成 Python3.7 运行 scrapy 应用所遇到的问题记录
参考网友的安装过程 Linux系统Centos安装Python3.7 设置Python默认为Python3.7 mv /usr/bin/python /usr/bin/python.bak ln -s ...
随机推荐
- C++STL库中vector容器常用应用
#include<iostream> #include<vector> #include<algorithm> using namespace std; int m ...
- JavaScript-Tool:jquery.cxselect.js
ylbtech-JavaScript-Tool:jquery.cxselect.js 1.返回顶部 1.jquery.cxselect.js /*! * jQuery cxSelect * @name ...
- asp.net mvc Partial OutputCache 在SpaceBuilder中的应用实践
最近给SpaceBuilder增加OutputCache 时发现了一些问题,贴在这做个备忘,也方便遇到类似问题的朋友查阅. 目前SpaceBuilder表现层使用是asp.net mvc v1.0,使 ...
- JavaScript继承与聚合
一,继承 第一种方式:类与被继承类直接耦合度高 1,首先,准备一个可以被继承的类(父类),例如 //创建一个人员类 function Person(name) {//现在Person里面的域是由Per ...
- 在Oracle中设置主键自增
转自:https://www.2cto.com/database/201705/636725.html 数据库设置主键自增">oracle数据库设置主键自增: --创建表 create ...
- LAMP 1.7Apache用户认证
假如我们要在www.aaa.com/的 abc/目录下放一些文件,只想让自己访问,做一个用户认证.输入正确的用户和密码才能访问 cd /data/www mkdir abc cd abc cp /et ...
- J2EE 学习路线
分享一个比较好的学习网站 http://edu.51cto.com/roadmap/view/id-86.html ================================J2EE=== ...
- [matlab]Monte Carlo模拟学习笔记
理论基础:大数定理,当频数足够多时,频率可以逼近概率,从而依靠概率与$\pi$的关系,求出$\pi$ 所以,rand在Monte Carlo中是必不可少的,必须保证测试数据的随机性. 用蒙特卡洛方法进 ...
- C++中队列的建立和操作
什么是队列结构 队列结构是从数据运算来分类的,也就是说队列结构具有特殊的运算规则.而从数据的逻辑结构来看,队列结构其实就是一种线性结构.如果从数据的存储结构来进一步划分,队列结构可以分成两类. 顺序队 ...
- Equals 和 == 的区别--转
在比较Equals 和 ==的区别前.我们先来了解下相关的知识 C#数据类型 1.值类型 值类型有: 值类型包括:简单类型.结构类型.枚举类型. byte(1).sbyte(1).short(2).u ...