The bytes/str dichotomy in Python 3 [transport]
reference and transporting from: http://eli.thegreenplace.net/2012/01/30/the-bytesstr-dichotomy-in-python-3/
Arguably the most significant new feature of Python 3 is a much cleaner separation between text and binary data. Text is always Unicode and is represented by the str type, and binary data is represented by the bytes type. What makes the separation particularly clean is that str and bytes can't be mixed in Python 3 in any implicit way. You can't concatenate them, look for one inside another, and generally pass one to a function that expects the other. This is a good thing.
However, boundaries between strings and bytes are inevitable, and this is where the following diagram is always important to keep in mind:

Strings can be encoded to bytes, and bytes can be decoded back to strings.
>>> '€20'.encode('utf-8')
b'\xe2\x82\xac20'
>>> b'\xe2\x82\xac20'.decode('utf-8')
'€20'
Think of it this way: a string is an abstract representation of text. A string consists of characters, which are also abstract entities not tied to any particular binary representation. When manipulating strings, we're living in blissful ignorance. We can split and slice them, concatenate and search inside them. We don't care how they are represented internally and how many bytes it takes to hold each character in them. We only start caring about this when encoding strings into bytes (for example, in order to send them over a communication channel), or decoding strings from bytes (for the other direction).
The argument given to encode and decode is the encoding (or codec). The encoding is a way to represent abstract characters in binary data. There are many possible encodings. UTF-8, shown above, is one. Here's another:
>>> '€20'.encode('iso-8859-15')
b'\xa420'
>>> b'\xa420'.decode('iso-8859-15')
'€20'
>>> '你好啊,傻傻分不清'.encode('utf-8')
b'\xe4\xbd\xa0\xe5\xa5\xbd\xe5\x95\x8a\xef\xbc\x8c\xe5\x82\xbb\xe5\x82\xbb\xe5\x88\x86\xe4\xb8\x8d\xe6\xb8\x85'
>>> b'\xe4\xbd\xa0\xe5\xa5\xbd\xe5\x95\x8a\xef\xbc\x8c\xe5\x82\xbb\xe5\x82\xbb\xe5\x88\x86\xe4\xb8\x8d\xe6\xb8\x85'.decode('utf-8')
'你好啊,傻傻分不清'
The encoding is a crucial part of this translation process. Without the encoding, the bytes object b'\xa420' is just a bunch of bits. The encoding gives it meaning. Using a different encoding, this bunch of bits can have a different meaning:
>>> b'\xa420'.decode('windows-1255')
'₪20'
That's 80% of the money lost due to using the wrong encoding, so be careful.
The bytes/str dichotomy in Python 3 [transport]的更多相关文章
- The bytes/str dichotomy in Python 3
The bytes/str dichotomy in Python 3 - Eli Bendersky's website https://eli.thegreenplace.net/2012/01/ ...
- 小白的Python之路 day1 Python3的bytes/str之别
原文:The bytes/str dichotomy in Python 3 Python 3最重要的新特性大概要算是对文本和二进制数据作了更为清晰的区分.文本总是Unicode,由str类型表示,二 ...
- python2 与python3中最大的区别(编码问题bytes&str
1,在python2.x 中是不区分bytes和str类型的,在python3中bytes和str中是区分开的,str的所有操作bytes都支持 python2 中 >>> s = ...
- Python3的bytes/str之别
Python 3最重要的新特性大概要算是对文本和二进制数据作了更为清晰的区分.文本总是Unicode,由str类型表示,二进制数据则由bytes类型表示.Python 3不会以任意隐式的方式混用str ...
- python bytes/str
http://eli.thegreenplace.net/2012/01/30/the-bytesstr-dichotomy-in-python-3/
- repr. str, ascii in Python
repr和str a="Hello" print(str(a)) print(repr(a)) 结果: Hello 'Hello' 可以看出,repr的结果中多了左右两个引号. r ...
- python——TypeError: 'str' does not support the buffer interface
import socket import sys port=51423 host="localhost" data=b"x"*10485760 #在字符串前加 ...
- Python学习之路-Day2-Python基础2
Python学习之路第二天 学习内容: 1.模块初识 2.pyc是什么 3.python数据类型 4.数据运算 5.bytes/str之别 6.列表 7.元组 8.字典 9.字符串常用操作 1.模块初 ...
- Centos7 环境下 Python2.7 换成 Python3.7 运行 scrapy 应用所遇到的问题记录
参考网友的安装过程 Linux系统Centos安装Python3.7 设置Python默认为Python3.7 mv /usr/bin/python /usr/bin/python.bak ln -s ...
随机推荐
- webpy+nginx+uwsgi安装配置
转:(1)安装Nginx1.1 下载nginx-1.0.5.tar.gz并解压1.2 ./configure (也可以增加--prefix= path指定安装路径)此时有可能会提示缺少pcre支持,如 ...
- java代码Math.sqrt
总结:这个判断小数的题目,当时全只2有一个人想出了结果.老师很开心.我很桑心~~~~ 我没想到要取膜,我只想到了除以等于0就够了.至于中间的“取膜”,我没凑齐来,还是不够灵活 package com. ...
- "LPWSTR" 类型的实参与"const.char *"类型形参不兼容
CString csPlus; CString csSummand; m_PlusNumber.GetWindowTextW(csPlus); m_Summand.GetWindowTextW(csS ...
- GIF助手激活教程(购买+激活)图文版
GIF助手首页==>设置(右上角) ==>输入激活码会弹出购买或者激活的对话框.(如果不明白操作,可以点击如何购买激活码先查看购买帮助指南再进行购买) 点击复制设备码并购买. 此时会进入到 ...
- .clearfix:after
<!DOCTYPE html> <html lang="en"> <head> <meta charset="UTF-8&quo ...
- Windchill 基本业务对象
容器容器是Windchill对象存放的地方:在Windchill中主要的容器有站点.组织.产品.存储库.项目.在Windchill中所有容器对象的父类为wt.inf.container.WTConta ...
- 项目一:第七天 CRM 和bos系统实现定区关联客户,关联快递员. 通过CXF框架实现
定区关联客户 需求:为了快递方便客户下订单(发快递),派快递员上门取件. 所以说需要让定区关联客户(知道客户属于哪个定区),定区跟快递员关系:多对多.知道让哪个快递员上门取件. 将CRM系统中,客户 ...
- 9、linux的特殊符
在shell中常用的特殊符号罗列如下: # ; ;; . , / \ 'string'| ! $ ${} $? ...
- kafka学习之相关命令
1 分别启动zoo和kafka ./zkServer.sh start 然后需要使用./zkServer.sh status查看状态,会发现一个奇怪得问题,即使start启动的时候表示启动成功,但是s ...
- 20.Ecshop 2.x/3.x SQL注入/任意代码执行漏洞
Ecshop 2.x/3.x SQL注入/任意代码执行漏洞 影响版本: Ecshop 2.x Ecshop 3.x-3.6.0 漏洞分析: 该漏洞影响ECShop 2.x和3.x版本,是一个典型的“二 ...