The bytes/str dichotomy in Python 3 [transport]
reference and transporting from: http://eli.thegreenplace.net/2012/01/30/the-bytesstr-dichotomy-in-python-3/
Arguably the most significant new feature of Python 3 is a much cleaner separation between text and binary data. Text is always Unicode and is represented by the str type, and binary data is represented by the bytes type. What makes the separation particularly clean is that str and bytes can't be mixed in Python 3 in any implicit way. You can't concatenate them, look for one inside another, and generally pass one to a function that expects the other. This is a good thing.
However, boundaries between strings and bytes are inevitable, and this is where the following diagram is always important to keep in mind:
Strings can be encoded to bytes, and bytes can be decoded back to strings.
>>> '€20'.encode('utf-8')
b'\xe2\x82\xac20'
>>> b'\xe2\x82\xac20'.decode('utf-8')
'€20'
Think of it this way: a string is an abstract representation of text. A string consists of characters, which are also abstract entities not tied to any particular binary representation. When manipulating strings, we're living in blissful ignorance. We can split and slice them, concatenate and search inside them. We don't care how they are represented internally and how many bytes it takes to hold each character in them. We only start caring about this when encoding strings into bytes (for example, in order to send them over a communication channel), or decoding strings from bytes (for the other direction).
The argument given to encode and decode is the encoding (or codec). The encoding is a way to represent abstract characters in binary data. There are many possible encodings. UTF-8, shown above, is one. Here's another:
>>> '€20'.encode('iso-8859-15')
b'\xa420'
>>> b'\xa420'.decode('iso-8859-15')
'€20'
>>> '你好啊,傻傻分不清'.encode('utf-8')
b'\xe4\xbd\xa0\xe5\xa5\xbd\xe5\x95\x8a\xef\xbc\x8c\xe5\x82\xbb\xe5\x82\xbb\xe5\x88\x86\xe4\xb8\x8d\xe6\xb8\x85'
>>> b'\xe4\xbd\xa0\xe5\xa5\xbd\xe5\x95\x8a\xef\xbc\x8c\xe5\x82\xbb\xe5\x82\xbb\xe5\x88\x86\xe4\xb8\x8d\xe6\xb8\x85'.decode('utf-8')
'你好啊,傻傻分不清'
The encoding is a crucial part of this translation process. Without the encoding, the bytes object b'\xa420' is just a bunch of bits. The encoding gives it meaning. Using a different encoding, this bunch of bits can have a different meaning:
>>> b'\xa420'.decode('windows-1255')
'₪20'
That's 80% of the money lost due to using the wrong encoding, so be careful.
The bytes/str dichotomy in Python 3 [transport]的更多相关文章
- The bytes/str dichotomy in Python 3
The bytes/str dichotomy in Python 3 - Eli Bendersky's website https://eli.thegreenplace.net/2012/01/ ...
- 小白的Python之路 day1 Python3的bytes/str之别
原文:The bytes/str dichotomy in Python 3 Python 3最重要的新特性大概要算是对文本和二进制数据作了更为清晰的区分.文本总是Unicode,由str类型表示,二 ...
- python2 与python3中最大的区别(编码问题bytes&str
1,在python2.x 中是不区分bytes和str类型的,在python3中bytes和str中是区分开的,str的所有操作bytes都支持 python2 中 >>> s = ...
- Python3的bytes/str之别
Python 3最重要的新特性大概要算是对文本和二进制数据作了更为清晰的区分.文本总是Unicode,由str类型表示,二进制数据则由bytes类型表示.Python 3不会以任意隐式的方式混用str ...
- python bytes/str
http://eli.thegreenplace.net/2012/01/30/the-bytesstr-dichotomy-in-python-3/
- repr. str, ascii in Python
repr和str a="Hello" print(str(a)) print(repr(a)) 结果: Hello 'Hello' 可以看出,repr的结果中多了左右两个引号. r ...
- python——TypeError: 'str' does not support the buffer interface
import socket import sys port=51423 host="localhost" data=b"x"*10485760 #在字符串前加 ...
- Python学习之路-Day2-Python基础2
Python学习之路第二天 学习内容: 1.模块初识 2.pyc是什么 3.python数据类型 4.数据运算 5.bytes/str之别 6.列表 7.元组 8.字典 9.字符串常用操作 1.模块初 ...
- Centos7 环境下 Python2.7 换成 Python3.7 运行 scrapy 应用所遇到的问题记录
参考网友的安装过程 Linux系统Centos安装Python3.7 设置Python默认为Python3.7 mv /usr/bin/python /usr/bin/python.bak ln -s ...
随机推荐
- 【转】Pro Android学习笔记(十二):了解Intent(下)
解析Intent,寻找匹配Activity 如果给出component名字(包名.类名)是explicit intent,否则是implicit intent.对于explicit intent,关键 ...
- ES6学习之Class
一.定义类(ES6的类,完全可以看做是构造函数的另一种写法) class Greet { constructor(x, y) { this.x = x; this.y = y; } sayHello( ...
- 关于surf显示立体图,可视化分析数据
如果想判断一个点(x,y)对应的ZV值是否在平面上方.平面上.平面下方,只要将(x,y)带入方程,得到z. 如果ZV大于>Z,则在平面上方:如果ZV<Z,则在方面下方:若ZV=Z,则在平面 ...
- linux日常管理-防火墙netfilter工具-iptables-1
防火墙的名字叫 netfilter 工具/命令叫iptables 命令:iptables 选项: -t 指定表 -A 在最上面增加一条规则 -I 在最下面增加一条规则 -D 删除一条规则 -A-I ...
- C# 处理Json
下面是JSON对象转换为字符串 public static string ToJson(object obj) { try { JavaScriptSerializer serializer = ne ...
- netstat查看网络信息
Netstat 命令用于显示各种网络相关信息,如网络连接,路由表,接口状态 (Interface Statistics),masquerade 连接,多播成员 (Multicast Membershi ...
- Java之反射(部分文档摘过来方便以后查看)
第1章 类加载器 1.1 类的加载 当程序要使用某个类时,如果该类还未被加载到内存中,则系统会通过加载,连接,初始化三步来实现对这个类进行初始化. l 加载 就是指将class文件读入内存,并为之创建 ...
- 9、perldoc文档阅读器
转载:http://www.cnblogs.com/nkwy2012/p/6016320.html 一般来说,将文档的名称作为参数传递给perldoc命令,即可查阅该文档.比如下面,给定文档名称per ...
- Java基础之cmd入门操作笔记
前提:jdk已安装且环境变量配置成功,参考上文jdk 安装及环境变量配置 入门操作步骤: 1.打开记事本或者notepad,编写Abc代码,具体如下: public class Abc{ pub ...
- Good Bye 2014 B. New Year Permutation(floyd )
题目链接 题意:给n个数,要求这n个数字小的尽量放到前面,求一个最小的. 给一个矩阵s[i][j]==1,表示位置 i 的数字可以和 位置 j 的数字交换. 分析: 刚开始用的是3个循环,每次都找一个 ...