Python encoding conversion

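In Python 2, a string is either a unicode string or a byte string (str);
A byte string literal in source code uses the same encoding as the source file itself;
Converting between encodings normally goes through unicode as the intermediate form: first decode the string from its current encoding into a unicode string, then encode that unicode string into the target encoding;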
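
As a minimal sketch of the two-step conversion described above, the snippet below turns GBK bytes into UTF-8 bytes by way of a unicode string. The byte values are the ones that appear in the transcripts later in this post; the variable names are only illustrative.

```python
# -*- coding: utf-8 -*-
# GBK bytes for the two Chinese characters used throughout this post.
gbk_bytes = b'\xd6\xd0\xce\xc4'

# Step 1: decode the source bytes into a unicode string.
text = gbk_bytes.decode("gbk")

# Step 2: encode the unicode string into the target encoding.
utf8_bytes = text.encode("utf-8")

print(repr(text))
print(repr(utf8_bytes))
```

Explicit `b'...'` and `u'...'` literals are used so that the type of each value does not depend on the source file encoding.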

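Encoding and decoding must use the same codec: bytes produced by one encoding have to be decoded with that same encoding;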
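
A small sketch of why the codecs have to match. It encodes with gbk and then decodes three ways. The latin-1 case is an illustration added here and does not appear in the transcripts; it shows that a mismatch does not always raise and can silently return the wrong characters instead.

```python
# -*- coding: utf-8 -*-
text = u'\u4e2d\u6587'
gbk_bytes = text.encode("gbk")

# Same codec in both directions: the round trip restores the text.
round_trip = gbk_bytes.decode("gbk")

# Wrong codec, case 1: these bytes are not valid UTF-8, so decoding raises.
try:
    gbk_bytes.decode("utf-8")
    utf8_decode_failed = False
except UnicodeDecodeError:
    utf8_decode_failed = True

# Wrong codec, case 2: latin-1 maps every byte to some character,
# so decoding succeeds but the result is not the original text.
garbled = gbk_bytes.decode("latin-1")

print(round_trip == text)
print(utf8_decode_failed)
print(garbled == text)
```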

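The default encoding can also differ between runtime environments. In a DOS (Windows command prompt) session, Python's default encoding is ascii, while text typed at the console arrives as gbk bytes.
In the DOS environment: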

1. Get the system default encoding:

>>> import sys
>>> sys.getdefaultencoding()
'ascii'
>>>
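
Since the default encoding is only one of several encodings in play, it can help to print them side by side. The sketch below is one way to do that, under the assumption that the console encoding and the locale's preferred encoding are the values of interest. The results depend on the interpreter version and the environment, so no expected output is shown.

```python
import locale
import sys

# Encoding Python uses for implicit str/unicode conversions.
default_encoding = sys.getdefaultencoding()

# Encoding of the console attached to stdout; may be None when output is piped.
console_encoding = getattr(sys.stdout, "encoding", None)

# Encoding the operating system locale prefers for text data.
preferred_encoding = locale.getpreferredencoding()

print("default:   %s" % default_encoding)
print("console:   %s" % console_encoding)
print("preferred: %s" % preferred_encoding)
```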

Byte string:
>>> s="abc"
>>> type(s)
<type 'str'>

Unicode string:
>>> s=u"中文"
>>> type(s)
<type 'unicode'>
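
One practical difference between the two types is what `len` counts: characters for a unicode string, bytes for a byte string. A short sketch, using the same two Chinese characters as the later examples:

```python
# -*- coding: utf-8 -*-
text = u'\u4e2d\u6587'             # unicode string: two characters
gbk_bytes = text.encode("gbk")      # byte string in GBK
utf8_bytes = text.encode("utf-8")   # byte string in UTF-8

# The same text has a different length depending on its representation.
print("%d %d %d" % (len(text), len(gbk_bytes), len(utf8_bytes)))
```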

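2. Converting English (ASCII-only) strings: an ASCII-only string can be decoded or encoded with any ASCII-compatible codec, because these codecs all represent ASCII characters with the same bytes. "unicode" is the exception: it is not a codec name, so it raises LookupError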

>>> s="abc" # ASCII-only text can be decoded with any ASCII-compatible codec ("unicode" is not a codec name)
>>> s.decode()
u'abc'
>>> s.decode("gbk")
u'abc'
>>> s.decode("ascii")
u'abc'
>>> s.decode("utf-8")
u'abc'
>>> s.decode("gb2312")
u'abc'
>>> s.decode("unicode")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
LookupError: unknown encoding: unicode

>>> s="abc" # ASCII-only text can be encoded with any ASCII-compatible codec ("unicode" is not a codec name)
>>> s.encode()
'abc'
>>> s.encode("gbk")
'abc'
>>> s.encode("ascii")
'abc'
>>> s.encode("utf-8")
'abc'
>>> s.encode("gb2312")
'abc'
>>> s.encode("unicode")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
LookupError: unknown encoding: unicode
>>>

>>> s=u"abc" # an ASCII-only unicode string can also be decoded with any ASCII-compatible codec ("unicode" is not a codec name)
>>> s.decode()
u'abc'
>>> s.decode("gbk")
u'abc'
>>> s.decode("ascii")
u'abc'
>>> s.decode("utf-8")
u'abc'
>>> s.decode("gb2312")
u'abc'
>>> s.decode("unicode")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
LookupError: unknown encoding: unicode

>>> s=u"abc" # an ASCII-only unicode string can be encoded with any ASCII-compatible codec ("unicode" is not a codec name)
>>> s.encode()
'abc'
>>> s.encode("gbk")
'abc'
>>> s.encode("ascii")
'abc'
>>> s.encode("utf-8")
'abc'
>>> s.encode("gb2312")
'abc'
>>> s.encode("unicode")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
LookupError: unknown encoding: unicode
>>>
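
The transcripts above can be condensed into a short check. The sketch below encodes the same ASCII text with each codec used in this section and confirms the bytes are identical. The utf-16-le line is an addition for contrast and is not part of the transcripts: it is a codec that is not ASCII-compatible, so the bytes differ.

```python
import codecs

ascii_text = u"abc"
codec_names = ["ascii", "gbk", "gb2312", "utf-8"]

# Every ASCII-compatible codec yields the same three bytes.
encoded = dict((name, ascii_text.encode(name)) for name in codec_names)
all_identical = all(value == b"abc" for value in encoded.values())

# Contrast: UTF-16 uses two bytes per character, even for ASCII text.
utf16_bytes = ascii_text.encode("utf-16-le")

# "unicode" is not a registered codec name, hence the LookupError above.
try:
    codecs.lookup("unicode")
    unicode_is_codec = True
except LookupError:
    unicode_is_codec = False

print(all_identical)
print(repr(utf16_bytes))
print(unicode_is_codec)
```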

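3. Encoding and decoding Chinese:

(1) In the DOS environment the console encoding is gbk, so a byte string typed there can only be decoded with gbk or gb2312;

(2) A unicode string containing Chinese can only be encoded, not decoded: calling decode on it first triggers an implicit ascii encode, which fails;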

>>> s="中文"  # the DOS console encoding is gbk, so these bytes can only be decoded with gbk/gb2312
>>> s.decode()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd6 in position 0: ordinal not in range(128)
>>> s.decode("gbk")
u'\u4e2d\u6587'
>>> s.decode("ascii")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd6 in position 0: ordinal not in range(128)
>>> s.decode("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\Python27\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xd6 in position 0: invalid continuation byte
>>> s.decode("gb2312")
u'\u4e2d\u6587'
>>> s.decode("unicode")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
LookupError: unknown encoding: unicode
>>>
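
When the source encoding of some bytes is not known in advance, one rough approach is to try a few candidate codecs in order, mirroring the trial and error in the transcript above. `decode_first_match` is a hypothetical helper written for this post, not a standard library function, and the candidate list is an assumption.

```python
def decode_first_match(data, candidates=("ascii", "utf-8", "gbk")):
    """Hypothetical helper: return (text, codec) for the first codec that decodes data."""
    for name in candidates:
        try:
            return data.decode(name), name
        except UnicodeDecodeError:
            # This codec rejects the bytes; move on to the next candidate.
            continue
    raise ValueError("none of the candidate codecs could decode the data")


text, codec = decode_first_match(b'\xd6\xd0\xce\xc4')
print(repr(text))
print(codec)
```

This is a heuristic only. A successful decode does not prove the codec is the right one, and the order of the candidates matters, so an explicitly known source encoding is always preferable.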

>>> s="中文"    # the DOS console encoding is gbk, so decode with gbk/gb2312 first, then encode to the target encoding
>>> s.encode()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd6 in position 0: ordinal not in range(128)
>>> s.encode("gbk")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd6 in position 0: ordinal not in range(128)
>>> s.encode("ascii")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd6 in position 0: ordinal not in range(128)
>>> s.encode("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd6 in position 0: ordinal not in range(128)
>>> s.encode("gb2312")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd6 in position 0: ordinal not in range(128)
>>> s.encode("unicode")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
LookupError: unknown encoding: unicode
>>>
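
Every encode call above fails with a UnicodeDecodeError because, in Python 2, calling encode on a byte string first decodes it implicitly with the default ascii codec. The sketch below writes that hidden step out explicitly and then shows the fix of naming the real source codec. The variable names are illustrative.

```python
# -*- coding: utf-8 -*-
gbk_bytes = b'\xd6\xd0\xce\xc4'

# What the failing calls amount to: an ascii decode followed by the encode.
error = None
try:
    gbk_bytes.decode("ascii").encode("utf-8")
except UnicodeDecodeError as exc:
    error = exc

# The fix: decode with the codec the bytes were actually written in.
utf8_bytes = gbk_bytes.decode("gbk").encode("utf-8")

print(error.encoding)
print(repr(utf8_bytes))
```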

>>> s=u"中文"  # a unicode string containing Chinese can only be encoded, not decoded
>>> s.decode()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)
>>> s.decode("gbk")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)
>>> s.decode("ascii")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)
>>> s.decode("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\Python27\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)
>>> s.decode("gb2312")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)
>>> s.decode("unicode")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
LookupError: unknown encoding: unicode
>>>
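
Since decoding only makes sense for byte strings, code that may receive either type can check the type before decoding. `to_text` below is a hypothetical helper for illustration; the utf-8 default is an assumption, not something this post prescribes.

```python
# -*- coding: utf-8 -*-
def to_text(value, encoding="utf-8"):
    """Hypothetical helper: decode byte strings, pass unicode strings through."""
    if isinstance(value, bytes):
        # Only byte strings carry an encoding that needs to be undone.
        return value.decode(encoding)
    # Already a unicode string: decoding it again would be an error.
    return value


from_bytes = to_text(b'\xd6\xd0\xce\xc4', "gbk")
from_text = to_text(u'\u4e2d\u6587')
print(from_bytes == from_text)
```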

>>> s=u"中文"    # a unicode string containing Chinese can only be encoded, not decoded
>>> s.encode()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)
>>> s.encode("gbk")
'\xd6\xd0\xce\xc4'
>>> s.encode("ascii")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)
>>> s.encode("utf-8")
'\xe4\xb8\xad\xe6\x96\x87'
>>> s.encode("gb2312")
'\xd6\xd0\xce\xc4'
>>> s.encode("unicode")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
LookupError: unknown encoding: unicode
>>>
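
Encoding to ascii fails above because ascii cannot represent Chinese characters. When the target codec really cannot be changed, the second argument of encode selects an error handler instead of raising. This goes slightly beyond the transcripts and is offered only as a sketch of the available handlers; each of them loses or rewrites the original characters.

```python
# -*- coding: utf-8 -*-
text = u'\u4e2d\u6587'

# Substitute a question mark for each character ascii cannot represent.
replaced = text.encode("ascii", "replace")

# Drop the characters that cannot be represented.
ignored = text.encode("ascii", "ignore")

# Keep the code points as Python-style escape sequences.
escaped = text.encode("ascii", "backslashreplace")

# Keep the code points as XML numeric character references.
xml_refs = text.encode("ascii", "xmlcharrefreplace")

for result in (replaced, ignored, escaped, xml_refs):
    print(repr(result))
```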
