Complete list of Python encodings: fr = open(filename_r, encoding='cp852')
7.8.3. Standard Encodings
Python comes with a number of codecs built-in, either implemented as C functions or with dictionaries as mapping tables. The following table lists the codecs by name, together with a few common aliases, and the languages for which the encoding is likely used. Neither the list of aliases nor the list of languages is meant to be exhaustive. Notice that spelling alternatives that only differ in case or use a hyphen instead of an underscore are also valid aliases; therefore, e.g. 'utf-8' is a valid alias for the 'utf_8' codec.
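The alias rule above can be checked with `codecs.lookup()`, which resolves a spelling to its registered codec. This is a minimal sketch, assuming a Python 3 interpreter; the particular spellings tried here are picked from the table for illustration.

```python
import codecs

# Spellings that differ only in case, or in hyphen vs. underscore,
# should all resolve to one and the same codec.
utf8_spellings = ["utf_8", "utf-8", "UTF-8", "utf8", "U8"]
utf8_names = {codecs.lookup(s).name for s in utf8_spellings}
print(utf8_names)

# The same holds for the documented aliases of latin_1.
latin_spellings = ["latin_1", "iso-8859-1", "L1", "cp819"]
latin_names = {codecs.lookup(s).name for s in latin_spellings}
print(latin_names)
```

Each set ends up with a single element, because every spelling maps to the same `CodecInfo` entry.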
Many of the character sets support the same languages. They vary in individual characters (e.g. whether the EURO SIGN is supported or not), and in the assignment of characters to code positions. For the European languages in particular, the following variants typically exist:
- an ISO 8859 codeset
- a Microsoft Windows code page, which is typically derived from an 8859 codeset, but replaces control characters with additional graphic characters
- an IBM EBCDIC code page
- an IBM PC code page, which is ASCII compatible
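The differences between these variants show up as soon as one character is encoded with several of them. The sketch below assumes Python 3 and uses the EURO SIGN and the letter "A" as sample characters; the codec names come from the table that follows.

```python
euro = "\u20ac"  # EURO SIGN

# A Windows code page and a later ISO 8859 variant both support the
# euro, but assign it to different code positions.
euro_cp1252 = euro.encode("cp1252")
euro_iso15 = euro.encode("iso8859_15")
print(euro_cp1252, euro_iso15)

# latin_1 (ISO 8859-1) has no euro at all, so encoding fails.
try:
    euro.encode("latin_1")
    latin1_has_euro = True
except UnicodeEncodeError:
    latin1_has_euro = False
print(latin1_has_euro)

# An IBM PC code page is ASCII compatible; an EBCDIC code page is not.
pc_bytes = "A".encode("cp437")
ebcdic_bytes = "A".encode("cp500")
print(pc_bytes, ebcdic_bytes)
```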
Codec | Aliases | Languages |
---|---|---|
ascii | 646, us-ascii | English |
big5 | big5-tw, csbig5 | Traditional Chinese |
big5hkscs | big5-hkscs, hkscs | Traditional Chinese |
cp037 | IBM037, IBM039 | English |
cp424 | EBCDIC-CP-HE, IBM424 | Hebrew |
cp437 | 437, IBM437 | English |
cp500 | EBCDIC-CP-BE, EBCDIC-CP-CH, IBM500 | Western Europe |
cp720 | | Arabic |
cp737 | | Greek |
cp775 | IBM775 | Baltic languages |
cp850 | 850, IBM850 | Western Europe |
cp852 | 852, IBM852 | Central and Eastern Europe |
cp855 | 855, IBM855 | Bulgarian, Byelorussian, Macedonian, Russian, Serbian |
cp856 | | Hebrew |
cp857 | 857, IBM857 | Turkish |
cp858 | 858, IBM858 | Western Europe |
cp860 | 860, IBM860 | Portuguese |
cp861 | 861, CP-IS, IBM861 | Icelandic |
cp862 | 862, IBM862 | Hebrew |
cp863 | 863, IBM863 | Canadian |
cp864 | IBM864 | Arabic |
cp865 | 865, IBM865 | Danish, Norwegian |
cp866 | 866, IBM866 | Russian |
cp869 | 869, CP-GR, IBM869 | Greek |
cp874 | | Thai |
cp875 | | Greek |
cp932 | 932, ms932, mskanji, ms-kanji | Japanese |
cp949 | 949, ms949, uhc | Korean |
cp950 | 950, ms950 | Traditional Chinese |
cp1006 | | Urdu |
cp1026 | ibm1026 | Turkish |
cp1140 | ibm1140 | Western Europe |
cp1250 | windows-1250 | Central and Eastern Europe |
cp1251 | windows-1251 | Bulgarian, Byelorussian, Macedonian, Russian, Serbian |
cp1252 | windows-1252 | Western Europe |
cp1253 | windows-1253 | Greek |
cp1254 | windows-1254 | Turkish |
cp1255 | windows-1255 | Hebrew |
cp1256 | windows-1256 | Arabic |
cp1257 | windows-1257 | Baltic languages |
cp1258 | windows-1258 | Vietnamese |
euc_jp | eucjp, ujis, u-jis | Japanese |
euc_jis_2004 | jisx0213, eucjis2004 | Japanese |
euc_jisx0213 | eucjisx0213 | Japanese |
euc_kr | euckr, korean, ksc5601, ks_c-5601, ks_c-5601-1987, ksx1001, ks_x-1001 | Korean |
gb2312 | chinese, csiso58gb231280, euc-cn, euccn, eucgb2312-cn, gb2312-1980, gb2312-80, iso-ir-58 | Simplified Chinese |
gbk | 936, cp936, ms936 | Unified Chinese |
gb18030 | gb18030-2000 | Unified Chinese |
hz | hzgb, hz-gb, hz-gb-2312 | Simplified Chinese |
iso2022_jp | csiso2022jp, iso2022jp, iso-2022-jp | Japanese |
iso2022_jp_1 | iso2022jp-1, iso-2022-jp-1 | Japanese |
iso2022_jp_2 | iso2022jp-2, iso-2022-jp-2 | Japanese, Korean, Simplified Chinese, Western Europe, Greek |
iso2022_jp_2004 | iso2022jp-2004, iso-2022-jp-2004 | Japanese |
iso2022_jp_3 | iso2022jp-3, iso-2022-jp-3 | Japanese |
iso2022_jp_ext | iso2022jp-ext, iso-2022-jp-ext | Japanese |
iso2022_kr | csiso2022kr, iso2022kr, iso-2022-kr | Korean |
latin_1 | iso-8859-1, iso8859-1, 8859, cp819, latin, latin1, L1 | Western Europe |
iso8859_2 | iso-8859-2, latin2, L2 | Central and Eastern Europe |
iso8859_3 | iso-8859-3, latin3, L3 | Esperanto, Maltese |
iso8859_4 | iso-8859-4, latin4, L4 | Baltic languages |
iso8859_5 | iso-8859-5, cyrillic | Bulgarian, Byelorussian, Macedonian, Russian, Serbian |
iso8859_6 | iso-8859-6, arabic | Arabic |
iso8859_7 | iso-8859-7, greek, greek8 | Greek |
iso8859_8 | iso-8859-8, hebrew | Hebrew |
iso8859_9 | iso-8859-9, latin5, L5 | Turkish |
iso8859_10 | iso-8859-10, latin6, L6 | Nordic languages |
iso8859_11 | iso-8859-11, thai | Thai languages |
iso8859_13 | iso-8859-13, latin7, L7 | Baltic languages |
iso8859_14 | iso-8859-14, latin8, L8 | Celtic languages |
iso8859_15 | iso-8859-15, latin9, L9 | Western Europe |
iso8859_16 | iso-8859-16, latin10, L10 | South-Eastern Europe |
johab | cp1361, ms1361 | Korean |
koi8_r | | Russian |
koi8_u | | Ukrainian |
mac_cyrillic | maccyrillic | Bulgarian, Byelorussian, Macedonian, Russian, Serbian |
mac_greek | macgreek | Greek |
mac_iceland | maciceland | Icelandic |
mac_latin2 | maclatin2, maccentraleurope | Central and Eastern Europe |
mac_roman | macroman | Western Europe |
mac_turkish | macturkish | Turkish |
ptcp154 | csptcp154, pt154, cp154, cyrillic-asian | Kazakh |
shift_jis | csshiftjis, shiftjis, sjis, s_jis | Japanese |
shift_jis_2004 | shiftjis2004, sjis_2004, sjis2004 | Japanese |
shift_jisx0213 | shiftjisx0213, sjisx0213, s_jisx0213 | Japanese |
utf_32 | U32, utf32 | all languages |
utf_32_be | UTF-32BE | all languages |
utf_32_le | UTF-32LE | all languages |
utf_16 | U16, utf16 | all languages |
utf_16_be | UTF-16BE | all languages (BMP only) |
utf_16_le | UTF-16LE | all languages (BMP only) |
utf_7 | U7, unicode-1-1-utf-7 | all languages |
utf_8 | U8, UTF, utf8 | all languages |
utf_8_sig | | all languages |
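Any codec name or alias from the table can be passed as the `encoding` argument of `open()`, as in the title of this post. The following is a minimal sketch, assuming Python 3 (in Python 2 the equivalent call is `io.open()`); the sample text and the temporary file name are made up for illustration.

```python
import os
import tempfile

text = "Zażółć gęślą jaźń"  # Polish sample text, chosen for illustration

with tempfile.TemporaryDirectory() as tmp:
    filename_r = os.path.join(tmp, "sample_cp852.txt")  # hypothetical file

    # Write the file with the IBM PC code page for Central and Eastern Europe.
    with open(filename_r, "w", encoding="cp852") as fw:
        fw.write(text)

    # Reading it back with the same codec recovers the text.
    with open(filename_r, encoding="cp852") as fr:
        decoded = fr.read()

    # Reading it with a different single-byte codec gives mojibake.
    with open(filename_r, encoding="latin_1") as fr_wrong:
        garbled = fr_wrong.read()

print(decoded == text, garbled == text)

# utf_8_sig differs from utf_8 only by the BOM it writes and strips.
with_bom = "a".encode("utf_8_sig")
print(with_bom, with_bom.decode("utf_8_sig"))
```

Choosing the wrong codec usually does not raise an error for single-byte encodings; it silently yields the wrong characters, which is why the codec has to match the one the file was written with.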
7.8.4. Python Specific Encodings
A number of predefined codecs are specific to Python, so their codec names have no meaning outside Python. These are listed in the tables below based on the expected input and output types (note that while text encodings are the most common use case for codecs, the underlying codec infrastructure supports arbitrary data transforms rather than just text encodings). For asymmetric codecs, the stated purpose describes the encoding direction.
The following codecs provide unicode-to-str encoding [1] and str-to-unicode decoding [2], similar to the Unicode text encodings.
Codec | Aliases | Purpose |
---|---|---|
idna | | Implements RFC 3490, see also encodings.idna |
mbcs | dbcs | Windows only: Encode operand according to the ANSI codepage (CP_ACP) |
palmos | | Encoding of PalmOS 3.5 |
punycode | | Implements RFC 3492 |
raw_unicode_escape | | Produce a string that is suitable as raw Unicode literal in Python source code |
rot_13 | rot13 | Returns the Caesar-cypher encryption of the operand |
undefined | | Raise an exception for all conversions. Can be used as the system encoding if no automatic coercion between byte and Unicode strings is desired. |
unicode_escape | | Produce a string that is suitable as Unicode literal in Python source code |
unicode_internal | | Return the internal representation of the operand |
New in version 2.3: The idna and punycode encodings.
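A few of these Python-specific codecs can be tried directly. The sketch below assumes Python 3, where text is `str` and encoded data is `bytes` (the table above uses the Python 2 names `unicode` and `str`), and where `rot_13` is a text-to-text codec reached through `codecs.encode()`. The domain name is a made-up example.

```python
import codecs

domain = "b\u00fccher.example"  # hypothetical internationalized domain

# idna encodes each label of a domain name; punycode encodes one raw string.
idna_bytes = domain.encode("idna")
puny_bytes = "b\u00fccher".encode("punycode")
print(idna_bytes, puny_bytes)

# unicode_escape yields the escape sequences a Python literal would use.
escaped = "caf\u00e9\n".encode("unicode_escape")
print(escaped)

# rot_13 maps text to text, so it goes through codecs.encode().
rotated = codecs.encode("Hello", "rot_13")
print(rotated)
```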
The following codecs provide str-to-str encoding and decoding [2].
Codec | Aliases | Purpose | Encoder/decoder |
---|---|---|---|
base64_codec | base64, base-64 | Convert operand to multiline MIME base64 (the result always includes a trailing '\n') | base64.encodestring(), base64.decodestring() |
bz2_codec | bz2 | Compress the operand using bz2 | bz2.compress(), bz2.decompress() |
hex_codec | hex | Convert operand to hexadecimal representation, with two digits per byte | binascii.b2a_hex(), binascii.a2b_hex() |
quopri_codec | quopri, quoted-printable, quotedprintable | Convert operand to MIME quoted printable | quopri.encode() with quotetabs=True, quopri.decode() |
string_escape | | Produce a string that is suitable as string literal in Python source code | |
uu_codec | uu | Convert the operand using uuencode | uu.encode(), uu.decode() |
zlib_codec | zip, zlib | Compress the operand using gzip | zlib.compress(), zlib.decompress() |
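The table above describes Python 2, where these transforms are reachable through `str.encode()`. As a hedged sketch for Python 3, the same codecs operate on `bytes` and are reached through `codecs.encode()` and `codecs.decode()`; the full codec names are used here rather than the short aliases, and `string_escape` is left out because it is specific to Python 2.

```python
import codecs

data = b"hello"

# Two hexadecimal digits per input byte.
hex_bytes = codecs.encode(data, "hex_codec")
print(hex_bytes)

# MIME base64; the result carries a trailing newline.
b64_bytes = codecs.encode(data, "base64_codec")
print(b64_bytes)

# Compression codecs round-trip through codecs.decode().
payload = data * 100
zlib_packed = codecs.encode(payload, "zlib_codec")
bz2_packed = codecs.encode(payload, "bz2_codec")
zlib_restored = codecs.decode(zlib_packed, "zlib_codec")
bz2_restored = codecs.decode(bz2_packed, "bz2_codec")
print(len(payload), len(zlib_packed), len(bz2_packed))
```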
[1] | str objects are also accepted as input in place of unicode objects. They are implicitly converted to unicode by decoding them using the default encoding. If this conversion fails, it may lead to encoding operations raising UnicodeDecodeError. |
[2] | unicode objects are also accepted as input in place of str objects. They are implicitly converted to str by encoding them using the default encoding. If this conversion fails, it may lead to decoding operations raising UnicodeEncodeError. |