python decode unicode encode

字符串在Python内部的表示是unicode编码，因此，在做编码转换时，通常需要以unicode作为中间编码，即先将其他编码的字符串解码（decode）成unicode，再从unicode编码（encode）成另一种编码。

代码中字符串的默认编码与代码文件本身的编码一致，以下是不一致的两种:

1. s = u'你好'

该字符串的编码就被指定为unicode了，即python的内部编码，而与代码文件本身的编码(查看默认编码：import sys print('hello',sys.getdefaultencoding()) ascii 。设置默认编码：import sys reload(sys) sys.setdefaultencoding('utf-8')))无关。因此，对于这种情况做编码转换，只需要直接使用encode方法将其转换成指定编码即可.

2. # -*- coding: utf-8 -*-

s = ‘你好’

此时为utf-8编码，ascii编码不能显示汉字

isinstance(s, unicode) #用来判断是否为unicode ,是返回True，不是返回False

unicode(str,'gb2312')与str.decode('gb2312')是一样的，都是将gb2312编码的str转为unicode编码

使用str.__class__可以查看str的编码形式

原理说了半天，最后来个包治百病的吧：）

#!/usr/bin/env python
#coding=utf-8
s="中文"

if isinstance(s, unicode):
#s=u"中文"
print s.encode('gb2312')
else:
#s="中文"
print s.decode('utf-8').encode('gb2312')

语音模块代码：

# -*- coding: utf-8 -*-import

import sys

print('hello',sys.getdefaultencoding())

def xfs_frame_info(words):

    #decode utf-8 to python internal unicode coding

    isinstance(words,unicode)

    wordu = words.decode('utf-8')

    #encode python unicode to gbk

    data = wordu.encode('gbk')

    length = len(data) + 2

    frame_info = bytearray(5)

    frame_info[0] = 0xfd

    frame_info[1] = (length >> 8)

    frame_info[2] = (length & 0x00ff)

    frame_info[3] = 0x01

    frame_info[4] = 0x01

    buf = frame_info + data

    print("buf:",buf)

    return buf

if __name__ == "__main__":

    print("hello world")

    words1= u'你好'

    #encodetype = isinstance(words1,unicode)

    #print("encodetype",encodetype)

    print("origin unicode", words1)

    words= words1.encode('utf-8')

    print("utf-8 encoded", words)

    a = xfs_frame_info(words)

    print('a',a)

if __name__ == "__main__":

    print("hello world")

    words1= '你好'

    print("oringe utf-8 encode:",words1)

    encodetype = isinstance(words1,unicode)

    wordu = words1.decode('utf-8')

    print("unicode from utf-8 decode:",wordu)

    #encodetype = isinstance(words1,utf-8)

    #encodetype = isinstance(words1,'ascii')

    #print("encodetype",encodetype)

    #print("origin unicode", words1)

    word_utf8 = wordu.encode('utf-8')

    #encodetype2 = isinstance(words,utf8)

    #print("encodetype2",encodetype2)

    print("utf-8 encoded",word_utf8)

    a = xfs_frame_info(word_utf8)

    print('a',a)

你好前不加u''时，要多一步decode为unicode

python decode unicode encode的更多相关文章

Python decode与encode
字符串在Python内部的表示是unicode编码(8-bit string),因此,在做编码转换时,通常需要以unicode作为中间编码,即先将其他编码的字符串解码(decode)成unicod ...
关于python decode()和 encode()
1.先收集一下这几天看到的关于decode()解码和encode()编码的用法 bytes和str是字节包和字符串,python3中会区分bytes和str,不会混用这两个.字符串可以编码成字节包,而 ...
python decode和encode
摘抄: 字符串在Python内部的表示是Unicode编码,因此,在做编码转换时,通常需要以unicode作为中间编码,即先将其他编码的字符解码(decode)成unicode,再从unicode编码 ...
python encode decode unicode区别及用法
decode 解码 encode 转码 unicode是一种编码,具体可以百度搜 # coding: UTF-8 u = u'汉' print repr(u) # u'\u6c49' s = u.en ...
Python字符串的encode与decode研究心得乱码问题解决方法
为什么Python使用过程中会出现各式各样的乱码问题,明明是中文字符却显示成“\xe4\xb8\xad\xe6\x96\x87”的形式? 为什么会报错“UnicodeEncodeError: 'asc ...
python字符decode与encode的问题
同事在工作中遇到一个字符编码的问题:问题是:从mysql数据库中读出来的varchar类型数据在python是unicode类型的. 但他却对这个unicode字符进行了decode,因为他以为读出来 ...
Python字符串的encode与decode研究心得乱码问题解决方法
以下摘自:http://www.jb51.net/article/17560.htm 为什么Python使用过程中会出现各式各样的乱码问题,明明是中文字符却显示成“\xe4\xb8\xad\xe6\x ...
Python字符串的encode与decode研究心得——解决乱码问题
转~Python字符串的encode与decode研究心得——解决乱码问题为什么Python使用过程中会出现各式各样的乱码问题,明明是中文字符却显示成“/xe4/xb8/xad/xe6/x96/x8 ...
【Python】关于decode和encode
#-*-coding:utf-8 import sys ''' *首先要搞清楚,字符串在Python内部的表示是unicode编码,因此,在做编码转换时,通常需要以unicode作为中间编码, 即先将 ...

随机推荐

用jquery操作xml文件
一. xml文件\内容读取 1.读取xml文件 $.get( xmlfile.xml , function (xml){ //xml即为可以读取使用的内容,具体读取见第2点 }); 2.读取xml内容 ...
Bzoj3170: [Tjoi2013]松鼠聚会 (切比雪夫距离)
题目链接显然,题目要求我们求切比雪夫距离,不会的可以去看一下attack的博客. 考虑枚举所有的点转换为曼哈顿距离后. 那么对于这个点的路程和是. \[\sum_{i=1}^n | x_i - x ...
Linux-利用keepalived实现lvs的高可用性
单主模型IPVS示例配置keepalive 高可用的ipvs集群示例:修改keepalived配置文件修改主机:192.168.234.27的keepalived配置文件 [root@234c27 ...
perl学习之：字符串函数
一.打开.关闭文件 open的返回值用来确定打开文件的操作是否成功,当其成功时返回非零值,失败时返回零,因此可以如下判断: if (open(MYFILE, "myfile" ...
【php】自带的过滤机制
<?php print_r(filter_list()); ?> 输出类似: Array ( [0] => int [1] => boolean [2] => float ...
LeetCode（121） Best Time to Buy and Sell Stock
题目 Say you have an array for which the ith element is the price of a given stock on day i. If you we ...
MPEG-4视频编码核心思想
1 引言当今时代,信息技术和计算机互联网飞速发展,在此背景下,多媒体信息已成为人类获取信息的最主要载体,同时也成为电子信息领域技术开发和研究的热点.多媒体信息经数字化处理后具有易于加密.抗干扰能 ...
leetcode刷题——动态规划
知识点专题-B-动态规划题目斐波那契数列矩阵路径数组区间分割整数最长递增子序列最大连续子序列和最长公共子序列最长回文子序列最长公共子串最长回文子串背包题解 CS-Notes ...
Python 日期与时间
Python 3.6.4 import time, calendar, datetime print("距离1970年的秒数为:", time.time()) print(&quo ...
PHP读取xlsx Excel 文件
<?php require_once 'simplexlsx.class.php'; if ( $xlsx = SimpleXLSX::parse('pricelist.xlsx') ) { p ...

python decode unicode encode

python decode unicode encode的更多相关文章

随机推荐

热门专题