Changing a file's encoding with Python

 import os

 import chardet


 def strJudgeCode(data):
     # Guess the encoding of a bytes object; the result is a dict with
     # 'encoding' and 'confidence' keys.
     return chardet.detect(data)


 def readFile(path):
     # Read raw bytes so that chardet and decode() see the file exactly as stored.
     with open(path, 'rb') as f:
         return f.read()


 def WriteFile(data, path):
     # Write raw bytes back without any implicit re-encoding.
     with open(path, 'wb') as f:
         f.write(data)


 def converCode(path):
     file_con = readFile(path)
     result = strJudgeCode(file_con)
     # 'encoding' is None when chardet cannot make a guess; compare case-insensitively.
     if (result['encoding'] or '').lower() == 'utf-8':
         a_unicode = file_con.decode('utf-8')
         gbk_bytes = a_unicode.encode('gbk')
         WriteFile(gbk_bytes, path)


 def listDirFile(dirpath):
     # Walk the directory tree recursively and convert every file found.
     for name in os.listdir(dirpath):
         filepath = os.path.join(dirpath, name)
         if os.path.isdir(filepath):
             listDirFile(filepath)
         else:
             print(name)
             converCode(filepath)


 if __name__ == '__main__':
     listDirFile(u'.\\TRMD')
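For trying the conversion step without installing chardet, here is a minimal Python 3 sketch that uses only the standard library. It swaps chardet's guess for a strict UTF-8 decode, which is a different technique from the script above: a file is treated as UTF-8 simply because its bytes decode cleanly. The helper name `convert_utf8_file_to_gbk` is made up for this illustration.

```python
import os
import tempfile


def convert_utf8_file_to_gbk(path):
    """Rewrite `path` as GBK if its bytes are valid UTF-8 and fit in GBK.

    Returns True when the file was rewritten, False when it was left alone.
    (Hypothetical helper, not part of the original script.)
    """
    with open(path, 'rb') as f:
        raw = f.read()
    try:
        text = raw.decode('utf-8')      # strict decode doubles as the UTF-8 check
        converted = text.encode('gbk')  # fails for characters GBK cannot represent
    except UnicodeError:
        return False
    with open(path, 'wb') as f:
        f.write(converted)
    return True


if __name__ == '__main__':
    # Demonstrate on a throwaway file so no real data is touched.
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, 'sample.txt')
        with open(sample, 'wb') as f:
            f.write('中文'.encode('utf-8'))
        print(convert_utf8_file_to_gbk(sample))
        with open(sample, 'rb') as f:
            print(f.read() == '中文'.encode('gbk'))
```

Because both the decode and the encode happen before the file is reopened for writing, a file that cannot be converted is left untouched rather than truncated.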

Detailed explanation:


 import os

 import chardet

 def strJudgeCode(data):
     return chardet.detect(data)
     '''
 chardet.detect() returns a dict: 'confidence' is how sure the detector is of its guess, and 'encoding' is the detected encoding, e.g.
 {'confidence': 0.98999999999999999, 'encoding': 'GB2312'}

 (1) Detecting the encoding of a web page:
 >>> import urllib.request
 >>> rawdata = urllib.request.urlopen('http://www.google.cn/').read()
 >>> import chardet
 >>> chardet.detect(rawdata)
 {'confidence': 0.98999999999999999, 'encoding': 'GB2312'}

 (2) Detecting the encoding of a file:
 import chardet
 tt = open('c:\\111.txt', 'rb')
 ff = tt.readline()
 # read() also works here, but readlines() fails because it returns a list of lines, while chardet.detect() expects a single bytes object
 enc = chardet.detect(ff)
 print(enc['encoding'])
 tt.close()
     '''
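Since the result carries a confidence value as well as an encoding name, it can be worth checking both before acting on the guess. The sketch below works on a plain dict shaped like the result quoted above, so it runs without chardet installed. The function name `pick_encoding` and the 0.9 threshold are assumptions made for illustration, not part of chardet.

```python
def pick_encoding(result, min_confidence=0.9):
    """Return the lower-cased encoding name, or None if the guess is too weak.

    `result` is a dict shaped like chardet.detect()'s return value.
    `min_confidence` is an illustrative threshold, not a chardet setting.
    """
    encoding = result.get('encoding')
    confidence = result.get('confidence') or 0.0
    if encoding is None or confidence < min_confidence:
        return None
    return encoding.lower()


# The sample result quoted above, plus two weaker ones for comparison.
print(pick_encoding({'confidence': 0.98999999999999999, 'encoding': 'GB2312'}))
print(pick_encoding({'confidence': 0.5, 'encoding': 'utf-8'}))
print(pick_encoding({'confidence': 0.0, 'encoding': None}))
```

Lower-casing the name also makes a later comparison such as `== 'utf-8'` independent of how the detector spells it.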

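 def readFile(path):
     # Open in binary mode: chardet.detect() and decode() both need the raw bytes.
     # The with statement closes the file even if reading fails.
     with open(path, 'rb') as f:
         return f.read()

 def WriteFile(data, path):
     # data is already encoded, so write it back as bytes.
     with open(path, 'wb') as f:
         f.write(data)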

 def converCode(path):
     file_con = readFile(path)
     result = strJudgeCode(file_con)
     # 'encoding' is None when chardet cannot make a guess; compare case-insensitively.
     if (result['encoding'] or '').lower() == 'utf-8':
         a_unicode = file_con.decode('utf-8')
         '''
 decode() and encode() convert between bytes and text:
 u = '中文'                   # a str (text) object
 data = u.encode('gb2312')    # encode u as GB2312, giving a bytes object
 u1 = data.decode('gb2312')   # decode with the same codec, giving back the original str
 u2 = data.decode('utf-8')    # wrong codec: the original text cannot be recovered (for these bytes Python raises UnicodeDecodeError)
         '''
         gbk_bytes = a_unicode.encode('gbk')
         WriteFile(gbk_bytes, path)
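The decode/encode round trip is easy to check in isolation. The following Python 3 snippet is a small sketch using only built-in codecs; the variable names are chosen for illustration.

```python
text = '中文'

gb_bytes = text.encode('gb2312')    # str -> bytes, using GB2312
utf8_bytes = text.encode('utf-8')   # same text, different byte sequence

print(gb_bytes)      # b'\xd6\xd0\xce\xc4'
print(utf8_bytes)    # b'\xe4\xb8\xad\xe6\x96\x87'

# Decoding with the codec that produced the bytes restores the text.
assert gb_bytes.decode('gb2312') == text

# Decoding with the wrong codec does not.
try:
    gb_bytes.decode('utf-8')
    mismatch_raised = False
except UnicodeDecodeError:
    mismatch_raised = True
print(mismatch_raised)   # True

# The conversion step itself: UTF-8 bytes -> text -> GBK bytes.
converted = utf8_bytes.decode('utf-8').encode('gbk')
print(converted == gb_bytes)   # True
```

The last two lines are the same operation converCode performs on a whole file: the text is the common ground, and only the byte representation changes.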

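 def listDirFile(dirpath):
     # os.listdir() returns the names of the files and folders in the given directory.
     for name in os.listdir(dirpath):
         filepath = os.path.join(dirpath, name)
         r'''
 os.path.join() is used to build a path from its parts. For example,
 os.path.join("home", "me", "mywork")
 returns
 "home/me/mywork"
 on Linux, and
 "home\me\mywork"
 on Windows.
 The advantage is that the correct path separator, "/" or "\", is chosen for the current system automatically.
         '''
         if os.path.isdir(filepath):  # os.path.isdir() tests whether a path is a directory
             listDirFile(filepath)
         else:
             print(name)
             converCode(filepath)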
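To see both separator behaviours on one machine, the platform-specific modules behind `os.path` can be called directly. The sketch below also shows `os.walk` as an alternative to the hand-written recursion; `iter_files` is a hypothetical helper name, and the temporary directory exists only for the demonstration.

```python
import ntpath
import os
import posixpath
import tempfile

# os.path is posixpath on Linux/macOS and ntpath on Windows; calling the two
# modules directly shows both results regardless of the current platform.
print(posixpath.join('home', 'me', 'mywork'))   # home/me/mywork
print(ntpath.join('home', 'me', 'mywork'))      # home\me\mywork


def iter_files(root):
    """Yield the path of every file under `root` (hypothetical helper)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.join(dirpath, name)


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        os.makedirs(os.path.join(tmp, 'sub'))
        for rel in ('a.txt', os.path.join('sub', 'b.txt')):
            with open(os.path.join(tmp, rel), 'w') as f:
                f.write('x')
        for path in sorted(iter_files(tmp)):
            print(os.path.relpath(path, tmp))
```

With a generator like this, the conversion loop becomes a single `for path in iter_files(root): converCode(path)`, at the cost of no longer printing directory structure as it goes.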

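 if __name__ == '__main__':
     listDirFile(u'.\\TRMD')
     '''
 u'string' marks 'string' as a Unicode string literal. In Python 2 this distinguishes text from byte strings; in Python 3 every str is already Unicode, so the prefix is optional.
 # -*- coding: UTF-8 -*-   tells Python that the source file itself is saved as UTF-8, so the interpreter reads the program text using that encoding.
 Putting u in front of Chinese text tells Python 2 that the literal is Unicode and should be stored as Unicode.
     '''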
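The effect of the u prefix and the difference between characters and bytes can be checked with a few lines. This sketch assumes Python 3, where the coding line is optional because UTF-8 is already the default source encoding.

```python
# -*- coding: utf-8 -*-
plain = '中文'
prefixed = u'中文'

print(plain == prefixed)           # True
print(type(prefixed).__name__)     # str
print(len(plain))                  # 2
print(len(plain.encode('utf-8')))  # 6
print(len(plain.encode('gbk')))    # 4
```

The two characters take three bytes each in UTF-8 and two bytes each in GBK, which is why a converted file changes size even though its text is the same.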
