Adding pinyin to Chinese characters in text with Python

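Python code for converting Chinese to pinyin (supports full pinyin and first-letter abbreviations)

The code in this article is an upgraded version of https://github.com/cleverdeng/pinyin.py. Compared with the original code, it makes the following changes:
1. A firstcode parameter can be passed in: if true, only the first pinyin letter of each Chinese character is taken; if false, the full pinyin is output.
2. Fix: English letters are output unchanged.
3. Fix: output still works when the separator is an empty string.
4. Upgrade: the path of the dictionary file can be specified.
The code is simple: it reads a dictionary (a mapping from characters to their Latin spellings) and then replaces the Chinese characters in the text with their pinyin one by one.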
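The dictionary-lookup approach described above can be sketched roughly as follows. This is not the article's code: the tiny in-memory `SAMPLE_DICT`, the function name `to_pinyin`, and the default separator are assumptions made for illustration (the original reads a much larger mapping from a dictionary file). Only the `firstcode` parameter name comes from the description.

```python
# Hypothetical miniature dictionary; the original loads its mapping from a file.
SAMPLE_DICT = {"汉": "han", "字": "zi", "拼": "pin", "音": "yin"}

def to_pinyin(text, mapping=SAMPLE_DICT, separator="-", firstcode=False):
    """Replace each mapped character with its pinyin; pass other characters through."""
    parts = []
    for ch in text:
        syllable = mapping.get(ch)
        if syllable is None:
            parts.append(ch)           # e.g. English letters are output unchanged
        elif firstcode:
            parts.append(syllable[0])  # keep only the first pinyin letter
        else:
            parts.append(syllable)     # keep the full pinyin
    # str.join works with an empty separator as well
    return separator.join(parts)

print(to_pinyin("汉字拼音"))
print(to_pinyin("汉字拼音", separator="", firstcode=True))
```

Joining the pieces with `separator.join` is one simple way to make the empty-separator case work without special handling.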

(Repost) Getting the pinyin initials of Chinese characters in Delphi

Getting the pinyin initials of Chinese characters in Delphi. 1. py: array[216..247] of string = ({216}'CJWGNSPGCGNESYPB' + 'TYYZDXYKYGTDJNMJ' + 'QMBSGZSCYJSYYZPG' +{216}'KBZGYCYWYKGKLJSW' + 'KPJQHYZWDDZLSGMR' + 'YPYWWCCKZNKYDG',{217}'TTNJJEYKKZYTCJNM' + 'CYLQLYPYQFQRPZSL' + 'WBTGKJFYXJWZLTBN' +

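Counting the occurrences of each word in a text with Python

Counting the occurrences of each word in a text with Python: #coding=utf-8 __author__ = 'zcg' import collections import os with open('abc.txt') as file1:# open the text file str1=file1.read().split(' ')# split the text on spaces print "Original text:\n %s"% str1 print "\nOccurrences of each word:\n %s" % collections.Counter(s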
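A minimal Python 3 sketch of the same idea, assuming whitespace-separated words. The sample string stands in for the contents of abc.txt, and `count_words` is a hypothetical helper name, not part of the excerpt.

```python
import collections

def count_words(text):
    """Count how often each whitespace-separated word occurs."""
    # split() with no argument also handles newlines and repeated spaces
    return collections.Counter(text.split())

sample = "the cat and the hat"  # stand-in for the file contents
counts = count_words(sample)
print(counts.most_common(2))
```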

Adding hyperlinks to URLs in text while skipping URLs that are already linked

/** * Add hyperlinks to URLs in text, skipping URLs that are already linked * @param string $str [description] * @return [type] [description] */ function text2links($str='') { if($str=='' or !preg_match('/(http|www\.|@)/i', $str)) return $str; $lines = explode("\n", $str); $new_text = ''

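Converting Chinese characters to pinyin codes in SQL

Approach: create a function fn_GetPy() in SQL whose input parameter is a string of Chinese characters and whose return value is the pinyin code string. Function creation statement: CREATE function fn_GetPy(@str nvarchar(4000)) returns nvarchar(4000) --WITH ENCRYPTION as begin declare @intLen int declare @strRet nvarchar(4000) declare @temp nvarchar(100) set @intLen =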

Removing HTML tags from text with Python

def SplitHtmlTag(file): with open(file,"r") as f,open("result.txt","w+") as c: lines=f.readlines() for line in lines: re_html=re.compile(r'<[^>]+>')# match from '<', skip every character that is not '>', up to '>' line=re_html.sub('',line) c.wri
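The regular expression in the excerpt can be tried on a single string like this. It is a sketch only: `strip_tags` and `TAG_RE` are hypothetical names, and a regex of this kind is a rough filter rather than a real HTML parser (it will not cope with `>` inside attribute values, for instance).

```python
import re

# '<', then one or more characters that are not '>', then '>'
TAG_RE = re.compile(r"<[^>]+>")

def strip_tags(line):
    """Remove anything that looks like an HTML tag from one line of text."""
    return TAG_RE.sub("", line)

print(strip_tags("<p>Hello <b>world</b></p>"))
```

Compiling the pattern once outside the loop, rather than once per line as in the excerpt, avoids repeated work when a file has many lines.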

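Removing blank lines from text with Python

While working with a CSV file in pandas I kept getting errors; after investigating, I found that the CSV text contained many "blank lines", so all of them need to be removed: def clearBlankLine(): file1 = open('text1.txt', 'r', encoding='utf-8') # the file to strip blank lines from file2 = open('text2.txt', 'w', encoding='utf-8') # the output file without blank lines try: for line in file1.readlines(): if line == '\n'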
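The filtering step can be sketched on a list of lines as below. `remove_blank_lines` is a hypothetical helper, and unlike the `line == '\n'` test in the excerpt it also drops lines that contain only spaces or tabs, which is an assumption about what should count as blank.

```python
def remove_blank_lines(lines):
    """Keep only the lines that contain something other than whitespace."""
    return [line for line in lines if line.strip()]

rows = ["id,name\n", "\n", "1,foo\n", "   \n", "2,bar\n"]
for row in remove_blank_lines(rows):
    print(row, end="")
```

The same function can be applied to an open file object, since iterating over a file yields its lines.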

Converting Chinese characters in a spreadsheet to pinyin with Python

When a package is missing, install it with pip install, for example: pip install xlsxwriter. The complete code is as follows: #!/usr/bin/python #-*-coding:utf-8-*- #from openpyxl import load_workbook from xpinyin import Pinyin import pandas as pd import xlwt import xlrd import xlsxwriter # in the gb18030_loadder_tab1.xls sheet

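Querying the pinyin initials of Chinese characters in a MySQL database

The method offered here has the following features: 1. The code is compact and easy to use; basic SQL is all you need. 2. There is no need to create MySQL functions or other complicated objects. 3. The character set is the most complete: 20902 Chinese characters can be looked up. The method is as follows: 1. Create the pinyin-initials lookup table. SQL code (preferably also add a primary key and indexes): DROP TABLE IF EXISTS `pinyin`; CREATE TABLE `pinyin` ( `PY` varchar(1), `HZ1` varchar(1), `HZ2` varchar(1) ) ; INSERT INTO `pinyin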

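Filtering punctuation from text with Python (repost)

Most of what I found online was too complicated; in the end I found a version implemented with a regular expression:
import re
s = "string. With. Punctuation?"
# To filter out whitespace as well, use r'[^\w]'
s = re.sub(r'[^\w\s]','',s)
It supports Chinese text and Chinese punctuation. The principle is simple: in a regular expression, \w matches a letter, digit, underscore, or Chinese character (the exact set depends on the character set in use), and [^\w] is the negated match. Source: http://baimoz.me/1656/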
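A short check of the claim about Chinese punctuation, assuming Python 3, where `\w` in a `str` pattern is Unicode-aware by default. `strip_punctuation` is a hypothetical wrapper name around the `re.sub` call quoted above.

```python
import re

def strip_punctuation(text):
    """Drop every character that is neither a word character nor whitespace."""
    return re.sub(r"[^\w\s]", "", text)

print(strip_punctuation("string. With. Punctuation?"))
print(strip_punctuation("你好，世界！Hello, world."))
```

Note that the underscore counts as a word character, so it is kept.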

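Extracting a hotel's set-top box numbers and smart card numbers from text with Python

1. In one project we often need to disable spending permissions on certain set-top boxes, but the data we are given is not a clean string, so the numbers have to be extracted by hand. There are more than 400 set-top boxes and smart cards. Notepad++'s column mode can also extract them, but that is still somewhat tedious because the columns do not line up. Copy the data into a text file first and extract it with a script that uses the re module, which is more powerful. [\n:-]+ treats any of the characters inside the brackets as a separator. # regular expression [,|;*]: any one of these characters occurring at least once import re f=open('1.txt','r',encoding='utf-8') w=open('2.txt','a',encoding='utf-8'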

Reading structured data from text with Python

Requirement: read some .txt files in a directory and find the min and max numbers in the file. Solution: echo *.txt > file.name in a Linux shell >>>execfile("mytest.py"); //equivalent to running mytest.m in MATLAB import os fileobj = open("./test2images/2d_xxx.name

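Counting Chinese character frequencies in a novel with Python

Environment: Python 3 code, tested and working. Approach: first pull every character out into a list; then filter out the punctuation; finally use a dictionary to accumulate how often each character appears. Extensions: it has many uses. With small changes it can count characters in novels or articles, help decide which common characters a child should learn, or analyze the language habits of friends on Weibo or WeChat Moments. Take it if you need it, and remember to leave me a funny reply. #coding:utf-8 word_lst = [] word_dict = {} exclude_str = ",.!?.()[]<><>=:+-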
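The three steps in that approach (collect characters, filter punctuation, accumulate counts in a dictionary) might look like the sketch below. The contents of `EXCLUDE` and the name `char_frequency` are assumptions; the excerpt's own exclusion string is cut off, so this set is illustrative rather than a copy of it.

```python
# Assumed exclusion set: some ASCII and Chinese punctuation plus whitespace.
EXCLUDE = set(",.!?()[]<>=:+- \n，。！？")

def char_frequency(text, exclude=EXCLUDE):
    """Count each character in text, skipping those in the exclusion set."""
    freq = {}
    for ch in text:
        if ch in exclude:
            continue
        freq[ch] = freq.get(ch, 0) + 1
    return freq

freq = char_frequency("你好，你们好！")
# Print characters from most to least frequent
for ch, n in sorted(freq.items(), key=lambda kv: kv[1], reverse=True):
    print(ch, n)
```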

Counting the occurrences of each character in a text with bash and Python

bash: grep -o . myfile | sort |uniq -c python: use the collections module import pprint import collections f = 'xxx' with open(f) as info: count = collections.Counter(info.read().upper()) value = pprint.pformat(count) print(value) or import codecs f = 'xxx' wit

Python's xpinyin module: converting Chinese characters to pinyin

pypinyin 1. Installation: pip install pypinyin 2. Usage >>> from pypinyin import pinyin, lazy_pinyin >>> import pypinyin >>> pinyin(u'中心') [[u'zh\u014dng'], [u'x\u012bn']] # enable heteronym (multiple readings) mode >>> pinyin(u'中心', heteronym

Counting the number of words in a text with Python

1. Read the file and match with a regular expression def statisticWord(): line_number = 0 words_dict = {} with open (r'D:\test\test.txt',encoding='utf-8') as a_file: for line in a_file: words = re.findall(r'&#\d+;|&#\d+;|&\w+;',line) for word in words: words_dict[word] = words_dict.

Removing all whitespace characters from text with Python

import re
import os

input_path = 'G:/test/aa.json'
output_path = 'G:/test/bb.json'

# Read the whole file, then delete every whitespace character
with open(input_path) as input_file:
    text = input_file.read()
text = re.sub(r'\s', '', text)
print(text)

with open(output_path, 'w') as output_file:
    output_file.write(text)

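Python 001: Converting Chinese characters in a URL to URL encoding

Often when scraping web pages the URL turns out to contain Chinese (for example '耳机', "earphones"), while the encoded form of the address is %E8%80%B3%E6%9C%BA, so a conversion is needed. The urllib module handles this. The code (Python 2) is very simple:
#-*- coding:utf-8 -*-
import urllib
data = '耳机'
print data
print urllib.quote(data)
Result:
耳机
%E8%80%B3%E6%9C%BA
To convert back, use urllib.unquote()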
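In Python 3 the same functions live in `urllib.parse`. A minimal equivalent, assuming UTF-8 encoding (the default for `quote` and `unquote`):

```python
from urllib.parse import quote, unquote

data = "耳机"
encoded = quote(data)       # percent-encode the UTF-8 bytes of the string
decoded = unquote(encoded)  # reverse the encoding

print(encoded)
print(decoded)
```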

Sorting Chinese characters by pinyin in JavaScript

let arr = ["贵州省", "江苏省", "江西省", "浙江省", "四川省", "安徽省", "山东省", "上海", "湖北省", "福建省", "辽宁省", "山西省", "河北省", "青海省", "黑

Replacing certain words in a text with Python

https://stackoverflow.com/questions/39086/search-and-replace-a-line-in-a-file-in-python from tempfile import mkstemp from shutil import move from os import fdopen, remove def replace(file_path, pattern, subst): #Create temp file fh, abs_path = mkstem

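A detailed guide to calling HanLP's Chinese-to-pinyin conversion from Python

1. Introduction to HanLP. HanLP is an NLP toolkit made up of a series of models and algorithms, led by 大快搜索 and fully open source, with the goal of bringing natural language processing into production use. HanLP is feature-complete, efficient, cleanly architected, trained on up-to-date corpora, and customizable. Open-source site: HanLP: Han Language Processing. However, because HanLP is implemented in Java, using it from Python is only possible through the pyhanlp package. Some features in pyhanlp still cannot be called directly from Python, such as Chinese-to-pinyin conversion; in that case, from within Python you need to start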
