import re
from numpy import *

def getStr(file_path, file_path1):
    fp = open(file_path, 'r')
    op = open(file_path1, 'w')
    for eachline in fp.readlines():
        lines = re.split("\t| |\n", eachline)
        print(lines[2:10])
        newlines = lines[2:10]
        i = 0
        for s in newli
jieba: "Jieba" (Chinese for "to stutter") Chinese text segmentation, built to be the best Python Chinese word segmentation module. Features. Supports three segmentation modes: accurate mode, which attempts to cut the sentence as precisely as possible and is suitable for text analysis;
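As a minimal sketch of the accurate mode described above, the snippet below wraps `jieba.cut` with `cut_all=False`. The helper name `segment_accurate` and the sample sentence are assumptions made for illustration, and the snippet assumes the `jieba` package is installed; the exact tokens returned depend on the dictionary shipped with the installed version.

```python
def segment_accurate(text):
    """Segment Chinese text with jieba's accurate mode and return a list of tokens.

    `segment_accurate` is a hypothetical helper name, not part of jieba.
    """
    # Imported inside the function so that merely defining the helper
    # does not require jieba to be installed.
    import jieba

    # cut_all=False selects accurate mode; jieba.cut returns a generator,
    # so it is materialized into a list here.
    return list(jieba.cut(text, cut_all=False))
```

Calling `segment_accurate("我来到北京清华大学")` should return a list of words whose concatenation equals the input sentence.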
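There are two ways to read a txt file in Python:

(1) Read line by line

    data = open("data.txt")
    line = data.readline()
    while line:
        print(line)
        line = data.readline()
    data.close()

(2) Read the whole file into memory at once

    data = open("data.txt")
    for line in data.readlines():
        print(line)
    data.close()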
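The same two approaches can also be written with a `with` block, which closes the file automatically even if an error occurs. This is only a sketch: the function names `read_line_by_line` and `read_all_lines` and the default `utf-8` encoding are assumptions for illustration, not part of the original snippet.

```python
def read_line_by_line(path, encoding="utf-8"):
    """Yield the lines of a text file one at a time, without trailing newlines."""
    with open(path, encoding=encoding) as f:
        # Iterating over the file object reads one line at a time,
        # so the whole file is never held in memory.
        for line in f:
            yield line.rstrip("\n")


def read_all_lines(path, encoding="utf-8"):
    """Read the whole file into memory and return its lines as a list."""
    with open(path, encoding=encoding) as f:
        # read() loads the full content; splitlines() drops the line endings.
        return f.read().splitlines()
```

For example, `for line in read_line_by_line("data.txt"): print(line)` prints each line once, without the extra blank line that printing an unstripped line produces. The line-by-line version is the safer choice for large files; the read-all version is convenient when the file comfortably fits in memory.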