Python Pandas read_csv error
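To deduplicate text (compare the previously collected records against each other and delete the duplicates), I wrote the following code.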
# -*- coding: utf-8 -*-
import pandas as pd

inputfile = 'e:/data/H_KJ300F-JAC2101W.txt'  # comments file
outputfile = 'e:/data/H_KJ300F-JAC2101W_process_1.txt'  # path where the processed comments are saved
data = pd.read_csv(inputfile, encoding='utf-8', header=None)
l1 = len(data)  # number of comments before deduplication
data = pd.DataFrame(data[0].unique())  # keep each distinct comment once
l2 = len(data)  # number of comments after deduplication
data.to_csv(outputfile, index=False, header=False, encoding='utf-8')
print(u'Removed %s comments.' % (l1 - l2))
Error:
>>> data = pd.read_csv(inputfile, encoding='utf-8', header=None)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 646, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "D:\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 401, in _read
    data = parser.read()
  File "D:\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 939, in read
    ret = self._engine.read(nrows)
  File "D:\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1508, in read
    data = self._reader.read(nrows)
  File "pandas\parser.pyx", line 848, in pandas.parser.TextReader.read (pandas\parser.c:10415)
  File "pandas\parser.pyx", line 870, in pandas.parser.TextReader._read_low_memory (pandas\parser.c:10691)
  File "pandas\parser.pyx", line 924, in pandas.parser.TextReader._read_rows (pandas\parser.c:11437)
  File "pandas\parser.pyx", line 911, in pandas.parser.TextReader._tokenize_rows (pandas\parser.c:11308)
  File "pandas\parser.pyx", line 2024, in pandas.parser.raise_parser_error (pandas\parser.c:27037)
pandas.io.common.CParserError: Error tokenizing data. C error: Expected 1 fields in line 360, saw 2

The output also contains the same error for line 361:
pandas.io.common.CParserError: Error tokenizing data. C error: Expected 1 fields in line 361, saw 2
Fix: replace every half-width comma "," in the file with a full-width comma ",".
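The replacement can also be done in memory before parsing, so the file on disk does not need to be edited by hand. The following is a minimal sketch, not the original author's code: the helper name `read_comments` and the sample text are hypothetical, and it assumes the comments contain no quote characters that the parser would treat specially.

```python
import io

import pandas as pd

HALF_WIDTH_COMMA = ','
FULL_WIDTH_COMMA = '\uff0c'  # the full-width comma used in Chinese text


def read_comments(text):
    """Parse one comment per line after neutralising half-width commas.

    Hypothetical helper for illustration: every half-width comma is
    swapped for a full-width one, so read_csv no longer sees a separator.
    """
    cleaned = text.replace(HALF_WIDTH_COMMA, FULL_WIDTH_COMMA)
    return pd.read_csv(io.StringIO(cleaned), header=None)


# Made-up sample data: two identical comments containing a comma, plus one more.
sample = 'very quiet, works well\nvery quiet, works well\ngood value\n'
data = read_comments(sample)
unique = pd.DataFrame(data[0].unique())
print('Removed %s comments.' % (len(data) - len(unique)))
```

For a real file, the text would come from something like `open(inputfile, encoding='utf-8').read()` before being passed to the helper.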
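Cause: when no delimiter is specified, read_csv uses "," as the default separator. The first lines of the file contain no comma, so the parser expects one field per line; a comment that contains a half-width comma is then split into two fields, which triggers the error.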
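To see which lines are affected before changing anything, the file can be scanned for lines that split into more than one field. This is only a sketch under stated assumptions: `find_split_lines` is a hypothetical helper, the sample lines are made up, and it uses the standard-library `csv` module with its default comma delimiter as a stand-in for how the pandas parser splits a line. The reported numbers count parsed rows, which match physical line numbers only when no quoted field spans several lines.

```python
import csv


def find_split_lines(lines, expected_fields=1):
    """Return (row_number, field_count) for rows with an unexpected field count.

    Hypothetical diagnostic helper: blank rows are ignored, every other
    row is split on half-width commas by csv.reader.
    """
    bad = []
    for row_number, row in enumerate(csv.reader(lines), start=1):
        if row and len(row) != expected_fields:
            bad.append((row_number, len(row)))
    return bad


# Made-up sample: only the second comment contains a half-width comma.
sample_lines = ['runs quietly\n', 'cheap, but loud\n', 'good value\n']
print(find_split_lines(sample_lines))
```

With a real file, an open file object (`open(inputfile, encoding='utf-8', newline='')`) can be passed in place of the list.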