Python: How to locate symbols other than digits and letters in a CSV file
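During data cleaning you sometimes want to locate dirty data rather than simply drop it, for example to find which cells of a CSV file contain something other than digits and letters. isdigit(), isalpha() and isalnum() cannot recognize floating-point numbers: the decimal point makes all three return False, so every float would be flagged as containing a special symbol.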
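A minimal sketch of the problem, using made-up cell values. The helper name `parses_as_float` is hypothetical and only illustrates the float() fallback idea:

```python
def parses_as_float(text):
    """Return True if float() accepts the text (hypothetical helper)."""
    try:
        float(text)
        return True
    except ValueError:
        return False


# Made-up cell values, not taken from the article's file
for cell in ["123", "3.14", "ABC", "A1", "#"]:
    print(cell, cell.isdigit(), cell.isalpha(), cell.isalnum(), parses_as_float(cell))
```

For "3.14" the three string methods all return False, while float() parses it without error.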
The sample data is shown below; the goal is to locate the special symbols in it.

The implementation is as follows:
# -*- coding: utf-8 -*-
"""
Created on Tue Dec 6 14:37:12 2016
@author: user
"""
import csv
import re

# Header row of the file: 'FIELD_000' through 'FIELD_326'
header = ['FIELD_%03d' % i for i in range(327)]
rows = 0

# Method 1: print every non-numeric cell (invert the test to print the numeric cells instead).
# It is kept inside a string literal so that only Method 2 runs.
'''
def is_a_num(string):
    try:
        float(string)
        return True
    except ValueError:
        return False

with open('D:/工作文件夹/Pyhton/20081003.csv', encoding='utf-8', newline='') as f:
    for row in csv.reader(f):
        if row != header:
            rows += 1
            # enumerate gives the real 1-based column of each cell
            for columns, factor in enumerate(row, start=1):
                if factor != '' and not is_a_num(factor):
                    print(rows, columns, factor)
'''

# Method 2: print every non-empty cell that is not made up solely of '.', digits and uppercase letters
with open('D:/工作文件夹/Pyhton/20081003.csv', encoding='utf-8', newline='') as f:
    for row in csv.reader(f):
        if row != header:
            rows += 1
            # enumerate gives the real 1-based column of each cell
            for columns, factor in enumerate(row, start=1):
                if factor != '' and re.match("[.0-9A-Z]+$", factor) is None:
                    print(rows, columns, factor)
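In the code above, re.match performs the regular-expression match:
The prototype of re.match is: re.match(pattern, string, flags=0)
The first argument is the regular expression, here "[.0-9A-Z]+$". re.match always starts at the beginning of the string, the characters in [] must match one or more times, and $ requires the match to reach the end of the string. On success it returns a Match object, otherwise None;
The second argument is the string to match;
The third argument is a flag value that controls how the regular expression is matched, for example whether matching is case-sensitive or multi-line.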
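The same check can be wrapped in a function that returns positions instead of printing them. This is only a sketch under stated assumptions: the function name `locate_special_cells`, the in-memory sample data, and skipping the first row as the header are all illustrative choices, not part of the original script.

```python
import csv
import io
import re

# Same pattern as in the article: only '.', digits and uppercase letters are allowed
ALLOWED = re.compile(r"[.0-9A-Z]+$")


def locate_special_cells(lines):
    """Return (row, column, value) for each non-empty cell the pattern rejects.

    Rows and columns are 1-based. The first row is assumed to be a header
    and is skipped (an assumption of this sketch).
    """
    hits = []
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row
    for row_no, row in enumerate(reader, start=1):
        for col_no, cell in enumerate(row, start=1):
            if cell != "" and ALLOWED.match(cell) is None:
                hits.append((row_no, col_no, cell))
    return hits


# Hypothetical sample data, not the article's file
sample = io.StringIO(
    "FIELD_000,FIELD_001,FIELD_002\n"
    "12,3.14,ABC\n"
    "7,#,\n"
    "A1,-5,x9\n"
)
print(locate_special_cells(sample))
# prints [(2, 2, '#'), (3, 2, '-5'), (3, 3, 'x9')]
```

Note that this pattern also flags negative numbers such as -5 and lowercase letters such as x9, because neither '-' nor a-z is in the character class. Widen the class if those values should count as clean.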