在数据清洗过程中,有时不仅希望去掉脏数据,更希望定位脏数据的位置,例如从csv里面定位非数字和字母单元格的位置,在使用isdigit()、isalpha()、isalnum()时无法判断浮点数,会将浮点数都判断为特殊符号。

以下为样例数据,希望定位特殊符号的位置。

实现代码为:

# -*- coding: utf-8 -*-
"""
Created on Tue Dec 6 14:37:12 2016 @author: user
""" import csv
import re csv_reader = csv.reader(open('D:/工作文件夹/Pyhton/20081003.csv',encoding = 'utf-8'))
rows = 0 #方法一、此方法可用于输出所有数值,过滤非数值(反之亦然成立)
'''
def is_a_num(string):
try:
float(string)#return float(string)
except:
return string#return '' for row in csv_reader:
if row != ['FIELD_000','FIELD_001','FIELD_002','FIELD_003','FIELD_004','FIELD_005','FIELD_006','FIELD_007','FIELD_008','FIELD_009','FIELD_010','FIELD_011','FIELD_012','FIELD_013','FIELD_014','FIELD_015','FIELD_016','FIELD_017','FIELD_018','FIELD_019','FIELD_020','FIELD_021','FIELD_022','FIELD_023','FIELD_024','FIELD_025','FIELD_026','FIELD_027','FIELD_028','FIELD_029','FIELD_030','FIELD_031','FIELD_032','FIELD_033','FIELD_034','FIELD_035','FIELD_036','FIELD_037','FIELD_038','FIELD_039','FIELD_040','FIELD_041','FIELD_042','FIELD_043','FIELD_044','FIELD_045','FIELD_046','FIELD_047','FIELD_048','FIELD_049','FIELD_050','FIELD_051','FIELD_052','FIELD_053','FIELD_054','FIELD_055','FIELD_056','FIELD_057','FIELD_058','FIELD_059','FIELD_060','FIELD_061','FIELD_062','FIELD_063','FIELD_064','FIELD_065','FIELD_066','FIELD_067','FIELD_068','FIELD_069','FIELD_070','FIELD_071','FIELD_072','FIELD_073','FIELD_074','FIELD_075','FIELD_076','FIELD_077','FIELD_078','FIELD_079','FIELD_080','FIELD_081','FIELD_082','FIELD_083','FIELD_084','FIELD_085','FIELD_086','FIELD_087','FIELD_088','FIELD_089','FIELD_090','FIELD_091','FIELD_092','FIELD_093','FIELD_094','FIELD_095','FIELD_096','FIELD_097','FIELD_098','FIELD_099','FIELD_100','FIELD_101','FIELD_102','FIELD_103','FIELD_104','FIELD_105','FIELD_106','FIELD_107','FIELD_108','FIELD_109','FIELD_110','FIELD_111','FIELD_112','FIELD_113','FIELD_114','FIELD_115','FIELD_116','FIELD_117','FIELD_118','FIELD_119','FIELD_120','FIELD_121','FIELD_122','FIELD_123','FIELD_124','FIELD_125','FIELD_126','FIELD_127','FIELD_128','FIELD_129','FIELD_130','FIELD_131','FIELD_132','FIELD_133','FIELD_134','FIELD_135','FIELD_136','FIELD_137','FIELD_138','FIELD_139','FIELD_140','FIELD_141','FIELD_142','FIELD_143','FIELD_144','FIELD_145','FIELD_146','FIELD_147','FIELD_148','FIELD_149','FIELD_150','FIELD_151','FIELD_152','FIELD_153','FIELD_154','FIELD_155','FIELD_156','FIELD_157','FIELD_158','FIELD_159','FIELD_160','FIELD_161','FIELD_162','FIELD_163','FIELD_164','FIELD_165','FIELD_166','FIELD_167','FIELD_168','FIELD_169','FIELD_170','FIELD_171','FIELD_172','FIELD_173','FIELD_174','FIELD_175','FIELD_176','FIELD_177','FIELD_178','FIELD_179','FIELD_180','FIELD_181','FIELD_182','FIELD_183','FIELD_184','FIELD_185','FIELD_186','FIELD_187','FIELD_188','FIELD_189','FIELD_190','FIELD_191','FIELD_192','FIELD_193','FIELD_194','FIELD_195','FIELD_196','FIELD_197','FIELD_198','FIELD_199','FIELD_200','FIELD_201','FIELD_202','FIELD_203','FIELD_204','FIELD_205','FIELD_206','FIELD_207','FIELD_208','FIELD_209','FIELD_210','FIELD_211','FIELD_212','FIELD_213','FIELD_214','FIELD_215','FIELD_216','FIELD_217','FIELD_218','FIELD_219','FIELD_220','FIELD_221','FIELD_222','FIELD_223','FIELD_224','FIELD_225','FIELD_226','FIELD_227','FIELD_228','FIELD_229','FIELD_230','FIELD_231','FIELD_232','FIELD_233','FIELD_234','FIELD_235','FIELD_236','FIELD_237','FIELD_238','FIELD_239','FIELD_240','FIELD_241','FIELD_242','FIELD_243','FIELD_244','FIELD_245','FIELD_246','FIELD_247','FIELD_248','FIELD_249','FIELD_250','FIELD_251','FIELD_252','FIELD_253','FIELD_254','FIELD_255','FIELD_256','FIELD_257','FIELD_258','FIELD_259','FIELD_260','FIELD_261','FIELD_262','FIELD_263','FIELD_264','FIELD_265','FIELD_266','FIELD_267','FIELD_268','FIELD_269','FIELD_270','FIELD_271','FIELD_272','FIELD_273','FIELD_274','FIELD_275','FIELD_276','FIELD_277','FIELD_278','FIELD_279','FIELD_280','FIELD_281','FIELD_282','FIELD_283','FIELD_284','FIELD_285','FIELD_286','FIELD_287','FIELD_288','FIELD_289','FIELD_290','FIELD_291','FIELD_292','FIELD_293','FIELD_294','FIELD_295','FIELD_296','FIELD_297','FIELD_298','FIELD_299','FIELD_300','FIELD_301','FIELD_302','FIELD_303','FIELD_304','FIELD_305','FIELD_306','FIELD_307','FIELD_308','FIELD_309','FIELD_310','FIELD_311','FIELD_312','FIELD_313','FIELD_314','FIELD_315','FIELD_316','FIELD_317','FIELD_318','FIELD_319','FIELD_320','FIELD_321','FIELD_322','FIELD_323','FIELD_324','FIELD_325','FIELD_326']:
rows += 1
columns = 0
for Factor in row[0:]:
if is_a_num(Factor) and Factor != '':
# if not Factor.isalnum() and Factor != '' :
columns += 1
print(rows,columns,Factor)
'''
#方法二
for row in csv_reader:
if row != ['FIELD_000','FIELD_001','FIELD_002','FIELD_003','FIELD_004','FIELD_005','FIELD_006','FIELD_007','FIELD_008','FIELD_009','FIELD_010','FIELD_011','FIELD_012','FIELD_013','FIELD_014','FIELD_015','FIELD_016','FIELD_017','FIELD_018','FIELD_019','FIELD_020','FIELD_021','FIELD_022','FIELD_023','FIELD_024','FIELD_025','FIELD_026','FIELD_027','FIELD_028','FIELD_029','FIELD_030','FIELD_031','FIELD_032','FIELD_033','FIELD_034','FIELD_035','FIELD_036','FIELD_037','FIELD_038','FIELD_039','FIELD_040','FIELD_041','FIELD_042','FIELD_043','FIELD_044','FIELD_045','FIELD_046','FIELD_047','FIELD_048','FIELD_049','FIELD_050','FIELD_051','FIELD_052','FIELD_053','FIELD_054','FIELD_055','FIELD_056','FIELD_057','FIELD_058','FIELD_059','FIELD_060','FIELD_061','FIELD_062','FIELD_063','FIELD_064','FIELD_065','FIELD_066','FIELD_067','FIELD_068','FIELD_069','FIELD_070','FIELD_071','FIELD_072','FIELD_073','FIELD_074','FIELD_075','FIELD_076','FIELD_077','FIELD_078','FIELD_079','FIELD_080','FIELD_081','FIELD_082','FIELD_083','FIELD_084','FIELD_085','FIELD_086','FIELD_087','FIELD_088','FIELD_089','FIELD_090','FIELD_091','FIELD_092','FIELD_093','FIELD_094','FIELD_095','FIELD_096','FIELD_097','FIELD_098','FIELD_099','FIELD_100','FIELD_101','FIELD_102','FIELD_103','FIELD_104','FIELD_105','FIELD_106','FIELD_107','FIELD_108','FIELD_109','FIELD_110','FIELD_111','FIELD_112','FIELD_113','FIELD_114','FIELD_115','FIELD_116','FIELD_117','FIELD_118','FIELD_119','FIELD_120','FIELD_121','FIELD_122','FIELD_123','FIELD_124','FIELD_125','FIELD_126','FIELD_127','FIELD_128','FIELD_129','FIELD_130','FIELD_131','FIELD_132','FIELD_133','FIELD_134','FIELD_135','FIELD_136','FIELD_137','FIELD_138','FIELD_139','FIELD_140','FIELD_141','FIELD_142','FIELD_143','FIELD_144','FIELD_145','FIELD_146','FIELD_147','FIELD_148','FIELD_149','FIELD_150','FIELD_151','FIELD_152','FIELD_153','FIELD_154','FIELD_155','FIELD_156','FIELD_157','FIELD_158','FIELD_159','FIELD_160','FIELD_161','FIELD_162','FIELD_163','FIELD_164','FIELD_165','FIELD_166','FIELD_167','FIELD_168','FIELD_169','FIELD_170','FIELD_171','FIELD_172','FIELD_173','FIELD_174','FIELD_175','FIELD_176','FIELD_177','FIELD_178','FIELD_179','FIELD_180','FIELD_181','FIELD_182','FIELD_183','FIELD_184','FIELD_185','FIELD_186','FIELD_187','FIELD_188','FIELD_189','FIELD_190','FIELD_191','FIELD_192','FIELD_193','FIELD_194','FIELD_195','FIELD_196','FIELD_197','FIELD_198','FIELD_199','FIELD_200','FIELD_201','FIELD_202','FIELD_203','FIELD_204','FIELD_205','FIELD_206','FIELD_207','FIELD_208','FIELD_209','FIELD_210','FIELD_211','FIELD_212','FIELD_213','FIELD_214','FIELD_215','FIELD_216','FIELD_217','FIELD_218','FIELD_219','FIELD_220','FIELD_221','FIELD_222','FIELD_223','FIELD_224','FIELD_225','FIELD_226','FIELD_227','FIELD_228','FIELD_229','FIELD_230','FIELD_231','FIELD_232','FIELD_233','FIELD_234','FIELD_235','FIELD_236','FIELD_237','FIELD_238','FIELD_239','FIELD_240','FIELD_241','FIELD_242','FIELD_243','FIELD_244','FIELD_245','FIELD_246','FIELD_247','FIELD_248','FIELD_249','FIELD_250','FIELD_251','FIELD_252','FIELD_253','FIELD_254','FIELD_255','FIELD_256','FIELD_257','FIELD_258','FIELD_259','FIELD_260','FIELD_261','FIELD_262','FIELD_263','FIELD_264','FIELD_265','FIELD_266','FIELD_267','FIELD_268','FIELD_269','FIELD_270','FIELD_271','FIELD_272','FIELD_273','FIELD_274','FIELD_275','FIELD_276','FIELD_277','FIELD_278','FIELD_279','FIELD_280','FIELD_281','FIELD_282','FIELD_283','FIELD_284','FIELD_285','FIELD_286','FIELD_287','FIELD_288','FIELD_289','FIELD_290','FIELD_291','FIELD_292','FIELD_293','FIELD_294','FIELD_295','FIELD_296','FIELD_297','FIELD_298','FIELD_299','FIELD_300','FIELD_301','FIELD_302','FIELD_303','FIELD_304','FIELD_305','FIELD_306','FIELD_307','FIELD_308','FIELD_309','FIELD_310','FIELD_311','FIELD_312','FIELD_313','FIELD_314','FIELD_315','FIELD_316','FIELD_317','FIELD_318','FIELD_319','FIELD_320','FIELD_321','FIELD_322','FIELD_323','FIELD_324','FIELD_325','FIELD_326']:
rows += 1
columns = 0
for Factor in row[0:]:
if re.match("[.0-9A-Z]+$", Factor) == None and Factor != '':
# if not Factor.isalnum() and Factor != '' :
columns += 1
print(rows,columns,Factor)

其中,re.match为正则表达式:

re.match的函数原型为:re.match(pattern, string, flags)

第一个参数是正则表达式,这里为"[.0-9A-Z]+$",匹配[]中的任何字符至少1次,如果匹配成功,则返回一个Match,否则返回一个None;

第二个参数表示要匹配的字符串;

第三个参数是标致位,用于控制正则表达式的匹配方式,如:是否区分大小写,多行匹配等等。

Python 如何在csv中定位非数字和字母的符号的更多相关文章

  1. Python 解决写入csv中间隔一行空行问题

    转载解决写入csv中间隔一行空行问题 写入csv: with open(birth_weight_file,'w') as f: writer=csv.writer(f) writer.writero ...

  2. Python的驻留机制(仅对数字,字母,下划线有效)

    Python的驻留机制及为在同一运行空间内,当两变量的值相同,则地址也相同. 举例: a = 'abc' b = 'abc' print(id(a)) print(id(b)) 以上示例为驻留机制有效 ...

  3. python 找出字符串中出现次数最多的字母

    # 请大家找出s=”aabbccddxxxxffff”中 出现次数最多的字母 # 第一种方法,字典方式: s="aabbccddxxxxffff" count ={} for i ...

  4. [C++/Python] 如何在C++中使用一个Python类? (Use Python-defined class in C++)

    最近在做基于OpenCV的车牌识别, 其中需要用到深度学习的一些代码(Python), 所以一开始的时候开发语言选择了Python(祸患之源). 固然现在Python的速度不算太慢, 但你一定要用Py ...

  5. winform中如何在TextBox中只能输入数字(可以带小数点)

    可以采用像web表单验证的方式,利用textbox的TextChanged事件,每当textbox内容变化时,调用正则表达式的方法验证,用一个label在text后面提示输入错误,具体代码如下: pr ...

  6. C# winform如何在textbox中判断输入的是字母还是数字?

    1.用正规式using System.Text.RegularExpressions; string pattern = @"^\d+(\.\d)?$";if(Text1.Text ...

  7. python 如何在 command 中能够找到 其他module

    部分代码如下: __author__ = 'norsd' # coding=utf8 # 上句说明使用utf8编码 try: import os import sys import time #关键语 ...

  8. js判断字符串中是否有数字和字母

    var p = /[0-9]/; var b = p.test(string);//true,说明有数字var p = /[a-z]/i; var b = p.test(string);//true, ...

  9. sql 判断字符串中是否含有数字和字母

    判断是否含有字母 select PATINDEX('%[A-Za-z]%', ‘ads23432’)=0 (如果存在字母,结果<>1) 判断是否含有数字 PATINDEX('%[0-9]% ...

随机推荐

  1. Datatable 导出到execl 官网demo

    <!DOCTYPE html> <html> <head> <meta http-equiv="Content-type" content ...

  2. SERE0014: Illegal HTML character: decimal 154

    问题:jmeter,生成报告转化成html,报错SERE0014: Illegal HTML character: decimal 154 原因: 某些字符,特别是控制字符#x7F-#x9F ,在XM ...

  3. 关于git上传GitHub以及码云(gitee)

    如果你是gitee(码云),点击链接跳转 首先,你的有一个GitHub的账号(然后新建项目我就不说了) # Linux的方法 GitHub网站下的,点击settings下的emails,确认自己的邮箱 ...

  4. 【05】JSON笔记

    [05]笔记           尽管有许多宣传关于 XML 如何拥有跨平台,跨语言的优势,然而,除非应用于 Web Services,否则,在普通的 Web 应用中,开发者经常为 XML 的解析伤透 ...

  5. Android StatusBarUtil:设置Android系统下方虚拟键键盘透明度

     Android StatusBarUtil:设置Android系统下方虚拟键键盘透明度 Android StatusBarUtil是github上的一个开源项目,主页:https://githu ...

  6. vim下多行注释与解注释

    1.多行注释 (1)按esc进入命令行模式 (2)按下Ctrl+v,进入区块模式,并使用上下键选择需要注释的多行 (3)按下“I”(大写)键,进入插入模式 (4)输入注释符(“//”或“#”等) (5 ...

  7. Sublime Text 3配置支持Markdown编辑

    继上一篇http://www.cnblogs.com/EasonJim/p/7119304.html文章安装好之后,对Markdown支持需要做如下处理: 1.按下[Ctrl]+[Shift]+[P] ...

  8. 6、Java并发性和多线程-并发性与并行性

    以下内容转自http://tutorials.jenkov.com/java-concurrency/concurrency-vs-parallelism.html(使用谷歌翻译): 术语并发和并行性 ...

  9. Android GIS开发系列-- 入门季(6)GraphicsLayer添加文字与图片标签

    一.GraphicsLayer添加图片 GraphicLayer添加图片Graphic,要用到PictureMarkerSymbol,也是样式的一种.添加代码如下: Drawable drawable ...

  10. java NPE就是空指针异常,null pointer exception

    java NPE就是空指针异常,null pointer exception