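During data cleaning, we sometimes want not only to remove dirty data but also to find out where it is, for example to locate the cells in a CSV file that contain something other than digits and letters. The string methods isdigit(), isalpha() and isalnum() cannot recognize floating-point numbers, so every float ends up classified as a special symbol.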
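The limitation is easy to check in an interpreter. The snippet below is only a minimal sketch: the helper name `looks_like_number` and the sample values are made up for illustration and do not come from the original script.

```python
# Sample cell values (hypothetical, for illustration only).
samples = ["123", "ABC", "A1", "3.14", "-5", "#N/A"]


def looks_like_number(text):
    """Return True if float() can parse the text (hypothetical helper)."""
    try:
        float(text)
        return True
    except ValueError:
        return False


# Print what each string method says next to the float() check.
for text in samples:
    print(text, text.isdigit(), text.isalpha(), text.isalnum(),
          looks_like_number(text))

# Keep (isalnum, looks_like_number) per sample for a quick comparison.
results = {text: (text.isalnum(), looks_like_number(text)) for text in samples}
```

Values such as "3.14" and "-5" fail isalnum() because of the dot and the minus sign, even though float() parses them without trouble.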

Below is the sample data; the goal is to locate the special symbols in it.

The implementation is as follows:

# -*- coding: utf-8 -*-
"""
Created on Tue Dec 6 14:37:12 2016
@author: user
"""
import csv
import re

# Header row of the file: 'FIELD_000' ... 'FIELD_326'
HEADER = ['FIELD_%03d' % i for i in range(327)]

csv_file = open('D:/工作文件夹/Pyhton/20081003.csv', encoding='utf-8', newline='')
csv_reader = csv.reader(csv_file)
rows = 0

# Method 1: can be used to output all numeric values and filter out
# non-numeric ones (or the other way round)
'''
def is_a_num(string):
    try:
        float(string)   # return float(string)
    except ValueError:
        return string   # return ''

for row in csv_reader:
    if row != HEADER:                    # skip the header row
        rows += 1
        # enumerate gives the real column position (1-based) of each cell
        for columns, Factor in enumerate(row, start=1):
            if is_a_num(Factor) and Factor != '':
            # if not Factor.isalnum() and Factor != '':
                print(rows, columns, Factor)
'''

# Method 2: flag every non-empty cell that contains anything other than
# dots, digits and uppercase letters
for row in csv_reader:
    if row != HEADER:                    # skip the header row
        rows += 1
        # enumerate gives the real column position (1-based) of each cell
        for columns, Factor in enumerate(row, start=1):
            if re.match("[.0-9A-Z]+$", Factor) is None and Factor != '':
            # if not Factor.isalnum() and Factor != '':
                print(rows, columns, Factor)

csv_file.close()
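To try the same idea without the original file, the loop can be wrapped in a function and run on in-memory data. This is a sketch under assumptions: `find_bad_cells`, `CLEAN_CELL` and the three-column sample are hypothetical names and data, not part of the original script; only the regular expression is taken from it.

```python
import csv
import io
import re

# Same character class as the script: dots, digits, uppercase letters.
CLEAN_CELL = re.compile(r"[.0-9A-Z]+$")


def find_bad_cells(lines, header=None):
    """Return (row, column, value) for each non-empty cell failing the pattern.

    Rows and columns are 1-based; the header row, if given, is skipped
    and not counted.
    """
    bad = []
    row_no = 0
    for row in csv.reader(lines):
        if header is not None and row == header:
            continue
        row_no += 1
        for col_no, cell in enumerate(row, start=1):
            if cell != "" and CLEAN_CELL.match(cell) is None:
                bad.append((row_no, col_no, cell))
    return bad


# Made-up sample standing in for the real CSV file.
sample = io.StringIO(
    "FIELD_000,FIELD_001,FIELD_002\n"
    "1.5,ABC,#\n"
    "20,,3.7\n"
    "X1,?,0.25\n"
)
header = ["FIELD_%03d" % i for i in range(3)]

bad_cells = find_bad_cells(sample, header)
for row_no, col_no, cell in bad_cells:
    print(row_no, col_no, cell)
```

Returning a list instead of printing directly makes it possible to write the positions to a report or to use them later to patch the cells.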

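Here, re.match is the matching function from Python's regular-expression module re.

The prototype of re.match is: re.match(pattern, string, flags=0)

The first argument is the regular expression, here "[.0-9A-Z]+$". It matches one or more of the characters listed inside [] (a dot, a digit, or an uppercase letter). Because re.match starts at the beginning of the string and $ anchors the end, the whole cell has to consist of those characters. On success a Match object is returned, otherwise None;

The second argument is the string to match against;

The third argument is the flags value, which controls how the pattern is matched, e.g. whether matching is case-sensitive, multi-line matching, and so on.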
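The behaviour of the three arguments can be checked with a few calls. The snippet is a small sketch; the test strings and the variable names are chosen here for illustration, and re.fullmatch is shown only as an alternative to the trailing $, not as something the original script uses.

```python
import re

PATTERN = "[.0-9A-Z]+$"

# A successful match returns a Match object; a failed one returns None.
print(re.match(PATTERN, "3.14"))
print(re.match(PATTERN, "12#4"))

# Without flags, the class A-Z covers uppercase letters only.
strict = re.match(PATTERN, "abc")
# re.IGNORECASE lets the same class accept lowercase letters as well.
relaxed = re.match(PATTERN, "abc", re.IGNORECASE)

# re.fullmatch requires the whole string to match, so no "$" is needed.
full = re.fullmatch("[.0-9A-Z]+", "A1.5")

checks = {
    "float": re.match(PATTERN, "3.14") is not None,
    "symbol": re.match(PATTERN, "12#4") is not None,
    "lower_strict": strict is not None,
    "lower_relaxed": relaxed is not None,
    "fullmatch": full is not None,
}
print(checks)
```

Note that the pattern only checks the character set, so a string such as "1.2.3" also passes even though it is not a valid number.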
