Using re.compile, re.match, and re.search from Python's re module

import re
help(re.compile)
'''
Output:
Help on function compile in module re:

compile(pattern, flags=0)
    Compile a regular expression pattern, returning a pattern object.
As help() shows, re.compile compiles a regular expression pattern and returns a pattern object.
'''

'''
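The second argument, flags, sets the matching mode. Several flags can be combined with bitwise OR ('|'), or they can be specified inside the pattern string itself.
A Pattern object cannot be instantiated directly; it can only be obtained from re.compile(). The available flags are:
1) re.I (re.IGNORECASE): ignore case.
2) re.M (re.MULTILINE): multi-line mode; changes the behavior of '^' and '$'.
3) re.S (re.DOTALL): dot-matches-all mode; changes the behavior of '.'.
4) re.L (re.LOCALE): makes the predefined classes \w \W \b \B \s \S depend on the current locale. In Python 3 this flag is only allowed with bytes patterns.
5) re.U (re.UNICODE): makes the predefined classes \w \W \b \B \s \S \d \D follow Unicode character properties. In Python 3 this is already the default for str patterns.
6) re.X (re.VERBOSE): verbose mode; the pattern may span several lines, whitespace is ignored, and comments may be added.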
'''
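The two ways of setting flags described above can be sketched as follows. This is a minimal illustration, assuming Python 3; the sample text and variable names are made up for the example and do not come from the article.

```python
import re

# Hypothetical sample text for illustration.
text = "First line\nsecond LINE"

# Combine flags with bitwise OR: ignore case, and let ^ and $ match at line boundaries.
combined = re.compile(r"^second line$", re.I | re.M)
flags_result = combined.findall(text)

# The same two flags written inline at the start of the pattern string.
inline = re.compile(r"(?im)^second line$")
inline_result = inline.findall(text)

print(flags_result)
print(inline_result)
```

Both patterns should find the same text, since `(?im)` is the inline spelling of `re.I | re.M`.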

text = "JGod is a handsome boy ,but he is a ider"
print(re.findall(r'\w*o\w*', text))  # find the words that contain an 'o'
# Output: ['JGod', 'handsome', 'boy']

# Use compile() to build a pattern object, then call its findall() to return every substring of the text that matches the rule.
regex = re.compile(r'\w*o\w*')
print(regex.findall(text))
# >>> ['JGod', 'handsome', 'boy']

test1 = "who you are,what you do,When you get get there? What is time you state there?"
regex1 = re.compile(r'\w*wh\w*', re.IGNORECASE)
wh = regex1.findall(test1)
print(wh)
# >>> ['who', 'what', 'When', 'What']

'''
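The re module also provides functions that operate on regular expressions. The two covered here are match and search.
Definition: re.match tries to match a pattern at the beginning of a string.
Prototype:
re.match(pattern, string, flags=0)
The first argument is the regular expression;
the second argument is the string to match against;
the third argument is the flags value, which controls how the expression is matched, e.g. case sensitivity or multi-line matching.

Return value: a Match object if the match succeeds, otherwise None. A Match object is always truthy, so the result can be tested directly in an if statement.
For example, re.match('p', 'python') returns a Match object (truthy), while re.match('p', 'www.python.org') returns None (falsy).

Definition: re.search scans the given string for the first substring that matches the regular expression.

Return value: a Match object for the first match found, otherwise None.

Prototype:
re.search(pattern, string, flags=0)

Each argument has the same meaning as for re.match.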
'''
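The difference in return values described above can be checked with a short sketch. It reuses the 'p' / 'python' / 'www.python.org' strings from the text; the variable names are only illustrative.

```python
import re

# re.match only succeeds if the pattern matches at position 0.
m1 = re.match(r"p", "python")
m2 = re.match(r"p", "www.python.org")

# re.search scans forward and returns the first match anywhere in the string.
s1 = re.search(r"p", "www.python.org")

print(m1 is not None)  # a Match object was returned
print(m2)              # None, because the string does not start with 'p'
print(s1.span())       # (start, end) of the first 'p' found
```

Because a failed match returns None rather than False, calling `.group()` on the result without checking it first raises AttributeError.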
# re.match example 1
import re
your_love = re.match("wh", "What are you doing? who is your mate?", re.I)
if your_love:
    print("you are my angel")
else:
    print("i lose you ")
# A similar check using a compiled pattern:
print("*" * 100)  # separator between the two examples
import re
content = "What are you doing? who is your mate?"
regu_cont = re.compile(r"\w*wh\w*", re.I)
yl = regu_cont.match(content)
if yl:
    print(yl.group(0))
else:
    print("what happen?")
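Explanation: first, content holds the string to be matched.
Next, re.compile() builds the matching rule and returns the pattern object regu_cont.
yl receives the result of calling regu_cont.match() on content.
If yl is not None, yl.group(index) returns the matched substring; group(0) is the whole match.

Otherwise (the return value is None), the code prints "what happen?".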

match example 2

'''
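When match finds a result it returns a Match object, which you can query for information about the matched text. Its most important methods are:
group() returns the string matched by the RE
start() returns the starting position of the match
end() returns the ending position of the match
span() returns a tuple containing the (start, end) positions of the match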
'''
import re
content = "What are you doing? who is your mate?"
regu_cont = re.compile(r"\w*wh\w*", re.I)
yl = regu_cont.match(content)
if yl:
    print(yl.group(0))
    # The calls below need a Match object, so they stay inside the if branch.
    print(yl.group())
    print(yl.start())
    print(yl.end())
    print(yl.span())
else:
    print("pass the test")

Output:
What
What
0
4
(0, 4)


# search() is used in the same way as match()
import re
content = 'Where are you from? You look so handsome.'
regex = re.compile(r'\w*som\w*')
m = regex.search(content)
if m:
    print(m.group(0))
else:
    print("Not found")

# Roughly equivalent, without compiling first (re.I additionally ignores case):
import re
m = re.search(r'\w*som\w*', 'Where are you from? You look so handsome.', re.I)
if m:
    print(m.group(0))
else:
    print("not found")

See the official documentation for the re module.

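Regular expression patterns are usually written with an r prefix, which marks a raw string: backslashes inside it are kept as they are and need no extra escaping.

For example, the two-character pattern \n can be written as r'\n', or as '\\n' without the raw-string prefix.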
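A short sketch of what the r prefix changes, assuming Python 3. The variable names and the sample path are made up for the example.

```python
import re

raw = r"\n"       # two characters: a backslash and the letter n
escaped = "\\n"   # the same two characters, written without the r prefix
newline = "\n"    # one character: an actual line feed

print(len(raw), len(escaped), len(newline))
print(raw == escaped)

# Matching a literal backslash needs the regex \\ , which is r"\\" as a raw string
# (or "\\\\" as an ordinary string).
backslashes = re.findall(r"\\", r"C:\temp\file")
print(len(backslashes))
```

The raw form keeps patterns such as `\w` and `\b` readable, which is why the examples in this article write patterns as `r'...'`.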

Using re.match is recommended.

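The re.compile() function

Compiles a regular expression pattern and returns a pattern object. Compiling a frequently used expression once makes later calls more convenient and improves efficiency.

re.compile(pattern, flags=0)

pattern: the expression string to compile
flags: compilation flags that modify how the expression matches. Several flags can be combined with bitwise OR, e.g. re.I | re.M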
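Reusing a compiled pattern could look like the sketch below. The list of sample lines is hypothetical; only the pattern `\w*o\w*` comes from the earlier example in this article.

```python
import re

# Hypothetical sample data, not from the article.
lines = ["JGod is a handsome boy", "it is a trap", "who you are"]

# Compile once, then reuse the same Pattern object for every line.
word_with_o = re.compile(r"\w*o\w*")
compiled_results = [word_with_o.findall(line) for line in lines]

# The module-level function takes the pattern string on every call instead.
direct_results = [re.findall(r"\w*o\w*", line) for line in lines]

print(compiled_results)
print(compiled_results == direct_results)
```

Both forms return the same matches; the compiled object mainly gives the pattern a name and keeps the flags in one place.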

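The flags argument

re.I (re.IGNORECASE)
Makes matching case-insensitive.

re.L (re.LOCALE)
Makes matching locale-aware. In Python 3 this flag can only be used with bytes patterns.

re.M (re.MULTILINE)
Multi-line matching; affects ^ and $.

re.S (re.DOTALL)
Makes . match every character, including a newline.

re.U (re.UNICODE)
Interprets characters according to the Unicode character set. This flag affects \w, \W, \b, \B, \d, \D, \s, and \S; in Python 3 it is the default for str patterns.

re.X (re.VERBOSE)
Allows a more flexible layout (whitespace and comments inside the pattern) so that the regular expression is easier to read.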
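The effect of re.M, re.S, and re.X can be seen in a small sketch, assuming Python 3. The sample text and the year-month pattern are invented for illustration.

```python
import re

# Hypothetical two-line sample text.
text = "one\ntwo"

# Without re.M, ^ only matches at the very start of the string.
first_only = re.findall(r"^\w+", text)
each_line = re.findall(r"^\w+", text, re.M)

# Without re.S, '.' stops at a newline.
dot_default = re.match(r".+", text).group()
dot_all = re.match(r".+", text, re.S).group()

# re.X lets the pattern span several lines and carry comments.
year_month = re.compile(r"""
    (\d{4})   # year
    -
    (\d{2})   # month
""", re.X)
parts = year_month.match("2024-05").groups()

print(first_only, each_line)
print(repr(dot_default), repr(dot_all))
print(parts)
```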

Example:

import re

content = 'Citizen wang , always fall in love with neighbour，WANG'
rr = re.compile(r'wan\w', re.I)  # case-insensitive
print(type(rr))
a = rr.findall(content)
print(type(a))
print(a)

findall returns a list object. Output (on Python 3.7 and later the first line reads <class 're.Pattern'>):

<class '_sre.SRE_Pattern'>
<class 'list'>
['wang', 'WANG']

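The re.match() function

Always matches from the beginning of the string, and returns a match object (<class '_sre.SRE_Match'>; <class 're.Match'> on Python 3.7 and later) for the matched text, or None if there is no match.

re.match(pattern, string, flags=0)

pattern: the pattern to match, either a pattern string or a pattern object obtained from re.compile
string: the string to match against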

import re

pattern = re.compile(r'hello')

a = re.match(pattern, 'hello world')
b = re.match(pattern, 'world hello')
c = re.match(pattern, 'hell')
d = re.match(pattern, 'hello ')

if a:
    print(a.group())
else:
    print('a failed')

if b:
    print(b.group())
else:
    print('b failed')

if c:
    print(c.group())
else:
    print('c failed')

if d:
    print(d.group())
else:
    print('d failed')

Output:

hello
b failed
c failed
hello

Methods and attributes of the match object


import re

text = 'hello world! hello python'  # renamed from str to avoid shadowing the built-in

# Three named groups: group 0 is the whole match 'hello world!', group 1 is 'hello', group 2 is the space, group 3 is 'world!'
pattern = re.compile(r'(?P<first>hell\w)(?P<symbol>\s)(?P<last>.*ld!)')

match = re.match(pattern, text)

print('group 0:', match.group(0))  # group 0, the whole match
print('group 1:', match.group(1))  # group 1, hello
print('group 2:', match.group(2))  # group 2, the space
print('group 3:', match.group(3))  # group 3, world!
print('groups:', match.groups())  # groups() returns a tuple of all captured groups
print('start 0:', match.start(0), 'end 0:', match.end(0))  # start and end index of the whole match
print('start 1:', match.start(1), 'end 1:', match.end(1))  # start and end index of group 1
print('start 2:', match.start(2), 'end 2:', match.end(2))  # start and end index of group 2
print('pos (index where matching started):', match.pos)
print('endpos (index where matching stops):', match.endpos)  # the length of the string
print('lastgroup (name of the last captured group):', match.lastgroup)
print('lastindex (index of the last captured group):', match.lastindex)
print('string (the text passed to match):', match.string)
print('re (the Pattern object used):', match.re)
print('span (start(group), end(group)) of group 2:', match.span(2))

Output:

group 0: hello world!
group 1: hello
group 2:
group 3: world!
groups: ('hello', ' ', 'world!')
start 0: 0 end 0: 12
start 1: 0 end 1: 5
start 2: 5 end 2: 6
pos (index where matching started): 0
endpos (index where matching stops): 25
lastgroup (name of the last captured group): last
lastindex (index of the last captured group): 3
string (the text passed to match): hello world! hello python
re (the Pattern object used): re.compile('(?P<first>hell\\w)(?P<symbol>\\s)(?P<last>.*ld!)')
span (start(group), end(group)) of group 2: (5, 6)
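The pattern above names its groups (first, symbol, last), so they can also be read by name instead of by number. A minimal sketch using the same pattern and input string; the variable names holding the results are illustrative.

```python
import re

pattern = re.compile(r'(?P<first>hell\w)(?P<symbol>\s)(?P<last>.*ld!)')
match = pattern.match('hello world! hello python')

# group() accepts a group name as well as a group number.
first = match.group('first')
last = match.group('last')

# groupdict() returns every named group as a dictionary.
named = match.groupdict()

# start(), end(), and span() accept group names too.
symbol_span = match.span('symbol')

print(first, last)
print(named)
print(symbol_span)
```

Names tend to be easier to maintain than numbers, because adding another group to the pattern does not shift them.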

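The re.search function

Scans the whole string and returns a match object for the first match found, or None if there is none.

re.search(pattern, string, flags=0)

pattern: the pattern to match, either a pattern string or a pattern object obtained from re.compile
string: the string to search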

import re

text = 'say hello world! hello python'  # renamed from str to avoid shadowing the built-in

# Three named groups: group 0 is the whole match 'hello world!', group 1 is 'hello', group 2 is the space, group 3 is 'world!'
pattern = re.compile(r'(?P<first>hell\w)(?P<symbol>\s)(?P<last>.*ld!)')

search = re.search(pattern, text)

print('group 0:', search.group(0))  # group 0, the whole match
print('group 1:', search.group(1))  # group 1, hello
print('group 2:', search.group(2))  # group 2, the space
print('group 3:', search.group(3))  # group 3, world!
print('groups:', search.groups())  # groups() returns a tuple of all captured groups
print('start 0:', search.start(0), 'end 0:', search.end(0))  # start and end index of the whole match
print('start 1:', search.start(1), 'end 1:', search.end(1))  # start and end index of group 1
print('start 2:', search.start(2), 'end 2:', search.end(2))  # start and end index of group 2
print('pos (index where searching started):', search.pos)
print('endpos (index where searching stops):', search.endpos)  # the length of the string
print('lastgroup (name of the last captured group):', search.lastgroup)
print('lastindex (index of the last captured group):', search.lastindex)
print('string (the text passed to search):', search.string)
print('re (the Pattern object used):', search.re)
print('span (start(group), end(group)) of group 2:', search.span(2))

Note that the input string here differs from the one in the re.match example: it starts with 'say ', so re.match would return None for it, while re.search still finds the match.

Output:

group 0: hello world!
group 1: hello
group 2:
group 3: world!
groups: ('hello', ' ', 'world!')
start 0: 4 end 0: 16
start 1: 4 end 1: 9
start 2: 9 end 2: 10
pos (index where searching started): 0
endpos (index where searching stops): 29
lastgroup (name of the last captured group): last
lastindex (index of the last captured group): 3
string (the text passed to search): say hello world! hello python
re (the Pattern object used): re.compile('(?P<first>hell\\w)(?P<symbol>\\s)(?P<last>.*ld!)')
span (start(group), end(group)) of group 2: (9, 10)
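The pos and endpos attributes printed above are 0 and the string length because no start position was given. As a sketch, the match() and search() methods of a compiled pattern accept an optional start position (the module-level re.match and re.search do not). The simplified pattern `hell\w` and the variable names are chosen for illustration.

```python
import re

text = 'say hello world! hello python'
pattern = re.compile(r'hell\w')

# match() at the default position 0 fails, because the text starts with 'say '.
no_match = pattern.match(text)

# Starting at index 4, match() succeeds and records that position in .pos.
at_four = pattern.match(text, 4)

# search() from index 5 skips the first 'hello' and finds the second one.
second = pattern.search(text, 5)

print(no_match)
print(at_four.span(), at_four.pos)
print(second.span(), second.pos, second.endpos)
```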


https://www.cnblogs.com/huangdongju/p/7839697.html
