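Python's regular expression module: re

For the matching, searching, and replacing functions built into Python's re module, see the blog post "python附录-re.py模块源码（含re官方文档链接）".

Regular expressions are a tool for processing strings. The post "python-基础学习篇（二）" noted that the string type comes with its own built-in processing methods, but those methods only suit simple string handling; complex string processing is still the domain of regular expressions. As for why the built-in methods exist at all, my personal view is that simple string handling in simple applications should not require systematic knowledge of regular expressions, and this is also part of what makes Python easy to pick up.

Python's regular expression support lives in the re module, which must be imported before use.

import re

With the regular expression basics covered earlier, we can already write some patterns. How do we apply a pattern to a string? Through the functions that the re module provides.

Functions built into the re module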

re.compile(pattern, flags=0)

re.compile() compiles a regular expression into a pattern object (PatternObj). The returned pattern object is what the other string-processing methods are called on.

pattern_obj = re.compile(pattern)
match_obj = pattern_obj.match(string)

is equivalent to

match_obj = re.match(pattern, string)

In fact, re.match() runs the compile step internally. The source of match():

def match(pattern, string, flags=0):
    """Try to apply the pattern at the start of the string, returning
    a Match object, or None if no match was found."""
    return _compile(pattern, flags).match(string)

So match() simply returns the result of calling match() on the compiled pattern object.
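A small sketch of why compiling is convenient: the pattern object is built once and then reused for every input. The date pattern and the sample lines are made up purely for illustration.

```python
import re

# Hypothetical pattern and inputs, chosen only for illustration.
date_pattern = re.compile(r'(\d{4})-(\d{2})-(\d{2})')
lines = ['released 2019-03-25', 'no date here', 'patched 2020-11-02']

dates = []
for line in lines:
    # Reuse the same compiled pattern object for every line.
    match_obj = date_pattern.search(line)
    if match_obj is not None:
        dates.append(match_obj.group(0))

print(dates)
# ['2019-03-25', '2020-11-02']
```

The module-level functions would give the same result here; compiling mainly keeps the pattern in one place when it is used repeatedly.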

re.search(pattern, string, flags=0)

re.search() scans the whole string for the first location where the pattern matches and returns a match object (MatchObject); if nothing matches, it returns None.

import re
pattern = r'the'
match_obj = re.search(pattern, 'The dog is eating the bone', re.I)
print(match_obj.group(0))
print(match_obj)
# The
# <re.Match object; span=(0, 3), match='The'>

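re.match(pattern, string, flags=0)

re.match() tries to match the pattern at the start of the string. If the characters at the start of the string match the pattern, it returns a match object (MatchObject); otherwise it returns None.

import re
pattern = r'the'
match_obj = re.match(pattern, 'Dog is eating the bone', re.I)
print(match_obj)
# None

Compare:

import re
pattern = r'the'
match_obj = re.match(pattern, 'The dog is eating the bone', re.I)
print(match_obj.group(0))
print(match_obj)
# The
# <re.Match object; span=(0, 3), match='The'>

How re.search() and re.match() differ. In position: search() can match at any position in the string, whereas match() must match at the start of the string, and if the start does not match, the whole call fails. In content: search() returns a match object for the first (leftmost) part of the string that satisfies the pattern, whereas match() only ever matches the part at the start of the string. Note that this has nothing to do with greedy or non-greedy matching, which is a property of quantifiers. In multiline mode: match() still only matches at the start of the string, whereas search() combined with "^" can match at the start of every line.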
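The multiline difference is easiest to see on a two-line string. The sample text below is invented for the demonstration.

```python
import re

# Made-up two-line sample; 'the' only appears at the start of line 2.
text = 'first line\nthe second line'

# match() is anchored to the start of the whole string, even with re.M.
match_result = re.match(r'the', text, re.M)
print(match_result)
# None

# search() with '^' and re.M can match at the start of any line.
search_multiline = re.search(r'^the', text, re.M)
print(search_multiline.span())
# (11, 14)

# Without re.M, '^' only matches at the start of the string.
search_plain = re.search(r'^the', text)
print(search_plain)
# None
```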

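re.fullmatch(pattern, string, flags=0)

re.fullmatch() is similar to re.match() in that matching starts at the beginning of the string. The difference is that re.match() may match only part of the string, while re.fullmatch() must match all of it: it returns a match object (MatchObject) if and only if the pattern matches the entire string, and None otherwise.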
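A quick comparison on the same input, as a minimal sketch (the pattern and strings are arbitrary examples):

```python
import re

pattern = r'\d{3}'

# match() is satisfied by the first three digits.
prefix_match = re.match(pattern, '12345')
print(prefix_match.group(0))
# 123

# fullmatch() fails because two characters are left over.
whole_fail = re.fullmatch(pattern, '12345')
print(whole_fail)
# None

# fullmatch() succeeds only when the pattern covers the entire string.
whole_ok = re.fullmatch(pattern, '123')
print(whole_ok.group(0))
# 123
```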

re.split(pattern, string, maxsplit=0, flags=0)

re.split() splits string at every place where pattern matches and returns the pieces as a list. maxsplit is the maximum number of splits; the default of 0 means no limit. If maxsplit is larger than the number of matches, the string is simply split at every match.

import re
split_list_default = re.split(r'\W+', 'Words, words, words.')
print(split_list_default)
# ['Words', 'words', 'words', '']
# \W+ splits on one or more non-word characters; the trailing empty string is what follows the final '.' separator
split_list_max = re.split(r'\W+', 'Words, words, words.', maxsplit=1)
print(split_list_max)
# ['Words', 'words, words.']
# splitting goes left to right and stops after maxsplit splits; the rest of the string becomes the last element
split_list_couple = re.split(r'(\W+)', 'Words, words, words.')
print(split_list_couple)
# ['Words', ', ', 'words', ', ', 'words', '.', '']
# the pattern contains capturing parentheses, so the text matched by (\W+) is included in the list as well

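re.findall(pattern, string, flags=0)

re.findall() is similar to re.search(). re.search() returns a match object (MatchObject) for the first part of the string that matches the pattern, whereas re.findall() finds all non-overlapping matches in the string and returns them as a list, in the order they are found from left to right. If nothing matches, it returns an empty list.

import re
pattern = r'\d{3}'
find = re.findall(pattern, 'include21321exclude13243alert213lib32')
print(find)
# ['213', '132', '213']

Note: if the pattern contains exactly one capturing group, the list holds the text of that group rather than the whole match. If it contains two or more groups, each match contributes a tuple with one entry per group, in group order from left to right.

import re
pattern = r'(\d{3})(1)'
find = re.findall(pattern, 'include21321exclude13243alert213lib32')
print(find)
# [('132', '1')]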
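The single-group case is worth seeing next to the multi-group one. In this sketch the patterns are my own additions, applied to the same sample string; a non-capturing group (?:...) is shown as one way to get the whole match back.

```python
import re

text = 'include21321exclude13243alert213lib32'

# One capturing group: findall() returns only the group's text.
one_group = re.findall(r'[a-z]+(\d{2})', text)
print(one_group)
# ['21', '13', '21', '32']

# Non-capturing group: findall() returns the whole match again.
no_group = re.findall(r'[a-z]+(?:\d{2})', text)
print(no_group)
# ['include21', 'exclude13', 'alert21', 'lib32']
```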

re.finditer(pattern, string, flags=0)

re.finditer() is similar to re.findall(): it searches the string for everything that matches the pattern. Instead of a list of strings, it returns an iterator that yields one match object (MatchObject) per match. In other words, each piece of matched text is wrapped in a match object, and the match objects are produced by the iterator.

import re
pattern = r'\d{3}'
find = re.finditer(pattern, 'include21321exclude13243alert213lib32')
print(find)
for i in find:
    print(i)
    print(i.group(0))
# <callable_iterator object at 0x00000000028FB0F0>  (the address varies from run to run)
# <re.Match object; span=(7, 10), match='213'>
# 213
# <re.Match object; span=(19, 22), match='132'>
# 132
# <re.Match object; span=(29, 32), match='213'>
# 213
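Because finditer() yields match objects, the position of each match is available too. A minimal sketch using the start() and end() methods of the match object on the same sample string:

```python
import re

text = 'include21321exclude13243alert213lib32'

# Collect the matched text together with its start and end offsets.
positions = [(m.group(0), m.start(), m.end())
             for m in re.finditer(r'\d{3}', text)]
print(positions)
# [('213', 7, 10), ('132', 19, 22), ('213', 29, 32)]
```

This is the kind of information findall() throws away, which is usually the reason to prefer finditer().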

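re.sub(pattern, repl, string, count=0, flags=0)

re.sub() finds the parts of string that match the pattern, replaces them with repl, and returns the resulting string. Matching and replacing both proceed from left to right, and there may be several matches. To replace only the leftmost ones, set count to a non-negative integer; the default of 0 replaces every match, and a count larger than the number of matches also replaces every match. If nothing matches, no replacement is made and the original string is returned.

import re
pattern = r'\d+'
find_default = re.sub(pattern, ' ', 'include21321exclude13243alert213lib32')
print(find_default)
find_count = re.sub(pattern, ' ', 'include21321exclude13243alert213lib32', count=2)
print(find_count)
# include exclude alert lib     (ends with a space, because the final '32' is replaced too)
# include exclude alert213lib32

Note: repl can be a string or a function. If it is a function, it must take a single match object (MatchObject) argument. It is called once for each match, and the string it returns is used as the replacement for that match.

import re
def replace_func(match_obj):
    # \d+ only matches digits, so the else branch is never taken with this pattern
    if match_obj.group(0).isdigit():
        return ' '
    else:
        return '-'
pattern = r'\d+'
find_default = re.sub(pattern, replace_func,  'include21321exclude13243alert213lib32')
print(find_default)
# include exclude alert lib     (ends with a space)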
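A replacement function becomes more interesting when its result depends on the matched text. The sketch below uses a made-up helper, double(), and also shows a replacement string with group backreferences (\1, \2), which the post does not cover but which re.sub() supports.

```python
import re

def double(match_obj):
    # Hypothetical helper: the return value replaces the match,
    # so it has to be a string.
    return str(int(match_obj.group(0)) * 2)

doubled = re.sub(r'\d+', double, 'alert213lib32')
print(doubled)
# alert426lib64

# In a replacement string, \1 and \2 refer to the captured groups.
swapped = re.sub(r'(\w+) (\w+)', r'\2 \1', 'Snow Stack')
print(swapped)
# Stack Snow
```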

re.subn(pattern, repl, string, count=0, flags=0)

re.subn() does the same thing as re.sub(); only the return value differs. re.sub() returns the new string, while re.subn() returns a tuple of the new string and the number of replacements made.

import re
def replace_func(match_obj):
    if match_obj.group(0).isdigit():
        return ' '
    else:
        return '-'
pattern = r'\d+'
find_default = re.subn(pattern, replace_func,  'include21321exclude13243alert213lib32')
print(find_default)
# ('include exclude alert lib ', 4)

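re.escape(pattern)

Escapes the characters in pattern that have a special meaning in regular expressions. It is mainly useful when the text you want to match literally may itself contain regular expression metacharacters.

import re
result = re.escape(r'\d*')
print(result)
# \\d\*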
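To see why escaping matters, compare searching for a literal string with and without re.escape(). The strings here are invented for the example.

```python
import re

# Made-up literal that happens to contain the metacharacter '+'.
literal = '1+1=2'
text = 'proof: 1+1=2, not 11=2'

# Unescaped, '1+' means "one or more 1", so the wrong text is found.
unescaped = re.findall(literal, text)
print(unescaped)
# ['11=2']

# Escaped, every character is matched literally.
escaped = re.findall(re.escape(literal), text)
print(escaped)
# ['1+1=2']
```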

re.purge()

Clears the regular expression cache.

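The flags parameter

The functions above take a default argument flags=0. Passing a specific value for flags in the call selects the matching mode. Commonly used values are:

re.I (re.IGNORECASE): case-insensitive matching;

re.M (re.MULTILINE): multiline mode, where "^" and "$" also match at the start and end of each line;

re.S (re.DOTALL): "single-line" mode, where "." also matches a newline;

re.X (re.VERBOSE): verbose mode, where whitespace in the pattern is ignored and "#" starts a comment.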
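A short sketch of the four flags in action. Flags can be combined with the | operator; the sample text and the phone-number pattern are assumptions made up for this example.

```python
import re

text = 'Name: Snow\nname: Stack'

# re.I | re.M: ignore case, and let '^' match at the start of each line.
names = re.findall(r'^name: (\w+)', text, re.I | re.M)
print(names)
# ['Snow', 'Stack']

# Without re.S, '.' does not match the newline between the two lines.
no_dotall = re.search(r'Snow.name', text)
print(no_dotall)
# None

# With re.S, '.' matches the newline as well.
with_dotall = re.search(r'Snow.name', text, re.S)
print(with_dotall.group(0) == 'Snow\nname')
# True

# re.X: whitespace is ignored and '#' starts a comment inside the pattern.
phone = re.compile(r"""
    (\d{3})   # first three digits
    -         # literal separator
    (\d{4})   # last four digits
""", re.X)
parts = phone.match('555-1234').groups()
print(parts)
# ('555', '1234')
```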

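Pattern objects (Pattern)

A pattern object can call the methods above directly, as described under re.compile(). search(), match(), fullmatch(), split(), findall(), finditer(), sub(), and subn() are all instance methods of Pattern. Its instance attributes are:

Pattern.flags

Returns the matching flags the pattern was compiled with. The attribute is read-only: flags are set by passing them to re.compile(), not by assigning to Pattern.flags.

Pattern.groups

Returns the number of capturing groups.

Pattern.pattern

Returns the original regular expression string.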
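The three attributes can be inspected on any compiled pattern. This sketch also tries to assign to flags to show that the attribute cannot be changed after compilation; the pattern itself is just an example.

```python
import re

pattern_obj = re.compile(r'(\w+) (\w+)', re.I)

print(pattern_obj.pattern)
# (\w+) (\w+)
print(pattern_obj.groups)
# 2

# Test for a specific flag with a bitwise AND rather than comparing numbers.
ignore_case = bool(pattern_obj.flags & re.I)
print(ignore_case)
# True

# Assigning to the attribute is rejected.
try:
    pattern_obj.flags = re.M
    assignable = True
except AttributeError:
    assignable = False
print(assignable)
# False
```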

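Match objects (MatchObject)

A match object wraps the result of a successful match.

MatchObject.group(num)

Returns the matched text held by the match object. group(0) returns the whole match; a value of 1 or greater returns the text of the corresponding capturing group.

import re
pattern = r'(\w+) (\w+)'
match_obj = re.match(pattern, 'Snow Stack')
print(match_obj.group(0))
print(match_obj.group(1))
print(match_obj.group(2))
# Snow Stack
# Snow
# Stack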

MatchObject.__getitem__(num)

Does the same thing as MatchObject.group(num), using index syntax (available since Python 3.6).

import re
pattern = r'(\w+) (\w+)'
match_obj = re.match(pattern, 'Snow Stack')
print(match_obj[0])
print(match_obj[1])
print(match_obj[2])
# Snow Stack
# Snow
# Stack

MatchObject.groups()

Returns the text of all capturing groups as a tuple. Only the capturing groups are included, not the rest of the match.

import re
pattern = r'(\w+) (\w+)'
match_obj = re.match(pattern, 'Snow Stack')
print(match_obj.groups())
# ('Snow', 'Stack')

MatchObject.groupdict()

Returns a dictionary containing all named groups. Each key is a group name and each value is the text that group captured.

import re
pattern = r'(?P<first_name>\w+) (?P<last_name>\w+)'
match_obj = re.match(pattern, 'Snow Stack')
print(match_obj.groupdict())
# {'first_name': 'Snow', 'last_name': 'Stack'}
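Named groups can also be read one at a time, by passing the name to group() or by indexing. A minimal sketch with the same pattern, which also shows that group() accepts several arguments and then returns a tuple:

```python
import re

pattern = r'(?P<first_name>\w+) (?P<last_name>\w+)'
match_obj = re.match(pattern, 'Snow Stack')

# Access a named group by name.
first = match_obj.group('first_name')
print(first)
# Snow

# Index syntax accepts group names as well as numbers.
last = match_obj['last_name']
print(last)
# Stack

# Several groups at once come back as a tuple.
both = match_obj.group(1, 2)
print(both)
# ('Snow', 'Stack')
```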
