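Python Modules: re

The re module

Preliminaries:

Most functions in re accept an optional flags argument. The common values are listed below, with the long name in parentheses. Several flags can be combined with |.

re.I (IGNORECASE): ignore case when matching.

re.M (MULTILINE): multi-line mode; ^ and $ also match at the start and end of each line.

re.S (DOTALL): the dot matches any character, including a newline.

re.L (LOCALE): makes \w, \W, \b, \B and case-insensitive matching depend on the current locale. Not recommended; in Python 3 it can only be used with bytes patterns.

re.U (UNICODE): makes \w, \W, \s, \S, \d and \D follow the Unicode character properties. In Python 3 this is the default for str patterns.

re.X (VERBOSE): verbose mode; the pattern string may span several lines, whitespace in it is ignored, and comments can be added.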
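The flags above are easier to compare side by side. The following is a minimal sketch; the sample text and variable names are made up for illustration and are not from the original post.

```python
import re

text = "Hello\nhello world"

# re.I: case-insensitive matching
ignore_case = re.findall(r"hello", text, flags=re.I)

# re.M: ^ matches at the start of every line, not only at the start of the string
line_starts = re.findall(r"^\w+", text, flags=re.M)

# re.S: without it the dot refuses to match the newline
without_dotall = re.search(r"Hello.hello", text)
with_dotall = re.search(r"Hello.hello", text, flags=re.S)

# re.X: whitespace and comments inside the pattern are ignored
date_pattern = re.compile(
    r"""
    (\d{4})  # year
    -
    (\d{2})  # month
    """,
    flags=re.X,
)
date_parts = date_pattern.search("released 2024-05").groups()

print(ignore_case)             # ['Hello', 'hello']
print(line_starts)             # ['Hello', 'hello']
print(without_dotall)          # None
print(repr(with_dotall.group()))
print(date_parts)              # ('2024', '05')
```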

1) Matching: findall(), search(), match()

(1)findall()

def findall(pattern, string, flags=0):

    """Return a list of all non-overlapping matches in the string.

    If one or more capturing groups are present in the pattern, return

    a list of groups; this will be a list of tuples if the pattern

    has more than one group.

    Empty matches are included in the result."""

    return _compile(pattern, flags).findall(string)

Usage: all matches are returned in one list; if nothing matches, an empty list is returned.

Example:

import re

res = re.findall(r'\w+', r'Jake@tom')

print(res)

Result:

['Jake', 'tom']

(2)search()

def search(pattern, string, flags=0):

    """Scan through string looking for a match to the pattern, returning

    a match object, or None if no match was found."""

    return _compile(pattern, flags).search(string)

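Usage: returns a match object for the first match found anywhere in the string; call its group() method to get the matched text. Returns None if nothing matches.

Example:

import re

res1 = re.search(r'\d+', r'222T333')

print('search', res1)

print(res1.group())

Result (Python 3.6 and earlier; Python 3.7+ prints re.Match instead of _sre.SRE_Match):

search <_sre.SRE_Match object; span=(0, 3), match='222'>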

222

(3)match()

def match(pattern, string, flags=0):

    """Try to apply the pattern at the start of the string, returning

    a match object, or None if no match was found."""

    return _compile(pattern, flags).match(string)

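Usage: same as search(), except that the pattern must match at the very start of the string.

Example:

import re

res2 = re.match(r'\d+', r'222T333')  # unlike search(), matches only at the start of the string

print('match', res2)

print(res2.group())

Result:

match <_sre.SRE_Match object; span=(0, 3), match='222'>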

222
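The difference between search() and match() only shows up when the first match does not start at position 0. A minimal sketch, using a made-up input string that begins with a letter:

```python
import re

text = "T222T333"

found_by_search = re.search(r"\d+", text)  # scans forward and finds '222'
found_by_match = re.match(r"\d+", text)    # 'T' is not a digit, so this is None

# Both functions may return None, so check before calling group()
first_number = found_by_search.group() if found_by_search else None
leading_number = found_by_match.group() if found_by_match else None

print(first_number)    # 222
print(leading_number)  # None
```

Calling group() directly on a failed match raises AttributeError, which is why the guard is worth keeping in real code.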

2) Replacing and splitting: sub(), subn(), split()

(1)sub()

def sub(pattern, repl, string, count=0, flags=0):

    """Return the string obtained by replacing the leftmost

    non-overlapping occurrences of the pattern in string by the

    replacement repl.  repl can be either a string or a callable;

    if a string, backslash escapes in it are processed.  If it is

    a callable, it's passed the match object and must return

    a replacement string to be used."""

    return _compile(pattern, flags).sub(repl, string, count)

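Usage: replaces every substring of string that matches the regular expression pattern with repl. count limits the number of replacements; the default 0 replaces all matches.

Example:

import re

res = re.sub(r'\d+', 'SSS', r'222XXX333V3')

print(res)

Result: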

SSSXXXSSSVSSS
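The docstring also mentions count and a callable repl, which the example above does not exercise. A small sketch of both; the helper name double is hypothetical and chosen only for this illustration:

```python
import re

text = "222XXX333V3"

# count=1 replaces only the leftmost match
first_only = re.sub(r"\d+", "SSS", text, count=1)

# A callable receives each match object and must return the replacement string
def double(match):
    return str(int(match.group()) * 2)

doubled = re.sub(r"\d+", double, text)

print(first_only)  # SSSXXX333V3
print(doubled)     # 444XXX666V6
```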

(2)subn()

def subn(pattern, repl, string, count=0, flags=0):

    """Return a 2-tuple containing (new_string, number).

    new_string is the string obtained by replacing the leftmost

    non-overlapping occurrences of the pattern in the source

    string by the replacement repl.  number is the number of

    substitutions that were made. repl can be either a string or a

    callable; if a string, backslash escapes in it are processed.

    If it is a callable, it's passed the match object and must

    return a replacement string to be used."""

    return _compile(pattern, flags).subn(repl, string, count)

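Usage: replaces every substring of string that matches pattern with repl, exactly like sub(); count limits the number of replacements.

The return value is a tuple of the new string and the number of replacements made.

Example:

import re

res = re.subn(r'\d+', 'SSS', r'222XXX333V3')

print(res)

Result: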

('SSSXXXSSSVSSS', 3)

(3)split()

def split(pattern, string, maxsplit=0, flags=0):

    """Split the source string by the occurrences of the pattern,

    returning a list containing the resulting substrings.  If

    capturing parentheses are used in pattern, then the text of all

    groups in the pattern are also returned as part of the resulting

    list.  If maxsplit is nonzero, at most maxsplit splits occur,

    and the remainder of the string is returned as the final element

    of the list."""

    return _compile(pattern, flags).split(string, maxsplit)

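Usage: splits the target string at every non-overlapping match of the pattern and returns the pieces as a list. A match at the very start or end of the string produces an empty string in the result. maxsplit limits the number of splits.

Example:

import re

res = re.split(r'\d+', r'333FF444FF44')

print(res)

Result: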

['', 'FF', 'FF', '']
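The empty strings at both ends come from matches at the start and end of the input. The sketch below reuses the same input to show maxsplit and one common way to drop the empty pieces; the filtering step is a convenience, not something split() does itself.

```python
import re

text = "333FF444FF44"

all_parts = re.split(r"\d+", text)

# maxsplit=1 performs only the first split; the rest of the string stays intact
limited = re.split(r"\d+", text, maxsplit=1)

# Drop the empty strings produced by matches at the edges
non_empty = [part for part in all_parts if part]

print(all_parts)  # ['', 'FF', 'FF', '']
print(limited)    # ['', 'FF444FF44']
print(non_empty)  # ['FF', 'FF']
```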

3) Advanced: compile(), finditer()

(1) compile(): time efficiency

def compile(pattern, flags=0):

    "Compile a regular expression pattern, returning a pattern object."

    return _compile(pattern, flags)

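Usage: compiles a regular expression into a pattern object.

Purpose: saves time, but only when the same regular expression is used many times. The module-level functions also cache recently compiled patterns, so the gain is usually small outside hot loops.

Example:

import re

res = re.compile(r'\d+')

print(res)

Result: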

re.compile('\\d+')
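Printing the pattern object does not show why compiling helps. A compiled pattern exposes the same methods as the module (findall, search, sub and so on) and can be reused across many inputs. A minimal sketch with made-up sample lines:

```python
import re

number = re.compile(r"\d+")  # compiled once

lines = ["a1b22", "no digits", "333"]

# The same pattern object is reused for every line
found = [number.findall(line) for line in lines]
masked = [number.sub("#", line) for line in lines]

print(found)   # [['1', '22'], [], ['333']]
print(masked)  # ['a#b#', 'no digits', '#']
```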

(2) finditer(): space efficiency

def finditer(pattern, string, flags=0):

    """Return an iterator over all non-overlapping matches in the

    string.  For each match, the iterator returns a match object.

    Empty matches are included in the result."""

    return _compile(pattern, flags).finditer(string)

Usage: returns an iterator over the matches. Each element is a match object, and its group() method returns the matched text. Because matches are produced one at a time, no full list is built in memory.

Example:

import re

pattern = re.compile(r'\d+')

res = pattern.finditer(r'sss444ff333f')

print(res)

for r in res:

    print(r, '--------', r.group())

Result:

<callable_iterator object at 0x106f29668>

<_sre.SRE_Match object; span=(3, 6), match='444'> -------- 444

<_sre.SRE_Match object; span=(8, 11), match='333'> -------- 333
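Since finditer() yields match objects rather than strings, position information is available for each match, and the iterator can be consumed one item at a time. A short sketch on the same input:

```python
import re

text = "sss444ff333f"

# Each match object carries the matched text and its position
positions = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\d+", text)]

# next() pulls a single match without scanning for the rest
first = next(re.finditer(r"\d+", text)).group()

print(positions)  # [('444', 3, 6), ('333', 8, 11)]
print(first)      # 444
```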

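3. Advanced regular expressions (very important)

Using groups with the re module

1) Groups () with findall() and finditer()

import re

# findall() gives priority to groups: when the pattern contains a group, it returns the group's content instead of the whole match. Adding ?: right after the opening parenthesis turns this off.

# (?:regex) is a non-capturing group

# With one group, the result is a list of that group's matches; with two or more groups, each match becomes a tuple of its groups, and the list holds those tuples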

res = re.findall(r'<(\w+)>', r'<a>我爱你中国</a><h1>亲爱的母亲</h1>')

print(res)

res = re.findall(r'<(?:\w+)>', r'<a>我爱你中国</a><h1>亲爱的母亲</h1>')

print(res)

Result:

['a', 'h1']

['<a>', '<h1>']
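The two calls above cover the one-group and no-group cases. With two or more capturing groups, findall() returns a list of tuples. A minimal sketch, using a simplified made-up HTML snippet:

```python
import re

html = "<a>one</a><h1>two</h1>"

# Two capturing groups: each match becomes a (tag, content) tuple
pairs = re.findall(r"<(\w+)>(\w+)</\w+>", html)

print(pairs)  # [('a', 'one'), ('h1', 'two')]
```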

# finditer() does not give priority to groups; use group(n) or group('name') on each match object to read a group's value

import re

res = re.finditer(r'<(?:\w+)>', r'<a>我爱你中国</a><h1>亲爱的母亲</h1>')

for i in res:

    print(i.group())

Result:

<a>

<h1>

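2) Named groups; groups with search()

# (?P<name>regex) gives the group a name

# (?P=name) refers back to that group; the text matched here must be identical to what the group matched

<1>

import re

# search() returns the first match

# Call group() on the result to get the matched text

# When the pattern has groups, group(1) returns the first group's content, group(2) the second, and so on

# groups() returns all groups as a tuple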

res = re.search(r'<(\w+)>(\w+)</(\w+)>', r'<a>hello</a>')

print(res.group())

print(res.group(1))

print(res.group(2))

print(res.group(3))

print(res.groups())

Result:

<a>hello</a>

a

hello

a

('a', 'hello', 'a')

<2>

import re

res = re.search(r'<(?P<name>\w+)>\w+</(?P=name)>', r'<a>hello</a>')

print(res.group('name'))

print(res.group())

res = re.search(r'<(?P<name>\w+)>\w+</(?P=name)>', r'<a>hello</h1><a>hello</a>')

print(res.group('name'))

Result:

a

<a>hello</a>

a

<3>

import re

res = re.search(r'<(?P<tt>\w+)>(?P<cc>\w+)</\w+>', r'<a>hello</h1><a>hello</a>')

print(res.group('tt'))

print(res.group('cc'))

print(res.group())

Result:

a

hello

<a>hello</h1>
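Example <3> matches the mismatched pair <a>...</h1> because the closing tag is a plain \w+. Combining named groups with a (?P=name) back-reference skips it, and groupdict() returns all named groups at once. A sketch on the same input; the group names tag and content are chosen only for this illustration:

```python
import re

text = "<a>hello</h1><a>hello</a>"

# (?P=tag) forces the closing tag to repeat whatever the tag group matched
m = re.search(r"<(?P<tag>\w+)>(?P<content>\w+)</(?P=tag)>", text)

named = m.groupdict()

print(named)      # {'tag': 'a', 'content': 'hello'}
print(m.group())  # <a>hello</a>
print(m.start())  # 13, the mismatched first pair is skipped
```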

3) Referring to a group by index

# \1 refers to the first group; the text matched here must be identical to what the first group matched.

import re

res = re.search(r'<(\w+)>\w+</\1>', r'<a>hello</a>')

print(res.group(1))

print(res.group())

Result:

a

<a>hello</a>
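Numbered back-references work in the replacement string of sub() as well as in the pattern. As a hedged illustration beyond the tag example, the sketch below finds an accidentally doubled word and collapses it; the sample sentence is made up.

```python
import re

sentence = "this is is a test"

# \1 in the pattern: the second word must repeat the first group exactly
repeated = re.search(r"\b(\w+) \1\b", sentence)

# \1 in the replacement: keep a single copy of the repeated word
deduplicated = re.sub(r"\b(\w+) \1\b", r"\1", sentence)

print(repeated.group())  # is is
print(deduplicated)      # this is a test
```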

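4) Groups with split()

When the pattern contains a capturing group, the text matched by the group is kept in the result list.

import re

ret = re.split(r'(\d+)', 'Tom18Jake20Json22')

print(ret)

Result:

['Tom', '18', 'Jake', '20', 'Json', '22', '']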

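Summary:

# Using regular expressions in Python

    # Escaping: the backslash \ is an escape character both in regular expressions and in Python string literals, so write patterns as raw strings (r'...')

    # The re module

        # findall search match

        # sub subn split

        # compile finditer

    # Regular expressions in Python

        # findall() gives priority to groups; use (?:regex) to turn that off

        # split() keeps the text matched by a group in the result

        # search(): when the pattern has groups, group(n) returns the content of group n

# Advanced regular expressions

    # Named groups

        # (?P<name>regex) gives the group a name

        # (?P=name) refers back to that group; the text matched here must be identical to what the group matched

    # Referring to a group by index

        # \1 refers to the first group; the text matched here must be identical to what the first group matched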
