Jul_31 PYTHON REGULAR EXPRESSIONS

1.Special Symbols and Characters

1.1 single regex 1

.　　,Match any character (except \n)

^　　,Match start of string

$　　,Match end of string

*　　,Match 0 or more occurrences of the preceding regex

+　　,Match 1 or more occurrences of the preceding regex

?　　,Match 0 or 1 occurrence of the preceding regex

{N}　　,Match exactly N occurrences of the preceding regex

{M,N}　　,Match from M to N occurrences of the preceding regex

[...]　　,Match any single character from the character class

[..x-y..]　　,Match any single character in the range from x to y; e.g. ["-a] matches, in an ASCII system, all characters that fall between '"' and 'a', that is, between ordinals 34 and 97.

[^...]　　,Match any single character that is NOT in the character class, including any ranges, if present

(*|+|?|{})?　　,Apply the "non-greedy" versions of the above occurrence/repetition symbols; by default *, +, ? and {} are greedy, and appending '?' to any of them makes it non-greedy.

(...)　　,Match the enclosed regex and save it as a subgroup.
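The symbols above can be tried out with a short sketch like the following. The sample strings and variable names are made up for illustration and do not come from the original notes.

```python
import re

# Quantifiers: * (0 or more), + (1 or more), ? (0 or 1), {N} (exactly N)
star = re.findall(r'ab*', 'a ab abbb')
plus = re.findall(r'ab+', 'a ab abbb')
optional = re.findall(r'colou?r', 'color colour')
exact = re.findall(r'[0-9]{3}', '12 345 6789')

# Character classes: a set, a range, and a negated set
vowels = re.findall(r'[aeiou]', 'regex')
a_to_c = re.findall(r'[a-c]', 'abcdef')
non_vowels = re.findall(r'[^aeiou]', 'regex')

# Anchors and the dot: the whole string must be 'a', any character, 'c'
anchored = re.findall(r'^a.c$', 'abc')

# (...) saves what it matched as a numbered subgroup
m = re.match(r'([a-z]+)@([a-z]+)', 'user@fishc')
subgroups = m.groups()

print(star, plus, optional, exact)
print(vowels, a_to_c, non_vowels, anchored)
print(subgroups)
```

Note that findall() returns non-overlapping matches, which is why '6789' contributes only one three-digit match.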

1.2 single regex 2

\d　　,Match any decimal digit, same as [0-9] for ASCII text (\D is the inverse of \d: match any non-digit character)

\w　　,Match any alphanumeric character or underscore, same as [A-Za-z0-9_] for ASCII text; with Python 3 str patterns it also matches Unicode letters and digits (\W is the inverse of \w)

\s　　,Match any whitespace character, same as [ \n\t\r\v\f] for ASCII text (\S is the inverse of \s)

\b　　,Match a word boundary (\B is the inverse of \b)

\N　　,Match saved subgroup N (see (...) above); e.g. \1, \3, \16

\c　　,Escape: match the special character c literally, without its special meaning; e.g. \., \\, \*

\A (\Z)　　,Match start (end) of string (also see ^ and $ above)
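A small sketch of the special sequences listed above; the sample strings are invented for illustration only.

```python
import re

# \d and \D: digits versus everything that is not a digit
digits = re.findall(r'\d+', 'tel 555-1234')
non_digits = re.findall(r'\D+', '555-1234')

# \w also matches the underscore, but not the hyphen
words = re.findall(r'\w+', 'a_b c-d')

# \s matches a space, a tab and a newline
spaces = re.findall(r'\s', 'a b\tc\n')

# \b keeps 'cat' from matching inside 'concat'
whole_words = re.findall(r'\bcat\b', 'cat concat cat.')

# \1 refers back to whatever group 1 matched: find a doubled word
doubled = re.search(r'\b(\w+) \1\b', 'this is is a test')

# A backslash removes the special meaning of '.'
literal_dots = re.findall(r'\.', 'a.b.c')

# \A only matches at the very start, even in multiline mode; ^ does not
caret_hits = re.findall(r'^b', 'a\nb', re.M)
slash_a_hits = re.findall(r'\Ab', 'a\nb', re.M)

print(digits, non_digits, words, spaces)
print(whole_words, doubled.group(), literal_dots, caret_hits, slash_a_hits)
```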

1.3 complex regex

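(?=...)　　,Positive lookahead assertion. It succeeds if the enclosed regex (written ... here) matches at the current position, and fails otherwise. The assertion consumes no characters: once the enclosed regex has been tried, the rest of the pattern continues matching from the position where the assertion started. Example: love(?=FishC) matches the string love only when it is immediately followed by FishC.

(?!...)　　,Negative lookahead assertion. The opposite of the positive lookahead: it succeeds when the enclosed regex does not match and fails when it does. Example: FishC(?!\.com) matches the string FishC only when it is not followed by .com.

(?<=...)　　,Positive lookbehind assertion. The same as the positive lookahead, but looking in the opposite direction. Example: (?<=love)FishC matches the string FishC only when it is immediately preceded by love.

(?<!...)　　,Negative lookbehind assertion. The same as the negative lookahead, but looking in the opposite direction. Example: (?<!FishC)\.com matches the string .com only when it is not preceded by FishC.

(?:...)　　,Non-capturing group: the string matched by this subgroup cannot be retrieved afterwards.

(?(id/name)yes-pattern|no-pattern)　　,1. If the subgroup with the given number or name matched, try yes-pattern; otherwise try no-pattern;

　　2. no-pattern is optional.

　　Example: (<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$) is a regex for e-mail addresses. Used with re.match, it matches '<user@fishc.com>' and 'user@fishc.com', but not '<user@fishc.com' or 'user@fishc.com>'.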
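The four assertions and the non-capturing group can be checked with a sketch like this one. It reuses the love/FishC examples from the notes; the surrounding test strings are made up for illustration.

```python
import re

# (?=...): 'love' only when FishC follows; the lookahead itself is not consumed
ahead = re.findall(r'love(?=FishC)', 'loveFishC lovePython')

# (?!...): 'FishC' only when '.com' does not follow (index 10 is the second FishC)
not_ahead = re.search(r'FishC(?!\.com)', 'FishC.com FishC.org')

# (?<=...): 'FishC' only when 'love' comes right before it
behind = re.search(r'(?<=love)FishC', 'hateFishC loveFishC')

# (?<!...): '.com' only when 'FishC' does not come right before it
not_behind = re.search(r'(?<!FishC)\.com', 'FishC.com Python.com')

# (?:...) groups without saving: only the (c) group is reported
groups = re.match(r'(?:ab)+(c)', 'ababc').groups()

print(ahead, not_ahead.start(), behind.start(), not_behind.start(), groups)
```

Because assertions match a position rather than text, `ahead` contains only 'love' and not 'loveFishC'.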

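1.4 Example: matching e-mail addresses

import re

data = 'z843248880@163.com'
data1 = '<z843248880@163.com>'
data2 = '<z843248880@163.com'
data3 = 'z843248880@163.com>'
# Raw strings keep backslash sequences such as \w and \. intact.
p1 = r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)'
p2 = r'\w+@\w+\.\w+'
p3 = r'(<)?\w+@\w+\.\w+(?(1)>|$)'
for s in (data, data1, data2, data3):
    m1 = re.match(p3, s)
    # re.match returns None on failure (data2 and data3 here), so check before calling group().
    print(s, '->', m1.group() if m1 else None)
PS: In p1, (?:\.\w+) means that the string matched by "\.\w+" is not saved for later retrieval. In p1, "(?(1)>|$)" means: if a "<" was matched earlier, match ">" here; if no "<" was matched, match the end of string "$" here. "(1)" refers to the string in the first pair of parentheses, that is, "(<)". p1 and p3 serve the same purpose (p1 additionally accepts domains with several dot-separated parts). p2 cannot rule out input that has only a "<" or only a ">": with re.match it still accepts data3, and with re.search it accepts data2 as well.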

1.5 The re Module: Core Functions and Methods

match(pattern,string,flags=0)　　,Attempt to match pattern at the start of string, with optional flags; return a match object on success, None on failure.

search(pattern,string,flags=0)　　,Search for the first occurrence of pattern anywhere within string, with optional flags; return a match object on success, None on failure.

findall(pattern,string[,flags=0])　　,Look for all non-overlapping occurrences of pattern in string; return a list of matches.

finditer(pattern,string[,flags=0])　　,Same as findall(), except that it returns an iterator instead of a list; for each match, the iterator yields a match object.

split(pattern,string,maxsplit=0)　　,Split string into a list using the regex pattern as the delimiter and return the list of pieces, splitting at most maxsplit times (the default, 0, splits at every occurrence).
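The difference between these functions is easiest to see side by side. The sketch below runs them on one of the sample lines from section 1.8; the variable names are chosen for illustration.

```python
import re

line = 'Sat Nov 7 01:09:06 1998::hbtua@yzhnjyjanwuq.gov::910372146-5-12'

# match() only tries at the start of the string, which begins with 'Sat'
at_start = re.match(r'\d+', line)

# search() scans forward and stops at the first occurrence
first = re.search(r'\d+', line)

# findall() returns every non-overlapping occurrence as a list of strings
all_numbers = re.findall(r'\d+', line)

# finditer() yields match objects, so positions are available too
separators = [(m.group(), m.start()) for m in re.finditer(r'::', line)]

# split() cuts at each delimiter; maxsplit limits the number of cuts
pieces = re.split(r'::', line)
first_cut_only = re.split(r'::', line, maxsplit=1)

print(at_start, first.group(), all_numbers)
print(separators, pieces, first_cut_only)
```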

1.6 The usage of "(?i)" and "(?m)"

>>> import re
>>> re.findall(r'(?i)yes','yes Yes YES')
['yes', 'Yes', 'YES']
>>> re.findall(r'(?i)th\w+','The quickest way is through to this tunnel.')
['The', 'through', 'this']
>>> re.findall(r'(?im)(^th[\w ]+)','''
... this line is the first,
... another line,
... that line,it's the best.
... ''')
['this line is the first', 'that line']
>>> re.findall(r'(?i)(^th[\w ]+)','''
... this line is the first,
... another line,
... that line ,it's the best.
... ''')
[]
>>> re.findall(r'(?i)(^th[\w \n,]+)','''
... this line is th,
... anonjkl line,
... that line,it the best.
... ''')
[]

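With the "multiline" flag (?m), ^ and $ match at the start and end of every line of the target string, rather than only at the start and end of the whole string. This is why the last two calls above return []: without (?m), ^ can only match at position 0, and the string begins with a newline there.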
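The inline flags (?i) and (?m) can also be passed through the flags argument as re.I (re.IGNORECASE) and re.M (re.MULTILINE). A minimal sketch using the same text as the session above:

```python
import re

text = "\nthis line is the first,\nanother line,\nthat line,it's the best.\n"

# Inline flags at the start of the pattern
inline = re.findall(r'(?im)^th[\w ]+', text)

# The same flags passed as an argument
with_flags = re.findall(r'^th[\w ]+', text, re.I | re.M)

# Without re.M, ^ is tried only at position 0, where the text has a newline
without_m = re.findall(r'^th[\w ]+', text, re.I)

print(inline)
print(with_flags)
print(without_m)
```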

1.7 The usage of split

re.split(r'\s\s+',eachline)　　,split on runs of at least two whitespace characters.

re.split(r'\s\s+|\t',eachline.rstrip())　　,split on at least two whitespace characters or a single tab; rstrip() removes trailing whitespace, including the '\n'.
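To see what the two calls do differently, here is a sketch on one hypothetical input line. The content of eachline is invented for illustration; the notes do not say what data was being split.

```python
import re

# Hypothetical line: fields separated by 2+ spaces or by a single tab
eachline = 'alice   pts/0\t2016-07-31 09:15  (10.0.0.1)\n'

# Only runs of two or more whitespace characters split the line;
# the single tab, the single space and the trailing '\n' stay in place
two_or_more = re.split(r'\s\s+', eachline)

# Adding '|\t' also splits at a single tab; rstrip() drops the '\n' first
with_tab = re.split(r'\s\s+|\t', eachline.rstrip())

print(two_or_more)
print(with_tab)
```

The single space inside '2016-07-31 09:15' survives in both cases, which is the reason for requiring at least two whitespace characters.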

1.8 An example: generating test data

from random import randrange, choice
from string import ascii_lowercase as lc
from time import ctime

tlds = ('com', 'org', 'net', 'gov', 'edu')

for i in range(randrange(5, 11)):      # print 5 to 10 lines
    dtint = randrange(1469880872)      # random timestamp (seconds since the epoch)
    dstr = ctime(dtint)                # the same timestamp as a date string
    llen = randrange(4, 8)             # login length: 4 to 7 letters
    login = ''.join(choice(lc) for j in range(llen))
    dlen = randrange(llen, 13)         # domain length: llen to 12 letters
    dom = ''.join(choice(lc) for j in range(dlen))
    print('%s::%s@%s.%s::%d-%d-%d' % (dstr, login, dom, choice(tlds), dtint, llen, dlen))

result:

Sat Nov 7 01:09:06 1998::hbtua@yzhnjyjanwuq.gov::910372146-5-12
Sat Oct 17 09:27:56 2015::djbljsf@uidicjppd.gov::1445045276-7-9
Sun Nov 18 06:10:07 1979::fkobvlf@zlnlyjej.org::311724607-7-8
Wed Jul 23 17:23:03 1986::hovwgi@wiidgvnng.net::522490983-6-9
Tue Feb 24 02:15:27 1998::xnuab@sgahgahv.gov::888257727-5-8
Thu Jun 1 14:20:55 1989::rdwqhu@xzazufffut.net::612681655-6-10
Mon Mar 6 14:36:59 1978::qabkezi@sehnxqcuxexf.net::258014219-7-12
Sun Apr 11 15:01:56 1982::agzp@sygikhagdasq.gov::387356516-4-12

1.9 Matching a string

import re
data = 'Wed Jul 22 08:42:15 2015::qaolc@ombddhysxuv.com::1437525736-347-28'
#pat_old = '^Mon|^Tue|^Wed|^Thu|^Fri|^Sat|^Sun'
pat = '^(Mon|Tue|Wed|Thu|Fri|Sat|Sun)'
m = re.match(pat, data)
print(type(m))
print(m.group(0))

# Raw strings keep \w and \d from being read as string escapes.
pat2 = r'^(\w{3})'
m2 = re.match(pat2, data)
print(type(m2))
print(m2.group(1))

pa3 = r'.+(\d+-\d+-\d+)'      # greedy '.+'
m3 = re.search(pa3, data)
print(type(m3))
print(m3.group())
m4 = re.match(pa3, data)
print(m4.group(1))

pa4 = r'.+?(\d+-\d+-\d+)'     # non-greedy '.+?'
m5 = re.match(pa4, data)
print(m5.group(1))

pa5 = r'.+::(\d+-\d+-\d+)'    # greedy, but anchored on the '::' separator
m6 = re.match(pa5, data)
print(m6.group(1))

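result:

<class '_sre.SRE_Match' at 0x89df00>    //the class name depends on the Python version; Python 3.7 and later print <class 're.Match'>
Wed
<class '_sre.SRE_Match' at 0x89df00>
Wed
<class '_sre.SRE_Match' at 0x89df00>
Wed Jul 22 08:42:15 2015::qaolc@ombddhysxuv.com::1437525736-347-28
6-347-28    //greedy: '.+' takes as much as it can and leaves the group only the minimum it needs
1437525736-347-28    //non-greedy, because of the '?' after '.+' (see 1.1 above)
1437525736-347-28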
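Going one step further than the patterns above, a single pattern can pull every field out of a line produced by the generator in 1.8. This is only a sketch: the pattern and the variable names are assumptions made for illustration, not part of the original notes. The line is taken from the sample output in 1.8.

```python
import re

line = 'Sat Oct 17 09:27:56 2015::djbljsf@uidicjppd.gov::1445045276-7-9'

# weekday, login, domain, TLD, timestamp, login length, domain length
pattern = r'^(\w{3}) .+::(\w+)@(\w+)\.(com|org|net|gov|edu)::(\d+)-(\d+)-(\d+)$'

m = re.match(pattern, line)
day, login, dom, tld, stamp, llen, dlen = m.groups()

# The two trailing numbers are the lengths of the login and the domain
lengths_agree = len(login) == int(llen) and len(dom) == int(dlen)

print(day, login, dom, tld, stamp, llen, dlen, lengths_agree)
```

The greedy '.+' is harmless here because the literal '::' and '@' that follow force the engine to back up to the first separator.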

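1.10 Greedy and non-greedy

'.+' is greedy: it matches as many characters as possible while still letting the rest of the pattern match. '.+?' is non-greedy: it matches as few characters as possible.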
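A short sketch of the difference, using an invented HTML-like string and a bounded repetition:

```python
import re

html = '<b>bold</b> and <i>italic</i>'

# Greedy: '.+' runs to the last '>' in the string
greedy = re.findall(r'<.+>', html)

# Non-greedy: '.+?' stops at the first '>' it can reach
lazy = re.findall(r'<.+?>', html)

# The same rule applies to {M,N}
greedy_count = re.match(r'\d{2,4}', '123456').group()
lazy_count = re.match(r'\d{2,4}?', '123456').group()

print(greedy)
print(lazy)
print(greedy_count, lazy_count)
```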

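2. Use a regex to extract the letters and the digit, and print them separately

import re

text = 'python1班'  # named text rather than str, to avoid shadowing the built-in

print(re.search(r'(\w+)(\d)', text).group(0))  # the whole match

print(re.search(r'(\w+)(\d)', text).group(1))  # what the first group matched

print(re.search(r'(\w+)(\d)', text).group(2))  # what the second group matched

Result: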

python1

python

1
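The pattern (\w+)(\d) works here only because the greedy \w+ gives back exactly one character for \d. With more than one digit the split lands in the wrong place. The sketch below shows this and one possible alternative with explicit character classes; the second test string and the alternative pattern are assumptions added for illustration.

```python
import re

text = 'python1班'
longer = 'python123班'   # hypothetical variant with several digits

# On Python 3, \w+ alone swallows letters, digits and the Chinese character
everything = re.search(r'\w+', text).group()

# Greedy \w+ keeps all digits but the last one
greedy_split = re.search(r'(\w+)(\d)', longer).groups()

# Explicit classes separate the ASCII letters from the digits
letters, digits = re.search(r'([A-Za-z]+)(\d+)', longer).groups()

print(everything)
print(greedy_split)
print(letters, digits)
```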
