Introducing Regular Expressions: Study Notes
Tools:
RegexBuddy: http://download.csdn.net/tag/regexbuddy%E7%A0%B4%E8%A7%A3
Online testing platforms:
Further reading:
References:
Notepad++ Regex: http://sourceforge.net/apps/mediawiki/notepad-plus/index.php?title=Regular_Expressions
In these notes, an "item" means either a single character or a group of characters.
chapter 1:
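[ ]: character class; matches any one character from the set
\d: matches a digit 0-9, same as "[0-9]"
\D: matches any non-digit, including letters, punctuation, and so on
. : (dot) wildcard; matches any character except a newline, but inside [ ] it is a literal dot
Backreferences
( ): grouping, a capturing group
\1 or $1: matches exactly the same text that the first parenthesized group matched. The form differs between regex implementations: some support only \1, some only $1, and some both.
Example:
(\d)\d\1: matches 202, 303, 151; does not match 234.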
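The backreference example above can be checked with a small JavaScript sketch. The ^ and $ anchors and the date string are additions for illustration, not part of the notes; in JavaScript, \1 is the form used inside a pattern and $1 the form used in a replacement string.

```javascript
// (\d)\d\1: a digit, any digit, then the same digit as the first one.
// The anchors ^ and $ are added here so each sample is tested as a whole.
const sameEnds = /^(\d)\d\1$/;

const samples = ["202", "303", "151", "234"];
const backrefResults = samples.map((s) => sameEnds.test(s));
console.log(backrefResults); // [ true, true, true, false ]

// $1 and $2 refer to the captured groups in a replacement string.
// The date string is a made-up sample.
const swapped = "2024-05".replace(/(\d{4})-(\d{2})/, "$2/$1");
console.log(swapped); // 05/2024
```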
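Modifiers
{ }: the braces give an exact repeat count for the preceding item. The forms are {3} (3 times), {3,5} (3 to 5 times, i.e. 3, 4, or 5; note that there must be no spaces around the comma), and {3,} (3 or more times).
?: matches the preceding item 0 or 1 times; greedy, matching as much as possible
+: matches the preceding item 1 or more times (the repetitions need not be identical, they only have to satisfy the same rule; e.g. "\d+" matches as many digits as possible); greedy, matching as much as possible
*: matches the preceding item 0 or more times (the repetitions need not be identical); greedy, matching as much as possible
|: (the vertical bar) indicates alternation, that is, a choice among alternatives; if the pattern before the bar fails to match, the pattern after it is tried
^: at the start of a pattern or right after |, matches the start of a line
$: end of line
\: escape character; turns the metacharacters above into ordinary characters, e.g. a literal left parenthesis "\("
Example: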
\d{3}-?\d{3}-?\d{4}
(\d{3,4}[.-]?)+
(\d{3}[.-]?){2}\d{4}
^(\(\d{3}\)|^\d{3}[.-]?)?\d{3}[.-]?\d{4}$: matches (201)971-1975, 201-971-1975, (201)971.1975, 201.971.1975.
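As a quick check of the last pattern, the following sketch runs it against the four numbers listed above. The fifth string, with one digit too many, is an extra negative case that is not in the notes.

```javascript
// The phone-number pattern from the notes, unchanged.
const phone = /^(\(\d{3}\)|^\d{3}[.-]?)?\d{3}[.-]?\d{4}$/;

const numbers = [
  "(201)971-1975",
  "201-971-1975",
  "(201)971.1975",
  "201.971.1975",
  "201-971-19755", // extra negative case: one digit too many
];

const phoneResults = numbers.map((n) => phone.test(n));
console.log(phoneResults); // [ true, true, true, true, false ]
```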
chapter 2
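^: inside [ ], negation (match all but these); e.g. [^0-9] matches any non-digit
\w: matches letters, digits, and the underscore, same as [0-9a-zA-Z_]
\W: the opposite of \w, i.e. [^0-9a-zA-Z_]
.*: matches 0 or more of any character
.+: matches 1 or more of any character
dotall mode: when this mode is on, '.' matches every character, including newlines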
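A minimal sketch of dotall mode in JavaScript, where it is enabled with the s flag (available since ES2018). The sample string is made up for illustration.

```javascript
// A two-line sample string.
const twoLines = "first\nsecond";

// Without the s flag, '.' stops at the newline, so the pattern cannot span lines.
const withoutDotAll = /first.+second/.test(twoLines);

// With the s flag (dotall), '.' also matches the newline.
const withDotAll = /first.+second/s.test(twoLines);

console.log(withoutDotAll, withDotAll); // false true
```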
The following is a table of character shorthands:
Table 2-1. Character shorthands
Character Shorthand | Description
\a: Alert
\b: Word boundary
[\b]: Backspace character
\B: Non-word boundary
\cx: Control character
\d: Digit character
\D: Non-digit character
\dxxx: Decimal value for a character
\f: Form feed character
\r: Carriage return
\n: Newline character
\oxxx: Octal value for a character
\s: Space character (whitespace)
\S: Non-space character
\t: Horizontal tab character
\v: Vertical tab character
\w: Word character
\W: Non-word character
\0: Nul character
\xxx: Hexadecimal value for a character
\uxxxx: Unicode value for a character
Table 2-2. Character shorthands for whitespace characters
Character Shorthand | Description
\f: Form feed
\h: Horizontal whitespace
\H: Not horizontal whitespace
\n: Newline
\r: Carriage return
\t: Horizontal tab
\v: Vertical tab (whitespace)
\V: Not vertical whitespace
chapter 3 Boundaries
Boundary assertions do not consume characters; that is, the characters will not be returned in a result. They are also
known as zero-width assertions. A zero-width assertion doesn’t match a character, per
se, but rather a location in a string. Some of these, such as ^ and $, are also called anchors.
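To make "a location in a string" concrete, here is a small JavaScript sketch. The sample strings are made up. Replacing every \b with a visible marker shows where the word boundaries fall, and the match for \bcat\b contains only the three letters, not the boundaries themselves.

```javascript
// Mark every word boundary with "|": the assertion matches positions, not characters.
const marked = "the cat".replace(/\b/g, "|");
console.log(marked); // |the| |cat|

// The match itself contains only the consumed characters.
const wordMatch = /\bcat\b/.exec("the cat sat");
console.log(wordMatch.index, wordMatch[0].length); // 4 3

// \B is the opposite: "cat" inside "scattered" has no boundary on either side.
const insideWord = /\Bcat\B/.test("scattered");
console.log(insideWord); // true
```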
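Multiline matching
Often we can match the strings we want by using the boundary symbols (^, $, \b, \B). But what if the string has several lines? That is simple: just add m to turn on multiline matching. Example: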
var str = "first second\nthird fourth\nfifth sixth";
var patt = /(\w+)$/gm
console.log(str.match(patt));
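Result:
["second", "fourth", "sixth"]
Without m, the result would be just ["sixth"]. With m, the regular expression in effect also treats line breaks such as \n and \r as boundaries; that is one way to understand it.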
var str2 = "first second\nthird fourth\nfifth sixth";
var patt2 = /^(\w+)/gm
console.log(str2.match(patt2 ))
Result:
["first", "third", "fifth"]
chapter 4 Alternation, Groups, and Backreferences
You have already seen groups in action. Groups surround text with parentheses to help
perform some operation, such as the following:
• Performing alternation, a choice between two or more optional patterns
• Creating subpatterns
• Capturing a group to later reference with a backreference
• Applying an operation to a grouped pattern, such as a quantifier
• Using non-capturing groups
• Atomic grouping (advanced)
Option | Description | Supported by
(?d) | Unix lines | Java
(?i) | Case insensitive | PCRE, Perl, Java
(?J) | Allow duplicate names | PCRE*
(?m) | Multiline | PCRE, Perl, Java
(?s) | Single line (dotall) | PCRE, Perl, Java
(?u) | Unicode case | Java
(?U) | Default match lazy | PCRE
(?x) | Ignore whitespace, comments | PCRE, Perl, Java
(?-…) | Unset or turn off options | PCRE
\l$+{one}/' rime.txt
(?<z>0{3})\k<z>
Or this:
(?<z>0{3})\k'z'
Or this:
(?<z>0{3})\g{z}
(?<name>…) A named group
(?'name'…) Another named group
(?P<name>…) A named group in Python
\k<name> Reference by name in Perl
\k'name' Reference by name in Perl
\g{name} Reference by name in Perl
\k{name} Reference by name in .NET
(?P=name) Reference by name in Python
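Of the forms listed above, JavaScript (ES2018 and later) supports (?<name>…) for the group and \k<name> for the reference. The sketch below applies the first pattern from this section to a made-up input string; the other reference forms (\k'z', \g{z}) belong to Perl-style engines and are not used here.

```javascript
// Named group z matches three zeros; \k<z> requires the same text again.
const repeatedZeros = /(?<z>0{3})\k<z>/;

const zeroMatch = "1000000".match(repeatedZeros);
console.log(zeroMatch[0]);       // 000000
console.log(zeroMatch.groups.z); // 000

// In a replacement string, a named group is referenced as $<name>.
const bracketed = "1000000".replace(repeatedZeros, "[$<z>]");
console.log(bracketed); // 1[000]
```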
A non-capturing group is written this way:
(?:the|The|THE)
(?:the)
(?:(?i)the)
(?i:the)
An atomic group is another kind of non-capturing group. In a regex engine
that does backtracking, this group will turn backtracking off, not for the entire regular
expression but just for that part enclosed in the atomic group. The syntax looks like this:
(?>the)
One of the things that can really slow
regex processing is backtracking. The reason is that, as the engine tries all the possibilities, it
takes time and computing resources. Sometimes it can gobble up a lot of time. When
it gets really bad, it’s called catastrophic backtracking.
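JavaScript has no (?>…) syntax, so the sketch below uses a common workaround instead of a real atomic group: a lookahead captures the text and a backreference then consumes it. Because the engine does not backtrack into a lookahead once it has succeeded, the captured part behaves atomically. The patterns and the input are made up for illustration.

```javascript
// Ordinary greedy a+ can give back one "a" so that the trailing "ab" matches.
const ordinary = /a+ab/;

// Atomic-like version: (?=(a+)) captures all the a's, \1 consumes exactly that
// capture, and the engine cannot go back into the lookahead to shorten it.
const atomicLike = /(?=(a+))\1ab/;

const ordinaryMatches = ordinary.test("aaab");
const atomicMatches = atomicLike.test("aaab");
console.log(ordinaryMatches, atomicMatches); // true false
```

The second pattern fails on purpose: it shows that no backtracking happens inside the emulated group, which is the property that protects against catastrophic backtracking.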
chapter 5 Character Classes
Same as [_a-zA-Z \t\n\r]
[0-3[6-9]]
The regex would match 0 through 3 or 6 through 9.
To match a difference (in essence, subtraction):
[a-z&&[^m-r]]
which matches all the letters from a to z, except m through r.
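The nested-class and && forms above are not accepted by JavaScript's classic regex syntax, so the following sketch expresses the same two sets differently: the union as one class with two ranges, and the difference with a negative lookahead in front of the class. Both rewrites are assumptions made for illustration, not syntax from the notes.

```javascript
// Union: same set as [0-3[6-9]], written as a single class with two ranges.
const unionClass = /^[0-36-9]$/;

// Difference: same set as [a-z&&[^m-r]], written as "not m-r, then a-z".
const differenceClass = /^(?![m-r])[a-z]$/;

const keptDigits = "0123456789".split("").filter((c) => unionClass.test(c)).join("");
const keptLetters = "abcdefghijklmnopqrstuvwxyz"
  .split("")
  .filter((c) => differenceClass.test(c))
  .join("");

console.log(keptDigits);  // 01236789
console.log(keptLetters); // abcdefghijklstuvwxyz
```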
[[:alnum:]] Alphanumeric characters (letters and digits)
[[:alpha:]] Alphabetic characters (letters)
[[:ascii:]] ASCII characters (all 128)
[[:blank:]] Blank characters
[[:cntrl:]] Control characters
[[:graph:]] Graphic characters
[[:lower:]] Lowercase letters
[[:print:]] Printable characters
[[:punct:]] Punctuation characters
[[:space:]] Whitespace characters
[[:upper:]] Uppercase letters
[[:word:]] Word characters
[[:xdigit:]] Hexadecimal digits
chapter 6 Matching Unicode and Other Characters
\uxxxx Unicode (four places)
\xxx Unicode (two places)
\x{xxxx} Unicode (four places)
\x{xx} Unicode (two places)
\000 Octal (base 8)
\cx Control character
\0 Null
\a Bell
\e Escape
[\b] Backspace
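Several of these escapes can be tried directly in JavaScript, as in the sketch below. The sample characters are chosen for illustration. JavaScript writes a code point above four hex digits as \u{…} with the u flag rather than \x{…}, and it has no \a or \e escapes, so those rows are not exercised here.

```javascript
const escapeChecks = {
  // \uxxxx: four hex digits (U+00E9 is e with an acute accent).
  unicodeFour: /\u00e9/.test("caf\u00e9"),
  // \xxx: two hex digits for the same character.
  hexTwo: /\xe9/.test("caf\u00e9"),
  // \u{...} with the u flag for code points beyond four hex digits.
  unicodeBraces: /\u{1F600}/u.test("\u{1F600}"),
  // \cx: control character; Control-I is the horizontal tab.
  controlI: /\cI/.test("\t"),
  // \0: the null character.
  nul: /\0/.test("\0"),
  // [\b]: backspace (inside a class, \b is not a word boundary).
  backspace: /[\b]/.test("\b"),
};

console.log(escapeChecks);
```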
chapter 7 Quantifiers
A greedy quantifier first tries to match the whole string. It grabs as much as it can, the whole input, trying to make a match. If the first
attempt to match the whole string goes awry, it backs up one character and tries again.
This is called backtracking. It keeps backing up one character at a time until it finds a
match or runs out of characters to try. It also keeps track of what it is doing, so it puts
the most load on resources compared with the next two approaches. It takes a mouthful,
then spits back a little at a time, chewing on what it just ate. You get the idea.
A lazy (sometimes called reluctant) quantifier takes a different tack. It starts at the
beginning of the target, trying to find a match. It looks at the string one character at a
time, trying to find what it is looking for. At last, it will attempt to match the whole
string. To get a quantifier to be lazy, you have to append a question mark (?) to the
regular quantifier. It chews one nibble at a time.
A possessive quantifier grabs the whole target and then tries to find a match, but it
makes only one attempt. It does not do any backtracking. A possessive quantifier appends
a plus sign (+) to the regular quantifier. It doesn’t chew; it just swallows, then
wonders what it just ate. I’ll demonstrate each of these in the pages that follow.
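The difference between greedy and lazy matching can be seen in a short JavaScript sketch with a made-up input. JavaScript does not support possessive quantifiers, so only the first two behaviours are shown.

```javascript
// A made-up input with two quoted parts.
const quotedText = '"a" and "b"';

// Greedy: .+ runs to the end of the input, then backs up to the last quote.
const greedyMatch = quotedText.match(/".+"/)[0];

// Lazy: .+? extends one character at a time and stops at the first closing quote.
const lazyMatch = quotedText.match(/".+?"/)[0];

console.log(greedyMatch); // "a" and "b"
console.log(lazyMatch);   // "a"
```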
+?: Lazy one or more
*?: Lazy zero or more
{n}?: Lazy n
{n,}?: Lazy n or more
{m,n}?: Lazy m,n
Possessive quantifiers
?+: Possessive zero or one (optional)
++: Possessive one or more
*+: Possessive zero or more
{n}+: Possessive n
{n,}+: Possessive n or more
{m,n}+: Possessive m,n
chapter 8 Lookarounds
Lookarounds match patterns based on what they find
either in front of or behind a pattern. Lookarounds are also considered zero-width
assertions.
Lookarounds include:
• Positive lookaheads
• Negative lookaheads
• Positive lookbehinds
• Negative lookbehinds
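(?i)ancyent(?= marinere): case-insensitively matches ancyent, and the text that follows it must be " marinere".
(?i)ancyent(?! marinere): case-insensitively matches ancyent, and the text that follows it must not be " marinere".
(?<=ancyent) marinere
(?<!ancyent) marinere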
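The four lookarounds can be sketched in JavaScript as follows. The sample sentence is made up, the i flag stands in for the inline (?i) option, and lookbehind needs an ES2018 or later engine.

```javascript
// A made-up sample sentence.
const rimeLine = "It is an ancyent Marinere";

// Positive lookahead: "ancyent" only when " marinere" follows.
const aheadMatch = rimeLine.match(/ancyent(?= marinere)/i);
// Negative lookahead: "ancyent" only when " marinere" does not follow.
const notAheadMatch = rimeLine.match(/ancyent(?! marinere)/i);
// Positive lookbehind: " marinere" only when "ancyent" precedes it.
const behindMatch = rimeLine.match(/(?<=ancyent) marinere/i);
// Negative lookbehind: " marinere" only when "ancyent" does not precede it.
const notBehindMatch = rimeLine.match(/(?<!ancyent) marinere/i);

// The lookaround text is checked but never included in the match.
console.log(aheadMatch[0]);  // ancyent
console.log(notAheadMatch);  // null
console.log(behindMatch[0]); // " Marinere" (with the leading space)
console.log(notBehindMatch); // null
```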