Every regular expression flavor sets its own rules by defining a grammar.
ECMAScript regular expressions pattern syntax

The following syntax is used to construct (or assign to) regex objects that have selected ECMAScript as their grammar.
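As a minimal sketch of what constructing or assigning such an object can look like: std::regex uses the ECMAScript grammar by default, and the flag std::regex::ECMAScript selects it explicitly. The helper names matches_whole and matches_anywhere are illustrative only and are not part of the library.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when the entire target matches the pattern.
bool matches_whole(const std::string& target, const std::string& pattern) {
    // ECMAScript is already the default grammar; the flag only makes it explicit.
    const std::regex re(pattern, std::regex::ECMAScript);
    return std::regex_match(target, re);
}

// Hypothetical helper: true when some part of the target matches the pattern.
bool matches_anywhere(const std::string& target, const std::string& pattern) {
    std::regex re;                               // default-constructed, matches nothing
    re.assign(pattern, std::regex::ECMAScript);  // the "assign" form of selecting a grammar
    return std::regex_search(target, re);
}
```

regex_match requires the whole target sequence to match, while regex_search accepts a match anywhere inside it, so matches_whole("concat", "cat") is false but matches_anywhere("concat", "cat") is true.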

A regular expression pattern is formed by a sequence of characters.
Regular expression operations look sequentially for matches between the
characters of the pattern and the characters in the target sequence: In
principle, each character in the pattern is matched against the
corresponding character in the target sequence, one by one. But the
regex syntax allows for special characters and expressions in the
pattern:

Special pattern characters

Special pattern characters are characters (or sequences of characters)
that have a special meaning when they appear in a regular expression
pattern, either to represent a character that is difficult to express in
a string, or to represent a category of characters. Each of these special pattern characters is matched in the target sequence against a single character (unless a quantifier specifies otherwise).

characters  description              matches
.           not newline              any character except line terminators (LF, CR, LS, PS).
\t          tab (HT)                 a horizontal tab character (same as \u0009).
\n          newline (LF)             a newline (line feed) character (same as \u000A).
\v          vertical tab (VT)        a vertical tab character (same as \u000B).
\f          form feed (FF)           a form feed character (same as \u000C).
\r          carriage return (CR)     a carriage return character (same as \u000D).
\cletter    control code             a control code character whose code unit value is the remainder of dividing the code unit value of letter by 32.
                                     For example: \ca is the same as \u0001, \cb the same as \u0002, and so on.
\xhh        ASCII character          a character whose code unit value is given by the two hex digits hh.
                                     For example: \x4c is the same as L, and \x23 the same as #.
\uhhhh      Unicode character        a character whose code unit value is given by the four hex digits hhhh.
\0          null                     a null character (same as \u0000).
\int        backreference            the result of the submatch whose opening parenthesis is the int-th (int shall begin with a digit other than 0). See groups below for more info.
\d          digit                    a decimal digit character (same as [[:digit:]]).
\D          not digit                any character that is not a decimal digit character (same as [^[:digit:]]).
\s          whitespace               a whitespace character (same as [[:space:]]).
\S          not whitespace           any character that is not a whitespace character (same as [^[:space:]]).
\w          word                     an alphanumeric or underscore character (same as [_[:alnum:]]).
\W          not word                 any character that is neither alphanumeric nor an underscore (same as [^_[:alnum:]]).
\character  character                the character itself, without interpreting its special meaning within a regular expression.
                                     Any character can be escaped except those which form any of the special character sequences above.
                                     Needed for: ^ $ \ . * + ? ( ) [ ] { } |
[class]     character class          the target character is part of the class (see character classes below).
[^class]    negated character class  the target character is not part of the class (see character classes below).
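A few of the entries above can be tried out with the sketch below. The helper full_match and the constant names are illustrative assumptions, not library names. The patterns are written as raw string literals so that each backslash reaches the regex unchanged.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when the whole target matches the ECMAScript pattern.
bool full_match(const std::string& target, const std::string& pattern) {
    return std::regex_match(target, std::regex(pattern));
}

// Illustrative patterns, one per kind of special pattern character.
const std::string kAnyButNewline = "a.b";         // a, any non-line-terminator, b
const std::string kDigit         = R"(\d)";       // one decimal digit
const std::string kTabbed        = R"(a\tb)";     // a, horizontal tab, b
const std::string kHexL          = R"(\x4c)";     // the character L, given by its hex code
const std::string kLiteralDot    = R"(a\.b)";     // a, a literal dot, b
const std::string kRepeated      = R"((ab)\1)";   // "ab" followed by the same text again
const std::string kWordChar      = R"(\w)";       // alphanumeric character or underscore
```

For instance, full_match("abab", kRepeated) succeeds because \1 must repeat whatever the first group matched, while full_match("axb", kLiteralDot) fails because the escaped dot only matches a literal dot.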

Notice that, in C++, character and string literals also escape characters using the backslash character (\), and this affects the syntax for constructing regular expressions from such types. For example:

std::regex e1 ("\\d");  // regular expression: \d -> matches a digit character
std::regex e2 ("\\\\"); // regular expression: \\ -> matches a single backslash (\) character
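Since C++11, raw string literals offer a way around the double escaping. The sketch below compares both spellings of the two patterns above; the constant and function names are illustrative assumptions.

```cpp
#include <regex>
#include <string>

// The same two patterns, written first as ordinary and then as raw string literals.
const std::string escaped_digit     = "\\d";      // the compiler turns \\ into one backslash
const std::string raw_digit         = R"(\d)";    // raw literal: no escape processing
const std::string escaped_backslash = "\\\\";     // two backslashes reach the regex
const std::string raw_backslash     = R"(\\)";    // same two backslashes, written directly

// Hypothetical helper: true when s consists of exactly one backslash character.
bool is_single_backslash(const std::string& s) {
    return std::regex_match(s, std::regex(raw_backslash));
}
```

Both spellings produce identical strings, so the choice is purely about readability of the source code.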

Quantifiers

Quantifiers follow a character or a special pattern character. They modify the number of times that character is repeated in the match:

characters  times                effects
*           0 or more            The preceding atom is matched 0 or more times.
+           1 or more            The preceding atom is matched 1 or more times.
?           0 or 1               The preceding atom is optional (matched either 0 times or once).
{int}       int                  The preceding atom is matched exactly int times.
{int,}      int or more          The preceding atom is matched int or more times.
{min,max}   between min and max  The preceding atom is matched at least min times, but not more than max.
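The quantifiers in this table can be exercised with a short sketch. The helper full_match and the pattern constants are illustrative assumptions, not library names.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when the whole target matches the ECMAScript pattern.
bool full_match(const std::string& target, const std::string& pattern) {
    return std::regex_match(target, std::regex(pattern));
}

// Illustrative patterns, one per quantifier form.
const std::string kZeroOrMoreB   = "ab*";        // a followed by any number of b
const std::string kOneOrMoreB    = "ab+";        // a followed by at least one b
const std::string kOptionalU     = "colou?r";    // u may appear once or not at all
const std::string kExactlyThree  = "x{3}";       // exactly three x
const std::string kTwoOrMore     = R"(\d{2,})";  // two or more digits
const std::string kTwoOrThreeBs  = "ab{2,3}c";   // a, then two or three b, then c
```

With these patterns, "a" matches kZeroOrMoreB but not kOneOrMoreB, and "abbbbc" does not match kTwoOrThreeBs because it contains four b characters.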

By default, all these quantifiers are greedy (i.e., they take as many
characters that meet the condition as possible). This behavior can be
changed to non-greedy (i.e., take as few characters that meet the condition as possible) by adding a question mark (?) after the quantifier.

For example:
Matching "(a+).*" against "aardvark" succeeds and yields aa as the first submatch.
Matching "(a+?).*" against "aardvark" also succeeds, but yields a as the first submatch.
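The aardvark example can be reproduced with the following sketch. The helper first_submatch is an illustrative assumption; it returns an empty string when the match fails or the pattern has no group.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: matches the whole target and returns the text captured
// by the first group, or an empty string if there is no match or no group.
std::string first_submatch(const std::string& target, const std::string& pattern) {
    std::smatch results;
    const std::regex re(pattern);
    if (std::regex_match(target, results, re) && results.size() > 1) {
        return results[1].str();
    }
    return "";
}
```

Calling first_submatch("aardvark", "(a+).*") returns "aa", while first_submatch("aardvark", "(a+?).*") returns "a": the non-greedy a+? stops after one character and leaves the rest to .*.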

Groups

Groups allow quantifiers to be applied to a sequence of characters (instead of a single character). There are two kinds of groups:

characters      description    effects
(subpattern)    Group          Creates a backreference.
(?:subpattern)  Passive group  Does not create a backreference.

When a group creates a backreference, the characters that represent the subpattern in the target sequence are stored as a submatch.
Submatches are numbered in the order in which their opening
parentheses appear (the first submatch is number 1, the second is
number 2, and so on).

These submatches can be used in the regular expression itself to specify that the entire subpattern should appear again somewhere else (see \int in the special characters list). They can also be used in the replacement string or retrieved in the match_results object filled by some regex operations.
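The three uses of submatches mentioned above (inside the pattern, in a replacement string, and through the group count of the regex object) can be sketched as follows. All function names are illustrative assumptions.

```cpp
#include <regex>
#include <string>

// Inside the pattern: \1 requires the first group's text to appear again.
bool has_doubled_word(const std::string& text) {
    static const std::regex re(R"(\b(\w+) \1\b)");
    return std::regex_search(text, re);
}

// In the replacement string: $1 and $2 refer to the first and second submatch.
std::string swap_pair(const std::string& text) {
    static const std::regex re(R"((\w+)=(\w+))");
    return std::regex_replace(text, re, "$2=$1");
}

// Passive groups (?:...) are not counted as marked subexpressions.
unsigned capture_count(const std::string& pattern) {
    return std::regex(pattern).mark_count();
}
```

For example, swap_pair("key=value") gives "value=key", and capture_count("(a)(?:b)(c)") is 2 because the passive group in the middle creates no backreference.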

Assertions

Assertions are conditions that do not consume characters in the target
sequence: they do not describe a character, but a condition that must be
fulfilled before or after a character.

characters      description          condition for match
^               Beginning of line    Either it is the beginning of the target sequence, or follows a line terminator.
$               End of line          Either it is the end of the target sequence, or precedes a line terminator.
\b              Word boundary        The previous character is a word character and the next is a non-word character (or vice-versa).
                                     Note: The beginning and the end of the target sequence are considered here as non-word characters.
\B              Not a word boundary  The previous and next characters are both word characters or both are non-word characters.
                                     Note: The beginning and the end of the target sequence are considered here as non-word characters.
(?=subpattern)  Positive lookahead   The characters following the assertion must match subpattern, but no characters are consumed.
(?!subpattern)  Negative lookahead   The characters following the assertion must not match subpattern, but no characters are consumed.
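Word boundaries and lookaheads can be tried with the sketch below. The function names are illustrative assumptions, and contains_word assumes that its word argument consists only of word characters, since it is pasted into the pattern unescaped.

```cpp
#include <regex>
#include <string>

// \b on both sides: the word must not be embedded in a longer word.
// Assumption: 'word' contains only word characters (it is not escaped here).
bool contains_word(const std::string& text, const std::string& word) {
    return std::regex_search(text, std::regex("\\b" + word + "\\b"));
}

// Positive lookahead: digits must be followed by "px", but "px" is not
// part of the returned match. Returns an empty string when nothing matches.
std::string number_before_px(const std::string& text) {
    static const std::regex re(R"(\d+(?=px))");
    std::smatch results;
    return std::regex_search(text, results, re) ? results.str() : "";
}

// Negative lookahead: "foo" must not be immediately followed by "bar".
bool has_foo_without_bar(const std::string& text) {
    static const std::regex re("foo(?!bar)");
    return std::regex_search(text, re);
}
```

number_before_px("width: 120px") returns "120" without the unit, which shows that the lookahead checks the following characters without consuming them.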

Alternatives

A pattern can include different alternatives:

character  description  effects
|          Separator    Separates two alternative patterns or subpatterns.

A regular expression can contain multiple alternative patterns simply by separating them with the separator operator (|): The regular expression will match if any of the alternatives match, and as soon as one does.

Subpatterns (in groups or assertions) can also use the separator operator to separate different alternatives.
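A short sketch of both uses of the separator, at the top level of a pattern and inside a group. The function names and the example patterns are illustrative assumptions.

```cpp
#include <regex>
#include <string>

// Top-level alternatives: the whole target must be one of the three words.
bool is_pet(const std::string& s) {
    static const std::regex re("cat|dog|bird");
    return std::regex_match(s, re);
}

// Alternatives inside a passive group: only the extension varies.
bool is_image_file(const std::string& s) {
    static const std::regex re(R"(\w+\.(?:png|jpe?g|gif))");
    return std::regex_match(s, re);
}
```

Without the group, the separator would split the entire pattern, so grouping is what limits the alternatives to the file extension in is_image_file.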

Character classes

A character class defines a category of characters. It is introduced by enclosing its descriptors in square brackets ([ and ]).
The regex object attempts to match the entire character class against a
single character in the target sequence (unless a quantifier specifies
otherwise).
The character class can contain any combination of:

  • Individual characters: Any character specified is considered part of the class (except \, [, ] and -, which have a special meaning under some circumstances, and may need to be escaped to be part of the class).
    For example:
    [abc] matches a, b or c.
    [^xyz] matches any character except x, y and z.
  • Ranges: They can be specified by using the hyphen character (-) between two valid characters.
    For example:
    [a-z] matches any lowercase letter (a, b, c, ... until z).
    [abc1-5] matches either a, b or c, or a digit between 1 and 5.
  • POSIX-like classes: A whole set of predefined classes can be added to a custom character class. There are three kinds:

    class          description            notes
    [:classname:]  character class        Uses the regex traits' isctype member with the type obtained by applying the lookup_classname member to classname.
    [.classname.]  collating sequence     Uses the regex traits' lookup_collatename to interpret classname.
    [=classname=]  character equivalents  Uses the regex traits' transform_primary on the result of regex_traits::lookup_collatename for classname to check for matches.

    The choice of available classes depends on the regex traits type and on its selected locale. But at least the following character classes shall be recognized by any regex traits type and locale:

    class       description                              equivalent (with regex_traits, default locale)
    [:alnum:]   alphanumeric character                   isalnum
    [:alpha:]   alphabetic character                     isalpha
    [:blank:]   blank character                          isblank (matches only space and tab)
    [:cntrl:]   control character                        iscntrl
    [:digit:]   decimal digit character                  isdigit
    [:graph:]   character with graphical representation  isgraph
    [:lower:]   lowercase letter                         islower
    [:print:]   printable character                      isprint
    [:punct:]   punctuation mark character               ispunct
    [:space:]   whitespace character                     isspace
    [:upper:]   uppercase letter                         isupper
    [:xdigit:]  hexadecimal digit character              isxdigit
    [:d:]       decimal digit character                  isdigit
    [:w:]       word character                           isalnum or underscore (_)
    [:s:]       whitespace character                     isspace

    Please note that the brackets in the class names are additional to those opening and closing the class definition.
    For example:
    [[:alpha:]] is a character class that matches any alphabetic character.
    [abc[:digit:]] is a character class that matches a, b, c, or a digit.
    [^[:space:]] is a character class that matches any character except a whitespace.
  • Escape characters: All escape characters described above can also be used within a character class specification. The only change is with \b, that here is interpreted as a backspace character (\u0008) instead of a word boundary.
    Notice that within a class definition, those characters that have a special meaning in the regular expression (such as *, ., $)
    don't have such a meaning and are interpreted as normal characters (so
    they do not need to be escaped). Instead, within a class definition, the
    hyphen (-) and the brackets ([ and ]) do have a special meaning under some circumstances, in which case they should be escaped with a backslash (\) to be interpreted as normal characters.

Character class support depends heavily on the regex traits used by the regex object: the regex object calls its traits' isctype member function with the appropriate arguments. For the standard regex_traits object using the default locale, see cctype for a classification of characters.
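The character class forms described in this section can be checked with a short sketch, assuming the standard regex_traits and the default locale. The helper full_match and the constant names are illustrative assumptions.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when the whole target matches the ECMAScript pattern.
bool full_match(const std::string& target, const std::string& pattern) {
    return std::regex_match(target, std::regex(pattern));
}

// Illustrative classes, one per kind of descriptor.
const std::string kIndividual   = "[abc]";            // a, b or c
const std::string kNegated      = "[^xyz]";           // anything except x, y and z
const std::string kWithRange    = "[abc1-5]";         // a, b, c, or a digit from 1 to 5
const std::string kAlpha        = "[[:alpha:]]";      // any alphabetic character
const std::string kMixedPosix   = "[abc[:digit:]]";   // a, b, c, or any digit
const std::string kNotSpace     = "[^[:space:]]";     // anything except whitespace
const std::string kPlainSymbols = "[*.$]";            // *, . and $ lose their special meaning
const std::string kEscapedDash  = R"([a\-z])";        // a, a literal hyphen, or z
```

Note that kEscapedDash is not a range: escaping the hyphen makes it an ordinary member of the class, so only the three characters a, - and z are accepted.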
