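The classes involved in regular expressions are Pattern, Matcher and String. Today we start studying regular expressions in Java. The best rapport in the world is not having someone who understands what you leave unsaid, but someone who understands the words you hold back.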

Understanding Pattern and Matcher

1. How to use regular expressions

The generally recommended approach is:

Pattern pattern = Pattern.compile("^[^abc]h$");

Matcher matcher = pattern.matcher("hh");

boolean isMatch = matcher.matches();

The other approach, which compiles the expression on every call and therefore cannot reuse the Pattern, is:

boolean b = Pattern.matches("a*b", "aaaaab");
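When the same expression is applied to many inputs, compiling it once avoids repeating the compilation that Pattern.matches performs on every call. A minimal sketch of that reuse; the class name ReusePatternSketch and the helper countFullMatches are hypothetical names chosen for this illustration:

```java
import java.util.regex.Pattern;

class ReusePatternSketch {
    // Compiled once; Pattern instances are immutable and safe to share.
    private static final Pattern A_STAR_B = Pattern.compile("a*b");

    // Counts how many of the inputs match the pattern in full.
    static int countFullMatches(String... inputs) {
        int count = 0;
        for (String input : inputs) {
            // A fresh Matcher per input; the compiled Pattern is reused.
            if (A_STAR_B.matcher(input).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countFullMatches("aaaaab", "b", "abc"));
    }
}
```

With these inputs main prints 2: aaaaab and b match a*b in full, abc does not.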

2. An introduction to some Pattern methods

The declaration of the Pattern class:

public final class Pattern extends Object implements Serializable

Factory methods (the constructor is private, so instances are obtained through these static methods):

static Pattern compile(String regex)

static Pattern compile(String regex, int flags)

Internally, the single-argument version simply calls:

public static Pattern compile(String regex) {

    return new Pattern(regex, 0);

}

3. A few small examples illustrate the Pattern methods; the important ones are covered in more detail later:

public void pattern3() {

    String regex = "abc";

    Pattern pattern = Pattern.compile(regex);

    System.out.println("flag: " + pattern.flags());

    System.out.println("pattern: " + pattern.pattern());

    System.out.println("quote: " + Pattern.quote(regex));

    System.out.println("toString: " + pattern.toString());

    System.out.println("-------------------------------------------------");

    String[] strings = pattern.split("helloabcdabcefgabcgoup");

    for (String string : strings) {

        System.out.println("string: " + string);

    }

    System.out.println("-------------------------------------------------");

    String[] stringsLimit = pattern.split("helloabcdabcefgabcgoup", 2);

    for (String string : stringsLimit) {

        System.out.println("string: " + string);

    }

}

The output is:

flag: 0

pattern: abc

quote: \Qabc\E

toString: abc

-------------------------------------------------

string: hello

string: d

string: efg

string: goup

-------------------------------------------------

string: hello

string: dabcefgabcgoup

Both toString() and pattern() of the Pattern class return the pattern string, that is, the first argument passed to Pattern.compile.
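The \Qabc\E printed by Pattern.quote above marks the text as a literal, which matters once the text contains metacharacters. A small sketch under that assumption; QuoteSketch and its two helpers are hypothetical names used only for this illustration:

```java
import java.util.regex.Pattern;

class QuoteSketch {
    // Treats the first argument as a regular expression.
    static boolean matchesAsRegex(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // Treats the first argument as literal text by wrapping it in \Q...\E.
    static boolean matchesAsLiteral(String literal, String input) {
        return Pattern.compile(Pattern.quote(literal)).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesAsRegex("a.b", "axb"));   // true: '.' matches any character
        System.out.println(matchesAsLiteral("a.b", "axb")); // false: '.' must be a literal dot
        System.out.println(matchesAsLiteral("a.b", "a.b")); // true
    }
}
```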

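About the Matcher class

Matcher is the more important class, so we cover it in detail. Once a matcher has been created as below, three methods can be used to perform a match: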

Pattern pattern = Pattern.compile("^[^abc]h$");

Matcher matcher = pattern.matcher("hh");

The matches method attempts to match the entire input sequence against the pattern.
The lookingAt method attempts to match the input sequence, starting at the beginning, against the pattern.
The find method scans the input sequence looking for the next subsequence that matches the pattern.

An example shows the difference between them:

String regex = "abc";

String input = "abcabcabc";

Pattern pattern = Pattern.compile(regex);

Matcher matcher = pattern.matcher(input);

matches method: nothing is printed, because abc does not match the whole of abcabcabc.

while (matcher.matches()) {

    System.out.println(matcher.start() + ", end: " + matcher.end() + "group: " + matcher.group());

}

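lookingAt method: the console prints 0, end: 3group: abc. lookingAt always starts at the beginning of the region, so on this input it returns true on every call; inside a while loop it would print that line forever, so an if is used here.

if (matcher.lookingAt()) {

    System.out.println(matcher.start() + ", end: " + matcher.end() + "group: " + matcher.group());

}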

find method: after it returns true, the next call continues searching from where the previous match ended.

while (matcher.find()) {

    System.out.println(matcher.start() + ", end: " + matcher.end() + "group: " + matcher.group());

}

The output is:

0, end: 3group: abc

3, end: 6group: abc

6, end: 9group: abc
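The three methods can also be compared side by side on a single input. The following is only a sketch; MatchModesSketch and modes are hypothetical names, and the reset() call is there because find would otherwise continue after the match left behind by lookingAt:

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchModesSketch {
    // Returns {matches, lookingAt, find} for the given regex and input.
    static boolean[] modes(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        boolean whole = m.matches();    // the entire input must match
        boolean prefix = m.lookingAt(); // a match must start at the beginning
        m.reset();                      // discard state so find starts at index 0
        boolean anywhere = m.find();    // a match may start anywhere
        return new boolean[] {whole, prefix, anywhere};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(modes("abc", "abcabcabc"))); // [false, true, true]
        System.out.println(Arrays.toString(modes("abc", "xabc")));      // [false, false, true]
    }
}
```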

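A regular expression program

1. The main method is shown below; note that the for loop uses i <= matcher.groupCount()

Groups are delimited by () and can be referenced by group number. Group 0 stands for the whole expression, group 1 for the group opened by the first left parenthesis, and so on. For example, a(bc(d)e)f has three groups: group 0 is abcdef, group 1 is bcde, and group 2 is d. groupCount() returns 2 for this expression because group 0 is not included in the count, which is why the loop condition is <=.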

public static void main(String[] args) {

    String regex = "a(b(c))d";

    String input = "abcdebcabcedbcfcdcfdcd";

    Pattern pattern = Pattern.compile(regex);

    Matcher matcher = pattern.matcher(input);

    matcher.region(0, 8);

    while(matcher.find()) {

        for(int i = 0; i <= matcher.groupCount(); i ++) {

            System.out.println("i = " + i + ", start: " + matcher.start(i) + ", end: " + matcher.end(i) + ", group: " + matcher.group(i));

        }

    }

    System.out.println("------------------------------------------------------");

    matcher.reset("abcd");

    if(matcher.matches()) {

        for(int i = 0; i <= matcher.groupCount(); i ++) {

            System.out.println("start: " + matcher.start(i) + ", end: " + matcher.end(i) + ", group: " + matcher.group(i));

        }

    }

    System.out.println("------------------------------------------------------");

    matcher.reset("abcdabcd");

    if (matcher.lookingAt()) {

        for(int i = 0; i <= matcher.groupCount(); i ++) {

            System.out.println("start: " + matcher.start(i) + ", end: " + matcher.end(i) + ", group: " + matcher.group(i));

        }

    }

}

The output is:

i = 0, start: 0, end: 4, group: abcd

i = 1, start: 1, end: 3, group: bc

i = 2, start: 2, end: 3, group: c

------------------------------------------------------

start: 0, end: 4, group: abcd

start: 1, end: 3, group: bc

start: 2, end: 3, group: c

------------------------------------------------------

start: 0, end: 4, group: abcd

start: 1, end: 3, group: bc

start: 2, end: 3, group: c
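The group numbering rule described earlier for a(bc(d)e)f can be checked the same way. A minimal sketch; GroupNumberingSketch and groupsOf are hypothetical names for this illustration:

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumberingSketch {
    // Returns group 0 .. groupCount() for a full match, or an empty array if there is none.
    static String[] groupsOf(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return new String[0];
        }
        // groupCount() excludes group 0, hence the +1 and the <= below.
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(groupsOf("a(bc(d)e)f", "abcdef"))); // [abcdef, bcde, d]
    }
}
```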

2. The split() method of Pattern

@Test

public void pattern3() {

    String regex = "abc";

    Pattern pattern = Pattern.compile(regex);

    String[] strings = pattern.split("helloabcdabcefgabcgoup");

    for (String string : strings) {

        System.out.println("string: " + string);

    }

    System.out.println("-------------------------------------------------");

    String[] stringsLimit = pattern.split("helloabcdabcefgabcgoup", 2);

    for (String string : stringsLimit) {

        System.out.println("string: " + string);

    }

}

The output is:

string: hello

string: d

string: efg

string: goup

-------------------------------------------------

string: hello

string: dabcefgabcgoup

How it works (the JDK source of split):

public String[] split(CharSequence input, int limit) {

    int index = 0;

    boolean matchLimited = limit > 0;

    ArrayList<String> matchList = new ArrayList<>();

    Matcher m = matcher(input);

    // Add segments before each match found

    while(m.find()) {

        if (!matchLimited || matchList.size() < limit - 1) {

            String match = input.subSequence(index, m.start()).toString();

            matchList.add(match);

            index = m.end();

        } else if (matchList.size() == limit - 1) { // last one

            String match = input.subSequence(index,

                                             input.length()).toString();

            matchList.add(match);

            index = m.end();

        }

    }

    // If no match was found, return this

    if (index == 0)

        return new String[] {input.toString()};

    // Add remaining segment

    if (!matchLimited || matchList.size() < limit)

        matchList.add(input.subSequence(index, input.length()).toString());

    // Construct result

    int resultSize = matchList.size();

    if (limit == 0)

        while (resultSize > 0 && matchList.get(resultSize-1).equals(""))

            resultSize--;

    String[] result = new String[resultSize];

    return matchList.subList(0, resultSize).toArray(result);

}
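The source above treats three cases of limit differently: zero drops trailing empty strings, a positive value caps the number of pieces, and a negative value keeps everything. A short sketch of those cases; SplitLimitSketch and splitOnComma are hypothetical names:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitLimitSketch {
    private static final Pattern COMMA = Pattern.compile(",");

    // Splits the input on commas with the given limit.
    static String[] splitOnComma(String input, int limit) {
        return COMMA.split(input, limit);
    }

    public static void main(String[] args) {
        // limit == 0: trailing empty strings are removed.
        System.out.println(Arrays.toString(splitOnComma("a,b,,", 0)));  // [a, b]
        // limit < 0: no cap, trailing empty strings are kept.
        System.out.println(Arrays.toString(splitOnComma("a,b,,", -1))); // [a, b, , ]
        // limit == 2: at most two pieces, the last holds the unsplit remainder.
        System.out.println(Arrays.toString(splitOnComma("a,b,,", 2)));  // [a, b,,]
    }
}
```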

3. Pattern flags

With the default flags, $ only matches at the end of the whole input, so the output is a single line: huhx

@Test

public void pattern4() {

    String regex = "huhx$";

    String input = "My name is huhx\n how are you, HUHX\n I love you, Huhx\n huhx";

    Pattern pattern = Pattern.compile(regex);

    Matcher matcher = pattern.matcher(input);

    while (matcher.find()) {

        System.out.println(matcher.group());

    }

}

With the multiline and case-insensitive flags added:

Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);

the output becomes:

huhx

HUHX

Huhx

huhx
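The effect of each flag can be isolated by counting matches. The sketch below also uses the embedded flag expression (?im), which is an alternative to passing the flags to compile; FlagsSketch and count are hypothetical names for this illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FlagsSketch {
    // Counts the matches of regex in input when compiled with the given flags.
    static int count(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String input = "My name is huhx\n how are you, HUHX\n I love you, Huhx\n huhx";
        System.out.println(count("huhx$", 0, input));                 // 1: only the end of the input
        System.out.println(count("huhx$", Pattern.MULTILINE, input)); // 2: lowercase line ends only
        System.out.println(count("huhx$",
                Pattern.CASE_INSENSITIVE | Pattern.MULTILINE, input)); // 4
        System.out.println(count("(?im)huhx$", 0, input));            // 4: same flags, embedded
    }
}
```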

4. The replacement methods of String

@Test

public void pattern6() {

    String regex = "huhx$";

    String input = "My name is huhx$ and I love you. huhx";

    String inputReplace = input.replace(regex, "Linux");

    String inputReplaceFirst = input.replaceFirst(regex, "Linux");

    String inputReplaceAll = input.replaceAll(regex, "Linux");

    System.out.println("inputReplace: " + inputReplace);

    System.out.println("inputReplaceFirst: " + inputReplaceFirst);

    System.out.println("inputReplaceAll: " + inputReplaceAll);

}

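The output is shown below. replace performs a literal string replacement, equivalent to compiling the target with the Pattern.LITERAL flag, whereas replaceFirst and replaceAll interpret their first argument as a regular expression with the default flags.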

inputReplace: My name is Linux and I love you. huhx

inputReplaceFirst: My name is huhx$ and I love you. Linux

inputReplaceAll: My name is huhx$ and I love you. Linux
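If replaceAll has to work with literal text, both arguments need quoting: Pattern.quote for the target, and Matcher.quoteReplacement for the replacement, because $ and \ in a replacement are otherwise read as group references and escapes. A sketch under that assumption; ReplaceQuotingSketch and replaceLiteralViaRegex are hypothetical names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceQuotingSketch {
    // Replaces every literal occurrence of target by using the regex-based replaceAll.
    static String replaceLiteralViaRegex(String input, String target, String replacement) {
        return input.replaceAll(Pattern.quote(target), Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String input = "My name is huhx$ and I love you. huhx";
        // Both lines print: My name is $Linux and I love you. huhx
        System.out.println(replaceLiteralViaRegex(input, "huhx$", "$Linux"));
        System.out.println(input.replace("huhx$", "$Linux"));
    }
}
```

In everyday code String.replace is the simpler choice; the quoting helpers are useful when the literal is only one part of a larger expression.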

5. The appendReplacement method

@Test

public void pattern7() {

    String regex = "huhx";

    String input = "My name is huhx and I love you. huhx";

    Pattern p = Pattern.compile(regex);

    Matcher m = p.matcher(input);

    StringBuffer sb = new StringBuffer();

    while (m.find()) {

        m.appendReplacement(sb, "Linux");

    }

    m.appendTail(sb);

    System.out.println(sb.toString());

}

The output is:

My name is Linux and I love you. Linux
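The appendReplacement and appendTail pair is most useful when the replacement depends on each match, which a fixed string passed to replaceAll cannot express. A minimal sketch; AppendReplacementSketch and doubleNumbers are hypothetical names for this illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AppendReplacementSketch {
    // Replaces every run of digits with twice its numeric value.
    static String doubleNumbers(String input) {
        Matcher m = Pattern.compile("\\d+").matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            int doubled = Integer.parseInt(m.group()) * 2;
            // Appends the text before this match, then the computed replacement.
            m.appendReplacement(sb, Integer.toString(doubled));
        }
        // Appends whatever follows the last match.
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(doubleNumbers("3 apples and 12 pears")); // 6 apples and 24 pears
    }
}
```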

Regular expression methods in the source code

1. The code behind Pattern.compile(String regex): the private Pattern constructor

private Pattern(String p, int f) {

    pattern = p;

    flags = f;

    // to use UNICODE_CASE if UNICODE_CHARACTER_CLASS present

    if ((flags & UNICODE_CHARACTER_CLASS) != 0)

        flags |= UNICODE_CASE;

    // Reset group index count

    capturingGroupCount = 1;

    localCount = 0;

    if (pattern.length() > 0) {

        compile();

    } else {

        root = new Start(lastAccept);

        matchRoot = lastAccept;

    }

}

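capturingGroupCount relates to groups and starts at 1, because group 0 (the whole match) always exists. The key method is compile(), which initializes the Node object matchRoot. These nodes carry out the actual matching.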

2. The matcher method of Pattern

public Matcher matcher(CharSequence input) {

    if (!compiled) {

        synchronized(this) {

            if (!compiled)

                compile();

        }

    }

    Matcher m = new Matcher(this, input);

    return m;

}

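The code of Matcher(this, input) follows. groups is an int array that stores the start and end offsets of each group. The result returned later by group(int i) is derived from the values in groups and from text.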

Matcher(Pattern parent, CharSequence text) {

    this.parentPattern = parent;

    this.text = text;

    // Allocate state storage

    int parentGroupCount = Math.max(parent.capturingGroupCount, 10);

    groups = new int[parentGroupCount * 2];

    locals = new int[parent.localCount];

    // Put fields into initial states

    reset();

}
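The relationship between the stored offsets and group(int) can be observed through the public API, since start(i) and end(i) expose the same pair of values. A sketch only; GroupOffsetsSketch and groupsMatchOffsets are hypothetical names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupOffsetsSketch {
    // Checks that every group equals the substring between its start and end offsets.
    static boolean groupsMatchOffsets(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return false;
        }
        for (int i = 0; i <= m.groupCount(); i++) {
            if (m.start(i) == -1) {
                continue; // this group did not take part in the match
            }
            String bySubstring = input.substring(m.start(i), m.end(i));
            if (!bySubstring.equals(m.group(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(groupsMatchOffsets("a(b(c))d", "xxabcd")); // true
    }
}
```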

3. The find method of Matcher

public boolean find() {

    int nextSearchIndex = last;

    if (nextSearchIndex == first)

        nextSearchIndex++;

    // If next search starts before region, start it at region

    if (nextSearchIndex < from)

        nextSearchIndex = from;

    // If next search starts beyond region then it fails

    if (nextSearchIndex > to) {

        for (int i = 0; i < groups.length; i++)

            groups[i] = -1;

        return false;

    }

    return search(nextSearchIndex);

}

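search method: when nothing matches, first is set to -1. The Javadoc for find states that a call following an unsuccessful one starts again from the beginning of the region; call reset() explicitly if your code relies on that.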

boolean search(int from) {

    this.hitEnd = false;

    this.requireEnd = false;

    from        = from < 0 ? 0 : from;

    this.first  = from;

    this.oldLast = oldLast < 0 ? from : oldLast;

    for (int i = 0; i < groups.length; i++)

        groups[i] = -1;

    acceptMode = NOANCHOR;

    boolean result = parentPattern.root.match(this, from, text);

    if (!result)

        this.first = -1;

    this.oldLast = this.last;

    return result;

}
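The nextSearchIndex++ step in find is what lets a loop make progress when a match is empty: without it, a zero-length match would be found at the same index forever. A small sketch that records the start of every match; EmptyMatchSketch and matchStarts are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmptyMatchSketch {
    // Collects the start offset of every match that find() reports.
    static List<Integer> matchStarts(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<Integer> starts = new ArrayList<>();
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        // Non-empty matches: each search resumes at the end of the previous match.
        System.out.println(matchStarts("abc", "abcabcabc")); // [0, 3, 6]
        // Empty matches: the search index is bumped by one after each of them.
        System.out.println(matchStarts("x*", "ab"));         // [0, 1, 2]
    }
}
```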

4. The lookingAt method

public boolean lookingAt() {

    return match(from, NOANCHOR);

}

The match method:

boolean match(int from, int anchor) {

    this.hitEnd = false;

    this.requireEnd = false;

    from        = from < 0 ? 0 : from;

    this.first  = from;

    this.oldLast = oldLast < 0 ? from : oldLast;

    for (int i = 0; i < groups.length; i++)

        groups[i] = -1;

    acceptMode = anchor;

    boolean result = parentPattern.matchRoot.match(this, from, text);

    if (!result)

        this.first = -1;

    this.oldLast = this.last;

    return result;

}

5. The matches method; the match it calls is the same match method shown above.

public boolean matches() {

    return match(from, ENDANCHOR);

}

Every match method contains the line this.first = from; a separate method sets the range of the input that is matched against:

public Matcher region(int start, int end) {

    if ((start < 0) || (start > getTextLength()))

        throw new IndexOutOfBoundsException("start");

    if ((end < 0) || (end > getTextLength()))

        throw new IndexOutOfBoundsException("end");

    if (start > end)

        throw new IndexOutOfBoundsException("start > end");

    reset();

    from = start;

    to = end;

    return this;

}

The official documentation says:

A matcher finds matches in a subset of its input called the region. By default, the region contains all of the matcher's input. The region can be modified via the region method and queried via the regionStart and regionEnd methods. The way that the region boundaries interact with some pattern constructs can be changed.
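One way the region boundaries interact with pattern constructs is through anchoring bounds: by default ^ and $ match at the edges of the region, and useAnchoringBounds(false) turns that off. A sketch of both settings; RegionSketch and findInRegion are hypothetical names for this illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegionSketch {
    // Runs find() on input restricted to [start, end) with the given anchoring setting.
    static boolean findInRegion(String regex, String input,
                                int start, int end, boolean anchoringBounds) {
        Matcher m = Pattern.compile(regex).matcher(input);
        m.region(start, end);
        m.useAnchoringBounds(anchoringBounds);
        return m.find();
    }

    public static void main(String[] args) {
        // Text outside the region is never matched.
        System.out.println(findInRegion("d", "abcd", 0, 3, true));    // false
        // Anchoring bounds on (the default): ^ matches at the region start, index 1.
        System.out.println(findInRegion("^bc", "abcd", 1, 4, true));  // true
        // Anchoring bounds off: ^ only matches at the start of the whole input.
        System.out.println(findInRegion("^bc", "abcd", 1, 4, false)); // false
    }
}
```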

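6. Retrieving information about the result: the groups array stores the start and end index of each matched group. Each pair of consecutive indexes identifies one substring.

The group() method: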

public String group() {

    return group(0);

}

The group(int group) method:

public String group(int group) {

    if (first < 0)

        throw new IllegalStateException("No match found");

    if (group < 0 || group > groupCount())

        throw new IndexOutOfBoundsException("No group " + group);

    if ((groups[group*2] == -1) || (groups[group*2+1] == -1))

        return null;

    return getSubSequence(groups[group * 2], groups[group * 2 + 1]).toString();

}

The getSubSequence(int start, int end) method:

CharSequence getSubSequence(int beginIndex, int endIndex) {

    return text.subSequence(beginIndex, endIndex);

}
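The null branch in group(int) is reached when the overall match succeeds but a particular group did not take part in it, which is easy to trigger with an alternation. A minimal sketch; OptionalGroupSketch and describe are hypothetical names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionalGroupSketch {
    private static final Pattern A_OR_B = Pattern.compile("(a)|(b)");

    // Describes both groups after a full match, or reports that there was none.
    static String describe(String input) {
        Matcher m = A_OR_B.matcher(input);
        if (!m.matches()) {
            return "no match";
        }
        // A group that did not participate yields null, and its start offset is -1.
        return "group1=" + m.group(1) + ", group2=" + m.group(2) + ", start1=" + m.start(1);
    }

    public static void main(String[] args) {
        System.out.println(describe("b")); // group1=null, group2=b, start1=-1
        System.out.println(describe("c")); // no match
    }
}
```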

7. The start and end methods

The start() method:

public int start() {

    if (first < 0)

        throw new IllegalStateException("No match available");

    return first;

}

The start(int group) method:

public int start(int group) {

    if (first < 0)

        throw new IllegalStateException("No match available");

    if (group > groupCount())

        throw new IndexOutOfBoundsException("No group " + group);

    return groups[group * 2];

}

The end() method:

public int end() {

    if (first < 0)

        throw new IllegalStateException("No match available");

    return last;

}

The end(int group) method:

public int end(int group) {

    if (first < 0)

        throw new IllegalStateException("No match available");

    if (group > groupCount())

        throw new IndexOutOfBoundsException("No group " + group);

    return groups[group * 2 + 1];

}
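The first < 0 checks above mean that start, end and group can only be called after a successful match. A sketch that makes the resulting IllegalStateException visible; MatchStateSketch and its helpers are hypothetical names for this illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchStateSketch {
    // Returns true if start() throws because no match has been attempted yet.
    static boolean startThrowsBeforeMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return startThrows(m);
    }

    // Returns true if start() throws after a find() call that found nothing.
    static boolean startThrowsAfterFailedFind(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (m.find()) {
            return false; // a match exists, so start() is legal here
        }
        return startThrows(m);
    }

    private static boolean startThrows(Matcher m) {
        try {
            m.start();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(startThrowsBeforeMatch("abc", "abc"));     // true
        System.out.println(startThrowsAfterFailedFind("abc", "xyz")); // true
    }
}
```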

Related links

Using regular expressions: java基础---->java中正则表达式一 (part one of this series)
