Special characters cause problems when splitting strings with split

Consider splitting a string with split:

String[] a="aa|bb|cc".split("|");

Output:

[a, a, |, b, b, |, c, c]
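
The result above can be reproduced with a small program. This is only a minimal sketch: the class name SplitPipeRepro and the helper splitOnBarePipe are made up for illustration, and it assumes Java 8 or later (earlier versions also return a leading empty string when a zero-width match occurs at the start of the input).

```java
import java.util.Arrays;

class SplitPipeRepro {
    // Hypothetical helper: splits on an unescaped "|", which split parses as a regex.
    static String[] splitOnBarePipe(String input) {
        return input.split("|");
    }

    public static void main(String[] args) {
        String[] a = splitOnBarePipe("aa|bb|cc");
        System.out.println(Arrays.toString(a)); // [a, a, |, b, b, |, c, c] on Java 8+
        System.out.println(a.length);           // 8
    }
}
```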

First, look at how split is documented:

 String[] java.lang.String.split(String regex)

Splits this string around matches of the given regular expression. 

This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array. 

The string "boo:and:foo", for example, yields the following results with these expressions: 

Regex    Result

:        { "boo", "and", "foo" }

o        { "b", "", ":and:f" }

Parameters:

regex the delimiting regular expression

Returns:

the array of strings computed by splitting this string around matches of the given regular expression

Throws:

PatternSyntaxException - if the regular expression's syntax is invalid

Since:

1.4

See Also:

java.util.regex.Pattern

@spec

JSR-51
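
The two documented behaviours, the "boo:and:foo" examples and the dropping of trailing empty strings, can be checked directly. The sketch below uses a hypothetical class SplitJavadocDemo and helper splitKeepingTrailing; the only library call involved is the two-argument String.split(String regex, int limit), where a negative limit keeps trailing empty strings.

```java
import java.util.Arrays;

class SplitJavadocDemo {
    // Hypothetical helper: a negative limit keeps the trailing empty strings.
    static String[] splitKeepingTrailing(String input, String regex) {
        return input.split(regex, -1);
    }

    public static void main(String[] args) {
        String s = "boo:and:foo";
        System.out.println(Arrays.toString(s.split(":")));                  // [boo, and, foo]
        System.out.println(Arrays.toString(s.split("o")));                  // [b, , :and:f]
        System.out.println(Arrays.toString(splitKeepingTrailing(s, "o")));  // [b, , :and:f, , ]
    }
}
```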

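As the documentation shows, the argument to split is a regular expression. Some characters have a special meaning in regular expressions and need care. The list below is quoted from the Praat manual; most of it carries over to Java's java.util.regex, though a few entries (such as < > and &) are specific to Praat's dialect: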

http://www.fon.hum.uva.nl/praat/manual/Regular_expressions_1__Special_characters.html

The following characters are the meta characters that give special meaning to the regular expression search syntax:

\ the backslash escape character.

The backslash gives special meaning to the character following it. For example, the combination "\n" stands for the newline, one of the control characters. The combination "\w" stands for a "word" character, one of the convenience escape sequences while "\1" is one of the substitution special characters.

    Example: The regex "aa\n" tries to match two consecutive "a"s at the end of a line, including the newline character itself.

    Example: "a\+" matches "a+" and not a series of one or more "a"s.

^ the caret is the start of line anchor or the negate symbol.

    Example: "^a" matches "a" at the start of a line.

    Example: "[^0-9]" matches any non digit.

$ the dollar is the end of line anchor.

    Example: "b$" matches a "b" at the end of a line.

    Example: "^$" matches the empty line.

{ } the open and close curly bracket are used as range quantifiers.

    Example: "a{2,3}" matches "aa" or "aaa".

[ ] the open and close square bracket define a character class to match a single character.

The "^" as the first character following the "[" negates and the match is for the characters not listed. The "-" denotes a range of characters. Inside a "[ ]" character class construction most special characters are interpreted as ordinary characters.

    Example: "[d-f]" is the same as "[def]" and matches "d", "e" or "f".

    Example: "[a-z]" matches any lowercase character in the alphabet.

    Example: "[^0-9]" matches any character that is not a digit.

    Example: A search for "[][()?<>.*?]" in the string "[]()?<>.*?" followed by a replace string "r" has the result "rrrrrrrrrr" (one "r" for each of the ten characters). Here the search string is one character class and all the meta characters are interpreted as ordinary characters without the need to escape them.

( ) the open and close parenthesis are used for grouping characters (or other regex).

The groups can be referenced in both the search and the substitution phase. There also exist some special constructs with parenthesis.

    Example: "(ab)\1" matches "abab".

. the dot matches any character except the newline.

    Example: ".a" matches two consecutive characters where the last one is "a".

    Example: ".*\.txt$" matches all strings that end in ".txt".

* the star is the match-zero-or-more quantifier.

    Example: "^.*$" matches an entire line.

+ the plus is the match-one-or-more quantifier.

? the question mark is the match-zero-or-one quantifier. The question mark is also used in special constructs with parenthesis and in changing match behaviour.

| the vertical pipe separates a series of alternatives.

    Example: "(a|b|c)a" matches "aa" or "ba" or "ca".

< > the smaller and greater signs are anchors that specify a left or right word boundary.

- the minus indicates a range in a character class (when it is not at the first position after the "[" opening bracket or the last position before the "]" closing bracket).

    Example: "[A-Z]" matches any uppercase character.

    Example: "[A-Z-]" or "[-A-Z]" match any uppercase character or "-".

& the ampersand is the "substitute complete match" symbol.
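
To see how some of these metacharacters behave when passed unescaped to Java's split, the following sketch may help. The class SplitMetaCharDemo and the helper isInvalidRegex are hypothetical names introduced here; the inputs are made-up sample strings.

```java
import java.util.Arrays;
import java.util.regex.PatternSyntaxException;

class SplitMetaCharDemo {
    // Hypothetical helper: reports whether split rejects the delimiter as an invalid regex.
    static boolean isInvalidRegex(String regex) {
        try {
            "".split(regex);
            return false;
        } catch (PatternSyntaxException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // "." matches every character, so only empty strings remain and all are dropped.
        System.out.println("a.b.c".split(".").length);              // 0
        System.out.println(Arrays.toString("a.b.c".split("\\.")));  // [a, b, c]
        // A bare "*" is a quantifier with nothing to repeat.
        System.out.println(isInvalidRegex("*"));                    // true
        System.out.println(Arrays.toString("2*3".split("\\*")));    // [2, 3]
        // A literal backslash needs four backslashes in Java source.
        System.out.println(Arrays.toString("a\\b".split("\\\\")));  // [a, b]
    }
}
```

The failure mode depends on the character: an unescaped "." silently returns an empty array, while an unescaped "*" throws PatternSyntaxException.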

The fix for the example above is to escape the delimiter:

String[] a="aa|bb|cc".split("\\|");
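
Besides the backslash escape, there are at least two other ways to make the delimiter literal: wrapping it in a character class, or quoting it with Pattern.quote. The sketch below compares the three; the class SplitLiteralDemo and the helper splitLiteral are illustrative names, not part of any library.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitLiteralDemo {
    // Hypothetical helper: splits on a delimiter taken literally, whatever characters it contains.
    static String[] splitLiteral(String input, String delimiter) {
        return input.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        String s = "aa|bb|cc";
        System.out.println(Arrays.toString(s.split("\\|")));        // [aa, bb, cc]
        System.out.println(Arrays.toString(s.split("[|]")));        // [aa, bb, cc]
        System.out.println(Arrays.toString(splitLiteral(s, "|")));  // [aa, bb, cc]
    }
}
```

Pattern.quote is probably the safest choice when the delimiter comes from user input or configuration, since it does not require knowing in advance which characters are special.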

Summary:

When applying regular-expression operations to strings, remember to escape special characters.
