Thinking in Java, Fourth Edition: Study Notes (13), Strings
Immutable Strings
Objects of the String class are immutable.
If you examine the JDK documentation for the String class, you’ll see that every method in the class that appears to modify a String actually creates and returns a brand new String object containing the modification. The original String is left untouched.
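A minimal sketch of this behavior, with a class and method name (ImmutableDemo, upcase) chosen only for illustration: the method returns a new String and the caller's original is unchanged.

```java
public class ImmutableDemo {
    // Returns an upper-cased copy; the argument itself is never modified.
    static String upcase(String s) {
        return s.toUpperCase();
    }

    public static void main(String[] args) {
        String q = "howdy";
        String qq = upcase(q);
        System.out.println(q);  // still the original text
        System.out.println(qq); // the new, upper-cased String
    }
}
```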
Overloading '+' vs. StringBuilder
Immutability can have efficiency issues. A case in point is the operator ‘+’ that has been overloaded for String objects.
String s = "abc" + mango + "def" + 47; // like: "abc".append(mango).append("def").append(47);
The String "abc" could have a method append( ) that creates a new String object containing "abc" concatenated with the contents of mango. The new String object would then create another new String that added "def," and so on.
This would certainly work, but it requires the creation of a lot of String objects just to put together this new String, and then you have a bunch of intermediate String objects that need to be garbage collected.
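To see what the compiler actually generates for such an expression, decompile the class with: javap -c ClassName
The -c flag prints the JVM bytecodes. In the book's example, the bytecodes show the compiler creating a single StringBuilder and calling append( ) for each piece, so a simple one-line concatenation is already optimized for you.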
If you try to take shortcuts and do something like append(a + ": " + c), the compiler will jump in and start making more StringBuilder objects again.
StringBuilder was introduced in Java SE5. Prior to this, Java used StringBuffer, which ensured thread safety (see the Concurrency chapter) and so was significantly more expensive.
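The difference matters most inside loops. The sketch below contrasts the two styles at source level; the class and method names are hypothetical, and how the compiler translates + depends on the JDK version, so treat it as an illustration rather than a benchmark.

```java
public class JoinDemo {
    // Concatenation in a loop: every pass produces a new intermediate String.
    static String joinWithPlus(String[] fields) {
        String result = "";
        for (String f : fields) {
            result += f;
        }
        return result;
    }

    // One explicit StringBuilder, reused for the whole loop.
    static String joinWithBuilder(String[] fields) {
        StringBuilder sb = new StringBuilder();
        for (String f : fields) {
            sb.append(f);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] fields = {"abc", "mango", "def", "47"};
        System.out.println(joinWithPlus(fields));
        System.out.println(joinWithBuilder(fields));
    }
}
```

Both methods return the same text; the second simply avoids creating a new String on every iteration.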
Unintended recursion
Suppose you’d like your toString( ) to print the address of your class. It seems to make sense to simply refer to this:
public String toString() {
  return " InfiniteRecursion address: " + this + "\n"; // concatenating this calls this.toString(), which recurses forever
}
Solution: replace this with super.toString(), which calls Object.toString().
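A small sketch of the corrected version, assuming a hypothetical class named AddressToString. Object.toString( ) returns the class name, an "@", and the hash code in hex, so no recursion occurs.

```java
public class AddressToString {
    @Override
    public String toString() {
        // super.toString() is Object.toString(), which does not call back into this method.
        return "AddressToString address: " + super.toString();
    }

    public static void main(String[] args) {
        System.out.println(new AddressToString());
    }
}
```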
Operations on Strings
Only some of the methods are listed.
| Method | Arguments, Overloading | Use |
| Constructor | Overloaded: default, String, StringBuilder, StringBuffer, char arrays, byte arrays | Creating String objects |
| getChars(), getBytes() | ... | Copy chars or bytes into an external array |
| toCharArray() | | |
| compareTo() | | Compares two Strings lexicographically; case-sensitive |
| contains() | A CharSequence to search for | |
| contentEquals() | A CharSequence or StringBuffer to compare to | |
| regionMatches() | Offset into this String, the other String and its offset, and the length to compare. Overload adds "ignore case." | |
| startsWith() | String that it might start with. Overload adds an offset into this String | |
| valueOf() | Overloaded: Object, char[], char[] and offset and count, boolean, char, int, long, float, double | |
| intern() | | Produces one and only one String reference per unique character sequence |
Formatting output
System.out.format()
Java SE5 introduced the format() method, available to PrintStream and PrintWriter objects.
System.out.println("Row 1: [" + x + " " + y + "]"); // The old way
System.out.format("Row 1: [%d %f]\n", x,y); // The new way
The Formatter class
All of Java's new formatting functionality is handled by the Formatter class in the java.util package.
When you create a Formatter object, you tell it where you want this result to go by passing that information to the constructor.
The constructor is overloaded to take a range of output locations, but the most useful are PrintStreams, OutputStreams, and Files.
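As a sketch of the idea, the example below sends the formatted result to a StringBuilder, using the constructor overload that accepts an Appendable and a Locale. That destination is chosen here only because it is easy to inspect; the class and method names are illustrative.

```java
import java.util.Formatter;
import java.util.Locale;

public class FormatterDemo {
    // Formats into a StringBuilder; the Formatter writes wherever its constructor was told to.
    static String describe(String name, int x, int y) {
        StringBuilder out = new StringBuilder();
        try (Formatter f = new Formatter(out, Locale.ROOT)) {
            f.format("%s is at (%d,%d)", name, x, y);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe("Tommy", 4, 7));
    }
}
```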
Format specifiers
general syntax: %[argument_index$][flags][width][.precision]conversion
width: controls the minimum size of a field, padding it with spaces if necessary
flags "-": By default, the data is right justified, but this can be overridden by including a "-" in the flags section
precision: specifies a maximum
1. For Strings, specifies the maximum number of characters from the String to print.
2. For floating point numbers, specifies the number of decimal places to display (the default is 6), rounding if there are too many or adding trailing zeros if there are too few.
3. Cannot be applied to integers; doing so throws an exception.
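The width, flag, and precision rules above can be sketched with one format string. The row( ) helper and its column sizes are made up for this illustration, and Locale.ROOT is passed so the decimal separator does not depend on the machine's settings.

```java
import java.util.Locale;

public class FormatSpecDemo {
    static String row(String name, int qty, double price) {
        // %-10.10s : left-justified, minimum width 10, at most 10 characters of the String
        // %5d      : right-justified integer, minimum width 5
        // %8.2f    : minimum width 8, exactly two decimal places
        return String.format(Locale.ROOT, "%-10.10s|%5d|%8.2f", name, qty, price);
    }

    public static void main(String[] args) {
        System.out.println(row("Jack's Magic Beans", 4, 4.25));
        System.out.println(row("Peas", 3, 5.1));
    }
}
```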
Formatter conversions
d: Integral (as decimal)
c: Unicode character
b: Boolean value
s: String
f: Floating point(as decimal)
e: Floating point ( in scientific notation)
x: Integral (as hex)
h: Hash code (as hex)
%: Literal "%"
Note that the ‘b’ conversion is valid for any argument type, but it might not behave as you’d expect. For boolean primitives or Boolean objects, the result will be true or false, accordingly. However, for any other argument, as long as the argument is not null the result is always true. Even the numeric value of zero, which is synonymous with false in many languages (including C), will produce true, so be careful when using this conversion with non-boolean types.
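A quick way to check this behavior is a one-line helper (b( ) is a hypothetical name) that applies %b to arbitrary arguments:

```java
public class BooleanConversionDemo {
    // Applies the 'b' conversion to any argument, including null.
    static String b(Object arg) {
        return String.format("%b", arg);
    }

    public static void main(String[] args) {
        System.out.println(b(false));   // a real boolean: printed as its value
        System.out.println(b(0));       // non-null, non-boolean
        System.out.println(b("false")); // a String, not a Boolean
        System.out.println(b(null));    // null
    }
}
```

Only the Boolean argument and null yield false here; the zero and the String "false" both yield true.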
String.format()
String.format( ) is a static method which takes all the same arguments as Formatter’s format( ) but returns a String.
result.append(String.format("%05X: ", n)); // width 5, zero-padded on the left
Regular expressions
Basic
In Java, ‘\\’ means "I’m inserting a regular expression backslash, so that the following character has special meaning." For example, if you want to indicate a digit, your regular expression string will be ‘\\d’. If you want to insert a literal backslash, you say ‘\\\\’. However, things like newlines and tabs just use a single backslash: ‘\n\t’.
To indicate "one or more of the preceding expression," you use a ‘+’.
In regular expressions, parentheses have the effect of grouping an expression, and the vertical bar ‘|’ means OR. So (-|\\+)? means that this part of the string may be either a ‘-’ or a ‘+’ or nothing (because of the ‘?’).
Because the ‘+’ character has special meaning in regular expressions, it must be escaped with a ‘\\’ in order to appear as an ordinary character in the expression.
‘\W’, which means a non-word character (the lowercase version, ‘\w’, means a word character)
split()
replaceFirst()
replaceAll()
You'll see that the non-String regular expressions have more powerful replacement tools. Non-String regular expressions are also significantly more efficient if you need to use the regular expression more than once.
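The String-based regular expression utilities can be tried out with a short sketch like the one below. The class name, helper names, and sample inputs are illustrative choices.

```java
import java.util.Arrays;

public class StringRegexDemo {
    // An optional sign followed by one or more digits.
    static boolean isSignedInteger(String s) {
        return s.matches("(-|\\+)?\\d+");
    }

    // Splits on runs of non-word characters.
    static String[] words(String s) {
        return s.split("\\W+");
    }

    public static void main(String[] args) {
        System.out.println(isSignedInteger("-1234"));
        System.out.println(isSignedInteger("+911"));
        System.out.println(isSignedInteger("12a"));
        System.out.println(Arrays.toString(words("Then, when you have found the shrubbery")));
        System.out.println("a1b22".replaceFirst("\\d+", "#")); // only the first run of digits
        System.out.println("a1b22".replaceAll("\\d+", "#"));   // every run of digits
    }
}
```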
Create regular expressions
A complete list of constructs for building regular expressions can be found in the JDK documentation for the Pattern class for package java.util.regex




Quantifiers (??)
•Greedy: Quantifiers are greedy unless otherwise altered. A greedy expression finds as many possible matches for the pattern as possible. A typical cause of problems is to assume that your pattern will only match the first possible group of characters, when it’s actually greedy and will keep going until it’s matched the largest possible string.
•Reluctant: Specified with a question mark, this quantifier matches the minimum number of characters necessary to satisfy the pattern. Also called lazy, minimal matching, non-greedy, or ungreedy.
•Possessive: Specified with a trailing plus sign. The book describes this as available only in Java at the time of writing, and as more advanced, so you probably won’t use it right away. As a regular expression is applied to a string, it generates many states so that it can backtrack if the match fails. Possessive quantifiers do not keep those intermediate states, and thus prevent backtracking. They can be used to prevent a regular expression from running away and also to make it execute more efficiently.
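The three quantifier styles can be compared on one small input. This is a sketch with a hypothetical helper, firstMatch( ), which returns the first match or null when nothing matches.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierDemo {
    // Returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "<a><b>";
        System.out.println(firstMatch("<.+>", input));  // greedy: grabs as much as it can
        System.out.println(firstMatch("<.+?>", input)); // reluctant: stops at the first '>'
        System.out.println(firstMatch("<.++>", input)); // possessive: never gives back the final '>'
    }
}
```

The possessive version finds no match at all: `.++` consumes everything up to the end of the input, including the last '>', and refuses to backtrack so that the literal '>' in the pattern could match.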

CharSequence
The interface called CharSequence establishes a generalized definition of a character sequence abstracted from the CharBuffer, String, StringBuffer, or StringBuilder classes
Many regular expression operations take CharSequence arguments.
Pattern and Matcher
In general, you’ll compile regular expression objects rather than using the fairly limited String utilities. To do this, you import java.util.regex, then compile a regular expression by using the static Pattern.compile( ) method. This produces a Pattern object based on its String argument. You use the Pattern by calling the matcher( ) method, passing the string that you want to search. The matcher( ) method produces a Matcher object, which has a set of operations to choose from
Pattern p = Pattern.compile(arg);
Matcher m = p.matcher(args[0]);
while(m.find()){}
static boolean matches(String regex, CharSequence input)
boolean matches()
boolean lookingAt()
boolean find()
boolean find(int start)
find()
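import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Finding {
  public static void main(String[] args) {
    Matcher m = Pattern.compile("\\w+")
      .matcher("Evening is full of the linnet’s wings");
    // find() with no argument resumes after the previous match
    while(m.find())
      System.out.print(m.group() + " ");
    System.out.println();
    // find(i) resets the matcher and searches from index i
    int i = 0;
    while(m.find(i)) {
      System.out.print(m.group() + " ");
      i++;
    }
  }
}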
} /* Output:
Evening is full of the linnet s wings
Evening vening ening ning ing ng g is is s full full ull ll l of of f the the he e linnet linnet innet nnet net et t s s wings wings ings ngs gs s
*/
Groups
Groups are regular expressions set off by parentheses that can be called up later with their group number. Group 0 indicates the whole expression match, group 1 is the first parenthesized group, and so on.
A(B(C))D : Group 0 is ABCD, group 1 is BC, and group 2 is C.
The Matcher object has methods to give you information about groups:
public int groupCount()--returns the number of groups in this matcher's pattern. Group 0 is not included in this count
public String group()--returns group 0 from the previous match operation
public String group(int i)--returns the given group number during the previous match operation. If the match was successful, but the group specified failed to match any part of the input string, then null is returned
public int start(int group)--returns the start index of the group found in the previous match operation
public int end(int group)--returns the index of the last character, plus one, of the group found in the previous match operation
The normal behavior is to match "$" with the end of the entire input sequence, so you must explicitly tell the regular expression to pay attention to newlines within the input. This is accomplished with the '(?m)' pattern flag at the beginning of the sequence.
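The group numbering for nested parentheses can be checked with a short sketch. The groups( ) helper is a hypothetical name; it collects group 0 through groupCount( ) for the first match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    // Lists group 0..groupCount() of the first match, or an empty list if nothing matches.
    static List<String> groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> out = new ArrayList<>();
        if (m.find()) {
            for (int i = 0; i <= m.groupCount(); i++) {
                out.add(m.group(i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Groups are numbered by the position of their opening parenthesis.
        System.out.println(groups("A(B(C))D", "xABCDx"));
    }
}
```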
start() and end()
Following a successful matching operation, start( ) returns the start index of the previous match, and end( ) returns the index of the last character matched, plus one. Invoking either start( ) or end( ) following an unsuccessful matching operation (or before attempting a matching operation) produces an IllegalStateException.
Notice that find( ) will locate the regular expression anywhere in the input, but lookingAt( ) and matches( ) only succeed if the regular expression starts matching at the very beginning of the input. While matches( ) only succeeds if the entire input matches the regular expression, lookingAt( ) succeeds if only the first part of the input matches.
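The difference between matches( ), lookingAt( ), and find( ) can be seen by running all three against the same input. The probe( ) helper and its output format are assumptions made for this sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchKindsDemo {
    // Reports: matches() result, lookingAt() result, and start-end of the first find().
    static String probe(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        boolean whole = m.matches();    // the entire input must match
        boolean prefix = m.lookingAt(); // only the beginning must match
        m.reset();                      // start find() from the beginning again
        String found = m.find() ? m.start() + "-" + m.end() : "none";
        return whole + " " + prefix + " " + found;
    }

    public static void main(String[] args) {
        String input = "Java now has regular expressions";
        System.out.println(probe("Java", input));
        System.out.println(probe("now", input));
    }
}
```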
Pattern flags
static Pattern compile(String regex, int flags)
Pattern.compile("(?m)(\\S+)\\s+((\\S+)\\s+(\\S+))$") is equivalent to
Pattern.compile("(\\S+)\\s+((\\S+)\\s+(\\S+))$", Pattern.MULTILINE)
Note that the behavior of most of the flags can also be obtained by inserting the parenthesized characters into your regular expression preceding the place where you want the mode to take effect.
Commonly used flags:
Pattern.CASE_INSENSITIVE (?i) -- Matching ignores case.
Pattern.COMMENTS (?x) -- In this mode, whitespace is ignored, and embedded comments starting with # are ignored until the end of a line. Comments mode can also be enabled via the embedded flag expression.
Pattern.MULTILINE (?m) -- In multiline mode, the expressions ‘^’ and ‘$’ match the beginning and ending of a line, respectively. ‘^’ also matches the beginning of the input string, and ‘$’ also matches the end of the input string. By default, these expressions only match at the beginning and the end of the entire input string.
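As a sketch of how the flag constants and the embedded forms line up, the example below counts matches of ^java in a three-line text in three ways. The class, the helper names, and the sample text are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FlagDemo {
    static final String TEXT =
        "java has regex\nJava has regex\nJAVA has pretty good regular expressions";

    // Counts how many times the pattern is found in the input.
    static int count(Pattern p, String input) {
        Matcher m = p.matcher(input);
        int n = 0;
        while (m.find()) n++;
        return n;
    }

    // No flags: '^' matches only at the start of the whole input, and case matters.
    static int noFlags() {
        return count(Pattern.compile("^java"), TEXT);
    }

    // Flags combined with bitwise OR.
    static int withConstants() {
        return count(Pattern.compile("^java",
            Pattern.CASE_INSENSITIVE | Pattern.MULTILINE), TEXT);
    }

    // The same two modes written as embedded flags.
    static int embedded() {
        return count(Pattern.compile("(?im)^java"), TEXT);
    }

    public static void main(String[] args) {
        System.out.println(noFlags() + " " + withConstants() + " " + embedded());
    }
}
```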
split()
String[] split(CharSequence input)
String[] split(CharSequence input, int limit); // limits the number of splits that occur
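A brief sketch of both overloads, using a made-up input string. With a positive limit n the delimiter is applied at most n - 1 times, so the last element keeps the rest of the input.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitDemo {
    static final String INPUT = "This!!unusual use!!of exclamation!!points";

    // Splits at every occurrence of the delimiter pattern.
    static String splitAll() {
        return Arrays.toString(Pattern.compile("!!").split(INPUT));
    }

    // Splits into at most 'limit' pieces.
    static String splitLimited(int limit) {
        return Arrays.toString(Pattern.compile("!!").split(INPUT, limit));
    }

    public static void main(String[] args) {
        System.out.println(splitAll());
        System.out.println(splitLimited(3));
    }
}
```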
Replace operations
replaceFirst(String replacement)
replaceAll(String replacement)
appendReplacement(StringBuffer sbuf,String replacement)
appendTail(StringBuffer sbuf)
/*! Here’s a block of text to use as input to
the regular expression matcher. Note that we’ll
first extract the block of text by looking for
the special delimiters, then process the
extracted block. !*/
public class TheReplacements {
public static void main(String[] args) throws Exception {
String s = TextFile.read("TheReplacements.java");
// Match the specially commented block of text above:
Matcher mInput =
Pattern.compile("/\\*!(.*)!\\*/", Pattern.DOTALL) // by default "." does not match line terminators; in DOTALL mode it matches every character
.matcher(s);
if(mInput.find())
s = mInput.group(1); // Captured by parentheses
// Replace two or more spaces with a single space:
s = s.replaceAll(" {2,}", " ");
// Replace one or more spaces at the beginning of each
// line with no spaces. Must enable MULTILINE mode:
s = s.replaceAll("(?m)^ +", "");
print(s);
s = s.replaceFirst("[aeiou]", "(VOWEL1)");
StringBuffer sbuf = new StringBuffer();
Pattern p = Pattern.compile("[aeiou]");
Matcher m = p.matcher(s);
// Process the find information as you perform the replacements;
while(m.find())
m.appendReplacement(sbuf, m.group().toUpperCase());
// Put in the remainder of the text:
m.appendTail(sbuf);
print(sbuf);
}
} /* Output:
Here’s a block of text to use as input to
the regular expression matcher. Note that we’ll
first extract the block of text by looking for
the special delimiters, then process the
extracted block.
H(VOWEL1)rE’s A blOck Of tExt tO UsE As InpUt tO
thE rEgUlAr ExprEssIOn mAtchEr. NOtE thAt wE’ll
fIrst ExtrAct thE blOck Of tExt by lOOkIng fOr
thE spEcIAl dElImItErs, thEn prOcEss thE
ExtrActEd blOck.
*///:~
The two replacements made before print(s) (collapsing runs of spaces and stripping leading spaces) are performed with the equivalent (but more convenient, in this case) replaceAll( ) that’s part of String. Note that since each replacement is only used once in the program, there’s no extra cost to doing it this way rather than precompiling it as a Pattern.
reset()
An existing Matcher object can be applied to a new character sequence using the reset() methods:
m.reset("fix the rig with rags");
Calling reset() with no arguments sets the Matcher to the beginning of the current sequence.
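A sketch of reusing one Matcher on two inputs. The pattern and the first input are assumptions chosen to go with the reset( ) call shown above, and drain( ) is a hypothetical helper that collects every match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ResetDemo {
    // Collects every remaining match from the matcher.
    static List<String> drain(Matcher m) {
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    static String run() {
        Matcher m = Pattern.compile("[frb][aiu][gx]")
            .matcher("fix the rug with bags");
        List<String> first = drain(m);
        m.reset("fix the rig with rags"); // same Matcher, new input
        List<String> second = drain(m);
        return first + " " + second;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```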
Regular expressions and Java I/O
For a deeper study of regular expressions, see Mastering Regular Expressions, 2nd Edition, by Jeffrey E. F. Friedl (O'Reilly, 2002).
Scanning input
StringReader turns a String into a readable stream
The Scanner constructors can take just about any kind of input object, including a File object, an InputStream, a String, and a Readable.
With Scanner, the input, tokenizing, and parsing are all ensconced in various different kinds of "next" methods. A plain next( ) returns the next String token, and there are "next" methods for all the primitive types (except char) as well as for BigDecimal and BigInteger.
There are also corresponding "hasNext" methods that return true if the next input token is of the correct type.
Scanner assumes that an IOException signals the end of input and swallows it. The most recent exception is available through the ioException( ) method, so you are able to examine it if necessary.
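A minimal sketch of the "next" methods, reading a line and an integer from a StringReader. The class name, method name, and input text are illustrative only.

```java
import java.io.StringReader;
import java.util.Scanner;

public class SimpleScan {
    // Reads a name from the first line and an age from the next token.
    static String greet(String input) {
        try (Scanner in = new Scanner(new StringReader(input))) {
            String name = in.nextLine(); // the rest of the current line
            int age = in.nextInt();      // the next token, parsed as an int
            return name + " is " + age;
        }
    }

    public static void main(String[] args) {
        System.out.println(greet("Sir Robin of Camelot\n22"));
    }
}
```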
Scanner delimiters
By default, a Scanner splits input tokens along whitespace, but you can also specify your own delimiter pattern in the form of a regular expression:
scanner.useDelimiter("\\s*,\\s*");
delimiter() returns the current Pattern being used as a delimiter.
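A sketch of the comma delimiter in use, parsing a comma-separated list of integers. The parse( ) helper and the sample input are assumptions for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class DelimiterDemo {
    // Reads integers separated by commas, with optional whitespace around each comma.
    static List<Integer> parse(String csv) {
        List<Integer> out = new ArrayList<>();
        try (Scanner sc = new Scanner(csv)) {
            sc.useDelimiter("\\s*,\\s*");
            while (sc.hasNextInt()) {
                out.add(sc.nextInt());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(parse("12, 42, 78, 99, 42"));
    }
}
```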
Scanning with regular expressions
In addition to scanning for predefined primitive types, you can also scan for your own user-defined patterns.
import java.util.Scanner;
import java.util.regex.MatchResult;

public class ThreatAnalyzer {
static String threatData =
"58.27.82.161@02/10/2005\n" +
"204.45.234.40@02/11/2005\n" +
"58.27.82.161@02/11/2005\n" +
"58.27.82.161@02/12/2005\n" +
"58.27.82.161@02/12/2005\n" +
"[Next log section with different data format]";
public static void main(String[] args) {
Scanner scanner = new Scanner(threatData);
String pattern = "(\\d+[.]\\d+[.]\\d+[.]\\d+)@" +
"(\\d{2}/\\d{2}/\\d{4})";
while(scanner.hasNext(pattern)) {
scanner.next(pattern);
MatchResult match = scanner.match();
String ip = match.group(1);
String date = match.group(2);
System.out.format("Threat on %s from %s\n", date,ip);
}
}
} /* Output:
Threat on 02/10/2005 from 58.27.82.161
Threat on 02/11/2005 from 204.45.234.40
Threat on 02/11/2005 from 58.27.82.161
Threat on 02/12/2005 from 58.27.82.161
Threat on 02/12/2005 from 58.27.82.161
*///:~
There's one caveat when scanning with regular expressions. The pattern is matched against the next input token only, so if your pattern contains a delimiter it will never be matched.
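The caveat can be demonstrated with the default whitespace delimiter. The tokenMatches( ) helper is a hypothetical name for this sketch.

```java
import java.util.Scanner;

public class ScannerCaveat {
    // True if the next token (as split by the default whitespace delimiter) matches the pattern.
    static boolean tokenMatches(String input, String pattern) {
        try (Scanner sc = new Scanner(input)) {
            return sc.hasNext(pattern);
        }
    }

    public static void main(String[] args) {
        // The next token is just "12", so a pattern containing a space cannot match it.
        System.out.println(tokenMatches("12 34", "\\d+ \\d+"));
        System.out.println(tokenMatches("12 34", "\\d+"));
    }
}
```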
StringTokenizer
Before regular expressions or the Scanner class, the way to split a string into parts was to "tokenize" it with StringTokenizer.
Summary
You must sometimes pay attention to efficiency details such as the appropriate use of StringBuilder.