This article is reposted from: http://blog.csdn.net/zhangphil/article/details/47164665

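Android ships with its own class for converting Chinese characters (Hanzi) to pinyin: HanziToPinyin.java. The Contacts implementation in Android uses HanziToPinyin.java to group and sort Chinese contact names. Converting Hanzi to pinyin is essential in some applications. Take contact grouping: if an address book holds several contacts with the surname 张 (ZHANG), all of them should be grouped under "Z". The same applies to social apps such as WeChat and QQ: any scenario that groups or sorts contacts or friends needs to convert Hanzi to pinyin and then sort and classify entries by their first letter.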
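The grouping described above can be sketched in plain Java. This is only an illustration, not the Contacts app's actual code: the class name PinyinGrouping, the methods sectionKey and groupByInitial, and the sample names are all assumptions, and the input is assumed to be already converted to pinyin.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration; not part of the Android SDK.
final class PinyinGrouping {

    // Returns 'A'..'Z' for strings that start with an ASCII letter, '#' otherwise.
    static char sectionKey(String pinyin) {
        if (pinyin == null || pinyin.isEmpty()) {
            return '#';
        }
        char first = Character.toUpperCase(pinyin.charAt(0));
        return (first >= 'A' && first <= 'Z') ? first : '#';
    }

    // Groups already-converted pinyin strings by first letter; TreeMap keeps keys sorted.
    static Map<Character, List<String>> groupByInitial(List<String> pinyinNames) {
        Map<Character, List<String>> groups = new TreeMap<>();
        for (String name : pinyinNames) {
            groups.computeIfAbsent(sectionKey(name), k -> new ArrayList<>()).add(name);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Sample data (assumed): two ZHANG contacts end up in the same "Z" group.
        List<String> names = Arrays.asList("ZHANGSAN", "LISI", "ZHANGWEI", "123");
        System.out.println(groupByInitial(names));
    }
}
```

Names that do not start with a letter fall into a catch-all "#" group here, which is a common convention in contact lists but is my own choice for this sketch.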
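HanziToPinyin.java is not a public class. It is a private class that Google uses internally to implement the Android contacts feature, so we cannot call it the way we call an ordinary Android SDK API. That is not a problem, though: we can copy the class file into our own project and use it directly.
The HanziToPinyin.java source file lives in Google's official contacts code, at: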

packages/providers/ContactsProvider/src/com/android/providers/contacts/HanziToPinyin.java

Copies of HanziToPinyin.java can also be found in projects online. However, the class does not work correctly when used as-is. The reported error is:

"There is no Chinese collator, HanziToPinyin is disabled"

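The code that reports this error is in the following method of HanziToPinyin.java:
public static HanziToPinyin getInstance();
The root cause is that some customized (non-stock) Android systems define the Chinese Locale differently. Locale.CHINA is zh_CN, and Locale.equals only matches when language, country, and variant are all the same, so on those systems locale[i].equals(Locale.CHINA) in the original code returns false for every available locale. The Chinese collator is never recognized, and all of the conversion code that follows is disabled.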
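The equality problem is easy to reproduce outside Android. The sketch below is plain Java with assumed names (LocaleEqualityDemo, matchesOriginalCheck, isChineseLanguage). It shows that a language-only zh locale does not equal Locale.CHINA, and it also shows a language-only comparison as one possible alternative check. That alternative is my own suggestion, not what the original class does.

```java
import java.util.Locale;

// Hypothetical demo class; the names are made up for illustration.
final class LocaleEqualityDemo {

    // Mirrors the original check: true only for exactly zh_CN.
    static boolean matchesOriginalCheck(Locale locale) {
        return locale.equals(Locale.CHINA);
    }

    // Alternative (assumption): accept any locale whose language is Chinese.
    static boolean isChineseLanguage(Locale locale) {
        return "zh".equals(locale.getLanguage());
    }

    public static void main(String[] args) {
        Locale languageOnly = new Locale("zh");
        System.out.println(matchesOriginalCheck(languageOnly)); // false
        System.out.println(isChineseLanguage(languageOnly));    // true
        System.out.println(isChineseLanguage(Locale.CHINA));    // true
    }
}
```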

Fix for this problem (solution)

I relaxed the check by adding one line:
final Locale chinaAddition = new Locale("zh");
and then included chinaAddition in the condition as an additional match:

if (locale[i].equals(Locale.CHINA) || locale[i].equals(chinaAddition)) {

}

Below is the complete code of my improved getInstance() method:

public static HanziToPinyin getInstance() {
    synchronized (HanziToPinyin.class) {
        if (sInstance != null) {
            return sInstance;
        }
        // Check if zh_CN collation data is available
        final Locale locale[] = Collator.getAvailableLocales();

        // Added code (enhancement): also accept the language-only "zh" locale.
        final Locale chinaAddition = new Locale("zh");

        for (int i = 0; i < locale.length; i++) {
            if (locale[i].equals(Locale.CHINA)
                    || locale[i].equals(chinaAddition)) {
                // Do self validation just once.
                if (DEBUG) {
                    Log.d(TAG, "Self validation. Result: "
                            + doSelfValidation());
                }
                sInstance = new HanziToPinyin(true);
                return sInstance;
            }
        }
        Log.w(TAG,
                "There is no Chinese collator, HanziToPinyin is disabled");
        sInstance = new HanziToPinyin(false);
        return sInstance;
    }
}

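With this improvement in place, the complete source of HanziToPinyin.java is as follows (the code can be copied into your own project and used directly):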

/*
 * Copyright (C) 2011 The Android Open Source Project
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package zhangphil.hanyupinyin;

import android.text.TextUtils;
import android.util.Log;

import java.text.Collator;
import java.util.ArrayList;
import java.util.Locale;

/**
 * An object to convert Chinese character to its corresponding pinyin string.
 * For characters with multiple possible pinyin string, only one is selected
 * according to collator. Polyphone is not supported in this implementation.
 * This class is implemented to achieve the best runtime performance and minimum
 * runtime resources with tolerable sacrifice of accuracy. This implementation
 * highly depends on zh_CN ICU collation data and must be always synchronized
 * with ICU.
 *
 * Currently this file is aligned to zh.txt in ICU 4.6 (taken from the Android 4.2 source).
 */
public class HanziToPinyin {
    private static final String TAG = "HanziToPinyin";

    // Turn on this flag when we want to check internal data structure.
    private static final boolean DEBUG = false;

    /**
     * Unihans array.
     *
     * Each unihans is the first one within same pinyin when collator is zh_CN.
     */
    public static final char[] UNIHANS = { '\u963f', '\u54ce', '\u5b89',
            '\u80ae', '\u51f9', '\u516b', '\u6300', '\u6273', '\u90a6',
            '\u52f9', '\u9642', '\u5954', '\u4f3b', '\u5c44', '\u8fb9',
            '\u706c', '\u618b', '\u6c43', '\u51ab', '\u7676', '\u5cec',
            '\u5693', '\u5072', '\u53c2', '\u4ed3', '\u64a1', '\u518a',
            '\u5d7e', '\u66fd', '\u66fe', '\u5c64', '\u53c9', '\u8286',
            '\u8fbf', '\u4f25', '\u6284', '\u8f66', '\u62bb', '\u6c88',
            '\u6c89', '\u9637', '\u5403', '\u5145', '\u62bd', '\u51fa',
            '\u6b3b', '\u63e3', '\u5ddb', '\u5205', '\u5439', '\u65fe',
            '\u9034', '\u5472', '\u5306', '\u51d1', '\u7c97', '\u6c46',
            '\u5d14', '\u90a8', '\u6413', '\u5491', '\u5446', '\u4e39',
            '\u5f53', '\u5200', '\u561a', '\u6265', '\u706f', '\u6c10',
            '\u55f2', '\u7538', '\u5201', '\u7239', '\u4e01', '\u4e1f',
            '\u4e1c', '\u543a', '\u53be', '\u8011', '\u8968', '\u5428',
            '\u591a', '\u59b8', '\u8bf6', '\u5940', '\u97a5', '\u513f',
            '\u53d1', '\u5e06', '\u531a', '\u98de', '\u5206', '\u4e30',
            '\u8985', '\u4ecf', '\u7d11', '\u4f15', '\u65ee', '\u4f85',
            '\u7518', '\u5188', '\u768b', '\u6208', '\u7ed9', '\u6839',
            '\u522f', '\u5de5', '\u52fe', '\u4f30', '\u74dc', '\u4e56',
            '\u5173', '\u5149', '\u5f52', '\u4e28', '\u5459', '\u54c8',
            '\u548d', '\u4f44', '\u592f', '\u8320', '\u8bc3', '\u9ed2',
            '\u62eb', '\u4ea8', '\u5677', '\u53ff', '\u9f41', '\u4e6f',
            '\u82b1', '\u6000', '\u72bf', '\u5ddf', '\u7070', '\u660f',
            '\u5419', '\u4e0c', '\u52a0', '\u620b', '\u6c5f', '\u827d',
            '\u9636', '\u5dfe', '\u5755', '\u5182', '\u4e29', '\u51e5',
            '\u59e2', '\u5658', '\u519b', '\u5494', '\u5f00', '\u520a',
            '\u5ffc', '\u5c3b', '\u533c', '\u808e', '\u52a5', '\u7a7a',
            '\u62a0', '\u625d', '\u5938', '\u84af', '\u5bbd', '\u5321',
            '\u4e8f', '\u5764', '\u6269', '\u5783', '\u6765', '\u5170',
            '\u5577', '\u635e', '\u808b', '\u52d2', '\u5d1a', '\u5215',
            '\u4fe9', '\u5941', '\u826f', '\u64a9', '\u5217', '\u62ce',
            '\u5222', '\u6e9c', '\u56d6', '\u9f99', '\u779c', '\u565c',
            '\u5a08', '\u7567', '\u62a1', '\u7f57', '\u5463', '\u5988',
            '\u57cb', '\u5ada', '\u7264', '\u732b', '\u4e48', '\u5445',
            '\u95e8', '\u753f', '\u54aa', '\u5b80', '\u55b5', '\u4e5c',
            '\u6c11', '\u540d', '\u8c2c', '\u6478', '\u54de', '\u6bea',
            '\u55ef', '\u62cf', '\u8149', '\u56e1', '\u56d4', '\u5b6c',
            '\u7592', '\u5a1e', '\u6041', '\u80fd', '\u59ae', '\u62c8',
            '\u5b22', '\u9e1f', '\u634f', '\u56dc', '\u5b81', '\u599e',
            '\u519c', '\u7fba', '\u5974', '\u597b', '\u759f', '\u9ec1',
            '\u90cd', '\u5594', '\u8bb4', '\u5991', '\u62cd', '\u7705',
            '\u4e53', '\u629b', '\u5478', '\u55b7', '\u5309', '\u4e15',
            '\u56e8', '\u527d', '\u6c15', '\u59d8', '\u4e52', '\u948b',
            '\u5256', '\u4ec6', '\u4e03', '\u6390', '\u5343', '\u545b',
            '\u6084', '\u767f', '\u4eb2', '\u72c5', '\u828e', '\u4e18',
            '\u533a', '\u5cd1', '\u7f3a', '\u590b', '\u5465', '\u7a63',
            '\u5a06', '\u60f9', '\u4eba', '\u6254', '\u65e5', '\u8338',
            '\u53b9', '\u909a', '\u633c', '\u5827', '\u5a51', '\u77a4',
            '\u637c', '\u4ee8', '\u6be2', '\u4e09', '\u6852', '\u63bb',
            '\u95aa', '\u68ee', '\u50e7', '\u6740', '\u7b5b', '\u5c71',
            '\u4f24', '\u5f30', '\u5962', '\u7533', '\u8398', '\u6552',
            '\u5347', '\u5c38', '\u53ce', '\u4e66', '\u5237', '\u8870',
            '\u95e9', '\u53cc', '\u8c01', '\u542e', '\u8bf4', '\u53b6',
            '\u5fea', '\u635c', '\u82cf', '\u72fb', '\u590a', '\u5b59',
            '\u5506', '\u4ed6', '\u56fc', '\u574d', '\u6c64', '\u5932',
            '\u5fd1', '\u71a5', '\u5254', '\u5929', '\u65eb', '\u5e16',
            '\u5385', '\u56f2', '\u5077', '\u51f8', '\u6e4d', '\u63a8',
            '\u541e', '\u4e47', '\u7a75', '\u6b6a', '\u5f2f', '\u5c23',
            '\u5371', '\u6637', '\u7fc1', '\u631d', '\u4e4c', '\u5915',
            '\u8672', '\u4eda', '\u4e61', '\u7071', '\u4e9b', '\u5fc3',
            '\u661f', '\u51f6', '\u4f11', '\u5401', '\u5405', '\u524a',
            '\u5743', '\u4e2b', '\u6079', '\u592e', '\u5e7a', '\u503b',
            '\u4e00', '\u56d9', '\u5e94', '\u54df', '\u4f63', '\u4f18',
            '\u625c', '\u56e6', '\u66f0', '\u6655', '\u7b60', '\u7b7c',
            '\u5e00', '\u707d', '\u5142', '\u5328', '\u50ae', '\u5219',
            '\u8d3c', '\u600e', '\u5897', '\u624e', '\u635a', '\u6cbe',
            '\u5f20', '\u957f', '\u9577', '\u4f4b', '\u8707', '\u8d1e',
            '\u4e89', '\u4e4b', '\u5cd9', '\u5ea2', '\u4e2d', '\u5dde',
            '\u6731', '\u6293', '\u62fd', '\u4e13', '\u5986', '\u96b9',
            '\u5b92', '\u5353', '\u4e72', '\u5b97', '\u90b9', '\u79df',
            '\u94bb', '\u539c', '\u5c0a', '\u6628', '\u5159', '\u9fc3',
            '\u9fc4', };

    /**
     * Pinyin array.
     *
     * Each pinyin is corresponding to unihans of same offset in the unihans
     * array.
     */
    public static final byte[][] PINYINS = { { 65, 0, 0, 0, 0, 0 },
            { 65, 73, 0, 0, 0, 0 }, { 65, 78, 0, 0, 0, 0 },
            { 65, 78, 71, 0, 0, 0 }, { 65, 79, 0, 0, 0, 0 },
            { 66, 65, 0, 0, 0, 0 }, { 66, 65, 73, 0, 0, 0 },
            { 66, 65, 78, 0, 0, 0 }, { 66, 65, 78, 71, 0, 0 },
            { 66, 65, 79, 0, 0, 0 }, { 66, 69, 73, 0, 0, 0 },
            { 66, 69, 78, 0, 0, 0 }, { 66, 69, 78, 71, 0, 0 },
            { 66, 73, 0, 0, 0, 0 }, { 66, 73, 65, 78, 0, 0 },
            { 66, 73, 65, 79, 0, 0 }, { 66, 73, 69, 0, 0, 0 },
            { 66, 73, 78, 0, 0, 0 }, { 66, 73, 78, 71, 0, 0 },
            { 66, 79, 0, 0, 0, 0 }, { 66, 85, 0, 0, 0, 0 },
            { 67, 65, 0, 0, 0, 0 }, { 67, 65, 73, 0, 0, 0 },
            { 67, 65, 78, 0, 0, 0 }, { 67, 65, 78, 71, 0, 0 },
            { 67, 65, 79, 0, 0, 0 }, { 67, 69, 0, 0, 0, 0 },
            { 67, 69, 78, 0, 0, 0 }, { 67, 69, 78, 71, 0, 0 },
            { 90, 69, 78, 71, 0, 0 }, { 67, 69, 78, 71, 0, 0 },
            { 67, 72, 65, 0, 0, 0 }, { 67, 72, 65, 73, 0, 0 },
            { 67, 72, 65, 78, 0, 0 }, { 67, 72, 65, 78, 71, 0 },
            { 67, 72, 65, 79, 0, 0 }, { 67, 72, 69, 0, 0, 0 },
            { 67, 72, 69, 78, 0, 0 }, { 83, 72, 69, 78, 0, 0 },
            { 67, 72, 69, 78, 0, 0 }, { 67, 72, 69, 78, 71, 0 },
            { 67, 72, 73, 0, 0, 0 }, { 67, 72, 79, 78, 71, 0 },
            { 67, 72, 79, 85, 0, 0 }, { 67, 72, 85, 0, 0, 0 },
            { 67, 72, 85, 65, 0, 0 }, { 67, 72, 85, 65, 73, 0 },
            { 67, 72, 85, 65, 78, 0 }, { 67, 72, 85, 65, 78, 71 },
            { 67, 72, 85, 73, 0, 0 }, { 67, 72, 85, 78, 0, 0 },
            { 67, 72, 85, 79, 0, 0 }, { 67, 73, 0, 0, 0, 0 },
            { 67, 79, 78, 71, 0, 0 }, { 67, 79, 85, 0, 0, 0 },
            { 67, 85, 0, 0, 0, 0 }, { 67, 85, 65, 78, 0, 0 },
            { 67, 85, 73, 0, 0, 0 }, { 67, 85, 78, 0, 0, 0 },
            { 67, 85, 79, 0, 0, 0 }, { 68, 65, 0, 0, 0, 0 },
            { 68, 65, 73, 0, 0, 0 }, { 68, 65, 78, 0, 0, 0 },
            { 68, 65, 78, 71, 0, 0 }, { 68, 65, 79, 0, 0, 0 },
            { 68, 69, 0, 0, 0, 0 }, { 68, 69, 78, 0, 0, 0 },
            { 68, 69, 78, 71, 0, 0 }, { 68, 73, 0, 0, 0, 0 },
            { 68, 73, 65, 0, 0, 0 }, { 68, 73, 65, 78, 0, 0 },
            { 68, 73, 65, 79, 0, 0 }, { 68, 73, 69, 0, 0, 0 },
            { 68, 73, 78, 71, 0, 0 }, { 68, 73, 85, 0, 0, 0 },
            { 68, 79, 78, 71, 0, 0 }, { 68, 79, 85, 0, 0, 0 },
            { 68, 85, 0, 0, 0, 0 }, { 68, 85, 65, 78, 0, 0 },
            { 68, 85, 73, 0, 0, 0 }, { 68, 85, 78, 0, 0, 0 },
            { 68, 85, 79, 0, 0, 0 }, { 69, 0, 0, 0, 0, 0 },
            { 69, 73, 0, 0, 0, 0 }, { 69, 78, 0, 0, 0, 0 },
            { 69, 78, 71, 0, 0, 0 }, { 69, 82, 0, 0, 0, 0 },
            { 70, 65, 0, 0, 0, 0 }, { 70, 65, 78, 0, 0, 0 },
            { 70, 65, 78, 71, 0, 0 }, { 70, 69, 73, 0, 0, 0 },
            { 70, 69, 78, 0, 0, 0 }, { 70, 69, 78, 71, 0, 0 },
            { 70, 73, 65, 79, 0, 0 }, { 70, 79, 0, 0, 0, 0 },
            { 70, 79, 85, 0, 0, 0 }, { 70, 85, 0, 0, 0, 0 },
            { 71, 65, 0, 0, 0, 0 }, { 71, 65, 73, 0, 0, 0 },
            { 71, 65, 78, 0, 0, 0 }, { 71, 65, 78, 71, 0, 0 },
            { 71, 65, 79, 0, 0, 0 }, { 71, 69, 0, 0, 0, 0 },
            { 71, 69, 73, 0, 0, 0 }, { 71, 69, 78, 0, 0, 0 },
            { 71, 69, 78, 71, 0, 0 }, { 71, 79, 78, 71, 0, 0 },
            { 71, 79, 85, 0, 0, 0 }, { 71, 85, 0, 0, 0, 0 },
            { 71, 85, 65, 0, 0, 0 }, { 71, 85, 65, 73, 0, 0 },
            { 71, 85, 65, 78, 0, 0 }, { 71, 85, 65, 78, 71, 0 },
            { 71, 85, 73, 0, 0, 0 }, { 71, 85, 78, 0, 0, 0 },
            { 71, 85, 79, 0, 0, 0 }, { 72, 65, 0, 0, 0, 0 },
            { 72, 65, 73, 0, 0, 0 }, { 72, 65, 78, 0, 0, 0 },
            { 72, 65, 78, 71, 0, 0 }, { 72, 65, 79, 0, 0, 0 },
            { 72, 69, 0, 0, 0, 0 }, { 72, 69, 73, 0, 0, 0 },
            { 72, 69, 78, 0, 0, 0 }, { 72, 69, 78, 71, 0, 0 },
            { 72, 77, 0, 0, 0, 0 }, { 72, 79, 78, 71, 0, 0 },
            { 72, 79, 85, 0, 0, 0 }, { 72, 85, 0, 0, 0, 0 },
            { 72, 85, 65, 0, 0, 0 }, { 72, 85, 65, 73, 0, 0 },
            { 72, 85, 65, 78, 0, 0 }, { 72, 85, 65, 78, 71, 0 },
            { 72, 85, 73, 0, 0, 0 }, { 72, 85, 78, 0, 0, 0 },
            { 72, 85, 79, 0, 0, 0 }, { 74, 73, 0, 0, 0, 0 },
            { 74, 73, 65, 0, 0, 0 }, { 74, 73, 65, 78, 0, 0 },
            { 74, 73, 65, 78, 71, 0 }, { 74, 73, 65, 79, 0, 0 },
            { 74, 73, 69, 0, 0, 0 }, { 74, 73, 78, 0, 0, 0 },
            { 74, 73, 78, 71, 0, 0 }, { 74, 73, 79, 78, 71, 0 },
            { 74, 73, 85, 0, 0, 0 }, { 74, 85, 0, 0, 0, 0 },
            { 74, 85, 65, 78, 0, 0 }, { 74, 85, 69, 0, 0, 0 },
            { 74, 85, 78, 0, 0, 0 }, { 75, 65, 0, 0, 0, 0 },
            { 75, 65, 73, 0, 0, 0 }, { 75, 65, 78, 0, 0, 0 },
            { 75, 65, 78, 71, 0, 0 }, { 75, 65, 79, 0, 0, 0 },
            { 75, 69, 0, 0, 0, 0 }, { 75, 69, 78, 0, 0, 0 },
            { 75, 69, 78, 71, 0, 0 }, { 75, 79, 78, 71, 0, 0 },
            { 75, 79, 85, 0, 0, 0 }, { 75, 85, 0, 0, 0, 0 },
            { 75, 85, 65, 0, 0, 0 }, { 75, 85, 65, 73, 0, 0 },
            { 75, 85, 65, 78, 0, 0 }, { 75, 85, 65, 78, 71, 0 },
            { 75, 85, 73, 0, 0, 0 }, { 75, 85, 78, 0, 0, 0 },
            { 75, 85, 79, 0, 0, 0 }, { 76, 65, 0, 0, 0, 0 },
            { 76, 65, 73, 0, 0, 0 }, { 76, 65, 78, 0, 0, 0 },
            { 76, 65, 78, 71, 0, 0 }, { 76, 65, 79, 0, 0, 0 },
            { 76, 69, 0, 0, 0, 0 }, { 76, 69, 73, 0, 0, 0 },
            { 76, 69, 78, 71, 0, 0 }, { 76, 73, 0, 0, 0, 0 },
            { 76, 73, 65, 0, 0, 0 }, { 76, 73, 65, 78, 0, 0 },
            { 76, 73, 65, 78, 71, 0 }, { 76, 73, 65, 79, 0, 0 },
            { 76, 73, 69, 0, 0, 0 }, { 76, 73, 78, 0, 0, 0 },
            { 76, 73, 78, 71, 0, 0 }, { 76, 73, 85, 0, 0, 0 },
            { 76, 79, 0, 0, 0, 0 }, { 76, 79, 78, 71, 0, 0 },
            { 76, 79, 85, 0, 0, 0 }, { 76, 85, 0, 0, 0, 0 },
            { 76, 85, 65, 78, 0, 0 }, { 76, 85, 69, 0, 0, 0 },
            { 76, 85, 78, 0, 0, 0 }, { 76, 85, 79, 0, 0, 0 },
            { 77, 0, 0, 0, 0, 0 }, { 77, 65, 0, 0, 0, 0 },
            { 77, 65, 73, 0, 0, 0 }, { 77, 65, 78, 0, 0, 0 },
            { 77, 65, 78, 71, 0, 0 }, { 77, 65, 79, 0, 0, 0 },
            { 77, 69, 0, 0, 0, 0 }, { 77, 69, 73, 0, 0, 0 },
            { 77, 69, 78, 0, 0, 0 }, { 77, 69, 78, 71, 0, 0 },
            { 77, 73, 0, 0, 0, 0 }, { 77, 73, 65, 78, 0, 0 },
            { 77, 73, 65, 79, 0, 0 }, { 77, 73, 69, 0, 0, 0 },
            { 77, 73, 78, 0, 0, 0 }, { 77, 73, 78, 71, 0, 0 },
            { 77, 73, 85, 0, 0, 0 }, { 77, 79, 0, 0, 0, 0 },
            { 77, 79, 85, 0, 0, 0 }, { 77, 85, 0, 0, 0, 0 },
            { 78, 0, 0, 0, 0, 0 }, { 78, 65, 0, 0, 0, 0 },
            { 78, 65, 73, 0, 0, 0 }, { 78, 65, 78, 0, 0, 0 },
            { 78, 65, 78, 71, 0, 0 }, { 78, 65, 79, 0, 0, 0 },
            { 78, 69, 0, 0, 0, 0 }, { 78, 69, 73, 0, 0, 0 },
            { 78, 69, 78, 0, 0, 0 }, { 78, 69, 78, 71, 0, 0 },
            { 78, 73, 0, 0, 0, 0 }, { 78, 73, 65, 78, 0, 0 },
            { 78, 73, 65, 78, 71, 0 }, { 78, 73, 65, 79, 0, 0 },
            { 78, 73, 69, 0, 0, 0 }, { 78, 73, 78, 0, 0, 0 },
            { 78, 73, 78, 71, 0, 0 }, { 78, 73, 85, 0, 0, 0 },
            { 78, 79, 78, 71, 0, 0 }, { 78, 79, 85, 0, 0, 0 },
            { 78, 85, 0, 0, 0, 0 }, { 78, 85, 65, 78, 0, 0 },
            { 78, 85, 69, 0, 0, 0 }, { 78, 85, 78, 0, 0, 0 },
            { 78, 85, 79, 0, 0, 0 }, { 79, 0, 0, 0, 0, 0 },
            { 79, 85, 0, 0, 0, 0 }, { 80, 65, 0, 0, 0, 0 },
            { 80, 65, 73, 0, 0, 0 }, { 80, 65, 78, 0, 0, 0 },
            { 80, 65, 78, 71, 0, 0 }, { 80, 65, 79, 0, 0, 0 },
            { 80, 69, 73, 0, 0, 0 }, { 80, 69, 78, 0, 0, 0 },
            { 80, 69, 78, 71, 0, 0 }, { 80, 73, 0, 0, 0, 0 },
            { 80, 73, 65, 78, 0, 0 }, { 80, 73, 65, 79, 0, 0 },
            { 80, 73, 69, 0, 0, 0 }, { 80, 73, 78, 0, 0, 0 },
            { 80, 73, 78, 71, 0, 0 }, { 80, 79, 0, 0, 0, 0 },
            { 80, 79, 85, 0, 0, 0 }, { 80, 85, 0, 0, 0, 0 },
            { 81, 73, 0, 0, 0, 0 }, { 81, 73, 65, 0, 0, 0 },
            { 81, 73, 65, 78, 0, 0 }, { 81, 73, 65, 78, 71, 0 },
            { 81, 73, 65, 79, 0, 0 }, { 81, 73, 69, 0, 0, 0 },
            { 81, 73, 78, 0, 0, 0 }, { 81, 73, 78, 71, 0, 0 },
            { 81, 73, 79, 78, 71, 0 }, { 81, 73, 85, 0, 0, 0 },
            { 81, 85, 0, 0, 0, 0 }, { 81, 85, 65, 78, 0, 0 },
            { 81, 85, 69, 0, 0, 0 }, { 81, 85, 78, 0, 0, 0 },
            { 82, 65, 78, 0, 0, 0 }, { 82, 65, 78, 71, 0, 0 },
            { 82, 65, 79, 0, 0, 0 }, { 82, 69, 0, 0, 0, 0 },
            { 82, 69, 78, 0, 0, 0 }, { 82, 69, 78, 71, 0, 0 },
            { 82, 73, 0, 0, 0, 0 }, { 82, 79, 78, 71, 0, 0 },
            { 82, 79, 85, 0, 0, 0 }, { 82, 85, 0, 0, 0, 0 },
            { 82, 85, 65, 0, 0, 0 }, { 82, 85, 65, 78, 0, 0 },
            { 82, 85, 73, 0, 0, 0 }, { 82, 85, 78, 0, 0, 0 },
            { 82, 85, 79, 0, 0, 0 }, { 83, 65, 0, 0, 0, 0 },
            { 83, 65, 73, 0, 0, 0 }, { 83, 65, 78, 0, 0, 0 },
            { 83, 65, 78, 71, 0, 0 }, { 83, 65, 79, 0, 0, 0 },
            { 83, 69, 0, 0, 0, 0 }, { 83, 69, 78, 0, 0, 0 },
            { 83, 69, 78, 71, 0, 0 }, { 83, 72, 65, 0, 0, 0 },
            { 83, 72, 65, 73, 0, 0 }, { 83, 72, 65, 78, 0, 0 },
            { 83, 72, 65, 78, 71, 0 }, { 83, 72, 65, 79, 0, 0 },
            { 83, 72, 69, 0, 0, 0 }, { 83, 72, 69, 78, 0, 0 },
            { 88, 73, 78, 0, 0, 0 }, { 83, 72, 69, 78, 0, 0 },
            { 83, 72, 69, 78, 71, 0 }, { 83, 72, 73, 0, 0, 0 },
            { 83, 72, 79, 85, 0, 0 }, { 83, 72, 85, 0, 0, 0 },
            { 83, 72, 85, 65, 0, 0 }, { 83, 72, 85, 65, 73, 0 },
            { 83, 72, 85, 65, 78, 0 }, { 83, 72, 85, 65, 78, 71 },
            { 83, 72, 85, 73, 0, 0 }, { 83, 72, 85, 78, 0, 0 },
            { 83, 72, 85, 79, 0, 0 }, { 83, 73, 0, 0, 0, 0 },
            { 83, 79, 78, 71, 0, 0 }, { 83, 79, 85, 0, 0, 0 },
            { 83, 85, 0, 0, 0, 0 }, { 83, 85, 65, 78, 0, 0 },
            { 83, 85, 73, 0, 0, 0 }, { 83, 85, 78, 0, 0, 0 },
            { 83, 85, 79, 0, 0, 0 }, { 84, 65, 0, 0, 0, 0 },
            { 84, 65, 73, 0, 0, 0 }, { 84, 65, 78, 0, 0, 0 },
            { 84, 65, 78, 71, 0, 0 }, { 84, 65, 79, 0, 0, 0 },
            { 84, 69, 0, 0, 0, 0 }, { 84, 69, 78, 71, 0, 0 },
            { 84, 73, 0, 0, 0, 0 }, { 84, 73, 65, 78, 0, 0 },
            { 84, 73, 65, 79, 0, 0 }, { 84, 73, 69, 0, 0, 0 },
            { 84, 73, 78, 71, 0, 0 }, { 84, 79, 78, 71, 0, 0 },
            { 84, 79, 85, 0, 0, 0 }, { 84, 85, 0, 0, 0, 0 },
            { 84, 85, 65, 78, 0, 0 }, { 84, 85, 73, 0, 0, 0 },
            { 84, 85, 78, 0, 0, 0 }, { 84, 85, 79, 0, 0, 0 },
            { 87, 65, 0, 0, 0, 0 }, { 87, 65, 73, 0, 0, 0 },
            { 87, 65, 78, 0, 0, 0 }, { 87, 65, 78, 71, 0, 0 },
            { 87, 69, 73, 0, 0, 0 }, { 87, 69, 78, 0, 0, 0 },
            { 87, 69, 78, 71, 0, 0 }, { 87, 79, 0, 0, 0, 0 },
            { 87, 85, 0, 0, 0, 0 }, { 88, 73, 0, 0, 0, 0 },
            { 88, 73, 65, 0, 0, 0 }, { 88, 73, 65, 78, 0, 0 },
            { 88, 73, 65, 78, 71, 0 }, { 88, 73, 65, 79, 0, 0 },
            { 88, 73, 69, 0, 0, 0 }, { 88, 73, 78, 0, 0, 0 },
            { 88, 73, 78, 71, 0, 0 }, { 88, 73, 79, 78, 71, 0 },
            { 88, 73, 85, 0, 0, 0 }, { 88, 85, 0, 0, 0, 0 },
            { 88, 85, 65, 78, 0, 0 }, { 88, 85, 69, 0, 0, 0 },
            { 88, 85, 78, 0, 0, 0 }, { 89, 65, 0, 0, 0, 0 },
            { 89, 65, 78, 0, 0, 0 }, { 89, 65, 78, 71, 0, 0 },
            { 89, 65, 79, 0, 0, 0 }, { 89, 69, 0, 0, 0, 0 },
            { 89, 73, 0, 0, 0, 0 }, { 89, 73, 78, 0, 0, 0 },
            { 89, 73, 78, 71, 0, 0 }, { 89, 79, 0, 0, 0, 0 },
            { 89, 79, 78, 71, 0, 0 }, { 89, 79, 85, 0, 0, 0 },
            { 89, 85, 0, 0, 0, 0 }, { 89, 85, 65, 78, 0, 0 },
            { 89, 85, 69, 0, 0, 0 }, { 89, 85, 78, 0, 0, 0 },
            { 74, 85, 78, 0, 0, 0 }, { 89, 85, 78, 0, 0, 0 },
            { 90, 65, 0, 0, 0, 0 }, { 90, 65, 73, 0, 0, 0 },
            { 90, 65, 78, 0, 0, 0 }, { 90, 65, 78, 71, 0, 0 },
            { 90, 65, 79, 0, 0, 0 }, { 90, 69, 0, 0, 0, 0 },
            { 90, 69, 73, 0, 0, 0 }, { 90, 69, 78, 0, 0, 0 },
            { 90, 69, 78, 71, 0, 0 }, { 90, 72, 65, 0, 0, 0 },
            { 90, 72, 65, 73, 0, 0 }, { 90, 72, 65, 78, 0, 0 },
            { 90, 72, 65, 78, 71, 0 }, { 67, 72, 65, 78, 71, 0 },
            { 90, 72, 65, 78, 71, 0 }, { 90, 72, 65, 79, 0, 0 },
            { 90, 72, 69, 0, 0, 0 }, { 90, 72, 69, 78, 0, 0 },
            { 90, 72, 69, 78, 71, 0 }, { 90, 72, 73, 0, 0, 0 },
            { 83, 72, 73, 0, 0, 0 }, { 90, 72, 73, 0, 0, 0 },
            { 90, 72, 79, 78, 71, 0 }, { 90, 72, 79, 85, 0, 0 },
            { 90, 72, 85, 0, 0, 0 }, { 90, 72, 85, 65, 0, 0 },
            { 90, 72, 85, 65, 73, 0 }, { 90, 72, 85, 65, 78, 0 },
            { 90, 72, 85, 65, 78, 71 }, { 90, 72, 85, 73, 0, 0 },
            { 90, 72, 85, 78, 0, 0 }, { 90, 72, 85, 79, 0, 0 },
            { 90, 73, 0, 0, 0, 0 }, { 90, 79, 78, 71, 0, 0 },
            { 90, 79, 85, 0, 0, 0 }, { 90, 85, 0, 0, 0, 0 },
            { 90, 85, 65, 78, 0, 0 }, { 90, 85, 73, 0, 0, 0 },
            { 90, 85, 78, 0, 0, 0 }, { 90, 85, 79, 0, 0, 0 },
            { 0, 0, 0, 0, 0, 0 }, { 83, 72, 65, 78, 0, 0 },
            { 0, 0, 0, 0, 0, 0 }, };

    /**
     * First and last Chinese character with known Pinyin according to zh
     * collation
     */
    private static final String FIRST_PINYIN_UNIHAN = "\u963F";
    private static final String LAST_PINYIN_UNIHAN = "\u9FFF";

    private static final Collator COLLATOR = Collator.getInstance(Locale.CHINA);

    private static HanziToPinyin sInstance;
    private final boolean mHasChinaCollator;

    public static class Token {
        /**
         * Separator between target string for each source char
         */
        public static final String SEPARATOR = " ";

        public static final int LATIN = 1;
        public static final int PINYIN = 2;
        public static final int UNKNOWN = 3;

        public Token() {
        }

        public Token(int type, String source, String target) {
            this.type = type;
            this.source = source;
            this.target = target;
        }

        /**
         * Type of this token, LATIN, PINYIN or UNKNOWN.
         */
        public int type;
        /**
         * Original string before translation.
         */
        public String source;
        /**
         * Translated string of source. For Han, target is corresponding Pinyin.
         * Otherwise target is original string in source.
         */
        public String target;
    }

    protected HanziToPinyin(boolean hasChinaCollator) {
        mHasChinaCollator = hasChinaCollator;
    }

    public static HanziToPinyin getInstance() {
        synchronized (HanziToPinyin.class) {
            if (sInstance != null) {
                return sInstance;
            }
            // Check if zh_CN collation data is available
            final Locale locale[] = Collator.getAvailableLocales();

            // Added code (enhancement): also accept the language-only "zh" locale.
            final Locale chinaAddition = new Locale("zh");

            for (int i = 0; i < locale.length; i++) {
                if (locale[i].equals(Locale.CHINA)
                        || locale[i].equals(chinaAddition)) {
                    // Do self validation just once.
                    if (DEBUG) {
                        Log.d(TAG, "Self validation. Result: "
                                + doSelfValidation());
                    }
                    sInstance = new HanziToPinyin(true);
                    return sInstance;
                }
            }
            Log.w(TAG,
                    "There is no Chinese collator, HanziToPinyin is disabled");
            sInstance = new HanziToPinyin(false);
            return sInstance;
        }
    }

    /**
     * Validate if our internal table has some wrong value.
     *
     * @return true when the table looks correct.
     */
    private static boolean doSelfValidation() {
        char lastChar = UNIHANS[0];
        String lastString = Character.toString(lastChar);
        for (char c : UNIHANS) {
            if (lastChar == c) {
                continue;
            }
            final String curString = Character.toString(c);
            int cmp = COLLATOR.compare(lastString, curString);
            if (cmp >= 0) {
                Log.e(TAG, "Internal error in Unihan table. "
                        + "The last string \"" + lastString
                        + "\" is greater than current string \"" + curString
                        + "\".");
                return false;
            }
            lastString = curString;
        }
        return true;
    }

    private Token getToken(char character) {
        Token token = new Token();
        final String letter = Character.toString(character);
        token.source = letter;
        int offset = -1;
        int cmp;
        if (character < 256) {
            // Latin-1 characters are passed through unchanged.
            token.type = Token.LATIN;
            token.target = letter;
            return token;
        } else {
            cmp = COLLATOR.compare(letter, FIRST_PINYIN_UNIHAN);
            if (cmp < 0) {
                // Sorts before the first known Hanzi: no pinyin available.
                token.type = Token.UNKNOWN;
                token.target = letter;
                return token;
            } else if (cmp == 0) {
                token.type = Token.PINYIN;
                offset = 0;
            } else {
                cmp = COLLATOR.compare(letter, LAST_PINYIN_UNIHAN);
                if (cmp > 0) {
                    // Sorts after the last known Hanzi: no pinyin available.
                    token.type = Token.UNKNOWN;
                    token.target = letter;
                    return token;
                } else if (cmp == 0) {
                    token.type = Token.PINYIN;
                    offset = UNIHANS.length - 1;
                }
            }
        }

        token.type = Token.PINYIN;
        if (offset < 0) {
            // Binary search for the character's position among the UNIHANS boundaries.
            int begin = 0;
            int end = UNIHANS.length - 1;
            while (begin <= end) {
                offset = (begin + end) / 2;
                final String unihan = Character.toString(UNIHANS[offset]);
                cmp = COLLATOR.compare(letter, unihan);
                if (cmp == 0) {
                    break;
                } else if (cmp > 0) {
                    begin = offset + 1;
                } else {
                    end = offset - 1;
                }
            }
        }
        if (cmp < 0) {
            // The search stopped one entry too far; step back to the enclosing entry.
            offset--;
        }
        StringBuilder pinyin = new StringBuilder();
        for (int j = 0; j < PINYINS[offset].length && PINYINS[offset][j] != 0; j++) {
            pinyin.append((char) PINYINS[offset][j]);
        }
        token.target = pinyin.toString();
        if (TextUtils.isEmpty(token.target)) {
            token.type = Token.UNKNOWN;
            token.target = token.source;
        }
        return token;
    }

    /**
     * Convert the input to an array of tokens. The sequence of ASCII or Unknown
     * characters without space will be put into a Token, One Hanzi character
     * which has pinyin will be treated as a Token. If there is no China
     * collator, the empty token array is returned.
     */
    public ArrayList<Token> get(final String input) {
        ArrayList<Token> tokens = new ArrayList<Token>();
        if (!mHasChinaCollator || TextUtils.isEmpty(input)) {
            // return empty tokens.
            return tokens;
        }
        final int inputLength = input.length();
        final StringBuilder sb = new StringBuilder();
        int tokenType = Token.LATIN;
        // Go through the input, create a new token when
        // a. Token type changed
        // b. Get the Pinyin of current character.
        // c. current character is space.
        for (int i = 0; i < inputLength; i++) {
            final char character = input.charAt(i);
            if (character == ' ') {
                if (sb.length() > 0) {
                    addToken(sb, tokens, tokenType);
                }
            } else if (character < 256) {
                if (tokenType != Token.LATIN && sb.length() > 0) {
                    addToken(sb, tokens, tokenType);
                }
                tokenType = Token.LATIN;
                sb.append(character);
            } else {
                Token t = getToken(character);
                if (t.type == Token.PINYIN) {
                    if (sb.length() > 0) {
                        addToken(sb, tokens, tokenType);
                    }
                    tokens.add(t);
                    tokenType = Token.PINYIN;
                } else {
                    if (tokenType != t.type && sb.length() > 0) {
                        addToken(sb, tokens, tokenType);
                    }
                    tokenType = t.type;
                    sb.append(character);
                }
            }
        }
        if (sb.length() > 0) {
            addToken(sb, tokens, tokenType);
        }
        return tokens;
    }

    private void addToken(final StringBuilder sb,
            final ArrayList<Token> tokens, final int tokenType) {
        String str = sb.toString();
        tokens.add(new Token(tokenType, str, str));
        sb.setLength(0);
    }
}
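The lookup in getToken() can be read as a "floor" search: find the last boundary entry that does not sort after the input, then decode the zero-padded byte row at the same offset. The sketch below shows only that pattern. It is a simplification under stated assumptions: the names FloorLookupSketch, BOUNDARIES, and LABELS are hypothetical, the table is a tiny made-up one, and plain char comparison stands in for the zh_CN Collator, so it says nothing about real pinyin ordering.

```java
// Hypothetical, simplified illustration of the table lookup pattern.
final class FloorLookupSketch {

    // Assumed sorted boundaries: each entry marks where a new bucket starts.
    private static final char[] BOUNDARIES = { 'a', 'h', 'p' };

    // Bucket labels stored like PINYINS: ASCII bytes padded with zeros.
    private static final byte[][] LABELS = {
        { 65, 0, 0, 0, 0, 0 },    // "A"
        { 72, 65, 0, 0, 0, 0 },   // "HA"
        { 80, 73, 78, 71, 0, 0 }, // "PING"
    };

    // Decodes a zero-padded byte row into a String, stopping at the first 0.
    static String decode(byte[] row) {
        StringBuilder sb = new StringBuilder();
        for (int j = 0; j < row.length && row[j] != 0; j++) {
            sb.append((char) row[j]);
        }
        return sb.toString();
    }

    // Returns the index of the last boundary <= c, or -1 if c sorts before all of them.
    static int floorIndex(char c) {
        int begin = 0;
        int end = BOUNDARIES.length - 1;
        int result = -1;
        while (begin <= end) {
            int mid = (begin + end) >>> 1;
            if (BOUNDARIES[mid] <= c) {
                result = mid;      // candidate; keep looking to the right
                begin = mid + 1;
            } else {
                end = mid - 1;
            }
        }
        return result;
    }

    // Maps a character to its bucket label, or "" when it sorts before every boundary.
    static String lookup(char c) {
        int index = floorIndex(c);
        return index < 0 ? "" : decode(LABELS[index]);
    }

    public static void main(String[] args) {
        System.out.println(lookup('k')); // HA: 'k' falls between 'h' and 'p'
        System.out.println(decode(new byte[] { 90, 72, 65, 78, 71, 0 })); // ZHANG
    }
}
```

The original code reaches the same offset a little differently: it runs an ordinary binary search and then decrements the offset when the last comparison was negative.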


Write a MainActivity.java to test converting Hanzi to pinyin and printing the result:

package zhangphil.hanyupinyin;

import java.util.ArrayList;

import zhangphil.hanyupinyin.HanziToPinyin.Token;
import android.app.Activity;
import android.os.Bundle;

public class MainActivity extends Activity {

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);

        String s = "安卓";
        System.out.println("Hanzi to pinyin output: " + getPinYin(s));
    }

    // General-purpose helper: takes a Hanzi string and returns its pinyin.
    public static String getPinYin(String hanzi) {
        ArrayList<Token> tokens = HanziToPinyin.getInstance().get(hanzi);
        StringBuilder sb = new StringBuilder();
        if (tokens != null && tokens.size() > 0) {
            for (Token token : tokens) {
                if (Token.PINYIN == token.type) {
                    sb.append(token.target);
                } else {
                    sb.append(token.source);
                }
            }
        }

        return sb.toString().toUpperCase();
    }
}
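Once a String-to-pinyin helper such as getPinYin() exists, sorting a name list by pinyin is a short step. The sketch below is plain Java with assumed names (PinyinSortSketch, sortByPinyin). So that it runs outside Android, the converter is passed in as a function and main uses a hard-coded stand-in map. In the real project one would presumably pass MainActivity::getPinYin instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper for illustration; not part of the original article's code.
final class PinyinSortSketch {

    // Returns a sorted copy of names, ordered by the pinyin key of each name.
    static List<String> sortByPinyin(List<String> names, Function<String, String> toPinyin) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(Comparator.comparing(toPinyin));
        return sorted;
    }

    public static void main(String[] args) {
        // Stand-in converter with hard-coded results (assumed sample data).
        Map<String, String> stub = new HashMap<>();
        stub.put("\u5f20\u4e09", "ZHANGSAN"); // Zhang San
        stub.put("\u674e\u56db", "LISI");     // Li Si
        stub.put("\u5b89\u5353", "ANZHUO");   // An Zhuo

        List<String> names = Arrays.asList("\u5f20\u4e09", "\u674e\u56db", "\u5b89\u5353");
        // Sorted by key: ANZHUO, then LISI, then ZHANGSAN.
        System.out.println(sortByPinyin(names, stub::get));
    }
}
```

Converting inside the comparator recomputes the pinyin on every comparison. For long lists it may be worth caching the keys first.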

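The resulting output is shown in the figure: [figure: screenshot of the program output, image not preserved]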
