This article is reposted from: http://blog.csdn.net/zhangphil/article/details/47164665

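Android itself ships with a class for converting Chinese characters (Hanzi) to pinyin written in Latin letters: HanziToPinyin.java. Android's own contacts implementation uses HanziToPinyin.java to group and sort Chinese contact names. Converting Hanzi to pinyin is essential in some applications. Take contact grouping: if an address book holds several contacts surnamed 张 (ZHANG), all of them should be grouped under "Z". Likewise, social apps such as WeChat and QQ, and any other scenario that groups or sorts contacts or friends, need to convert Hanzi to pinyin and then sort and group by first letter.
HanziToPinyin.java is not a public class. Google uses it privately in the Android contacts implementation, so it cannot be called like an ordinary Android SDK API. That is not a problem: the class file can simply be copied into your own project and used directly.
The HanziToPinyin.java source file is located in Google's contacts code at: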
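As a rough illustration of the grouping described above, the sketch below sorts names into sections by the first letter of their pinyin. It assumes the pinyin strings have already been computed; the sample map is made-up data, and `ContactGrouper`, `groupKey`, and `group` are names invented for this example, not part of Android or HanziToPinyin.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration only.
final class ContactGrouper {

    // Returns the section letter for a pinyin string: 'A'..'Z', or '#' for anything else.
    static char groupKey(String pinyin) {
        if (pinyin == null || pinyin.isEmpty()) {
            return '#';
        }
        char first = Character.toUpperCase(pinyin.charAt(0));
        return (first >= 'A' && first <= 'Z') ? first : '#';
    }

    // Groups display names by the first letter of their precomputed pinyin.
    // A TreeMap keeps the sections in alphabetical order.
    static Map<Character, List<String>> group(Map<String, String> pinyinOf) {
        Map<Character, List<String>> groups = new TreeMap<>();
        for (Map.Entry<String, String> e : pinyinOf.entrySet()) {
            groups.computeIfAbsent(groupKey(e.getValue()), k -> new ArrayList<>())
                  .add(e.getKey());
        }
        return groups;
    }

    public static void main(String[] args) {
        // Sample data: display name -> pinyin (assumed to be converted already).
        Map<String, String> pinyinOf = new LinkedHashMap<>();
        pinyinOf.put("张三", "ZHANGSAN");
        pinyinOf.put("张伟", "ZHANGWEI");
        pinyinOf.put("李四", "LISI");
        System.out.println(group(pinyinOf));
    }
}
```

Names whose pinyin does not start with a Latin letter fall into a catch-all "#" section here; that choice is an assumption of this sketch rather than something the article prescribes.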

packages/providers/ContactsProvider/src/com/android/providers/contacts/HanziToPinyin.java

Project pages hosting this HanziToPinyin.java file can also be found online. However, the class may not work when used as-is. The error is:

"There is no Chinese collator, HanziToPinyin is disabled"

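The error is raised in this method of HanziToPinyin.java:
public static HanziToPinyin getInstance();
The cause is that some customized (non-stock) Android builds define the Chinese locale differently, so locale[i].equals(Locale.CHINA) in the original code returns false for every available locale. The Chinese collator is not recognized, and everything after that check is disabled.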
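To make the failure mode concrete: `Locale.equals` compares every field of the locale, so a language-only "zh" locale is not equal to `Locale.CHINA` (zh_CN). The sketch below shows this with standard `java.util.Locale` and `java.text.Collator` calls; `LocaleEqualityDemo` and `sameLocale` are names made up for this example, and the list of collator locales printed at the end depends on the runtime.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical demo class for illustration only.
final class LocaleEqualityDemo {

    // True only when the two locales match in every field (language, country, variant, ...).
    static boolean sameLocale(Locale a, Locale b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        Locale zhOnly = new Locale("zh");

        // Language-only "zh" versus zh_CN: not equal.
        System.out.println("zh equals Locale.CHINA:   " + sameLocale(zhOnly, Locale.CHINA));
        // Language-only "zh" versus Locale.CHINESE (also language-only): equal.
        System.out.println("zh equals Locale.CHINESE: " + sameLocale(zhOnly, Locale.CHINESE));

        // List the Chinese collator locales this runtime actually reports.
        for (Locale l : Collator.getAvailableLocales()) {
            if ("zh".equals(l.getLanguage())) {
                System.out.println("available collator locale: " + l);
            }
        }
    }
}
```

If a device reports only a language-only "zh" entry (or some other variant) and no exact zh_CN entry, the equality check against `Locale.CHINA` never succeeds, which matches the behavior described above.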

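Fix (solution)

I relaxed the check by adding the following code:
final Locale chinaAddition = new Locale("zh");
and then using chinaAddition as an additional alternative in the condition:

if (locale[i].equals(Locale.CHINA) || locale[i].equals(chinaAddition)) {

}

Below is the complete code of my improved getInstance() method: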

public static HanziToPinyin getInstance() {
    synchronized (HanziToPinyin.class) {
        if (sInstance != null) {
            return sInstance;
        }
        // Check if zh_CN collation data is available
        final Locale locale[] = Collator.getAvailableLocales();

        // Added code (enhancement): also accept the language-only "zh" locale.
        final Locale chinaAddition = new Locale("zh");

        for (int i = 0; i < locale.length; i++) {
            if (locale[i].equals(Locale.CHINA)
                    || locale[i].equals(chinaAddition)) {
                // Do self validation just once.
                if (DEBUG) {
                    Log.d(TAG, "Self validation. Result: "
                            + doSelfValidation());
                }
                sInstance = new HanziToPinyin(true);
                return sInstance;
            }
        }
        Log.w(TAG,
                "There is no Chinese collator, HanziToPinyin is disabled");
        sInstance = new HanziToPinyin(false);
        return sInstance;
    }
}
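The fix above accepts two exact locales. One possible variation, not the original author's code, is to compare only the language code, which would also accept entries such as zh_TW or zh_HK. The sketch below uses only standard `java.util.Locale` calls; `ChineseLocaleCheck` and `hasChineseCollator` are hypothetical names. Whether the pinyin table behaves correctly when only a non-zh_CN Chinese collator is present is not established by the article, so treat this as an assumption to test on real devices.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical helper for illustration only.
final class ChineseLocaleCheck {

    // Returns true if any locale in the array has language code "zh",
    // regardless of its country, script, or variant.
    static boolean hasChineseCollator(Locale[] available) {
        String zh = Locale.CHINESE.getLanguage();
        for (Locale locale : available) {
            if (zh.equals(locale.getLanguage())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The result depends on the collation data shipped with the runtime.
        System.out.println(hasChineseCollator(Collator.getAvailableLocales()));
    }
}
```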

With this improvement, the complete source of HanziToPinyin.java is as follows (it can be copied into your own project and used directly):

/*
 * Copyright (C) 2011 The Android Open Source Project
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package zhangphil.hanyupinyin;

import android.text.TextUtils;
import android.util.Log;

import java.text.Collator;
import java.util.ArrayList;
import java.util.Locale;

/**
 * An object to convert Chinese character to its corresponding pinyin string.
 * For characters with multiple possible pinyin string, only one is selected
 * according to collator. Polyphone is not supported in this implementation.
 * This class is implemented to achieve the best runtime performance and minimum
 * runtime resources with tolerable sacrifice of accuracy. This implementation
 * highly depends on zh_CN ICU collation data and must be always synchronized
 * with ICU.
 *
 * Currently this file is aligned to zh.txt in ICU 4.6 (from the Android 4.2 source)
 */
public class HanziToPinyin {
    private static final String TAG = "HanziToPinyin";

    // Turn on this flag when we want to check internal data structure.
    private static final boolean DEBUG = false;

    /**
     * Unihans array.
     *
     * Each unihans is the first one within same pinyin when collator is zh_CN.
     */
    public static final char[] UNIHANS = { '\u963f', '\u54ce', '\u5b89',
            '\u80ae', '\u51f9', '\u516b', '\u6300', '\u6273', '\u90a6',
            '\u52f9', '\u9642', '\u5954', '\u4f3b', '\u5c44', '\u8fb9',
            '\u706c', '\u618b', '\u6c43', '\u51ab', '\u7676', '\u5cec',
            '\u5693', '\u5072', '\u53c2', '\u4ed3', '\u64a1', '\u518a',
            '\u5d7e', '\u66fd', '\u66fe', '\u5c64', '\u53c9', '\u8286',
            '\u8fbf', '\u4f25', '\u6284', '\u8f66', '\u62bb', '\u6c88',
            '\u6c89', '\u9637', '\u5403', '\u5145', '\u62bd', '\u51fa',
            '\u6b3b', '\u63e3', '\u5ddb', '\u5205', '\u5439', '\u65fe',
            '\u9034', '\u5472', '\u5306', '\u51d1', '\u7c97', '\u6c46',
            '\u5d14', '\u90a8', '\u6413', '\u5491', '\u5446', '\u4e39',
            '\u5f53', '\u5200', '\u561a', '\u6265', '\u706f', '\u6c10',
            '\u55f2', '\u7538', '\u5201', '\u7239', '\u4e01', '\u4e1f',
            '\u4e1c', '\u543a', '\u53be', '\u8011', '\u8968', '\u5428',
            '\u591a', '\u59b8', '\u8bf6', '\u5940', '\u97a5', '\u513f',
            '\u53d1', '\u5e06', '\u531a', '\u98de', '\u5206', '\u4e30',
            '\u8985', '\u4ecf', '\u7d11', '\u4f15', '\u65ee', '\u4f85',
            '\u7518', '\u5188', '\u768b', '\u6208', '\u7ed9', '\u6839',
            '\u522f', '\u5de5', '\u52fe', '\u4f30', '\u74dc', '\u4e56',
            '\u5173', '\u5149', '\u5f52', '\u4e28', '\u5459', '\u54c8',
            '\u548d', '\u4f44', '\u592f', '\u8320', '\u8bc3', '\u9ed2',
            '\u62eb', '\u4ea8', '\u5677', '\u53ff', '\u9f41', '\u4e6f',
            '\u82b1', '\u6000', '\u72bf', '\u5ddf', '\u7070', '\u660f',
            '\u5419', '\u4e0c', '\u52a0', '\u620b', '\u6c5f', '\u827d',
            '\u9636', '\u5dfe', '\u5755', '\u5182', '\u4e29', '\u51e5',
            '\u59e2', '\u5658', '\u519b', '\u5494', '\u5f00', '\u520a',
            '\u5ffc', '\u5c3b', '\u533c', '\u808e', '\u52a5', '\u7a7a',
            '\u62a0', '\u625d', '\u5938', '\u84af', '\u5bbd', '\u5321',
            '\u4e8f', '\u5764', '\u6269', '\u5783', '\u6765', '\u5170',
            '\u5577', '\u635e', '\u808b', '\u52d2', '\u5d1a', '\u5215',
            '\u4fe9', '\u5941', '\u826f', '\u64a9', '\u5217', '\u62ce',
            '\u5222', '\u6e9c', '\u56d6', '\u9f99', '\u779c', '\u565c',
            '\u5a08', '\u7567', '\u62a1', '\u7f57', '\u5463', '\u5988',
            '\u57cb', '\u5ada', '\u7264', '\u732b', '\u4e48', '\u5445',
            '\u95e8', '\u753f', '\u54aa', '\u5b80', '\u55b5', '\u4e5c',
            '\u6c11', '\u540d', '\u8c2c', '\u6478', '\u54de', '\u6bea',
            '\u55ef', '\u62cf', '\u8149', '\u56e1', '\u56d4', '\u5b6c',
            '\u7592', '\u5a1e', '\u6041', '\u80fd', '\u59ae', '\u62c8',
            '\u5b22', '\u9e1f', '\u634f', '\u56dc', '\u5b81', '\u599e',
            '\u519c', '\u7fba', '\u5974', '\u597b', '\u759f', '\u9ec1',
            '\u90cd', '\u5594', '\u8bb4', '\u5991', '\u62cd', '\u7705',
            '\u4e53', '\u629b', '\u5478', '\u55b7', '\u5309', '\u4e15',
            '\u56e8', '\u527d', '\u6c15', '\u59d8', '\u4e52', '\u948b',
            '\u5256', '\u4ec6', '\u4e03', '\u6390', '\u5343', '\u545b',
            '\u6084', '\u767f', '\u4eb2', '\u72c5', '\u828e', '\u4e18',
            '\u533a', '\u5cd1', '\u7f3a', '\u590b', '\u5465', '\u7a63',
            '\u5a06', '\u60f9', '\u4eba', '\u6254', '\u65e5', '\u8338',
            '\u53b9', '\u909a', '\u633c', '\u5827', '\u5a51', '\u77a4',
            '\u637c', '\u4ee8', '\u6be2', '\u4e09', '\u6852', '\u63bb',
            '\u95aa', '\u68ee', '\u50e7', '\u6740', '\u7b5b', '\u5c71',
            '\u4f24', '\u5f30', '\u5962', '\u7533', '\u8398', '\u6552',
            '\u5347', '\u5c38', '\u53ce', '\u4e66', '\u5237', '\u8870',
            '\u95e9', '\u53cc', '\u8c01', '\u542e', '\u8bf4', '\u53b6',
            '\u5fea', '\u635c', '\u82cf', '\u72fb', '\u590a', '\u5b59',
            '\u5506', '\u4ed6', '\u56fc', '\u574d', '\u6c64', '\u5932',
            '\u5fd1', '\u71a5', '\u5254', '\u5929', '\u65eb', '\u5e16',
            '\u5385', '\u56f2', '\u5077', '\u51f8', '\u6e4d', '\u63a8',
            '\u541e', '\u4e47', '\u7a75', '\u6b6a', '\u5f2f', '\u5c23',
            '\u5371', '\u6637', '\u7fc1', '\u631d', '\u4e4c', '\u5915',
            '\u8672', '\u4eda', '\u4e61', '\u7071', '\u4e9b', '\u5fc3',
            '\u661f', '\u51f6', '\u4f11', '\u5401', '\u5405', '\u524a',
            '\u5743', '\u4e2b', '\u6079', '\u592e', '\u5e7a', '\u503b',
            '\u4e00', '\u56d9', '\u5e94', '\u54df', '\u4f63', '\u4f18',
            '\u625c', '\u56e6', '\u66f0', '\u6655', '\u7b60', '\u7b7c',
            '\u5e00', '\u707d', '\u5142', '\u5328', '\u50ae', '\u5219',
            '\u8d3c', '\u600e', '\u5897', '\u624e', '\u635a', '\u6cbe',
            '\u5f20', '\u957f', '\u9577', '\u4f4b', '\u8707', '\u8d1e',
            '\u4e89', '\u4e4b', '\u5cd9', '\u5ea2', '\u4e2d', '\u5dde',
            '\u6731', '\u6293', '\u62fd', '\u4e13', '\u5986', '\u96b9',
            '\u5b92', '\u5353', '\u4e72', '\u5b97', '\u90b9', '\u79df',
            '\u94bb', '\u539c', '\u5c0a', '\u6628', '\u5159', '\u9fc3',
            '\u9fc4', };

    /**
     * Pinyin array.
     *
     * Each pinyin is corresponding to unihans of same offset in the unihans
     * array.
     */
    public static final byte[][] PINYINS = { { 65, 0, 0, 0, 0, 0 },
            { 65, 73, 0, 0, 0, 0 }, { 65, 78, 0, 0, 0, 0 },
            { 65, 78, 71, 0, 0, 0 }, { 65, 79, 0, 0, 0, 0 },
            { 66, 65, 0, 0, 0, 0 }, { 66, 65, 73, 0, 0, 0 },
            { 66, 65, 78, 0, 0, 0 }, { 66, 65, 78, 71, 0, 0 },
            { 66, 65, 79, 0, 0, 0 }, { 66, 69, 73, 0, 0, 0 },
            { 66, 69, 78, 0, 0, 0 }, { 66, 69, 78, 71, 0, 0 },
            { 66, 73, 0, 0, 0, 0 }, { 66, 73, 65, 78, 0, 0 },
            { 66, 73, 65, 79, 0, 0 }, { 66, 73, 69, 0, 0, 0 },
            { 66, 73, 78, 0, 0, 0 }, { 66, 73, 78, 71, 0, 0 },
            { 66, 79, 0, 0, 0, 0 }, { 66, 85, 0, 0, 0, 0 },
            { 67, 65, 0, 0, 0, 0 }, { 67, 65, 73, 0, 0, 0 },
            { 67, 65, 78, 0, 0, 0 }, { 67, 65, 78, 71, 0, 0 },
            { 67, 65, 79, 0, 0, 0 }, { 67, 69, 0, 0, 0, 0 },
            { 67, 69, 78, 0, 0, 0 }, { 67, 69, 78, 71, 0, 0 },
            { 90, 69, 78, 71, 0, 0 }, { 67, 69, 78, 71, 0, 0 },
            { 67, 72, 65, 0, 0, 0 }, { 67, 72, 65, 73, 0, 0 },
            { 67, 72, 65, 78, 0, 0 }, { 67, 72, 65, 78, 71, 0 },
            { 67, 72, 65, 79, 0, 0 }, { 67, 72, 69, 0, 0, 0 },
            { 67, 72, 69, 78, 0, 0 }, { 83, 72, 69, 78, 0, 0 },
            { 67, 72, 69, 78, 0, 0 }, { 67, 72, 69, 78, 71, 0 },
            { 67, 72, 73, 0, 0, 0 }, { 67, 72, 79, 78, 71, 0 },
            { 67, 72, 79, 85, 0, 0 }, { 67, 72, 85, 0, 0, 0 },
            { 67, 72, 85, 65, 0, 0 }, { 67, 72, 85, 65, 73, 0 },
            { 67, 72, 85, 65, 78, 0 }, { 67, 72, 85, 65, 78, 71 },
            { 67, 72, 85, 73, 0, 0 }, { 67, 72, 85, 78, 0, 0 },
            { 67, 72, 85, 79, 0, 0 }, { 67, 73, 0, 0, 0, 0 },
            { 67, 79, 78, 71, 0, 0 }, { 67, 79, 85, 0, 0, 0 },
            { 67, 85, 0, 0, 0, 0 }, { 67, 85, 65, 78, 0, 0 },
            { 67, 85, 73, 0, 0, 0 }, { 67, 85, 78, 0, 0, 0 },
            { 67, 85, 79, 0, 0, 0 }, { 68, 65, 0, 0, 0, 0 },
            { 68, 65, 73, 0, 0, 0 }, { 68, 65, 78, 0, 0, 0 },
            { 68, 65, 78, 71, 0, 0 }, { 68, 65, 79, 0, 0, 0 },
            { 68, 69, 0, 0, 0, 0 }, { 68, 69, 78, 0, 0, 0 },
            { 68, 69, 78, 71, 0, 0 }, { 68, 73, 0, 0, 0, 0 },
            { 68, 73, 65, 0, 0, 0 }, { 68, 73, 65, 78, 0, 0 },
            { 68, 73, 65, 79, 0, 0 }, { 68, 73, 69, 0, 0, 0 },
            { 68, 73, 78, 71, 0, 0 }, { 68, 73, 85, 0, 0, 0 },
            { 68, 79, 78, 71, 0, 0 }, { 68, 79, 85, 0, 0, 0 },
            { 68, 85, 0, 0, 0, 0 }, { 68, 85, 65, 78, 0, 0 },
            { 68, 85, 73, 0, 0, 0 }, { 68, 85, 78, 0, 0, 0 },
            { 68, 85, 79, 0, 0, 0 }, { 69, 0, 0, 0, 0, 0 },
            { 69, 73, 0, 0, 0, 0 }, { 69, 78, 0, 0, 0, 0 },
            { 69, 78, 71, 0, 0, 0 }, { 69, 82, 0, 0, 0, 0 },
            { 70, 65, 0, 0, 0, 0 }, { 70, 65, 78, 0, 0, 0 },
            { 70, 65, 78, 71, 0, 0 }, { 70, 69, 73, 0, 0, 0 },
            { 70, 69, 78, 0, 0, 0 }, { 70, 69, 78, 71, 0, 0 },
            { 70, 73, 65, 79, 0, 0 }, { 70, 79, 0, 0, 0, 0 },
            { 70, 79, 85, 0, 0, 0 }, { 70, 85, 0, 0, 0, 0 },
            { 71, 65, 0, 0, 0, 0 }, { 71, 65, 73, 0, 0, 0 },
            { 71, 65, 78, 0, 0, 0 }, { 71, 65, 78, 71, 0, 0 },
            { 71, 65, 79, 0, 0, 0 }, { 71, 69, 0, 0, 0, 0 },
            { 71, 69, 73, 0, 0, 0 }, { 71, 69, 78, 0, 0, 0 },
            { 71, 69, 78, 71, 0, 0 }, { 71, 79, 78, 71, 0, 0 },
            { 71, 79, 85, 0, 0, 0 }, { 71, 85, 0, 0, 0, 0 },
            { 71, 85, 65, 0, 0, 0 }, { 71, 85, 65, 73, 0, 0 },
            { 71, 85, 65, 78, 0, 0 }, { 71, 85, 65, 78, 71, 0 },
            { 71, 85, 73, 0, 0, 0 }, { 71, 85, 78, 0, 0, 0 },
            { 71, 85, 79, 0, 0, 0 }, { 72, 65, 0, 0, 0, 0 },
            { 72, 65, 73, 0, 0, 0 }, { 72, 65, 78, 0, 0, 0 },
            { 72, 65, 78, 71, 0, 0 }, { 72, 65, 79, 0, 0, 0 },
            { 72, 69, 0, 0, 0, 0 }, { 72, 69, 73, 0, 0, 0 },
            { 72, 69, 78, 0, 0, 0 }, { 72, 69, 78, 71, 0, 0 },
            { 72, 77, 0, 0, 0, 0 }, { 72, 79, 78, 71, 0, 0 },
            { 72, 79, 85, 0, 0, 0 }, { 72, 85, 0, 0, 0, 0 },
            { 72, 85, 65, 0, 0, 0 }, { 72, 85, 65, 73, 0, 0 },
            { 72, 85, 65, 78, 0, 0 }, { 72, 85, 65, 78, 71, 0 },
            { 72, 85, 73, 0, 0, 0 }, { 72, 85, 78, 0, 0, 0 },
            { 72, 85, 79, 0, 0, 0 }, { 74, 73, 0, 0, 0, 0 },
            { 74, 73, 65, 0, 0, 0 }, { 74, 73, 65, 78, 0, 0 },
            { 74, 73, 65, 78, 71, 0 }, { 74, 73, 65, 79, 0, 0 },
            { 74, 73, 69, 0, 0, 0 }, { 74, 73, 78, 0, 0, 0 },
            { 74, 73, 78, 71, 0, 0 }, { 74, 73, 79, 78, 71, 0 },
            { 74, 73, 85, 0, 0, 0 }, { 74, 85, 0, 0, 0, 0 },
            { 74, 85, 65, 78, 0, 0 }, { 74, 85, 69, 0, 0, 0 },
            { 74, 85, 78, 0, 0, 0 }, { 75, 65, 0, 0, 0, 0 },
            { 75, 65, 73, 0, 0, 0 }, { 75, 65, 78, 0, 0, 0 },
            { 75, 65, 78, 71, 0, 0 }, { 75, 65, 79, 0, 0, 0 },
            { 75, 69, 0, 0, 0, 0 }, { 75, 69, 78, 0, 0, 0 },
            { 75, 69, 78, 71, 0, 0 }, { 75, 79, 78, 71, 0, 0 },
            { 75, 79, 85, 0, 0, 0 }, { 75, 85, 0, 0, 0, 0 },
            { 75, 85, 65, 0, 0, 0 }, { 75, 85, 65, 73, 0, 0 },
            { 75, 85, 65, 78, 0, 0 }, { 75, 85, 65, 78, 71, 0 },
            { 75, 85, 73, 0, 0, 0 }, { 75, 85, 78, 0, 0, 0 },
            { 75, 85, 79, 0, 0, 0 }, { 76, 65, 0, 0, 0, 0 },
            { 76, 65, 73, 0, 0, 0 }, { 76, 65, 78, 0, 0, 0 },
            { 76, 65, 78, 71, 0, 0 }, { 76, 65, 79, 0, 0, 0 },
            { 76, 69, 0, 0, 0, 0 }, { 76, 69, 73, 0, 0, 0 },
            { 76, 69, 78, 71, 0, 0 }, { 76, 73, 0, 0, 0, 0 },
            { 76, 73, 65, 0, 0, 0 }, { 76, 73, 65, 78, 0, 0 },
            { 76, 73, 65, 78, 71, 0 }, { 76, 73, 65, 79, 0, 0 },
            { 76, 73, 69, 0, 0, 0 }, { 76, 73, 78, 0, 0, 0 },
            { 76, 73, 78, 71, 0, 0 }, { 76, 73, 85, 0, 0, 0 },
            { 76, 79, 0, 0, 0, 0 }, { 76, 79, 78, 71, 0, 0 },
            { 76, 79, 85, 0, 0, 0 }, { 76, 85, 0, 0, 0, 0 },
            { 76, 85, 65, 78, 0, 0 }, { 76, 85, 69, 0, 0, 0 },
            { 76, 85, 78, 0, 0, 0 }, { 76, 85, 79, 0, 0, 0 },
            { 77, 0, 0, 0, 0, 0 }, { 77, 65, 0, 0, 0, 0 },
            { 77, 65, 73, 0, 0, 0 }, { 77, 65, 78, 0, 0, 0 },
            { 77, 65, 78, 71, 0, 0 }, { 77, 65, 79, 0, 0, 0 },
            { 77, 69, 0, 0, 0, 0 }, { 77, 69, 73, 0, 0, 0 },
            { 77, 69, 78, 0, 0, 0 }, { 77, 69, 78, 71, 0, 0 },
            { 77, 73, 0, 0, 0, 0 }, { 77, 73, 65, 78, 0, 0 },
            { 77, 73, 65, 79, 0, 0 }, { 77, 73, 69, 0, 0, 0 },
            { 77, 73, 78, 0, 0, 0 }, { 77, 73, 78, 71, 0, 0 },
            { 77, 73, 85, 0, 0, 0 }, { 77, 79, 0, 0, 0, 0 },
            { 77, 79, 85, 0, 0, 0 }, { 77, 85, 0, 0, 0, 0 },
            { 78, 0, 0, 0, 0, 0 }, { 78, 65, 0, 0, 0, 0 },
            { 78, 65, 73, 0, 0, 0 }, { 78, 65, 78, 0, 0, 0 },
            { 78, 65, 78, 71, 0, 0 }, { 78, 65, 79, 0, 0, 0 },
            { 78, 69, 0, 0, 0, 0 }, { 78, 69, 73, 0, 0, 0 },
            { 78, 69, 78, 0, 0, 0 }, { 78, 69, 78, 71, 0, 0 },
            { 78, 73, 0, 0, 0, 0 }, { 78, 73, 65, 78, 0, 0 },
            { 78, 73, 65, 78, 71, 0 }, { 78, 73, 65, 79, 0, 0 },
            { 78, 73, 69, 0, 0, 0 }, { 78, 73, 78, 0, 0, 0 },
            { 78, 73, 78, 71, 0, 0 }, { 78, 73, 85, 0, 0, 0 },
            { 78, 79, 78, 71, 0, 0 }, { 78, 79, 85, 0, 0, 0 },
            { 78, 85, 0, 0, 0, 0 }, { 78, 85, 65, 78, 0, 0 },
            { 78, 85, 69, 0, 0, 0 }, { 78, 85, 78, 0, 0, 0 },
            { 78, 85, 79, 0, 0, 0 }, { 79, 0, 0, 0, 0, 0 },
            { 79, 85, 0, 0, 0, 0 }, { 80, 65, 0, 0, 0, 0 },
            { 80, 65, 73, 0, 0, 0 }, { 80, 65, 78, 0, 0, 0 },
            { 80, 65, 78, 71, 0, 0 }, { 80, 65, 79, 0, 0, 0 },
            { 80, 69, 73, 0, 0, 0 }, { 80, 69, 78, 0, 0, 0 },
            { 80, 69, 78, 71, 0, 0 }, { 80, 73, 0, 0, 0, 0 },
            { 80, 73, 65, 78, 0, 0 }, { 80, 73, 65, 79, 0, 0 },
            { 80, 73, 69, 0, 0, 0 }, { 80, 73, 78, 0, 0, 0 },
            { 80, 73, 78, 71, 0, 0 }, { 80, 79, 0, 0, 0, 0 },
            { 80, 79, 85, 0, 0, 0 }, { 80, 85, 0, 0, 0, 0 },
            { 81, 73, 0, 0, 0, 0 }, { 81, 73, 65, 0, 0, 0 },
            { 81, 73, 65, 78, 0, 0 }, { 81, 73, 65, 78, 71, 0 },
            { 81, 73, 65, 79, 0, 0 }, { 81, 73, 69, 0, 0, 0 },
            { 81, 73, 78, 0, 0, 0 }, { 81, 73, 78, 71, 0, 0 },
            { 81, 73, 79, 78, 71, 0 }, { 81, 73, 85, 0, 0, 0 },
            { 81, 85, 0, 0, 0, 0 }, { 81, 85, 65, 78, 0, 0 },
            { 81, 85, 69, 0, 0, 0 }, { 81, 85, 78, 0, 0, 0 },
            { 82, 65, 78, 0, 0, 0 }, { 82, 65, 78, 71, 0, 0 },
            { 82, 65, 79, 0, 0, 0 }, { 82, 69, 0, 0, 0, 0 },
            { 82, 69, 78, 0, 0, 0 }, { 82, 69, 78, 71, 0, 0 },
            { 82, 73, 0, 0, 0, 0 }, { 82, 79, 78, 71, 0, 0 },
            { 82, 79, 85, 0, 0, 0 }, { 82, 85, 0, 0, 0, 0 },
            { 82, 85, 65, 0, 0, 0 }, { 82, 85, 65, 78, 0, 0 },
            { 82, 85, 73, 0, 0, 0 }, { 82, 85, 78, 0, 0, 0 },
            { 82, 85, 79, 0, 0, 0 }, { 83, 65, 0, 0, 0, 0 },
            { 83, 65, 73, 0, 0, 0 }, { 83, 65, 78, 0, 0, 0 },
            { 83, 65, 78, 71, 0, 0 }, { 83, 65, 79, 0, 0, 0 },
            { 83, 69, 0, 0, 0, 0 }, { 83, 69, 78, 0, 0, 0 },
            { 83, 69, 78, 71, 0, 0 }, { 83, 72, 65, 0, 0, 0 },
            { 83, 72, 65, 73, 0, 0 }, { 83, 72, 65, 78, 0, 0 },
            { 83, 72, 65, 78, 71, 0 }, { 83, 72, 65, 79, 0, 0 },
            { 83, 72, 69, 0, 0, 0 }, { 83, 72, 69, 78, 0, 0 },
            { 88, 73, 78, 0, 0, 0 }, { 83, 72, 69, 78, 0, 0 },
            { 83, 72, 69, 78, 71, 0 }, { 83, 72, 73, 0, 0, 0 },
            { 83, 72, 79, 85, 0, 0 }, { 83, 72, 85, 0, 0, 0 },
            { 83, 72, 85, 65, 0, 0 }, { 83, 72, 85, 65, 73, 0 },
            { 83, 72, 85, 65, 78, 0 }, { 83, 72, 85, 65, 78, 71 },
            { 83, 72, 85, 73, 0, 0 }, { 83, 72, 85, 78, 0, 0 },
            { 83, 72, 85, 79, 0, 0 }, { 83, 73, 0, 0, 0, 0 },
            { 83, 79, 78, 71, 0, 0 }, { 83, 79, 85, 0, 0, 0 },
            { 83, 85, 0, 0, 0, 0 }, { 83, 85, 65, 78, 0, 0 },
            { 83, 85, 73, 0, 0, 0 }, { 83, 85, 78, 0, 0, 0 },
            { 83, 85, 79, 0, 0, 0 }, { 84, 65, 0, 0, 0, 0 },
            { 84, 65, 73, 0, 0, 0 }, { 84, 65, 78, 0, 0, 0 },
            { 84, 65, 78, 71, 0, 0 }, { 84, 65, 79, 0, 0, 0 },
            { 84, 69, 0, 0, 0, 0 }, { 84, 69, 78, 71, 0, 0 },
            { 84, 73, 0, 0, 0, 0 }, { 84, 73, 65, 78, 0, 0 },
            { 84, 73, 65, 79, 0, 0 }, { 84, 73, 69, 0, 0, 0 },
            { 84, 73, 78, 71, 0, 0 }, { 84, 79, 78, 71, 0, 0 },
            { 84, 79, 85, 0, 0, 0 }, { 84, 85, 0, 0, 0, 0 },
            { 84, 85, 65, 78, 0, 0 }, { 84, 85, 73, 0, 0, 0 },
            { 84, 85, 78, 0, 0, 0 }, { 84, 85, 79, 0, 0, 0 },
            { 87, 65, 0, 0, 0, 0 }, { 87, 65, 73, 0, 0, 0 },
            { 87, 65, 78, 0, 0, 0 }, { 87, 65, 78, 71, 0, 0 },
            { 87, 69, 73, 0, 0, 0 }, { 87, 69, 78, 0, 0, 0 },
            { 87, 69, 78, 71, 0, 0 }, { 87, 79, 0, 0, 0, 0 },
            { 87, 85, 0, 0, 0, 0 }, { 88, 73, 0, 0, 0, 0 },
            { 88, 73, 65, 0, 0, 0 }, { 88, 73, 65, 78, 0, 0 },
            { 88, 73, 65, 78, 71, 0 }, { 88, 73, 65, 79, 0, 0 },
            { 88, 73, 69, 0, 0, 0 }, { 88, 73, 78, 0, 0, 0 },
            { 88, 73, 78, 71, 0, 0 }, { 88, 73, 79, 78, 71, 0 },
            { 88, 73, 85, 0, 0, 0 }, { 88, 85, 0, 0, 0, 0 },
            { 88, 85, 65, 78, 0, 0 }, { 88, 85, 69, 0, 0, 0 },
            { 88, 85, 78, 0, 0, 0 }, { 89, 65, 0, 0, 0, 0 },
            { 89, 65, 78, 0, 0, 0 }, { 89, 65, 78, 71, 0, 0 },
            { 89, 65, 79, 0, 0, 0 }, { 89, 69, 0, 0, 0, 0 },
            { 89, 73, 0, 0, 0, 0 }, { 89, 73, 78, 0, 0, 0 },
            { 89, 73, 78, 71, 0, 0 }, { 89, 79, 0, 0, 0, 0 },
            { 89, 79, 78, 71, 0, 0 }, { 89, 79, 85, 0, 0, 0 },
            { 89, 85, 0, 0, 0, 0 }, { 89, 85, 65, 78, 0, 0 },
            { 89, 85, 69, 0, 0, 0 }, { 89, 85, 78, 0, 0, 0 },
            { 74, 85, 78, 0, 0, 0 }, { 89, 85, 78, 0, 0, 0 },
            { 90, 65, 0, 0, 0, 0 }, { 90, 65, 73, 0, 0, 0 },
            { 90, 65, 78, 0, 0, 0 }, { 90, 65, 78, 71, 0, 0 },
            { 90, 65, 79, 0, 0, 0 }, { 90, 69, 0, 0, 0, 0 },
            { 90, 69, 73, 0, 0, 0 }, { 90, 69, 78, 0, 0, 0 },
            { 90, 69, 78, 71, 0, 0 }, { 90, 72, 65, 0, 0, 0 },
            { 90, 72, 65, 73, 0, 0 }, { 90, 72, 65, 78, 0, 0 },
            { 90, 72, 65, 78, 71, 0 }, { 67, 72, 65, 78, 71, 0 },
            { 90, 72, 65, 78, 71, 0 }, { 90, 72, 65, 79, 0, 0 },
            { 90, 72, 69, 0, 0, 0 }, { 90, 72, 69, 78, 0, 0 },
            { 90, 72, 69, 78, 71, 0 }, { 90, 72, 73, 0, 0, 0 },
            { 83, 72, 73, 0, 0, 0 }, { 90, 72, 73, 0, 0, 0 },
            { 90, 72, 79, 78, 71, 0 }, { 90, 72, 79, 85, 0, 0 },
            { 90, 72, 85, 0, 0, 0 }, { 90, 72, 85, 65, 0, 0 },
            { 90, 72, 85, 65, 73, 0 }, { 90, 72, 85, 65, 78, 0 },
            { 90, 72, 85, 65, 78, 71 }, { 90, 72, 85, 73, 0, 0 },
            { 90, 72, 85, 78, 0, 0 }, { 90, 72, 85, 79, 0, 0 },
            { 90, 73, 0, 0, 0, 0 }, { 90, 79, 78, 71, 0, 0 },
            { 90, 79, 85, 0, 0, 0 }, { 90, 85, 0, 0, 0, 0 },
            { 90, 85, 65, 78, 0, 0 }, { 90, 85, 73, 0, 0, 0 },
            { 90, 85, 78, 0, 0, 0 }, { 90, 85, 79, 0, 0, 0 },
            { 0, 0, 0, 0, 0, 0 }, { 83, 72, 65, 78, 0, 0 },
            { 0, 0, 0, 0, 0, 0 }, };

    /**
     * First and last Chinese character with known Pinyin according to zh
     * collation
     */
    private static final String FIRST_PINYIN_UNIHAN = "\u963F";
    private static final String LAST_PINYIN_UNIHAN = "\u9FFF";

    private static final Collator COLLATOR = Collator.getInstance(Locale.CHINA);

    private static HanziToPinyin sInstance;
    private final boolean mHasChinaCollator;

    public static class Token {
        /**
         * Separator between target string for each source char
         */
        public static final String SEPARATOR = " ";

        public static final int LATIN = 1;
        public static final int PINYIN = 2;
        public static final int UNKNOWN = 3;

        public Token() {
        }

        public Token(int type, String source, String target) {
            this.type = type;
            this.source = source;
            this.target = target;
        }

        /**
         * Type of this token, ASCII, PINYIN or UNKNOWN.
         */
        public int type;
        /**
         * Original string before translation.
         */
        public String source;
        /**
         * Translated string of source. For Han, target is corresponding Pinyin.
         * Otherwise target is original string in source.
         */
        public String target;
    }

    protected HanziToPinyin(boolean hasChinaCollator) {
        mHasChinaCollator = hasChinaCollator;
    }

    public static HanziToPinyin getInstance() {
        synchronized (HanziToPinyin.class) {
            if (sInstance != null) {
                return sInstance;
            }
            // Check if zh_CN collation data is available
            final Locale locale[] = Collator.getAvailableLocales();

            // Added code (enhancement): also accept the language-only "zh" locale.
            final Locale chinaAddition = new Locale("zh");

            for (int i = 0; i < locale.length; i++) {
                if (locale[i].equals(Locale.CHINA)
                        || locale[i].equals(chinaAddition)) {
                    // Do self validation just once.
                    if (DEBUG) {
                        Log.d(TAG, "Self validation. Result: "
                                + doSelfValidation());
                    }
                    sInstance = new HanziToPinyin(true);
                    return sInstance;
                }
            }
            Log.w(TAG,
                    "There is no Chinese collator, HanziToPinyin is disabled");
            sInstance = new HanziToPinyin(false);
            return sInstance;
        }
    }

    /**
     * Validate if our internal table has some wrong value.
     *
     * @return true when the table looks correct.
     */
    private static boolean doSelfValidation() {
        char lastChar = UNIHANS[0];
        String lastString = Character.toString(lastChar);
        for (char c : UNIHANS) {
            if (lastChar == c) {
                continue;
            }
            final String curString = Character.toString(c);
            int cmp = COLLATOR.compare(lastString, curString);
            if (cmp >= 0) {
                Log.e(TAG, "Internal error in Unihan table. "
                        + "The last string \"" + lastString
                        + "\" is greater than current string \"" + curString
                        + "\".");
                return false;
            }
            lastString = curString;
        }
        return true;
    }

    private Token getToken(char character) {
        Token token = new Token();
        final String letter = Character.toString(character);
        token.source = letter;
        int offset = -1;
        int cmp;
        if (character < 256) {
            token.type = Token.LATIN;
            token.target = letter;
            return token;
        } else {
            cmp = COLLATOR.compare(letter, FIRST_PINYIN_UNIHAN);
            if (cmp < 0) {
                token.type = Token.UNKNOWN;
                token.target = letter;
                return token;
            } else if (cmp == 0) {
                token.type = Token.PINYIN;
                offset = 0;
            } else {
                cmp = COLLATOR.compare(letter, LAST_PINYIN_UNIHAN);
                if (cmp > 0) {
                    token.type = Token.UNKNOWN;
                    token.target = letter;
                    return token;
                } else if (cmp == 0) {
                    token.type = Token.PINYIN;
                    offset = UNIHANS.length - 1;
                }
            }
        }

        token.type = Token.PINYIN;
        if (offset < 0) {
            int begin = 0;
            int end = UNIHANS.length - 1;
            while (begin <= end) {
                offset = (begin + end) / 2;
                final String unihan = Character.toString(UNIHANS[offset]);
                cmp = COLLATOR.compare(letter, unihan);
                if (cmp == 0) {
                    break;
                } else if (cmp > 0) {
                    begin = offset + 1;
                } else {
                    end = offset - 1;
                }
            }
        }
        if (cmp < 0) {
            offset--;
        }
        StringBuilder pinyin = new StringBuilder();
        for (int j = 0; j < PINYINS[offset].length && PINYINS[offset][j] != 0; j++) {
            pinyin.append((char) PINYINS[offset][j]);
        }
        token.target = pinyin.toString();
        if (TextUtils.isEmpty(token.target)) {
            token.type = Token.UNKNOWN;
            token.target = token.source;
        }
        return token;
    }

    /**
     * Convert the input to an array of tokens. The sequence of ASCII or Unknown
     * characters without space will be put into a Token, One Hanzi character
     * which has pinyin will be treated as a Token. If there is no China
     * collator, the empty token array is returned.
     */
    public ArrayList<Token> get(final String input) {
        ArrayList<Token> tokens = new ArrayList<Token>();
        if (!mHasChinaCollator || TextUtils.isEmpty(input)) {
            // return empty tokens.
            return tokens;
        }
        final int inputLength = input.length();
        final StringBuilder sb = new StringBuilder();
        int tokenType = Token.LATIN;
        // Go through the input, create a new token when
        // a. Token type changed
        // b. Get the Pinyin of current character.
        // c. current character is space.
        for (int i = 0; i < inputLength; i++) {
            final char character = input.charAt(i);
            if (character == ' ') {
                if (sb.length() > 0) {
                    addToken(sb, tokens, tokenType);
                }
            } else if (character < 256) {
                if (tokenType != Token.LATIN && sb.length() > 0) {
                    addToken(sb, tokens, tokenType);
                }
                tokenType = Token.LATIN;
                sb.append(character);
            } else {
                Token t = getToken(character);
                if (t.type == Token.PINYIN) {
                    if (sb.length() > 0) {
                        addToken(sb, tokens, tokenType);
                    }
                    tokens.add(t);
                    tokenType = Token.PINYIN;
                } else {
                    if (tokenType != t.type && sb.length() > 0) {
                        addToken(sb, tokens, tokenType);
                    }
                    tokenType = t.type;
                    sb.append(character);
                }
            }
        }
        if (sb.length() > 0) {
            addToken(sb, tokens, tokenType);
        }
        return tokens;
    }

    private void addToken(final StringBuilder sb,
            final ArrayList<Token> tokens, final int tokenType) {
        String str = sb.toString();
        tokens.add(new Token(tokenType, str, str));
        sb.setLength(0);
    }
}
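Two details of the class above are easy to miss: each PINYINS row stores a syllable as up to six ASCII codes padded with zeros, and getToken() picks the last UNIHANS entry that does not sort after the input character. The sketch below illustrates both ideas on a tiny made-up table. It compares plain `char` values instead of using the zh_CN Collator, and `PinyinTableDemo`, `BOUNDARIES`, `NAMES`, `decode`, `floorIndex`, and `lookup` are names invented for this example, so it shows the technique rather than the class's exact code path.

```java
// Hypothetical demo class for illustration only.
final class PinyinTableDemo {

    // Toy boundary table, sorted by char value (the real class sorts by zh_CN collation).
    static final char[] BOUNDARIES = { 'a', 'k', 't' };

    // One zero-padded row of ASCII codes per boundary, same layout as PINYINS.
    static final byte[][] NAMES = {
        { 65, 0, 0, 0, 0, 0 },   // "A"
        { 75, 65, 0, 0, 0, 0 },  // "KA"
        { 84, 65, 0, 0, 0, 0 },  // "TA"
    };

    // Decodes a row of ASCII codes, stopping at the first zero or the end of the row.
    static String decode(byte[] row) {
        StringBuilder sb = new StringBuilder();
        for (int j = 0; j < row.length && row[j] != 0; j++) {
            sb.append((char) row[j]);
        }
        return sb.toString();
    }

    // Binary search for the greatest index whose boundary is <= c; returns -1 if there is none.
    static int floorIndex(char c) {
        int lo = 0;
        int hi = BOUNDARIES.length - 1;
        int result = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (BOUNDARIES[mid] <= c) {
                result = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return result;
    }

    // Maps a character to the decoded name of its group, or "" when it sorts before every boundary.
    static String lookup(char c) {
        int i = floorIndex(c);
        return i < 0 ? "" : decode(NAMES[i]);
    }

    public static void main(String[] args) {
        // Row copied from the PINYINS table above.
        System.out.println(decode(new byte[] { 90, 72, 65, 78, 71, 0 }));
        System.out.println(lookup('m'));
    }
}
```

Storing only the first character of each pinyin group keeps the table small; the price is that a character with several readings gets whichever reading its collation position implies, which is the accuracy trade-off the class comment mentions.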


Write a MainActivity.java to test the Hanzi-to-pinyin output:

package zhangphil.hanyupinyin;

import java.util.ArrayList;

import zhangphil.hanyupinyin.HanziToPinyin.Token;
import android.app.Activity;
import android.os.Bundle;

public class MainActivity extends Activity {

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);

        String s = "安卓";
        System.out.println("Hanzi-to-pinyin output: " + getPinYin(s));
    }

    // General-purpose helper: takes Hanzi input and returns its pinyin.
    public static String getPinYin(String hanzi) {
        ArrayList<Token> tokens = HanziToPinyin.getInstance().get(hanzi);
        StringBuilder sb = new StringBuilder();
        if (tokens != null && tokens.size() > 0) {
            for (Token token : tokens) {
                if (Token.PINYIN == token.type) {
                    sb.append(token.target);
                } else {
                    sb.append(token.source);
                }
            }
        }
        return sb.toString().toUpperCase();
    }
}

The output is shown in the figure: [figure: screenshot of the printed result]
