This article is reposted from: http://blog.csdn.net/zhangphil/article/details/47164665

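Android itself ships with a class and methods for converting Chinese characters into pinyin. The class is HanziToPinyin.java. The contacts feature built into Android uses HanziToPinyin.java to group and organize Chinese contacts. HanziToPinyin.java converts Chinese characters to pinyin output, which is essential in some applications. Take contact grouping: if an address book holds several contacts with the surname 张 (ZHANG), all of them should be grouped under "Z". Likewise, in social apps such as WeChat and QQ, any scenario that groups or sorts contacts or friends needs to convert Chinese characters to pinyin and then sort and classify them by first letter.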
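The grouping described above can be sketched in plain Java. This is only an illustration, not code from the original post: the class name ContactGrouper and the toPinyin parameter are hypothetical, and toPinyin stands in for whatever converter is available (for example a wrapper around HanziToPinyin). Names whose pinyin does not start with A to Z are placed under "#", which is an assumption about how such entries should be handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

public class ContactGrouper {

    /**
     * Groups names by the first letter of their pinyin.
     * toPinyin is a hypothetical converter supplied by the caller.
     */
    public static Map<Character, List<String>> groupByInitial(
            List<String> names, Function<String, String> toPinyin) {
        // TreeMap keeps the groups ordered by key ('#' sorts before 'A'..'Z').
        Map<Character, List<String>> groups = new TreeMap<>();
        for (String name : names) {
            String pinyin = toPinyin.apply(name);
            char key = '#'; // fallback group for anything without a Latin initial
            if (pinyin != null && !pinyin.isEmpty()) {
                char first = Character.toUpperCase(pinyin.charAt(0));
                if (first >= 'A' && first <= 'Z') {
                    key = first;
                }
            }
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(name);
        }
        return groups;
    }
}
```

Within each group the names keep their input order; sorting inside a group would be a separate step.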
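HanziToPinyin.java is not a public class. It is a private class that Google uses internally to implement the Android contacts feature, so we cannot use it the way we use an ordinary Android SDK API. That is not a problem: we can copy the class file into our own project and use it directly.
The HanziToPinyin.java source file is located in Google's official contacts app, at: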

packages/providers/ContactsProvider/src/com/android/providers/contacts/HanziToPinyin.java

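Copies of this HanziToPinyin.java class file can also be found in projects online. However, using the class as-is does not work properly. The error reported is: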

"There is no Chinese collator, HanziToPinyin is disabled"

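The code that produces this error is in the following method of HanziToPinyin.java:
public static HanziToPinyin getInstance();
The specific cause is that some customized (non-stock) Android systems define the Chinese Locale differently, so locale[i].equals(Locale.CHINA) in the original code returns false for every available locale. The Chinese collator is not recognized, and all of the subsequent code is disabled.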
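Why the comparison can fail is easier to see in isolation. The sketch below (the class name LocaleEqualityDemo is made up for illustration) relies only on the documented behaviour of java.util.Locale: equals compares language, country and variant together, so a language-only "zh" locale is not equal to Locale.CHINA (zh_CN) even though both are Chinese. Locale.CHINESE is used here as the language-only "zh" locale; whether a particular device reports "zh", "zh_CN" or something else from Collator.getAvailableLocales() is device-specific and not shown here.

```java
import java.util.Locale;

public class LocaleEqualityDemo {

    /** True only when the locale matches Locale.CHINA (zh_CN) exactly. */
    public static boolean isExactlyChina(Locale locale) {
        return Locale.CHINA.equals(locale);
    }

    /** True when the two locales share the same language code. */
    public static boolean sameLanguage(Locale a, Locale b) {
        return a.getLanguage().equals(b.getLanguage());
    }

    public static void main(String[] args) {
        Locale languageOnly = Locale.CHINESE; // language "zh", no country

        System.out.println(isExactlyChina(Locale.CHINA));            // exact match
        System.out.println(isExactlyChina(languageOnly));            // no country, so not equal
        System.out.println(sameLanguage(languageOnly, Locale.CHINA)); // both are "zh"
    }
}
```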

Fix (solution) for this problem

I improved the condition by adding some code:
final Locale chinaAddition = new Locale("zh");
and added chinaAddition to the condition as a supplementary check:

if (locale[i].equals(Locale.CHINA) || locale[i].equals(chinaAddition)) {

}

The complete code of my improved getInstance() method is as follows:

public static HanziToPinyin getInstance() {
    synchronized (HanziToPinyin.class) {
        if (sInstance != null) {
            return sInstance;
        }
        // Check if zh_CN collation data is available
        final Locale locale[] = Collator.getAvailableLocales();

        // Added code (enhancement): also accept the language-only "zh" locale.
        final Locale chinaAddition = new Locale("zh");

        for (int i = 0; i < locale.length; i++) {
            if (locale[i].equals(Locale.CHINA)
                    || locale[i].equals(chinaAddition)) {
                // Do self validation just once.
                if (DEBUG) {
                    Log.d(TAG, "Self validation. Result: "
                            + doSelfValidation());
                }
                sInstance = new HanziToPinyin(true);
                return sInstance;
            }
        }
        Log.w(TAG,
                "There is no Chinese collator, HanziToPinyin is disabled");
        sInstance = new HanziToPinyin(false);
        return sInstance;
    }
}
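A looser variant of the same idea is to accept any available locale whose language code is "zh". This is an alternative sketch, not the check used in the post: the class and method names are hypothetical, and it also accepts locales such as zh_TW, whose collation rules may differ from the zh_CN data the table was built against, so it should be treated with caution.

```java
import java.util.Locale;

public class ChineseCollatorCheck {

    /**
     * Returns true if any of the given locales has the language code "zh".
     * Broader than comparing against Locale.CHINA: zh, zh_CN, zh_TW all match.
     */
    public static boolean hasChineseLocale(Locale[] available) {
        for (Locale locale : available) {
            if ("zh".equals(locale.getLanguage())) {
                return true;
            }
        }
        return false;
    }
}
```

Inside getInstance() this could be called as hasChineseLocale(Collator.getAvailableLocales()) in place of the loop, if the broader match is acceptable.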

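With this enhancement applied, the complete source code of HanziToPinyin.java is as follows (the code can be copied into your own project and used directly):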

/*
 * Copyright (C) 2011 The Android Open Source Project
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package zhangphil.hanyupinyin;

import android.text.TextUtils;
import android.util.Log;

import java.text.Collator;
import java.util.ArrayList;
import java.util.Locale;

/**
 * An object to convert Chinese character to its corresponding pinyin string.
 * For characters with multiple possible pinyin string, only one is selected
 * according to collator. Polyphone is not supported in this implementation.
 * This class is implemented to achieve the best runtime performance and minimum
 * runtime resources with tolerable sacrifice of accuracy. This implementation
 * highly depends on zh_CN ICU collation data and must be always synchronized
 * with ICU.
 *
 * Currently this file is aligned to zh.txt in ICU 4.6 (from the Android 4.2 source)
 */
public class HanziToPinyin {
    private static final String TAG = "HanziToPinyin";

    // Turn on this flag when we want to check internal data structure.
    private static final boolean DEBUG = false;

    /**
     * Unihans array.
     *
     * Each unihans is the first one within same pinyin when collator is zh_CN.
     */
    public static final char[] UNIHANS = { '\u963f', '\u54ce', '\u5b89',
            '\u80ae', '\u51f9', '\u516b', '\u6300', '\u6273', '\u90a6',
            '\u52f9', '\u9642', '\u5954', '\u4f3b', '\u5c44', '\u8fb9',
            '\u706c', '\u618b', '\u6c43', '\u51ab', '\u7676', '\u5cec',
            '\u5693', '\u5072', '\u53c2', '\u4ed3', '\u64a1', '\u518a',
            '\u5d7e', '\u66fd', '\u66fe', '\u5c64', '\u53c9', '\u8286',
            '\u8fbf', '\u4f25', '\u6284', '\u8f66', '\u62bb', '\u6c88',
            '\u6c89', '\u9637', '\u5403', '\u5145', '\u62bd', '\u51fa',
            '\u6b3b', '\u63e3', '\u5ddb', '\u5205', '\u5439', '\u65fe',
            '\u9034', '\u5472', '\u5306', '\u51d1', '\u7c97', '\u6c46',
            '\u5d14', '\u90a8', '\u6413', '\u5491', '\u5446', '\u4e39',
            '\u5f53', '\u5200', '\u561a', '\u6265', '\u706f', '\u6c10',
            '\u55f2', '\u7538', '\u5201', '\u7239', '\u4e01', '\u4e1f',
            '\u4e1c', '\u543a', '\u53be', '\u8011', '\u8968', '\u5428',
            '\u591a', '\u59b8', '\u8bf6', '\u5940', '\u97a5', '\u513f',
            '\u53d1', '\u5e06', '\u531a', '\u98de', '\u5206', '\u4e30',
            '\u8985', '\u4ecf', '\u7d11', '\u4f15', '\u65ee', '\u4f85',
            '\u7518', '\u5188', '\u768b', '\u6208', '\u7ed9', '\u6839',
            '\u522f', '\u5de5', '\u52fe', '\u4f30', '\u74dc', '\u4e56',
            '\u5173', '\u5149', '\u5f52', '\u4e28', '\u5459', '\u54c8',
            '\u548d', '\u4f44', '\u592f', '\u8320', '\u8bc3', '\u9ed2',
            '\u62eb', '\u4ea8', '\u5677', '\u53ff', '\u9f41', '\u4e6f',
            '\u82b1', '\u6000', '\u72bf', '\u5ddf', '\u7070', '\u660f',
            '\u5419', '\u4e0c', '\u52a0', '\u620b', '\u6c5f', '\u827d',
            '\u9636', '\u5dfe', '\u5755', '\u5182', '\u4e29', '\u51e5',
            '\u59e2', '\u5658', '\u519b', '\u5494', '\u5f00', '\u520a',
            '\u5ffc', '\u5c3b', '\u533c', '\u808e', '\u52a5', '\u7a7a',
            '\u62a0', '\u625d', '\u5938', '\u84af', '\u5bbd', '\u5321',
            '\u4e8f', '\u5764', '\u6269', '\u5783', '\u6765', '\u5170',
            '\u5577', '\u635e', '\u808b', '\u52d2', '\u5d1a', '\u5215',
            '\u4fe9', '\u5941', '\u826f', '\u64a9', '\u5217', '\u62ce',
            '\u5222', '\u6e9c', '\u56d6', '\u9f99', '\u779c', '\u565c',
            '\u5a08', '\u7567', '\u62a1', '\u7f57', '\u5463', '\u5988',
            '\u57cb', '\u5ada', '\u7264', '\u732b', '\u4e48', '\u5445',
            '\u95e8', '\u753f', '\u54aa', '\u5b80', '\u55b5', '\u4e5c',
            '\u6c11', '\u540d', '\u8c2c', '\u6478', '\u54de', '\u6bea',
            '\u55ef', '\u62cf', '\u8149', '\u56e1', '\u56d4', '\u5b6c',
            '\u7592', '\u5a1e', '\u6041', '\u80fd', '\u59ae', '\u62c8',
            '\u5b22', '\u9e1f', '\u634f', '\u56dc', '\u5b81', '\u599e',
            '\u519c', '\u7fba', '\u5974', '\u597b', '\u759f', '\u9ec1',
            '\u90cd', '\u5594', '\u8bb4', '\u5991', '\u62cd', '\u7705',
            '\u4e53', '\u629b', '\u5478', '\u55b7', '\u5309', '\u4e15',
            '\u56e8', '\u527d', '\u6c15', '\u59d8', '\u4e52', '\u948b',
            '\u5256', '\u4ec6', '\u4e03', '\u6390', '\u5343', '\u545b',
            '\u6084', '\u767f', '\u4eb2', '\u72c5', '\u828e', '\u4e18',
            '\u533a', '\u5cd1', '\u7f3a', '\u590b', '\u5465', '\u7a63',
            '\u5a06', '\u60f9', '\u4eba', '\u6254', '\u65e5', '\u8338',
            '\u53b9', '\u909a', '\u633c', '\u5827', '\u5a51', '\u77a4',
            '\u637c', '\u4ee8', '\u6be2', '\u4e09', '\u6852', '\u63bb',
            '\u95aa', '\u68ee', '\u50e7', '\u6740', '\u7b5b', '\u5c71',
            '\u4f24', '\u5f30', '\u5962', '\u7533', '\u8398', '\u6552',
            '\u5347', '\u5c38', '\u53ce', '\u4e66', '\u5237', '\u8870',
            '\u95e9', '\u53cc', '\u8c01', '\u542e', '\u8bf4', '\u53b6',
            '\u5fea', '\u635c', '\u82cf', '\u72fb', '\u590a', '\u5b59',
            '\u5506', '\u4ed6', '\u56fc', '\u574d', '\u6c64', '\u5932',
            '\u5fd1', '\u71a5', '\u5254', '\u5929', '\u65eb', '\u5e16',
            '\u5385', '\u56f2', '\u5077', '\u51f8', '\u6e4d', '\u63a8',
            '\u541e', '\u4e47', '\u7a75', '\u6b6a', '\u5f2f', '\u5c23',
            '\u5371', '\u6637', '\u7fc1', '\u631d', '\u4e4c', '\u5915',
            '\u8672', '\u4eda', '\u4e61', '\u7071', '\u4e9b', '\u5fc3',
            '\u661f', '\u51f6', '\u4f11', '\u5401', '\u5405', '\u524a',
            '\u5743', '\u4e2b', '\u6079', '\u592e', '\u5e7a', '\u503b',
            '\u4e00', '\u56d9', '\u5e94', '\u54df', '\u4f63', '\u4f18',
            '\u625c', '\u56e6', '\u66f0', '\u6655', '\u7b60', '\u7b7c',
            '\u5e00', '\u707d', '\u5142', '\u5328', '\u50ae', '\u5219',
            '\u8d3c', '\u600e', '\u5897', '\u624e', '\u635a', '\u6cbe',
            '\u5f20', '\u957f', '\u9577', '\u4f4b', '\u8707', '\u8d1e',
            '\u4e89', '\u4e4b', '\u5cd9', '\u5ea2', '\u4e2d', '\u5dde',
            '\u6731', '\u6293', '\u62fd', '\u4e13', '\u5986', '\u96b9',
            '\u5b92', '\u5353', '\u4e72', '\u5b97', '\u90b9', '\u79df',
            '\u94bb', '\u539c', '\u5c0a', '\u6628', '\u5159', '\u9fc3',
            '\u9fc4', };

    /**
     * Pinyin array.
     *
     * Each pinyin is corresponding to unihans of same offset in the unihans
     * array.
     */
    public static final byte[][] PINYINS = { { 65, 0, 0, 0, 0, 0 },
            { 65, 73, 0, 0, 0, 0 }, { 65, 78, 0, 0, 0, 0 },
            { 65, 78, 71, 0, 0, 0 }, { 65, 79, 0, 0, 0, 0 },
            { 66, 65, 0, 0, 0, 0 }, { 66, 65, 73, 0, 0, 0 },
            { 66, 65, 78, 0, 0, 0 }, { 66, 65, 78, 71, 0, 0 },
            { 66, 65, 79, 0, 0, 0 }, { 66, 69, 73, 0, 0, 0 },
            { 66, 69, 78, 0, 0, 0 }, { 66, 69, 78, 71, 0, 0 },
            { 66, 73, 0, 0, 0, 0 }, { 66, 73, 65, 78, 0, 0 },
            { 66, 73, 65, 79, 0, 0 }, { 66, 73, 69, 0, 0, 0 },
            { 66, 73, 78, 0, 0, 0 }, { 66, 73, 78, 71, 0, 0 },
            { 66, 79, 0, 0, 0, 0 }, { 66, 85, 0, 0, 0, 0 },
            { 67, 65, 0, 0, 0, 0 }, { 67, 65, 73, 0, 0, 0 },
            { 67, 65, 78, 0, 0, 0 }, { 67, 65, 78, 71, 0, 0 },
            { 67, 65, 79, 0, 0, 0 }, { 67, 69, 0, 0, 0, 0 },
            { 67, 69, 78, 0, 0, 0 }, { 67, 69, 78, 71, 0, 0 },
            { 90, 69, 78, 71, 0, 0 }, { 67, 69, 78, 71, 0, 0 },
            { 67, 72, 65, 0, 0, 0 }, { 67, 72, 65, 73, 0, 0 },
            { 67, 72, 65, 78, 0, 0 }, { 67, 72, 65, 78, 71, 0 },
            { 67, 72, 65, 79, 0, 0 }, { 67, 72, 69, 0, 0, 0 },
            { 67, 72, 69, 78, 0, 0 }, { 83, 72, 69, 78, 0, 0 },
            { 67, 72, 69, 78, 0, 0 }, { 67, 72, 69, 78, 71, 0 },
            { 67, 72, 73, 0, 0, 0 }, { 67, 72, 79, 78, 71, 0 },
            { 67, 72, 79, 85, 0, 0 }, { 67, 72, 85, 0, 0, 0 },
            { 67, 72, 85, 65, 0, 0 }, { 67, 72, 85, 65, 73, 0 },
            { 67, 72, 85, 65, 78, 0 }, { 67, 72, 85, 65, 78, 71 },
            { 67, 72, 85, 73, 0, 0 }, { 67, 72, 85, 78, 0, 0 },
            { 67, 72, 85, 79, 0, 0 }, { 67, 73, 0, 0, 0, 0 },
            { 67, 79, 78, 71, 0, 0 }, { 67, 79, 85, 0, 0, 0 },
            { 67, 85, 0, 0, 0, 0 }, { 67, 85, 65, 78, 0, 0 },
            { 67, 85, 73, 0, 0, 0 }, { 67, 85, 78, 0, 0, 0 },
            { 67, 85, 79, 0, 0, 0 }, { 68, 65, 0, 0, 0, 0 },
            { 68, 65, 73, 0, 0, 0 }, { 68, 65, 78, 0, 0, 0 },
            { 68, 65, 78, 71, 0, 0 }, { 68, 65, 79, 0, 0, 0 },
            { 68, 69, 0, 0, 0, 0 }, { 68, 69, 78, 0, 0, 0 },
            { 68, 69, 78, 71, 0, 0 }, { 68, 73, 0, 0, 0, 0 },
            { 68, 73, 65, 0, 0, 0 }, { 68, 73, 65, 78, 0, 0 },
            { 68, 73, 65, 79, 0, 0 }, { 68, 73, 69, 0, 0, 0 },
            { 68, 73, 78, 71, 0, 0 }, { 68, 73, 85, 0, 0, 0 },
            { 68, 79, 78, 71, 0, 0 }, { 68, 79, 85, 0, 0, 0 },
            { 68, 85, 0, 0, 0, 0 }, { 68, 85, 65, 78, 0, 0 },
            { 68, 85, 73, 0, 0, 0 }, { 68, 85, 78, 0, 0, 0 },
            { 68, 85, 79, 0, 0, 0 }, { 69, 0, 0, 0, 0, 0 },
            { 69, 73, 0, 0, 0, 0 }, { 69, 78, 0, 0, 0, 0 },
            { 69, 78, 71, 0, 0, 0 }, { 69, 82, 0, 0, 0, 0 },
            { 70, 65, 0, 0, 0, 0 }, { 70, 65, 78, 0, 0, 0 },
            { 70, 65, 78, 71, 0, 0 }, { 70, 69, 73, 0, 0, 0 },
            { 70, 69, 78, 0, 0, 0 }, { 70, 69, 78, 71, 0, 0 },
            { 70, 73, 65, 79, 0, 0 }, { 70, 79, 0, 0, 0, 0 },
            { 70, 79, 85, 0, 0, 0 }, { 70, 85, 0, 0, 0, 0 },
            { 71, 65, 0, 0, 0, 0 }, { 71, 65, 73, 0, 0, 0 },
            { 71, 65, 78, 0, 0, 0 }, { 71, 65, 78, 71, 0, 0 },
            { 71, 65, 79, 0, 0, 0 }, { 71, 69, 0, 0, 0, 0 },
            { 71, 69, 73, 0, 0, 0 }, { 71, 69, 78, 0, 0, 0 },
            { 71, 69, 78, 71, 0, 0 }, { 71, 79, 78, 71, 0, 0 },
            { 71, 79, 85, 0, 0, 0 }, { 71, 85, 0, 0, 0, 0 },
            { 71, 85, 65, 0, 0, 0 }, { 71, 85, 65, 73, 0, 0 },
            { 71, 85, 65, 78, 0, 0 }, { 71, 85, 65, 78, 71, 0 },
            { 71, 85, 73, 0, 0, 0 }, { 71, 85, 78, 0, 0, 0 },
            { 71, 85, 79, 0, 0, 0 }, { 72, 65, 0, 0, 0, 0 },
            { 72, 65, 73, 0, 0, 0 }, { 72, 65, 78, 0, 0, 0 },
            { 72, 65, 78, 71, 0, 0 }, { 72, 65, 79, 0, 0, 0 },
            { 72, 69, 0, 0, 0, 0 }, { 72, 69, 73, 0, 0, 0 },
            { 72, 69, 78, 0, 0, 0 }, { 72, 69, 78, 71, 0, 0 },
            { 72, 77, 0, 0, 0, 0 }, { 72, 79, 78, 71, 0, 0 },
            { 72, 79, 85, 0, 0, 0 }, { 72, 85, 0, 0, 0, 0 },
            { 72, 85, 65, 0, 0, 0 }, { 72, 85, 65, 73, 0, 0 },
            { 72, 85, 65, 78, 0, 0 }, { 72, 85, 65, 78, 71, 0 },
            { 72, 85, 73, 0, 0, 0 }, { 72, 85, 78, 0, 0, 0 },
            { 72, 85, 79, 0, 0, 0 }, { 74, 73, 0, 0, 0, 0 },
            { 74, 73, 65, 0, 0, 0 }, { 74, 73, 65, 78, 0, 0 },
            { 74, 73, 65, 78, 71, 0 }, { 74, 73, 65, 79, 0, 0 },
            { 74, 73, 69, 0, 0, 0 }, { 74, 73, 78, 0, 0, 0 },
            { 74, 73, 78, 71, 0, 0 }, { 74, 73, 79, 78, 71, 0 },
            { 74, 73, 85, 0, 0, 0 }, { 74, 85, 0, 0, 0, 0 },
            { 74, 85, 65, 78, 0, 0 }, { 74, 85, 69, 0, 0, 0 },
            { 74, 85, 78, 0, 0, 0 }, { 75, 65, 0, 0, 0, 0 },
            { 75, 65, 73, 0, 0, 0 }, { 75, 65, 78, 0, 0, 0 },
            { 75, 65, 78, 71, 0, 0 }, { 75, 65, 79, 0, 0, 0 },
            { 75, 69, 0, 0, 0, 0 }, { 75, 69, 78, 0, 0, 0 },
            { 75, 69, 78, 71, 0, 0 }, { 75, 79, 78, 71, 0, 0 },
            { 75, 79, 85, 0, 0, 0 }, { 75, 85, 0, 0, 0, 0 },
            { 75, 85, 65, 0, 0, 0 }, { 75, 85, 65, 73, 0, 0 },
            { 75, 85, 65, 78, 0, 0 }, { 75, 85, 65, 78, 71, 0 },
            { 75, 85, 73, 0, 0, 0 }, { 75, 85, 78, 0, 0, 0 },
            { 75, 85, 79, 0, 0, 0 }, { 76, 65, 0, 0, 0, 0 },
            { 76, 65, 73, 0, 0, 0 }, { 76, 65, 78, 0, 0, 0 },
            { 76, 65, 78, 71, 0, 0 }, { 76, 65, 79, 0, 0, 0 },
            { 76, 69, 0, 0, 0, 0 }, { 76, 69, 73, 0, 0, 0 },
            { 76, 69, 78, 71, 0, 0 }, { 76, 73, 0, 0, 0, 0 },
            { 76, 73, 65, 0, 0, 0 }, { 76, 73, 65, 78, 0, 0 },
            { 76, 73, 65, 78, 71, 0 }, { 76, 73, 65, 79, 0, 0 },
            { 76, 73, 69, 0, 0, 0 }, { 76, 73, 78, 0, 0, 0 },
            { 76, 73, 78, 71, 0, 0 }, { 76, 73, 85, 0, 0, 0 },
            { 76, 79, 0, 0, 0, 0 }, { 76, 79, 78, 71, 0, 0 },
            { 76, 79, 85, 0, 0, 0 }, { 76, 85, 0, 0, 0, 0 },
            { 76, 85, 65, 78, 0, 0 }, { 76, 85, 69, 0, 0, 0 },
            { 76, 85, 78, 0, 0, 0 }, { 76, 85, 79, 0, 0, 0 },
            { 77, 0, 0, 0, 0, 0 }, { 77, 65, 0, 0, 0, 0 },
            { 77, 65, 73, 0, 0, 0 }, { 77, 65, 78, 0, 0, 0 },
            { 77, 65, 78, 71, 0, 0 }, { 77, 65, 79, 0, 0, 0 },
            { 77, 69, 0, 0, 0, 0 }, { 77, 69, 73, 0, 0, 0 },
            { 77, 69, 78, 0, 0, 0 }, { 77, 69, 78, 71, 0, 0 },
            { 77, 73, 0, 0, 0, 0 }, { 77, 73, 65, 78, 0, 0 },
            { 77, 73, 65, 79, 0, 0 }, { 77, 73, 69, 0, 0, 0 },
            { 77, 73, 78, 0, 0, 0 }, { 77, 73, 78, 71, 0, 0 },
            { 77, 73, 85, 0, 0, 0 }, { 77, 79, 0, 0, 0, 0 },
            { 77, 79, 85, 0, 0, 0 }, { 77, 85, 0, 0, 0, 0 },
            { 78, 0, 0, 0, 0, 0 }, { 78, 65, 0, 0, 0, 0 },
            { 78, 65, 73, 0, 0, 0 }, { 78, 65, 78, 0, 0, 0 },
            { 78, 65, 78, 71, 0, 0 }, { 78, 65, 79, 0, 0, 0 },
            { 78, 69, 0, 0, 0, 0 }, { 78, 69, 73, 0, 0, 0 },
            { 78, 69, 78, 0, 0, 0 }, { 78, 69, 78, 71, 0, 0 },
            { 78, 73, 0, 0, 0, 0 }, { 78, 73, 65, 78, 0, 0 },
            { 78, 73, 65, 78, 71, 0 }, { 78, 73, 65, 79, 0, 0 },
            { 78, 73, 69, 0, 0, 0 }, { 78, 73, 78, 0, 0, 0 },
            { 78, 73, 78, 71, 0, 0 }, { 78, 73, 85, 0, 0, 0 },
            { 78, 79, 78, 71, 0, 0 }, { 78, 79, 85, 0, 0, 0 },
            { 78, 85, 0, 0, 0, 0 }, { 78, 85, 65, 78, 0, 0 },
            { 78, 85, 69, 0, 0, 0 }, { 78, 85, 78, 0, 0, 0 },
            { 78, 85, 79, 0, 0, 0 }, { 79, 0, 0, 0, 0, 0 },
            { 79, 85, 0, 0, 0, 0 }, { 80, 65, 0, 0, 0, 0 },
            { 80, 65, 73, 0, 0, 0 }, { 80, 65, 78, 0, 0, 0 },
            { 80, 65, 78, 71, 0, 0 }, { 80, 65, 79, 0, 0, 0 },
            { 80, 69, 73, 0, 0, 0 }, { 80, 69, 78, 0, 0, 0 },
            { 80, 69, 78, 71, 0, 0 }, { 80, 73, 0, 0, 0, 0 },
            { 80, 73, 65, 78, 0, 0 }, { 80, 73, 65, 79, 0, 0 },
            { 80, 73, 69, 0, 0, 0 }, { 80, 73, 78, 0, 0, 0 },
            { 80, 73, 78, 71, 0, 0 }, { 80, 79, 0, 0, 0, 0 },
            { 80, 79, 85, 0, 0, 0 }, { 80, 85, 0, 0, 0, 0 },
            { 81, 73, 0, 0, 0, 0 }, { 81, 73, 65, 0, 0, 0 },
            { 81, 73, 65, 78, 0, 0 }, { 81, 73, 65, 78, 71, 0 },
            { 81, 73, 65, 79, 0, 0 }, { 81, 73, 69, 0, 0, 0 },
            { 81, 73, 78, 0, 0, 0 }, { 81, 73, 78, 71, 0, 0 },
            { 81, 73, 79, 78, 71, 0 }, { 81, 73, 85, 0, 0, 0 },
            { 81, 85, 0, 0, 0, 0 }, { 81, 85, 65, 78, 0, 0 },
            { 81, 85, 69, 0, 0, 0 }, { 81, 85, 78, 0, 0, 0 },
            { 82, 65, 78, 0, 0, 0 }, { 82, 65, 78, 71, 0, 0 },
            { 82, 65, 79, 0, 0, 0 }, { 82, 69, 0, 0, 0, 0 },
            { 82, 69, 78, 0, 0, 0 }, { 82, 69, 78, 71, 0, 0 },
            { 82, 73, 0, 0, 0, 0 }, { 82, 79, 78, 71, 0, 0 },
            { 82, 79, 85, 0, 0, 0 }, { 82, 85, 0, 0, 0, 0 },
            { 82, 85, 65, 0, 0, 0 }, { 82, 85, 65, 78, 0, 0 },
            { 82, 85, 73, 0, 0, 0 }, { 82, 85, 78, 0, 0, 0 },
            { 82, 85, 79, 0, 0, 0 }, { 83, 65, 0, 0, 0, 0 },
            { 83, 65, 73, 0, 0, 0 }, { 83, 65, 78, 0, 0, 0 },
            { 83, 65, 78, 71, 0, 0 }, { 83, 65, 79, 0, 0, 0 },
            { 83, 69, 0, 0, 0, 0 }, { 83, 69, 78, 0, 0, 0 },
            { 83, 69, 78, 71, 0, 0 }, { 83, 72, 65, 0, 0, 0 },
            { 83, 72, 65, 73, 0, 0 }, { 83, 72, 65, 78, 0, 0 },
            { 83, 72, 65, 78, 71, 0 }, { 83, 72, 65, 79, 0, 0 },
            { 83, 72, 69, 0, 0, 0 }, { 83, 72, 69, 78, 0, 0 },
            { 88, 73, 78, 0, 0, 0 }, { 83, 72, 69, 78, 0, 0 },
            { 83, 72, 69, 78, 71, 0 }, { 83, 72, 73, 0, 0, 0 },
            { 83, 72, 79, 85, 0, 0 }, { 83, 72, 85, 0, 0, 0 },
            { 83, 72, 85, 65, 0, 0 }, { 83, 72, 85, 65, 73, 0 },
            { 83, 72, 85, 65, 78, 0 }, { 83, 72, 85, 65, 78, 71 },
            { 83, 72, 85, 73, 0, 0 }, { 83, 72, 85, 78, 0, 0 },
            { 83, 72, 85, 79, 0, 0 }, { 83, 73, 0, 0, 0, 0 },
            { 83, 79, 78, 71, 0, 0 }, { 83, 79, 85, 0, 0, 0 },
            { 83, 85, 0, 0, 0, 0 }, { 83, 85, 65, 78, 0, 0 },
            { 83, 85, 73, 0, 0, 0 }, { 83, 85, 78, 0, 0, 0 },
            { 83, 85, 79, 0, 0, 0 }, { 84, 65, 0, 0, 0, 0 },
            { 84, 65, 73, 0, 0, 0 }, { 84, 65, 78, 0, 0, 0 },
            { 84, 65, 78, 71, 0, 0 }, { 84, 65, 79, 0, 0, 0 },
            { 84, 69, 0, 0, 0, 0 }, { 84, 69, 78, 71, 0, 0 },
            { 84, 73, 0, 0, 0, 0 }, { 84, 73, 65, 78, 0, 0 },
            { 84, 73, 65, 79, 0, 0 }, { 84, 73, 69, 0, 0, 0 },
            { 84, 73, 78, 71, 0, 0 }, { 84, 79, 78, 71, 0, 0 },
            { 84, 79, 85, 0, 0, 0 }, { 84, 85, 0, 0, 0, 0 },
            { 84, 85, 65, 78, 0, 0 }, { 84, 85, 73, 0, 0, 0 },
            { 84, 85, 78, 0, 0, 0 }, { 84, 85, 79, 0, 0, 0 },
            { 87, 65, 0, 0, 0, 0 }, { 87, 65, 73, 0, 0, 0 },
            { 87, 65, 78, 0, 0, 0 }, { 87, 65, 78, 71, 0, 0 },
            { 87, 69, 73, 0, 0, 0 }, { 87, 69, 78, 0, 0, 0 },
            { 87, 69, 78, 71, 0, 0 }, { 87, 79, 0, 0, 0, 0 },
            { 87, 85, 0, 0, 0, 0 }, { 88, 73, 0, 0, 0, 0 },
            { 88, 73, 65, 0, 0, 0 }, { 88, 73, 65, 78, 0, 0 },
            { 88, 73, 65, 78, 71, 0 }, { 88, 73, 65, 79, 0, 0 },
            { 88, 73, 69, 0, 0, 0 }, { 88, 73, 78, 0, 0, 0 },
            { 88, 73, 78, 71, 0, 0 }, { 88, 73, 79, 78, 71, 0 },
            { 88, 73, 85, 0, 0, 0 }, { 88, 85, 0, 0, 0, 0 },
            { 88, 85, 65, 78, 0, 0 }, { 88, 85, 69, 0, 0, 0 },
            { 88, 85, 78, 0, 0, 0 }, { 89, 65, 0, 0, 0, 0 },
            { 89, 65, 78, 0, 0, 0 }, { 89, 65, 78, 71, 0, 0 },
            { 89, 65, 79, 0, 0, 0 }, { 89, 69, 0, 0, 0, 0 },
            { 89, 73, 0, 0, 0, 0 }, { 89, 73, 78, 0, 0, 0 },
            { 89, 73, 78, 71, 0, 0 }, { 89, 79, 0, 0, 0, 0 },
            { 89, 79, 78, 71, 0, 0 }, { 89, 79, 85, 0, 0, 0 },
            { 89, 85, 0, 0, 0, 0 }, { 89, 85, 65, 78, 0, 0 },
            { 89, 85, 69, 0, 0, 0 }, { 89, 85, 78, 0, 0, 0 },
            { 74, 85, 78, 0, 0, 0 }, { 89, 85, 78, 0, 0, 0 },
            { 90, 65, 0, 0, 0, 0 }, { 90, 65, 73, 0, 0, 0 },
            { 90, 65, 78, 0, 0, 0 }, { 90, 65, 78, 71, 0, 0 },
            { 90, 65, 79, 0, 0, 0 }, { 90, 69, 0, 0, 0, 0 },
            { 90, 69, 73, 0, 0, 0 }, { 90, 69, 78, 0, 0, 0 },
            { 90, 69, 78, 71, 0, 0 }, { 90, 72, 65, 0, 0, 0 },
            { 90, 72, 65, 73, 0, 0 }, { 90, 72, 65, 78, 0, 0 },
            { 90, 72, 65, 78, 71, 0 }, { 67, 72, 65, 78, 71, 0 },
            { 90, 72, 65, 78, 71, 0 }, { 90, 72, 65, 79, 0, 0 },
            { 90, 72, 69, 0, 0, 0 }, { 90, 72, 69, 78, 0, 0 },
            { 90, 72, 69, 78, 71, 0 }, { 90, 72, 73, 0, 0, 0 },
            { 83, 72, 73, 0, 0, 0 }, { 90, 72, 73, 0, 0, 0 },
            { 90, 72, 79, 78, 71, 0 }, { 90, 72, 79, 85, 0, 0 },
            { 90, 72, 85, 0, 0, 0 }, { 90, 72, 85, 65, 0, 0 },
            { 90, 72, 85, 65, 73, 0 }, { 90, 72, 85, 65, 78, 0 },
            { 90, 72, 85, 65, 78, 71 }, { 90, 72, 85, 73, 0, 0 },
            { 90, 72, 85, 78, 0, 0 }, { 90, 72, 85, 79, 0, 0 },
            { 90, 73, 0, 0, 0, 0 }, { 90, 79, 78, 71, 0, 0 },
            { 90, 79, 85, 0, 0, 0 }, { 90, 85, 0, 0, 0, 0 },
            { 90, 85, 65, 78, 0, 0 }, { 90, 85, 73, 0, 0, 0 },
            { 90, 85, 78, 0, 0, 0 }, { 90, 85, 79, 0, 0, 0 },
            { 0, 0, 0, 0, 0, 0 }, { 83, 72, 65, 78, 0, 0 },
            { 0, 0, 0, 0, 0, 0 }, };

    /**
     * First and last Chinese character with known Pinyin according to zh
     * collation
     */
    private static final String FIRST_PINYIN_UNIHAN = "\u963F";
    private static final String LAST_PINYIN_UNIHAN = "\u9FFF";

    private static final Collator COLLATOR = Collator.getInstance(Locale.CHINA);

    private static HanziToPinyin sInstance;
    private final boolean mHasChinaCollator;

    public static class Token {
        /**
         * Separator between target string for each source char
         */
        public static final String SEPARATOR = " ";

        public static final int LATIN = 1;
        public static final int PINYIN = 2;
        public static final int UNKNOWN = 3;

        public Token() {
        }

        public Token(int type, String source, String target) {
            this.type = type;
            this.source = source;
            this.target = target;
        }

        /**
         * Type of this token, LATIN, PINYIN or UNKNOWN.
         */
        public int type;
        /**
         * Original string before translation.
         */
        public String source;
        /**
         * Translated string of source. For Han, target is corresponding Pinyin.
         * Otherwise target is original string in source.
         */
        public String target;
    }

    protected HanziToPinyin(boolean hasChinaCollator) {
        mHasChinaCollator = hasChinaCollator;
    }

    public static HanziToPinyin getInstance() {
        synchronized (HanziToPinyin.class) {
            if (sInstance != null) {
                return sInstance;
            }
            // Check if zh_CN collation data is available
            final Locale locale[] = Collator.getAvailableLocales();

            // Added code (enhancement): also accept the language-only "zh" locale.
            final Locale chinaAddition = new Locale("zh");

            for (int i = 0; i < locale.length; i++) {
                if (locale[i].equals(Locale.CHINA)
                        || locale[i].equals(chinaAddition)) {
                    // Do self validation just once.
                    if (DEBUG) {
                        Log.d(TAG, "Self validation. Result: "
                                + doSelfValidation());
                    }
                    sInstance = new HanziToPinyin(true);
                    return sInstance;
                }
            }
            Log.w(TAG,
                    "There is no Chinese collator, HanziToPinyin is disabled");
            sInstance = new HanziToPinyin(false);
            return sInstance;
        }
    }

    /**
     * Validate if our internal table has some wrong value.
     *
     * @return true when the table looks correct.
     */
    private static boolean doSelfValidation() {
        char lastChar = UNIHANS[0];
        String lastString = Character.toString(lastChar);
        for (char c : UNIHANS) {
            if (lastChar == c) {
                continue;
            }
            final String curString = Character.toString(c);
            int cmp = COLLATOR.compare(lastString, curString);
            if (cmp >= 0) {
                Log.e(TAG, "Internal error in Unihan table. "
                        + "The last string \"" + lastString
                        + "\" is greater than current string \"" + curString
                        + "\".");
                return false;
            }
            lastString = curString;
        }
        return true;
    }

    private Token getToken(char character) {
        Token token = new Token();
        final String letter = Character.toString(character);
        token.source = letter;
        int offset = -1;
        int cmp;
        if (character < 256) {
            token.type = Token.LATIN;
            token.target = letter;
            return token;
        } else {
            cmp = COLLATOR.compare(letter, FIRST_PINYIN_UNIHAN);
            if (cmp < 0) {
                token.type = Token.UNKNOWN;
                token.target = letter;
                return token;
            } else if (cmp == 0) {
                token.type = Token.PINYIN;
                offset = 0;
            } else {
                cmp = COLLATOR.compare(letter, LAST_PINYIN_UNIHAN);
                if (cmp > 0) {
                    token.type = Token.UNKNOWN;
                    token.target = letter;
                    return token;
                } else if (cmp == 0) {
                    token.type = Token.PINYIN;
                    offset = UNIHANS.length - 1;
                }
            }
        }

        token.type = Token.PINYIN;
        if (offset < 0) {
            // Binary search for the character in the collation-ordered table.
            int begin = 0;
            int end = UNIHANS.length - 1;
            while (begin <= end) {
                offset = (begin + end) / 2;
                final String unihan = Character.toString(UNIHANS[offset]);
                cmp = COLLATOR.compare(letter, unihan);
                if (cmp == 0) {
                    break;
                } else if (cmp > 0) {
                    begin = offset + 1;
                } else {
                    end = offset - 1;
                }
            }
        }
        if (cmp < 0) {
            offset--;
        }
        StringBuilder pinyin = new StringBuilder();
        for (int j = 0; j < PINYINS[offset].length && PINYINS[offset][j] != 0; j++) {
            pinyin.append((char) PINYINS[offset][j]);
        }
        token.target = pinyin.toString();
        if (TextUtils.isEmpty(token.target)) {
            token.type = Token.UNKNOWN;
            token.target = token.source;
        }
        return token;
    }

    /**
     * Convert the input to an array of tokens. The sequence of ASCII or Unknown
     * characters without space will be put into a Token, One Hanzi character
     * which has pinyin will be treated as a Token. If there is no China
     * collator, the empty token array is returned.
     */
    public ArrayList<Token> get(final String input) {
        ArrayList<Token> tokens = new ArrayList<Token>();
        if (!mHasChinaCollator || TextUtils.isEmpty(input)) {
            // return empty tokens.
            return tokens;
        }
        final int inputLength = input.length();
        final StringBuilder sb = new StringBuilder();
        int tokenType = Token.LATIN;
        // Go through the input, create a new token when
        // a. Token type changed
        // b. Get the Pinyin of current character.
        // c. current character is space.
        for (int i = 0; i < inputLength; i++) {
            final char character = input.charAt(i);
            if (character == ' ') {
                if (sb.length() > 0) {
                    addToken(sb, tokens, tokenType);
                }
            } else if (character < 256) {
                if (tokenType != Token.LATIN && sb.length() > 0) {
                    addToken(sb, tokens, tokenType);
                }
                tokenType = Token.LATIN;
                sb.append(character);
            } else {
                Token t = getToken(character);
                if (t.type == Token.PINYIN) {
                    if (sb.length() > 0) {
                        addToken(sb, tokens, tokenType);
                    }
                    tokens.add(t);
                    tokenType = Token.PINYIN;
                } else {
                    if (tokenType != t.type && sb.length() > 0) {
                        addToken(sb, tokens, tokenType);
                    }
                    tokenType = t.type;
                    sb.append(character);
                }
            }
        }
        if (sb.length() > 0) {
            addToken(sb, tokens, tokenType);
        }
        return tokens;
    }

    private void addToken(final StringBuilder sb,
            final ArrayList<Token> tokens, final int tokenType) {
        String str = sb.toString();
        tokens.add(new Token(tokenType, str, str));
        sb.setLength(0);
    }
}

HanziToPinyin.java
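The lookup in the listing rests on two small ideas: each PINYINS row is a zero-padded run of ASCII codes, and a character is mapped to the last UNIHANS entry that does not sort after it. The sketch below isolates both in plain Java so they can be run off-device. It is a simplified illustration rather than the class's actual code: PinyinTableSketch, decode and floorIndex are hypothetical names, and the ordering is passed in as a Comparator instead of the zh_CN Collator the listing uses.

```java
import java.util.Comparator;

public class PinyinTableSketch {

    /** Decodes one zero-padded row of ASCII codes, e.g. {90,72,65,78,71,0}. */
    public static String decode(byte[] row) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < row.length && row[i] != 0; i++) {
            sb.append((char) row[i]);
        }
        return sb.toString();
    }

    /**
     * Returns the index of the last key that is less than or equal to target
     * under the given order, or -1 if target sorts before every key.
     * keys must already be sorted ascending by the same order.
     */
    public static int floorIndex(char[] keys, char target, Comparator<String> order) {
        String t = String.valueOf(target);
        int lo = 0;
        int hi = keys.length - 1;
        int result = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (order.compare(t, String.valueOf(keys[mid])) >= 0) {
                result = mid;   // keys[mid] <= target, try to move right
                lo = mid + 1;
            } else {
                hi = mid - 1;   // keys[mid] > target, move left
            }
        }
        return result;
    }
}
```

With these helpers, a lookup amounts to decode(PINYINS[floorIndex(UNIHANS, c, order)]) when the index is not -1. In the listing, the role of order is played by the Collator for Locale.CHINA, so the result depends on the collation data available on the device.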

Write a MainActivity.java to test converting Chinese characters to pinyin output:

package zhangphil.hanyupinyin;

import java.util.ArrayList;

import zhangphil.hanyupinyin.HanziToPinyin.Token;
import android.app.Activity;
import android.os.Bundle;

public class MainActivity extends Activity {

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);

        String s = "安卓";
        System.out.println("Chinese-to-pinyin output: " + getPinYin(s));
    }

    // General-purpose method: takes Chinese characters and returns their pinyin.
    public static String getPinYin(String hanzi) {
        ArrayList<Token> tokens = HanziToPinyin.getInstance().get(hanzi);
        StringBuilder sb = new StringBuilder();
        if (tokens != null && tokens.size() > 0) {
            for (Token token : tokens) {
                if (Token.PINYIN == token.type) {
                    // Han character: use the converted pinyin.
                    sb.append(token.target);
                } else {
                    // Latin or unknown text: keep the original characters.
                    sb.append(token.source);
                }
            }
        }
        return sb.toString().toUpperCase();
    }
}
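For the sorting use case mentioned at the start, the pinyin string can serve as a sort key. The following is a rough sketch under assumptions: PinyinSorter is a hypothetical helper, and toPinyin is any String-to-String converter that never returns null (in an app, MainActivity::getPinYin from the listing above could be passed in). Keys are computed once per name so the converter is not called repeatedly during the sort.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class PinyinSorter {

    /** Returns a new list with the names ordered by their pinyin key. */
    public static List<String> sortByPinyin(List<String> names,
                                            Function<String, String> toPinyin) {
        // Compute each key once up front.
        Map<String, String> keys = new HashMap<>();
        for (String name : names) {
            keys.put(name, toPinyin.apply(name));
        }
        List<String> sorted = new ArrayList<>(names);
        // List.sort is stable, so names with equal keys keep their input order.
        sorted.sort((a, b) -> keys.get(a).compareTo(keys.get(b)));
        return sorted;
    }
}
```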

The resulting output is shown in the figure:

[Figure: screenshot of the program output]
