本文转载于:http://blog.csdn.net/zhangphil/article/details/47164665

Android系统本身自带有有将汉字转化为英文拼音的类和方法。具体的类就是HanziToPinyin.java。Android系统自身实现的通讯录中就使用了HanziToPinyin.java对中文通讯录做分组整理。通过HanziToPinyin.java可以将汉字转化为拼音输出,在一些应用中非常必须,比如联系人的分组,假设一个人通讯录中存有若干姓张(ZHANG)的联系人,那么所有姓张的联系人按理都应该分组在“Z”组下。又比如微信、QQ等等此类社交类APP,凡是涉及到联系人、好友分组排序的应用场景,则均需要将汉字转化为拼音然后依据首字母排序归类。
HanziToPinyin.java不是一个公开的类,只是谷歌官方内部在实现Android通讯录中私有使用的一个类,我们不能够直接像使用普通Android SDK API一样使用,但这没关系,我们完全可以将这个类文件拷贝出来,放到我们自己的项目中,直接使用。
HanziToPinyin.java的代码文件,谷歌官方的通讯录APP下:

packages/providers/ContactsProvider /src/com/android/providers/contacts/HanziToPinyin.java

网上也有这个HanziToPinyin.java类文件的项目地址。但是,直接使用这个 类不能正常工作,错误原因是:

"There is no Chinese collator, HanziToPinyin is disabled"

发生这一错误的代码块是在HanziToPinyin.java的方法:
public static HanziToPinyin getInstance();
具体原因是这个方法在一些非原生定制的Android系统中,对中文Locale的定义规则不同,导致原代码文件中的locale[i].equals(Locale.CHINA)返回false,不能识别,致使以后的代码全部失去功效。

对此问题的修复(解决方案)

我改进了判断条件,增加一些代码:
final Locale chinaAddition = new Locale("zh");
将此chinaAddition作为辅助条件也加入到条件判断中,

 if ( locale[i].equals(Locale.CHINA) ||  locale[i].equals(chinaAddition) ){

}

下面是我改进后的getInstance()方法全部代码:

 public static HanziToPinyin getInstance() {
synchronized (HanziToPinyin.class) {
if (sInstance != null) {
return sInstance;
}
// Check if zh_CN collation data is available
final Locale locale[] = Collator.getAvailableLocales(); // 增加的代码,增强。
final Locale chinaAddition = new Locale("zh"); for (int i = 0; i < locale.length; i++) {
if (locale[i].equals(Locale.CHINA)
|| locale[i].equals(chinaAddition)) {
// Do self validation just once.
if (DEBUG) {
Log.d(TAG, "Self validation. Result: "
+ doSelfValidation());
}
sInstance = new HanziToPinyin(true);
return sInstance;
}
}
Log.w(TAG,
"There is no Chinese collator, HanziToPinyin is disabled");
sInstance = new HanziToPinyin(false);
return sInstance;
}
}

经由改进增强,HanziToPinyin.java的全部源代码如下(代码可以复制到自己的项目中直接使用):

  1 /*
2 * Copyright (C) 2011 The Android Open Source Project
3 *
4 * Licensed under the Apache License, Version 2.0 (the "License");
5 * you may not use this file except in compliance with the License.
6 * You may obtain a copy of the License at
7 *
8 * http://www.apache.org/licenses/LICENSE-2.0
9 *
10 * Unless required by applicable law or agreed to in writing, software
11 * distributed under the License is distributed on an "AS IS" BASIS,
12 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
13 * See the License for the specific language governing permissions and
14 * limitations under the License.
15 */
16
17 package zhangphil.hanyupinyin;
18
19 import android.text.TextUtils;
20 import android.util.Log;
21
22 import java.text.Collator;
23 import java.util.ArrayList;
24 import java.util.Locale;
25
26 /**
27 * An object to convert Chinese character to its corresponding pinyin string.
28 * For characters with multiple possible pinyin string, only one is selected
29 * according to collator. Polyphone is not supported in this implementation.
30 * This class is implemented to achieve the best runtime performance and minimum
31 * runtime resources with tolerable sacrifice of accuracy. This implementation
32 * highly depends on zh_CN ICU collation data and must be always synchronized
33 * with ICU.
34 *
35 * Currently this file is aligned to zh.txt in ICU 4.6 鏉ヨ嚜android4.2婧愮爜
36 */
37 public class HanziToPinyin {
38 private static final String TAG = "HanziToPinyin";
39
40 // Turn on this flag when we want to check internal data structure.
41 private static final boolean DEBUG = false;
42
43 /**
44 * Unihans array.
45 *
46 * Each unihans is the first one within same pinyin when collator is zh_CN.
47 */
48 public static final char[] UNIHANS = { '\u963f', '\u54ce', '\u5b89',
49 '\u80ae', '\u51f9', '\u516b', '\u6300', '\u6273', '\u90a6',
50 '\u52f9', '\u9642', '\u5954', '\u4f3b', '\u5c44', '\u8fb9',
51 '\u706c', '\u618b', '\u6c43', '\u51ab', '\u7676', '\u5cec',
52 '\u5693', '\u5072', '\u53c2', '\u4ed3', '\u64a1', '\u518a',
53 '\u5d7e', '\u66fd', '\u66fe', '\u5c64', '\u53c9', '\u8286',
54 '\u8fbf', '\u4f25', '\u6284', '\u8f66', '\u62bb', '\u6c88',
55 '\u6c89', '\u9637', '\u5403', '\u5145', '\u62bd', '\u51fa',
56 '\u6b3b', '\u63e3', '\u5ddb', '\u5205', '\u5439', '\u65fe',
57 '\u9034', '\u5472', '\u5306', '\u51d1', '\u7c97', '\u6c46',
58 '\u5d14', '\u90a8', '\u6413', '\u5491', '\u5446', '\u4e39',
59 '\u5f53', '\u5200', '\u561a', '\u6265', '\u706f', '\u6c10',
60 '\u55f2', '\u7538', '\u5201', '\u7239', '\u4e01', '\u4e1f',
61 '\u4e1c', '\u543a', '\u53be', '\u8011', '\u8968', '\u5428',
62 '\u591a', '\u59b8', '\u8bf6', '\u5940', '\u97a5', '\u513f',
63 '\u53d1', '\u5e06', '\u531a', '\u98de', '\u5206', '\u4e30',
64 '\u8985', '\u4ecf', '\u7d11', '\u4f15', '\u65ee', '\u4f85',
65 '\u7518', '\u5188', '\u768b', '\u6208', '\u7ed9', '\u6839',
66 '\u522f', '\u5de5', '\u52fe', '\u4f30', '\u74dc', '\u4e56',
67 '\u5173', '\u5149', '\u5f52', '\u4e28', '\u5459', '\u54c8',
68 '\u548d', '\u4f44', '\u592f', '\u8320', '\u8bc3', '\u9ed2',
69 '\u62eb', '\u4ea8', '\u5677', '\u53ff', '\u9f41', '\u4e6f',
70 '\u82b1', '\u6000', '\u72bf', '\u5ddf', '\u7070', '\u660f',
71 '\u5419', '\u4e0c', '\u52a0', '\u620b', '\u6c5f', '\u827d',
72 '\u9636', '\u5dfe', '\u5755', '\u5182', '\u4e29', '\u51e5',
73 '\u59e2', '\u5658', '\u519b', '\u5494', '\u5f00', '\u520a',
74 '\u5ffc', '\u5c3b', '\u533c', '\u808e', '\u52a5', '\u7a7a',
75 '\u62a0', '\u625d', '\u5938', '\u84af', '\u5bbd', '\u5321',
76 '\u4e8f', '\u5764', '\u6269', '\u5783', '\u6765', '\u5170',
77 '\u5577', '\u635e', '\u808b', '\u52d2', '\u5d1a', '\u5215',
78 '\u4fe9', '\u5941', '\u826f', '\u64a9', '\u5217', '\u62ce',
79 '\u5222', '\u6e9c', '\u56d6', '\u9f99', '\u779c', '\u565c',
80 '\u5a08', '\u7567', '\u62a1', '\u7f57', '\u5463', '\u5988',
81 '\u57cb', '\u5ada', '\u7264', '\u732b', '\u4e48', '\u5445',
82 '\u95e8', '\u753f', '\u54aa', '\u5b80', '\u55b5', '\u4e5c',
83 '\u6c11', '\u540d', '\u8c2c', '\u6478', '\u54de', '\u6bea',
84 '\u55ef', '\u62cf', '\u8149', '\u56e1', '\u56d4', '\u5b6c',
85 '\u7592', '\u5a1e', '\u6041', '\u80fd', '\u59ae', '\u62c8',
86 '\u5b22', '\u9e1f', '\u634f', '\u56dc', '\u5b81', '\u599e',
87 '\u519c', '\u7fba', '\u5974', '\u597b', '\u759f', '\u9ec1',
88 '\u90cd', '\u5594', '\u8bb4', '\u5991', '\u62cd', '\u7705',
89 '\u4e53', '\u629b', '\u5478', '\u55b7', '\u5309', '\u4e15',
90 '\u56e8', '\u527d', '\u6c15', '\u59d8', '\u4e52', '\u948b',
91 '\u5256', '\u4ec6', '\u4e03', '\u6390', '\u5343', '\u545b',
92 '\u6084', '\u767f', '\u4eb2', '\u72c5', '\u828e', '\u4e18',
93 '\u533a', '\u5cd1', '\u7f3a', '\u590b', '\u5465', '\u7a63',
94 '\u5a06', '\u60f9', '\u4eba', '\u6254', '\u65e5', '\u8338',
95 '\u53b9', '\u909a', '\u633c', '\u5827', '\u5a51', '\u77a4',
96 '\u637c', '\u4ee8', '\u6be2', '\u4e09', '\u6852', '\u63bb',
97 '\u95aa', '\u68ee', '\u50e7', '\u6740', '\u7b5b', '\u5c71',
98 '\u4f24', '\u5f30', '\u5962', '\u7533', '\u8398', '\u6552',
99 '\u5347', '\u5c38', '\u53ce', '\u4e66', '\u5237', '\u8870',
100 '\u95e9', '\u53cc', '\u8c01', '\u542e', '\u8bf4', '\u53b6',
101 '\u5fea', '\u635c', '\u82cf', '\u72fb', '\u590a', '\u5b59',
102 '\u5506', '\u4ed6', '\u56fc', '\u574d', '\u6c64', '\u5932',
103 '\u5fd1', '\u71a5', '\u5254', '\u5929', '\u65eb', '\u5e16',
104 '\u5385', '\u56f2', '\u5077', '\u51f8', '\u6e4d', '\u63a8',
105 '\u541e', '\u4e47', '\u7a75', '\u6b6a', '\u5f2f', '\u5c23',
106 '\u5371', '\u6637', '\u7fc1', '\u631d', '\u4e4c', '\u5915',
107 '\u8672', '\u4eda', '\u4e61', '\u7071', '\u4e9b', '\u5fc3',
108 '\u661f', '\u51f6', '\u4f11', '\u5401', '\u5405', '\u524a',
109 '\u5743', '\u4e2b', '\u6079', '\u592e', '\u5e7a', '\u503b',
110 '\u4e00', '\u56d9', '\u5e94', '\u54df', '\u4f63', '\u4f18',
111 '\u625c', '\u56e6', '\u66f0', '\u6655', '\u7b60', '\u7b7c',
112 '\u5e00', '\u707d', '\u5142', '\u5328', '\u50ae', '\u5219',
113 '\u8d3c', '\u600e', '\u5897', '\u624e', '\u635a', '\u6cbe',
114 '\u5f20', '\u957f', '\u9577', '\u4f4b', '\u8707', '\u8d1e',
115 '\u4e89', '\u4e4b', '\u5cd9', '\u5ea2', '\u4e2d', '\u5dde',
116 '\u6731', '\u6293', '\u62fd', '\u4e13', '\u5986', '\u96b9',
117 '\u5b92', '\u5353', '\u4e72', '\u5b97', '\u90b9', '\u79df',
118 '\u94bb', '\u539c', '\u5c0a', '\u6628', '\u5159', '\u9fc3',
119 '\u9fc4', };
120
121 /**
122 * Pinyin array.
123 *
124 * Each pinyin is corresponding to unihans of same offset in the unihans
125 * array.
126 */
127 public static final byte[][] PINYINS = { { 65, 0, 0, 0, 0, 0 },
128 { 65, 73, 0, 0, 0, 0 }, { 65, 78, 0, 0, 0, 0 },
129 { 65, 78, 71, 0, 0, 0 }, { 65, 79, 0, 0, 0, 0 },
130 { 66, 65, 0, 0, 0, 0 }, { 66, 65, 73, 0, 0, 0 },
131 { 66, 65, 78, 0, 0, 0 }, { 66, 65, 78, 71, 0, 0 },
132 { 66, 65, 79, 0, 0, 0 }, { 66, 69, 73, 0, 0, 0 },
133 { 66, 69, 78, 0, 0, 0 }, { 66, 69, 78, 71, 0, 0 },
134 { 66, 73, 0, 0, 0, 0 }, { 66, 73, 65, 78, 0, 0 },
135 { 66, 73, 65, 79, 0, 0 }, { 66, 73, 69, 0, 0, 0 },
136 { 66, 73, 78, 0, 0, 0 }, { 66, 73, 78, 71, 0, 0 },
137 { 66, 79, 0, 0, 0, 0 }, { 66, 85, 0, 0, 0, 0 },
138 { 67, 65, 0, 0, 0, 0 }, { 67, 65, 73, 0, 0, 0 },
139 { 67, 65, 78, 0, 0, 0 }, { 67, 65, 78, 71, 0, 0 },
140 { 67, 65, 79, 0, 0, 0 }, { 67, 69, 0, 0, 0, 0 },
141 { 67, 69, 78, 0, 0, 0 }, { 67, 69, 78, 71, 0, 0 },
142 { 90, 69, 78, 71, 0, 0 }, { 67, 69, 78, 71, 0, 0 },
143 { 67, 72, 65, 0, 0, 0 }, { 67, 72, 65, 73, 0, 0 },
144 { 67, 72, 65, 78, 0, 0 }, { 67, 72, 65, 78, 71, 0 },
145 { 67, 72, 65, 79, 0, 0 }, { 67, 72, 69, 0, 0, 0 },
146 { 67, 72, 69, 78, 0, 0 }, { 83, 72, 69, 78, 0, 0 },
147 { 67, 72, 69, 78, 0, 0 }, { 67, 72, 69, 78, 71, 0 },
148 { 67, 72, 73, 0, 0, 0 }, { 67, 72, 79, 78, 71, 0 },
149 { 67, 72, 79, 85, 0, 0 }, { 67, 72, 85, 0, 0, 0 },
150 { 67, 72, 85, 65, 0, 0 }, { 67, 72, 85, 65, 73, 0 },
151 { 67, 72, 85, 65, 78, 0 }, { 67, 72, 85, 65, 78, 71 },
152 { 67, 72, 85, 73, 0, 0 }, { 67, 72, 85, 78, 0, 0 },
153 { 67, 72, 85, 79, 0, 0 }, { 67, 73, 0, 0, 0, 0 },
154 { 67, 79, 78, 71, 0, 0 }, { 67, 79, 85, 0, 0, 0 },
155 { 67, 85, 0, 0, 0, 0 }, { 67, 85, 65, 78, 0, 0 },
156 { 67, 85, 73, 0, 0, 0 }, { 67, 85, 78, 0, 0, 0 },
157 { 67, 85, 79, 0, 0, 0 }, { 68, 65, 0, 0, 0, 0 },
158 { 68, 65, 73, 0, 0, 0 }, { 68, 65, 78, 0, 0, 0 },
159 { 68, 65, 78, 71, 0, 0 }, { 68, 65, 79, 0, 0, 0 },
160 { 68, 69, 0, 0, 0, 0 }, { 68, 69, 78, 0, 0, 0 },
161 { 68, 69, 78, 71, 0, 0 }, { 68, 73, 0, 0, 0, 0 },
162 { 68, 73, 65, 0, 0, 0 }, { 68, 73, 65, 78, 0, 0 },
163 { 68, 73, 65, 79, 0, 0 }, { 68, 73, 69, 0, 0, 0 },
164 { 68, 73, 78, 71, 0, 0 }, { 68, 73, 85, 0, 0, 0 },
165 { 68, 79, 78, 71, 0, 0 }, { 68, 79, 85, 0, 0, 0 },
166 { 68, 85, 0, 0, 0, 0 }, { 68, 85, 65, 78, 0, 0 },
167 { 68, 85, 73, 0, 0, 0 }, { 68, 85, 78, 0, 0, 0 },
168 { 68, 85, 79, 0, 0, 0 }, { 69, 0, 0, 0, 0, 0 },
169 { 69, 73, 0, 0, 0, 0 }, { 69, 78, 0, 0, 0, 0 },
170 { 69, 78, 71, 0, 0, 0 }, { 69, 82, 0, 0, 0, 0 },
171 { 70, 65, 0, 0, 0, 0 }, { 70, 65, 78, 0, 0, 0 },
172 { 70, 65, 78, 71, 0, 0 }, { 70, 69, 73, 0, 0, 0 },
173 { 70, 69, 78, 0, 0, 0 }, { 70, 69, 78, 71, 0, 0 },
174 { 70, 73, 65, 79, 0, 0 }, { 70, 79, 0, 0, 0, 0 },
175 { 70, 79, 85, 0, 0, 0 }, { 70, 85, 0, 0, 0, 0 },
176 { 71, 65, 0, 0, 0, 0 }, { 71, 65, 73, 0, 0, 0 },
177 { 71, 65, 78, 0, 0, 0 }, { 71, 65, 78, 71, 0, 0 },
178 { 71, 65, 79, 0, 0, 0 }, { 71, 69, 0, 0, 0, 0 },
179 { 71, 69, 73, 0, 0, 0 }, { 71, 69, 78, 0, 0, 0 },
180 { 71, 69, 78, 71, 0, 0 }, { 71, 79, 78, 71, 0, 0 },
181 { 71, 79, 85, 0, 0, 0 }, { 71, 85, 0, 0, 0, 0 },
182 { 71, 85, 65, 0, 0, 0 }, { 71, 85, 65, 73, 0, 0 },
183 { 71, 85, 65, 78, 0, 0 }, { 71, 85, 65, 78, 71, 0 },
184 { 71, 85, 73, 0, 0, 0 }, { 71, 85, 78, 0, 0, 0 },
185 { 71, 85, 79, 0, 0, 0 }, { 72, 65, 0, 0, 0, 0 },
186 { 72, 65, 73, 0, 0, 0 }, { 72, 65, 78, 0, 0, 0 },
187 { 72, 65, 78, 71, 0, 0 }, { 72, 65, 79, 0, 0, 0 },
188 { 72, 69, 0, 0, 0, 0 }, { 72, 69, 73, 0, 0, 0 },
189 { 72, 69, 78, 0, 0, 0 }, { 72, 69, 78, 71, 0, 0 },
190 { 72, 77, 0, 0, 0, 0 }, { 72, 79, 78, 71, 0, 0 },
191 { 72, 79, 85, 0, 0, 0 }, { 72, 85, 0, 0, 0, 0 },
192 { 72, 85, 65, 0, 0, 0 }, { 72, 85, 65, 73, 0, 0 },
193 { 72, 85, 65, 78, 0, 0 }, { 72, 85, 65, 78, 71, 0 },
194 { 72, 85, 73, 0, 0, 0 }, { 72, 85, 78, 0, 0, 0 },
195 { 72, 85, 79, 0, 0, 0 }, { 74, 73, 0, 0, 0, 0 },
196 { 74, 73, 65, 0, 0, 0 }, { 74, 73, 65, 78, 0, 0 },
197 { 74, 73, 65, 78, 71, 0 }, { 74, 73, 65, 79, 0, 0 },
198 { 74, 73, 69, 0, 0, 0 }, { 74, 73, 78, 0, 0, 0 },
199 { 74, 73, 78, 71, 0, 0 }, { 74, 73, 79, 78, 71, 0 },
200 { 74, 73, 85, 0, 0, 0 }, { 74, 85, 0, 0, 0, 0 },
201 { 74, 85, 65, 78, 0, 0 }, { 74, 85, 69, 0, 0, 0 },
202 { 74, 85, 78, 0, 0, 0 }, { 75, 65, 0, 0, 0, 0 },
203 { 75, 65, 73, 0, 0, 0 }, { 75, 65, 78, 0, 0, 0 },
204 { 75, 65, 78, 71, 0, 0 }, { 75, 65, 79, 0, 0, 0 },
205 { 75, 69, 0, 0, 0, 0 }, { 75, 69, 78, 0, 0, 0 },
206 { 75, 69, 78, 71, 0, 0 }, { 75, 79, 78, 71, 0, 0 },
207 { 75, 79, 85, 0, 0, 0 }, { 75, 85, 0, 0, 0, 0 },
208 { 75, 85, 65, 0, 0, 0 }, { 75, 85, 65, 73, 0, 0 },
209 { 75, 85, 65, 78, 0, 0 }, { 75, 85, 65, 78, 71, 0 },
210 { 75, 85, 73, 0, 0, 0 }, { 75, 85, 78, 0, 0, 0 },
211 { 75, 85, 79, 0, 0, 0 }, { 76, 65, 0, 0, 0, 0 },
212 { 76, 65, 73, 0, 0, 0 }, { 76, 65, 78, 0, 0, 0 },
213 { 76, 65, 78, 71, 0, 0 }, { 76, 65, 79, 0, 0, 0 },
214 { 76, 69, 0, 0, 0, 0 }, { 76, 69, 73, 0, 0, 0 },
215 { 76, 69, 78, 71, 0, 0 }, { 76, 73, 0, 0, 0, 0 },
216 { 76, 73, 65, 0, 0, 0 }, { 76, 73, 65, 78, 0, 0 },
217 { 76, 73, 65, 78, 71, 0 }, { 76, 73, 65, 79, 0, 0 },
218 { 76, 73, 69, 0, 0, 0 }, { 76, 73, 78, 0, 0, 0 },
219 { 76, 73, 78, 71, 0, 0 }, { 76, 73, 85, 0, 0, 0 },
220 { 76, 79, 0, 0, 0, 0 }, { 76, 79, 78, 71, 0, 0 },
221 { 76, 79, 85, 0, 0, 0 }, { 76, 85, 0, 0, 0, 0 },
222 { 76, 85, 65, 78, 0, 0 }, { 76, 85, 69, 0, 0, 0 },
223 { 76, 85, 78, 0, 0, 0 }, { 76, 85, 79, 0, 0, 0 },
224 { 77, 0, 0, 0, 0, 0 }, { 77, 65, 0, 0, 0, 0 },
225 { 77, 65, 73, 0, 0, 0 }, { 77, 65, 78, 0, 0, 0 },
226 { 77, 65, 78, 71, 0, 0 }, { 77, 65, 79, 0, 0, 0 },
227 { 77, 69, 0, 0, 0, 0 }, { 77, 69, 73, 0, 0, 0 },
228 { 77, 69, 78, 0, 0, 0 }, { 77, 69, 78, 71, 0, 0 },
229 { 77, 73, 0, 0, 0, 0 }, { 77, 73, 65, 78, 0, 0 },
230 { 77, 73, 65, 79, 0, 0 }, { 77, 73, 69, 0, 0, 0 },
231 { 77, 73, 78, 0, 0, 0 }, { 77, 73, 78, 71, 0, 0 },
232 { 77, 73, 85, 0, 0, 0 }, { 77, 79, 0, 0, 0, 0 },
233 { 77, 79, 85, 0, 0, 0 }, { 77, 85, 0, 0, 0, 0 },
234 { 78, 0, 0, 0, 0, 0 }, { 78, 65, 0, 0, 0, 0 },
235 { 78, 65, 73, 0, 0, 0 }, { 78, 65, 78, 0, 0, 0 },
236 { 78, 65, 78, 71, 0, 0 }, { 78, 65, 79, 0, 0, 0 },
237 { 78, 69, 0, 0, 0, 0 }, { 78, 69, 73, 0, 0, 0 },
238 { 78, 69, 78, 0, 0, 0 }, { 78, 69, 78, 71, 0, 0 },
239 { 78, 73, 0, 0, 0, 0 }, { 78, 73, 65, 78, 0, 0 },
240 { 78, 73, 65, 78, 71, 0 }, { 78, 73, 65, 79, 0, 0 },
241 { 78, 73, 69, 0, 0, 0 }, { 78, 73, 78, 0, 0, 0 },
242 { 78, 73, 78, 71, 0, 0 }, { 78, 73, 85, 0, 0, 0 },
243 { 78, 79, 78, 71, 0, 0 }, { 78, 79, 85, 0, 0, 0 },
244 { 78, 85, 0, 0, 0, 0 }, { 78, 85, 65, 78, 0, 0 },
245 { 78, 85, 69, 0, 0, 0 }, { 78, 85, 78, 0, 0, 0 },
246 { 78, 85, 79, 0, 0, 0 }, { 79, 0, 0, 0, 0, 0 },
247 { 79, 85, 0, 0, 0, 0 }, { 80, 65, 0, 0, 0, 0 },
248 { 80, 65, 73, 0, 0, 0 }, { 80, 65, 78, 0, 0, 0 },
249 { 80, 65, 78, 71, 0, 0 }, { 80, 65, 79, 0, 0, 0 },
250 { 80, 69, 73, 0, 0, 0 }, { 80, 69, 78, 0, 0, 0 },
251 { 80, 69, 78, 71, 0, 0 }, { 80, 73, 0, 0, 0, 0 },
252 { 80, 73, 65, 78, 0, 0 }, { 80, 73, 65, 79, 0, 0 },
253 { 80, 73, 69, 0, 0, 0 }, { 80, 73, 78, 0, 0, 0 },
254 { 80, 73, 78, 71, 0, 0 }, { 80, 79, 0, 0, 0, 0 },
255 { 80, 79, 85, 0, 0, 0 }, { 80, 85, 0, 0, 0, 0 },
256 { 81, 73, 0, 0, 0, 0 }, { 81, 73, 65, 0, 0, 0 },
257 { 81, 73, 65, 78, 0, 0 }, { 81, 73, 65, 78, 71, 0 },
258 { 81, 73, 65, 79, 0, 0 }, { 81, 73, 69, 0, 0, 0 },
259 { 81, 73, 78, 0, 0, 0 }, { 81, 73, 78, 71, 0, 0 },
260 { 81, 73, 79, 78, 71, 0 }, { 81, 73, 85, 0, 0, 0 },
261 { 81, 85, 0, 0, 0, 0 }, { 81, 85, 65, 78, 0, 0 },
262 { 81, 85, 69, 0, 0, 0 }, { 81, 85, 78, 0, 0, 0 },
263 { 82, 65, 78, 0, 0, 0 }, { 82, 65, 78, 71, 0, 0 },
264 { 82, 65, 79, 0, 0, 0 }, { 82, 69, 0, 0, 0, 0 },
265 { 82, 69, 78, 0, 0, 0 }, { 82, 69, 78, 71, 0, 0 },
266 { 82, 73, 0, 0, 0, 0 }, { 82, 79, 78, 71, 0, 0 },
267 { 82, 79, 85, 0, 0, 0 }, { 82, 85, 0, 0, 0, 0 },
268 { 82, 85, 65, 0, 0, 0 }, { 82, 85, 65, 78, 0, 0 },
269 { 82, 85, 73, 0, 0, 0 }, { 82, 85, 78, 0, 0, 0 },
270 { 82, 85, 79, 0, 0, 0 }, { 83, 65, 0, 0, 0, 0 },
271 { 83, 65, 73, 0, 0, 0 }, { 83, 65, 78, 0, 0, 0 },
272 { 83, 65, 78, 71, 0, 0 }, { 83, 65, 79, 0, 0, 0 },
273 { 83, 69, 0, 0, 0, 0 }, { 83, 69, 78, 0, 0, 0 },
274 { 83, 69, 78, 71, 0, 0 }, { 83, 72, 65, 0, 0, 0 },
275 { 83, 72, 65, 73, 0, 0 }, { 83, 72, 65, 78, 0, 0 },
276 { 83, 72, 65, 78, 71, 0 }, { 83, 72, 65, 79, 0, 0 },
277 { 83, 72, 69, 0, 0, 0 }, { 83, 72, 69, 78, 0, 0 },
278 { 88, 73, 78, 0, 0, 0 }, { 83, 72, 69, 78, 0, 0 },
279 { 83, 72, 69, 78, 71, 0 }, { 83, 72, 73, 0, 0, 0 },
280 { 83, 72, 79, 85, 0, 0 }, { 83, 72, 85, 0, 0, 0 },
281 { 83, 72, 85, 65, 0, 0 }, { 83, 72, 85, 65, 73, 0 },
282 { 83, 72, 85, 65, 78, 0 }, { 83, 72, 85, 65, 78, 71 },
283 { 83, 72, 85, 73, 0, 0 }, { 83, 72, 85, 78, 0, 0 },
284 { 83, 72, 85, 79, 0, 0 }, { 83, 73, 0, 0, 0, 0 },
285 { 83, 79, 78, 71, 0, 0 }, { 83, 79, 85, 0, 0, 0 },
286 { 83, 85, 0, 0, 0, 0 }, { 83, 85, 65, 78, 0, 0 },
287 { 83, 85, 73, 0, 0, 0 }, { 83, 85, 78, 0, 0, 0 },
288 { 83, 85, 79, 0, 0, 0 }, { 84, 65, 0, 0, 0, 0 },
289 { 84, 65, 73, 0, 0, 0 }, { 84, 65, 78, 0, 0, 0 },
290 { 84, 65, 78, 71, 0, 0 }, { 84, 65, 79, 0, 0, 0 },
291 { 84, 69, 0, 0, 0, 0 }, { 84, 69, 78, 71, 0, 0 },
292 { 84, 73, 0, 0, 0, 0 }, { 84, 73, 65, 78, 0, 0 },
293 { 84, 73, 65, 79, 0, 0 }, { 84, 73, 69, 0, 0, 0 },
294 { 84, 73, 78, 71, 0, 0 }, { 84, 79, 78, 71, 0, 0 },
295 { 84, 79, 85, 0, 0, 0 }, { 84, 85, 0, 0, 0, 0 },
296 { 84, 85, 65, 78, 0, 0 }, { 84, 85, 73, 0, 0, 0 },
297 { 84, 85, 78, 0, 0, 0 }, { 84, 85, 79, 0, 0, 0 },
298 { 87, 65, 0, 0, 0, 0 }, { 87, 65, 73, 0, 0, 0 },
299 { 87, 65, 78, 0, 0, 0 }, { 87, 65, 78, 71, 0, 0 },
300 { 87, 69, 73, 0, 0, 0 }, { 87, 69, 78, 0, 0, 0 },
301 { 87, 69, 78, 71, 0, 0 }, { 87, 79, 0, 0, 0, 0 },
302 { 87, 85, 0, 0, 0, 0 }, { 88, 73, 0, 0, 0, 0 },
303 { 88, 73, 65, 0, 0, 0 }, { 88, 73, 65, 78, 0, 0 },
304 { 88, 73, 65, 78, 71, 0 }, { 88, 73, 65, 79, 0, 0 },
305 { 88, 73, 69, 0, 0, 0 }, { 88, 73, 78, 0, 0, 0 },
306 { 88, 73, 78, 71, 0, 0 }, { 88, 73, 79, 78, 71, 0 },
307 { 88, 73, 85, 0, 0, 0 }, { 88, 85, 0, 0, 0, 0 },
308 { 88, 85, 65, 78, 0, 0 }, { 88, 85, 69, 0, 0, 0 },
309 { 88, 85, 78, 0, 0, 0 }, { 89, 65, 0, 0, 0, 0 },
310 { 89, 65, 78, 0, 0, 0 }, { 89, 65, 78, 71, 0, 0 },
311 { 89, 65, 79, 0, 0, 0 }, { 89, 69, 0, 0, 0, 0 },
312 { 89, 73, 0, 0, 0, 0 }, { 89, 73, 78, 0, 0, 0 },
313 { 89, 73, 78, 71, 0, 0 }, { 89, 79, 0, 0, 0, 0 },
314 { 89, 79, 78, 71, 0, 0 }, { 89, 79, 85, 0, 0, 0 },
315 { 89, 85, 0, 0, 0, 0 }, { 89, 85, 65, 78, 0, 0 },
316 { 89, 85, 69, 0, 0, 0 }, { 89, 85, 78, 0, 0, 0 },
317 { 74, 85, 78, 0, 0, 0 }, { 89, 85, 78, 0, 0, 0 },
318 { 90, 65, 0, 0, 0, 0 }, { 90, 65, 73, 0, 0, 0 },
319 { 90, 65, 78, 0, 0, 0 }, { 90, 65, 78, 71, 0, 0 },
320 { 90, 65, 79, 0, 0, 0 }, { 90, 69, 0, 0, 0, 0 },
321 { 90, 69, 73, 0, 0, 0 }, { 90, 69, 78, 0, 0, 0 },
322 { 90, 69, 78, 71, 0, 0 }, { 90, 72, 65, 0, 0, 0 },
323 { 90, 72, 65, 73, 0, 0 }, { 90, 72, 65, 78, 0, 0 },
324 { 90, 72, 65, 78, 71, 0 }, { 67, 72, 65, 78, 71, 0 },
325 { 90, 72, 65, 78, 71, 0 }, { 90, 72, 65, 79, 0, 0 },
326 { 90, 72, 69, 0, 0, 0 }, { 90, 72, 69, 78, 0, 0 },
327 { 90, 72, 69, 78, 71, 0 }, { 90, 72, 73, 0, 0, 0 },
328 { 83, 72, 73, 0, 0, 0 }, { 90, 72, 73, 0, 0, 0 },
329 { 90, 72, 79, 78, 71, 0 }, { 90, 72, 79, 85, 0, 0 },
330 { 90, 72, 85, 0, 0, 0 }, { 90, 72, 85, 65, 0, 0 },
331 { 90, 72, 85, 65, 73, 0 }, { 90, 72, 85, 65, 78, 0 },
332 { 90, 72, 85, 65, 78, 71 }, { 90, 72, 85, 73, 0, 0 },
333 { 90, 72, 85, 78, 0, 0 }, { 90, 72, 85, 79, 0, 0 },
334 { 90, 73, 0, 0, 0, 0 }, { 90, 79, 78, 71, 0, 0 },
335 { 90, 79, 85, 0, 0, 0 }, { 90, 85, 0, 0, 0, 0 },
336 { 90, 85, 65, 78, 0, 0 }, { 90, 85, 73, 0, 0, 0 },
337 { 90, 85, 78, 0, 0, 0 }, { 90, 85, 79, 0, 0, 0 },
338 { 0, 0, 0, 0, 0, 0 }, { 83, 72, 65, 78, 0, 0 },
339 { 0, 0, 0, 0, 0, 0 }, };
340
341 /**
342 * First and last Chinese character with known Pinyin according to zh
343 * collation
344 */
345 private static final String FIRST_PINYIN_UNIHAN = "\u963F";
346 private static final String LAST_PINYIN_UNIHAN = "\u9FFF";
347
348 private static final Collator COLLATOR = Collator.getInstance(Locale.CHINA);
349
350 private static HanziToPinyin sInstance;
351 private final boolean mHasChinaCollator;
352
353 public static class Token {
354 /**
355 * Separator between target string for each source char
356 */
357 public static final String SEPARATOR = " ";
358
359 public static final int LATIN = 1;
360 public static final int PINYIN = 2;
361 public static final int UNKNOWN = 3;
362
363 public Token() {
364 }
365
366 public Token(int type, String source, String target) {
367 this.type = type;
368 this.source = source;
369 this.target = target;
370 }
371
372 /**
373 * Type of this token, ASCII, PINYIN or UNKNOWN.
374 */
375 public int type;
376 /**
377 * Original string before translation.
378 */
379 public String source;
380 /**
381 * Translated string of source. For Han, target is corresponding Pinyin.
382 * Otherwise target is original string in source.
383 */
384 public String target;
385 }
386
387 protected HanziToPinyin(boolean hasChinaCollator) {
388 mHasChinaCollator = hasChinaCollator;
389 }
390
391 public static HanziToPinyin getInstance() {
392 synchronized (HanziToPinyin.class) {
393 if (sInstance != null) {
394 return sInstance;
395 }
396 // Check if zh_CN collation data is available
397 final Locale locale[] = Collator.getAvailableLocales();
398
399 // 增加的代码,增强。
400 final Locale chinaAddition = new Locale("zh");
401
402 for (int i = 0; i < locale.length; i++) {
403 if (locale[i].equals(Locale.CHINA)
404 || locale[i].equals(chinaAddition)) {
405 // Do self validation just once.
406 if (DEBUG) {
407 Log.d(TAG, "Self validation. Result: "
408 + doSelfValidation());
409 }
410 sInstance = new HanziToPinyin(true);
411 return sInstance;
412 }
413 }
414 Log.w(TAG,
415 "There is no Chinese collator, HanziToPinyin is disabled");
416 sInstance = new HanziToPinyin(false);
417 return sInstance;
418 }
419 }
420
421 /**
422 * Validate if our internal table has some wrong value.
423 *
424 * @return true when the table looks correct.
425 */
426 private static boolean doSelfValidation() {
427 char lastChar = UNIHANS[0];
428 String lastString = Character.toString(lastChar);
429 for (char c : UNIHANS) {
430 if (lastChar == c) {
431 continue;
432 }
433 final String curString = Character.toString(c);
434 int cmp = COLLATOR.compare(lastString, curString);
435 if (cmp >= 0) {
436 Log.e(TAG, "Internal error in Unihan table. "
437 + "The last string \"" + lastString
438 + "\" is greater than current string \"" + curString
439 + "\".");
440 return false;
441 }
442 lastString = curString;
443 }
444 return true;
445 }
446
447 private Token getToken(char character) {
448 Token token = new Token();
449 final String letter = Character.toString(character);
450 token.source = letter;
451 int offset = -1;
452 int cmp;
453 if (character < 256) {
454 token.type = Token.LATIN;
455 token.target = letter;
456 return token;
457 } else {
458 cmp = COLLATOR.compare(letter, FIRST_PINYIN_UNIHAN);
459 if (cmp < 0) {
460 token.type = Token.UNKNOWN;
461 token.target = letter;
462 return token;
463 } else if (cmp == 0) {
464 token.type = Token.PINYIN;
465 offset = 0;
466 } else {
467 cmp = COLLATOR.compare(letter, LAST_PINYIN_UNIHAN);
468 if (cmp > 0) {
469 token.type = Token.UNKNOWN;
470 token.target = letter;
471 return token;
472 } else if (cmp == 0) {
473 token.type = Token.PINYIN;
474 offset = UNIHANS.length - 1;
475 }
476 }
477 }
478
479 token.type = Token.PINYIN;
480 if (offset < 0) {
481 int begin = 0;
482 int end = UNIHANS.length - 1;
483 while (begin <= end) {
484 offset = (begin + end) / 2;
485 final String unihan = Character.toString(UNIHANS[offset]);
486 cmp = COLLATOR.compare(letter, unihan);
487 if (cmp == 0) {
488 break;
489 } else if (cmp > 0) {
490 begin = offset + 1;
491 } else {
492 end = offset - 1;
493 }
494 }
495 }
496 if (cmp < 0) {
497 offset--;
498 }
499 StringBuilder pinyin = new StringBuilder();
500 for (int j = 0; j < PINYINS[offset].length && PINYINS[offset][j] != 0; j++) {
501 pinyin.append((char) PINYINS[offset][j]);
502 }
503 token.target = pinyin.toString();
504 if (TextUtils.isEmpty(token.target)) {
505 token.type = Token.UNKNOWN;
506 token.target = token.source;
507 }
508 return token;
509 }
510
511 /**
512 * Convert the input to a array of tokens. The sequence of ASCII or Unknown
513 * characters without space will be put into a Token, One Hanzi character
514 * which has pinyin will be treated as a Token. If these is no China
515 * collator, the empty token array is returned.
516 */
517 public ArrayList<Token> get(final String input) {
518 ArrayList<Token> tokens = new ArrayList<Token>();
519 if (!mHasChinaCollator || TextUtils.isEmpty(input)) {
520 // return empty tokens.
521 return tokens;
522 }
523 final int inputLength = input.length();
524 final StringBuilder sb = new StringBuilder();
525 int tokenType = Token.LATIN;
526 // Go through the input, create a new token when
527 // a. Token type changed
528 // b. Get the Pinyin of current charater.
529 // c. current character is space.
530 for (int i = 0; i < inputLength; i++) {
531 final char character = input.charAt(i);
532 if (character == ' ') {
533 if (sb.length() > 0) {
534 addToken(sb, tokens, tokenType);
535 }
536 } else if (character < 256) {
537 if (tokenType != Token.LATIN && sb.length() > 0) {
538 addToken(sb, tokens, tokenType);
539 }
540 tokenType = Token.LATIN;
541 sb.append(character);
542 } else {
543 Token t = getToken(character);
544 if (t.type == Token.PINYIN) {
545 if (sb.length() > 0) {
546 addToken(sb, tokens, tokenType);
547 }
548 tokens.add(t);
549 tokenType = Token.PINYIN;
550 } else {
551 if (tokenType != t.type && sb.length() > 0) {
552 addToken(sb, tokens, tokenType);
553 }
554 tokenType = t.type;
555 sb.append(character);
556 }
557 }
558 }
559 if (sb.length() > 0) {
560 addToken(sb, tokens, tokenType);
561 }
562 return tokens;
563 }
564
565 private void addToken(final StringBuilder sb,
566 final ArrayList<Token> tokens, final int tokenType) {
567 String str = sb.toString();
568 tokens.add(new Token(tokenType, str, str));
569 sb.setLength(0);
570 }
571 }

HanziToPinyin.java

写一个MainActivity.java测试汉字转化为汉语拼音输出的效果:

 package zhangphil.hanyupinyin;

 import java.util.ArrayList;

 import zhangphil.hanyupinyin.HanziToPinyin.Token;
import android.app.Activity;
import android.os.Bundle; public class MainActivity extends Activity { @Override
protected void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState); String s = "安卓";
System.out.println("汉字转拼音输出: " + getPinYin(s));
} // 输入汉字返回拼音的通用方法函数。
public static String getPinYin(String hanzi) {
ArrayList<Token> tokens = HanziToPinyin.getInstance().get(hanzi);
StringBuilder sb = new StringBuilder();
if (tokens != null && tokens.size() > 0) {
for (Token token : tokens) {
if (Token.PINYIN == token.type) {
sb.append(token.target);
} else {
sb.append(token.source);
}
}
} return sb.toString().toUpperCase();
}
}

结果输出如图:

(转)汉字转拼音HanziToPinyin的更多相关文章

  1. 优化后的 google提供的汉字转拼音类(针对某些htc等手机的不兼容情况)

    /* * Copyright (C) 2011 The Android Open Source Project * * Licensed under the Apache License, Versi ...

  2. 文件一键上传、汉字转拼音、excel文件上传下载功能模块的实现

    ----------------------------------------------------------------------------------------------[版权申明: ...

  3. iOS 汉字的拼音

    获取汉字的拼音 #import <Foundation/Foundation.h> @interface NSString (Utils) /** * 汉字的拼音 * * @return ...

  4. JavaScript 汉字与拼音互转终极方案 附JS拼音输入法

    转:http://www.codeceo.com/article/javascript-pinyin.html 前言 网上关于JS实现汉字和拼音互转的文章很多,但是比较杂乱,都是互相抄来抄去,而且有的 ...

  5. SQL汉字转拼音函数-支持首字母、全拼

    SQL汉字转拼音函数-支持首字母.全拼 FROM :http://my.oschina.net/ind/blog/191659 作者不详 --方法一sqlserver汉字转拼音首字母 --调用方法 s ...

  6. 【干货】JS版汉字与拼音互转终极方案,附简单的JS拼音输入法

    前言 网上关于JS实现汉字和拼音互转的文章很多,但是比较杂乱,都是互相抄来抄去,而且有的不支持多音字,有的不支持声调,有的字典文件太大,还比如有时候我仅仅是需要获取汉字拼音首字母却要引入200kb的字 ...

  7. C#汉字转拼音(支持多音字)

    之前由于项目需要,中间需要一个汉字转拼音和首拼的功能来做查询,感觉这种功能基本已经成熟化了,于是查找了相关的代码,首先引入眼帘的是下面两篇文章 1.C# 汉字转拼音(支持GB2312字符集中所有汉字) ...

  8. C#汉字转拼音(npinyin)将中文转换成拼音全文或首字母

    汉字转拼音貌似一直是C#开发的一个难题,无论什么方案都有一定的bug,之前使用了两种方案. 1.Chinese2Spell.cs 一些不能识别的汉字全部转为Z 2.Microsoft Visual S ...

  9. C#汉字转拼音帮助类

    using System; using System.Collections.Generic; using System.Text; using System.Text.RegularExpressi ...

随机推荐

  1. Access to the path '' is denied.解决方案

    在本地测试正常,但一上传到服务器上的时候,那个就提示Access to the path ‘路径’ is denied.我在网上找了很多资料,最后终于解决了,原来是因为在该文件的上级文件夹没有修改权限 ...

  2. Dapper ORM 用法

    假如你喜欢原生的Sql语句,又喜欢ORM的简单,那你一定会喜欢上Dapper这款ROM.Dapper的优势:1,Dapper是一个轻型的ORM类.代码就一个SqlMapper.cs文件,编译后就40K ...

  3. 转 【O2O案例】汽车后市场垂直化电子商务:平业模式解析

    核心提示:一.商业模式简介.汽车后市场垂直化电子商务是我在2010年初开始筹划,起因是在淘宝工作期间运营汽车类目后遇到很多问题无决,由于 一.商业模式简介. 汽车后市场垂直化电子商务是我在2010年初 ...

  4. css去掉默认的下拉,实现用户自定义的下拉列表

    <!DOCTYPE html><html lang="en"><head> <meta charset="UTF-8" ...

  5. Java开发从零开始填坑

    开始学习Java,感觉较.NET知识更零碎一些,所以开个帖子把自己踩过的坑记录下来,都是边边角角网上不容易找到的东西. 1.java命令格式:>cd %parent-of-pakadgePath ...

  6. 学习Slim Framework for PHP v3 (四)--get()是怎么加进去的?

    看看官网加粗的一句话: At its core, Slim is a dispatcher that receives an HTTP request, invokes an appropriate ...

  7. php学习笔记6--php中的文件包含 include,require,include_once,require_once

    php中的文件包含 include,require,include_once,require_once 文件包含:是指将一个文件的内容包含进另外一个文件,有利于代码的复用等.php中文件包含指令有4个 ...

  8. Activity生命周期-Android

    Activity常见的三种生命周期: 1.完整生命周期 oncreate-->onstart-->onresume-->onpause-->onstop-->ondest ...

  9. DWZ (JUI) 教程 DWZ中dialog层的刷新

    在DWZ开发过程中经常会遇到的一种情况就是:在navTab页面中通过a标签打开一个dialog,在dialog层进行操作后,需要对该dialog层进行必要的刷新操作. 1.首先讲一下思路: 在非dia ...

  10. js格式化日期,获取当月的第一天,与最后一天.

    //格式化日期 function setDate(date){   y=date.getFullYear();   m=date.getMonth()+1;   d=date.getDate();   ...