
(Repost) Converting Chinese characters to pinyin with HanziToPinyin


This article is reposted from: http://blog.csdn.net/zhangphil/article/details/47164665

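Android itself ships with a class for converting Chinese characters into pinyin: HanziToPinyin.java. The stock Android contacts implementation uses HanziToPinyin.java to group and sort Chinese contact names. Converting Chinese characters to pinyin is essential in some apps. Take contact grouping: if an address book holds several contacts surnamed 張 (ZHANG), all of them should be filed under the "Z" group. The same goes for social apps such as WeChat and QQ: any scenario that groups or sorts contacts or friends needs to convert Chinese characters to pinyin first and then sort and classify by the first letter.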
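To make the grouping idea concrete, here is a minimal sketch in plain Java. It assumes each contact name has already been converted to a pinyin string (in a real app that string would come from HanziToPinyin). The class name PinyinGrouping, the helper names, and the hard-coded sample pinyin are my own illustrative assumptions, not part of the Android source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Groups display names under the first letter of their pinyin key. */
final class PinyinGrouping {

    /** Returns 'A'..'Z' for a key starting with a Latin letter, '#' otherwise. */
    static char sectionOf(String pinyinKey) {
        if (pinyinKey == null || pinyinKey.isEmpty()) {
            return '#';
        }
        char first = Character.toUpperCase(pinyinKey.charAt(0));
        return (first >= 'A' && first <= 'Z') ? first : '#';
    }

    /**
     * Groups names by section. The input maps a display name to its
     * (already converted) pinyin key; the result is sorted by section.
     */
    static Map<Character, List<String>> group(Map<String, String> nameToPinyin) {
        Map<Character, List<String>> sections = new TreeMap<>();
        for (Map.Entry<String, String> e : nameToPinyin.entrySet()) {
            char section = sectionOf(e.getValue());
            sections.computeIfAbsent(section, k -> new ArrayList<>()).add(e.getKey());
        }
        return sections;
    }

    public static void main(String[] args) {
        // Sample data: the pinyin keys are hard-coded here for illustration.
        Map<String, String> contacts = new TreeMap<>();
        contacts.put("張三", "ZHANGSAN");
        contacts.put("張偉", "ZHANGWEI");
        contacts.put("李四", "LISI");
        System.out.println(group(contacts));
    }
}
```

Names whose key does not start with a Latin letter land in a "#" section. That is a common convention in contact lists, not something the original class prescribes.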
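HanziToPinyin.java is not a public class. It is used privately inside Google's own implementation of Android contacts, so it cannot be called like an ordinary Android SDK API. That is not a problem: the class file can simply be copied out and dropped into your own project.
In Google's contacts code, the HanziToPinyin.java source file is located at: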

packages/providers/ContactsProvider/src/com/android/providers/contacts/HanziToPinyin.java

Copies of this HanziToPinyin.java file can also be found in projects online. However, using the class as-is does not work correctly. The error reported is:

"There is no Chinese collator, HanziToPinyin is disabled"

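The error comes from this method of HanziToPinyin.java:
public static HanziToPinyin getInstance();
The cause is that some non-stock, customized Android systems define their Chinese locale differently. Locale.equals() matches only when language, country and variant are all the same, so for a locale reported as plain "zh" the original check locale[i].equals(Locale.CHINA) returns false. The Chinese collator is not recognized, and everything after that point is disabled.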
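The equality behaviour is easy to reproduce on any JVM. The sketch below (class and method names are illustrative, not from the Android source) shows that a language-only zh locale is not equal to Locale.CHINA, while it is equal to the built-in constant Locale.CHINESE:

```java
import java.util.Locale;

final class LocaleEqualityDemo {

    /** True only for an exact match on language, country and variant. */
    static boolean isExactlyChina(Locale locale) {
        return locale.equals(Locale.CHINA);
    }

    public static void main(String[] args) {
        Locale languageOnly = new Locale("zh");      // language "zh", no country
        Locale withCountry = new Locale("zh", "CN"); // same fields as Locale.CHINA

        System.out.println(isExactlyChina(withCountry));          // true
        System.out.println(isExactlyChina(languageOnly));         // false
        System.out.println(languageOnly.equals(Locale.CHINESE));  // true
    }
}
```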

Fixing the problem (solution)

I relaxed the check by adding one line:
final Locale chinaAddition = new Locale("zh");
and then added chinaAddition to the condition as an alternative match:

if (locale[i].equals(Locale.CHINA) || locale[i].equals(chinaAddition)) {
    …
}

Below is the full code of my modified getInstance() method:

public static HanziToPinyin getInstance() {
    synchronized (HanziToPinyin.class) {
        if (sInstance != null) {
            return sInstance;
        }
        // Check if zh_CN collation data is available
        final Locale locale[] = Collator.getAvailableLocales();

        // Added: also accept the language-only "zh" locale.
        final Locale chinaAddition = new Locale("zh");

        for (int i = 0; i < locale.length; i++) {
            if (locale[i].equals(Locale.CHINA)
                    || locale[i].equals(chinaAddition)) {
                // Do self validation just once.
                if (DEBUG) {
                    Log.d(TAG, "Self validation. Result: "
                            + doSelfValidation());
                }
                sInstance = new HanziToPinyin(true);
                return sInstance;
            }
        }
        Log.w(TAG,
                "There is no Chinese collator, HanziToPinyin is disabled");
        sInstance = new HanziToPinyin(false);
        return sInstance;
    }
}
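A looser variant of the same idea, offered only as a sketch and not as what the code above does: instead of comparing against two specific Locale objects, compare the language code, so that zh, zh_CN, zh_TW and so on all count. Whether every such locale really provides usable collation data on a given device is an assumption you would need to verify yourself; hasChineseLocale is a hypothetical helper name.

```java
import java.text.Collator;
import java.util.Locale;

final class ChineseLocaleCheck {

    /** Returns true if any locale in the array has the language code "zh". */
    static boolean hasChineseLocale(Locale[] available) {
        for (Locale locale : available) {
            if ("zh".equals(locale.getLanguage())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Same source of locales that getInstance() inspects.
        System.out.println(hasChineseLocale(Collator.getAvailableLocales()));
    }
}
```

The trade-off is that this check is more permissive than the two explicit equals() calls, so it is worth logging which locale actually matched while testing on the target devices.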

With this improvement in place, the full source of HanziToPinyin.java is as follows (the code can be copied into your own project and used directly):

/*
 * Copyright (C) 2011 The Android Open Source Project
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package zhangphil.hanyupinyin;

import android.text.TextUtils;
import android.util.Log;

import java.text.Collator;
import java.util.ArrayList;
import java.util.Locale;

/**
 * An object to convert Chinese character to its corresponding pinyin string.
 * For characters with multiple possible pinyin string, only one is selected
 * according to collator. Polyphone is not supported in this implementation.
 * This class is implemented to achieve the best runtime performance and minimum
 * runtime resources with tolerable sacrifice of accuracy. This implementation
 * highly depends on zh_CN ICU collation data and must be always synchronized
 * with ICU.
 *
 * Currently this file is aligned to zh.txt in ICU 4.6 (taken from the
 * Android 4.2 source).
 */
public class HanziToPinyin {
    private static final String TAG = "HanziToPinyin";

    // Turn on this flag when we want to check internal data structure.
    private static final boolean DEBUG = false;

    /**
     * Unihans array.
     *
     * Each unihans is the first one within same pinyin when collator is zh_CN.
     */
    public static final char[] UNIHANS = { '\u963f', '\u54ce', '\u5b89',
            '\u80ae', '\u51f9', '\u516b', '\u6300', '\u6273', '\u90a6',
            '\u52f9', '\u9642', '\u5954', '\u4f3b', '\u5c44', '\u8fb9',
            '\u706c', '\u618b', '\u6c43', '\u51ab', '\u7676', '\u5cec',
            '\u5693', '\u5072', '\u53c2', '\u4ed3', '\u64a1', '\u518a',
            '\u5d7e', '\u66fd', '\u66fe', '\u5c64', '\u53c9', '\u8286',
            '\u8fbf', '\u4f25', '\u6284', '\u8f66', '\u62bb', '\u6c88',
            '\u6c89', '\u9637', '\u5403', '\u5145', '\u62bd', '\u51fa',
            '\u6b3b', '\u63e3', '\u5ddb', '\u5205', '\u5439', '\u65fe',
            '\u9034', '\u5472', '\u5306', '\u51d1', '\u7c97', '\u6c46',
            '\u5d14', '\u90a8', '\u6413', '\u5491', '\u5446', '\u4e39',
            '\u5f53', '\u5200', '\u561a', '\u6265', '\u706f', '\u6c10',
            '\u55f2', '\u7538', '\u5201', '\u7239', '\u4e01', '\u4e1f',
            '\u4e1c', '\u543a', '\u53be', '\u8011', '\u8968', '\u5428',
            '\u591a', '\u59b8', '\u8bf6', '\u5940', '\u97a5', '\u513f',
            '\u53d1', '\u5e06', '\u531a', '\u98de', '\u5206', '\u4e30',
            '\u8985', '\u4ecf', '\u7d11', '\u4f15', '\u65ee', '\u4f85',
            '\u7518', '\u5188', '\u768b', '\u6208', '\u7ed9', '\u6839',
            '\u522f', '\u5de5', '\u52fe', '\u4f30', '\u74dc', '\u4e56',
            '\u5173', '\u5149', '\u5f52', '\u4e28', '\u5459', '\u54c8',
            '\u548d', '\u4f44', '\u592f', '\u8320', '\u8bc3', '\u9ed2',
            '\u62eb', '\u4ea8', '\u5677', '\u53ff', '\u9f41', '\u4e6f',
            '\u82b1', '\u6000', '\u72bf', '\u5ddf', '\u7070', '\u660f',
            '\u5419', '\u4e0c', '\u52a0', '\u620b', '\u6c5f', '\u827d',
            '\u9636', '\u5dfe', '\u5755', '\u5182', '\u4e29', '\u51e5',
            '\u59e2', '\u5658', '\u519b', '\u5494', '\u5f00', '\u520a',
            '\u5ffc', '\u5c3b', '\u533c', '\u808e', '\u52a5', '\u7a7a',
            '\u62a0', '\u625d', '\u5938', '\u84af', '\u5bbd', '\u5321',
            '\u4e8f', '\u5764', '\u6269', '\u5783', '\u6765', '\u5170',
            '\u5577', '\u635e', '\u808b', '\u52d2', '\u5d1a', '\u5215',
            '\u4fe9', '\u5941', '\u826f', '\u64a9', '\u5217', '\u62ce',
            '\u5222', '\u6e9c', '\u56d6', '\u9f99', '\u779c', '\u565c',
            '\u5a08', '\u7567', '\u62a1', '\u7f57', '\u5463', '\u5988',
            '\u57cb', '\u5ada', '\u7264', '\u732b', '\u4e48', '\u5445',
            '\u95e8', '\u753f', '\u54aa', '\u5b80', '\u55b5', '\u4e5c',
            '\u6c11', '\u540d', '\u8c2c', '\u6478', '\u54de', '\u6bea',
            '\u55ef', '\u62cf', '\u8149', '\u56e1', '\u56d4', '\u5b6c',
            '\u7592', '\u5a1e', '\u6041', '\u80fd', '\u59ae', '\u62c8',
            '\u5b22', '\u9e1f', '\u634f', '\u56dc', '\u5b81', '\u599e',
            '\u519c', '\u7fba', '\u5974', '\u597b', '\u759f', '\u9ec1',
            '\u90cd', '\u5594', '\u8bb4', '\u5991', '\u62cd', '\u7705',
            '\u4e53', '\u629b', '\u5478', '\u55b7', '\u5309', '\u4e15',
            '\u56e8', '\u527d', '\u6c15', '\u59d8', '\u4e52', '\u948b',
            '\u5256', '\u4ec6', '\u4e03', '\u6390', '\u5343', '\u545b',
            '\u6084', '\u767f', '\u4eb2', '\u72c5', '\u828e', '\u4e18',
            '\u533a', '\u5cd1', '\u7f3a', '\u590b', '\u5465', '\u7a63',
            '\u5a06', '\u60f9', '\u4eba', '\u6254', '\u65e5', '\u8338',
            '\u53b9', '\u909a', '\u633c', '\u5827', '\u5a51', '\u77a4',
            '\u637c', '\u4ee8', '\u6be2', '\u4e09', '\u6852', '\u63bb',
            '\u95aa', '\u68ee', '\u50e7', '\u6740', '\u7b5b', '\u5c71',
            '\u4f24', '\u5f30', '\u5962', '\u7533', '\u8398', '\u6552',
            '\u5347', '\u5c38', '\u53ce', '\u4e66', '\u5237', '\u8870',
            '\u95e9', '\u53cc', '\u8c01', '\u542e', '\u8bf4', '\u53b6',
            '\u5fea', '\u635c', '\u82cf', '\u72fb', '\u590a', '\u5b59',
            '\u5506', '\u4ed6', '\u56fc', '\u574d', '\u6c64', '\u5932',
            '\u5fd1', '\u71a5', '\u5254', '\u5929', '\u65eb', '\u5e16',
            '\u5385', '\u56f2', '\u5077', '\u51f8', '\u6e4d', '\u63a8',
            '\u541e', '\u4e47', '\u7a75', '\u6b6a', '\u5f2f', '\u5c23',
            '\u5371', '\u6637', '\u7fc1', '\u631d', '\u4e4c', '\u5915',
            '\u8672', '\u4eda', '\u4e61', '\u7071', '\u4e9b', '\u5fc3',
            '\u661f', '\u51f6', '\u4f11', '\u5401', '\u5405', '\u524a',
            '\u5743', '\u4e2b', '\u6079', '\u592e', '\u5e7a', '\u503b',
            '\u4e00', '\u56d9', '\u5e94', '\u54df', '\u4f63', '\u4f18',
            '\u625c', '\u56e6', '\u66f0', '\u6655', '\u7b60', '\u7b7c',
            '\u5e00', '\u707d', '\u5142', '\u5328', '\u50ae', '\u5219',
            '\u8d3c', '\u600e', '\u5897', '\u624e', '\u635a', '\u6cbe',
            '\u5f20', '\u957f', '\u9577', '\u4f4b', '\u8707', '\u8d1e',
            '\u4e89', '\u4e4b', '\u5cd9', '\u5ea2', '\u4e2d', '\u5dde',
            '\u6731', '\u6293', '\u62fd', '\u4e13', '\u5986', '\u96b9',
            '\u5b92', '\u5353', '\u4e72', '\u5b97', '\u90b9', '\u79df',
            '\u94bb', '\u539c', '\u5c0a', '\u6628', '\u5159', '\u9fc3',
            '\u9fc4', };

    /**
     * Pinyin array.
     *
     * Each pinyin is corresponding to unihans of same offset in the unihans
     * array.
     */
    public static final byte[][] PINYINS = { { 65, 0, 0, 0, 0, 0 },
            { 65, 73, 0, 0, 0, 0 }, { 65, 78, 0, 0, 0, 0 },
            { 65, 78, 71, 0, 0, 0 }, { 65, 79, 0, 0, 0, 0 },
            { 66, 65, 0, 0, 0, 0 }, { 66, 65, 73, 0, 0, 0 },
            { 66, 65, 78, 0, 0, 0 }, { 66, 65, 78, 71, 0, 0 },
            { 66, 65, 79, 0, 0, 0 }, { 66, 69, 73, 0, 0, 0 },
            { 66, 69, 78, 0, 0, 0 }, { 66, 69, 78, 71, 0, 0 },
            { 66, 73, 0, 0, 0, 0 }, { 66, 73, 65, 78, 0, 0 },
            { 66, 73, 65, 79, 0, 0 }, { 66, 73, 69, 0, 0, 0 },
            { 66, 73, 78, 0, 0, 0 }, { 66, 73, 78, 71, 0, 0 },
            { 66, 79, 0, 0, 0, 0 }, { 66, 85, 0, 0, 0, 0 },
            { 67, 65, 0, 0, 0, 0 }, { 67, 65, 73, 0, 0, 0 },
            { 67, 65, 78, 0, 0, 0 }, { 67, 65, 78, 71, 0, 0 },
            { 67, 65, 79, 0, 0, 0 }, { 67, 69, 0, 0, 0, 0 },
            { 67, 69, 78, 0, 0, 0 }, { 67, 69, 78, 71, 0, 0 },
            { 90, 69, 78, 71, 0, 0 }, { 67, 69, 78, 71, 0, 0 },
            { 67, 72, 65, 0, 0, 0 }, { 67, 72, 65, 73, 0, 0 },
            { 67, 72, 65, 78, 0, 0 }, { 67, 72, 65, 78, 71, 0 },
            { 67, 72, 65, 79, 0, 0 }, { 67, 72, 69, 0, 0, 0 },
            { 67, 72, 69, 78, 0, 0 }, { 83, 72, 69, 78, 0, 0 },
            { 67, 72, 69, 78, 0, 0 }, { 67, 72, 69, 78, 71, 0 },
            { 67, 72, 73, 0, 0, 0 }, { 67, 72, 79, 78, 71, 0 },
            { 67, 72, 79, 85, 0, 0 }, { 67, 72, 85, 0, 0, 0 },
            { 67, 72, 85, 65, 0, 0 }, { 67, 72, 85, 65, 73, 0 },
            { 67, 72, 85, 65, 78, 0 }, { 67, 72, 85, 65, 78, 71 },
            { 67, 72, 85, 73, 0, 0 }, { 67, 72, 85, 78, 0, 0 },
            { 67, 72, 85, 79, 0, 0 }, { 67, 73, 0, 0, 0, 0 },
            { 67, 79, 78, 71, 0, 0 }, { 67, 79, 85, 0, 0, 0 },
            { 67, 85, 0, 0, 0, 0 }, { 67, 85, 65, 78, 0, 0 },
            { 67, 85, 73, 0, 0, 0 }, { 67, 85, 78, 0, 0, 0 },
            { 67, 85, 79, 0, 0, 0 }, { 68, 65, 0, 0, 0, 0 },
            { 68, 65, 73, 0, 0, 0 }, { 68, 65, 78, 0, 0, 0 },
            { 68, 65, 78, 71, 0, 0 }, { 68, 65, 79, 0, 0, 0 },
            { 68, 69, 0, 0, 0, 0 }, { 68, 69, 78, 0, 0, 0 },
            { 68, 69, 78, 71, 0, 0 }, { 68, 73, 0, 0, 0, 0 },
            { 68, 73, 65, 0, 0, 0 }, { 68, 73, 65, 78, 0, 0 },
            { 68, 73, 65, 79, 0, 0 }, { 68, 73, 69, 0, 0, 0 },
            { 68, 73, 78, 71, 0, 0 }, { 68, 73, 85, 0, 0, 0 },
            { 68, 79, 78, 71, 0, 0 }, { 68, 79, 85, 0, 0, 0 },
            { 68, 85, 0, 0, 0, 0 }, { 68, 85, 65, 78, 0, 0 },
            { 68, 85, 73, 0, 0, 0 }, { 68, 85, 78, 0, 0, 0 },
            { 68, 85, 79, 0, 0, 0 }, { 69, 0, 0, 0, 0, 0 },
            { 69, 73, 0, 0, 0, 0 }, { 69, 78, 0, 0, 0, 0 },
            { 69, 78, 71, 0, 0, 0 }, { 69, 82, 0, 0, 0, 0 },
            { 70, 65, 0, 0, 0, 0 }, { 70, 65, 78, 0, 0, 0 },
            { 70, 65, 78, 71, 0, 0 }, { 70, 69, 73, 0, 0, 0 },
            { 70, 69, 78, 0, 0, 0 }, { 70, 69, 78, 71, 0, 0 },
            { 70, 73, 65, 79, 0, 0 }, { 70, 79, 0, 0, 0, 0 },
            { 70, 79, 85, 0, 0, 0 }, { 70, 85, 0, 0, 0, 0 },
            { 71, 65, 0, 0, 0, 0 }, { 71, 65, 73, 0, 0, 0 },
            { 71, 65, 78, 0, 0, 0 }, { 71, 65, 78, 71, 0, 0 },
            { 71, 65, 79, 0, 0, 0 }, { 71, 69, 0, 0, 0, 0 },
            { 71, 69, 73, 0, 0, 0 }, { 71, 69, 78, 0, 0, 0 },
            { 71, 69, 78, 71, 0, 0 }, { 71, 79, 78, 71, 0, 0 },
            { 71, 79, 85, 0, 0, 0 }, { 71, 85, 0, 0, 0, 0 },
            { 71, 85, 65, 0, 0, 0 }, { 71, 85, 65, 73, 0, 0 },
            { 71, 85, 65, 78, 0, 0 }, { 71, 85, 65, 78, 71, 0 },
            { 71, 85, 73, 0, 0, 0 }, { 71, 85, 78, 0, 0, 0 },
            { 71, 85, 79, 0, 0, 0 }, { 72, 65, 0, 0, 0, 0 },
            { 72, 65, 73, 0, 0, 0 }, { 72, 65, 78, 0, 0, 0 },
            { 72, 65, 78, 71, 0, 0 }, { 72, 65, 79, 0, 0, 0 },
            { 72, 69, 0, 0, 0, 0 }, { 72, 69, 73, 0, 0, 0 },
            { 72, 69, 78, 0, 0, 0 }, { 72, 69, 78, 71, 0, 0 },
            { 72, 77, 0, 0, 0, 0 }, { 72, 79, 78, 71, 0, 0 },
            { 72, 79, 85, 0, 0, 0 }, { 72, 85, 0, 0, 0, 0 },
            { 72, 85, 65, 0, 0, 0 }, { 72, 85, 65, 73, 0, 0 },
            { 72, 85, 65, 78, 0, 0 }, { 72, 85, 65, 78, 71, 0 },
            { 72, 85, 73, 0, 0, 0 }, { 72, 85, 78, 0, 0, 0 },
            { 72, 85, 79, 0, 0, 0 }, { 74, 73, 0, 0, 0, 0 },
            { 74, 73, 65, 0, 0, 0 }, { 74, 73, 65, 78, 0, 0 },
            { 74, 73, 65, 78, 71, 0 }, { 74, 73, 65, 79, 0, 0 },
            { 74, 73, 69, 0, 0, 0 }, { 74, 73, 78, 0, 0, 0 },
            { 74, 73, 78, 71, 0, 0 }, { 74, 73, 79, 78, 71, 0 },
            { 74, 73, 85, 0, 0, 0 }, { 74, 85, 0, 0, 0, 0 },
            { 74, 85, 65, 78, 0, 0 }, { 74, 85, 69, 0, 0, 0 },
            { 74, 85, 78, 0, 0, 0 }, { 75, 65, 0, 0, 0, 0 },
            { 75, 65, 73, 0, 0, 0 }, { 75, 65, 78, 0, 0, 0 },
            { 75, 65, 78, 71, 0, 0 }, { 75, 65, 79, 0, 0, 0 },
            { 75, 69, 0, 0, 0, 0 }, { 75, 69, 78, 0, 0, 0 },
            { 75, 69, 78, 71, 0, 0 }, { 75, 79, 78, 71, 0, 0 },
            { 75, 79, 85, 0, 0, 0 }, { 75, 85, 0, 0, 0, 0 },
            { 75, 85, 65, 0, 0, 0 }, { 75, 85, 65, 73, 0, 0 },
            { 75, 85, 65, 78, 0, 0 }, { 75, 85, 65, 78, 71, 0 },
            { 75, 85, 73, 0, 0, 0 }, { 75, 85, 78, 0, 0, 0 },
            { 75, 85, 79, 0, 0, 0 }, { 76, 65, 0, 0, 0, 0 },
            { 76, 65, 73, 0, 0, 0 }, { 76, 65, 78, 0, 0, 0 },
            { 76, 65, 78, 71, 0, 0 }, { 76, 65, 79, 0, 0, 0 },
            { 76, 69, 0, 0, 0, 0 }, { 76, 69, 73, 0, 0, 0 },
            { 76, 69, 78, 71, 0, 0 }, { 76, 73, 0, 0, 0, 0 },
            { 76, 73, 65, 0, 0, 0 }, { 76, 73, 65, 78, 0, 0 },
            { 76, 73, 65, 78, 71, 0 }, { 76, 73, 65, 79, 0, 0 },
            { 76, 73, 69, 0, 0, 0 }, { 76, 73, 78, 0, 0, 0 },
            { 76, 73, 78, 71, 0, 0 }, { 76, 73, 85, 0, 0, 0 },
            { 76, 79, 0, 0, 0, 0 }, { 76, 79, 78, 71, 0, 0 },
            { 76, 79, 85, 0, 0, 0 }, { 76, 85, 0, 0, 0, 0 },
            { 76, 85, 65, 78, 0, 0 }, { 76, 85, 69, 0, 0, 0 },
            { 76, 85, 78, 0, 0, 0 }, { 76, 85, 79, 0, 0, 0 },
            { 77, 0, 0, 0, 0, 0 }, { 77, 65, 0, 0, 0, 0 },
            { 77, 65, 73, 0, 0, 0 }, { 77, 65, 78, 0, 0, 0 },
            { 77, 65, 78, 71, 0, 0 }, { 77, 65, 79, 0, 0, 0 },
            { 77, 69, 0, 0, 0, 0 }, { 77, 69, 73, 0, 0, 0 },
            { 77, 69, 78, 0, 0, 0 }, { 77, 69, 78, 71, 0, 0 },
            { 77, 73, 0, 0, 0, 0 }, { 77, 73, 65, 78, 0, 0 },
            { 77, 73, 65, 79, 0, 0 }, { 77, 73, 69, 0, 0, 0 },
            { 77, 73, 78, 0, 0, 0 }, { 77, 73, 78, 71, 0, 0 },
            { 77, 73, 85, 0, 0, 0 }, { 77, 79, 0, 0, 0, 0 },
            { 77, 79, 85, 0, 0, 0 }, { 77, 85, 0, 0, 0, 0 },
            { 78, 0, 0, 0, 0, 0 }, { 78, 65, 0, 0, 0, 0 },
            { 78, 65, 73, 0, 0, 0 }, { 78, 65, 78, 0, 0, 0 },
            { 78, 65, 78, 71, 0, 0 }, { 78, 65, 79, 0, 0, 0 },
            { 78, 69, 0, 0, 0, 0 }, { 78, 69, 73, 0, 0, 0 },
            { 78, 69, 78, 0, 0, 0 }, { 78, 69, 78, 71, 0, 0 },
            { 78, 73, 0, 0, 0, 0 }, { 78, 73, 65, 78, 0, 0 },
            { 78, 73, 65, 78, 71, 0 }, { 78, 73, 65, 79, 0, 0 },
            { 78, 73, 69, 0, 0, 0 }, { 78, 73, 78, 0, 0, 0 },
            { 78, 73, 78, 71, 0, 0 }, { 78, 73, 85, 0, 0, 0 },
            { 78, 79, 78, 71, 0, 0 }, { 78, 79, 85, 0, 0, 0 },
            { 78, 85, 0, 0, 0, 0 }, { 78, 85, 65, 78, 0, 0 },
            { 78, 85, 69, 0, 0, 0 }, { 78, 85, 78, 0, 0, 0 },
            { 78, 85, 79, 0, 0, 0 }, { 79, 0, 0, 0, 0, 0 },
            { 79, 85, 0, 0, 0, 0 }, { 80, 65, 0, 0, 0, 0 },
            { 80, 65, 73, 0, 0, 0 }, { 80, 65, 78, 0, 0, 0 },
            { 80, 65, 78, 71, 0, 0 }, { 80, 65, 79, 0, 0, 0 },
            { 80, 69, 73, 0, 0, 0 }, { 80, 69, 78, 0, 0, 0 },
            { 80, 69, 78, 71, 0, 0 }, { 80, 73, 0, 0, 0, 0 },
            { 80, 73, 65, 78, 0, 0 }, { 80, 73, 65, 79, 0, 0 },
            { 80, 73, 69, 0, 0, 0 }, { 80, 73, 78, 0, 0, 0 },
            { 80, 73, 78, 71, 0, 0 }, { 80, 79, 0, 0, 0, 0 },
            { 80, 79, 85, 0, 0, 0 }, { 80, 85, 0, 0, 0, 0 },
            { 81, 73, 0, 0, 0, 0 }, { 81, 73, 65, 0, 0, 0 },
            { 81, 73, 65, 78, 0, 0 }, { 81, 73, 65, 78, 71, 0 },
            { 81, 73, 65, 79, 0, 0 }, { 81, 73, 69, 0, 0, 0 },
            { 81, 73, 78, 0, 0, 0 }, { 81, 73, 78, 71, 0, 0 },
            { 81, 73, 79, 78, 71, 0 }, { 81, 73, 85, 0, 0, 0 },
            { 81, 85, 0, 0, 0, 0 }, { 81, 85, 65, 78, 0, 0 },
            { 81, 85, 69, 0, 0, 0 }, { 81, 85, 78, 0, 0, 0 },
            { 82, 65, 78, 0, 0, 0 }, { 82, 65, 78, 71, 0, 0 },
            { 82, 65, 79, 0, 0, 0 }, { 82, 69, 0, 0, 0, 0 },
            { 82, 69, 78, 0, 0, 0 }, { 82, 69, 78, 71, 0, 0 },
            { 82, 73, 0, 0, 0, 0 }, { 82, 79, 78, 71, 0, 0 },
            { 82, 79, 85, 0, 0, 0 }, { 82, 85, 0, 0, 0, 0 },
            { 82, 85, 65, 0, 0, 0 }, { 82, 85, 65, 78, 0, 0 },
            { 82, 85, 73, 0, 0, 0 }, { 82, 85, 78, 0, 0, 0 },
            { 82, 85, 79, 0, 0, 0 }, { 83, 65, 0, 0, 0, 0 },
            { 83, 65, 73, 0, 0, 0 }, { 83, 65, 78, 0, 0, 0 },
            { 83, 65, 78, 71, 0, 0 }, { 83, 65, 79, 0, 0, 0 },
            { 83, 69, 0, 0, 0, 0 }, { 83, 69, 78, 0, 0, 0 },
            { 83, 69, 78, 71, 0, 0 }, { 83, 72, 65, 0, 0, 0 },
            { 83, 72, 65, 73, 0, 0 }, { 83, 72, 65, 78, 0, 0 },
            { 83, 72, 65, 78, 71, 0 }, { 83, 72, 65, 79, 0, 0 },
            { 83, 72, 69, 0, 0, 0 }, { 83, 72, 69, 78, 0, 0 },
            { 88, 73, 78, 0, 0, 0 }, { 83, 72, 69, 78, 0, 0 },
            { 83, 72, 69, 78, 71, 0 }, { 83, 72, 73, 0, 0, 0 },
            { 83, 72, 79, 85, 0, 0 }, { 83, 72, 85, 0, 0, 0 },
            { 83, 72, 85, 65, 0, 0 }, { 83, 72, 85, 65, 73, 0 },
            { 83, 72, 85, 65, 78, 0 }, { 83, 72, 85, 65, 78, 71 },
            { 83, 72, 85, 73, 0, 0 }, { 83, 72, 85, 78, 0, 0 },
            { 83, 72, 85, 79, 0, 0 }, { 83, 73, 0, 0, 0, 0 },
            { 83, 79, 78, 71, 0, 0 }, { 83, 79, 85, 0, 0, 0 },
            { 83, 85, 0, 0, 0, 0 }, { 83, 85, 65, 78, 0, 0 },
            { 83, 85, 73, 0, 0, 0 }, { 83, 85, 78, 0, 0, 0 },
            { 83, 85, 79, 0, 0, 0 }, { 84, 65, 0, 0, 0, 0 },
            { 84, 65, 73, 0, 0, 0 }, { 84, 65, 78, 0, 0, 0 },
            { 84, 65, 78, 71, 0, 0 }, { 84, 65, 79, 0, 0, 0 },
            { 84, 69, 0, 0, 0, 0 }, { 84, 69, 78, 71, 0, 0 },
            { 84, 73, 0, 0, 0, 0 }, { 84, 73, 65, 78, 0, 0 },
            { 84, 73, 65, 79, 0, 0 }, { 84, 73, 69, 0, 0, 0 },
            { 84, 73, 78, 71, 0, 0 }, { 84, 79, 78, 71, 0, 0 },
            { 84, 79, 85, 0, 0, 0 }, { 84, 85, 0, 0, 0, 0 },
            { 84, 85, 65, 78, 0, 0 }, { 84, 85, 73, 0, 0, 0 },
            { 84, 85, 78, 0, 0, 0 }, { 84, 85, 79, 0, 0, 0 },
            { 87, 65, 0, 0, 0, 0 }, { 87, 65, 73, 0, 0, 0 },
            { 87, 65, 78, 0, 0, 0 }, { 87, 65, 78, 71, 0, 0 },
            { 87, 69, 73, 0, 0, 0 }, { 87, 69, 78, 0, 0, 0 },
            { 87, 69, 78, 71, 0, 0 }, { 87, 79, 0, 0, 0, 0 },
            { 87, 85, 0, 0, 0, 0 }, { 88, 73, 0, 0, 0, 0 },
            { 88, 73, 65, 0, 0, 0 }, { 88, 73, 65, 78, 0, 0 },
            { 88, 73, 65, 78, 71, 0 }, { 88, 73, 65, 79, 0, 0 },
            { 88, 73, 69, 0, 0, 0 }, { 88, 73, 78, 0, 0, 0 },
            { 88, 73, 78, 71, 0, 0 }, { 88, 73, 79, 78, 71, 0 },
            { 88, 73, 85, 0, 0, 0 }, { 88, 85, 0, 0, 0, 0 },
            { 88, 85, 65, 78, 0, 0 }, { 88, 85, 69, 0, 0, 0 },
            { 88, 85, 78, 0, 0, 0 }, { 89, 65, 0, 0, 0, 0 },
            { 89, 65, 78, 0, 0, 0 }, { 89, 65, 78, 71, 0, 0 },
            { 89, 65, 79, 0, 0, 0 }, { 89, 69, 0, 0, 0, 0 },
            { 89, 73, 0, 0, 0, 0 }, { 89, 73, 78, 0, 0, 0 },
            { 89, 73, 78, 71, 0, 0 }, { 89, 79, 0, 0, 0, 0 },
            { 89, 79, 78, 71, 0, 0 }, { 89, 79, 85, 0, 0, 0 },
            { 89, 85, 0, 0, 0, 0 }, { 89, 85, 65, 78, 0, 0 },
            { 89, 85, 69, 0, 0, 0 }, { 89, 85, 78, 0, 0, 0 },
            { 74, 85, 78, 0, 0, 0 }, { 89, 85, 78, 0, 0, 0 },
            { 90, 65, 0, 0, 0, 0 }, { 90, 65, 73, 0, 0, 0 },
            { 90, 65, 78, 0, 0, 0 }, { 90, 65, 78, 71, 0, 0 },
            { 90, 65, 79, 0, 0, 0 }, { 90, 69, 0, 0, 0, 0 },
            { 90, 69, 73, 0, 0, 0 }, { 90, 69, 78, 0, 0, 0 },
            { 90, 69, 78, 71, 0, 0 }, { 90, 72, 65, 0, 0, 0 },
            { 90, 72, 65, 73, 0, 0 }, { 90, 72, 65, 78, 0, 0 },
            { 90, 72, 65, 78, 71, 0 }, { 67, 72, 65, 78, 71, 0 },
            { 90, 72, 65, 78, 71, 0 }, { 90, 72, 65, 79, 0, 0 },
            { 90, 72, 69, 0, 0, 0 }, { 90, 72, 69, 78, 0, 0 },
            { 90, 72, 69, 78, 71, 0 }, { 90, 72, 73, 0, 0, 0 },
            { 83, 72, 73, 0, 0, 0 }, { 90, 72, 73, 0, 0, 0 },
            { 90, 72, 79, 78, 71, 0 }, { 90, 72, 79, 85, 0, 0 },
            { 90, 72, 85, 0, 0, 0 }, { 90, 72, 85, 65, 0, 0 },
            { 90, 72, 85, 65, 73, 0 }, { 90, 72, 85, 65, 78, 0 },
            { 90, 72, 85, 65, 78, 71 }, { 90, 72, 85, 73, 0, 0 },
            { 90, 72, 85, 78, 0, 0 }, { 90, 72, 85, 79, 0, 0 },
            { 90, 73, 0, 0, 0, 0 }, { 90, 79, 78, 71, 0, 0 },
            { 90, 79, 85, 0, 0, 0 }, { 90, 85, 0, 0, 0, 0 },
            { 90, 85, 65, 78, 0, 0 }, { 90, 85, 73, 0, 0, 0 },
            { 90, 85, 78, 0, 0, 0 }, { 90, 85, 79, 0, 0, 0 },
            { 0, 0, 0, 0, 0, 0 }, { 83, 72, 65, 78, 0, 0 },
            { 0, 0, 0, 0, 0, 0 }, };

    /**
     * First and last Chinese character with known Pinyin according to zh
     * collation
     */
    private static final String FIRST_PINYIN_UNIHAN = "\u963F";
    private static final String LAST_PINYIN_UNIHAN = "\u9FFF";

    private static final Collator COLLATOR = Collator.getInstance(Locale.CHINA);

    private static HanziToPinyin sInstance;
    private final boolean mHasChinaCollator;

    public static class Token {
        /**
         * Separator between target string for each source char
         */
        public static final String SEPARATOR = " ";

        public static final int LATIN = 1;
        public static final int PINYIN = 2;
        public static final int UNKNOWN = 3;

        public Token() {
        }

        public Token(int type, String source, String target) {
            this.type = type;
            this.source = source;
            this.target = target;
        }

        /**
         * Type of this token: LATIN, PINYIN or UNKNOWN.
         */
        public int type;
        /**
         * Original string before translation.
         */
        public String source;
        /**
         * Translated string of source. For Han, target is corresponding Pinyin.
         * Otherwise target is original string in source.
         */
        public String target;
    }

    protected HanziToPinyin(boolean hasChinaCollator) {
        mHasChinaCollator = hasChinaCollator;
    }

    public static HanziToPinyin getInstance() {
        synchronized (HanziToPinyin.class) {
            if (sInstance != null) {
                return sInstance;
            }
            // Check if zh_CN collation data is available
            final Locale locale[] = Collator.getAvailableLocales();

            // Added: also accept the language-only "zh" locale.
            final Locale chinaAddition = new Locale("zh");

            for (int i = 0; i < locale.length; i++) {
                if (locale[i].equals(Locale.CHINA)
                        || locale[i].equals(chinaAddition)) {
                    // Do self validation just once.
                    if (DEBUG) {
                        Log.d(TAG, "Self validation. Result: "
                                + doSelfValidation());
                    }
                    sInstance = new HanziToPinyin(true);
                    return sInstance;
                }
            }
            Log.w(TAG,
                    "There is no Chinese collator, HanziToPinyin is disabled");
            sInstance = new HanziToPinyin(false);
            return sInstance;
        }
    }

    /**
     * Validate if our internal table has some wrong value.
     *
     * @return true when the table looks correct.
     */
    private static boolean doSelfValidation() {
        char lastChar = UNIHANS[0];
        String lastString = Character.toString(lastChar);
        for (char c : UNIHANS) {
            if (lastChar == c) {
                continue;
            }
            final String curString = Character.toString(c);
            int cmp = COLLATOR.compare(lastString, curString);
            if (cmp >= 0) {
                Log.e(TAG, "Internal error in Unihan table. "
                        + "The last string \"" + lastString
                        + "\" is greater than current string \"" + curString
                        + "\".");
                return false;
            }
            lastString = curString;
        }
        return true;
    }

    private Token getToken(char character) {
        Token token = new Token();
        final String letter = Character.toString(character);
        token.source = letter;
        int offset = -1;
        int cmp;
        if (character < 256) {
            token.type = Token.LATIN;
            token.target = letter;
            return token;
        } else {
            cmp = COLLATOR.compare(letter, FIRST_PINYIN_UNIHAN);
            if (cmp < 0) {
                token.type = Token.UNKNOWN;
                token.target = letter;
                return token;
            } else if (cmp == 0) {
                token.type = Token.PINYIN;
                offset = 0;
            } else {
                cmp = COLLATOR.compare(letter, LAST_PINYIN_UNIHAN);
                if (cmp > 0) {
                    token.type = Token.UNKNOWN;
                    token.target = letter;
                    return token;
                } else if (cmp == 0) {
                    token.type = Token.PINYIN;
                    offset = UNIHANS.length - 1;
                }
            }
        }

        token.type = Token.PINYIN;
        if (offset < 0) {
            int begin = 0;
            int end = UNIHANS.length - 1;
            while (begin <= end) {
                offset = (begin + end) / 2;
                final String unihan = Character.toString(UNIHANS[offset]);
                cmp = COLLATOR.compare(letter, unihan);
                if (cmp == 0) {
                    break;
                } else if (cmp > 0) {
                    begin = offset + 1;
                } else {
                    end = offset - 1;
                }
            }
        }
        if (cmp < 0) {
            offset--;
        }
        StringBuilder pinyin = new StringBuilder();
        for (int j = 0; j < PINYINS[offset].length && PINYINS[offset][j] != 0; j++) {
            pinyin.append((char) PINYINS[offset][j]);
        }
        token.target = pinyin.toString();
        if (TextUtils.isEmpty(token.target)) {
            token.type = Token.UNKNOWN;
            token.target = token.source;
        }
        return token;
    }

    /**
     * Convert the input to an array of tokens. A sequence of ASCII or Unknown
     * characters without space will be put into one Token. Each Hanzi character
     * which has pinyin will be treated as a Token. If there is no China
     * collator, the empty token array is returned.
     */
    public ArrayList<Token> get(final String input) {
        ArrayList<Token> tokens = new ArrayList<Token>();
        if (!mHasChinaCollator || TextUtils.isEmpty(input)) {
            // return empty tokens.
            return tokens;
        }
        final int inputLength = input.length();
        final StringBuilder sb = new StringBuilder();
        int tokenType = Token.LATIN;
        // Go through the input, create a new token when
        // a. Token type changed
        // b. Get the Pinyin of current character.
        // c. current character is space.
        for (int i = 0; i < inputLength; i++) {
            final char character = input.charAt(i);
            if (character == ' ') {
                if (sb.length() > 0) {
                    addToken(sb, tokens, tokenType);
                }
            } else if (character < 256) {
                if (tokenType != Token.LATIN && sb.length() > 0) {
                    addToken(sb, tokens, tokenType);
                }
                tokenType = Token.LATIN;
                sb.append(character);
            } else {
                Token t = getToken(character);
                if (t.type == Token.PINYIN) {
                    if (sb.length() > 0) {
                        addToken(sb, tokens, tokenType);
                    }
                    tokens.add(t);
                    tokenType = Token.PINYIN;
                } else {
                    if (tokenType != t.type && sb.length() > 0) {
                        addToken(sb, tokens, tokenType);
                    }
                    tokenType = t.type;
                    sb.append(character);
                }
            }
        }
        if (sb.length() > 0) {
            addToken(sb, tokens, tokenType);
        }
        return tokens;
    }

    private void addToken(final StringBuilder sb,
            final ArrayList<Token> tokens, final int tokenType) {
        String str = sb.toString();
        tokens.add(new Token(tokenType, str, str));
        sb.setLength(0);
    }
}
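The core lookup can be summarised with a small stand-alone sketch. Each UNIHANS entry marks the first character of one pinyin group in collation order, so finding a character's pinyin means finding the last boundary that sorts at or before it, then decoding the matching zero-padded PINYINS row. The code below is a simplified restatement of that idea, not the class's actual getToken(): it uses hypothetical Latin boundaries and natural string ordering in place of the Chinese collator, and the names BoundaryLookup, floorIndex and decode are mine.

```java
import java.util.Comparator;

final class BoundaryLookup {

    /** Decodes one zero-padded row of ASCII codes, e.g. {90,72,65,78,71,0}. */
    static String decode(byte[] row) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < row.length && row[i] != 0; i++) {
            sb.append((char) row[i]);
        }
        return sb.toString();
    }

    /**
     * Returns the index of the last boundary that is <= key under the given
     * ordering, or -1 if key sorts before every boundary.
     */
    static int floorIndex(String[] boundaries, String key, Comparator<String> order) {
        int begin = 0;
        int end = boundaries.length - 1;
        int result = -1;
        while (begin <= end) {
            int mid = (begin + end) >>> 1;
            if (order.compare(boundaries[mid], key) <= 0) {
                result = mid;      // candidate; keep looking to the right
                begin = mid + 1;
            } else {
                end = mid - 1;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical Latin boundaries standing in for UNIHANS.
        String[] boundaries = {"a", "f", "m", "t"};
        String[] labels = {"A-E", "F-L", "M-S", "T-Z"};
        int i = floorIndex(boundaries, "k", Comparator.<String>naturalOrder());
        System.out.println(labels[i]);                                  // F-L
        System.out.println(decode(new byte[] {90, 72, 65, 78, 71, 0})); // ZHANG
    }
}
```

In the real class the comparator role is played by the zh_CN Collator, which is why the table only works when that collation data is present and ordered by pinyin.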

Here is a MainActivity.java that tests the Chinese-to-pinyin conversion:

package zhangphil.hanyupinyin;

import java.util.ArrayList;
import java.util.Locale;

import zhangphil.hanyupinyin.HanziToPinyin.Token;
import android.app.Activity;
import android.os.Bundle;

public class MainActivity extends Activity {

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);

        String s = "安卓";
        System.out.println("Hanzi to pinyin output: " + getPinYin(s));
    }

    // General-purpose helper: takes Chinese text and returns its pinyin.
    public static String getPinYin(String hanzi) {
        ArrayList<Token> tokens = HanziToPinyin.getInstance().get(hanzi);
        StringBuilder sb = new StringBuilder();
        if (tokens != null && tokens.size() > 0) {
            for (Token token : tokens) {
                if (Token.PINYIN == token.type) {
                    sb.append(token.target);
                } else {
                    sb.append(token.source);
                }
            }
        }

        // Use a fixed locale so the result does not depend on the device language.
        return sb.toString().toUpperCase(Locale.US);
    }
}

The output is shown in the figure:

[Figure: screenshot of the program output]
