Initially my blog focused on technical IT subjects. Over time I have written
less about that and more on general topics like travel. Today I return to a
technical topic: the Jyutping (粵拼) Cantonese romanization system.
It started with my quest to learn and master a Chinese input method. Years
ago I started with the Cangjie (倉頡) system. I never got much beyond the basics
because it is a difficult system to learn, let alone to master. Then I looked at
pronunciation-based systems. I was glad to find that Cantonese-based systems are
readily available. Out of the multiple romanization systems, people seem to have
gravitated toward the Jyutping system by the Linguistic Society of Hong Kong.
So the next step is to get familiar with the Jyutping system, which is not
trivial for me because I am weak in phonetics. It would be very useful to have
a service that annotates a piece of Chinese text with the pronunciation under
each character. Unfortunately I could find no such software besides some
dictionaries that do it character by character. So I have decided to write
one myself, as a naïve translation should not be difficult to write.
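To make the idea concrete, here is a minimal sketch of such a naïve, character-by-character annotation in Python. The names `SAMPLE_READINGS`, `annotate` and `render` are made up for illustration, and the two-entry table is only a stand-in for a full character table. For simplicity it puts the reading in parentheses after each character instead of underneath it.

```python
# Hypothetical two-entry sample table; a real one would cover every character.
SAMPLE_READINGS = {"粵": "jyut6", "拼": "ping3"}


def annotate(text, readings):
    """Pair each character with its reading, or None if the table has none."""
    return [(ch, readings.get(ch)) for ch in text]


def render(pairs):
    """Render pairs inline, e.g. 粵(jyut6); unknown characters pass through."""
    return "".join(ch if r is None else f"{ch}({r})" for ch, r in pairs)


print(render(annotate("粵拼 test", SAMPLE_READINGS)))
# prints: 粵(jyut6)拼(ping3) test
```

Being naïve, this ignores the fact that many characters have more than one reading depending on the word they appear in.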
Now all I need is a table of all Chinese characters and their Jyutping, i.e.
the Jyutping specification. I spent days searching the internet and came
up empty-handed. The Linguistic Society of Hong Kong itself provides little
more than a general description. It is a shame even some links to its
The good news is that I have finally found it in the Unihan
Database, a place I had come across many times without realizing that it holds
the most comprehensive data on Chinese characters, including Jyutping
and even Cangjie codes. With the database in hand, I am
ready for business!
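As a rough sketch of how the Jyutping data could be pulled out of the database, the Python below assumes the tab-separated Unihan text layout, where each data line reads `U+XXXX<TAB>field<TAB>value`, lines starting with `#` are comments, and the Cantonese readings sit in the `kCantonese` field. The function name `parse_kcantonese` and the sample lines are made up for illustration. In recent releases I believe the field lives in `Unihan_Readings.txt` inside Unihan.zip, but check the copy you download.

```python
def parse_kcantonese(lines):
    """Build {character: [readings]} from Unihan-style tab-separated lines."""
    table = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) != 3 or fields[1] != "kCantonese":
            continue  # not a Cantonese reading line
        codepoint, _, value = fields
        char = chr(int(codepoint[2:], 16))  # "U+XXXX" -> the character itself
        table[char] = value.split()  # readings are separated by spaces
    return table


# Hypothetical sample lines, built with ord() so the code points are not
# typed by hand; a real run would iterate over the downloaded file instead.
sample = [
    "# comment line",
    f"U+{ord('粵'):04X}\tkCantonese\tjyut6",
    f"U+{ord('拼'):04X}\tkCantonese\tping1 ping3",
    f"U+{ord('拼'):04X}\tkMandarin\tpīn",
]
table = parse_kcantonese(sample)
```

With a real file, something like `parse_kcantonese(open(path, encoding="utf-8"))` should work, since the function only needs an iterable of lines.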
(2010-02-17: Thanks to Helena for the heads-up. The Unihan database format has since changed. The new download link is Unihan.zip. Some general descriptions, such as the Unicode NamesList File Format, are also available.)