Other Stuff (locale, collation...)

Locale

A locale (language and cultural data for Cambodia) has been proposed to the common locale repository maintained by the Unicode Consortium. It consists of two files, which can be found here: km.xml and km_KH.xml

The latest version can be found on the server of the CLDR (Common Locale Data Repository).

Locale definition for GLIBC

After you download the source file, follow the instructions below. You must be the root user to do this!

These instructions are valid for Red Hat 9.0 and SuSE 9.3; other distributions may use other directories!

cp km_KH.UTF-8 /usr/share/i18n/locales/

cp /usr/share/i18n/charmaps/UTF-8.gz /tmp

gzip -d /tmp/UTF-8.gz

localedef -i /usr/share/i18n/locales/km_KH.UTF-8 -f /tmp/UTF-8 /usr/lib/locale/km_KH

rm -f /tmp/UTF-8

That is all. The Khmer Unicode locale file will be installed in the directory /usr/lib/locale/km_KH.

You can use the command locale -a to list the locales available on your system.
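Programs can also check that the new locale actually loads, not just that locale -a lists it. The following is a minimal Python sketch that is not part of the original instructions: the helper name locale_available is ours, and the locale name km_KH is an assumption based on the /usr/lib/locale/km_KH directory used in the localedef command above.

```python
import locale


def locale_available(name):
    """Return True if the C library can switch to the locale `name`."""
    saved = locale.setlocale(locale.LC_ALL)  # query the current setting without changing it
    try:
        locale.setlocale(locale.LC_ALL, name)
        return True
    except locale.Error:
        return False
    finally:
        # Restore whatever locale was active before the check.
        locale.setlocale(locale.LC_ALL, saved)


if __name__ == "__main__":
    # "km_KH" is assumed from the localedef output directory; adjust it if
    # you installed the locale under another name.
    print("km_KH available:", locale_available("km_KH"))
```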

Collation (sorting)

We have developed two collation sequences for Khmer: one based on the traditional Chuon Nat dictionary and another based on the more modern Headley Khmer-English dictionary.
The source for a C program implementing them can be found here. The program sorts data contained in UTF-8 files and writes the sorted result to another file, also in UTF-8.
The collation sequence corresponding to the Chuon Nat ordering, in XML format, has been submitted to the Unicode Consortium.
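The general idea of a custom collation sequence can be illustrated with a per-character weight table. This Python sketch is not the C program mentioned above and does not reproduce the Chuon Nat or Headley orderings: EXAMPLE_WEIGHTS is a hypothetical toy table, and the function names are ours. Like the program described, it reads a UTF-8 file and writes the sorted lines to another UTF-8 file.

```python
from pathlib import Path

# Hypothetical weight table for illustration only; it is NOT the Chuon Nat
# or Headley sequence. Lower weights sort first.
EXAMPLE_WEIGHTS = {"b": 0, "a": 1}


def collation_key(text, weights):
    """Weighted characters sort first (by weight); unlisted ones follow in code point order."""
    return [(0, weights[ch]) if ch in weights else (1, ord(ch)) for ch in text]


def sort_lines(lines, weights):
    """Return the lines sorted by the custom collation key."""
    return sorted(lines, key=lambda line: collation_key(line, weights))


def sort_utf8_file(src, dst, weights):
    """Read UTF-8 lines from src and write them, sorted, to dst (also UTF-8)."""
    lines = Path(src).read_text(encoding="utf-8").splitlines()
    out = sort_lines(lines, weights)
    Path(dst).write_text("".join(line + "\n" for line in out), encoding="utf-8")
```

For example, sort_utf8_file("words.txt", "sorted.txt", table) would sort words.txt using a table you supply (file names hypothetical). A real Khmer ordering is likely to need several comparison levels (for instance, treating subscript consonants and vowels differently from base consonants), which a single flat table cannot express; the multi-level weights of the Unicode Collation Algorithm are one way to model that.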

Automatic insertion of ZWSP (word separation)

Jens Herden has developed a first version of a word breaking program for Khmer Unicode. The program goes through a Khmer Unicode text in UTF-8 format and inserts ZWSP characters between the words. It separates words using an internal dictionary (based on the Chuon Nat dictionary).
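The description above does not say which algorithm the program uses. One common dictionary-based technique is greedy longest matching: at each position, take the longest dictionary word that fits, and put a ZWSP (U+200B) between consecutive words. The Python sketch below illustrates only that technique, under our own assumptions: the function names are hypothetical, the dictionary is a small set you pass in (not the Chuon Nat word list), and the boundary check refuses to break before a combining mark or right after COENG (U+17D2), so a dependent vowel or subscript consonant is never split from its base.

```python
import unicodedata

ZWSP = "\u200b"   # ZERO WIDTH SPACE
COENG = "\u17d2"  # KHMER SIGN COENG, joins the following consonant as a subscript


def _is_boundary(text, pos):
    """True if a word break may fall before text[pos]."""
    if pos <= 0 or pos >= len(text):
        return True
    # Never break before a combining mark (dependent vowels, signs, COENG) ...
    if unicodedata.category(text[pos]).startswith("M"):
        return False
    # ... or between COENG and the subscript consonant it attaches.
    return text[pos - 1] != COENG


def insert_zwsp(text, dictionary):
    """Greedy longest-match segmentation; a ZWSP is placed between tokens."""
    max_len = max((len(w) for w in dictionary), default=0)
    tokens, unknown, i = [], "", 0
    while i < len(text):
        match = ""
        if _is_boundary(text, i):
            # Try the longest dictionary word first.
            for length in range(min(max_len, len(text) - i), 0, -1):
                candidate = text[i:i + length]
                if candidate in dictionary and _is_boundary(text, i + length):
                    match = candidate
                    break
        if match:
            if unknown:
                tokens.append(unknown)  # flush characters no dictionary word covered
                unknown = ""
            tokens.append(match)
            i += len(match)
        else:
            unknown += text[i]
            i += 1
    if unknown:
        tokens.append(unknown)
    # Join the tokens, but do not add a ZWSP next to existing whitespace.
    out = tokens[:1]
    for prev, tok in zip(tokens, tokens[1:]):
        if not (prev[-1].isspace() or tok[0].isspace()):
            out.append(ZWSP)
        out.append(tok)
    return "".join(out)
```

Greedy longest matching is simple but can choose the wrong split when a long word swallows the start of the next one; variants that minimise the total number of words are a common refinement.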

The program, which you can download here, is written in Java, so you need the Java Runtime Environment installed on your computer. It runs on any platform that has Java installed.

It can handle UTF-8 files, even if they contain HTML/XML. It can also deal with simple RTF files (MS Word can save and read documents as RTF).

To learn how to use it, just type:

java -jar khwrdbrk.jar -o readme.txt -r

in a console (Linux) or at the command prompt in Windows. Make sure that you are in the folder where the .jar file is located. This will generate a file called readme.txt that contains the instructions.

A quicker way is to type:

java -jar khwrdbrk.jar -h

in order to get a short help.

If you test it and find any bugs or have any requests, please write to Jens so that he can improve it further.
