Tuesday, March 18, 2008

The Ten Commandments of Unicode

1. I am Unicode, thy character set. Thou shalt have no other character sets before me.

2. Thou shalt carefully specify the character encoding and the character set whenever reading a text file.

3. Thou shalt not refer to any 8-bit character set as “ASCII”.

4. Thou shalt ensure that all string handling functions fully support characters from beyond the Basic Multilingual Plane. Thou shalt not refer to Unicode as a two-byte character set.

5. Thou shalt plan for additions of future characters to Unicode.

6. Thou shalt count and index Unicode characters, not UTF-16 code units.

7. Thou shalt use UTF-8 as the preferred encoding wherever possible.

8. Thou shalt generate all text in Normalization Form C whenever possible.

9. Thou shalt avoid deprecated characters.

10. Thou shalt steer clear of the private use area.
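
A few of these commandments (the second, sixth, and eighth) are easier to see in code. What follows is only a minimal sketch in Python 3, not a definitive recipe. The sample strings and variable names are made up for illustration. It assumes Python 3, where a str is indexed by code point. The same explicit-encoding habit applies when opening a file with open(path, encoding="utf-8").

```python
import unicodedata

# Commandment 2: name the encoding explicitly instead of trusting a default.
# The sample bytes hold "naïve" plus U+1D11E (MUSICAL SYMBOL G CLEF),
# a character from beyond the Basic Multilingual Plane.
raw = "na\u00efve \U0001D11E".encode("utf-8")
text = raw.decode("utf-8")

# Commandment 6: count code points, not UTF-16 code units.
# A Python 3 str is a sequence of code points, so len() counts those.
code_points = len(text)

# Each UTF-16 code unit is 2 bytes; the G clef needs a surrogate pair,
# so the code-unit count is one higher than the code-point count.
utf16_units = len(text.encode("utf-16-le")) // 2

# Commandment 8: emit text in Normalization Form C.
decomposed = "e\u0301"  # "e" followed by COMBINING ACUTE ACCENT
composed = unicodedata.normalize("NFC", decomposed)  # single code point U+00E9

print(code_points, utf16_units)        # 7 8
print(len(decomposed), len(composed))  # 2 1
```

Even a code point count is not always what a reader would call a character: the decomposed "e" plus accent above is two code points but one visible letter, which is part of why normalizing to NFC before counting or comparing tends to help.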

-Abhiz
