Encoding Forms of the Universal Character Set
ISO 10646 defines several character encoding forms for the Universal Character Set. The simplest is UCS-2, which uses a single code value between 0 and 65535 for each character and represents that value as exactly two bytes (one 16-bit word). UCS-2 thereby permits a binary representation of every code point in the BMP, as long as the code point represents a character. Code points outside the BMP cannot be represented in UCS-2 itself. They can instead be represented by pairs of special code values from what is called the S (Special) Zone of the BMP, each pair consisting of an RC-element from the high-half zone followed by an RC-element from the low-half zone.
In Unicode terminology these code values are called high surrogates (D800 to DBFF) and low surrogates (DC00 to DFFF) respectively. The encoding form that extends UCS-2 with these surrogate pairs is UTF-16, which ISO/IEC 10646 also defines. UCS-2 corresponds only to the BMP subset of UTF-16, so the two terms are not interchangeable.
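To make the difference between UCS-2 and surrogate pairs concrete, here is a minimal Python sketch. It assumes big-endian byte order and the Unicode surrogate ranges D800 to DBFF (high) and DC00 to DFFF (low); the function names `to_ucs2`, `to_surrogate_pair` and `to_utf16be` are hypothetical helpers for illustration, not part of any library. UCS-2 can only emit two bytes for a BMP character, while a code point above FFFF has to be split into a high/low pair.

```python
def to_ucs2(cp: int) -> bytes:
    """Encode one BMP code point as exactly two big-endian bytes (UCS-2)."""
    if not 0 <= cp <= 0xFFFF:
        raise ValueError(f"U+{cp:04X} is outside the BMP; UCS-2 cannot represent it")
    if 0xD800 <= cp <= 0xDFFF:
        # Surrogate code values are reserved for pairing and are not characters.
        raise ValueError(f"U+{cp:04X} is a surrogate code value, not a character")
    return cp.to_bytes(2, "big")


def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a code point in U+10000..U+10FFFF into a (high, low) surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:04X} is not a supplementary code point")
    offset = cp - 0x10000            # a 20-bit value
    high = 0xD800 + (offset >> 10)   # top 10 bits go into the high surrogate
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits go into the low surrogate
    return high, low


def to_utf16be(cp: int) -> bytes:
    """Use UCS-2 for BMP characters and a surrogate pair for everything above FFFF."""
    if cp <= 0xFFFF:
        return to_ucs2(cp)
    high, low = to_surrogate_pair(cp)
    return high.to_bytes(2, "big") + low.to_bytes(2, "big")
```

For example, `to_utf16be(0x1F600)` yields the four bytes D8 3D DE 00, the same result as Python's built-in `"\U0001F600".encode("utf-16-be")`, whereas `to_ucs2(0x1F600)` raises an error.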
Another encoding form is UCS-4, which uses a single code value between 0 and, theoretically, hexadecimal 7FFFFFFF for each character (although the UCS stops at 10FFFF, and ISO/IEC 10646 has stated that all future character assignments will also be in that range) and represents that value as exactly four bytes (one 32-bit word). UCS-4 thereby permits a binary representation of every code point in the UCS, including those outside the BMP. As in UCS-2, every encoded character has a fixed length in bytes, which makes the text simple to manipulate, but UCS-4 requires twice as much storage as UCS-2.
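The fixed-width property of UCS-4 can be sketched the same way. The Python code below assumes big-endian byte order; `to_ucs4`, `from_ucs4` and `char_at` are hypothetical helpers for illustration. Because every character occupies exactly four bytes, the n-th character always starts at byte offset 4n, which is what makes random access simple.

```python
def to_ucs4(text: str) -> bytes:
    """Encode every code point of `text` as a fixed four-byte, big-endian UCS-4 value."""
    return b"".join(ord(ch).to_bytes(4, "big") for ch in text)


def from_ucs4(data: bytes) -> str:
    """Decode big-endian UCS-4 bytes back into a string."""
    if len(data) % 4:
        raise ValueError("UCS-4 data length must be a multiple of four bytes")
    # chr() raises ValueError for values above 0x10FFFF, the upper limit of the UCS.
    return "".join(
        chr(int.from_bytes(data[i:i + 4], "big")) for i in range(0, len(data), 4)
    )


def char_at(data: bytes, n: int) -> str:
    """Return the n-th character directly: with a fixed width it starts at offset 4 * n."""
    start = 4 * n
    if n < 0 or start + 4 > len(data):
        raise IndexError("character index out of range")
    return chr(int.from_bytes(data[start:start + 4], "big"))
```

For instance, `to_ucs4("A\u20ac\U0001F600").hex()` gives `00000041000020ac0001f600`. For BMP-only text such as `"hello"`, `to_ucs4` produces 20 bytes against 10 for the two-byte UCS-2 form, which is the doubled storage cost described above.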
Occasionally, articles about Unicode will mistakenly refer to UCS-2 as "UCS-16". There is no UCS-16; the authors who make this error usually intended to refer to UCS-2 or UTF-16.
Citing the Universal Character Set
ISO 10646 is a general, informal citation for the ISO/IEC 10646 family of standards and is acceptable in most prose. Although Unicode is a separate standard, the term Unicode is used just as often, informally, when discussing the UCS. However, any normative reference to the UCS as a publication should cite a particular part and version, using the form ISO/IEC 10646-{part}:{year}; for example, ISO/IEC 10646-1:1993.
Correlation to Unicode
- ISO/IEC 10646-1:1993 ≈ Unicode 1.1
- ISO/IEC 10646-1:2000 ≈ Unicode 3.0
- ISO/IEC 10646-2:2001 ≈ Unicode 3.2
- ISO/IEC 10646:2003 ≈ Unicode 4.0
Related ISO standards
ISO standards related to ISO/IEC 10646 include ISO 2022, ISO 6429 and ISO 14651.
See also Unicode, UTF-16, UTF-8.