Basic Unicode encoding concepts (technical)

...

Unicode encodes characters such as ‘c’ and ‘ç’, each of which may be represented by one or more code points. For instance, ‘ç’ can be encoded as the single code point U+00E7 (the ‘U+’ prefix indicates that what follows is a Unicode code point written in hexadecimal), or as the character ‘c’ followed by the combining cedilla ‘◌̧’ (U+0063.0327: the sequence of hexadecimal 0063 and hexadecimal 0327, with the dot indicating concatenation of the two values).
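The two representations of ‘ç’ can be inspected with a minimal Python sketch. The helper name code_points is hypothetical and used only for illustration. The normalization calls at the end go slightly beyond the text above: they are the standard way to convert between the two spellings.

```python
import unicodedata

precomposed = "\u00E7"        # 'ç' as the single code point U+00E7
decomposed = "\u0063\u0327"   # 'c' followed by the combining cedilla U+0327


def code_points(text):
    """List the code points of a string in U+XXXX notation (illustrative helper)."""
    return [f"U+{ord(ch):04X}" for ch in text]


print(code_points(precomposed))
print(code_points(decomposed))

# The two strings render alike but are different code point sequences.
print(precomposed == decomposed)

# Normalization converts between the composed (NFC) and decomposed (NFD) spellings.
print(unicodedata.normalize("NFC", decomposed) == precomposed)
print(unicodedata.normalize("NFD", precomposed) == decomposed)
```

A plain string comparison treats the two spellings as different, so software that compares or searches text usually normalizes both sides first.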

[TUS Ch. 2.1, 2.2, 2.4]

Encoding forms, bits, bytes, encoding schemes, byte order mark (BOM)

In the Unicode character encoding model, precisely defined encoding forms specify how the integer (code point) for each Unicode character is expressed as a sequence of one or more code units. The Unicode Standard provides three distinct encoding forms, based on 8-bit, 16-bit, and 32-bit code units and named UTF-8, UTF-16, and UTF-32, respectively. All three are equally valid.
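As a rough illustration of how one code point maps to a different number of code units in each encoding form, the following Python sketch splits the encoded bytes into units of the appropriate width. The function name code_units is hypothetical. The big-endian codecs are chosen here only so that no byte order mark is prepended to the output.

```python
def code_units(text, form):
    """Return the code units of text in the given encoding form as hex strings.

    Illustrative helper: form is one of "utf-8", "utf-16", "utf-32".
    """
    unit_size = {"utf-8": 1, "utf-16": 2, "utf-32": 4}[form]  # bytes per code unit
    # Explicit big-endian codecs avoid a byte order mark in the encoded bytes.
    codec = form if unit_size == 1 else form + "-be"
    data = text.encode(codec)
    return [data[i:i + unit_size].hex().upper()
            for i in range(0, len(data), unit_size)]


for form in ("utf-8", "utf-16", "utf-32"):
    print(form, code_units("\u00E7", form), code_units("\U0001F600", form))
```

For U+00E7 this yields two 8-bit units, one 16-bit unit, or one 32-bit unit. For U+1F600, which lies outside the Basic Multilingual Plane, it yields four 8-bit units, two 16-bit units (a surrogate pair), or one 32-bit unit.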

...
