...

Unicode encodes characters, such as ‘c’ and ‘ç’, each of which may be represented by one or more code points. For instance, ‘ç’ can be encoded as the single code point U+00E7 (the ‘U+’ prefix indicates that what follows is a hexadecimal Unicode scalar value), but also as the sequence of the character ‘c’ followed by the combining cedilla character ‘◌̧’ (written U+0063.0327: hexadecimal 0063 followed by hexadecimal 0327, with the dot indicating concatenation of the two values).

[TUS Ch. 2.1, 2.2, 2.4]
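
The two representations of ‘ç’ described above can be inspected with a short Python sketch. It is illustrative only: the helper name code_points is hypothetical, and the use of the standard unicodedata module to convert between the two forms (Unicode normalization) is an assumption added here, not something this page prescribes.

```python
import unicodedata

# The two representations of 'ç' discussed in the text.
precomposed = "\u00E7"   # single code point U+00E7
decomposed = "c\u0327"   # 'c' (U+0063) followed by combining cedilla (U+0327)


def code_points(text):
    """Return the code points of text in U+XXXX notation (hypothetical helper)."""
    return [f"U+{ord(ch):04X}" for ch in text]


print(code_points(precomposed))   # ['U+00E7']
print(code_points(decomposed))    # ['U+0063', 'U+0327']

# The two strings differ code point by code point ...
print(precomposed == decomposed)  # False

# ... but normalization maps one form onto the other.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
```

A plain string comparison treats the two forms as different, which is why software that compares or searches text usually brings both sides into the same form first.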

Encoding forms, bits, bytes, encoding schemes, byte order mark (BOM)

In the Unicode character encoding model, precisely defined encoding forms specify how each integer (code point) for a Unicode character is to be expressed as a sequence of one or more code units. The Unicode Standard provides three distinct encoding forms for Unicode characters, using 8-bit, 16-bit, and 32-bit code units. These are named UTF-8, UTF-16, and UTF-32, respectively. All three are equally valid: each can represent every Unicode character, and text can be converted between them without loss.
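
As a rough illustration of how one code point becomes one or more code units, the following Python sketch prints the code units of a few characters in each encoding form. The helper name code_units and the sample characters are hypothetical; the big-endian codecs ("utf-16-be", "utf-32-be") are chosen here only so that no byte order mark is emitted and each unit reads naturally as a hexadecimal number.

```python
def code_units(text, encoding, unit_bytes):
    """Split the encoded bytes of text into code units of unit_bytes bytes each.

    Hypothetical helper: returns each code unit as an uppercase hex string.
    """
    data = text.encode(encoding)
    return [
        data[i:i + unit_bytes].hex().upper()
        for i in range(0, len(data), unit_bytes)
    ]


# (label, Python codec name, code unit size in bytes)
FORMS = [
    ("UTF-8", "utf-8", 1),
    ("UTF-16", "utf-16-be", 2),
    ("UTF-32", "utf-32-be", 4),
]

# Sample characters: U+0063, U+00E7, and U+1F600 (outside the 16-bit range).
for ch in ("c", "\u00E7", "\U0001F600"):
    print(f"U+{ord(ch):04X}")
    for label, codec, size in FORMS:
        print(f"  {label}: {' '.join(code_units(ch, codec, size))}")
```

For ‘ç’ (U+00E7) this prints two 8-bit units (C3 A7), one 16-bit unit (00E7), and one 32-bit unit (000000E7). For U+1F600 it prints four 8-bit units (F0 9F 98 80), two 16-bit units (D83D DE00), and one 32-bit unit (0001F600).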

...