Version 1.0.1, 11 October 2016

Version history:

  • 1.0, 2 September 2016

  • 1.0.1, 11 October 2016

The Uyghur language

Uyghur is a Turkic language spoken in Central and East Asia, notably in China. Its written tradition goes back to at least the 6th century CE.

Uyghur scripts

Many scripts have been used at one time or another to record Uyghur, and several orthographies were developed in modern times. They include:

  • Arabic script, old orthography: ‘Chaghatai alphabet’ (Uygh. kona yëziq or ‘old script’)

  • Arabic script, modern orthography (from ca. 1920)

  • Cyrillic script

  • Latin script, Chinese Pinyin-inspired

  • Latin script, modern orthography

Uyghur in the Arabic script (modern orthography)

This is the script used by most Uyghurs living in China today.

Encoding considerations

The Arabic blocks in the Unicode Standard can be confusing because many of their characters look identical to others. Some points to consider:

  • The Uyghur consonant /h/ must be encoded as U+06BE ARABIC LETTER HEH DOACHASHMEE (ھ). See Roozbeh Pournader, The Right Hehs for Arabic script orthographies of Sorani Kurdish and Uighur.

  • The Uyghur vowel /ɛ/ must be encoded as U+06D5 ARABIC LETTER AE (ە). See Roozbeh Pournader, The Right Hehs for Arabic script orthographies of Sorani Kurdish and Uighur.

  • Three yāʾ-shaped (ى) characters are used in Uyghur:

    • a yāʾ with two dots arranged horizontally below it, U+064A (ي), representing the Uyghur consonant /j/ (IPA /j/ is the ‘y’ sound in English ‘yes’);

    • a yāʾ without dots, U+0649 (ى), representing the Uyghur vowel /i/;

    • a yāʾ with two dots arranged vertically below it, U+06D0 (ې), representing the Uyghur vowel /e/.

  • The dotless ى and the vertically-dotted ې are usually preceded by hamza (placed on a ‘chair’) when they occur initially, in final position, or at a syllable boundary. This ‘hamza-on-a-chair’ (ئ) must be encoded as U+0626 ARABIC LETTER YEH WITH HAMZA ABOVE. See Jonathan Kew, Notes on some Unicode Arabic characters: recommendations for usage, April 21, 2005.

  • Never use Persian ی (U+06CC) for the dotless Uyghur ى.

  • Three kāf-shaped (ك) characters are used in Uyghur:

    • Uyghur /k/ is represented by Arabic U+0643 (ك), not by the Persian letter U+06A9 (ک), although its Uyghur glyph shape more closely resembles the ‘Persian kāf’ than the ‘Arabic kāf’;

    • Uyghur /g/ is represented by Persian (and Urdu, etc.) U+06AF (گ);

    • Uyghur /ŋ/ is represented by U+06AD (ڭ), an Arabic kāf shape carrying three dots on top.
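
The points above can be checked mechanically. The following is a minimal sketch in Python, not a definitive validator: it lists the code points recommended above and flags the two lookalikes that this section explicitly rules out (U+06CC and U+06A9). The names RECOMMENDED, LOOKALIKES, describe, find_lookalikes and fix_lookalikes are hypothetical helpers introduced only for illustration. The sketch also assumes the input is known to be Uyghur text; it does not try to detect the language.

```python
import unicodedata

# Code points this section recommends for Uyghur in the Arabic script.
RECOMMENDED = {
    "\u06BE": "consonant /h/",
    "\u06D5": "vowel /ɛ/",
    "\u064A": "consonant /j/",
    "\u0649": "vowel /i/",
    "\u06D0": "vowel /e/",
    "\u0626": "hamza on a chair",
    "\u0643": "consonant /k/",
    "\u06AF": "consonant /g/",
    "\u06AD": "consonant /ŋ/",
}

# Lookalikes the section rules out, mapped to the recommended character.
# Assumption: only these two one-to-one cases are handled here.
LOOKALIKES = {
    "\u06CC": "\u0649",  # Persian yeh -> dotless yeh
    "\u06A9": "\u0643",  # Persian kaf -> Arabic kaf
}


def describe(ch):
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def find_lookalikes(text):
    """Return (index, found, suggested) for each discouraged character."""
    return [(i, ch, LOOKALIKES[ch]) for i, ch in enumerate(text) if ch in LOOKALIKES]


def fix_lookalikes(text):
    """Replace each discouraged character with its recommended counterpart."""
    return text.translate(str.maketrans(LOOKALIKES))


if __name__ == "__main__":
    for ch, role in RECOMMENDED.items():
        print(describe(ch), "-", role)
```

Calling find_lookalikes first and reviewing the hits is the safer route. fix_lookalikes replaces blindly, so it would also alter any genuine Persian or Urdu passages quoted inside a Uyghur document.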

Script styles

TBD

Page layout considerations

Hyphenation TBD

Fonts and type sizes

  • main text, block quotes: font size TBD

  • appendices, bibliographies: font size TBD

  • footnote text, indexes: font size TBD