Adequate representation of text in languages written in non-Latin scripts was one of Brill's strengths in the “paper period”. The use of online platforms to access our publications continues to grow. Presenting publications on screen as PDF ensures the same print-based quality that our customers are used to. Besides PDF, most publications are also available in full-text HTML. Correct display of non-Latin scripts in HTML can only be ensured by “pushing” the necessary font(s) together with the webpage. This starts with proper coding of language and script information in the source XML, as explained below.
Both the JATS and BITS standards support the attribute @xml:lang. Some important sections from the description of this attribute in the JATS Tag Library:
The @xml:lang attribute can be assigned to most elements of the JATS and BITS standards. A listing of these elements is available in the JATS and BITS documentation. If the content of an element is entirely in one particular language, the attribute can be assigned to that element. For example:
Brill publications often use a few words in a non-Latin script within a sentence in English or another language. In this case, the language and script information should be assigned only to these few words and not to the complete element. For this situation, the element <styled-content> should be used (see https://jats.nlm.nih.gov/publishing/tag-library/1.2/element/styled-content.html). The @xml:lang attribute can be assigned to the <styled-content> element, for example:
… instead of <styled-content xml:lang="mid-Mand">ࡈࡀࡁࡅࡕࡀ</styled-content> ṭabuta …
For a word in Mandaic in a sentence. Note that the Mandaic characters are coded using their Unicode values.
… themselves “<styled-content xml:lang="txg-Tang">𘜶𗴲𗂧</styled-content>” [tha dźjwij lhji.j] — the “State of Great Xia”…
For a word in Tangut. Here too, the characters are coded using their Unicode values, to avoid squares if a font to display them is not available in the XML editor.
… sage (<styled-content xml:lang="he-Hebr">למשכיל</styled-content>) to instruct …
For a word in Hebrew.
The following table gives an overview of the scripts known to be used in Brill publications. For most of these we have web fonts available, including the instruction for size in relation to the Brill typeface. Please note that, because Brill currently uses language-script tags exclusively to trigger web fonts, language tags may be artificial. An example is Aramaic text written in the ‘Hebrew’ square script: to simplify the tagging, such text is tagged as ‘he-Hebr’, even though the language tag ‘he’ does not apply to Aramaic, which was and is a language distinct from Hebrew.
no. | Script name | Language code(s) | Script code |
---|---|---|---|
001 | Latin | (many) | Latn |
002 | Greek | el | Grek |
003 | Cyrillic | (many) | Cyrl |
004 | Old Slavic | cu | Cyrs |
005 | Hebrew | he | Hebr |
006 | Paleo-Hebrew | hbo | Phnx |
007 | Aramaic (biblical) | he (there is no ‘general’ Aramaic language tag) | Hebr |
008 | Aramaic (imperial) | arc | Armi |
009 | Syriac Estrangelo | syr | Syre |
010 | Syriac Serto | syr | Syrj |
011 | Arabic | ar | Arab |
012 | Armenian | hy | Armn |
013 | Coptic | cop | Copt |
014 | Gəʿəz (Ethiopian) | gez | Ethi |
015 | Georgian (Mkhedruli, Mtavruli, Khutsuri) | ka | Geor |
016 | Gothic | got | Goth |
017 | Samaritan | smp | Samr |
018 | Syriac, East | syr | Syrn |
019 | Glagolitic | (for future use; no language tag yet) | Glag |
020 | Old Turkic | otk | Orkh |
021 | Devanagari | sa | Deva |
022 | Tibetan | bo | Tibt |
023 | Chinese (simplified) | (for future use; no web font in use yet) | Hans |
024 | Chinese (traditional) | (for future use; no web font in use yet) | Hant |
025 | Japanese | ja (for future use; no web font in use yet) | Jpan |
026 | Lisu | lis | Lisu |
027 | Cypriot syllabary | grc | Cprt |
028 | Georgian (Asomtavruli) | ka | Geok |
029 | Logic and mathematics | und | Zmth |
030 | Tangut | txg | Tang |
031 | Mandaic | mid | Mand |
032 | Sindhi | sd | Arab |
033 | Egyptian hieroglyphs | egy | Egyp |
034 | Manichaean | xmn | Mani |
035 | Avestan | ae | Avst |
036 | Epichoric Greek (archaic local Greek scripts) | grc | Grek-epi |
037 | Linear B (Mycenaean Greek) | gmy | Linb |
038 | Uyghur in Arabic script | ug | Arab |
039 | Persian | fa | Arab |
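The table can also be used to check tagging mechanically. The following is a minimal sketch, not a Brill tool: it assumes tags take the form `language-Script` as in the examples above, copies only a handful of rows from the table, and uses the hypothetical helper names `is_known_tag` and `find_unknown_tags`. It relies only on Python's standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Subset of the table above: script code -> allowed language codes.
# None stands for "(many)" languages. Extend as needed; this is not the full list.
SCRIPT_LANGS = {
    "Latn": None,
    "Grek": {"el"},
    "Hebr": {"he"},
    "Arab": {"ar", "sd", "ug", "fa"},
    "Tang": {"txg"},
    "Mand": {"mid"},
}


def is_known_tag(tag):
    """Return True if a 'language-Script' tag matches a row in SCRIPT_LANGS."""
    lang, sep, script = tag.partition("-")
    if not sep or script not in SCRIPT_LANGS:
        return False
    allowed = SCRIPT_LANGS[script]
    return allowed is None or lang in allowed


def find_unknown_tags(xml_text):
    """List (tag, text) for <styled-content> elements with an unrecognised xml:lang."""
    root = ET.fromstring(xml_text)
    problems = []
    for el in root.iter("styled-content"):
        tag = el.get(XML_LANG)
        if tag is not None and not is_known_tag(tag):
            problems.append((tag, el.text or ""))
    return problems


# Hypothetical fragment: the second tag has a mistyped script code.
sample = (
    '<p>… sage (<styled-content xml:lang="he-Hebr">למשכיל</styled-content>) and '
    '<styled-content xml:lang="he-Hbr">למשכיל</styled-content> …</p>'
)
print(find_unknown_tags(sample))
```

Keeping languages grouped per script code mirrors the table, where several languages (for example Arabic, Sindhi, Uyghur and Persian) share one script.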
At brill.com, the xml:lang attribute and the <styled-content> element are converted to the corresponding HTML encoding. Furthermore, the CSS includes declarations for the web fonts used. Web font packages are available for most of the scripts listed in the table above and thus ensure proper display of the characters in HTML, similar to the PDF format.
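The actual conversion used at brill.com is not described here. As a rough illustration only, the sketch below assumes that <styled-content> becomes an HTML `<span>` carrying a `lang` attribute, and that fonts are selected with CSS `:lang()` rules. The function names and the font-family names are hypothetical placeholders, not Brill's real web fonts.

```python
import xml.etree.ElementTree as ET
from html import escape

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical font names; the real web font names are not given in this document.
FONT_FOR_TAG = {
    "he-Hebr": "ExampleHebrewWebFont",
    "mid-Mand": "ExampleMandaicWebFont",
}


def render(el):
    """Serialize an element to HTML, mapping <styled-content> to <span lang=...>."""
    inner = escape(el.text or "", quote=False)
    for child in el:
        inner += render(child) + escape(child.tail or "", quote=False)
    lang = el.get(XML_LANG)
    lang_attr = f' lang="{escape(lang)}"' if lang else ""
    # Assumption: other element names (such as <p>) carry over unchanged.
    name = "span" if el.tag == "styled-content" else el.tag
    return f"<{name}{lang_attr}>{inner}</{name}>"


def css_rules(font_for_tag):
    """Build one :lang() rule per language-script tag."""
    return "\n".join(
        f':lang({tag}) {{ font-family: "{font}"; }}'
        for tag, font in font_for_tag.items()
    )


jats = '<p>… sage (<styled-content xml:lang="he-Hebr">למשכיל</styled-content>) to instruct …</p>'
print(render(ET.fromstring(jats)))
print(css_rules(FONT_FOR_TAG))
```

With rules of this kind, the browser applies the pushed web font only to the tagged words, which is why the language and script information should be attached to those words rather than to the whole paragraph.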