Localisation with tracklang.tex

TeX provides the means to set up multiple hyphenation rules, so you can select the hyphenation rule for a particular language with the primitive \language〈number〉 where 〈number〉 is a number that identifies the hyphenation rule. That’s pretty much the limit of localisation support, as TeX itself doesn’t really provide any commands that may vary according to language or region. The TeXbook contains a definition of \today that typesets the current date in US format as an example, but TeX doesn’t come with \today predefined.
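
As a rough illustration, the following plain TeX sketch selects a hyphenation rule by number and defines \today along the lines of the TeXbook example. The choice of 0 for the language number is an assumption (it is normally the default set of patterns in the format):

\language=0 % select the hyphenation patterns numbered 0
% US-style date built from the \month, \day and \year primitives
\def\today{\ifcase\month\or
  January\or February\or March\or April\or May\or June\or
  July\or August\or September\or October\or November\or December\fi
  \space\number\day, \number\year}
Today is \today.
\bye

Note that the month names and the month-day-year order are hard-coded, so nothing here adapts to the document language.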

The LaTeX kernel also doesn’t define \today.¹ This command is actually defined by LaTeX classes, such as article.cls or report.cls. However, the LaTeX user manual states that \today produces the current date in the format July 29, 1985 so many classes define it in this way (unless they are designed for a specific locale).

Similarly, commands that produce fixed text, such as \chaptername and \seename, are not provided by the kernel but are defined by classes or packages that need them. For example, \chaptername is defined in report.cls and book.cls (which define \chapter) but not in article.cls (which doesn’t define \chapter). None of those classes define \seename but the makeidx package does.
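
Since these commands may or may not exist, a document or package that needs them can supply fallbacks. A minimal sketch using \providecommand, which only makes the definition if the command is still undefined (the fallback text is my own choice for this example):

\documentclass{article}
\providecommand*{\chaptername}{Chapter}% article.cls leaves this undefined
\providecommand*{\seename}{see}% normally only supplied by makeidx
\begin{document}
\chaptername\ 1 (\seename\ also \chaptername\ 2)
\end{document}

The fallback text is fixed English, which is exactly the kind of text that needs localisation support.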

With modern distributions, \languagename is defined before the class file is loaded. A simple test document:

\show\languagename
\documentclass{article}
\begin{document}
\end{document}

This will show the definition of \languagename in the transcript. However, there’s no mention of that command in the documented code of the LaTeX kernel (texdoc source2e).

So the only localisation feature of the TeX core is the ability to switch hyphenation rules. It’s possible that this may be addressed with LaTeX3. (There is some reference to localisation in the “Case-Changing” section of the “LaTeX3 Interfaces” document.)

It might be useful here to view development in a historical context:

November 1967: ISO/R 639:1967 Symbols for languages, countries and authorities was published (now withdrawn).
1978: TeX was first released.
1985: LaTeX was first released.
October 1986: ISO 8879:1986 Information processing — Text and office systems — Standard Generalized Markup Language (SGML) was published.
March 1988: ISO 639:1988 Code for the representation of names of languages was published (now withdrawn).
1989: The World Wide Web (WWW) was invented. (Berners-Lee released his WWW software in 1991.)
1993: HTML was first released.
1994: LaTeX2e was released.
March 1995: RFC 1766: Tags for the Identification of Languages was published. (Became obsolete with the publication of RFC 3066 in January 2002.)
January 1997: RFC 2070: Internationalization of the Hypertext Markup Language was published. (Became obsolete with the publication of RFC 2854.)
July 2002: ISO 639-1:2002 Codes for the representation of names of languages — Part 1: Alpha-2 code was published.

As can be seen from this timeline, TeX and LaTeX were first developed while the standards for identifying languages and other localisation information were still being formed.

The original LaTeX2.09 user manual (1985) doesn’t give much help:

See the Local Guide to find out if any foreign language versions of LaTeX are available for your computer.

The updated LaTeX2e user manual (1994) suggests the use of the babel package and directs the reader to The LaTeX Companion. Following that reference, The LaTeX Companion (1994) indicates a very simple user interface with two basic commands: \selectlanguage{language} and \iflanguage{language}{true}{false} (these days use the iflang package to test the active language). The LaTeX Companion states:

Any language that you use in your document should be declared on the \usepackage command as a language option.¹

The footnote reads:

¹In principle, since the language(s) in which a document is written is a global characteristic of the document in question, it makes good sense to declare it on the \documentclass command.
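
A minimal sketch of that interface, with the languages declared on \documentclass as the footnote suggests (the sample text is only illustrative):

\documentclass[french,british]{article}% languages declared globally
\usepackage{babel}% picks up the class options; the last one is the main language
\begin{document}
\iflanguage{british}{British English is active.}{Some other language is active.}

\selectlanguage{french}
\iflanguage{french}{Le fran\c{c}ais est actif.}{French is not active.}
\end{document}

Because the options are on the class, any other package that recognises french or british as an option can also pick them up.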

This is a really important point and one that’s not just pertinent to TeX or LaTeX. If you inspect the HTML source code for this webpage, you’ll find:

<html lang="en-GB">

which globally declares the primary language for the page.

Unfortunately the evolution of the LaTeX language packages has moved away from this key point. There’s no core framework to globally register the document languages. Why is this a problem? Surely babel and polyglossia etc deal with all the localisation support? Well, actually, they don’t. They mostly just provide translations for common elements, such as \chaptername, and the date format for \today. There are now thousands of packages on the Comprehensive TeX Archive Network (CTAN) and many of them provide commands that produce fixed text or data in a format that varies according to language or region.

Suppose I want to write a package that typesets invoices. This may have fixed text, such as “Description” or “Price”. It may need to display the currency sign and format the decimal part. The package therefore needs to know the document language in order to provide the relevant translations. However, there’s no standard mechanism for querying this information.

The simplest method from the package writer’s point of view is to get the document author to specify the particular language, and assume the document only has a single language. For example:

\usepackage[british]{myinvoice}

This can get rather frustrating for the document author if they require multiple packages that provide localisation support.

\usepackage[british]{foo}
\usepackage[UKenglish]{bar}
\usepackage[enGB]{baz}
\usepackage[en-GB]{wibble}
\usepackage[englishUK]{whatever}

Note that in the above, not only does each package require the localisation information in the option list but also each package has a different labelling system used to identify a particular locale.
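
On the package side, this single-language approach might be implemented along the following lines. This is only a sketch of the hypothetical myinvoice.sty: the command names (\invoicedescriptionname, \invoicepricename, \invoicecurrencysym) and the set of supported options are invented for this example.

\NeedsTeXFormat{LaTeX2e}
\ProvidesPackage{myinvoice}[2025/01/27 hypothetical example]
% Defaults, used if no language option is given
\newcommand*{\invoicedescriptionname}{Description}
\newcommand*{\invoicepricename}{Price}
\newcommand*{\invoicecurrencysym}{\$}
% One option per supported label
\DeclareOption{british}{\renewcommand*{\invoicecurrencysym}{\pounds}}
\DeclareOption{french}{\renewcommand*{\invoicepricename}{Prix}}
% Anything else is reported rather than silently ignored
\DeclareOption*{\PackageWarning{myinvoice}{Unsupported language option `\CurrentOption'}}
\ProcessOptions\relax
\endinput

Every package that takes this approach has to choose its own option labels, which is how the inconsistent labelling arises.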

Why can’t the package just test if \captions〈language〉 has been defined? Some do, but the code ends up quite complicated and it doesn’t warn the user about unsupported languages. For example, suppose I have a package that supports just English and French, then I would need the following tests:

Test if \captionsamerican is defined.
Test if \captionsaustralian is defined.
Test if \captionsbritish is defined.
Test if \captionscanadian is defined.
Test if \captionsenglish is defined.
Test if \captionsnewzealand is defined.
Test if \captionsUKenglish is defined.
Test if \captionsUSenglish is defined.
Test if \captionsacadian is defined.
Test if \captionscanadien is defined.
Test if \captionsfrancais is defined.
Test if \captionsfrenchb is defined.
Test if \captionsfrench is defined.

That’s just for two languages and it’s already complicated. What if babel introduces new dialect labels (e.g. southafrican or belgique)? What if the document is using an unsupported language? For example, if the document has loaded babel with french and ngerman then my package will provide the French support but will silently ignore the German selection. The lack of warning may confuse the document author.
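
The tests listed above could be written as a loop over the known labels. This is a sketch of a fragment for a hypothetical package file (so @ is already a letter), and it has to be run after the language package has been loaded. The names \ifmypkg@english, \ifmypkg@french and \mypkg@label are made up for this example.

\newif\ifmypkg@english
\newif\ifmypkg@french
% English: set the flag if any known label has a \captions<label> command
\@for\mypkg@label:=american,australian,british,canadian,english,%
  newzealand,UKenglish,USenglish\do{%
  \ifcsname captions\mypkg@label\endcsname \mypkg@englishtrue \fi
}
% French
\@for\mypkg@label:=acadian,canadien,francais,frenchb,french\do{%
  \ifcsname captions\mypkg@label\endcsname \mypkg@frenchtrue \fi
}

The hard-coded lists are the weak point: a label that isn’t listed is never detected, and nothing is reported for it.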

It would be really useful to have a list of all the document languages. [Update 2025-01-27] Fortunately, since I first wrote tracklang (2014) and this article (2019), both polyglossia and babel now provide convenient commands.

New versions of polyglossia store the language list in \xpg@loaded but, better still, polyglossia now also has \xpg@bcp@loaded which is a list of BCP 47 language tags. In the case of babel, the list of all languages can be iterated over with \LocaleForEach. This can be combined with babel’s \getlocaleproperty to obtain the BCP 47 language tag. If the translator package has been loaded, then \trans@languages expands to a comma-separated list of languages (using translator’s labelling scheme).
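
For babel, the combination might look something like the following sketch. It assumes a recent babel, and that the BCP 47 tag is stored under the key identification/tag.bcp47 in babel’s ini files (the key name comes from my reading of those files rather than from this article). The macro name \mytag is arbitrary.

\documentclass{article}
\usepackage[nil]{babel}
\babelprovide[import]{british}
\babelprovide[import,main]{austrian}
% #1 is the name of the current locale in the loop
\LocaleForEach{%
  \getlocaleproperty*\mytag{#1}{identification/tag.bcp47}%
  \typeout{Locale #1 has tag \mytag}%
}
\begin{document}
\end{document}

The starred form of \getlocaleproperty is used so that a missing key doesn’t raise an error.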

The lack of a standardised way of conveniently identifying which languages have been loaded is a source of frustration for package writers who are trying to provide localisation support. This is the reason why I wrote the tracklang package. When I write packages that require localisation support, I now don’t need to worry about which language package has been used by the document author.

The main bulk of the tracklang code is in tracklang.tex, which is generic TeX so it can be used with other TeX formats. The tracklang.sty file is a LaTeX package that internally inputs tracklang.tex, but it also provides package options (which can also be passed through the document class options) to conveniently track predefined dialect labels. This means that if the document author does:

\documentclass[british,naustrian]{article}
\usepackage{babel}
\usepackage{mypackage}% internally loads tracklang.sty

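then tracklang.sty can just pick up the document class options without having to perform cumbersome tests. If the author instead does either of the following (loading the languages through babel’s package options, or through \babelprovide):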

\documentclass{article}
\usepackage[british,naustrian]{babel}
\usepackage{mypackage}% internally loads tracklang.sty

\documentclass{article}
\usepackage[nil]{babel}
\babelprovide[import]{british}
\babelprovide[import,main]{austrian}
\usepackage{mypackage}% internally loads tracklang.sty

then, as from tracklang v1.6.4, the languages can be detected with \LocaleForEach. This has simplified things a great deal, but if \LocaleForEach isn’t defined, then tracklang will fall back on its old behaviour, which is to test if \bbl@loaded or \xpg@bcp@loaded or \xpg@loaded is defined. Note, however, that this requires all languages to be identified before tracklang.sty is loaded. This means that it doesn’t support “just in time” or “lazy loading” in the document. However, lazy loading is typically used for short fragments of foreign text and that context is less likely to require the full feature set for that language.
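
To see what has been tracked, a test document can iterate over the tracked dialects. This is a minimal sketch, assuming a reasonably recent tracklang that provides \GetTrackedLanguageTag. Here tracklang is loaded directly rather than through another package, and \thisdialect is an arbitrary loop variable name.

\documentclass[british,naustrian]{article}
\usepackage{babel}
\usepackage{tracklang}
\begin{document}
\ForEachTrackedDialect{\thisdialect}{%
  Dialect: \thisdialect.
  Root language: \TrackedLanguageFromDialect{\thisdialect}.
  Language tag: \GetTrackedLanguageTag{\thisdialect}.\par
}
\end{document}

The same loop should work unchanged if the languages are passed to babel as package options, provided tracklang is loaded after babel.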

If you’re a package author and you need your package to detect the document localisation settings then the article Using tracklang in Packages with Localisation Features gives an example of how to do this.
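
In outline, the package side might look like the following sketch. This is my own condensed version, so check the details against the tracklang manual. The package name mypackage and the loop variable \this@dialect are placeholders, and it assumes that resource files such as mypackage-english.ldf are on TeX’s path.

\NeedsTeXFormat{LaTeX2e}
\ProvidesPackage{mypackage}[2025/01/27 hypothetical example]
\RequirePackage{tracklang}
\AnyTrackedLanguages
{% at least one language has been tracked
  \ForEachTrackedDialect{\this@dialect}{%
    % look for a matching mypackage-<tag>.ldf resource file
    \TrackLangRequireDialect{mypackage}{\this@dialect}%
  }%
}
{}% nothing tracked: keep the package defaults
\endinput

Each resource file would then start with \TrackLangProvidesResource and supply the translations for that language or region.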

The articles Writing a datetime2 Language Module and Localisation with datatool v3.0+ provide practical examples.

¹The definition of the command \today is shown in the LaTeX kernel documentation (texdoc source2e) but texdef -t latex -c minimal today shows that it’s not part of the minimal core code. You can, however, access the date and time information with primitives, such as \day, or with LaTeX3 commands, such as \c_sys_timestamp_str.