Localisation with tracklang.tex

TeX provides the means to set up multiple hyphenation rules, so you can select the hyphenation rule for a particular language with the primitive \language〈number〉, where 〈number〉 is a number that identifies the hyphenation rule. That’s pretty much the limit of localisation support: TeX itself doesn’t really provide any commands that may vary according to language or region. The TeXbook contains a definition of \today that typesets the current date in US format as an example, but TeX doesn’t come with \today predefined.
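
For reference, a plain TeX definition along the lines of the TeXbook’s example is sketched below. It relies only on the primitives \month, \day and \year. The month names and the US ordering are hard-coded, which is why such a command can’t be language-neutral.

% Sketch of a US-format \today for plain TeX.
% \ifcase selects the month name from the integer \month (1-12).
\def\today{\ifcase\month\or
  January\or February\or March\or April\or May\or June\or
  July\or August\or September\or October\or November\or December\fi
  \space\number\day, \number\year}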

The LaTeX kernel also doesn’t define \today.¹ This command is actually defined by LaTeX classes, such as article.cls or report.cls. However, the LaTeX user manual states that \today produces the current date in the format July 29, 1985 so many classes define it in this way (unless they are designed for a specific locale).

Similarly, commands that produce fixed text, such as \chaptername and \seename, are not provided by the kernel but are defined by classes or packages that need them. For example, \chaptername is defined in report.cls and book.cls (which define \chapter) but not in article.cls (which doesn’t define \chapter). None of those classes define \seename but the makeidx package does.

With modern distributions, \languagename is defined before the class file is loaded. A simple test document:

\show\languagename
\documentclass{article}
\begin{document}
\end{document}

This will show the definition of \languagename in the transcript. However, there’s no mention of that command in the documented code of the LaTeX kernel (texdoc source2e).

So the only localisation feature of the TeX core is the ability to switch hyphenation rules. It’s possible that this may be addressed with LaTeX3. (There is some reference to localisation in the “Case-Changing” section of the “LaTeX3 Interfaces” document.)

It might be useful here to view development in a historical context:

November 1967: ISO/R 639:1967 Symbols for languages, countries and authorities was published (now withdrawn).
1978: TeX was first released.
1985: LaTeX was first released.
October 1986: ISO 8879:1986 Information processing — Text and office systems — Standard Generalized Markup Language (SGML) was published.
March 1988: ISO 639:1988 Code for the representation of names of languages was published (now withdrawn).
1989: The World Wide Web (WWW) was invented. (Berners-Lee released his WWW software in 1991.)
1993: HTML was first released.
1994: LaTeX2e was released.
March 1995: RFC 1766: Tags for the Identification of Languages was published. (Became obsolete with the publication of RFC 3066 in January 2002.)
January 1997: RFC 2070: Internationalization of the Hypertext Markup Language was published. (Became obsolete with the publication of RFC 2854.)
July 2002: ISO 639-1:2002 Codes for the representation of names of languages — Part 1: Alpha-2 code was published.

As can be seen from this timeline, TeX and LaTeX were first developed while the standards for identifying languages and other localisation information were still being formed.

The original LaTeX2.09 user manual (1985) doesn’t give much help:

See the Local Guide to find out if any foreign language versions of LaTeX are available for your computer.

The updated LaTeX2e user manual (1994) suggests the use of the babel package and directs the reader to The LaTeX Companion. Following that reference, The LaTeX Companion (1994) indicates a very simple user interface with two basic commands: \selectlanguage{〈language〉} and \iflanguage{〈language〉}{〈true〉}{〈false〉} (these days, the iflang package can be used to test the active language). The LaTeX Companion states:

Any language that you use in your document should be declared on the \usepackage command as a language option.¹

The footnote reads:

¹In principle, since the language(s) in which a document is written is a global characteristic of the document in question, it makes good sense to declare it on the \documentclass command.

This is a really important point and one that’s not just pertinent to TeX or LaTeX. If you inspect the HTML source code for this webpage, you’ll find:

<html lang="en-GB">

which globally declares the primary language for the page.

Unfortunately, the evolution of the LaTeX language packages has moved away from this key point. There’s no core framework to globally register the document languages. Why is this a problem? Surely babel and polyglossia etc. deal with all the localisation support? Well, actually, they don’t. They only provide translations for common elements, such as \chaptername. There are now thousands of packages on the Comprehensive TeX Archive Network (CTAN) and many of them provide commands that produce fixed text or data in a format that varies according to language or region.

Suppose I want to write a package that typesets invoices. This may have fixed text, such as “Description” or “Price”. It may need to display the currency sign and format the decimal part. The package therefore needs to know the document language in order to provide the relevant translations. However, there’s no standard mechanism for querying this information.

The simplest method from the package writer’s point of view is to get the document author to specify the particular language, and assume the document only has a single language. For example:

\usepackage[british]{myinvoice}

This can get rather frustrating for the document author if they require multiple packages that provide localisation support.

\usepackage[british]{foo}
\usepackage[UKenglish]{bar}
\usepackage[enGB]{baz}
\usepackage[en-GB]{wibble}
\usepackage[englishUK]{whatever}

Note that in the above, not only does each package require the localisation information in the option list but also each package has a different labelling system used to identify a particular locale.

Why can’t the package just test if \captions〈language〉 has been defined? Some do, but the code ends up quite complicated and it doesn’t warn the user about unsupported languages. For example, suppose I have a package that supports just English and French, then I would need the following tests:
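
A minimal sketch of what these tests might look like, assuming a handful of babel’s English and French dialect labels and two hypothetical conditionals, \ifmypackage@english and \ifmypackage@french (the list of labels is illustrative, not exhaustive):

% Inside a hypothetical mypackage.sty, where @ is a letter.
\newif\ifmypackage@english
\newif\ifmypackage@french
% English: one test per label that babel might have defined
\@ifundefined{captionsenglish}{}{\mypackage@englishtrue}
\@ifundefined{captionsbritish}{}{\mypackage@englishtrue}
\@ifundefined{captionsUKenglish}{}{\mypackage@englishtrue}
\@ifundefined{captionsamerican}{}{\mypackage@englishtrue}
\@ifundefined{captionsUSenglish}{}{\mypackage@englishtrue}
% French: likewise
\@ifundefined{captionsfrench}{}{\mypackage@frenchtrue}
\@ifundefined{captionsfrancais}{}{\mypackage@frenchtrue}
\@ifundefined{captionscanadien}{}{\mypackage@frenchtrue}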

That’s just for two languages and it’s already complicated. What if babel introduces new dialect labels (e.g. southafrican or belgique)? What if the document is using an unsupported language? For example, if the document has loaded babel with french and ngerman then my package will provide the French support but will silently ignore the German selection. The lack of warning may confuse the document author.

It would be really useful to have a list of all the document languages. While some language packages do provide such a list, it’s an undocumented internal command and, as such, can’t be relied upon.

For example, if the translator package has been loaded, then \trans@languages expands to a comma-separated list of languages (using translator’s labelling scheme). New versions of polyglossia now store the language list in \xpg@loaded.

Recent versions of babel define \bbl@loaded, but this only contains a list of languages that are identified in the package options. For example, with:

\usepackage[british,naustrian]{babel}

then \bbl@loaded is defined as naustrian,british but with:

\usepackage[nil]{babel}
\babelprovide[import]{british}
\babelprovide[import,main]{austrian}

then \bbl@loaded is simply defined as nil. Without a list, the only way of determining which languages have been loaded is to iterate over all known language labels and test if \captions〈label〉 has been defined. This is now a very long list that’s expanding over time.
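
The brute-force test described above can be sketched as follows, assuming the LaTeX kernel’s \@for loop and a deliberately short, illustrative list of labels. The names \my@labels, \my@label and \my@found are hypothetical.

\makeatletter
% Illustrative subset only; a real list would be far longer.
\def\my@labels{english,british,american,french,canadien,ngerman,naustrian}
\def\my@found{}% collects the labels that are in use
\@for\my@label:=\my@labels\do{%
  \@ifundefined{captions\my@label}%
  {}% \captions<label> not defined: skip this label
  {% defined: append the label, comma-separated
    \edef\my@found{\my@found\ifx\my@found\@empty\else,\fi\my@label}%
  }%
}
\makeatother

After the loop, \my@found holds a comma-separated list of those labels for which the corresponding \captions command is defined. Every label has to be tested regardless of how many languages the document actually uses.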

What if the document isn’t using babel? Perhaps it’s using polyglossia instead. For example:

\usepackage{polyglossia}
\setmainlanguage[variant=uk]{english}

This defines \captionsenglish and \xpg@loaded expands to english, but there’s no clue about the region. For my example invoice package, I need to know the region in order to set the currency symbol. [Update: as from v1.47, polyglossia provides \xpg@bcp@loaded which expands to a comma-separated list of BCP-47 tags.]

The lack of a standardised way of conveniently identifying which languages have been loaded is a source of frustration for package writers who are trying to provide localisation support. This is the reason why I wrote the tracklang package.

The main bulk of the tracklang code is in tracklang.tex, which is generic TeX so it can be used with other TeX formats. The tracklang.sty file is a LaTeX package that internally inputs tracklang.tex, but it also provides package options (which can also be passed through the document class options) to conveniently track predefined dialect labels. This means that if the document author does:

\documentclass[british,naustrian]{article}
\usepackage{babel}
\usepackage{mypackage}% internally loads tracklang.sty

then tracklang.sty can just pick up the document class options without having to perform cumbersome tests. If the author does:

\documentclass{article}
\usepackage[british,naustrian]{babel}
\usepackage{mypackage}% internally loads tracklang.sty

then things are a little harder for tracklang.sty but if the version of babel is new enough to provide \bbl@loaded then it’s not too hard as tracklang.sty just needs to iterate over the provided list. If however the author does:

\documentclass{article}
\usepackage[nil]{babel}
\babelprovide[import]{british}
\babelprovide[import,main]{austrian}
\usepackage{mypackage}% internally loads tracklang.sty

then tracklang.sty can’t detect the actual document languages. It only picks up nil from \bbl@loaded, which tracklang considers a dialect of the undetermined language with ISO 639-2 code “und”. As far as I can tell (at the time of writing this), babel doesn’t add the language labels to any internal list when they are specified with \babelprovide.

One possibility is for tracklang to test if \bbl@loaded is nil and, if so, iterate through all known labels and test if \captions〈label〉 is defined. The tracklang.tex file currently defines around 200 root language labels and around 100 dialect labels. That’s a lot of labels to iterate over and this doesn’t take into account any new babel dialects that might be added in future. It also doesn’t take into account the possibility that the document author might do:

\documentclass{article}
\usepackage[british]{babel}
\babelprovide[import,main]{austrian}
\usepackage{mypackage}% internally loads tracklang.sty

If tracklang.sty always has to iterate over 300 labels on the off-chance that this situation has occurred then it will result in a slower document compilation time even if the document author hasn’t used \babelprovide. The document author will complain to the package author that their package is slow to load, and the package author will complain to me that tracklang is slow to load, and we’ll all end up grumpy and frustrated with the situation. Currently tracklang.sty will only resort to this method if it detects that babel has been loaded but \bbl@loaded isn’t defined or if polyglossia has been loaded but \xpg@loaded hasn’t been defined.
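
The decision described above can be pictured roughly as follows for the babel case. This is only a sketch of the logic, not tracklang’s actual code. It assumes that tracklang.tex has already been input, that every label in \bbl@loaded is one that \TrackPredefinedDialect recognises, and that \my@label is a hypothetical scratch macro.

% Assumes @ is a letter (as in a .sty file).
\@ifpackageloaded{babel}
 {%
   \@ifundefined{bbl@loaded}
    {%
      % No list available: fall back to testing \captions<label>
      % for every known label (the slow method).
    }
    {%
      % A list is available: track each label it contains.
      \@for\my@label:=\bbl@loaded\do{%
        \expandafter\TrackPredefinedDialect\expandafter{\my@label}%
      }%
    }%
 }
 {}% babel not loaded: nothing to do in this sketch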

If the document author really wants to use \babelprovide then they’ll need to pass the relevant options to tracklang.sty (or use the tracking commands provided in tracklang.tex). This can be done in the document class options, as in the earlier example, but this will also pass those options to babel, which is presumably not what the document author wants (otherwise they would just pass the options to babel). Another possibility is to load tracklang before babel:

\documentclass{article}
\usepackage[british,austrian]{tracklang}
\usepackage[british]{babel}
\babelprovide[import,main]{austrian}
\usepackage{mypackage}% internally loads tracklang.sty

The \RequirePackage{tracklang} line in mypackage.sty will now do nothing since tracklang.sty has already been loaded. We’re now back to the situation where the document author has to specify the required localisation multiple times.
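
The same applies if the tracking commands from tracklang.tex are used instead of package options. A minimal sketch, assuming that \TrackPredefinedDialect accepts the same labels as the tracklang.sty options shown above:

\documentclass{article}
\usepackage{tracklang}
% Register the document languages explicitly:
\TrackPredefinedDialect{british}
\TrackPredefinedDialect{austrian}
\usepackage[british]{babel}
\babelprovide[import,main]{austrian}
\usepackage{mypackage}% internally loads tracklang.sty

The languages are still specified twice: once for tracklang and once for babel.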

Ideally it would be best if all language packages used a common framework to globally register the document localisation settings. If you are a language package author and you want to use tracklang for this then the article Integrating tracklang into Language Packages gives an example of how to do this.

If you’re a package author and you need your package to detect the document localisation settings then the article Using tracklang in Packages with Localisation Features gives an example of how to do this.
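
As a rough illustration of the idea (the article covers it properly), a hypothetical mypackage.sty might query the tracked settings along these lines. The sketch assumes that \AnyTrackedLanguages, \ForEachTrackedDialect and \IfTrackedLanguage behave as described in the tracklang manual. The \mypackage@… names and the French text are made up for the example.

% mypackage.sty (hypothetical package)
\NeedsTeXFormat{LaTeX2e}
\ProvidesPackage{mypackage}
\RequirePackage{tracklang}
\newcommand*{\mypackage@pricename}{Price}% default fixed text
\AnyTrackedLanguages
 {%
   % Record each tracked dialect in the transcript.
   \ForEachTrackedDialect{\mypackage@dialect}{%
     \PackageInfo{mypackage}{Tracked dialect: \mypackage@dialect}%
   }%
   % Switch the fixed text if French is one of the tracked languages.
   \IfTrackedLanguage{french}%
    {\renewcommand*{\mypackage@pricename}{Prix}}%
    {}%
 }
 {\PackageWarning{mypackage}{No tracked languages; using English text}}
\endinput

Unlike the \captions〈language〉 tests, this doesn’t depend on which language package the document uses, and the package can warn when nothing has been tracked.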

The final article in this set is Writing a datetime2 Language Module, which provides a practical example.


¹The definition of the command \today is shown in the LaTeX kernel documentation (texdoc source2e) but texdef -t latex -c minimal today shows that it’s not part of the minimal core code.