
6.2 Using LaTeX to Sort and Collate Indexes or Glossaries (datagidx package)

§6.1. Using an External Indexing Application described how to create an index or glossaries using an external indexing application. Some users stumble when it comes to invoking the indexing application. There is an alternative where TeX does the sorting and collating. This by-passes the need to use makeindex, xindy or makeglossaries, but it's less efficient and takes longer to build your document. This section describes how to do this using the datagidx package. This package comes with my datatool bundle (at least version 2.13). The documentation for datagidx is included in the datatool user manual [17].

The datatool package allows you to define databases that you can access in your document. The datagidx package has a special interface to this facility that allows you to define databases for the purposes of indexing. These databases and their terms must be defined in the preamble. In this section, the term “indexing” will be used to refer to either indexes or glossaries, as the same mechanism is used for both tasks.

A new indexing database is defined using:

\newgidx{<label>}{<title>}

where <label> is a label that uniquely identifies this database and <title> is the title to be used when the index (or glossary) is displayed. For example:

\newgidx{index}{Index}

creates a new database labelled index. When the index is displayed, it will have the section heading “Index”.
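
Since the same mechanism serves both indexes and glossaries, a document that needs several lists can define one database per list. The following is a minimal sketch; the labels glossary and acronyms and their titles are illustrative assumptions, not names required by the package.

```latex
% Preamble sketch: one indexing database per list.
% The labels and titles below are illustrative assumptions.
\usepackage{datagidx}

\newgidx{index}{Index}
\newgidx{glossary}{Glossary}
\newgidx{acronyms}{Acronyms}
```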

As in §6.1. Using an External Indexing Application, each term in the index (or glossary) database has an associated location list. This list is initially null. The locations are added to terms used in the document on the second LaTeX run. When you display the index, only those entries with a non-null location list or a cross-reference will be shown. The default location is the page number on which the entry was referenced. The datagidx package knows about the following page numbering styles: arabic, roman, Roman, alph and Alph. If your document has another type of numbering style, or if you want to use a different counter for the location, consult the datagidx section of the datatool manual [17].

Once you have defined the indexing database, you can define terms associated with that database using

\newterm[<options>]{<name>}

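where <name> is the term and <options> is a list of <key>=<value> options. The keys used in this section include:

label: a label that uniquely identifies the term;
description: a brief description of the term;
plural: the plural form of the term;
text: the text displayed by \gls;
short: the abbreviation (used by acronyms).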

It's also possible to add your own custom keys. See the datagidx section of the datatool user guide [17] for further details.

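As with \newglossaryentry, discussed in §6.1.2. Defining Glossary Entries, if the term starts with an accented letter (or a ligature) the letter must be grouped. For example:

\newterm[label={elite}]{{\'e}lite}

\newterm
 [%
   label={oesophagus},
   plural={{\oe}sophagi},
   description={tube connecting throat and stomach}
 ]%
 {{\oe}sophagus}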


There is a shortcut command for defining acronyms:

\newacro[<options>]{<short>}{<long>}

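where <short> is the abbreviation and <long> is the long form. The optional argument <options> is the same as for \newterm. This is equivalent to:

\newterm
 [%
   description={\capitalisewords{<long>}},%
   short={\acronymfont{<short>}},%
   long={<long>},%
   text={\DTLgidxAcrStyle{<long>}{\acronymfont{<short>}}},%
   plural={\DTLgidxAcrStyle{<long>s}{\acronymfont{<short>s}}},%
   sort={<short>},%
   <options>%
 ]%
 {\MakeTextUppercase{<short>}}

where

\DTLgidxAcrStyle{<long>}{<short>}

formats the full version of the acronym. This defaults to: <long> (<short>), and

\acronymfont{<text>}

is the font used to format acronyms. By default this just displays its argument, but it can be redefined if you want the acronyms formatted in a particular style or font (such as small-caps). The other commands used above are:

\MakeTextUppercase{<text>}

This is defined by the textcase package and converts <text> to uppercase.

\capitalisewords{<text>}

This is defined by the mfirstuc package and capitalises the first letter of each word in <text>.

For example: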

\newacro{svm}{support vector machine}
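
As noted above, \acronymfont can be redefined to change how abbreviations are typeset. The following is a minimal sketch that switches to small caps; it assumes datagidx is already loaded and that the redefinition is placed in the preamble before the acronyms are defined.

```latex
% Preamble sketch: typeset abbreviations in small caps.
% Assumes datagidx has been loaded; \textsc is a standard LaTeX font command.
\renewcommand*{\acronymfont}[1]{\textsc{#1}}

\newacro{svm}{support vector machine}
```

Small caps only affect lowercase letters, so this works best when <short> is supplied in lowercase, as with svm here.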

Once you have defined the terms in the preamble, you can use them in the document:

\gls{[<format>]<label>}
\glspl{[<format>]<label>}
\Gls{[<format>]<label>}
\Glspl{[<format>]<label>}

Here \gls displays the term, \glspl displays its plural, and \Gls and \Glspl do the same with the first letter converted to uppercase.

These are similar to those described in §6.1.2. Displaying Terms in the Document, but they have a different syntax. Here <format> is the name of a text-block command (such as \textbf), without the initial backslash, that should be used to format the location for this reference. This is analogous to the | special character described in §6.1.1. Setting the Location Format.

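There are also commands associated with acronyms:

\acr{[<format>]<label>}
\acrpl{[<format>]<label>}
\Acr{[<format>]<label>}
\Acrpl{[<format>]<label>}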


Unlike the glossaries package, described in §6.1.2. Creating Glossaries, Lists of Symbols or Acronyms (glossaries package), there is a difference between datagidx's \gls and \acr. Here \gls will always display the value of the text field, whereas \acr will display the full form on first use (the text field) and the abbreviation on subsequent use (the short field).

You can also add terms to the index without creating any link text:

\glsadd{[<format>]<label>}

This adds the term uniquely identified by <label>.

\glsaddall{<database name>}

This adds all the terms defined in the database uniquely identified by <database name>.


Unlike most commands, the optional part of the above commands occurs inside the mandatory argument.
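
To illustrate where the optional part goes, here is a short sketch. It assumes a term labelled oesophagus and a database labelled index have been defined in the preamble, as in the earlier examples.

```latex
% Index the term without producing any text; its location is set in bold.
% Note that the optional [textbf] sits inside the braces.
\glsadd{[textbf]oesophagus}

% Index every term defined in the database labelled `index'.
\glsaddall{index}
```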


Given the elite and oesophagus examples defined earlier, I can reference those entries in the text as follows:

\Gls{elite} and \glspl{oesophagus}.

This produces:

Élite and œsophagi

Elsewhere, I might have the main topic about œsophagi:

The \gls{[textbf]oesophagus} connects the throat and the stomach.

This produces:

The œsophagus connects the throat and the stomach.

and the associated location will be typeset in bold.

Here's an example using the svm example defined earlier:

First use: \acr{svm}\@. Subsequent use: \acr{svm}\@. Full form: \gls{svm}.

This produces:

[Image showing the typeset output.]

You can unset and reset acronyms using

\glsunset{<label>}
\glsreset{<label>}
\glsunsetall{<database name>}
\glsresetall{<database name>}

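The effect of unsetting and resetting can be sketched as follows. This assumes the svm acronym defined earlier, and that \glsreset and \glsunset each take the term's label as their argument, as documented in the datatool manual [17].

```latex
\acr{svm}      % first use: full form
\acr{svm}      % subsequent use: abbreviation only
\glsreset{svm}% mark svm as not yet used
\acr{svm}      % full form again
\glsunset{svm}% mark svm as already used
\acr{svm}      % abbreviation only
```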
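To display the index, glossary or list of acronyms use:

\printterms[<options>]

where <options> is a comma-separated <key>=<value> list. Common options include:

database: the label of the database to display;
columns: the number of columns;
style: the style used to display the list.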

For a full list of options see the datagidx section of the datatool user guide [17].
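
As a brief sketch of the syntax, the following displays a database in a single column. The database label glossary is an illustrative assumption, and the database and columns keys are used as I understand them from the datatool manual [17].

```latex
% Display the database labelled `glossary' (an assumed label) in one column.
\printterms[database=glossary,columns=1]
```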

Listing 20 can now be rewritten as follows:

Listing 21:

% arara: pdflatex: { synctex: on }
% arara: biber
% arara: pdflatex: { synctex: on }
% arara: pdflatex: { synctex: on }




\newterm
 [%
   description={a rectangular table of elements},% brief description
   plural={matrices}% the plural
 ]%
 {matrix}% the name


\newacro{svm}{support vector machine}


  label={not:set},% label
  description={A set},%



 {square root}




% later in the document:

\Glspl{matrix} are usually denoted by a bold capital letter, such as $\mathbf{A}$. The \gls{matrix}'s $(i,j)$th element is usually denoted $a_{ij}$. \Gls{matrix} $\mathbf{I}$ is the identity \gls{matrix}.

First use: \acr{svm}\@. Next use: \acr{svm}\@. Full: \gls{svm}\@.

A \gls{not:set} is a collection of objects.


Some sample code is shown in Listing~\ref{lst:sample}. This uses the function \gls{fn.sqrt}.\glsadd{sqrt}


A \emph{\gls{[textbf]tautology}} is a proposition that is always true for any value of its variables.

A \emph{\gls{[textbf]contradiction}} is a proposition that is always false for any value of its variables.

% At the end of the document:




Note that there is now no need to call either makeindex or makeglossaries. The only external application being called is biber for the bibliography.
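
To show the pieces working together, here is a minimal self-contained sketch. It is not the book's Listing 21: the document class, the database labels (glossary, acronyms) and the use of the database key to assign each term to a database are assumptions made for illustration. Two LaTeX runs are needed, since the locations are only added on the second run.

```latex
% arara: pdflatex
% arara: pdflatex
% A minimal sketch, not the book's Listing 21.
% The class, database labels and terms are illustrative assumptions.
\documentclass{article}
\usepackage{datagidx}

% Define the indexing databases: {label}{title}
\newgidx{glossary}{Glossary}
\newgidx{acronyms}{Acronyms}

% Define the terms (preamble only)
\newterm
 [%
   database={glossary},% which database the term belongs to
   description={a rectangular table of elements},% brief description
   plural={matrices}% the plural
 ]%
 {matrix}% the name

\newacro[database={acronyms}]{svm}{support vector machine}

\begin{document}

\Glspl{matrix} are usually denoted by a bold capital letter,
such as $\mathbf{A}$. The identity \gls{[textbf]matrix} is
denoted $\mathbf{I}$.

First use: \acr{svm}\@. Next use: \acr{svm}\@.

% Display the lists; no external indexing application is required.
\printterms[database=glossary]
\printterms[database=acronyms]

\end{document}
```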

This book is also available as A4 PDF or 12.8cm x 9.6cm PDF or paperback (ISBN 978-1-909440-02-9).

© 2013 Dickimaw Books. "Dickimaw", "Dickimaw Books" and the Dickimaw parrot logo are trademarks. The Dickimaw parrot was painted by Magdalene Pritchett.
