org-glossary/org-glossary.org

23 KiB
Raw Permalink Blame History

Org Glossary

Introduction

Summary

Org Glossary defines a flexible model for working with glossary-like constructs (glossaries, acronyms, indices, etc.) within Org documents, with support for in-buffer highlighting of defined terms and high-quality exports across all ox-* backends.

Quickstart

To define a glossary entry, simply place a top-level heading in the document titled Glossary or Acronyms and therein define terms using an Org definition list, like so:

* Glossary
- Emacs :: A lisp-based generic user-centric text manipulation environment that
  masquerades as a text editor.
- Org mode :: A rich and versatile editing mode for the lovely Org format.

Then simply use the terms as you usually would when writing. On export Org Glossary will automatically:

  • Pick up on the uses of defined terms
  • Generate a Glossary/Acronym section at the end of the document
  • Link uses of terms with their definitions, in a backend-appropriate manner (e.g. hyperlinks in html)
  • Give the expanded version of each acronym in parenthesis when they are first used (e.g. "PICNIC (Problem In Chair, Not In Computer)")

To generate an Index for certain terms, you can almost do the same thing, just use an Index heading and use a plain list of terms, e.g.

* Index
- org-mode

To see how this all works, try exporting the following example with Org Glossary installed:

Try using Org Glossary for all your glosses, acronyms, and more within your
favourite ML with a unicorn mascot. It attempts to provide powerful
functionality, in keeping with the simplicity of the Org ML we all know and
love.

* Glossary
- glosses :: Brief notations, giving the meaning of a word or wording in a text.
* Acronyms
- ML :: Markup Language
* Index
- unicorn

If you'd like to see a visual indication of term uses while in org-mode, call M-x org-glossary-mode.

Design

In large or technical documents, there's often a need for an appendix clarifying terms and listing occurrences; this may take the form of a glossary, index, or something else. Org Glossary abstracts all of these glossary-like forms into tracked generated text replacements. Most common structures fit into this abstraction like so:

  1. Search for definitions of $term
  2. Replace all uses of $term with f($term)
  3. Generate a definition section for all used terms, linking to the uses

Out of the box, four glossary-like structures are configured:

Glossary
The term is transformed to the same text, but linking to the definition.
Acronyms
The first use of the term adds the definition in parentheses, and subsequent uses simply link to the definition (behaving the same as glossary terms).
Index
The term is unchanged (the entire purpose of the index is achieved via step 3. alone).
Text Substitutions
The term is replaced with its definition.

For more details on how this works, see /tec/org-glossary/src/branch/master/Structure%20of%20an%20export%20template%20set.

There is a little special-cased behaviour for indexes (usage detection) and text substitution (fontification), but it is kept to a minimum and ideally will be removed via generalisation in future.

Usage

Defining terms

Placement of definitions

Definitions must be placed under one of the specially named headings listed in org-glossary-headings, by default:

* Glossary
* Acronyms
* Index
* Text Substitutions

If org-glossary-toplevel-only is non-nil, then these headlines must also be level one headings. If it is nil, then they are recognised wherever they occur in the document. Note that when using subtree export and non-nil org-glossary-toplevel-only, only level-1 headings in the widened document will be recognised (i.e. it behaves the same as non-subtree export).

External definition sources

Org Glossary supports searching for term definitions in other =#+include=d files, respecting the various restrictions such as headings and line number ranges. You may also specify include paths providing definitions that should be globally available via org-glossary-global-terms.

If you maintain a set of common term sources you may want to use, instead of #+include=ing them, you can make use of the convenience keyword =#+glossary_sources.

The value of #+glossary_sources is split on spaces and to form a list of locations. Each location is appended to org-glossary-collection-root to form the fully qualified location. These locations are then =#+include=d.

For example, if org-glossary-collection-root is set to a folder where a number of individual definition files are places, one could then conveniently use a few with:

#+glossary_sources: abbrevs physics.org::*Quantum foo bar.org

This would be equivalent to:

#+include: COLLECTION-ROOT/abbrevs.org
#+include: COLLECTION-ROOT/physics.org::*Quantum :only-contents t
#+include: COLLECTION-ROOT/foo.org
#+include: COLLECTION-ROOT/bar.org

You could also set to an individual file with the beginning of a heading specification, say file.org::*. This would allow you to have all the terms defined in one file and include groups by heading.

Not that sources with heading/custom-id searches will automatically have :only-contents t added (as seen in the example). This allows for named headings with glossary subheadings to work when org-glossary-toplevel-only is set.

Basic definitions

Org already has a very natural structure for term-definition associations, description lists. Term definitions are extracted from all non-nested description lists within the glossary heading, other elements are simply ignored.

For example, to define "late pleistocene wolf" you could use a description list entry like so:

- late pleistocene wolf :: an extinct lineage of the grey wolf, thought to be
  the ancestor of the dog

which is an instance of the basic structure,

- TERM :: DEFINITION

Advanced definitions

When giving a simple definition like automaton :: A thing or being regarded as having the power of spontaneous motion or action, Org Glossary will actually make a few assumptions.

  • Your wish to refer to the term automaton with automaton
  • There is also a plural form, guessed by calling org-glossary-plural-function, in this case resulting in automata, and you wish to refer to the plural form with automata.

This is equivalent to the following "full form",

- automaton,automata = automaton,automata :: A thing or being regarded as having
  the power of spontaneous motion or action

which is an instance of the full structure,

- SINGULAR KEY, PLURAL KEY = SINGULAR FORM, PLURAL FORM :: DEFINITION

This may seem overly complicated, but unfortunately irregular plurals and homographs exist. Here are some examples of where this functionality comes into play:

- eveningtime=evening :: The latter part of the day, and early night.
- eveninglevel=evening :: To make more even, to become balanced or level.

Here we wish to clarify different uses of the same term "evening", and so define unique keys for each usage. In writing you would use the keys like so,

In the eveningtime I take to eveninglevel out the sand pit.

Let us now consider both irregular plurals and defective nouns.

- ox, oxen :: A male bovine animal.
- sheep, :: A domesticated ruminant mammal with a thick wooly coat.
- glasses, :: An optical instrument worn to correct vision.

In the case of "ox, oxen" we give the irregular plural form explicitly. "Sheep" is also an irregular plural and by just putting a comma but omitting the plural form no plural form will be generated (it will be treated as a singularia tantum). The same behaviour occurs with "glasses", and while it is a plurale tantum internally it will be represented as a singularia tantum, but the behaviour is identical and so this is fine.

Alias terms

Sometimes a term may be known by multiple names. Such a situation is supported by the use of "alias terms", who's definition is simply the key of the canonical term.

This is best illustrated through an example, for which we will visit the field of molecular biology.

- beta sheet :: Common structural motif in proteins in which different sections
  of the polypeptide chain run alongside each other, joined together by hydrogen
  bonding between atoms of the polypeptide backbone.

The beta sheet may also be referred to using the greek letter β instead of "beta", or as the "beta pleated sheet". We can support these variants like so:

- \beta sheet :: beta sheet
- beta pleated sheed :: beta sheet
- \beta-pleated sheet :: beta sheet

Since the definition of each of these terms is an exact match for "beta sheet", they will be recognised as an alias for that term.

Categorisation

To make working with a large collection of terms easier, you might use sub-headings, e.g.

* Glossary
** Animals
- late pleistocene wolf :: an extinct lineage of the grey wolf, thought to be
  the ancestor of the dog
- ox, oxen :: A male bovine animal.
- sheep, :: A domesticated ruminant mammal with a thick wooly coat.
** Technology
- Emacs :: A lisp-based generic user-centric text manipulation environment that
  masquerades as a text editor.
- glasses, :: An optical instrument worn to correct vision.

This structure will be ignored on export, allowing you to structure things freely without worrying about how it will affect the export. Should you wish to split up the exported entries into categories, this can be accomplished by using subheadings with the :category: tag. You can nest category-tagged subheadings inside each other, but only the innermost category will be applied.

* Glossary
** Animals :category:
** Technology :category:
*** Text Editors :category:
*** Mechanical :category:

Using terms

Org Glossary presumes that you'll want to associate a defined term with every usage of it. As such, on export it scans the document for all instances of a defined term and transforms them into one of the four glossary link types:

  • gls, singular lowercase
  • glspl, plural lowercase
  • Gls, singular sentence case
  • Glspl, plural sentence case

To switch from implicit associations to explicit, set org-glossary-automatic to nil and then only gls=/=glspl=/=Gls=/=Glspl links will be picked up. To convert implicit associations to explicit links, you can run M-x org-glossary-apply-terms (if nothing happens, try running M-x org-glossary-update-terms first).

Note that as Org Glossary relies on links, recognised usages can only occur in places where a link is appropriate (i.e. not inside a source block, verbatim text, or another link, etc.). In addition terms in headings are ignored, as this is considered broadly desirable.

In addition to all this, there's a bit of special behaviour for indexing. As you can discuss a topic without explicitly stating it, we support ox-texinfo-style #+[cfkptv]?index keywords. For example:

#+index: penguin
The Linux operating system has a flightless, fat waterfowl
(affectionately named Tux) as its mascot.

* Index
- penguin

Printing definition sections

When exporting a document, all identified glossary headings are unconditionally stripped from the document. If nothing else is done, based on term usage definition sections will be generated and appended to the document.

Fine grained control over the generation of definition sections is possible via the #+print_glossary: keyword, which disables the default "generate and append to document" behaviour.

Simply inserting a #+print_glossary: keyword will result in the default generated definition sections being inserted at the location of the #+print_glossary: keyword. However, customisation of the behaviour is possible via a number of babel-style :key value options, namely:

  • :type (glossary acronym index by default), the specific glossary-like structures that definition sections should be generated for
  • :level (0 by default), both:

    • The scope in which term uses should be searched for, with 0 representing the whole document, 1 within the parent level-1 heading, 2 the parent level-2 heading, etc.
    • One less than the minimum inserted heading level.
  • :consume (nil by default), if t or yes then marks terms defined here as having been defined, preventing them from being listed in any other #+print_glossary: unless :all is set to t or yes.
  • :all (nil by default), behaves as just described in :consumed.
  • :only-contents (nil by default), if t or yes then the :heading (from the export template) is excluded from the generated content.

Putting this all together, the default #+print_glossary: command written out in full is:

#+print_glossary: :type glossary acronym index :level 0 :consume no :all no :only-contents no

The minor mode

A visual indication of defined terms instances is provided by the minor mode org-glossary-mode. This essentially performs two actions:

  1. Run org-glossary-update-terms to update an buffer-local list of defined terms
  2. Add some fontification rules to make term uses stand out.

The local list of defined terms and fontification allow for a few niceties, such as:

  • Showing the term definition in the minibuffer when hovering over a fontified use
  • Calling M-x org-glossary-goto-term-definition or clicking on a fontified use to go to the definition
  • M-x org-glossary-insert-term-reference to view the list of currently defined terms, and perhaps insert a use.
  • In the case of Text Substitutions, displaying the replacement text on top of the use, when org-glossary-display-substitute-value is non-nil.

Export configuration

Setting export parameters

The content generated for export is governed by templates defined in org-glossary-export-specs. We will discuss them in detail shortly, but for now we consider that in different situations we will want different generated content. There are two levels on which this applies:

  1. By export backend
  2. By the type of glossary-like structure (Glossary, Acronyms, Index, etc.)

This is accounted for by creating an alist of alists of templates. This is a bit of a mouthful, so let's unpack what exactly is going on.

First, we create associations between export backends and specs, with the special "backend" t as the default value, i.e.

((t . DEFAULT-TEMPLATE-SET)
 (html . HTML-TEMPLATE-SET)
 (latex . LATEX-TEMPLATE-SET)
 ...)

When selecting the appropriate template set, we actually check each entry against the current export backend using org-export-derived-backend-p (in order). This has two implications:

  • You can export to derived backends (e.g. beamer) and things should just work
  • If specifying a template set for a derived backend (e.g. beamer) be sure to put it before any parent backends (i.e. latex, in beamer's case) in org-glossary-export-specs to ensure it is actually used.

The backend-appropriate template set is itself an alist of templates, like so:

((t . TEMPLATE)
 (glossary . TEMPLATE)
 (acronym . TEMPLATE)
 (index . TEMPLATE))

Once again, t gives the default value. For each of the types listed in org-glossary-headings, the template is filled out, pulling first from the backend-specific defaults template, then the global defaults. This gives a complete template set which governs the export behaviour for each type of glossary-like structure for the current backend.

Structure of an export template set

The export of term uses and definitions is governed by template sets. The default template set is given by (alist-get t (alist-get t ~org-glossary-export-specs)), the default value of which is given by the following property list:

(:use "%t"
 :first-use "%u"
 :definition "%t"
 :backref "%r"
 :heading ""
 :category-heading "* %c\n"
 :letter-heading "*%L*\n"
 :definition-structure-preamble ""
 :definition-structure "*%d*\\emsp{}%v\\ensp{}%b\n")

Each property refers to a particular situation, and the value is either:

  • A format string that represents the content that should be used
  • A function with the same signature as org-glossary--export-template, that generated the replacement content string.

The :use, :first-use, :definition, and :backref properties are applied during backend-specific content transcoding (i.e. using the syntax of the backend's output), while :definition-structure, :category-heading, and :letter-seperator are applied to a copy of the Org document just prior to the backend-specific export process (and so should be written using Org syntax).

The format strings can make use of the following tokens:

  • %t, the term being defined/used. This is pluralised and capitalised automatically based on the link type (gls=/=glspl=/=Gls=/=Glspl).
  • %v, the term definition value.
  • %k, the term key.
  • %K, the term key buffer-local nonce (number used only once). This will only be consistent within a particular Emacs session.
  • %l, the first letter of the term, in lower case.
  • %L, the first letter of the term, in upper case.
  • %r, the term reference index (only applicable to :use and :first-use).
  • %n, the number of times the term is used/referenced.
  • %c, the term category.
  • %u, the result of :use (primarily intended for convenience with :first-use)
  • %d, the result of :definition (only applicable to :definition-structure)
  • %b, all the :backref results joined with ", " (only applicable to :definition-structure).

The :definition-structure-preamble and :heading parameters are literal strings also inserted to the copy of the Org document just prior to backend-specific export stages.

To illustrate how these properties come into play, the following example uses the property names in place of their generated content.

Here's some text and now the term :first-use, if I use the term again
it is now :use. Once more, :use.

Now we have the appendix with glossary-like definitions.

:heading

:category-heading
:letter-heading
:definition-structure-preamble
:definition-structure(:definition def-value :backref)

To avoid superfluous letter headings (i.e. not helpful), we have org-glossary-print-letter-minimums. This variable specifies a threshold minimum number of distinct initial term letters and terms with the same letter before the :letter-heading template should be inserted.

If :heading, :category-heading, or :letter-heading start with "* " then asterisks will be automatically prefixed to set the headings to an appropriate level.

Creating a new glossary type

Let's consider a few examples. To start with, say we want to be able to define indexed terms under the heading Indices instead of Index. To accomplish this, all you need to do is add an entry to org-glossary-headings, which can be done via the customisation interface or with the following snippet:

(customize-set-value
 'org-glossary-headings
 (cl-remove-duplicates (append org-glossary-headings
                               '(("Indices" . index)))))

Should we actually want to have this be reflected in the export, we could either:

  • Rename the index heading to * Indices, or
  • Create a near-copy of index, just changing the heading

In the first case, all we need to do is execute the following snippet.

(org-glossary-set-export-spec t 'index :heading "* Indices)

Should we actually want to have this be reflected in the export, instead of associating Indices with the pre-defined index term we would first add an ("Indices" . indicies) pair to org-glossary-headings (as before). Then, we can copy each index template currently in org-glossary-export-specs and simply update the default :heading as we've just done for index.

(dolist (template-set org-glossary-export-specs)
  (when-let ((index-template (alist-get 'index (cdr template-set))))
    (push (cons 'indices index-template) (cdr template-set))))

(org-glossary-set-export-spec t 'indices :heading "* Indices)

For our final example, let's say we wanted to add support for Abbreviations. This works in much the same way as Acronyms, just with shortened forms of words or phrases not constructed from the first letters. After adding an ("Abbreviations" . abbreviation) pair to org-glossary-headings in the same manner as earlier, this is as simple as:

(push '(abbreviation :heading "* Abbreviations"
                     :first-use "%v (%u)")
      (plist-get t org-glossary-export-specs))

Tweaking specific exports

Instead of overwriting org-glossary-export-specs, it is recommended that you instead make use of setcdr or plist-put like so:

(org-glossary-set-export-spec 'latex t
  :backref "gls-%k-use-%r"
  :backref-seperator ","
  :definition-structure "*%d*\\emsp{}%v\\ensp{}@@latex:\\ifnum%n>0 \\labelcpageref{@@%b@@latex:}\\fi@@\n")

In this example we could alternatively set :definition-structure to a function to avoid the \ifnum%n>0 LaTeX switch.

(org-glossary-set-export-spec 'latex t
  :definition-structure
  (lambda (backend info term-entry form &optional ref-index plural-p capitalized-p extra-parameters)
    (org-glossary--export-template
     (if (plist-get term-entry :uses)
         "*%d*\\emsp{}%v\\ensp{}@@latex:\\labelcpageref{@@%b@@latex:}@@\n"
       "*%d*\\emsp{}%v\n")
     backend info term-entry ref-index
     plural-p capitalized-p extra-parameters)))

This allows for any change in other backends or the defaults you're not particularly attached to from freely updating.

Adding a new export backend

Adding a new export spec is as easy as pushing a spec list to org-glossary-export-specs, for example should we want to add an ox-md backend we could do this:

(push '(md (t :use "[%t](#gls-%K)"
              :definition "%t {#gls-%K}"
              :definition-structure "%d\n\\colon{} %v [%n uses]\n"))
      org-glossary-export-specs)

We need to remember to use \colon{} instead of : to avoid it being interpreted as Org fixed-width syntax.

Alternatively, we could use org-glossary-set-export-spec, which has the advantage of being idempotent, and I would argue a little clearer.

(org-glossary-set-export-spec 'md t
  :use "[%t](#gls-%K)"
  :definition "%t {#gls-%K}"
  :definition-structure "%d\n\\colon{} %v [%n uses]\n")