Hugo's Processing Model and URL Management

The Hugo static site generator takes some plain-text content, marries it to a bunch of HTML templates, and produces a set of complete, static HTML pages that can be served by any generic, stand-alone web server. Because the site generated by Hugo is entirely static, all URLs in the public site must correspond directly to objects in the filesystem.

Part 3: Processing Model, Input/Output Mapping, URL Management

In principle, Hugo takes a hierarchy of directories and files underneath the source directory, and recreates the same hierarchy in the destination directory: it couldn’t be simpler. But there are two circumstances that conspire to turn the whole topic of input/output mapping into the most confusing aspect of working with Hugo:

  • The path names of the generated files will be the public URLs of the finished site. Any amount of URL management, rewriting, or cleaning therefore amounts to changes in the mapping of source to destination files.

  • For each directory, Hugo automatically creates a page, showing all the items in that directory. This page is not based on user-provided content; it is created synthetically by Hugo. But users may want to add to or modify the content of these created pages. Hugo provides a mechanism for doing so, but it sometimes creates additional confusion (particularly because the Hugo documentation of this mechanism is not noted for its clarity).

Clean URLs

The first source of complexity is the desire to have “clean URLs” that end with a directory name, not a filename and extension:

www.example.com/news/what-happened-today/           Clean
www.example.com/news/what-happened-today.html       Ugly

Because any public URL in a static site must correspond to an object in the filesystem, the generated filesystem object must be:

public/news/what-happened-today/index.html

Most web servers are configured to silently serve the index.html file when the request URL points to the parent directory.

To create output at this URL, Hugo allows two different input styles:

content/news/what-happened-today.md                 File
content/news/what-happened-today/index.md           Directory with index.md

Either of these alternatives will map to the public URL stated earlier. (Of course, you shouldn’t have both of them in your input directory; otherwise, the results will clobber each other.)
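
The mapping from either input style to the same output file can be sketched in a few lines of Python. This is only an illustration of the rule described above, not Hugo's actual implementation; the function name output_path is made up for this example, and the sketch deliberately ignores _index.md files and any frontmatter overrides.

```python
from pathlib import PurePosixPath


def output_path(source: str) -> str:
    """Map a Markdown source path under content/ to its path under public/."""
    rel = PurePosixPath(source).relative_to("content")
    if rel.name == "index.md":
        # Directory style: the enclosing directory names the page.
        page_dir = rel.parent
    else:
        # File style: the file's basename (without .md) names the page.
        page_dir = rel.with_suffix("")
    return str(PurePosixPath("public") / page_dir / "index.html")


# Both input styles end up at public/news/what-happened-today/index.html
print(output_path("content/news/what-happened-today.md"))
print(output_path("content/news/what-happened-today/index.md"))
```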

Here is the problem: remember that Hugo will automatically create a synthetic page for all directories in the input source tree? Clearly, for the directory what-happened-today in the second alternative, this is not appropriate, because this directory contains only a single item, which is itself a page. Hence Hugo has the special rule:

If a directory contains a file called index.md, then process this directory as if it were a file!

Why, then, allow directories that map to a single page at all? Because they prevent cluttering the namespace if there are auxiliary files (such as images)!

Imagine that the page in question was referring to an image, say img.png. Hugo copies files that are not Markdown directly from their location in the source tree to exactly the same position in the destination directory. Hence a file at content/news/img.png would be copied to public/news/img.png, cluttering the namespace in that directory. (Alternatively, you could have all image files in the top-level static/ directory, which is a sibling of content/, again cluttering the global namespace.)

By contrast, if the input file resides in its own directory, then the image file can also be placed into that directory:

content/news/what-happened-today/index.md
content/news/what-happened-today/img.png

Both files will be mapped to the directory public/news/what-happened-today/ in the output directory. The image file will be local to this directory, and not clutter the wider namespace.

To summarize:

  • Input can be either a Markdown file with an arbitrary name, or a directory containing a Markdown file named index.md.
  • Either will be mapped to a directory, containing an index.html file, with the content placed into that file.
  • Directories containing an index.md file will not be treated as directories, but will be processed as if they were a file.

Customizing Directory Listings

For each directory, Hugo creates a synthetic page, typically showing the items in the directory. It uses the “list” template for the layout of the resulting page, and in general, there is no user-provided “content” for that page.

But what if the user would like to provide some content, after all? Or possibly just some processing instructions in the frontmatter?

To support this, Hugo lets you place a special file into a directory. This file must be called _index.md. If such a file is found, then its contents will be made available to the list template that is used to generate the directory listing page. (It is up to the template to make use of the content; the template may ignore it. A typical use is for the _index.md file to contain only processing instructions in its frontmatter.)

To summarize:

  • If a file called _index.md is found in a directory, then its contents will be made available to the list template that is used to generate the directory listing page for this directory.

  • The directory will be processed as a directory, not as a file.
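
The difference between the two special filenames can be made explicit with a small Python sketch. The function name page_kind is hypothetical, and so is the choice to let index.md win if both files were present; the sketch only restates the rules given in this article and says nothing about how Hugo implements them.

```python
from typing import Iterable, Optional, Tuple


def page_kind(filenames: Iterable[str]) -> Tuple[str, Optional[str]]:
    """Return (template, content file) for the page generated for a directory."""
    names = set(filenames)
    if "index.md" in names:
        # The directory is processed as if it were a single file.
        return ("single", "index.md")
    if "_index.md" in names:
        # Still a directory listing, but with user-provided content or frontmatter.
        return ("list", "_index.md")
    # Purely synthetic listing: no user-provided content.
    return ("list", None)


print(page_kind(["_index.md", "victor.md", "hugo.md"]))
print(page_kind(["first.md", "second.md"]))
print(page_kind(["index.md", "img.png"]))
```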

Overriding Filenames

In everything so far, I assumed that the filesystem name of an object in the source tree was going to become part of the public URL for the generated page. (In the example above, either the file basename or the directory name what-happened-today became part of the public URL.)

But Hugo also allows the filename of the input file to be overridden through frontmatter parameters! In this case, the generated HTML file can be at an arbitrary position in the destination directory, no matter where its corresponding input file resides in the source tree.

There are three frontmatter parameters that matter in this context:

title
The title parameter is generally important, because many themes use its value for visible headlines. By default it does not change the URL (the file or directory name does that), but it serves as the fallback for the :slug token in permalink patterns when no slug is set.
slug
The last part of a URL, identifying the specific page or piece of content. (In www.example.com/news/what-happened-today/, the slug is what-happened-today.)
url
The full path part of a URL (the part following the domain).
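
How these parameters interact can be sketched as follows. This Python snippet assumes a precedence of url first, then slug, then the source file's name; the function public_url and its arguments are invented for this illustration and are not part of Hugo.

```python
def public_url(section_path: str, basename: str, front_matter: dict) -> str:
    """Return the URL path of a page, assuming url > slug > file name."""
    if "url" in front_matter:
        # url replaces the whole path, regardless of where the source file lives.
        path = front_matter["url"].strip("/")
    else:
        # slug replaces only the last path segment; otherwise use the file name.
        last = front_matter.get("slug", basename)
        path = f"{section_path.strip('/')}/{last}".strip("/")
    return f"/{path}/" if path else "/"


print(public_url("news", "what-happened-today", {}))
print(public_url("news", "what-happened-today", {"slug": "today"}))
print(public_url("news", "what-happened-today", {"url": "/archive/today/"}))
```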

Yet another way to override the default output location is to configure “permalinks” in the global config.toml file. This option is only available for “sections” (that is, for the top-level directories directly underneath content/). For each such “section”, a URL pattern can be specified in the site configuration file. For all content in this section, the corresponding output will be generated at the location pointed to by that pattern. The pattern can include fixed strings, as well as certain variables populated by Hugo. For example, it is possible to interject the year into the URL for blog posts:

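[permalinks]
posts = "/blog/:year/:slug/"

The key (posts) names the section; the value is the URL pattern. This will render all content underneath content/posts/ at URLs whose path starts with the fixed string “blog”, followed by the year and the slug of the piece (or its title, if no slug is set).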
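
The expansion of such a pattern can be illustrated with a short Python sketch. It handles only the two tokens used in the example, and its slugify helper is a crude stand-in for Hugo's own URL sanitizing; both function names are assumptions made for this illustration.

```python
import re
from datetime import date


def slugify(title: str) -> str:
    """Very rough stand-in for Hugo's URL sanitizing (an assumption)."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def expand_permalink(pattern: str, page: dict) -> str:
    """Fill the :year and :slug tokens of a permalink pattern."""
    values = {
        "year": str(page["date"].year),
        # Fall back to the title when no slug is given in the frontmatter.
        "slug": page.get("slug") or slugify(page["title"]),
    }
    return re.sub(r":(\w+)", lambda m: values[m.group(1)], pattern)


page = {"title": "What Happened Today", "date": date(2020, 5, 17)}
print(expand_permalink("/blog/:year/:slug/", page))
```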

The Home Page

The Home page is a special case: one may think of it as a “content” page. But because it sits at the top of the directory hierarchy, it must be a “list” page. Furthermore, any user-provided content must be in a file called _index.md to ensure that processing does not stop at the root of the document directory! (Many themes provide a special template, called index.html, that is only going to be used to render the home page.)

A Worked Example

The following example shows the contents of a source directory, and the directories and files that Hugo will typically map them to (assuming nothing is overridden in any of the files' frontmatter). (Two dashes -- indicate a missing file!)

content/                public/
  --                      index.html                         LIST page
                        
  stuff.md                stuff/index.html
  
  about/
    index.md              about/index.html

  posts/
    --                    posts/index.html                   LIST page
    first.md              posts/first/index.html
    other/
      post.md             posts/other/post/index.html
      fedex.md            posts/other/fedex/index.html
    second.md             posts/second/index.html
    final/
      index.md            posts/final/index.html

  guides/
    _index.md             guides/index.html                  LIST page
    victor.md             guides/victor/index.html
    hugo.md               guides/hugo/index.html

  bundle/
    index.md              bundle/index.html
    img.png               bundle/img.png                     direct

  problem/
    index.md              problem/index.html                 SINGLE page
    topic.md              --                                 LOST
    text.md               --                                 LOST
    img.png               problem/img.png                    direct

  nested/
    index.md              nested/index.html                  SINGLE page
    img1.png              nested/img1.png                    direct
    deeper/               --                                 LOST
      index.md            --                                 LOST
    img/
      img2.png            nested/img/img2.png                direct
    mixed/
      index.md            --                                 LOST
      img3.png            nested/mixed/img3.png              direct

It is worth studying this example in some detail.

  1. Although there is no user-provided content for it, Hugo does create a home page! Remember that the home page uses a list template. To provide custom content for the home page, it must be in a file called _index.md at the root of the source directory.

  2. The next two pages demonstrate the two possible types of input: either as a named file (stuff.md) or as a named directory (about/) containing an index.md file.

  3. The posts/ directory shows that directories can be nested. The directory listing page for the posts/ directory does not have user-provided content; it is synthetically generated by Hugo.

  4. By contrast, the guides/ directory contains an _index.md file that is used by Hugo to supplement the directory listing page. Hugo treats the guides/ directory as a directory, generating pages for the content items (victor.md and hugo.md).

  5. The bundle/ directory shows how to bundle an image with a page.

  6. The next two directories show some commonly encountered problems. The problem/ directory contains an index.md file, which means that Hugo treats this directory as a “page” and will not process any input (Markdown) files in this directory or any directory below. By contrast, non-input files (such as images) are faithfully copied to the destination directory.

  7. The nested/ directory demonstrates the same problem with nested directories.

Hugo’s Processing Model

Hugo’s processing model for input files can be summarized like this (this may not be exactly correct, but it seems good enough for now):

  1. Recursively visit each directory.

  2. For each directory, create a public destination directory of the same name.

  3. If the current directory contains index.md, the directory is considered a “leaf directory”:

    • use the single page template to transform index.md into index.html in the destination directory.
    • STOP processing any Markdown files in this directory or any of its children.
    • do copy any non-Markdown resources (images, including those in subdirectories) to the destination directory (see step 5).

  4. If the current directory does not contain index.md, then the directory is considered a “branch directory”:

    • use the list page template to create index.html in the destination directory, showing items in the current directory.
    • if there is an _index.md in the current directory, include its contents when generating index.html.

  5. For all items in the current directory:

    • If Markdown, create a public directory, and use the single page template to create index.html in that directory.
    • Otherwise, copy over directly, without processing, to the target directory.

  6. Do not create a public destination directory if it would be empty (because the source directory is empty, or because it contains only materials that would be discarded).
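
The steps above can be sketched as a small Python simulation. It works on an in-memory tree (nested dicts for directories, None for files) rather than a real filesystem, and it models only the rules listed in this article; the names process and copy_resources are hypothetical, and nothing here should be read as Hugo's actual implementation.

```python
def copy_resources(tree: dict, path: str, out: dict) -> None:
    """Copy non-Markdown files of a leaf directory, including subdirectories."""
    for name, child in tree.items():
        if isinstance(child, dict):
            copy_resources(child, f"{path}{name}/", out)
        elif not name.endswith(".md"):
            out[path + name] = "direct"


def process(tree: dict, path: str = "") -> dict:
    """Return {output path: kind} for a source tree given as nested dicts."""
    out = {}
    if "index.md" in tree:
        # Leaf directory: one single page; all other Markdown below is dropped.
        out[path + "index.html"] = "single"
        copy_resources(tree, path, out)
        return out
    # Branch directory: a list page, optionally supplemented by _index.md.
    out[path + "index.html"] = "list"
    for name, child in tree.items():
        if isinstance(child, dict):
            if child:  # skip empty source directories (step 6)
                out.update(process(child, f"{path}{name}/"))
        elif name == "_index.md":
            continue  # consumed by the list page above
        elif name.endswith(".md"):
            out[f"{path}{name[:-3]}/index.html"] = "single"
        else:
            out[path + name] = "direct"
    return out


tree = {
    "stuff.md": None,
    "guides": {"_index.md": None, "victor.md": None},
    "nested": {
        "index.md": None,
        "img1.png": None,
        "deeper": {"index.md": None},
        "mixed": {"index.md": None, "img3.png": None},
    },
}
for out_path, kind in sorted(process(tree).items()):
    print(f"{kind:7} public/{out_path}")
```

Running it on the nested/ part of the worked example shows the Markdown files in deeper/ and mixed/ disappearing, while img3.png is still copied.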