Jekyll Style URLs with Hakyll
Posted on January 31, 2016
Recently, I switched from Jekyll to Hakyll for the generation of this Blog. In this article I want to talk about Hakyll’s routing mechanism and how to get it to generate the same URLs as Jekyll so that all the old links to your posts keep working.
If you want to follow along you can find the source code here. Step through the commits in the repository to see the steps presented in this article applied one after the other.
First, a few brief words about what Jekyll and Hakyll are. Both are static website generators. That is, they take a set of input files and settings, and generate a complete website in a static folder structure. This means that you don’t need any clever server-side software to serve your site. You can just upload it to any hosting service that allows you to serve HTML files and that’s all you need.
Before switching, I used Jekyll Bootstrap, which is a blog scaffold based on Jekyll, Twitter Bootstrap, and a few other components that tries to make website generation as quick and easy as possible. When I refer to Jekyll in this article, I really mean Jekyll Bootstrap.
For that reason I am not going to talk about the pros and cons of Hakyll vs. Jekyll. It would be a misleading comparison. I personally switched to Hakyll because it seemed like a good opportunity to learn a bit more about Haskell, and because it seems to be very flexible when it comes to changing the details of how your site should be generated. The main motivation, however, was just curiosity.
Jekyll’s Routing
Now, let’s talk about routing. Jekyll expects markdown files (or some other markup) in a special folder called _posts. Their path should be of the following form.
_posts/YYYY-MM-DD-some-title.ext
That is, it should contain the publishing date and then a title, all separated by dashes. The title may itself contain dashes; those are left untouched. Jekyll then takes that input path and generates the following output path.
_site/category/YYYY/MM/DD/some-title/index.html
The whole website is placed into the _site directory. That is the directory that you want to serve to the web in the end. Next, there is a category directory. Jekyll reads the category from the YAML front matter of the markdown files. The date is split into the components year, month, and day, and each becomes its own directory. The title is made into a directory as well, and the generated HTML file is stored as index.html underneath.
When Jekyll generates URLs to these files they will have the following form.
/category/YYYY/MM/DD/some-title
That is, the filename index.html is chopped off. This is generally considered good practice, as it allows one to change the technology behind a website without invalidating old URLs.
Hakyll’s Routing
Hakyll’s example site, generated by hakyll-init, expects to find markdown files in the folder posts. Their path should look about the same as for Jekyll.
posts/YYYY-MM-DD-some-title.ext
The generated output will be placed under the following path.
_site/posts/YYYY-MM-DD-some-title.html
As with Jekyll, the whole website is placed under _site. The markdown file is translated to HTML and given the corresponding file extension. Other than that, the path is not changed at all. Hakyll also doesn’t apply any transformation to the URLs by which it links to your content. That is, the generated links look like this:
/posts/YYYY-MM-DD-some-title.html
If we want Hakyll to generate the same URLs as Jekyll would, we need to address the following four points.
- Files should be stored as index.html under a certain path, and URLs should not include that filename.
- The date in the filename should be split into one directory for each component.
- The prefix posts/ should be dropped.
- And we need to generate a category directory in front of the rest of the path.
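Taken together, the four points describe one mapping from a source path to an output path. The following is a minimal sketch of that target mapping as a plain function, independent of Hakyll. The name jekyllStylePath is made up for illustration, and I use System.FilePath.Posix on the assumption that output paths should always use forward slashes.

```haskell
import Data.Char (isDigit)
import System.FilePath.Posix (dropExtension, takeFileName, (</>))

-- | Map a category and a post's source path to the Jekyll-style output path.
-- Returns Nothing when the filename does not start with "YYYY-MM-DD-"
-- followed by a non-empty title.
jekyllStylePath :: String -> FilePath -> Maybe FilePath
jekyllStylePath category src =
  case dropExtension (takeFileName src) of
    (y1:y2:y3:y4:'-':m1:m2:'-':d1:d2:'-':title)
      | all isDigit [y1, y2, y3, y4, m1, m2, d1, d2], not (null title) ->
          Just (category </> [y1, y2, y3, y4] </> [m1, m2] </> [d1, d2]
                         </> title </> "index.html")
    _ -> Nothing
```

For example, jekyllStylePath "category_a" "posts/2016-01-31-some-title.md" evaluates to Just "category_a/2016/01/31/some-title/index.html", while a file without a date prefix yields Nothing. The rest of the article builds the same mapping out of Hakyll routes, one point at a time.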
Extension-less URLs
Let’s first look at the generated output path. Hakyll calls this routing. The tutorial explains the basics behind the routing process. The default setup looks like this.
match "posts/*" $ do
    route $ setExtension "html"
This code finds all files in the posts directory and replaces their file extension with html. What we want to do now is to slip an /index in between the extension and the rest of the path. Hakyll’s routing mechanism is easily extensible through the provided function composeRoutes, which does what the name suggests: it applies the first route and then feeds the resulting path into the second.
composeRoutes :: Routes -> Routes -> Routes
Additionally, Hakyll offers a function to fully customize how we generate a route:
customRoute :: (Identifier -> FilePath) -> Routes
For our purposes an Identifier is just something that has a path. We can extract it with toFilePath. The following function will perform the transformation that we’re looking for.
appendIndex :: Routes
appendIndex = customRoute $
    (\(p, e) -> p </> "index" <.> e) . splitExtension . toFilePath
We extract the FilePath from the Identifier; split it into a pair containing the path without extension in the first element and the extension in the second, using splitExtension from System.FilePath; then we generate a path of the form path/index.ext using the operators </> and <.> from the same module.
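To see what the three System.FilePath functions do to a concrete path, here is a small sketch of the same transformation as a plain function on file paths, without the Routes wrapper. The name appendIndexPath is made up for this example, and I import the Posix variant of the module so that the separator is always a forward slash.

```haskell
import System.FilePath.Posix (splitExtension, (</>), (<.>))

-- | Turn "some/path.ext" into "some/path/index.ext".
appendIndexPath :: FilePath -> FilePath
appendIndexPath path = base </> "index" <.> ext
  where
    -- splitExtension keeps the dot: ("some/path", ".ext")
    (base, ext) = splitExtension path
```

Here appendIndexPath "posts/2016-01-31-some-title.html" gives "posts/2016-01-31-some-title/index.html". Since <.> binds tighter than </>, the filename index.html is assembled first and then appended to the directory. Note that the function would also turn index.html into index/index.html, so it should only be applied to routes for posts.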
Next, we need to change the links that Hakyll generates within our sites. We want all the links on the home page and in the archive to be extension-less. If we generate a feed for our site we also want the links in there to be extension-less. If we need to refer to our content anywhere else, e.g. a sharing plugin like Reddit’s share button, we also want the URLs we supply there to be extension-less.
I looked online for a solution to this problem. However, the only solutions I could find used URL transformations in the already generated content. Hakyll offers a function called withUrls with the following signature.
withUrls :: (String -> String) -> String -> String
It takes a string transformation function and applies it to all URLs that it can find in the given string. We can inject it into Hakyll’s compiler pipeline and let it transform the HTML code of our generated sites. However, if we take a look at the definition of that function we find that it only goes through a predefined list of HTML attributes such as src, href, data, or poster and applies the transformation function to them. This works in simple cases, but it fails for the links in generated RSS/Atom feeds, where the URL needs to be changed within an <id> tag, and for things like the Reddit sharing button, where the URL is passed as a JavaScript variable.
If we think about it, we come to the conclusion that withUrls is applied too late. We don’t want to modify URLs in already generated HTML code. Instead, we want to change the URLs before they are inserted into that HTML code in the first place.
So, where does Hakyll get the URLs from? Hakyll has a notion of context for every object that it generates. In the case of our blog posts this context contains a field called "url" by default, which holds the URL by which we would like to refer to that particular post. Fortunately, Hakyll allows us to modify this context. The type Context a is a Monoid and we can therefore combine multiple contexts with mappend. The documentation states that if we combine two contexts then the left-hand side can overwrite the right-hand side. Additionally, Hakyll offers a function called mapContext.
mapContext :: (String -> String) -> Context a -> Context a
That function takes a transformation function as its first argument and, as its second argument, a context whose string fields the transformation is applied to. The string transformation that we want to perform is to drop the index.html part of a URL. However, to be robust we should leave URLs that don’t end with index.html untouched.
The module System.FilePath contains a function called splitFileName which returns a pair of the directory part and the filename. With it we can define our transformation as follows.
transform :: String -> String
transform url = case splitFileName url of
    (p, "index.html") -> takeDirectory p
    _                 -> url
The call to takeDirectory removes the trailing slash from the path in a safe way. A path like /some/path/index.html is transformed to /some/path without a trailing slash, but a path like /index.html is transformed to /, that is, the site root.
Now we need to decide which context to apply that transformation to. Hakyll offers a function called urlField:
urlField :: String -> Context a
Its argument defines the field name under which the URL will be stored. We could choose any name here. However, in order to overwrite the default URL we need to pass "url".
If we combine all that we get the following context function.
dropIndexHtml :: String -> Context a
dropIndexHtml key = mapContext transform (urlField key) where
    transform url = case splitFileName url of
        (p, "index.html") -> takeDirectory p
        _                 -> url
We can inject it into the post’s context as follows.
postCtx :: Context String
postCtx =
    dateField "date" "%B %e, %Y" `mappend`
    dropIndexHtml "url" `mappend`
    defaultContext
Note that if you need access to the actual full URL at any point, you can add it to the context as another field under a different name with another call to urlField.
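Why the order inside postCtx matters can be illustrated with a toy model. The sketch below is not Hakyll’s Context type. It is a deliberately simplified stand-in, a lookup from field name to value, and every name in it (Ctx, singleField, orElse, mapCtx, stripIndex, postCtxModel) is invented for this example. It only mirrors the two properties used above: the combination is left-biased, and a mapped context transforms the values it returns.

```haskell
import Control.Applicative ((<|>))
import System.FilePath.Posix (splitFileName, takeDirectory)

-- Toy stand-in for a context: a lookup from field name to value.
newtype Ctx = Ctx { lookupField :: String -> Maybe String }

-- A context that knows exactly one field.
singleField :: String -> String -> Ctx
singleField key val = Ctx $ \k -> if k == key then Just val else Nothing

-- Left-biased combination: the right side is only asked
-- when the left side has no value for the field.
orElse :: Ctx -> Ctx -> Ctx
orElse (Ctx f) (Ctx g) = Ctx $ \k -> f k <|> g k

-- Apply a string transformation to every value the context returns.
mapCtx :: (String -> String) -> Ctx -> Ctx
mapCtx f (Ctx g) = Ctx (fmap f . g)

-- The same transformation as in the article, under a different name.
stripIndex :: String -> String
stripIndex url = case splitFileName url of
  (dir, "index.html") -> takeDirectory dir
  _                   -> url

-- Model of postCtx: the transformed "url" shadows the default one.
postCtxModel :: String -> Ctx
postCtxModel rawUrl =
  mapCtx stripIndex (singleField "url" rawUrl) `orElse` defaults
  where
    defaults = singleField "url" rawUrl `orElse` singleField "title" "Some Title"
```

In this model, looking up "url" in postCtxModel "/posts/2016-01-31-some-title/index.html" yields Just "/posts/2016-01-31-some-title", because the left-hand context answers first, while "title" still falls through to the defaults. If the two sides were swapped, the untransformed URL would win.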
Date Components
The next issue is the publishing date in the URL. At this stage Hakyll generates URLs of the following form.
/posts/YYYY-MM-DD-some-title
But what we want looks like this:
/posts/YYYY/MM/DD/some-title
So, we need to find the part of the path that looks like a date and in it replace all dashes by slashes. Hakyll has a function in its toolbox that matches this task perfectly.
gsubRoute :: String -> (String -> String) -> Routes
The first parameter is a regex pattern to match against; the second is a string transformation function to be applied to the matching parts. The pattern we want to match can be formulated as follows.
[0-9]{4}-[0-9]{2}-[0-9]{2}-
That is, a group of four digits and two groups of two digits, separated by dashes and followed by one more dash. The transformation to apply is to replace all dashes with slashes. This can be achieved by the following call to Hakyll’s replaceAll function.
replaceAll "-" (const "/")
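The route transformer then looks as follows. Note the additional leading slash in the pattern: it restricts the match to a date at the start of a file or directory name, and the replacement simply leaves that slash in place.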
dateFolders :: Routes
dateFolders =
    gsubRoute "/[0-9]{4}-[0-9]{2}-[0-9]{2}-" $ replaceAll "-" (const "/")
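To make the effect of that pattern concrete, here is a sketch that models the substitution on plain strings without a regex library. It is a hand-written approximation of what gsubRoute does with this particular pattern, not Hakyll’s implementation, and the names matchDate and dateFoldersPath are made up for the example.

```haskell
import Data.Char (isDigit)

-- | Try to read "/YYYY-MM-DD-" at the start of the string.
-- On success, return the replacement "/YYYY/MM/DD/" and the remaining input.
matchDate :: String -> Maybe (String, String)
matchDate ('/':s)
  | [y1, y2, y3, y4, '-', m1, m2, '-', d1, d2, '-'] <- take 11 s
  , all isDigit [y1, y2, y3, y4, m1, m2, d1, d2]
  = Just ("/" ++ [y1, y2, y3, y4] ++ "/" ++ [m1, m2] ++ "/" ++ [d1, d2] ++ "/", drop 11 s)
matchDate _ = Nothing

-- | Replace every "/YYYY-MM-DD-" in the path by "/YYYY/MM/DD/".
dateFoldersPath :: FilePath -> FilePath
dateFoldersPath [] = []
dateFoldersPath s@(c:cs) = case matchDate s of
  Just (replacement, rest) -> replacement ++ dateFoldersPath rest
  Nothing                  -> c : dateFoldersPath cs
```

With this model, "posts/2016-01-31-some-title/index.html" becomes "posts/2016/01/31/some-title/index.html", and the dashes in the title are left alone. A path that starts directly with the date, such as "2016-01-31-some-title/index.html", is not changed at all because the leading slash is missing. That suggests the date route has to run while the posts/ prefix is still in place.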
Dropping the Prefix
Next, let’s get rid of that posts/ prefix in the generated URL. This is actually very easy. We can use the function gsubRoute again. This time, however, we replace the matching part with the empty string.
dropPostsPrefix :: Routes
dropPostsPrefix = gsubRoute "posts/" $ const ""
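One caveat: gsubRoute replaces every match in the path, not only the one at the beginning. The sketch below models that on plain strings and contrasts it with removing only a leading prefix. Both helper names (removeAll, removePrefix) are made up, and removeAll handles a literal substring only, which is enough for the pattern used here.

```haskell
import Data.List (isPrefixOf, stripPrefix)
import Data.Maybe (fromMaybe)

-- | Remove every occurrence of a literal substring
-- (a model of gsubRoute with a literal pattern and const "").
removeAll :: String -> String -> String
removeAll needle
  | null needle = id
  | otherwise   = go
  where
    go [] = []
    go s@(c:cs)
      | needle `isPrefixOf` s = go (drop (length needle) s)
      | otherwise             = c : go cs

-- | Remove the substring only if it is a prefix of the path.
removePrefix :: String -> String -> String
removePrefix prefix path = fromMaybe path (stripPrefix prefix path)
```

For a typical post both functions agree. For a post whose title happens to end in "posts", they differ: removeAll "posts/" "posts/2016/01/31/old-posts/index.html" gives "2016/01/31/old-index.html", whereas removePrefix keeps the title intact and gives "2016/01/31/old-posts/index.html". If that case matters for your site, a customRoute built on stripPrefix might be the safer choice.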
Now, we’re almost there.
Group Posts by Category
Finally, we want to group our posts by category. There are two ways to do this.
The simple way is to just move the markdown files into subfolders according to their category, and then change all the routing patterns in your site.hs file to match only posts within a category in their path. In other words, replace every occurrence of the pattern posts/* by posts/*/* in your site.hs, and move all the markdown files into the corresponding subdirectory.
Another way, and the one Jekyll uses, is a metadata field whose value is inserted into the URL. For that we first need to add category fields to the YAML headers of all blog posts, for example:
---
title: S.P.Q.R.
category: category_a
---
Then we need a route transformer which adds the category into the route. Fortunately, Hakyll provides a function for that purpose.
metadataRoute :: (Metadata -> Routes) -> Routes
We can combine it with the function customRoute that we used before and prepend the category to the path.
prependCategory :: Routes
prependCategory = metadataRoute $ \md -> customRoute $
    (md M.! "category" </>) . toFilePath
At the time of writing, Hakyll represents metadata as a map from Data.Map in the containers package, imported qualified as M here. In this function we use the (!) operator from the same module to extract the category field. Note that this function will fail with a run-time error if a post does not have a category field. If that is not the behaviour you want, you should use the lookup function from Data.Map instead and handle the Nothing case in whichever way you see fit.
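Putting the pieces together, the individual routes can be chained with composeRoutes. The following is a sketch of how that might look, under a few assumptions of mine. The name postRoute and the fallback directory uncategorized are made up. The category lookup uses lookupString, which, as far as I know, is what newer Hakyll releases provide now that Metadata is no longer a plain Data.Map; with the Hakyll version described in this article you would write M.lookup "category" md instead.

```haskell
import Data.Maybe      (fromMaybe)
import Hakyll
import System.FilePath (splitExtension, (</>), (<.>))

-- posts/x.html -> posts/x/index.html
appendIndex :: Routes
appendIndex = customRoute $
  (\(p, e) -> p </> "index" <.> e) . splitExtension . toFilePath

-- /YYYY-MM-DD-title -> /YYYY/MM/DD/title
dateFolders :: Routes
dateFolders =
  gsubRoute "/[0-9]{4}-[0-9]{2}-[0-9]{2}-" $ replaceAll "-" (const "/")

-- Drop "posts/" from the path (as in the article).
dropPostsPrefix :: Routes
dropPostsPrefix = gsubRoute "posts/" $ const ""

-- Prepend the category; posts without one go to a fallback directory.
-- ASSUMPTION: lookupString is available (newer Hakyll releases).
prependCategory :: Routes
prependCategory = metadataRoute $ \md -> customRoute $ \ident ->
  fromMaybe "uncategorized" (lookupString "category" md) </> toFilePath ident

-- Each route receives the output of the one before it.
postRoute :: Routes
postRoute =
  setExtension "html" `composeRoutes`
  appendIndex         `composeRoutes`
  dateFolders         `composeRoutes`
  dropPostsPrefix     `composeRoutes`
  prependCategory
```

In site.hs this would be used as route postRoute inside the match "posts/*" rule. The order is deliberate: dateFolders has to come before dropPostsPrefix, because its pattern relies on the slash that follows posts. The category is looked up in the metadata of the original source file, so prependCategory can sit at the end of the chain.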
Conclusion
With that, we have taught Hakyll how to generate URLs just like Jekyll does. Based on this experience I will dare one comparison between Jekyll and Hakyll. The former seems more like a static website generator that you can configure. The latter, however, really seems like a library that you use to write your own static website generator. Naturally, it takes a little bit more effort to get anything done with Hakyll. On the other hand, you have immediate access to every aspect of the website generation.
Please leave a comment if you noticed any mistake, have any suggestions, or any other form of feedback. Thanks for reading!