Jekyll Style URLs with Hakyll

Posted on January 31, 2016

Recently, I switched from Jekyll to Hakyll to generate this blog. In this article I want to talk about Hakyll’s routing mechanism and how to make it generate the same URLs as Jekyll, so that all the old links to your posts keep working.

If you want to follow along you can find the source code here. Step through the commits in the repository to see the steps presented in this article applied one after the other.

First, a few brief words about what Jekyll and Hakyll are. Both are static website generators: they take a set of input files and settings and generate a complete website as a static folder structure. This means that you don’t need any special server-side software to serve your site. You can simply upload it to any hosting service that serves static HTML files, and that’s all you need.

Before switching, I used Jekyll Bootstrap, a blog scaffold based on Jekyll, Twitter Bootstrap, and a few other components, which tries to make website generation as quick and easy as possible. When I refer to Jekyll in this article, I really mean Jekyll Bootstrap.

For that reason, I am not going to talk about the pros and cons of Hakyll vs. Jekyll; it would be a misleading comparison. I switched to Hakyll because it seemed like a good opportunity to learn a bit more about Haskell, and because it seems to be very flexible when it comes to changing the details of how your site is generated. The main motivation, however, was simple curiosity.

Jekyll’s Routing

Now, let’s talk about routing. Jekyll expects Markdown files (or files in some other markup language) in a special folder called _posts. Their paths should have the following form.

_posts/YYYY-MM-DD-some-title.ext

That is, the file name should contain the publishing date followed by a title, all separated by dashes. The title may itself contain dashes; those are left untouched. Jekyll then takes that input path and generates the following output path.

_site/category/YYYY/MM/DD/some-title/index.html

The whole website is placed into the _site directory, which is the directory you eventually serve to the web. Next comes a category directory: Jekyll reads the category from the YAML front matter of the Markdown file. The date is split into its components year, month, and day, and each becomes its own directory. The title is made into a directory as well, and the generated HTML file is stored underneath it as index.html.

When Jekyll generates URLs to these files, they have the following form.

/category/YYYY/MM/DD/some-title

That is, the filename index.html is chopped off. This is generally considered good practice, as it allows one to change the technology behind how a website is generated without invalidating old URLs.
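To pin down the target, here is a simplified model of the mapping just described, written as plain Haskell over file paths. It is a sketch under assumptions, not Jekyll’s implementation: jekyllComponents, jekyllOutputPath, and jekyllUrl are hypothetical helpers, a post is assumed to have exactly one category, the file name is assumed to start with a well-formed YYYY-MM-DD- prefix, and paths use POSIX separators.

```haskell
import Data.List (intercalate)
import System.FilePath (joinPath, takeBaseName)

-- Hypothetical model of the Jekyll Bootstrap mapping described above.
-- Assumes a single category and a file name that starts with a
-- well-formed "YYYY-MM-DD-" prefix; nothing is validated here.
jekyllComponents :: String -> FilePath -> [String]
jekyllComponents category input = [category, year, month, day, title]
  where
    base  = takeBaseName input    -- e.g. "2016-01-31-some-title"
    year  = take 4 base
    month = take 2 (drop 5 base)
    day   = take 2 (drop 8 base)
    title = drop 11 base          -- dashes inside the title are kept

-- Output file: _site/category/YYYY/MM/DD/some-title/index.html
jekyllOutputPath :: String -> FilePath -> FilePath
jekyllOutputPath category input =
    joinPath ("_site" : jekyllComponents category input ++ ["index.html"])

-- Public URL: the same components with index.html chopped off.
jekyllUrl :: String -> FilePath -> String
jekyllUrl category input = '/' : intercalate "/" (jekyllComponents category input)
```

For example, jekyllUrl "category" "_posts/2016-01-31-some-title.md" yields /category/2016/01/31/some-title, the URL form shown above. The rest of this article is about getting Hakyll to produce exactly these paths.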

Hakyll’s Routing

Hakyll’s example site, generated by hakyll-init, expects to find Markdown files in the folder posts. Their paths look about the same as for Jekyll.

posts/YYYY-MM-DD-some-title.ext

The generated output is placed under the following path.

_site/posts/YYYY-MM-DD-some-title.html

As with Jekyll, the whole website is placed under _site. The Markdown file is translated to HTML and given the corresponding file extension. Other than that, the path is not changed at all. Hakyll also doesn’t apply any transformation to the URLs by which it links to your content, i.e. the generated links look like this:

/posts/YYYY-MM-DD-some-title.html

If we want Hakyll to generate the same URLs as Jekyll would, we need to address the following four points.

  • Files should be stored as index.html under a certain path, and URLs should not include that filename.
  • The date in the filename should be split into one directory for each component.
  • The prefix posts/ should be dropped.
  • A category directory should be generated in front of the rest of the path.

Extension-less URLs

Let’s first look at the generated output path. Hakyll calls this routing. The Hakyll tutorial explains the basics behind the routing process. The default setup looks like this.

match "posts/*" $ do
    route $ setExtension "html"

This code matches all files in the posts directory and replaces their file extension with html. What we want to do now is to slip a /index in between the extension and the rest of the path. Hakyll’s routing mechanism is easily extensible through the provided function composeRoutes, which does what the name suggests: it applies the first route and then applies the second route to the result.

composeRoutes :: Routes -> Routes -> Routes

Additionally, Hakyll offers a function to fully customize how we generate a route:

customRoute :: (Identifier -> FilePath) -> Routes

For our purposes, an Identifier is just something that has a path, which we can extract with toFilePath. The following function performs the transformation that we’re looking for.

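import System.FilePath (splitExtension, (</>), (<.>))

-- posts/some-post.html  ->  posts/some-post/index.html
appendIndex :: Routes
appendIndex = customRoute $
    (\(p, e) -> p </> "index" <.> e) . splitExtension . toFilePath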

We extract the FilePath from the Identifier; split it with splitExtension from System.FilePath into a pair containing the path without extension as its first element and the extension as its second; and then generate a path of the form path/index.ext using the operators </> and <.> from the same module.
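The path arithmetic inside appendIndex does not depend on Hakyll, so it can be tried out on plain strings. The sketch below uses the hypothetical names appendIndexPath and htmlIndexPath, and replaceExtension stands in for setExtension "html". Hakyll’s documentation describes composeRoutes f g as more or less equivalent to g . f, which is why the composition below lists appendIndexPath on the left even though it runs second.

```haskell
import System.FilePath (replaceExtension, splitExtension, (<.>), (</>))

-- Same logic as appendIndex, minus the Identifier handling:
-- "dir/name.ext" becomes "dir/name/index.ext".
appendIndexPath :: FilePath -> FilePath
appendIndexPath = (\(p, e) -> p </> "index" <.> e) . splitExtension

-- Models setExtension "html" followed by appendIndex.
-- In (.) the right-most function runs first, just as composeRoutes
-- applies its left argument first.
htmlIndexPath :: FilePath -> FilePath
htmlIndexPath = appendIndexPath . (`replaceExtension` "html")
```

Because splitExtension only splits off the last extension, titles containing dots survive: posts/2016-01-31-s.p.q.r.markdown becomes posts/2016-01-31-s.p.q.r/index.html.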

Next, we need to change the links that Hakyll generates within our pages. We want all the links on the home page and in the archive to be extension-less. If we generate a feed for our site, we want the links in there to be extension-less as well. And if we need to refer to our content anywhere else, e.g. in a sharing plugin like Reddit’s share button, the URLs we supply there should be extension-less, too.

I looked online for a solution to this problem. However, the only solutions I could find applied URL transformations to the already generated content. Hakyll offers a function called withUrls with the following signature.

withUrls :: (String -> String) -> String -> String

It takes a string transformation function and applies it to all URLs that it can find in the given string. We can inject it into Hakyll’s compiler pipeline and let it transform the HTML code of our generated pages. However, a look at the definition of that function reveals that it only goes through a predefined list of HTML attributes, such as src, href, data, or poster, and applies the transformation function to their values. This works in simple cases, but it fails for the links in generated RSS/Atom feeds, where the URL needs to be changed within an <id> element, and for things like the Reddit sharing button, where the URL is passed as a JavaScript variable.

Thinking about it, we come to the conclusion that withUrls applies at too late a stage. We don’t want to modify URLs in already generated HTML code. Instead, we want to change the URLs before they are inserted into that HTML code in the first place.

So, where does Hakyll get the URLs from? Hakyll has a notion of a context for every item that it generates. In the case of our blog posts, this context contains a field called "url" by default, which holds the URL by which we would like to refer to that particular post. Fortunately, Hakyll allows us to modify this context. The type Context a is a Monoid, so we can combine multiple contexts with mappend. According to the documentation, when two contexts are combined, the left-hand side takes precedence: a field in the left context overrides a field of the same name in the right one. Additionally, Hakyll offers a function called mapContext.

mapContext :: (String -> String) -> Context a -> Context a

That function takes a string transformation as its first argument and a context as its second, and applies the transformation to the string fields of that context. The transformation we want is to drop the index.html part of a URL. However, to be robust, we should leave alone any URL that doesn’t actually end with index.html.

The module System.FilePath contains a function called splitFileName, which splits a path into a pair of the directory and the filename. With it, we can define our transformation as follows.

transform :: String -> String
transform url = case splitFileName url of
                    (p, "index.html") -> takeDirectory p
                    _                 -> url

The call to takeDirectory removes the trailing slash from the directory in a safe way. A path like /some/path/index.html is transformed to /some/path without a trailing slash, but a path like /index.html is transformed to /, i.e. the site root.
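These edge cases can be checked without setting up a Hakyll project, since the transformation only needs the filepath package that ships with GHC. The sketch below copies transform under the hypothetical name dropIndexFromUrl and pairs a few sample URLs with the results described above; examples is likewise a hypothetical name.

```haskell
import System.FilePath (splitFileName, takeDirectory)

-- Drop a trailing "index.html" from a URL; leave every other URL alone.
dropIndexFromUrl :: String -> String
dropIndexFromUrl url = case splitFileName url of
    (p, "index.html") -> takeDirectory p  -- "/a/b/" -> "/a/b", "/" -> "/"
    _                 -> url

-- Sample inputs paired with the results described in the text.
examples :: [(String, String)]
examples =
    [ ("/some/path/index.html", "/some/path")
    , ("/index.html",           "/")
    , ("/atom.xml",             "/atom.xml")
    , ("/some/path/page.html",  "/some/path/page.html")
    ]
```

In GHCi, map (dropIndexFromUrl . fst) examples should reproduce the second components of the pairs.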

Now, we need to decide which context to apply that transformation to. Hakyll offers a function called urlField:

urlField :: String -> Context a

Its argument defines the field name under which the URL is stored. We could choose any name here; however, to override the default URL, we need to pass "url".

Combining all of that, we get the following context function.

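import System.FilePath (splitFileName, takeDirectory)

-- Provide the item's URL under the given key, minus a trailing index.html.
dropIndexHtml :: String -> Context a
dropIndexHtml key = mapContext transform (urlField key) where
    transform url = case splitFileName url of
                        (p, "index.html") -> takeDirectory p
                        _                 -> url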

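We can inject it into the post’s context as follows. Since the left-hand side of mappend takes precedence, dropIndexHtml "url" has to come before defaultContext, which provides the unmodified "url" field.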

postCtx :: Context String
postCtx =
    dateField "date" "%B %e, %Y" `mappend`
    dropIndexHtml "url"          `mappend`
    defaultContext

Note that if you need access to the full URL, including index.html, at any point, you can add it to the context as another field under a different name with another call to urlField.
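For instance, a variant of postCtx along these lines would expose both forms. The field name fullUrl is a hypothetical choice; templates could then refer to $fullUrl$ next to the shortened $url$. This is a fragment of site.hs and assumes dropIndexHtml from above is in scope.

```haskell
-- Hypothetical variant of postCtx that keeps the unmodified URL as well.
postCtx :: Context String
postCtx =
    dateField "date" "%B %e, %Y" `mappend`
    urlField "fullUrl"           `mappend`  -- e.g. /.../some-title/index.html
    dropIndexHtml "url"          `mappend`  -- e.g. /.../some-title
    defaultContext
```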

Date Components

The next issue is the publishing date in the URL. At this stage Hakyll generates URLs of the following form.

/posts/YYYY-MM-DD-some-title

But what we want looks like this:

/posts/YYYY/MM/DD/some-title

So, we need to find the part of the path that looks like a date and replace all dashes in it with slashes. Hakyll has a function in its toolbox that matches this task perfectly.

gsubRoute :: String -> (String -> String) -> Routes

The first argument is a regular expression to match against; the second is a string transformation function that is applied to every matching part. The pattern we want to match can be formulated as follows.

[0-9]{4}-[0-9]{2}-[0-9]{2}-

That is, a group of four digits followed by two groups of two digits, all separated by dashes, with another dash at the end. The transformation to apply is to replace all dashes with slashes, which can be achieved with the following call to Hakyll’s replaceAll function.

replaceAll "-" (const "/")

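The route transformer then looks as follows; its pattern additionally starts with a slash, so that it only matches a date at the beginning of a path component.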

dateFolders :: Routes
dateFolders =
    gsubRoute "/[0-9]{4}-[0-9]{2}-[0-9]{2}-" $ replaceAll "-" (const "/")
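Because regular expressions are not part of base, here is a regex-free sketch of what dateFolders does to a path, handy for trying out inputs. The names dateFoldersPath, matchDate, and dashToSlash are hypothetical, and this is not how gsubRoute is implemented; it only mimics the pattern "/[0-9]{4}-[0-9]{2}-[0-9]{2}-" by looking for a date right after a slash and replacing the dashes in the matched part.

```haskell
import Data.Char (isDigit)

-- Copy the path, rewriting every "/YYYY-MM-DD-" into "/YYYY/MM/DD/".
dateFoldersPath :: String -> String
dateFoldersPath [] = []
dateFoldersPath ('/' : rest)
    | Just (date, after) <- matchDate rest =
        '/' : map dashToSlash date ++ dateFoldersPath after
dateFoldersPath (c : rest) = c : dateFoldersPath rest

-- Succeeds when the string starts with exactly the shape "dddd-dd-dd-",
-- where 'd' stands for an ASCII digit; returns the match and the rest.
matchDate :: String -> Maybe (String, String)
matchDate s
    | length date == length shape && and (zipWith fits shape date) = Just (date, after)
    | otherwise = Nothing
  where
    shape         = "dddd-dd-dd-"
    (date, after) = splitAt (length shape) s
    fits 'd' ch = isDigit ch
    fits x   ch = x == ch

dashToSlash :: Char -> Char
dashToSlash '-' = '/'
dashToSlash ch  = ch
```

A path such as 2016-01-31-some-title.html, whose date is not preceded by a slash, is left unchanged. So dateFolders has to run while posts/ (or some other directory) is still in front of the date.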

Dropping the Prefix

Next, let’s get rid of that posts/ prefix in the generated URL. This is actually very easy: we can use the function gsubRoute again, only this time we replace the matching part with the empty string.

dropPostsPrefix :: Routes
dropPostsPrefix = gsubRoute "posts/" $ const ""

Now, we’re almost there.
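Put together, the rule for posts might look like the following fragment of site.hs. It replaces the default rule from hakyll-init and assumes appendIndex, dateFolders, dropPostsPrefix, and postCtx from above are in scope. The order is our own choice rather than something taken from the accompanying repository; the one real constraint is that dateFolders looks for the slash in front of the date, so it must run before dropPostsPrefix removes posts/.

```haskell
match "posts/*" $ do
    -- Path after each step, starting from posts/2016-01-31-some-title.md:
    route $ setExtension "html" `composeRoutes`  -- posts/2016-01-31-some-title.html
            dateFolders         `composeRoutes`  -- posts/2016/01/31/some-title.html
            dropPostsPrefix     `composeRoutes`  -- 2016/01/31/some-title.html
            appendIndex                          -- 2016/01/31/some-title/index.html
    compile $ pandocCompiler
        >>= loadAndApplyTemplate "templates/post.html"    postCtx
        >>= loadAndApplyTemplate "templates/default.html" postCtx
        >>= relativizeUrls
```

With dropIndexHtml in postCtx, links to this post then read /2016/01/31/some-title.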

Group Posts by Category

Finally, we want to group our posts by category. There are two ways to do this.

The simple way is to move the Markdown files into subfolders according to their category, and then change all the patterns in your site.hs file so that they only match posts inside a category folder. In other words, replace every occurrence of the pattern posts/* with posts/*/* in your site.hs, and move all the Markdown files into the corresponding subdirectory.

The other way, which is how Jekyll does it, is to insert a metadata field into the URL. For that, we first need to add a category field to the YAML header of every blog post, e.g.:

---
title: S.P.Q.R.
category: category_a
---

Then we need a route transformer that adds the category to the route. Fortunately, Hakyll provides a function for that purpose.

metadataRoute :: (Metadata -> Routes) -> Routes

We can combine it with the function customRoute that we used before and prepend the category to the path.

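import qualified Data.Map as M

-- Prepend the post's category (read from its metadata) to the path.
prependCategory :: Routes
prependCategory = metadataRoute $ \md -> customRoute $
    (md M.! "category" </>) . toFilePath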

Hakyll represents metadata as a map from Data.Map in the containers package. In this function, we use the (!) operator from the same module, imported qualified as M, to extract the category field. Note that this function fails with a run-time error if a post does not have a category field. If that is not the behaviour you want, use the lookup function from Data.Map instead and handle the Nothing case in whichever way you see fit.
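A minimal sketch of that safer variant, using plain Data.Map so it can be run on its own. PostMetadata is a hypothetical alias for the Map-based metadata described above, categoryOf and prependCategoryPath are hypothetical helpers, and falling back to "uncategorized" is just one possible choice. In site.hs, prependCategoryPath md would take the place of the (md M.! "category" </>) section inside metadataRoute and customRoute. Note that newer Hakyll releases no longer represent Metadata as a Data.Map; there, lookupString "category" md returns a Maybe String that can be handled the same way.

```haskell
import Data.Maybe (fromMaybe)
import qualified Data.Map as M
import System.FilePath ((</>))

-- Hypothetical stand-in for Hakyll's Map-based Metadata.
type PostMetadata = M.Map String String

-- Category of a post, or "uncategorized" when the field is missing,
-- instead of the run-time error that (M.!) would raise.
categoryOf :: PostMetadata -> String
categoryOf md = fromMaybe "uncategorized" (M.lookup "category" md)

-- The path transformation prependCategory performs, with the fallback.
prependCategoryPath :: PostMetadata -> FilePath -> FilePath
prependCategoryPath md = (categoryOf md </>)
```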

Conclusion

With that, we have taught Hakyll to generate URLs just like Jekyll does. Based on this experience, I will dare one comparison between Jekyll and Hakyll. The former seems more like a static website generator that you configure. The latter, however, really seems like a library that you use to write your own static website generator. Naturally, it takes a little more effort to get anything done with Hakyll. On the other hand, you have immediate access to every aspect of the website generation.

Please leave a comment if you notice any mistakes, or have any suggestions or other feedback. Thanks for reading!