Hakyll+ZMQ+KaTeX

July 6, 2020,
keywords: haskell, hakyll, katex, zmq, web, static sites

It’s been a couple years since my post A JavaScript-free Hakyll site. Today I got an email from someone asking for some help setting up their own JavaScript-free Hakyll site.

The approach in that post is really slow! Most of the slowdown is from the following piece of code.

readCreateProcess (shell $ kaTeXCmd mt) inner

We take a single String called inner, create a KaTeX\KaTeX cli process, feed inner to the process’ standard input, read the converted string from its standard output, and then close the process.

At the time I did not realize how slow creating a new processes for every piece of LaTeX\LaTeX in your code would be. My thought process was something like this: “It’s been a million years of Operating Systems research, starting the same process over and over again should not be that bad.” Afterall, I do this all the time with Unix tools. But I started to feel the slow down at around 3 files containing LaTeX\LaTeX.

ZMQ IPC to the Rescue!

I updated my site to use a single KaTeX\KaTeX process which runs as a server, and talks to Hakyll over ZMQ.

Here is the JavaScript code.

const katex = require("katex");
const zmq = require("zeromq");

async function run() {
    const sock = new zmq.Reply;

    await sock.bind("ipc:///tmp/katex");

    for await (const [msg] of sock) {
        let msgObj = JSON.parse(msg);
        let latex = msgObj.latex;
        let options = msgObj.options;
        options.throwOnError = false;
        let html = katex.renderToString(latex, options);
        console.log(`Recieved\n${msg}`);
        console.log(`Sending\n${html}`);
        await sock.send(html);
    }
}

run();

Here is the updated Haskell code.


{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE DeriveGeneric #-}

module KaTeX.KaTeXIPC
    ( kaTeXifyIO
    ) where

import Control.Monad
import System.ZMQ4.Monadic
import qualified Data.ByteString.Char8 as BS (putStr, putStrLn)
import Data.ByteString (ByteString)
import Data.ByteString.Lazy (toStrict)
import GHC.Generics
import Data.Text
import Data.Text.Encoding (decodeUtf8)
-- Pandoc
import Text.Pandoc.Definition (MathType(..), Inline(Math, RawInline), Pandoc, Format(..))
import Text.Pandoc.Readers.HTML (readHtml)
import Text.Pandoc.Options (def)
import Text.Pandoc.Walk (walkM)
import Text.Pandoc.Class (PandocPure, runPure)
-- Aeson
import Data.Aeson hiding (Options)

--------------------------------------------------------------------------------
-- DataTypes
--------------------------------------------------------------------------------
newtype Options = Options
  { displayMode :: Bool
  } deriving (Generic, Show)

data TeXMath = TeXMath
  { latex :: Text
  , options :: Options
  } deriving (Generic, Show)

--------------------------------------------------------------------------------
-- Instances
--------------------------------------------------------------------------------
instance ToJSON Options where
  -- No need to provide implementation (Generic)
instance FromJSON Options where
  -- No need to provide implementation (Generic)
instance ToJSON TeXMath where
  -- No need to provide implementation (Generic)
instance FromJSON TeXMath where
  -- No need to provide implementation (Generic)

--------------------------------------------------------------------------------
-- Convert Inline
--------------------------------------------------------------------------------
toTeXMath :: MathType -> Text -> TeXMath
toTeXMath mt inner =
  TeXMath
  { latex = inner
  , options = toOptions mt
  }
  where
    toOptions DisplayMath = Options { displayMode = True }
    toOptions _ = Options { displayMode = False }


toKaTeX :: TeXMath -> IO ByteString
toKaTeX tex = runZMQ $ do
  requester <- socket Req
  connect requester "ipc:///tmp/katex"
  send requester [] (toStrict $ encode tex)
  receive requester

parseKaTeX :: Text -> Maybe Inline
parseKaTeX txt =
  -- Ensure txt is parsable HTML
  case runPure $ readHtml def txt of
    Right _   -> Just (RawInline (Format "html") txt)
    otherwise -> Nothing

kaTeXify :: Inline -> IO Inline
kaTeXify orig@(Math mt str) =
  do
    bs <- toKaTeX (toTeXMath mt str)
    case (parseKaTeX $ decodeUtf8 bs) of
      Just inl -> return inl
      Nothing  -> return orig
kaTeXify x = return x

--------------------------------------------------------------------------------
-- Convert Pandoc
--------------------------------------------------------------------------------
kaTeXifyIO :: Pandoc -> IO Pandoc
kaTeXifyIO p = do
  walkM kaTeXify p