site/blog/textile-to-markdown-literat...

5.5 KiB

title date
textile-conversion Main 2017-02-08

textile-conversion Main

Author's Note: this was intended to be documentation for a service that never ended up being implemented. It was going to help Derpibooru convert its existing markup to Markdown. This never happened.

This program listens on port 5000 and serves an unchecked-path web handler that converts Derpibooru Textile via HTML into Markdown, using a two-step process.

The first step is to have SimpleTextile emit a HTML AST of the comment. The second is to have Pandoc turn that HTML into Markdown.

This is intended to be helpful during Derpi's migration from Textile.

Pragmas

The following pragma tells the compiler to automagically tease string literals into whatever type they need to be. For more information on this, see this page.

{-# LANGUAGE OverloadedStrings #-}
module Main where

Imports

In order to accomplish our task, we need to import some libraries.

import Data.String.Conv (toS)
import Network.Wai
import Network.HTTP.Simple
import Network.HTTP.Types
import Network.Wai.Handler.Warp (run)
import System.Environment (lookupEnv)
import Text.Pandoc
import Text.Pandoc.Error (PandocError, handleError)

Helper Functions

getEnvDefault queries an environment variable, returning a default value if it is unset.

getEnvDefault :: String -> String -> IO String
getEnvDefault name default' = do
    envvar <- lookupEnv name
    case envvar of
      Nothing -> pure default'
      Just x  -> pure x

htmlToMarkdown uses Pandoc to convert a HTML input string into the equivalent Markdown. The Either type is used here in place of raising an exception.

htmlToMarkdown :: String -> Either PandocError String
htmlToMarkdown inp = do
    let
        corpus = readHtml def inp

    case corpus of
        Left x ->  Left x
        Right x -> pure $ writeMarkdown def x

Web Application

Now we are getting into the meat of the situation. This is the main Application.

toMarkdown :: Application

First, let's use a guard to ensure that we are only accepting POST requests. If the request is not a POST request, return HTTP error code 405.

toMarkdown req respond
    | requestMethod req /= methodPost =
        respond $ responseLBS
            status405
            [("Content-Type", "text/plain")]
            "Not allowed"

Otherwise, this is a POST request, so we should:

  1. Unpack the data from the post body of the HTTP request
  2. Send the data to the Sinatra app for conversion from Textile to HTML
  3. Take the resulting HTML and feed it to htmlToMarkdown
  4. Respond with the resulting Markdown.

We use http-conduit to contact the Sinatra app.

    | otherwise = do
        body <- requestBody req
        targetHost <- getEnvDefault "TARGET_SERVER" "http://127.0.0.1:9292"
        remoteRequest' <- parseRequest ("POST " ++ targetHost ++ "/textile/html")

The ($) operator is a synonym for calling functions. It is defined in the Prelude as f $ x = f x and is mainly used for omitting parentheses. Here it is used to combine HTTP request settings into one big request.

Additionally we use a custom [Manager][manager] to avoid any issues with request timeouts, as those are not important for the scope of this tool.

        let settings = defaultManagerSettings { managerResponseTimeout = Nothing }
        manager <- newManager settings

        let remoteRequest = setRequestBodyLBS (toS body)
                          $ setRequestManager manager
                          $ remoteRequest'

Now it is time to send off the request and unpack the response.

        response <- httpLBS remoteRequest

If the sinatra app failed to deal with this properly for some reason, report its error as text/plain and return 400.

        if getResponseStatusCode response /= 200
        then respond $ responseLBS
            status400
            [("Content-Type", "text/plain")]
            $ toS $ getResponseBody response
        else do
            let rbody = toS $ getResponseBody response

Convert the result body into Markdown. If there is an error, respond with a 400 and the contents of that error.

            let mbody = htmlToMarkdown rbody

            case mbody of
                Left x ->
                    respond $ responseLBS
                        status400
                        [("Content-Type", "text/plain")]
                        $ toS $ show x
                Right x -> do
                    respond $ responseLBS
                        status200
                        [("Content-Type", "text/markdown")]
                        $ toS x

Now we bootstrap it all by running the toMarkdown Application on port 5000. No other code is needed.

main :: IO ()
main =
    run 5000 toMarkdown