178 lines
5.3 KiB
Markdown
178 lines
5.3 KiB
Markdown
|
---
|
||
|
title: textile-conversion Main
|
||
|
date: 2017-02-08
|
||
|
---
|
||
|
|
||
|
# textile-conversion Main
|
||
|
|
||
|
This program listens on port 5000 and serves an unchecked-path web handler that
|
||
|
converts Derpibooru Textile via HTML into Markdown, using a two-step process.
|
||
|
|
||
|
The first step is to have SimpleTextile emit a HTML AST of the comment.
|
||
|
The second is to have Pandoc turn that HTML into Markdown.
|
||
|
|
||
|
This is intended to be helpful during Derpi's migration from Textile.
|
||
|
|
||
|
## Pragmas
|
||
|
|
||
|
The following pragma tells the compiler to automagically tease string literals
|
||
|
into whatever type they need to be. For more information on this, see [this page][hs-ovs].
|
||
|
|
||
|
```haskell
|
||
|
{-# LANGUAGE OverloadedStrings #-}
|
||
|
module Main where
|
||
|
```
|
||
|
|
||
|
## Imports
|
||
|
|
||
|
In order to accomplish our task, we need to import some libraries.
|
||
|
|
||
|
```haskell
|
||
|
import Data.String.Conv (toS)
|
||
|
import Network.Wai
|
||
|
import Network.HTTP.Simple
|
||
|
import Network.HTTP.Types
|
||
|
import Network.Wai.Handler.Warp (run)
|
||
|
import System.Environment (lookupEnv)
|
||
|
import Text.Pandoc
|
||
|
import Text.Pandoc.Error (PandocError, handleError)
|
||
|
```
|
||
|
|
||
|
## Helper Functions
|
||
|
|
||
|
getEnvDefault queries an environment variable, returning a default value if it
|
||
|
is unset.
|
||
|
|
||
|
```haskell
|
||
|
getEnvDefault :: String -> String -> IO String
|
||
|
getEnvDefault name default' = do
|
||
|
envvar <- lookupEnv name
|
||
|
case envvar of
|
||
|
Nothing -> pure default'
|
||
|
Just x -> pure x
|
||
|
```
|
||
|
|
||
|
---
|
||
|
|
||
|
htmlToMarkdown uses Pandoc to convert a HTML input string into the equivalent
|
||
|
Markdown. The `Either` type is used here in place of raising an exception.
|
||
|
|
||
|
```haskell
|
||
|
htmlToMarkdown :: String -> Either PandocError String
|
||
|
htmlToMarkdown inp = do
|
||
|
let
|
||
|
corpus = readHtml def inp
|
||
|
|
||
|
case corpus of
|
||
|
Left x -> Left x
|
||
|
Right x -> pure $ writeMarkdown def x
|
||
|
```
|
||
|
|
||
|
## Web Application
|
||
|
|
||
|
Now we are getting into the meat of the situation. This is the main
|
||
|
[Application][wai-application].
|
||
|
|
||
|
```haskell
|
||
|
toMarkdown :: Application
|
||
|
```
|
||
|
|
||
|
First, let's use a [guard][guards] to ensure that we are only accepting `POST`
|
||
|
requests. If the request is not a `POST` request, return [HTTP error code 405][http-4xx].
|
||
|
|
||
|
```haskell
|
||
|
toMarkdown req respond
|
||
|
| requestMethod req /= methodPost =
|
||
|
respond $ responseLBS
|
||
|
status405
|
||
|
[("Content-Type", "text/plain")]
|
||
|
"Not allowed"
|
||
|
```
|
||
|
|
||
|
Otherwise, this is a `POST` request, so we should:
|
||
|
|
||
|
1. Unpack the data from the post body of the HTTP request
|
||
|
2. Send the data to the Sinatra app for conversion from Textile to HTML
|
||
|
3. Take the resulting HTML and feed it to `htmlToMarkdown`
|
||
|
4. Respond with the resulting Markdown.
|
||
|
|
||
|
We use [http-conduit][http-conduit] to contact the Sinatra app.
|
||
|
|
||
|
```haskell
|
||
|
| otherwise = do
|
||
|
body <- requestBody req
|
||
|
targetHost <- getEnvDefault "TARGET_SERVER" "http://127.0.0.1:9292"
|
||
|
remoteRequest' <- parseRequest ("POST " ++ targetHost ++ "/textile/html")
|
||
|
```
|
||
|
|
||
|
The `($)` operator is a synonym for calling functions. It is defined in the [Prelude][dolla]
|
||
|
as `f $ x = f x` and is mainly used for omitting parentheses. Here it is used
|
||
|
to combine HTTP request settings into one big request.
|
||
|
|
||
|
Additionally we use a custom [Manager][manager] to avoid any issues with
|
||
|
request timeouts, as those are not important for the scope of this tool.
|
||
|
|
||
|
```haskell
|
||
|
let settings = defaultManagerSettings { managerResponseTimeout = Nothing }
|
||
|
manager <- newManager settings
|
||
|
|
||
|
let remoteRequest = setRequestBodyLBS (toS body)
|
||
|
$ setRequestManager manager
|
||
|
$ remoteRequest'
|
||
|
```
|
||
|
|
||
|
Now it is time to send off the request and unpack the response.
|
||
|
|
||
|
```haskell
|
||
|
response <- httpLBS remoteRequest
|
||
|
```
|
||
|
|
||
|
If the sinatra app failed to deal with this properly for some reason, report
|
||
|
its error as `text/plain` and return `400`.
|
||
|
|
||
|
```haskell
|
||
|
if getResponseStatusCode response /= 200
|
||
|
then respond $ responseLBS
|
||
|
status400
|
||
|
[("Content-Type", "text/plain")]
|
||
|
$ toS $ getResponseBody response
|
||
|
else do
|
||
|
let rbody = toS $ getResponseBody response
|
||
|
```
|
||
|
|
||
|
Convert the result body into Markdown. If there is an error, respond with a `400`
|
||
|
and the contents of that error.
|
||
|
|
||
|
```haskell
|
||
|
let mbody = htmlToMarkdown rbody
|
||
|
|
||
|
case mbody of
|
||
|
Left x ->
|
||
|
respond $ responseLBS
|
||
|
status400
|
||
|
[("Content-Type", "text/plain")]
|
||
|
$ toS $ show x
|
||
|
Right x -> do
|
||
|
respond $ responseLBS
|
||
|
status200
|
||
|
[("Content-Type", "text/markdown")]
|
||
|
$ toS x
|
||
|
```
|
||
|
|
||
|
Now we bootstrap it all by running the `toMarkdown` Application on port `5000`.
|
||
|
No other code is needed.
|
||
|
|
||
|
```haskell
|
||
|
main :: IO ()
|
||
|
main =
|
||
|
run 5000 toMarkdown
|
||
|
```
|
||
|
|
||
|
[hs-ovs]: https://ocharles.org.uk/blog/posts/2014-12-17-overloaded-strings.html
|
||
|
[wai-application]: https://hackage.haskell.org/package/wai
|
||
|
[guards]: https://en.wikibooks.org/wiki/Haskell/Control_structures
|
||
|
[http-4xx]: https://en.wikipedia.org/wiki/List_of_HTTP_status_codes#4xx_Client_Error
|
||
|
[http-conduit]: https://www.stackage.org/haddock/lts-6.6/http-conduit-2.1.11/Network-HTTP-Simple.html
|
||
|
[dolla]: https://hackage.haskell.org/package/base-4.9.0.0/docs/Prelude.html#v:-36-
|
||
|
[manger]: https://www.stackage.org/haddock/lts-6.5/http-client-0.4.29/Network-HTTP-Client.html#g:3
|