Author Topic: [Haskell] A conversion tool between markdown and BBCode, modified  (Read 405 times)

0 Members and 4 Guests are viewing this topic.

Offline TheWormKill

  • EZ's Scripting Whore
  • Global Moderator
  • Knight
  • *
  • Posts: 257
  • Cookies: 66
  • The Grim Reaper of Worms
    • View Profile
This is a nifty tool that does what the title describes.
Only problem: it does not compile properly with newer versions of pandoc, which it uses internally.
Thus, I modified it to account for the changes; installation instructions can be found in the original
content. To those who didn't get it: It is NOT ONLY my work, I merely fixed a few lines and added some
functionality, since it might be useful if you prefer Markdown to BBCode (like I do), but have to use it
here and/or somewhere else.

EDIT: I also changed some of the core functionality to make it play nicely with Evilzone's BBS (SMF).

EDIT2: Added language-recognition. If your code block contains contains information about the language
the code is written in, it will be included in the BBCode output.
And I documented the code properly in case someone wants to extend it.

EDIT3: Added quote-author-recognition. Block quotes like this:
Code: [Select]
> @TheWormKill
> Stuff he ^ said...

get converted to
Quote from: TheWormKill
Stuff he ^ said
Code: (Haskell) [Select]
{-# LANGUAGE DeriveDataTypeable #-}
{-# OPTIONS_GHC -Wall -Werror #-}

import Text.Pandoc
import Text.Pandoc.Error
import Control.Exception
import Data.Typeable

data ImplementationException = MkImpExc String
    deriving (Typeable)

instance Show ImplementationException where
    show (MkImpExc s) = "ImplementationException: " ++ s

instance Exception ImplementationException

-- heading levels associated with fontsizes
headingToFontsize :: [(Int, [Char])]
headingToFontsize = [(1, "36pt"), (2, "24pt"), (3, "18pt"),
                     (4, "14pt"), (5, "12pt"), (6, "10pt")]

-- get a pandoc-specific representation of our markdown source (or an error) and handle it
writeBbcode :: WriterOptions -> Either PandocError Pandoc -> String
writeBbcode _ (Right (Pandoc _ blocks)) = formatBlocks blocks
writeBbcode _ _ = error "Pandoc error occured!"

-- wrap a chunk of text in a BBCode-tag
-- wrapInTag "b" "bold text" returns "[b]bold text[/b]"
wrapInTag :: String -> String -> String
wrapInTag tag content =
  "[" ++ tag ++ "]" ++ content ++ "[/" ++ tag ++ "]"

-- same as wrap in tag, but take an additional parameter string:
-- wrapInTagParam "size" "20pt" "large text" returns "[size=20pt]large text[/size]"
wrapInTagParam :: String -> String -> String -> String
wrapInTagParam tag param content =
    "[" ++ tag ++ "=" ++ param ++ "]" ++ content ++ "[/" ++ tag ++ "]"

-- same as wrapInTagParam, but with an additional parameter name:
-- wrapInTagNamedParam "quote" "author" "TheWormKill" "Stuff I said" returns
-- "[quote author=TheWormKill]Stuff I said[/quote]"
wrapInTagNamedParam :: String -> String -> String -> String -> String
wrapInTagNamedParam tag paramName paramVal content =
    "[" ++ tag ++ " " ++ paramName ++ "=" ++ paramVal ++ "]" ++ content ++ "[/" ++ tag ++ "]"

-- reformat blockwise. each block in BBCode is delimited by 2 newlines.
formatBlocks :: [Block] -> String
formatBlocks = doubleUnlines . (map formatBlock)

-- handle possible blocks and convert to BBCode
formatBlock :: Block -> String
formatBlock (Plain ss) = formatInlines ss
formatBlock (Para ss) = (formatInlines ss)
-- check whether language information is supplied with the codeblock, and include it
formatBlock (CodeBlock a string) = case a of
                                     (_, l:_, _) -> wrapInTagParam
                                        "code" l (escapeCode string)
                                     _           -> wrapInTag "code"
                                        (escapeCode string)
formatBlock (RawBlock _ string) =  wrapInTag "code" (escapeCode string)
-- handle blockquotes with author information.
formatBlock (BlockQuote blocks) = case blocks of
                                    (b:bs) -> case getQuoteAuthor b of
                                               Just (name, b') -> wrapInTagNamedParam
                                                 "quote" "author" name (formatBlocks (b':bs))
                                               Nothing   -> wrapInTag "quote" (formatBlocks blocks)
                                    []    -> error "Empty block quote!"
formatBlock (OrderedList attr bss) = formatList (Just attr) bss
formatBlock (BulletList bss) = formatList Nothing bss
formatBlock (DefinitionList defs) = formatDefinitionList defs
-- use a sane font size, as the forum's form does and default to 8pt
-- in case that the level goes beyond 6
formatBlock (Header l _ ss) = wrapInTagParam "size"
    (maybe "8pt" id $ lookup l headingToFontsize) (formatInlines ss)
formatBlock (HorizontalRule) = "
[hr]"
formatBlock (Table _ _ _ headerRow rows) = formatTable headerRow rows
formatBlock (Null) = ""
-- uncomment this line for pandoc < 1.12
formatBlock (Div _ bs) = formatBlocks bs

-- get the author and the quote *itself* from a block if first inline is a Cite
getQuoteAuthor :: Block -> Maybe (String, Block)
getQuoteAuthor (Plain ((Cite (c:_) _):is)) = Just $ (citationId c, Plain is)
getQuoteAuthor (Para ((Cite (c:_) _):is)) = Just $ (citationId c, Para is)
getQuoteAuthor _ = Nothing

-- handle all inline elements from a block
formatInlines :: [Inline] -> String
formatInlines = concat . (map formatInline)

-- handle possible inline elements
formatInline :: Inline -> String
formatInline (Str s) = s
formatInline (Emph ss) = wrapInTag "i" (formatInlines ss)
formatInline (Strong ss) = wrapInTag "b" (formatInlines ss)
formatInline (Strikeout ss) = wrapInTag "s" (formatInlines ss)
formatInline (Superscript ss) = wrapInTag "sup" (formatInlines ss)
formatInline (Subscript ss) = wrapInTag "sub" (formatInlines ss)
formatInline (SmallCaps _) = throw (MkImpExc "small caps are not (yet?) supported")
formatInline (Quoted SingleQuote _) = throw (MkImpExc "single quotes are not (yet?) supported")
--"'" ++ (formatInlines ss) ++ "'"
formatInline (Quoted DoubleQuote _) = throw (MkImpExc "double quotes are not (yet?) supported")
--"\"" ++ (formatInlines ss) ++ "\""
formatInline (Cite _ l) = case l of
                            (Str s):_ -> s
                            _         -> ""
formatInline (Code _ s) = wrapInTag "b" s
formatInline (Space) = " "
formatInline (LineBreak) = "\n"
formatInline (Math _ s) = wrapInTag "math" s
formatInline (RawInline _ s) = wrapInTag "b" s
formatInline (Link ss (url, _)) = "[url=" ++ url ++ "]" ++ (formatInlines ss) ++ "[/url]"
formatInline (Image alt (url, _)) = "[img=" ++ (formatInlines alt) ++ "]" ++ url ++ "[/img]"
formatInline (Note _) = throw (MkImpExc "notes are not (yet?) supported")
-- uncomment this line for pandoc < 1.12
formatInline (Span _ ss) = formatInlines ss

-- obv useless, maybe the original author intended to make it escape sth?
escapeCode :: String -> String
escapeCode = id

-- formatters for lists, tables
formatList :: (Maybe ListAttributes) -> [[Block]] -> String
formatList attr items =
  let opener =
          case attr of Just (_, LowerAlpha, _) -> "[list=a]\n"
                       Just (_, UpperAlpha, _) -> "[list=a]\n"
                       Just _ -> "[list=1]\n"
                       Nothing -> "[list]\n"
   in opener ++
    (unlines $ map formatListItem items) ++
    "[/list]"

formatListItem :: [Block] -> String
formatListItem bs = "[*]" ++ (formatBlocks bs)

formatDefinitionList :: [([Inline], [[Block]])] -> String
formatDefinitionList defs =
  "[list]\n" ++
  (doubleUnlines $ map formatDefinition defs) ++
  "[/list]"

formatDefinition :: ([Inline], [[Block]]) -> String
formatDefinition (header, body) =
  "[*]" ++ ((wrapInTag "b" $
      wrapInTag "u" (formatInlines header)) ++ " " ++ (unlines $ map formatBlocks body))

formatTable :: [TableCell] -> [[TableCell]] -> String
formatTable headerRow rows =
  wrapInTag "table" $
  (formatHeaderRow headerRow) ++
  "\n" ++
  (formatBodyRows rows)

formatHeaderRow :: [TableCell] -> String
formatHeaderRow hcs = wrapInTag "tr" (concat $ map formatHeaderCell hcs)

formatHeaderCell :: TableCell -> String
formatHeaderCell hc = wrapInTag "th" (formatBlocks hc)

formatBodyRows :: [[TableCell]] -> String
formatBodyRows = unlines . (map formatBodyRow)

formatBodyRow :: [TableCell] -> String
formatBodyRow hcs = wrapInTag "tr" (concat $ map formatBodyCell hcs)

formatBodyCell :: TableCell -> String
formatBodyCell bc = wrapInTag "td" (formatBlocks bc)

-- join blocks into one string with 2 newlines
doubleUnlines :: [String] -> String
doubleUnlines [] = ""
doubleUnlines (s:[]) = s
doubleUnlines (s:ss) = s ++ "\n\n" ++ (doubleUnlines ss)

-- get markdown from stdin, process it, and print the results to stdout
main :: IO ()
main =
  getContents >>= \ input ->
  let
    readerOpts = def { readerParseRaw = False }
    pandoc = readMarkdown readerOpts input
    output = writeBbcode def pandoc
  in
    putStrLn output

« Last Edit: August 10, 2015, 06:07:50 pm by TheWormKill »
Stuff I did: How to think like a superuser, Iridium

He should make that "Haskell"
Quote
<m0rph-is-gay> fuck you thewormkill you python coding mother fucker

Offline flowjob

  • Knight
  • **
  • Posts: 327
  • Cookies: 46
  • Pastafarian
    • View Profile
It's always nice to see people updating abandoned projects so they can be used again

Also, I think you forgot to post the link to the original work..
And if the author's using a revision system, you could fork the project and send a merge-request upstream. That way all other users of this script have access to the new working version too..
Quote
<phil> I'm gonna DDOS the washing machine with clothes packets.
<deviant_sheep> dont use too much soap or youll cause a bubble overflow

Offline TheWormKill

  • EZ's Scripting Whore
  • Global Moderator
  • Knight
  • *
  • Posts: 257
  • Cookies: 66
  • The Grim Reaper of Worms
    • View Profile
It's always nice to see people updating abandoned projects so they can be used again

Also, I think you forgot to post the link to the original work..
And if the author's using a revision system, you could fork the project and send a merge-request upstream. That way all other users of this script have access to the new working version too..
Glad you liked it. Regarding your comments: The very first word of my original post is the link. I just found the original version of the script in the linked source there, which is a forum post, so I don't think the author uses git etc. Here's the source again:
https://bbs.archlinux.org/viewtopic.php?id=168962
Stuff I did: How to think like a superuser, Iridium

He should make that "Haskell"
Quote
<m0rph-is-gay> fuck you thewormkill you python coding mother fucker

Offline Deque

  • P.I.N.N.
  • Global Moderator
  • Overlord
  • *
  • Posts: 1203
  • Cookies: 518
  • Programmer, Malware Analyst
    • View Profile
Re: [Haskell] A conversion tool between markdown and BBCode, modified
« Reply #3 on: August 20, 2015, 09:06:21 am »
Ah, it is so refreshing to see a functional language here.
It's been a while that I programmed in Haskell. I already feel how rusty I am from reading the code. More of that, please.
Also, this is something I find useful too. If I write tutorials for forums in a text editor, I usually have a markdown-like style. It would be much easier to do the formatting if I can just convert it.