title: Parsing Toki Pona
date: 2019-07-21

Parsing Toki Pona

Language is annoyingly complicated. English in particular is a nightmare. English is so hard to understand that humans regularly fail to figure out what other humans are saying in it. This happens even between native speakers, who usually have a bit of an easier time figuring this stuff out.

What if there were a language with less going on? What if it were simple enough that a computer could tokenize, parse and understand it? This post is an attempt to show that Toki Pona is a potential candidate for such a language.

Toki Pona is a constructed (or planned) language created by the professional translator Sonja Lang as an attempt to break things down to their core essence. Toki Pona is tiny (only about 120 words, depending on who you ask), so it takes only a few days to learn and a month or two to master. Because there are so few words, a single Toki Pona word often covers ideas that languages like English spread across several separate words. For example, "pona" can mean "good", "simple" or "to fix".
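With a vocabulary this small and closed, a first tokenizer can simply split on whitespace and look each word up in a word list. Below is a minimal Nim sketch of that idea. It assumes a toy `lexicon` of a dozen words in place of the real dictionary. The names `lexicon`, `Token`, `TokenKind` and `tokenize` are hypothetical and were chosen for illustration only.

```nim
import std/[sets, strutils]

# Hypothetical, deliberately partial word list: a handful of real
# Toki Pona words, not the full ~120-word dictionary.
let lexicon = toHashSet(["mi", "sina", "ona", "jan", "li", "e",
                         "moku", "kili", "telo", "pona", "toki", "suli"])

type
  TokenKind = enum
    tkWord,    # the word is in `lexicon`
    tkUnknown  # anything else, e.g. a proper name like "Sonja"

  Token = object
    kind: TokenKind
    text: string

proc tokenize(sentence: string): seq[Token] =
  ## Splits on whitespace, trims basic punctuation from each word and
  ## tags it by whether it appears in `lexicon`.
  for raw in sentence.splitWhitespace():
    let word = raw.strip(chars = {'.', ',', '!', '?', ':'})
    if word.len == 0:
      continue  # the "word" was punctuation only
    let kind = if word in lexicon: tkWord else: tkUnknown
    result.add Token(kind: kind, text: word)

when isMainModule:
  for tok in tokenize("jan Sonja li pona."):
    echo tok.kind, " ", tok.text
```

Unknown words become `tkUnknown` tokens and are not rejected. This leaves room for proper names, which Toki Pona writes capitalized, as in "jan Sonja".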

  • basic grammar
  • tokenization
    • implementation in Nim
  • talk about future parsing into phrases
    • structure of phrase
    • implementation in Nim
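Splitting a sentence into phrases can rely on Toki Pona's particles. The particle `li` separates the subject from the predicate, and it is dropped when the subject is just `mi` or `sina`. Each `e` introduces a direct object. Here is a rough Nim sketch of that split. It assumes a single simple clause with no `la`, `pi`, `en`, prepositions or repeated `li`. `Clause` and `parseClause` are hypothetical names, and this is only one possible phrase structure.

```nim
import std/strutils

type
  Clause = object
    subject: seq[string]       # words before "li" (or a lone "mi"/"sina")
    predicate: seq[string]     # words after "li", up to the first "e"
    objects: seq[seq[string]]  # one entry per "e"-marked direct object

proc parseClause(words: seq[string]): Clause =
  ## Rough particle-based split of ONE simple Toki Pona clause.
  ## Assumes no "la", "pi", "en", prepositions or repeated "li".
  let liPos = words.find("li")
  var i = 0
  if liPos >= 0:
    result.subject = words[0 ..< liPos]
    i = liPos + 1  # skip the "li" itself
  elif words.len > 0 and words[0] in ["mi", "sina"]:
    # A bare "mi" or "sina" subject takes no "li".
    result.subject = @[words[0]]
    i = 1
  # Everything up to the first "e" is the predicate.
  while i < words.len and words[i] != "e":
    result.predicate.add words[i]
    inc i
  # Each "e" starts a new direct object that runs until the next "e".
  while i < words.len:
    inc i  # skip the "e"
    var obj: seq[string]
    while i < words.len and words[i] != "e":
      obj.add words[i]
      inc i
    result.objects.add obj

when isMainModule:
  let clause = parseClause("jan li moku e kili".splitWhitespace())
  echo clause.subject, " | ", clause.predicate, " | ", clause.objects
```

Splitting on particles keeps this short because it avoids building a full grammar. The cost is that the parser never rejects anything: an ungrammatical sentence still gets split somewhere.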