| Author | SHA1 | Date |
|---|---|---|
| Cadey Ratio | c7eaaa8324 | |

---
title: Parsing Toki Pona
date: 2019-07-21
---

# Parsing Toki Pona

Language is annoyingly complicated. English in particular is a nightmare. English
is so hard to understand that humans regularly fail to figure out what other
humans are saying in it. This happens even between native speakers of English,
who usually have a bit of an easier time figuring this stuff out.

What if there were a language with less going on? What if it were simple enough
that we could have a _computer_ tokenize, parse, and understand it? This post is
an attempt to show that [Toki Pona](http://tokipona.org) is a potential candidate
for this.

Toki Pona is a constructed (planned) language created by the professional
translator Sonja Lang in an attempt to break things down to their core essence.
Toki Pona is tiny (only about 120 words, depending on who you ask), so it takes
only a few days to learn and a month or two to master. Because there are so few
words, many ideas that English spreads across several different words are
covered by a single Toki Pona word.

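Because the vocabulary is so small and fixed, one plausible way to tokenize Toki Pona is to split on anything that isn't a letter and check each word against a word list. Here is a minimal sketch of that idea in Python (the post itself plans to use Nim). The `WORDS` set is a hypothetical, deliberately partial sample of the vocabulary, and `tokenize` is an illustrative name, not part of any existing library.

```python
import re

# Hypothetical, deliberately partial word list; a real tokenizer would
# load the full ~120-word Toki Pona vocabulary instead.
WORDS = {
    "mi", "sina", "ona", "jan", "moku", "kili", "pona", "toki",
    "li", "e", "la", "pi", "tomo", "tawa", "suli",
}


def tokenize(sentence: str) -> list[str]:
    """Split a Toki Pona sentence into words from WORDS.

    Any run of ASCII letters counts as a word; everything else
    (spaces, punctuation) is treated as a separator. Words not in
    WORDS raise ValueError. Capitalized words, such as proper names,
    are rejected too, because WORDS only holds lowercase entries.
    """
    words = re.findall(r"[A-Za-z]+", sentence)
    tokens = []
    for word in words:
        if word not in WORDS:
            raise ValueError(f"unknown word: {word!r}")
        tokens.append(word)
    return tokens


print(tokenize("jan li moku e moku pona."))
# ['jan', 'li', 'moku', 'e', 'moku', 'pona']
```

Rejecting unknown words outright keeps the sketch small. A real tokenizer would also need to accept capitalized proper names, such as `Sonja` in `jan Sonja`.
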
- basic grammar
- tokenization
- implementation in Nim

- talk about future parsing into phrases
- structure of phrase
- implementation in Nim
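
For the phrase-structure items in the outline, basic Toki Pona sentences have a regular shape. The particle `li` separates the subject from the predicate, and it is left out when the subject is just `mi` or `sina`. Each `e` introduces a direct object. The following Python sketch splits an already-tokenized sentence along those particles. `split_sentence` and its dictionary output are hypothetical. The sketch also deliberately ignores `la` clauses, `pi` groupings, and sentences with more than one `li`.

```python
def split_sentence(tokens: list[str]) -> dict[str, list]:
    """Split a simple tokenized Toki Pona sentence into parts.

    Returns the subject words, the predicate words, and a list of
    direct objects (each a list of words). Assumes there is no 'la'
    clause, at most one 'li', and that 'e' only follows the predicate.
    """
    if "li" in tokens:
        # Everything before the first 'li' is the subject.
        i = tokens.index("li")
        subject, rest = tokens[:i], tokens[i + 1:]
    elif tokens and tokens[0] in ("mi", "sina"):
        # 'li' is dropped when the subject is bare 'mi' or 'sina'.
        subject, rest = tokens[:1], tokens[1:]
    else:
        raise ValueError("cannot find the subject/predicate boundary")

    # Words before the first 'e' form the predicate; each 'e' starts
    # a new direct object.
    chunks: list[list[str]] = [[]]
    for tok in rest:
        if tok == "e":
            chunks.append([])
        else:
            chunks[-1].append(tok)
    return {"subject": subject, "predicate": chunks[0], "objects": chunks[1:]}


print(split_sentence(["jan", "li", "moku", "e", "moku", "pona"]))
# {'subject': ['jan'], 'predicate': ['moku'], 'objects': [['moku', 'pona']]}
print(split_sentence(["mi", "moku", "e", "kili"]))
# {'subject': ['mi'], 'predicate': ['moku'], 'objects': [['kili']]}
```

Working on flat word lists keeps the sketch short. A fuller parser would likely build nested phrase nodes, for example to group the modifiers in `moku pona`.
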