If you are looking for a JavaScript library for Thai word segmentation to use in production, I strongly recommend wordcut. This repository exists to describe how Thai word segmentation works.
This work is based on the wordcut documentation, which you can find on Medium (in Thai).
This work uses a dictionary-based approach, so you need a Thai word list. You can find Thai word lists from
Convert the word list from step 1 into a trie to speed up lookups. Read more about tries: Wikipedia - Trie. Note: this step differs from wordcut, which uses binary search.
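A minimal sketch of this step, assuming the word list is already loaded as an array of strings. The names `createTrieNode`, `insertWord`, `buildTrie`, and `hasWord` are hypothetical and are not taken from CutThai's source.

```javascript
// Hypothetical helpers for illustration; not CutThai's actual API.
function createTrieNode() {
  // children maps one character to the next node; isWord marks a complete word.
  return { children: new Map(), isWord: false };
}

function insertWord(root, word) {
  let node = root;
  for (const ch of word) { // iterates by Unicode code point
    if (!node.children.has(ch)) {
      node.children.set(ch, createTrieNode());
    }
    node = node.children.get(ch);
  }
  node.isWord = true;
}

function buildTrie(words) {
  const root = createTrieNode();
  for (const word of words) {
    insertWord(root, word);
  }
  return root;
}

function hasWord(root, word) {
  let node = root;
  for (const ch of word) {
    node = node.children.get(ch);
    if (!node) return false; // no dictionary word continues with this character
  }
  return node.isWord;
}

const demoTrie = buildTrie(["ฉัน", "กิน", "ข้าว"]);
console.log(hasWord(demoTrie, "กิน")); // true
console.log(hasWord(demoTrie, "กิ")); // false: a prefix only, not a full word
```

A lookup costs one map access per character of the candidate word, regardless of how many words the dictionary holds, and it can stop as soon as no dictionary word shares the current prefix.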
The word graph is a graph used to decide where to segment the input. Each vertex is a candidate segmentation position and each edge is a word. Edges are created by matching the input against the trie.
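One way to build such a graph is sketched here, assuming positions are counted in Unicode code points and that every part of the input is covered by some dictionary word. `makeNode`, `makeTrie`, and `buildWordGraph` are hypothetical names, and text outside the dictionary is not handled.

```javascript
// Hypothetical helpers for illustration; not CutThai's actual API.
function makeNode() {
  return { next: new Map(), end: false };
}

function makeTrie(words) {
  const root = makeNode();
  for (const word of words) {
    let node = root;
    for (const ch of word) {
      if (!node.next.has(ch)) node.next.set(ch, makeNode());
      node = node.next.get(ch);
    }
    node.end = true; // a dictionary word ends at this node
  }
  return root;
}

function buildWordGraph(input, trie) {
  const chars = Array.from(input); // one entry per code point
  const edges = [];
  for (let start = 0; start < chars.length; start++) {
    let node = trie;
    // Walk the trie from this position; each word end found becomes an edge.
    for (let end = start; end < chars.length; end++) {
      node = node.next.get(chars[end]);
      if (!node) break;
      if (node.end) {
        edges.push({
          from: start,
          to: end + 1,
          word: chars.slice(start, end + 1).join(""),
        });
      }
    }
  }
  return edges;
}

const graphEdges = buildWordGraph("ฉันกินข้าว", makeTrie(["ฉัน", "กิน", "ข้าว"]));
console.log(graphEdges.map((e) => e.word)); // [ 'ฉัน', 'กิน', 'ข้าว' ]
```

Each edge records where a word starts (`from`), where it ends (`to`), and the word itself, so a path from vertex 0 to the last vertex spells out one possible segmentation.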
Find the shortest path from the start vertex to the end vertex using SPFA (Shortest Path Faster Algorithm). Read more about SPFA: Wikipedia - SPFA.
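The following is a rough sketch of SPFA over a word graph. It assumes every edge costs 1 unless it carries a `weight` field, so the shortest path is the one with the fewest words; the weighting CutThai really uses is not stated here. The function name `spfa` and the sample edges are hypothetical.

```javascript
// Hypothetical sketch; edge weights default to 1 as an assumption.
function spfa(vertexCount, edges, source) {
  const adjacency = Array.from({ length: vertexCount }, () => []);
  for (const edge of edges) adjacency[edge.from].push(edge);

  const dist = new Array(vertexCount).fill(Infinity);
  const prev = new Array(vertexCount).fill(null); // edge used to reach each vertex
  const inQueue = new Array(vertexCount).fill(false);

  dist[source] = 0;
  const queue = [source];
  inQueue[source] = true;

  while (queue.length > 0) {
    const u = queue.shift();
    inQueue[u] = false;
    for (const edge of adjacency[u]) {
      const weight = edge.weight === undefined ? 1 : edge.weight;
      if (dist[u] + weight < dist[edge.to]) {
        // Found a shorter route: record it and revisit the target vertex.
        dist[edge.to] = dist[u] + weight;
        prev[edge.to] = edge;
        if (!inQueue[edge.to]) {
          queue.push(edge.to);
          inQueue[edge.to] = true;
        }
      }
    }
  }
  return { dist, prev };
}

// Made-up ambiguous graph over 4 characters "ABCD":
// either "A" + "BCD" (2 words) or "AB" + "C" + "D" (3 words).
const sampleEdges = [
  { from: 0, to: 1, word: "A" },
  { from: 0, to: 2, word: "AB" },
  { from: 1, to: 4, word: "BCD" },
  { from: 2, to: 3, word: "C" },
  { from: 3, to: 4, word: "D" },
];
const shortest = spfa(5, sampleEdges, 0);
console.log(shortest.dist[4]); // 2
console.log(shortest.prev[4].word); // BCD
```

Because every edge in a word graph points forward, a single left-to-right pass would also work; SPFA is shown because it is the algorithm named above.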
Use the shortest path from step 4 to segment the sentence and convert the result into an array.
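As an illustration of this last step, the sketch below assumes the shortest-path search left behind a predecessor table that stores, for each vertex, the edge used to reach it. `pathToWords` and the hand-written table are hypothetical.

```javascript
// Hypothetical helper: walk the predecessor table backwards from the end vertex.
function pathToWords(prev, endVertex) {
  const words = [];
  let vertex = endVertex;
  while (prev[vertex]) {
    words.push(prev[vertex].word);
    vertex = prev[vertex].from;
  }
  return words.reverse(); // words were collected from last to first
}

// Hand-written predecessor table for "ฉันกินข้าว" (10 code points, vertices 0..10).
const demoPrev = new Array(11).fill(null);
demoPrev[3] = { from: 0, to: 3, word: "ฉัน" };
demoPrev[6] = { from: 3, to: 6, word: "กิน" };
demoPrev[10] = { from: 6, to: 10, word: "ข้าว" };

console.log(pathToWords(demoPrev, 10)); // [ 'ฉัน', 'กิน', 'ข้าว' ]
```

If the end vertex was never reached, its entry in the table is `null` and the function returns an empty array.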
CutThai isn't recommended for production use, but you can download the latest release from Releases.
Using Node.js or CommonJS:
var CutThai = require("cutthai")
Using a plain browser:
<script src="path/to/cutthai.min.js"></script>
Run a segmentation:
var cutthai = new CutThai(function (err) {
  if (err) {
    throw err;
  }
  console.log(cutthai.cut("ฉันกินข้าว"));
});
wordcut - for the Thai word segmentation algorithm
LibThai - for the Thai word dictionary
Note: This document isn't complete yet. It still needs better grammar, more pictures to describe the algorithm, and more build instructions.