@cantonese/segmenter

This library implements basic grapheme and word segmentation for Cantonese. It builds a trie from an unmodified words.hk word list and matches the supplied string against that trie using a depth-first traversal.
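As a rough illustration of trie-based word matching, here is a minimal sketch. It is not the library's actual implementation: the word list `WORDS` is a tiny hypothetical stand-in for the words.hk list, the helper names `buildTrie` and `segmentWords` are made up for this example, and the greedy longest-match strategy is an assumption chosen for simplicity.

```javascript
// Hypothetical mini word list; the library uses the full words.hk list.
const WORDS = ['我', '好', '鍾意', '食', '食飯', '飯'];

// Build a trie out of nested Maps; `end` marks a complete word.
function buildTrie(words) {
  const root = { children: new Map(), end: false };
  for (const word of words) {
    let node = root;
    for (const ch of word) {
      if (!node.children.has(ch)) {
        node.children.set(ch, { children: new Map(), end: false });
      }
      node = node.children.get(ch);
    }
    node.end = true;
  }
  return root;
}

// Greedy longest match (an assumption of this sketch): from each position,
// walk down the trie as far as the input allows and keep the longest
// complete word seen along the way.
function segmentWords(text, trie) {
  const chars = [...text]; // iterate by code point, not UTF-16 code unit
  const result = [];
  let i = 0;
  while (i < chars.length) {
    let node = trie;
    let longest = 0;
    for (let j = i; j < chars.length; j++) {
      node = node.children.get(chars[j]);
      if (!node) break;
      if (node.end) longest = j - i + 1;
    }
    // Characters not in the trie fall back to single-character segments.
    const len = longest || 1;
    result.push(chars.slice(i, i + len).join(''));
    i += len;
  }
  return result;
}

console.log(segmentWords('我好鍾意食飯', buildTrie(WORDS)));
// logs [ '我', '好', '鍾意', '食飯' ]
```

Greedy matching is simple but can choose the wrong split when a long dictionary word overlaps a better segmentation, which is one reason a word-list approach is only a baseline.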

Future versions are planned to use other models informed by natural language processing and computational linguistics.

The API follows the shape of the proposed Intl.Segmenter API.
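For reference, the sketch below shows that API shape using the built-in Intl.Segmenter available in current JavaScript runtimes, not this library. Whether this package exposes every member shown here is an assumption the README does not confirm, and the word boundaries produced by the built-in segmenter depend on the runtime's locale data.

```javascript
// Built-in Intl.Segmenter, used here only to illustrate the API shape.
const segmenter = new Intl.Segmenter('zh-HK', { granularity: 'word' });
const input = '我好鍾意食飯';
const segments = segmenter.segment(input);

// The Segments object is iterable. Each item carries `segment`, `index`,
// and `input` (plus `isWordLike` when granularity is 'word').
const pieces = [...segments].map(({ segment, index }) => ({ segment, index }));
console.log(pieces);

// containing(i) returns the segment data covering code unit index i.
const hit = segments.containing(2);
console.log(hit.segment, hit.index);
```

Because the segments always cover the input without gaps, joining every `segment` value reproduces the original string.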

Installation

npm install --save https://github.com/cantonese/segmenter

Usage

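import { Segmenter } from '@cantonese/segmenter';

// Reverse the characters of one segment. Strings have no reverse() method,
// so spread into an array of code points first.
function transform(segmentInfo) {
  return [...segmentInfo.segment].reverse().join('');
}

const mySegmenter = new Segmenter('zh-hk', { granularity: 'word' });
const mySegments = mySegmenter.segment('我好鍾意食飯');
const transformed = [...mySegments].map(transform);

// Look up the segment data covering index 2.
// (Intl.Segmenter names the equivalent method `containing`.)
const atIndex = mySegments.contains(2);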
