Emacs macOS Tokenizer, tokenizing CJK words with macOS's built-in NLP tokenizer.

emt.el

Introduction

EMT stands for Emacs MacOS Tokenizer.

This package uses macOS’s built-in NLP tokenizer to tokenize and operate on CJK words in Emacs.

Installation

Requirements

  • macOS 10.15 or later
  • Emacs 26.1 or later, built with dynamic module support (use --with-modules during compilation)
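Most of these requirements can be checked from a running Emacs. The following is a minimal sketch, assuming that `module-file-suffix` is nil when Emacs was built without module support (which is how Emacs documents that variable). The helper name `my/emt-requirements-met-p` is hypothetical and not part of emt, and the check does not verify the macOS version.

```elisp
;; Hypothetical helper, not part of emt.
(defun my/emt-requirements-met-p ()
  "Return non-nil if this Emacs looks able to load emt's dynamic module."
  (and (eq system-type 'darwin)                 ; running on macOS
       (version<= "26.1" emacs-version)         ; Emacs 26.1 or later
       (fboundp 'module-load)                   ; module loader is present
       (bound-and-true-p module-file-suffix)))  ; nil without module support
```

Evaluating `(my/emt-requirements-met-p)` with `M-:` gives a quick yes/no answer before installing the package.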

Install package

Install with straight and use-package:

(use-package emt
  :straight (:host github :repo "roife/emt")
  :hook (after-init . emt-mode))

Build dynamic module

Pre-built (recommended)

Download the pre-built module from the releases section and place the dylib file at emt-lib-path (by default modules/libEMT.dylib within your personal configuration folder, normally ~/.emacs.d/modules/libEMT.dylib).

Manually build

  • Install Xcode.
  • Build the module using emt-compile-module, which compiles and copies the module to emt-lib-path.

If you encounter the following error,

No such module “PackageDescription”

run the following command and try again:

sudo xcode-select --switch /Applications/Xcode.app/Contents/Developer

Customization

emt-use-cache

If non-nil, cache the results of tokenization. Default is t.

emt-cache-lru-size

The size of the LRU cache. Default is 50.

emt-lib-path

The path to the dynamic library for emt. Default is ~/.emacs.d/modules/libEMT.dylib.
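As an illustration, the three options can be set together in an init file. This is a sketch rather than a recommended configuration: the cache size of 100 is an arbitrary example value, and setting the variables before emt-mode is enabled is an assumption about when they are read.

```elisp
;; Assumption: these are set before emt-mode is enabled.
(setq emt-use-cache t          ; cache tokenization results (the default)
      emt-cache-lru-size 100   ; illustrative value; the default is 50
      ;; Same location as the documented default, spelled out explicitly.
      emt-lib-path (expand-file-name "modules/libEMT.dylib"
                                     user-emacs-directory))
```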

Usage

keymap: emt-mode-map

It remaps forward-word, backward-word, kill-word and backward-kill-word to their emt counterparts.

Minor mode

Enabling emt-mode calls emt-ensure to load the dynamic module and activates emt-mode-map.

Functions

emt-word-at-point-or-forward

Return the word at point. If point is on a word boundary, return the word after it.

emt-word-at-point-or-backward

Return the word at point. If point is on a word boundary, return the word before it.
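To see what these functions return in a given buffer, a small interactive command can help. The sketch below assumes emt-mode is already enabled so that the dynamic module is loaded. The command name `my/emt-show-word` is hypothetical, and `%S` is used because the exact shape of the return value is not specified here.

```elisp
;; Hypothetical command, not part of emt.
(defun my/emt-show-word ()
  "Echo what emt returns for the word at point, preferring the word ahead."
  (interactive)
  ;; %S prints any Lisp value, so this works whatever the return type is.
  (message "emt word: %S" (emt-word-at-point-or-forward)))
```

Placing point between two CJK words and running `M-x my/emt-show-word` shows which word is picked at a boundary. Swapping in emt-word-at-point-or-backward shows the other direction.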

emt-compile-module

Compile and copy the module to emt-lib-path.

It takes an optional argument path, the destination path for the dynamic library. By default, path is set to emt-lib-path.

emt-ensure

Load the dynamic module.

emt-forward-word

CJK compatible version of forward-word.

emt-backward-word

CJK compatible version of backward-word.

emt-kill-word

CJK compatible version of kill-word.

emt-backward-kill-word

CJK compatible version of backward-kill-word.

emt-mark-word

CJK compatible version of mark-word.
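The remappings listed for emt-mode-map cover four commands and do not mention mark-word. If a remap for it is wanted as well, one possible sketch follows, assuming the package provides the `emt` feature so that emt-mode-map is defined once it has loaded:

```elisp
(with-eval-after-load 'emt
  ;; Route mark-word (M-@ by default) to emt's version while emt-mode is active.
  (define-key emt-mode-map [remap mark-word] #'emt-mark-word))
```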

Acknowledgements

This package is inspired by jieba.el, a Chinese tokenizer for Emacs that uses jieba.

The dynamic module uses emacs-swift-module, which provides an interface for writing Emacs dynamic modules in Swift.
