Commit

docs: recommend ICU for Traditional Chinese
Master-Hash committed Jan 21, 2025
1 parent ef0b17a commit 98af60f
Showing 4 changed files with 117 additions and 43 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -21,3 +21,4 @@ target/
#.idea/

*~
+src/main.rs
152 changes: 112 additions & 40 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

4 changes: 2 additions & 2 deletions Cargo.toml
@@ -10,11 +10,11 @@ crate-type = ["cdylib"]
[dependencies]
icu_segmenter = { version = "1.5.0", optional = true }
libc = "0.2.169"
-windows = { version = "0.58.0", features = [
+windows = { version = "0.59.0", features = [
"Data_Text",
"Foundation_Collections",
], optional = true }
-itertools = { version = "0.13.0", optional = true }
+itertools = { version = "0.14.0", optional = true }

[build-dependencies]
bindgen = "0.71.1"
3 changes: 2 additions & 1 deletion README.md
@@ -41,11 +41,12 @@ I have to use unsafe extern "C" all the way to write Rust binding. The safety no

## WinRT API vs ICU

-Personally I prefer the result of WinRT API. ICU does much poorer when segmenting idioms, showing a lack of vocabulary:
+Personally I recommend WinRT API for Simplified Chinese and ICU for Traditional Chinese.

| WinRT API | ICU |
|-------|-------|
| '有\|异曲同工\|\|妙' | '有异\|\|同工\|\|妙' |
+| '有\|\|\|同工\|\|妙' | '有\|異曲同工\|\|妙' |
| '丧心病狂\|\|异想天开' | '丧心病狂\|\|\|\|\|开' |
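
The table cells above show each segmenter's output as the input text with a `|` placed at every word boundary. As a rough illustration of how such a string could be produced, here is a minimal sketch. It assumes the segmenter reports boundaries as ascending UTF-8 byte offsets that start at 0 and end at the text length (the word segmenter in `icu_segmenter` 1.x reports breaks in roughly this form). The helper name `join_segments` and the offsets in the usage note are hypothetical, not taken from this repository.

```rust
/// Joins the pieces of `text` delimited by `breaks` with '|'.
///
/// Assumption: `breaks` holds ascending UTF-8 byte offsets, beginning
/// with 0 and ending with `text.len()`, each on a char boundary.
/// Slicing panics if an offset falls inside a multi-byte character.
fn join_segments(text: &str, breaks: &[usize]) -> String {
    breaks
        .windows(2) // each adjacent pair of offsets bounds one segment
        .map(|pair| &text[pair[0]..pair[1]])
        .collect::<Vec<_>>()
        .join("|")
}
```

For example, with hand-written offsets, `join_segments("异想天开", &[0, 3, 6, 9, 12])` yields `异|想|天|开`, since each of these characters occupies three bytes in UTF-8.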

## Note on UTF-8 Grapheme Cluster