
The charset is wrongly detected #4

Open
wanyancan opened this issue Nov 23, 2018 · 3 comments

Comments

@wanyancan

Hi,

Is there any way to manually set the charset for opened files?
If not, how could I change the auto-detection source code to allow manual selection?

Thank you!

@HouQiming
Owner

HouQiming commented Nov 23, 2018 via email

@wanyancan
Author

The file contains some engineering symbols in CP936.
Designator | Footprint | Mid X | Mid Y | Ref X | Ref Y | Pad X | Pad Y | TB | Rotation | Comment
R1 | 0402_R | 817.716mil | -5537.402mil | 817.716mil | -5537.401mil | 829.548mil | -5525.57mil | T | 225.00 | 10KΩ (1002) ±1%

± (A1 C0) and Ω (A6 B8) are each decoded as two separate characters: 。(A1) タ(C0) and ヲ(A6) ク(B8), respectively.

In CP932, every byte from A1 to DF is a single-byte character, whereas in CP936 two such bytes can combine into one double-byte character.
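To make the ambiguity concrete, here is a minimal Python sketch that decodes the two byte pairs above under both code pages. It relies only on Python's built-in `cp936` and `cp932` codecs; the helper name `decode_both` is made up for this illustration and is not part of the editor's source.

```python
# Byte pairs quoted in this issue: "±" and "Ω" as encoded in CP936.
PLUS_MINUS = bytes([0xA1, 0xC0])
OMEGA = bytes([0xA6, 0xB8])


def decode_both(raw: bytes) -> dict:
    """Decode the same bytes under CP936 and CP932 for comparison."""
    return {
        "cp936": raw.decode("cp936"),
        "cp932": raw.decode("cp932"),
    }


def code_points(text: str) -> list:
    """Render each character as U+XXXX so the output is console-safe."""
    return [f"U+{ord(ch):04X}" for ch in text]


for raw in (PLUS_MINUS, OMEGA):
    result = decode_both(raw)
    # CP936 yields one character per pair, CP932 yields two.
    print(raw.hex(" "), {name: code_points(text) for name, text in result.items()})
```

Both decodings succeed without error, which is why a detector cannot reject CP932 on validity alone for input like this.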

I'm not sure how the model could be updated. Maybe use a two-token score with a penalty on consecutive bytes in the range A1 to DF?
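A rough sketch of what I mean by the penalty, in Python. I don't know how the existing detector computes its scores, so `base_score`, `penalty`, and both function names are assumptions made up for illustration. The only part taken from this issue is the A1 to DF range.

```python
# CP932 single-byte range mentioned above.
RANGE_LO, RANGE_HI = 0xA1, 0xDF


def count_adjacent_high_pairs(data: bytes) -> int:
    """Count positions where two neighbouring bytes both fall in A1..DF."""
    def in_range(b: int) -> bool:
        return RANGE_LO <= b <= RANGE_HI

    return sum(1 for a, b in zip(data, data[1:]) if in_range(a) and in_range(b))


def penalized_score(base_score: float, data: bytes, penalty: float = 1.0) -> float:
    """Hypothetical: lower a CP932 score by a fixed penalty per adjacent pair."""
    return base_score - penalty * count_adjacent_high_pairs(data)


# The Comment field from the sample row, encoded as CP936.
sample = "10KΩ (1002) ±1%".encode("cp936")
print(count_adjacent_high_pairs(sample))
```

A flat penalty like this would also punish genuine CP932 text that uses runs of half-width katakana, so it would probably need to be weighed against the other evidence the model already uses.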

I believe a manual selection in the menu would be the most convenient. Can I call ConvertToUTF8(encoding, s) directly?
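The behaviour I'm after looks roughly like the Python sketch below: a manually chosen encoding, when present, takes priority over the detected one. This is not the editor's `ConvertToUTF8`, whose real signature and behaviour I haven't checked. The function `to_utf8` and its parameters are hypothetical.

```python
from typing import Optional


def to_utf8(data: bytes,
            manual_encoding: Optional[str] = None,
            detected_encoding: str = "cp932") -> bytes:
    """Decode with the user's choice if given, else the detector's guess,
    then re-encode the text as UTF-8."""
    encoding = manual_encoding or detected_encoding
    return data.decode(encoding).encode("utf-8")


raw = "10KΩ".encode("cp936")

# With the override, the Ω survives the round trip.
print(to_utf8(raw, manual_encoding="cp936") == "10KΩ".encode("utf-8"))

# Without it, the assumed detector guess (CP932) splits Ω into two characters.
print(len(to_utf8(raw).decode("utf-8")))
```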

@HouQiming
Owner

HouQiming commented Nov 23, 2018 via email
