Cutlet has been out for a few years now, and while I consider it basically functionally complete, the API is a little awkward because it has evolved over time. Since it's stable, I'd also like to release a v1.0 to signal that the API can be relied on going forward. This issue is for my proposal and also to solicit feedback.
This is not a full API proposal - most of the evolution will be iterative and minor, like cleaning up which functions are public vs private. The main thing I want to do is make the treatment of the different output options clearer. To that end I propose that the Cutlet object has the following main public methods of interest:
- `__call__` / `to_doc`: returns a `CutletDoc` (see below)
- `to_romaji`: returns a human-readable string, like `romaji` now
- `to_slug`: returns a machine-friendly string, like `slug` now
- `to_nodes`: returns a list of nodes, like `romaji_tokens` now
A `CutletDoc` is inspired by spaCy's Doc object and contains:
- raw input text
- normalized input text
- romaji/slug/nodes (lazily available, where appropriate)
- a reference to the generating Cutlet object (so you can check config options)
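To make the shape of the proposal concrete, here is a minimal sketch of how the method surface and a lazily evaluated doc could fit together. `SketchCutlet`, `SketchDoc`, `_analyze`, and `analysis_calls` are hypothetical names invented for illustration, and the "analysis" is a whitespace split standing in for the MeCab call, so the input is already Latin text. None of this is the actual implementation.

```python
from functools import cached_property


class SketchCutlet:
    """Hypothetical stand-in for the Cutlet object; not the real implementation."""

    def __init__(self):
        self.analysis_calls = 0  # counts the expensive step, for demonstration only

    def _analyze(self, text):
        # Placeholder for the MeCab call: just split on whitespace.
        self.analysis_calls += 1
        return text.split()

    def to_doc(self, text):
        return SketchDoc(raw=text, normalized=text.strip(), parent=self)

    # Calling the object directly is the same as calling to_doc.
    __call__ = to_doc

    def to_romaji(self, text):
        return self.to_doc(text).romaji

    def to_slug(self, text):
        return self.to_doc(text).slug

    def to_nodes(self, text):
        return self.to_doc(text).nodes


class SketchDoc:
    """Hypothetical stand-in for the proposed CutletDoc."""

    def __init__(self, raw, normalized, parent):
        self.raw = raw
        self.normalized = normalized
        self.parent = parent  # the generating object, so config can be inspected

    @cached_property
    def nodes(self):
        # Computed on first access, then cached on the instance.
        return self.parent._analyze(self.normalized)

    @cached_property
    def romaji(self):
        return " ".join(self.nodes)

    @cached_property
    def slug(self):
        return "-".join(self.nodes).lower()


katsu = SketchCutlet()
doc = katsu("  Neko ga suki  ")
romaji, slug = doc.romaji, doc.slug  # both reuse the same cached nodes
```

Because `nodes` is cached on the doc, asking for both `romaji` and `slug` runs the placeholder analysis only once.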
The CutletDoc object has a few advantages. One is that if you need two of the above output formats, it lets you avoid duplicate computation (MeCab calls) without having to manage state yourself. The other is that it can codify the linking of MeCab tokens to romaji tokens. The linking is very simple, but it's a commonly requested feature (#34, #37, #40, etc.), and (partly due to a lack of examples on my part) users often find it confusing, so it would be good to provide a canonical process.
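As a rough illustration of what linking could look like, the sketch below pairs each romaji string with the token it came from. It assumes a one-to-one, in-order correspondence between tokens and romaji strings, which is a guess rather than something stated above. `FakeNode`, `TOY_TABLE`, and `link_tokens` are hypothetical, and the lookup table stands in for real romanization.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FakeNode:
    """Stand-in for a MeCab token; only the surface form is modeled."""
    surface: str


# Toy romanization table, for demonstration only.
TOY_TABLE = {"猫": "neko", "が": "ga", "好き": "suki"}


def link_tokens(nodes: List[FakeNode]) -> List[Tuple[str, FakeNode]]:
    """Pair each romaji string with its source node, preserving order."""
    # Unknown surfaces fall back to themselves so nothing is dropped.
    return [(TOY_TABLE.get(node.surface, node.surface), node) for node in nodes]


nodes = [FakeNode("猫"), FakeNode("が"), FakeNode("好き")]
linked = link_tokens(nodes)
```

Each entry in `linked` keeps a reference to the original node, so any token-level information stays reachable from the romaji side.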
Separately, I will try making RomajiTokens proxy classes for MeCab tokens. I think this will work without issue, but it's possible that MeCab Nodes being Cython objects will be a problem.
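One way to build such a proxy is composition with `__getattr__`, which wraps the node instead of subclassing it and so may sidestep questions about extending Cython types. This is only a sketch: `StubNode` and `RomajiTokenProxy` are hypothetical, and the `surface` and `pos` attributes on the stub are assumptions about what a node exposes.

```python
class StubNode:
    """Stand-in for a MeCab node; a real one would come from the tokenizer."""

    def __init__(self, surface, pos):
        self.surface = surface
        self.pos = pos


class RomajiTokenProxy:
    """Holds a romaji string and forwards other attribute reads to the node."""

    def __init__(self, node, romaji):
        self._node = node
        self.romaji = romaji

    def __getattr__(self, name):
        # __getattr__ runs only when normal lookup fails, so `romaji` and
        # `_node` are served directly from the instance.
        if name == "_node":
            # Avoid infinite recursion if _node has not been set yet
            # (for example during copying or unpickling).
            raise AttributeError(name)
        return getattr(self._node, name)


token = RomajiTokenProxy(StubNode("猫", "名詞"), "neko")
```

Here `token.romaji` comes from the proxy itself, while `token.surface` and `token.pos` are read from the wrapped node.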
While the API will change, the actual internal code will not change very much as part of this process. At the fastest this will take a few months, and a new version with DeprecationWarnings will be released. If you have a stable application and are happy with the current API, please be sure to use version guards.
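A version guard is usually an upper bound in your dependency specification (something like `cutlet<1.0`, assuming the new API lands in 1.0). If you also want a runtime check, a minimal sketch could look like the following. `major_version` and `cutlet_is_pre_v1` are hypothetical helpers, and the parsing assumes a plain dotted version string.

```python
from importlib.metadata import PackageNotFoundError, version


def major_version(version_string: str) -> int:
    """Return the leading integer of a version string like '0.4.0'."""
    return int(version_string.split(".")[0])


def cutlet_is_pre_v1() -> bool:
    """True if the installed cutlet is older than 1.0.

    Returns False when cutlet is not installed at all.
    """
    try:
        return major_version(version("cutlet")) < 1
    except PackageNotFoundError:
        return False
```

An application that depends on the current API could call `cutlet_is_pre_v1()` at startup and fail early with a clear message if it returns False.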