Improve agglutinated forms control #326

Open
matgille opened this issue Jul 30, 2024 · 0 comments

matgille commented Jul 30, 2024

Is your feature request related to a problem? Please describe.
When a form is a contraction or an agglutination, the control lists cannot be applied properly: the lemma (and the POS, although the number of possible POS combinations is much lower) is always flagged as unauthorized.

Example:

The form aunquel is the contraction of the lemmas aunque and el. In our project, it is tagged as aunque+el. Even if both lemmas are in the control list, an error is raised, because aunque+el itself is not in the list.

Describe the solution you'd like
Since the delimiter for contractions is always the same, the engine should be able to split the analysis on that delimiter. This would require the user to specify the delimiter somewhere (in the control list panel, I would suggest).

In the example above, aunque+el would be analyzed as two lemmas, aunque and el, each checked against the control list. An error would be raised only if one of the lemmas is not in the list, and a warning would tell the user that the analysis is wrong (possibly indicating which lemma or POS is missing from the control list).
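A minimal Python sketch of the proposed check follows. It assumes a control list can be modeled as a set of allowed lemmas plus a set of allowed POS tags, and that the delimiter is a single user-configured string (`+` here). The names `ControlList`, `unauthorized_parts` and `check_token`, and the POS tags used below, are hypothetical and not part of the project's code.

```python
from dataclasses import dataclass


@dataclass
class ControlList:
    """Hypothetical in-memory control list: allowed lemmas and POS tags,
    plus the delimiter used for agglutinated forms (assumed configurable)."""
    lemmas: set[str]
    pos: set[str]
    delimiter: str = "+"


def unauthorized_parts(value: str, allowed: set[str], delimiter: str) -> list[str]:
    """Return every part of `value` that is not allowed (empty list = valid)."""
    if value in allowed:
        return []  # exact match still accepted, e.g. if the list already contains "aunque+el"
    # An empty delimiter means "do not split"; str.split("") would raise ValueError.
    parts = value.split(delimiter) if delimiter else [value]
    return [part for part in parts if part not in allowed]


def check_token(lemma: str, pos: str, control: ControlList) -> list[str]:
    """Collect one warning per unauthorized lemma or POS part of a token."""
    warnings = []
    for part in unauthorized_parts(lemma, control.lemmas, control.delimiter):
        warnings.append(f"lemma '{part}' is not in the control list")
    for part in unauthorized_parts(pos, control.pos, control.delimiter):
        warnings.append(f"POS '{part}' is not in the control list")
    return warnings
```

With `control = ControlList(lemmas={"aunque", "el"}, pos={"CONJ", "DET"})`, `check_token("aunque+el", "CONJ+DET", control)` returns an empty list, whereas a whole-string lookup (`"aunque+el" in control.lemmas`) fails. Checking the exact value first keeps existing entries that already contain the delimiter valid.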

matgille changed the title from "Improve contracted forms control" to "Improve agglutinated forms control" on Jul 30, 2024.