Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding new parser: MultiChoicesParser #133

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

HichemAK
Copy link

Hello,

I added a new parser called multi-choices parser. I couldn't find any efficient implementation of an incremental parser for this type of grammar in Python (even in packages specialized in grammar parsing) so I coded one myself here, which is presented in what follows.

Feature added: MultiChoicesParser (new parser)

Multi-choices Parser is an pure-Python efficient incremental parser for multi-choices grammars. These grammars are composed of lists of choices, where each choice is a literal string and can possibly be empty (grammar form below). This parser is optimized for scenarios where the size of the lists of choices is very large, such as representing entities preceded by a determiner (which is specifically my usecase).

Here is the type of grammar handled by this parser:

start: list1 list2 ... listn
list1: choice1_1 | choice1_2 | ... | choice1_k1
list2: choice2_1 | choice2_2 | ... | choice2_k2
...
listm: choicem_1 | choicem_2 | ... | choicem_km

Example:

start: det noun
det: "the" | "a" | "an" | ""
noun: a large list of entities (tested on up to millions)

Implementation

I created a parser class named MultiChoicesParser which inherits from the CharacterLevelParser class and I implemented the abstract methods + cache_key method, which was intuitive and easy to follow.

I couldn't find where dependencies were handled for your package, is it possible to add this dependency https://github.com/HichemAK/multi-choices-parser to the project (through a "pip install multi-choices-parser" command)?

Tests

  • I included tests (copied from regexparser tests then adapted to multi choices parser)
  • test_negative_matching was not included since it is not supported by multi choices parser.
  • test_increasing_alphabet passed but not sure of the implementation.
  • All other tests passed.

Best regards,
Hichem Ammar Khodja

- Including tests (copied from regexparser tests then adapted to multi choices parser)
    - test_negative_matching not included since it is not supported by multi choices parser.
    -  test_increasing_alphabet passed but not sure of the implementation.
    - All other tests passed.
@noamgat
Copy link
Owner

noamgat commented Sep 3, 2024

Thank you for this contribution!
The tests look good, and the implementation looks clean enough. My only request before merging, is could you please separate this into a separate file (not characterlevelparser.py) - the parsers there are very basic and are used in composite patterns by the other parsers, and this is a standalone class (like Json / Regex).

@HichemAK
Copy link
Author

HichemAK commented Sep 4, 2024

Yes sure !

I'll comeback to you in approximately two weeks when I'll have time to work on this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants