Versatile Sentence Boundary Detection #460

Open
wants to merge 2 commits into
base: master
from


lecoqnicolas

  1. Created the property sdb_package in package; its value either points to an SBD subdirectory (Stanza or language-specific spaCy) or is None.
  2. In sbd, created the class StanzaSentencizer and rewrote SpacySentencizerSmall and the base class to be package-dependent: they load from the property described above or, if it is None, from the cache.
  3. Fixed the spaCy automatic download by creating the function cache_spacy in networking.
  4. To avoid a circular import, the classes are called when the sentencizer is initialized in translate.
  5. In tokenizer, fixed the byte-fallback bug while keeping the rewriting of underscores as spaces active (legacy code commented out).
  6. Commented out all legacy code related to the "stanza_available" environment variable in settings, package and translate.
  7. Swapped a few lines and fixed spelling here and there for consistency.
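
The selection logic described in items 1 and 2 can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function name `pick_sentencizer`, the returned labels, and the assumption that the Stanza subdirectory is literally named `stanza` are all hypothetical.

```python
from pathlib import Path
from typing import Optional


def pick_sentencizer(sdb_package: Optional[Path]) -> str:
    """Return a label for the backend to use (hypothetical names, for illustration)."""
    if sdb_package is None:
        # The package ships no SBD data: fall back to whatever is in the cache.
        return "cache"
    if sdb_package.name == "stanza":
        # Assumption: the Stanza data lives in a subdirectory called "stanza".
        return "stanza"
    # Assumption: any other subdirectory is a language-specific spaCy model.
    return "spacy"
```

In the real code the labels would correspond to classes such as StanzaSentencizer and SpacySentencizerSmall, instantiated lazily so that translate does not import them at module load time.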

Fixed spacy automatic download
Fixed byte-fallback bug in tokenizer
The while True loop is used in Locomotive to download the Stanza SBD model. I tried doing without it, and it works too (it may fail more, but not very often).
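
One possible middle ground between an unbounded `while True` and no retry at all is a bounded retry. The sketch below is only an illustration under assumptions: `retry` and its parameters are hypothetical names, and `download` stands in for whatever callable actually fetches the model.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry(download: Callable[[], T], attempts: int = 3, delay: float = 0.0) -> T:
    """Call `download` up to `attempts` times and return its first successful result."""
    last_error: Optional[Exception] = None
    for _ in range(attempts):
        try:
            return download()
        except Exception as error:  # a real implementation would catch narrower errors
            last_error = error
            time.sleep(delay)  # wait before the next attempt
    # Every attempt failed: surface the last error instead of looping forever.
    raise RuntimeError(f"download failed after {attempts} attempts") from last_error
```

Unlike `while True`, this gives up after a fixed number of failures, so a permanently unreachable server cannot hang the process.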
@lecoqnicolas
Author

Hello,
If you deem it necessary to clean up the legacy code or revert a feature before merging, please tell me next week and I'll do it.
