Refactor to use PyTorch for training models #202
Merged
percevalw force-pushed the core-refacto branch 2 times, most recently from a5a3c48 to 1516017 (July 28, 2023 20:44)
percevalw force-pushed the core-refacto branch 4 times, most recently from 08699ae to b66eea1 (August 8, 2023 22:55)
Codecov Report
@@            Coverage Diff             @@
##           master     #202      +/-   ##
==========================================
+ Coverage   94.76%   96.58%   +1.81%
==========================================
  Files         233      254      +21
  Lines        6099     8356    +2257
==========================================
+ Hits         5780     8071    +2291
+ Misses        319      285      -34
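As a sanity check, the coverage percentages in this report can be recomputed from the hit and line counts. This sketch assumes Codecov's default display settings (two decimals, rounded down), which would explain why the delta reads +1.81% rather than the +1.82% obtained by subtracting the two displayed percentages. The helper name `floor_pct` is hypothetical.

```python
import math

def floor_pct(ratio: float, precision: int = 2) -> float:
    """Express a ratio as a percentage, rounded down to `precision` decimals.

    Assumption: mirrors Codecov's default (precision 2, round down).
    Note: math.floor rounds toward -inf, so a negative delta would need
    separate handling; the delta below is positive.
    """
    factor = 10 ** precision
    return math.floor(ratio * 100 * factor) / factor

# (hits, lines) taken from the report; misses = lines - hits
master = (5780, 6099)
pr_202 = (8071, 8356)

base = floor_pct(master[0] / master[1])  # coverage on master
head = floor_pct(pr_202[0] / pr_202[1])  # coverage on PR #202
# Delta computed on the exact ratios, then rounded down once
delta = floor_pct(pr_202[0] / pr_202[1] - master[0] / master[1])

print(base, head, delta)  # 94.76 96.58 1.81
```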
percevalw force-pushed the core-refacto branch 2 times, most recently from 62c7fbc to d06dde6 (August 9, 2023 12:56)
percevalw force-pushed the core-refacto branch 4 times, most recently from 440779e to a17230e (August 25, 2023 22:55)
percevalw force-pushed the core-refacto branch from a17230e to 4e5f2ed (September 29, 2023 08:56)
percevalw force-pushed the core-refacto branch 8 times, most recently from 8d5a3d7 to d17e677 (October 16, 2023 17:26)
percevalw force-pushed the core-refacto branch 4 times, most recently from 7aa37ef to a6b7e0b (October 26, 2023 15:05)
…cuda & quantization + faster transfer pipeline via tmp
…) + smoother multiprocessing stopping
percevalw force-pushed the core-refacto branch from 4a37752 to df2bf0a (December 4, 2023 09:17)
percevalw force-pushed the core-refacto branch from df2bf0a to 3ec32ab (December 4, 2023 09:49)
Description
This PR refactors EDS-NLP to allow training models and performing inference using PyTorch as the deep-learning backend. Rather than merely wrapping PyTorch inside spaCy, this is a new framework for building hybrid multi-task models.
To achieve this, instead of patching spaCy's pipeline, a new pipeline was implemented in a similar fashion to aphp/edspdf#12. The new pipeline tries to preserve the existing API, especially for non-machine-learning uses such as rule-based components. This means that users can continue to use the library in the same way as before (`spacy.blank('xx')`, `nlp.add_pipe(...)`), while also having the option to train models using PyTorch. We still use spaCy data structures such as `Doc` and `Span` to represent texts and their annotations.

Note that this is a work in progress and will require further testing before it can be released. Should we release it under an alpha version number? Once testing is complete, the new version will be released as a stable version.
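The idea of keeping the familiar `blank` / `add_pipe` / `nlp(text)` workflow while replacing the pipeline implementation underneath can be illustrated with a toy, pure-Python sketch. Everything in it (`Doc`, `FACTORIES`, `register`, `Pipeline`, `blank`, and the `toy.matcher` component) is hypothetical and is not EDS-NLP's actual API; it only shows the shape of a name-based factory lookup feeding a sequential pipeline of rule-based components.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical stand-in for spaCy's Doc: raw text plus (start_char, end_char, label) entities.
@dataclass
class Doc:
    text: str
    ents: List[Tuple[int, int, str]] = field(default_factory=list)

Component = Callable[[Doc], Doc]

# Hypothetical registry mapping a factory name to a function that builds a component.
FACTORIES: Dict[str, Callable[..., Component]] = {}

def register(name: str):
    def decorator(factory: Callable[..., Component]) -> Callable[..., Component]:
        FACTORIES[name] = factory
        return factory
    return decorator

@register("toy.matcher")
def make_matcher(term: str, label: str) -> Component:
    # Rule-based component: exact string matching, no deep-learning dependency.
    def component(doc: Doc) -> Doc:
        start = doc.text.find(term)
        while start != -1:
            doc.ents.append((start, start + len(term), label))
            start = doc.text.find(term, start + len(term))
        return doc
    return component

class Pipeline:
    """Minimal pipeline mimicking the blank() / add_pipe() / nlp(text) workflow."""

    def __init__(self, lang: str):
        self.lang = lang
        self.components: List[Tuple[str, Component]] = []

    def add_pipe(self, factory: str, name: Optional[str] = None,
                 config: Optional[dict] = None) -> Component:
        # Look the factory up by name and instantiate it with its config.
        component = FACTORIES[factory](**(config or {}))
        self.components.append((name or factory, component))
        return component

    @property
    def pipe_names(self) -> List[str]:
        return [name for name, _ in self.components]

    def __call__(self, text: str) -> Doc:
        # Run every component, in insertion order, on the same Doc.
        doc = Doc(text)
        for _, component in self.components:
            doc = component(doc)
        return doc

def blank(lang: str) -> Pipeline:
    return Pipeline(lang)

nlp = blank("xx")
nlp.add_pipe("toy.matcher", config={"term": "diabetes", "label": "disease"})
doc = nlp("Patient has diabetes; family history of diabetes.")
print(doc.ents)  # [(12, 20, 'disease'), (40, 48, 'disease')]
```

Keeping the registry lookup separate from the pipeline class is what lets components be added by name and configured from plain data, whether they are rule-based or trainable.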
Core changes / new features:
- `confit` package to instantiate components (soon to be published)
- `Language.factory` -> `edsnlp.registry.factory.register` (`confit` registry)
- … `spacy.Language.__init__`) to avoid having to wrap every `import torch` statement for pure rule-based use cases. Hence, `torch` is not a required dependency
- … `tests/training/`)
- `eds.ner`
- `eds.transformer` (to be used in place of `spacy-transformers`)
- `eds.text_cnn` embedding contextualizer

Checklist
- `confit`
- `eds.ner` (`from_ents`, `from_span_groups`)
- … `eds.transformer`, `eds.text_cnn`)
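Two of the core changes listed above, registering component factories in a registry rather than through `Language.factory`, and keeping `torch` optional for purely rule-based use, can be combined in one small sketch. It assumes a plain dictionary registry and deferred imports via `importlib`; the names `FACTORIES`, `register`, `create`, `toy.lowercase` and `toy.trainable_ner` are hypothetical and do not reflect how `edsnlp.registry` or `confit` are actually implemented.

```python
import importlib
from typing import Callable, Dict

# Hypothetical registry: factories are stored by name and only called on demand.
FACTORIES: Dict[str, Callable] = {}

def register(name: str):
    def decorator(fn: Callable) -> Callable:
        FACTORIES[name] = fn
        return fn
    return decorator

@register("toy.lowercase")
def make_lowercase():
    # Rule-based component: pure Python, nothing deep-learning related is imported.
    return lambda text: text.lower()

@register("toy.trainable_ner")
def make_trainable_ner(hidden_size: int = 64):
    # Deferred import: torch is only required once a trainable component is built.
    try:
        torch = importlib.import_module("torch")
    except ImportError as exc:
        raise ImportError("toy.trainable_ner requires PyTorch (pip install torch)") from exc
    return torch.nn.Linear(hidden_size, hidden_size)

def create(name: str, **config):
    # Instantiate a registered factory by name with keyword configuration.
    return FACTORIES[name](**config)

lower = create("toy.lowercase")
print(lower("Hello"))  # prints: hello
```

Because the module never imports `torch` at top level, registering and using rule-based components works on an install without PyTorch; the import error only surfaces if a trainable component is actually requested.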