Initial integration of NER into Spring batch #24
Comments
Regarding 3, it was already implemented ( |
In This has problem in situation when one patient has many documents. Also, A way to reduce I/O could be
The rest seems to work quite well. |
But I don't think First, it may delete Second, it may tamper with the retry logic in Spring Batch. I think a better place to do such clean-up is in a listener after the 2nd pass (inherently, we need to keep track of how many docs there are per patient and count how many have been processed / written)? |
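To make the bookkeeping idea concrete, here is a minimal sketch of a per-patient counter that such a listener could consult before deleting anything. The class `PatientDocTracker` and its methods are hypothetical names, not part of the branch or of Spring Batch, and the sketch assumes single-threaded steps.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: counts documents per patient that are not yet written. */
class PatientDocTracker {
    // patientId -> number of documents registered but not yet written
    private final Map<String, Integer> remaining = new HashMap<>();

    /** 1st pass: record that one more document belongs to this patient. */
    void register(String patientId) {
        remaining.merge(patientId, 1, Integer::sum);
    }

    /**
     * 2nd pass: record that one document of this patient has been written.
     * Returns true only when it was the patient's last outstanding document,
     * i.e. when cleaning up that patient's files would be safe.
     */
    boolean markWritten(String patientId) {
        Integer left = remaining.get(patientId);
        if (left == null) {
            throw new IllegalStateException("No outstanding documents for " + patientId);
        }
        if (left == 1) {
            remaining.remove(patientId);
            return true;
        }
        remaining.put(patientId, left - 1);
        return false;
    }

    /** True when no documents are outstanding (also true for unknown patients). */
    boolean isComplete(String patientId) {
        return !remaining.containsKey(patientId);
    }
}
```

With something like this, clean-up would only run once `markWritten` returns true for a patient, so a retried or skipped item never finds its files already deleted.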
Thanks @mbelousov , I don't have further comments. One last long term question: |
@hkkenneth, we have committed changes to fix the logic and for issue #9. It would be awesome if you could have a look at the workflow and add a job listener to clean the output folders. Feel free to create a separate issue for that, but if you have any concerns regarding the logic of the current workflow, let's discuss them here. Thanks! |
Not urgent - for the next version we may take a look whether duplicate entries in the |
Hey guys,
Please have a look at the feature/ner-integration branch. I have integrated NER (from feature/NamedEntityExtractor) into our development branch and implemented the majority of what we discussed during the call.
I've decided to keep a separate data structure for annotated documents (so I renamed GATEDocument to AnnotatedDocument, which has gate.Document as a property). Ideally it should extend gate.Document rather than hold it as a property.
It turned out that we need two files per patient (both .lst and .def), but if you know how to minimise the number of files, please let me know.
Currently it writes the gateDoc XML into the result txt files, so what we still need to do is implement getScrubbedContent properly for AnnotatedDocument.
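As a rough illustration of what getScrubbedContent could return instead of XML, the sketch below masks annotated spans in the plain text. It is only a sketch under assumptions: `ScrubSketch`, `Span` and the `[TYPE]` placeholder format are hypothetical, and a real implementation would read the offsets from the annotations held by the gate.Document rather than from a hand-built list.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class ScrubSketch {
    /** Hypothetical stand-in for one annotation: [start, end) offsets plus a type label. */
    static final class Span {
        final int start;
        final int end;
        final String type;

        Span(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    /** Replaces every annotated span of text with a "[TYPE]" placeholder. */
    static String scrub(String text, List<Span> spans) {
        List<Span> sorted = new ArrayList<>(spans);
        sorted.sort(Comparator.comparingInt(s -> s.start));

        StringBuilder out = new StringBuilder();
        int cursor = 0; // first character not yet copied or masked
        for (Span s : sorted) {
            if (s.start < cursor) {
                // Overlaps the previous span: fold it into that placeholder
                // so no part of the overlapping text leaks through.
                cursor = Math.max(cursor, s.end);
                continue;
            }
            out.append(text, cursor, s.start);           // untouched text before the span
            out.append('[').append(s.type).append(']');  // placeholder for the span
            cursor = s.end;
        }
        out.append(text, cursor, text.length());         // tail after the last span
        return out.toString();
    }
}
```

Folding overlapping spans into the earlier placeholder is a deliberate choice here: it errs on the side of hiding too much rather than leaking part of an identifier.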
There is a bunch of stuff that we could try to optimise in order to keep I/O operations to a minimum. I have ignored this for now, but feel free to contribute.
Before we can merge it into development, let's test it and discuss all issues that are left.