All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog.
- Updated to Grobid version 0.8.1
- Docker image snapshots are built and pushed to Docker Hub on each commit
- New Dockerfile.local that does not clone from GitHub
- End-to-end evaluation using MeasEVAL (#164)
- Updated to Grobid version 0.8.0
- Updated to Dropwizard version 4.x (from version 1.x)
- Updated training data, cleaning up leftover callout references that had been only partially removed
- Updated models and evaluations (available here)
- Fixed and improved word2number, which now also supports fractions and other constructs #176, #110, #91
- Fixed the segmentation issue for the quantified object: spurious characters from PDF documents are now removed #158
- Added additional units to the lexicon
- Added missing logging when exceptions are raised
- Introduced Kotlin for new development
- Upgraded to Grobid 0.7.3 and added support for JDK > 11
- Updated Docker image to support JDK 17 and to use the Gradle distribution script instead of the JAR directly
- Transitioned from CircleCI to GitHub Actions
- Fix notation lexicon #97
- Fix list and labelled sequence extraction with DL BERT models #153
- Improve recognition of composed units using sentence segmentation #155 #87
- Create holdout set by @lfoppiano in #145
- Add additional DL and transformers models by @lfoppiano in #146
- Update to Grobid 0.7.2
- Fix value parser's incorrect recognition by @lfoppiano in #141
- New BidLSTM_CRF models for quantities, values and units parsing #129
- Add docker image on hub.docker.com #142
- Update to Grobid 0.7.1 #137
- Use the grobid sentence segmentation for the quantified object sentence splitting #138
- Fixed incorrect box colors #125
- Fixed lexicon #134
- Docker image #128
- Configurable number of parallel requests
- Various improvements to unit normalisation and update of the Units of Measurement library to version 2.x #95
- Retrained models with CRF
- Grobid 0.7.0 #123
- Coveralls build #127
- Fixed command line parameters #119
0.6.0 – 2020-04-30
- First official release
- Extraction of quantities, units and values using CRF
- Support for Text and PDF
- Added evaluation measurement and models