About SROIE annotations #21
@furkanpala are you able to achieve good performance on any standard dataset?
Yes, I got good results on the SROIE dataset and on my in-house datasets. However, I used the matching strategy mentioned in the readme, so there is a 3-5% drop in scores.
Can you share your SROIE dataset and config file?
This is a problem worth noting. We find that there may be multiple occurrences of the "date" and "total" entities in a single receipt. For the "date" entities, we annotate the first matching string in reading order as the key information, while the others are annotated as background. For the "total" entities, we pick the most likely one, which usually has a larger or bolded font and has a key containing the string "total" nearby.
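The "first match in reading order" rule for dates can be sketched as below. This is only an illustration, not the authors' annotation code: the function name `label_first_match`, the `"O"` background tag, and the assumption that OCR words arrive as a list already sorted in reading order are all hypothetical. The "total" rule depends on font size and nearby keys, so it is not covered here.

```python
from typing import List


def label_first_match(
    words: List[str], value: str, label: str, background: str = "O"
) -> List[str]:
    """Tag the first span of `words` equal to `value`; leave everything else as background.

    `words` is assumed to be in reading order, and `value` is split on
    whitespace so that multi-word values can be matched as a span.
    """
    target = value.split()
    labels = [background] * len(words)
    n = len(target)
    if n == 0:
        return labels
    for start in range(len(words) - n + 1):
        if words[start:start + n] == target:
            for i in range(start, start + n):
                labels[i] = label
            break  # later occurrences stay background
    return labels


# Hypothetical receipt in which the same date is printed twice.
words = ["DATE", "25/12/2018", "TOTAL", "9.00", "25/12/2018"]
labels = label_first_match(words, "25/12/2018", "date")
```

Here only the second word is tagged as `date`; the repeated date at the end of the receipt stays background.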
Unfortunately, I am not able to share the dataset right now.
Hi again, I would like to follow up on the SROIE annotations, if you do not mind. I wonder whether you used the preprocessed data to label the words or OCRed the images from scratch. More precisely, we have the boxes, keys, and images from the original SROIE dataset. Then, we can either use your
We used an in-house OCR engine to label the dataset from scratch.
Sorry for the follow-ups. In the readme file, you mentioned that matching the key information fields with the OCR output causes
Hello, sorry for the late reply. I re-ran the experiments using the dataset pre-processed with the matching strategy, and I got a token-level F1 of 97 and an entity-level F1 of 60. The "3~5 point decrease" figure comes from earlier experiments using a previous version of the code, and it seems that the metric computation in the old code is wrong. I will change the description in the readme and provide model weights trained on my re-labelled data (which reach an entity-level F1 of 96+).
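To illustrate how a token-level F1 and an entity-level F1 can diverge, here is a minimal sketch. It is not the repository's evaluation code. It assumes that the entity-level score counts a field as correct only when the concatenated predicted words exactly match the ground-truth string, and that both scores are micro-averaged; the helper names and the example receipt are hypothetical.

```python
from typing import Dict, List


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def token_f1(gold: List[str], pred: List[str], background: str = "O") -> float:
    """Micro F1 over non-background token labels."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        if p != background and p == g:
            tp += 1
        else:
            if p != background:
                fp += 1
            if g != background:
                fn += 1
    return f1(tp, fp, fn)


def extract_fields(words: List[str], labels: List[str], background: str = "O") -> Dict[str, str]:
    """Join the words of each label into one string per field."""
    fields: Dict[str, List[str]] = {}
    for word, label in zip(words, labels):
        if label != background:
            fields.setdefault(label, []).append(word)
    return {name: " ".join(parts) for name, parts in fields.items()}


def entity_f1(words: List[str], gold: List[str], pred: List[str]) -> float:
    """Micro F1 where a field counts only on an exact string match."""
    gold_fields = extract_fields(words, gold)
    pred_fields = extract_fields(words, pred)
    tp = sum(1 for name, text in pred_fields.items() if gold_fields.get(name) == text)
    return f1(tp, len(pred_fields) - tp, len(gold_fields) - tp)


# Hypothetical example: one missed token in the company name.
words = ["ABC", "STORE", "SDN", "BHD", "25/12/2018", "9.00"]
gold = ["company", "company", "company", "company", "date", "total"]
pred = ["company", "company", "company", "O", "date", "total"]

tok = token_f1(gold, pred)         # 5 of 6 tokens recovered
ent = entity_f1(words, gold, pred)  # the company field fails the exact match
```

A single mislabelled token costs little at the token level (10/11 here) but invalidates the whole field at the entity level (2/3 here), which is one way the two scores can end up far apart.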
Hi, thanks for the reply.
Hi,
You mentioned that you annotated the SROIE dataset so that it can be used effectively with ViBERTgrid. While annotating, how did you handle tokens that occur multiple times? For example, for the date label, there are receipts in which the same date occurs several times. Did you annotate all of them as date, or only one? Thanks.