No predictions in inference. #20
Comments
Hello, sorry for my late reply. The OCR I used is a private engine of IntSig, which is for in-house development only. The public version of the API is TextIn, and it is not free. PaddleOCR is a good choice. To use other APIs, some modification is required. Here's how the OCR pipeline works: the following code calls the OCR API and obtains the result. You may refer to the documentation of the API you use and modify the request operations accordingly. In my code, the OCR API takes the image bytes and returns the result in JSON format; see ViBERTgrid-PyTorch/deployment/inference_preporcessing.py, lines 116 to 136 (commit 50e1ca9).
The raw OCR result is then passed to the result-parsing function. You may check your code to see whether the parsing function generates the correct result or not. If you have any further questions, please feel free to contact me.
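The two-step flow described above (call the OCR API with image bytes, then parse the raw JSON) can be sketched as follows. This is a hedged, self-contained illustration: the function names, the `lines`/`text`/`position` keys, and the canned response are hypothetical stand-ins, not the actual TextIn API or the repo's parsing code.

```python
def call_ocr_api(image_bytes: bytes) -> dict:
    """Hypothetical stand-in for the OCR HTTP request.

    The real pipeline would POST the raw image bytes to the OCR service
    and receive JSON back; a canned response keeps this sketch runnable.
    """
    return {
        "lines": [
            {"text": "TOTAL 12.50",
             "position": [10, 20, 110, 20, 110, 40, 10, 40]},
        ]
    }


def parse_ocr_result(raw: dict) -> list:
    """Flatten the raw JSON into (text, polygon) pairs for later stages."""
    return [(line["text"], line["position"]) for line in raw.get("lines", [])]


result = parse_ocr_result(call_ocr_api(b"fake-image-bytes"))
```

If your API returns a different JSON schema, only the key lookups inside the parsing function need to change.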
Hi, thanks for the reply. So the return from the wrapper will be in the format below.
But the results are still empty. I am not able to understand where I am failing.
Hi, something seems weird in your returned data. In your example, the text string does not look like what the parsing function expects; see ViBERTgrid-PyTorch/deployment/inference_preporcessing.py, lines 159 to 179 (commit 50e1ca9).
I think you can modify your wrapper so that it returns only the fields the parser needs.
Sorry, the data I shared earlier was actually only part of the full output. I am sharing the full version here.
The returns of your wrapper seem correct this time. Now we should try to figure out which part of the pipeline failed, and I need more information. Try to add some breakpoints in the model's inference function to see whether it returns anything. If the debugger doesn't work inside a Flask service, remove the Flask part and run the model's inference function separately.
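A minimal debugging harness in that spirit might look like the sketch below. Everything here is hypothetical: `run_once`, the OCR wrapper, and the inference function are stand-ins for the repo's actual code; the point is only to show how to call the pieces directly, outside Flask, and print where the pipeline goes empty.

```python
import tempfile


def run_once(model, image_path, ocr_wrapper, infer_fn):
    """Bypass Flask: read one image, run OCR, run inference, print both."""
    with open(image_path, "rb") as f:
        image_bytes = f.read()
    ocr_raw = ocr_wrapper(image_bytes)
    print("OCR lines:", len(ocr_raw))   # zero here -> the OCR step failed
    preds = infer_fn(model, image_bytes, ocr_raw)
    print("predictions:", preds)        # empty here -> model/pipeline failed
    return preds


# Stubs so the sketch runs end to end without a real model or OCR service.
def fake_ocr(image_bytes):
    return [{"text": "TOTAL 12.50"}]


def fake_infer(model, image_bytes, ocr_raw):
    return {"TOTAL 12.50": "total"}


with tempfile.NamedTemporaryFile(suffix=".jpg", delete=False) as f:
    f.write(b"\xff\xd8fake-jpeg")
    path = f.name

preds = run_once(None, path, fake_ocr, fake_infer)
```

Replacing the stubs with the real wrapper and inference function narrows the failure down to one stage.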
Hi, Sorry for the delay in response.
The rest of the hyperparameters are unchanged.
Here I got a good F1 score (~0.91), but most of the time the outputs for test images were empty. Your model implementation seems very promising; I guess there are some issues in the pipeline. Thanks,
Hello, it seems that you set the classifier to "full" mode, which has the same architecture described in the paper. I also found that this setting gives poor results, as I have mentioned before. Thus, I implemented a simplified classifier, which directly applies multi-category classification using a single linear layer, and it gives good results; just change the classifier setting in the config file. As for the empty-output problem during inference, there might be several reasons. Could you please share your inference configuration file? I will also try to re-train the model and debug the pipeline these days.
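The change described above might look like the fragment below in the YAML config. This is only a sketch: the key name `classifier_mode` and the value `simplified` are assumptions, so check the repo's example.yaml for the exact field and allowed values.

```yaml
# Hypothetical key/value names -- consult example.yaml for the real ones.
model:
  classifier_mode: simplified   # was "full"; single linear layer, multi-class
```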
Hello~ Pre-trained weights are available on Google Drive; try to load them.
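Loading a downloaded checkpoint would follow the standard PyTorch pattern sketched below. The model constructor here is a hypothetical stand-in (a tiny `nn.Linear`), and the save/load round trip through a temp file stands in for pointing `torch.load` at the Google Drive file; the repo's own model class and checkpoint path should be used in practice.

```python
import os
import tempfile

import torch
import torch.nn as nn


def build_model() -> nn.Module:
    """Hypothetical stand-in for the repo's model constructor."""
    return nn.Linear(4, 2)


model = build_model()

# Round-trip through a file; in practice, point torch.load at the
# downloaded .pth checkpoint instead.
path = os.path.join(tempfile.gettempdir(), "vibertgrid_demo.pth")
torch.save(model.state_dict(), path)

state_dict = torch.load(path, map_location="cpu")
model.load_state_dict(state_dict)
model.eval()  # disable dropout/batch-norm updates before inference
```

If the checkpoint's keys don't match the model (e.g. because of a classifier-mode mismatch), `load_state_dict` will raise an error naming the offending keys, which is itself a useful diagnostic.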
I am attaching my config file here. Please let me know your comments.
I tried this model. The results seem fine (checked qualitatively). I can extract data, though not all the fields; false negatives still exist.
Hello~ Sorry for my delayed response. In my experiments, I also found that the model gives poor results on certain fields.
Hi, the config file is configured for training. For inference, I have given the absolute path to the checkpoint in the config.
Hello, does the model give results in inference mode when using the pretrained weights I provided?
Yes, it does. But like you said, it gives poor performance on numeric fields.
I think a configuration mismatch may be causing the error in your case. What kind of architecture did you use when training the model? The poor results on the total field are a shortcoming of the ViBERTgrid model and are hard to optimize, since it uses a grid encoding, which may confuse neighboring features. Using a weighted loss that gives more attention to the total field may slightly improve the performance, but it is still far from satisfactory. I assume that using a larger input image size may help, but it will take more time during training and inference.
I have trained on the CORD dataset as per the "example.yaml" file. F1 scores seem to be excellent (with the CRF network).
But when I try to create predictions, nothing is predicted.
Can you provide an example of the OCR API? Currently, I am using a custom PaddleOCR Flask server to get the OCR results, and I convert the outputs to the required format that you have mentioned in the script.
If possible, please share the OCR script, or the exact format that the module needs.
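A conversion step like the one described might look like the sketch below. It assumes the common PaddleOCR per-line format `[[four corner points], (text, confidence)]` and emits hypothetical `{"text", "bbox"}` dicts; the target keys are placeholders and should be matched to whatever inference_preporcessing.py actually parses.

```python
def paddle_to_pipeline(paddle_lines):
    """Convert PaddleOCR-style lines into flat text/bbox records."""
    converted = []
    for box, (text, conf) in paddle_lines:
        xs = [p[0] for p in box]
        ys = [p[1] for p in box]
        converted.append({
            "text": text,
            # Collapse the 4-point polygon into an axis-aligned box.
            "bbox": [min(xs), min(ys), max(xs), max(ys)],
        })
    return converted


sample = [[[[10, 20], [110, 20], [110, 40], [10, 40]],
           ("TOTAL 12.50", 0.98)]]
out = paddle_to_pipeline(sample)
```

Printing `out` for one real receipt and comparing it field by field against the parser's expectations is a quick way to catch schema mismatches.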