kb-labb/post-ocr-correction

Post OCR Correction

This code was used to train a new post-OCR correction model for Swedish. The model can be downloaded from https://huggingface.co/KBLab/swedish-ocr-correction

The model and implementation are based on "Post-OCR Correction of Digitized Swedish Newspapers with ByT5", whose original model is also available for download.

Data

The data used to train the model is described in "A Two-OCR Engine Method for Digitized Swedish Newspapers" and is partially available via Språkbanken Text. The more recent annotated newspapers are not publicly available due to copyright restrictions.

Results

| Model        | CER  | WER   |
|--------------|------|-------|
| Original OCR | 3.01 | 13.23 |
| viklofg      | 1.92 | 7.41  |
| KBLab        | 1.57 | 6.23  |
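
For readers unfamiliar with the metrics in the table, the sketch below shows how character error rate (CER) and word error rate (WER) are commonly defined: the Levenshtein edit distance between the reference and the OCR output, divided by the reference length. This README does not say which tool, text normalization, or averaging scheme produced the scores above, so everything here is an assumption for illustration: the function names are hypothetical, the sample sentences are made up, and the scaling to percent assumes the reported values are percentages.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i ref items yields an empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate in percent (assumed scale, see lead-in)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate in percent, splitting on whitespace (an assumption)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


# Made-up example: two character-level OCR errors in a four-word sentence.
reference = "Stockholm är Sveriges huvudstad"
ocr_output = "Stockho1m är Sverigcs huvudstad"
print(f"CER: {cer(reference, ocr_output):.2f}")
print(f"WER: {wer(reference, ocr_output):.2f}")
```

The example illustrates why WER is consistently higher than CER in the table: a single wrong character counts as one error among many characters, but it makes the whole word count as wrong.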
