Nougat is a Transformer-based OCR model that converts complex scientific documents, typically stored as PDFs, into machine-readable Markdown. Developed by researchers at Meta AI, it aims to make scientific knowledge more accessible by recovering the text, formulas, and tables that are hard to extract from a PDF directly.
- Transformer Architecture: Nougat uses a Swin Transformer as its vision encoder and an mBART-based text decoder, which together transcribe scientific PDFs end to end.
- End-to-End Training: Nougat needs no multi-stage OCR pipeline. The model takes the raw pixels of a page image as input and generates Markdown text as output.
- Bridging the Gap: Nougat turns documents laid out for human readers into text that software can search, parse, and reuse, making scientific knowledge easier to access.
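The pixels-in, Markdown-out flow described above can be sketched in a few lines. This is a minimal illustration, not necessarily how the repository below does it: it assumes the Hugging Face `transformers` port of Nougat (`NougatProcessor` with `VisionEncoderDecoderModel`) and the `facebook/nougat-base` checkpoint, neither of which is named in this article. The helper names `load_nougat` and `transcribe_page`, the `max_new_tokens` default, and the file name `page.png` are hypothetical.

```python
from typing import Any


def load_nougat(checkpoint: str = "facebook/nougat-base"):
    """Load a Nougat processor and model (assumed checkpoint name)."""
    # Imported lazily so the module can be imported without transformers installed.
    from transformers import NougatProcessor, VisionEncoderDecoderModel

    processor = NougatProcessor.from_pretrained(checkpoint)
    model = VisionEncoderDecoderModel.from_pretrained(checkpoint)
    return processor, model


def transcribe_page(image: Any, processor: Any, model: Any,
                    max_new_tokens: int = 1024) -> str:
    """Transcribe one rendered page image into markup text."""
    # Raw pixels in: the processor resizes and normalizes the page image.
    pixel_values = processor(image, return_tensors="pt").pixel_values

    # The vision encoder reads the pixels and the text decoder generates
    # token ids autoregressively; unknown tokens are disallowed.
    token_ids = model.generate(
        pixel_values,
        min_length=1,
        max_new_tokens=max_new_tokens,
        bad_words_ids=[[processor.tokenizer.unk_token_id]],
    )

    # Text out: decode the ids, then apply Nougat's own post-processing.
    text = processor.batch_decode(token_ids, skip_special_tokens=True)[0]
    return processor.post_process_generation(text, fix_markdown=False)


# Example use (downloads model weights on first run; requires Pillow):
#   from PIL import Image
#   processor, model = load_nougat()
#   page = Image.open("page.png").convert("RGB")  # hypothetical page image
#   print(transcribe_page(page, processor, model))
```

Passing the processor and model in as arguments keeps loading separate from inference, so the same pair can be reused across every page of a document. A PDF has to be rasterized to page images first, which this sketch leaves out.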
git clone https://github.com/inuwamobarak/nougat.git