Awesome Sanskrit Manuscriptology

This repository is dedicated to the intersection of Sanskrit Manuscriptology and Computational Linguistics, focusing on the application of modern AI and OCR techniques to the study and preservation of Sanskrit manuscripts.

It is estimated that only about one-tenth of Sanskrit literature has come to light. Deciphering the vast knowledge still hidden in manuscripts would take hundreds of years by hand. If AI can be leveraged to bring it into readable form, that period can be reduced dramatically. That is the aim here.

तमसो मा ज्योतिर्गमय । (Lead me from darkness to light.)

Introduction

Sanskrit Manuscriptology is the study of Sanskrit manuscripts, their history, preservation, and interpretation. With the advent of computational linguistics and AI technologies, new avenues have opened up for the analysis, digitization, and understanding of these ancient texts.

Degrees and courses

Post Graduate Diploma In Manuscriptology And Palaeography (PGDMP)
Online Diploma Program in Manuscriptology and Paleography

Key References

Sahoo, J., & Mohanty, B. (2015). "Digitization of Indian manuscripts heritage: Role of the National Mission for Manuscripts." IFLA Journal, 41(3), 237-250.
Hellwig, O. (2010). "Improving the Morphological Analysis of Classical Sanskrit." In Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation.
Goyal, P., & Huet, G. (2016). "Design and analysis of a lean interface for Sanskrit corpus annotation." Journal of Language Modelling, 4(2), 117-144.
Kulkarni, A., & Huet, G. (2009). "Sanskrit computational linguistics." In Third International Symposium, Hyderabad, India, January 15-17, 2009, Proceedings (Vol. 5402). Springer.

AI and OCR Techniques

SanskritOCR: An OCR system specifically designed for Sanskrit manuscripts - GitHub: SanskritOCR
Tesseract OCR with Sanskrit support - GitHub: tesseract-ocr/tesseract
Devanagari Optical Character Recognition using Convolutional Neural Networks - Paper: arXiv:1708.03543
NLTK Sanskrit Library - GitHub: sanskrit-nltk

Institutes and Research Centers

National Mission for Manuscripts - Website
Indian Institute of Advanced Study, Shimla, India - Website
Bhandarkar Oriental Research Institute, Pune, India - Website
French Institute of Pondicherry, India - Website
Sanskrit Department, Harvard University, USA - Website
Oxford Centre for Hindu Studies, UK - Website
SAMHiTA (South Asian Manuscript Histories and Textual Archive) - Website
Sangrah - part of Dharohar, working to make India’s ancient wisdom easily and globally accessible to scholars.
SRI Sanskrit OCR
Vedvaapi
Sunbird Anuvaad - bootstrapped by EkStep Foundation in late 2019 to enable easier translation of legal documents from English to Indic languages and vice versa.

Notable Researchers

Prof. Amba Kulkarni - Sanskrit Computational Linguistics - Profile
Dr. Oliver Hellwig - Digital Sanskrit Philology - Profile
Prof. Gérard Huet - Sanskrit Heritage Site - Website
Dr. Pawan Goyal - Sanskrit NLP and Digital Humanities - Profile
Malhar Arvind Kulkarni - Profile
Dharmapuri Vedaratna - LinkedIn
Dr. Diwakar Mishra - LinkedIn
Girish Nath Jha - LinkedIn
Anil Kumar
Prof. Ganesh Ramakrishnan - OCR
Ayush Maheshwari - pe-ocr-sanskrit: source and data for the EMNLP paper 'A Benchmark and Dataset for Post-OCR text correction in Sanskrit'

Websites and Online Resources

Sanskrit Documents - Website
sanskrit-ocr
DevDigitizer Project (Sanskrit OCR) - aims to build state-of-the-art optical character recognition software for Sanskrit/Samskritam in Devanagari script.
SRI, Sanskrit OCR Tool
ShareOCR 1, ShareOCR 2 - The End-to-End OCR for Indic Content
GRETIL - Göttingen Register of Electronic Texts in Indian Languages - Website
Sanskrit Heritage Site - Website
Digital Corpus of Sanskrit - Website
SARIT - Search and Retrieval of Indic Texts - Website
Manuscriptology & Paleography National Workshop SAMSKRITAM & BHARATIYASAMSKRITI 1
कार्यशाला- ग्रंथ संधानम् (Workshop: Grantha Sandhanam)
Manuscriptology - I, Editing Process
Manuscriptology: Introduction, Definition of Manuscript, Manuscript composition. (Common elements of Manuscript)
18 CME Dr Mohan Joshi - Basics of Manuscriptology
Manuscripts Treasure of India : Repository of our Heritage || Dr. Sarwarul Haque ||
भारतीय पाण्डुलिपि विज्ञान | डॉ० कीर्ति कान्त शर्मा | कलानिधि, IGNCA (Indian Manuscriptology | Dr. Kirti Kant Sharma | Kalanidhi, IGNCA)
NYCIKS 2023 - Workshop on Manuscriptology – Prof. Gauri Mahulikar
Manuscriptology - Prof. Malhar Kulkarni
61. Manuscriptology - Grantha Script - Dr.Krishnamachari
leap-pe-tool - a framework for assisting humans in correcting translation/OCR errors in documents, mostly dedicated to Indian languages (Udaan Project).

Video Playlists and Lectures

Sanskrit and Indian Manuscriptology Series by IIAS Shimla - YouTube Playlist
Computational Sanskrit and Digital Humanities by IIT Kharagpur - NPTEL Course
Introduction to Sanskrit Computational Linguistics by Amba Kulkarni - YouTube Playlist

Personal Perspective: Why Pursue Sanskrit Manuscriptology?

Sanskrit Manuscriptology is a field that offers unique opportunities and challenges:

It's a much-needed area of study with significant potential for research and development.
The work is monk-like, requiring dedication and a lifelong commitment to learning.
There is considerable scope and need for AI applications in this field.
It can be an "ikigai" - a reason for being that combines passion, mission, profession, and vocation.
It involves specific knowledge that's not widely available, making it a valuable niche.
The field has international relevance, with opportunities for collaboration in countries like Germany and the US.
There are ample chances to write research papers and books for both academic and general audiences.
Projects like Namami are unearthing vast knowledge, providing opportunities to become an expert in the field.

How to Get Started

Pursue a course or degree in Sanskrit Manuscriptology:
- Bhandarkar Oriental Research Institute (BORI)
- Savitribai Phule Pune University (SPPU)
- Online playlists and courses (see Video Playlists and Lectures)
Focus on various scripts:
- Devanagari (the script most commonly used for Sanskrit today)
- Modi
- Sharada
- Study the works of experts like Shrinand Bapat
Develop AI skills:
- Learn libraries like spaCy, OpenCV, scikit-learn, and PyTorch
- Focus on handwriting recognition models and OCR tooling (e.g., pytesseract, a Python wrapper for Tesseract)
Approach the field as real R&D:
- Embrace the lack of pressure and view it as an "ikigai"
- Develop specific knowledge in the intersection of AI and manuscriptology
Learn Sanskrit on the side:
- Continuously improve your language skills while working on AI and manuscriptology projects
Consider coaching or teaching AI applications in this field
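
As a starting point for the OCR item above, here is a minimal sketch of running Tesseract on a single page image. It assumes the Tesseract binary with its Sanskrit traineddata (`san`), plus the `pytesseract` and Pillow packages, are installed. The helper names `normalize_ocr_text` and `ocr_page` and the file name are illustrative, not part of any library, and stock Tesseract models are trained mainly on printed text, so results on handwritten folios may be poor.

```python
import unicodedata


def normalize_ocr_text(raw: str) -> str:
    """Normalize OCR output: NFC-compose Unicode and tidy whitespace."""
    text = unicodedata.normalize("NFC", raw)
    # Collapse runs of whitespace inside each line and drop empty lines.
    lines = [" ".join(line.split()) for line in text.splitlines()]
    return "\n".join(line for line in lines if line)


def ocr_page(image_path: str, lang: str = "san") -> str:
    """Run Tesseract on one page image and return normalized text.

    Assumes Tesseract, its "san" traineddata, pytesseract, and Pillow
    are installed on the machine.
    """
    # Imported lazily so the pure-Python helper above works without them.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as page:
        # --psm 6 asks Tesseract to treat the page as one uniform text block.
        raw = pytesseract.image_to_string(page, lang=lang, config="--psm 6")
    return normalize_ocr_text(raw)


# Example call ("folio_001.png" is a hypothetical file name):
# print(ocr_page("folio_001.png"))
```

Keeping normalization separate from the OCR call makes it easy to reuse the same cleanup when switching to a different OCR engine or a custom handwriting model.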

Technical Approach

A typical workflow for manuscriptology using AI might include:

Image Processing:
- Input: Manuscript image
- Process: Edge detection
- Output: SVG (Scalable Vector Graphics)
Feature Recognition:
- Input: Vector graphics
- Process: AI-based feature recognition
- Output: JSON data structure
Knowledge Extraction:
- Input: JSON data
- Process: RAG (Retrieval-Augmented Generation)
- Output: Structured information and insights from the manuscript
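
The first two stages above can be sketched with the standard library alone. This is a toy illustration under stated assumptions, not a production pipeline: the page is a small grid of grayscale values, edge detection is a simple central-difference gradient threshold (a stand-in for Sobel or Canny in OpenCV), and `describe_features` is a hypothetical placeholder for the AI-based recognition step.

```python
import json


def detect_edges(gray, threshold=80):
    """Return (x, y) pixels whose intensity gradient meets the threshold.

    `gray` is a list of rows of 0-255 ints. Border pixels are skipped.
    """
    height, width = len(gray), len(gray[0])
    edges = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gray[y][x + 1] - gray[y][x - 1]
            gy = gray[y + 1][x] - gray[y - 1][x]
            if (gx * gx + gy * gy) ** 0.5 >= threshold:
                edges.append((x, y))
    return edges


def edges_to_svg(edges, width, height):
    """Render edge pixels as 1x1 rectangles in an SVG document."""
    rects = "".join(
        f'<rect x="{x}" y="{y}" width="1" height="1"/>' for x, y in edges
    )
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'viewBox="0 0 {width} {height}">{rects}</svg>'
    )


def describe_features(edges):
    """Summarise the edge set as a JSON-ready dict (placeholder stage)."""
    if not edges:
        return {"edge_pixels": 0, "bbox": None}
    xs = [x for x, _ in edges]
    ys = [y for _, y in edges]
    return {
        "edge_pixels": len(edges),
        "bbox": {
            "x_min": min(xs), "y_min": min(ys),
            "x_max": max(xs), "y_max": max(ys),
        },
    }


# Toy 6x6 "page": a dark 2x2 stroke on a light background.
page = [[255] * 6 for _ in range(6)]
for y in (2, 3):
    for x in (2, 3):
        page[y][x] = 0

edges = detect_edges(page)
svg = edges_to_svg(edges, width=6, height=6)
record = json.dumps(describe_features(edges))
```

A real pipeline would trace contours into SVG paths rather than emitting one rectangle per pixel, and would store recognized glyphs or lines in the JSON record instead of a bounding box.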

This approach combines computer vision, machine learning, and natural language processing techniques to extract and understand information from ancient manuscripts.
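
For the knowledge-extraction stage, the retrieval half of RAG can be illustrated as follows. This is a minimal sketch, assuming transcribed passages are already available as records. Bag-of-words cosine similarity stands in for the embedding search a real system would likely use, the generation step (passing the prompt to a language model) is omitted, and the records and function names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (not Sanskrit-aware)."""
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, passages, top_k=1):
    """Return the top_k passages most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(
        passages,
        key=lambda p: cosine(q, Counter(tokenize(p["text"]))),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(query, passages):
    """Assemble retrieved context and the question into one prompt string."""
    context = "\n".join(f'[{p["folio"]}] {p["text"]}' for p in passages)
    return f"Context:\n{context}\n\nQuestion: {query}"


# Hypothetical records, as an earlier stage might emit them
# (ASCII transliteration without diacritics, for simplicity).
passages = [
    {"folio": "1r", "text": "tamaso ma jyotir gamaya"},
    {"folio": "1v", "text": "asato ma sad gamaya"},
    {"folio": "2r", "text": "mrtyor ma amrtam gamaya"},
]

hits = retrieve("jyotir tamaso", passages, top_k=1)
prompt = build_prompt("jyotir tamaso", hits)
```

Whitespace tokenization is a weak fit for Sanskrit, where sandhi fuses words together, so a practical system would segment the text first, for example with tools from the resources listed earlier.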

Contributing

We welcome contributions to this repository. Please read our CONTRIBUTING.md file for guidelines on how to submit issues, feature requests, and pull requests.

License

This project is licensed under the MIT License - see the LICENSE.md file for details.
