source for Scans of mahabharata, etc #383

Open
funderburkjim opened this issue Dec 3, 2021 · 4 comments
Labels
Documentation How TXT , XML work

Comments

@funderburkjim
Contributor

Sources for some of the frequently mentioned editions of Sanskrit works have been identified here.

The link shows the title pages and asserts that the works have been digitized by Google.

These have the potential to be developed into link targets for references to

  • Calcutta ed. of Mahabharata
  • Harivamsa
  • Gorresio's Ramayana.

This note is included here so that the references are easier to find later, when the work is done.

A first step in developing a link target would be to separate the pdfs into separate one-page pdf files, with 'useful' file names (i.e. file names corresponding to the page citations in dictionaries that refer to these editions).

@Andhrabharati

@funderburkjim

The context prompts me to point to another post of mine:
#371 (comment)

@gasyoun added the Documentation How TXT , XML work label Dec 5, 2021
@gasyoun
Member

gasyoun commented Dec 5, 2021

> would be to separate the pdfs into separate one-page pdf files, with 'useful' file name

Is there a real need for that split? There are thousands of pages. Would they be handled manually? What kind of automation could be used, @Andhrabharati?

@funderburkjim
Contributor Author

The separation into individual page pdfs can be done with Adobe Acrobat, and the renaming of the generated single-page pdfs can be done by a Python script. So, by this estimation, relatively little 'manual' work is required.
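
A minimal sketch of what such a renaming script might look like, using only the Python standard library. Everything specific here is an assumption made for illustration and not something settled in this thread: that the split files end in a sequence number (e.g. `scan_12.pdf`), the `pg_NNNN.pdf` target pattern, the names `plan_renames` and `apply_renames`, and the `first_text_seq` parameter (the sequence number of the file that holds printed page 1, so that front matter is skipped).

```python
import re
from pathlib import Path


def plan_renames(folder, first_text_seq=1, prefix="pg"):
    """Return (old_path, new_path) pairs for the single-page PDFs in `folder`.

    Assumption: each file name ends in its sequence number, e.g. 'scan_12.pdf'.
    Adjust the regex to whatever names the splitting tool actually produces.
    """
    numbered = []
    for path in Path(folder).glob("*.pdf"):
        match = re.search(r"(\d+)$", path.stem)
        if match:
            numbered.append((int(match.group(1)), path))
    # Sort numerically so that 'scan_10' comes after 'scan_9'.
    numbered.sort(key=lambda item: item[0])

    pairs = []
    for seq, path in numbered:
        printed_page = seq - first_text_seq + 1
        if printed_page < 1:
            continue  # front matter (title pages etc.) is left untouched
        pairs.append((path, path.with_name(f"{prefix}_{printed_page:04d}.pdf")))
    return pairs


def apply_renames(pairs):
    """Rename the files, refusing to overwrite an existing file."""
    for old, new in pairs:
        if new.exists():
            raise FileExistsError(new)
        old.rename(new)
```

Printing the result of `plan_renames(...)` before calling `apply_renames(...)` gives a chance to check the mapping against a few known page citations first. An edition with several volumes, or one cited by something other than a single running page number, would need a different naming rule.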

@gasyoun
Member

gasyoun commented Dec 6, 2021

> The separation into individual page pdfs can be done with Adobe Acrobat,

This is a 20-minute task.

> renaming of the generated single-page pdfs can be done by a Python script

No idea how you do the voodoo.
