BEN Study #32

Open
Andhrabharati opened this issue Sep 27, 2021 · 53 comments
Labels
bug Something isn't working

Comments

@Andhrabharati

Some "interesting" findings in this.

  1. [Page10xx is made [Paêxx

  2. sh in non-skt strings made as ṣ
    BEN sh (non-Skt) made as ṣ.txt

  3. Most importantly, the <ls> marking is limited to the work names alone, but not extended to the content numbers (citations), as I have been mentioning in MW, PWG etc., while I was into those works.

@Andhrabharati
Author

2. sh in non-skt strings made as ṣ

It may be noted that Scientific names should not be treated as Skt. words, they must be retained as spelt in the English (Latin) words.

@gasyoun
Member

gasyoun commented Sep 28, 2021

marking is limited to the work names alone, but not extended to the content numbers (citations)

Not critical, still good to have.

Scientific names should not be treated as Skt. words, they must be retained as spelt in the English (Latin) words

Yep, so it is a general notion.

@funderburkjim

[Page10xx is made [Paêxx

I don't find Paê in csl-orig/v02/ben/ben.txt.

<ls> marking is limited to the work names

Yes, I took this shortcut here in benfey, so tooltips for the works would be available in displays.
At some future time we may decide to fully mark those for which there is a linkable target.

@funderburkjim

The 'sh' list is good. These errors in conversion to IAST need to be corrected.

Possibly there are other errors generated in conversion to IAST.
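
One way to build a worklist for such errors is sketched below. It is only an illustration: it collects every distinct word containing ṣ so a reviewer can pick out the non-Sanskrit ones, then reverts ṣ to sh only in the reviewed words. The function names and the sample words are hypothetical, not taken from ben.txt.

```python
import re
import unicodedata
from collections import Counter

# Runs of letters (Unicode-aware); digits, underscores and markup are excluded.
WORD = re.compile(r"[^\W\d_]+")

def words_with_retroflex_s(lines):
    """Count every distinct word that contains ṣ or Ṣ, for manual review."""
    counts = Counter()
    for line in lines:
        # NFC so that 's' + combining dot below is treated like precomposed ṣ.
        for word in WORD.findall(unicodedata.normalize("NFC", line)):
            if "ṣ" in word or "Ṣ" in word:
                counts[word] += 1
    return counts

def revert_reviewed(line, reviewed):
    """Restore 'sh' only in words a reviewer has confirmed as non-Sanskrit."""
    def fix(match):
        word = match.group(0)
        if word in reviewed:
            return word.replace("ṣ", "sh").replace("Ṣ", "Sh")
        return word
    return WORD.sub(fix, unicodedata.normalize("NFC", line))

# Hypothetical sample: 'fiṣ' stands for a wrongly converted English 'fish'.
sample = ["a kind of fiṣ; {%viṣa%}"]
candidates = words_with_retroflex_s(sample)
repaired = revert_reviewed(sample[0], {"fiṣ"})
```

Keeping the reviewed list explicit means genuine Sanskrit words such as viṣa are never touched by the revert step.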

@Andhrabharati
Author

Andhrabharati commented Sep 28, 2021

[Page10xx is made [Paêxx

I don't find Paê in csl-orig/v02/ben/ben.txt.

Sorry, I missed the 'g' in '[Pagêxx'; and here is the list for it.

Line 133383: {%[Pagê03-a+ 40]%}
Line 133454: {#[Pagê03-b+ 43]#}
Line 134740: {%[Pagê13-a+ 40]%}
Line 134970: {%[Pagê14-b+ 41]%}
Line 135120: {%[Pagê15-b+ 41]%}
Line 135269: {%[Pagê16-b+ 41]%}
Line 135481: {%[Pagê18-a+ 40]%}
Line 135555: {%[Pagê18-b+ 39]%}
Line 136377: {%[Pagê24-b+ 41]%}
Line 136721: {%[Pagê27-a+ 42]%}
Line 137909: {%[Pagê36-a+ 43]%}
Line 138559: {%[Pagê41-a+ 41]%}
Line 140162: {@[Pagê53-a+ 42]@}
Line 141246: {%[Pagê62-a+ 41]%}
Line 143604: {@[Pagê81-a+ 44]@}

Seems e10 of the LN encoding got applied here to get the ê.
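
A minimal sketch of how these markers could be located and restored, assuming every damaged marker starts with `[Pagê` followed by a digit and that the lost text is exactly `e10`. The helper names are hypothetical, and the third sample line is a made-up undamaged marker used only to show that it is left alone.

```python
import re

# Assumption: '[Pagê' followed by a digit is always a damaged '[Page10...' marker.
DAMAGED = re.compile(r"\[Pagê(?=\d)")

def find_damaged(lines):
    """Yield (line_number, text) for each line holding a damaged marker (1-based)."""
    for number, text in enumerate(lines, start=1):
        if DAMAGED.search(text):
            yield number, text

def restore_page_markers(line):
    """Turn '[Pagê03-a+ 40]' back into '[Page1003-a+ 40]'."""
    return DAMAGED.sub("[Page10", line)

sample = [
    "{%[Pagê03-a+ 40]%}",
    "{#[Pagê03-b+ 43]#}",
    "{%[Page0999-b+ 41]%}",  # hypothetical undamaged marker
]
hits = list(find_damaged(sample))
fixed = [restore_page_markers(line) for line in sample]
```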

image

@Andhrabharati
Author

Just like many others, BEN scan with CDSL is also bad and has led to many errors in the digitisation.

BTW, I am on BEN for last two days and in another two days, will be posting my file, for study by the CDSL team.

@gasyoun
Member

gasyoun commented Sep 28, 2021

I am on BEN for last two days and in another two days, will be posting my file

Have I said enough times I like this guy? ))

@Andhrabharati
Author

Some more points-

  1. Quite many verbal entries (Dhatu etc.) which are in all CAP form, have no {%...%} tagging.
  2. Few wrong taggings between @...@ & %...% are seen and corrected.
  3. Plenty of dot places are typed as comma (and many dots are missing altogether); just corrected the comma places as they would reflect in <ab> and <ls> tags.

@Andhrabharati
Author

Andhrabharati commented Sep 29, 2021

Incidentally,

  1. The effect of "[Page10xx is made [Pagêxx" is seen in the metalines' <pc> content, which are all "wrong" in those 15 segments (spanning many entries)!!

Apart from displaying the pdf page, is there any use for this <pc> content, @funderburkjim?
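
A rough sketch of how the affected segments might be spotted automatically, assuming metalines carry a `<pc>` field that begins with the page number and that pages should never go backwards through the file. The metaline layout and the sample values are assumptions for illustration, not copied from ben.txt.

```python
import re

# Assumption: the page number is the run of digits right after '<pc>'.
PC = re.compile(r"<pc>(\d+)")

def page_drops(lines):
    """Yield (line_number, previous_page, page) wherever the page goes backwards."""
    previous = None
    for number, text in enumerate(lines, start=1):
        match = PC.search(text)
        if not match:
            continue  # body line, no <pc> field
        page = int(match.group(1))
        if previous is not None and page < previous:
            yield number, previous, page
        previous = page

# Hypothetical metalines; the second one mimics a <pc> that lost its '10'.
sample = [
    "<L>1<pc>1002-b<k1>x",
    "body text",
    "<L>2<pc>03-a<k1>y",
    "<L>3<pc>1004-a<k1>z",
]
drops = list(page_drops(sample))
```

Each reported drop marks the start of a suspect segment; the segment ends where the page numbers return to the expected range.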

@drdhaval2785
Contributor

In old AS notation, e10 was used for ê. Therefore, it seems to be an erroneous side-effect of converting AS to IAST.

@Andhrabharati
Author

What is AS, Alphabet-Sequence?

I read somewhere, Jim mentioning the LN notation (Letter-Number) and so mentioned in my post above.

@drdhaval2785
Contributor

There is no formal definition. We sometimes called these encoding 'Anglicized Sanskrit'.

@Andhrabharati
Author

Andhrabharati commented Sep 29, 2021

In old AS notation, e10 was used for ê. Therefore, it seems to be an erroneous side-effect of converting AS to IAST.

As I noticed, the error might've crept in, because those [Page...] strings are tagged as Sanskrit {%[Page...]%}

@Andhrabharati
Author

I thought that the 'Anglicized Sanskrit' term is more used for words like Sanskrit, Aryan, Brahmin etc. as mentioned by MW!!

@Andhrabharati
Author

Yes, I took this shortcut here in benfey, so tooltips for the works would be available in displays.
At some future time we may decide to fully mark those for which there is a linkable target.

@funderburkjim

In some other thread, there was a discussion on using PDFs for linking to the citations in CDSL dictionaries.

I just got reminded of this, as Benfey had used Gorresio ed. of Ramayana.

Seems Gorresio had spent 24 years of his life in bringing out his Ramayana (critical) ed., at the behest of Burnouf, and it got very popular in the Western countries those days.

And @gasyoun was pondering on whereabouts of the Bombay ed. and Calcutta ed. that are widely referred in the "European" Lexicons of Sanskrit.

One can find these and many more editions of Ramayana at http://onlinebooks.library.upenn.edu/webbin/book/lookupname?key=V%26amacr%3Blm%26imacr%3Bki

So you may think again on using the PDF-links for the citations across all the CDSL works.

@funderburkjim

funderburkjim commented Sep 29, 2021

AS, Alphabet-Sequence?

This terminology is due to @thomasincambodia , who originally used it in mw; see CDSL.pdf, where he termed it 'Anglicized Sanskrit'.

Over time, Thomas has used variations of his original AS notation; and has extended the usage to represent any Latin alphabet-with-diacritics in whatever language. I thought it best to remove this letter-number representation in the digitizations, by replacing the letter-number codes with Unicode characters.

In this replacement process, there is always the issue that some letter-number sequences should NOT be replaced by Unicode; for example the 'e10' in [Page10 should not be changed to e-circumflex (generally Thomas uses the number 10 to indicate 'circumflex').

It is good that you point out the erroneous conversion of 'e10' to circumflex in Benfey. These need to be changed.
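
The exclusion described above can be sketched as follows. This is not the conversion program actually used for the digitizations; it is a minimal illustration that converts letter-number codes everywhere except inside bracketed `[Page...]` markers. The one-entry table and the sample word `fe10te` are assumptions chosen for the example.

```python
import re

# Illustrative table only; a real conversion table would hold many more codes.
AS_TO_UNICODE = {"e10": "ê"}
CODE = re.compile("|".join(re.escape(code) for code in AS_TO_UNICODE))

# Spans that must be copied through untouched.
PROTECTED = re.compile(r"\[Page[^\]]*\]")

def _convert_plain(text):
    """Replace every known letter-number code in unprotected text."""
    return CODE.sub(lambda m: AS_TO_UNICODE[m.group(0)], text)

def convert(line):
    """Convert codes to Unicode, leaving [Page...] markers exactly as they are."""
    pieces, position = [], 0
    for marker in PROTECTED.finditer(line):
        pieces.append(_convert_plain(line[position:marker.start()]))
        pieces.append(marker.group(0))  # protected span, copied verbatim
        position = marker.end()
    pieces.append(_convert_plain(line[position:]))
    return "".join(pieces)

converted = convert("fe10te {%[Page1003-a+ 40]%}")
```

Splitting the line into protected and unprotected pieces keeps the rule local: adding another kind of protected span only means extending the `PROTECTED` pattern.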

@Andhrabharati
Author

Your ref. to this paper by Thomas has reminded me of another wish in MW revision; to add Winternitz's corrections, apart from incorporating MW's own addenda into the main text.

And can you get from Thomas, the details of other 'private' works that he was mentioning in this paper?

@funderburkjim

can you get from Thomas,

Suggest you make a new issue regarding mw, and address question to @thomasincambodia .

@Andhrabharati
Author

Just like many others, BEN scan with CDSL is also bad and has led to many errors in the digitisation.

See for example, the scan page
image

and the text
image

Here are two corresponding good scans-
image

image

And I also have a photocopy of a good print.

@Andhrabharati
Author

@funderburkjim,

Can you think of some plan by which we can correct those bad places in the text?

Full proofing is the best way out, but it definitely takes more time.
Just browsing through the Cologne scan, to identify "bad areas", is one possibility that comes to my mind.

@gasyoun
Member

gasyoun commented Sep 30, 2021

Plenty of dot places are typed as comma (and many dots are missing altogether)

So a few thousand of them in each dictionary.

Just browsing through the Cologne scan, to identify "bad areas", is one possibility that comes to my mind.

One can't browse such an amount in full. Only randomly.

to add Winternitz's corrections, apart from incorporating MW's own addenda into the main text.

Do you have a link to the Winternitz's corrections?

Gorresio had spent 24 years of his life in bringing out his Ramayana (critical) ed., at the behest of Burnouf, and it got very popular in the Western countries those days.

Yes, the links to the Ramayana and Mahabharata are what come to mind, but we lack an idea of what exactly to do, as the older editions were never digitised, only scanned. If we can't link to the exact shloka in the book, linking at least to the chapter would make sense. @Andhrabharati, in the case of Gorresio, which scan would you propose?

@Andhrabharati
Author

I have marked the Greek text places with ???, and seen someone's "handwritten notes" coming onto the digitisation at one place.

<H>{#श#} {%Ś%}. = <lang n="greek">???</lang>
; Here it is not a Greek text in print, but just someone's handwritten text "= Bopp's ς◌́" (Greek letter Ending Sigma with acute accent?) in his copy!
image

Here is the corresponding image from another scan-

image

@Andhrabharati
Author

Just browsing through the Cologne scan, to identify "bad areas", is one possibility that comes to my mind.

One can't browse such an amount in full. Only randomly.

It all depends on the person on the job!

to add Winternitz's corrections, apart from incorporating MW's own addenda into the main text.

Do you have a link to the Winternitz's corrections?

Yes, I do have the PDF.

Gorresio had spent 24 years of his life in bringing out his Ramayana (critical) ed., at the behest of Burnouf, and it got very popular in the Western countries those days.

Yes, the links to the Ramayana and Mahabharata are what come to mind, but we lack an idea of what exactly to do, as the older editions were never digitised, only scanned. If we can't link to the exact shloka in the book, linking at least to the chapter would make sense. @Andhrabharati, in the case of Gorresio, which scan would you propose?

I recall @funderburkjim asking you to make this a student's project, marking the pdf page number against the citation, so that the page can be displayed.

I have two diff. scans of Gorresio volumes. Need to look into both, to decide which is the better one.
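
As a rough illustration of the page-marking idea, the sketch below maps a citation to the PDF page where its chapter starts and falls back to the nearest earlier indexed chapter of the same book when the exact chapter is missing. The index values, function names and URL are placeholders, not real page numbers from any Gorresio scan.

```python
from bisect import bisect_right

# Hypothetical index: (book, chapter) -> first PDF page of that chapter.
# Real values would have to be collected by hand from the chosen scan.
CHAPTER_START = [((1, 1), 35), ((1, 2), 41), ((2, 1), 210)]
KEYS = [key for key, _ in CHAPTER_START]

def pdf_page(book, chapter):
    """Return the page of the chapter, or of the nearest earlier indexed
    chapter in the same book; None if the book is not indexed at all."""
    index = bisect_right(KEYS, (book, chapter)) - 1
    if index < 0 or KEYS[index][0] != book:
        return None
    return CHAPTER_START[index][1]

def pdf_link(base_url, page):
    """Build a link that opens the PDF at the given page."""
    return f"{base_url}#page={page}"

link = pdf_link("https://example.org/gorresio-vol1.pdf", pdf_page(1, 2))
```

A sorted list with `bisect` lets the table stay sparse: only chapter starts need to be recorded, and any citation inside a chapter resolves to that chapter's first page.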

@Andhrabharati
Author

Here is the file, @gasyoun-
MW99-Review by Winternitz.pdf

@gasyoun
Member

gasyoun commented Sep 30, 2021

Greek letter Ending Sigma with acute accent ?) in his copy!

I do not see any Greek here, just the French ç

@maltenth

Your ref. to this paper by Thomas has reminded me of another wish in MW revision; to add Winternitz's corrections, apart from incorporating MW's own addenda into the main text.

And can you get from Thomas, the details of other 'private' works that he was mentioning in this paper?

@Andhrabharati
can you be more specific?

@maltenth

I have marked the Greek text places with ???, and seen someone's "handwritten notes" coming onto the digitisation at one place.

<H>{#श#} {%Ś%}. = <lang n="greek">???</lang> ; Here it is not a Greek text in print, but just someone's handwritten text "= Bopp's ς◌́" (Greek letter Ending Sigma with acute accent?) in his copy! image

Here is the corresponding image from another scan-

image

this refers to one of the pre-IAST transliterations of श, viz. S' (S followed by accent aigu)

@Andhrabharati
Author

Your ref. to this paper by Thomas has reminded me of another wish in MW revision; to add Winternitz's corrections, apart from incorporating MW's own addenda into the main text.
And can you get from Thomas, the details of other 'private' works that he was mentioning in this paper?

@Andhrabharati can you be more specific?

I was referring to your statement under "Further corrections and tags" in 1.5 of the CDSL.pdf, @thomasincambodia

@maltenth

maltenth commented Oct 1, 2021

@Andhrabharati

"private corrections lists" should have been "unpublished correction lists"
I have yet to come across any.

@maltenth

maltenth commented Oct 1, 2021

I thought that the 'Anglicized Sanskrit' term is more used for words like Sanskrit, Aryan, Brahmin etc. as mentioned by MW!!

I would rather call these Sanskrit loanwords especially when the form has been altered/adapted to English, but there are borderline cases: Rigveda, pandit, karma, Shiva etc.
AS would be definitely those Sanskrit words that are used with diacritics, as Boehtlingk uses them in boesp.:

Pa1nduiden Pa1n2ini Pa1riga1ta Pa1rtha Pa1rvati1 Pa1t2ala1 Pa1t2ala1-Blüthe
Pa1ta1laketu Pr2thu Ra1dha1 Ra1jagr2ha
Ra1hu Ra1kshasa Ra1ma Ra1ma1jan2a Ra1van2a Si1ta1 Su1kimukha
Su1ryaka1nta-Steine

It can be said that AS words are always nouns/names. Also mark the initial capital, as there are no capital letters in Indian scripts.
here, AS should be more appropriately termed AG (Anglicized German) but AS could cover any Roman script.
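
To make the notation concrete, here is a small sketch that decodes words like those listed above. The digit meanings are inferred from the examples (1 as macron, 2 as dot below) and are an assumption, not an official table; the function name is hypothetical.

```python
import re
import unicodedata

# Inferred from the word list: 1 = macron, 2 = dot below (assumption).
MARKS = {"1": "\u0304", "2": "\u0323"}

# A letter followed by a single code digit; a following digit (as in 'e10')
# means a different code, so those sequences are left untouched here.
CODE = re.compile(r"([A-Za-z])([12])(?!\d)")

def as_to_unicode(text):
    """Replace letter-number codes with the letter plus a combining mark,
    then compose to single precomposed characters where they exist."""
    combined = CODE.sub(lambda m: m.group(1) + MARKS[m.group(2)], text)
    return unicodedata.normalize("NFC", combined)

decoded = [as_to_unicode(w) for w in ("Pa1n2ini", "Ra1jagr2ha", "Pa1rvati1")]
```

Using combining marks plus NFC normalization avoids maintaining a separate entry for every letter and diacritic pair.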

@gasyoun
Member

gasyoun commented Oct 2, 2021

AS should be more appropriately termed AG (Anglicized German)

Interesting thought.

@maltenth

maltenth commented Oct 2, 2021

sorry, of course meant GS = Germanized Sanskrit
but I think AS might do for any language, including Russian, French, Catalan , etc.

@drdhaval2785
Contributor

I agree with Thomas's viewpoint that discussions can be pursued without unnecessary judgmental tone.

I would like to appreciate hard work done by Thomas and his team, of which we are reaping fruits. To be blunt, whatever we do here in this repository is ultimately a correction or feature addition to the work which was handed over to us because of the hard work put in by Thomas et al.

I also appreciate the fact that @Andhrabharati is quite methodical in his approach and has been bringing forth many issues which have not been attended to hitherto. We need that vigour too.

Kind request is to focus on content of the issue being raised, and keep the value judgments away from discussion.

@gasyoun
Member

gasyoun commented Oct 2, 2021

Agree with every word of Dhaval.
We have come here not to fight
against each other, but to grow
the seeds Thomas has planted.

@gasyoun added the bug label on Oct 2, 2021

@Andhrabharati
Author

Andhrabharati commented Oct 2, 2021

For the last 4 days, I was completely bed-ridden (with high-fever), away from the computer. Just started sitting at the computer since this evening.

So, I am just posting my BEN_main work as is, without spending any more time, though I had many pending aspects to cover in it.
[The Addenda part is separated from this text, as it was intended to be incorporated into the main text, as done in my IEG work.]
ben_Main.txt

This work is made in a format close enough to the CDSL one, and I hope it will be accepted as is.

Not many comments henceforth from my side, as my words are harsh at times (as they come from my heart, without any bad intention), but they seem unbearable.

Just like to say now that the ls count increased from 113 to 219 & ab count from 107 to 282.
[There were many interesting points noticed in this work, but unfortunately I have decided to shut my mouth.]

And until I hear back about my MW etym., IEG & this BEN works, will take good rest doing nothing (for CDSL, of course!).

@maltenth

maltenth commented Oct 3, 2021

I think AS might do for any language, including Russian, French, Catalan , etc.

Sanskrit being the language of the Gods, all other languages are the languages of Angels, hence AS = Anglicized Sanskrit.

@funderburkjim

Apart from displaying the pdf page, is there any use for this <pc> content?

No, the main purpose is to provide a link between the entry and the printed text, which is available from the scan.

@Andhrabharati
Author

Andhrabharati commented Oct 3, 2021

Sanskrit being the language of the Gods, all other languages are the languages of Angels, hence AS = Anglicized Sanskrit.

Nice idea!
(But then, shouldn't it be Angelicised Sanskrit?)

@funderburkjim

Just browsing through the Cologne scan, to identify "bad areas"

I browsed through the first 200 pages of the old scans and found several places, especially where the image was skewed. But this did not lead to finding places like the one in niryUha. So this approach does not look promising.

@maltenth

maltenth commented Oct 3, 2021

Anglicized and Anglicised are just spelling variants.

But if we take angels as the base
it should be Angelized Sanskrit or perhaps Angelicized or Angelified

@maltenth

maltenth commented Oct 3, 2021

one more: Angelificated

@funderburkjim

I was completely bed-ridden

Sorry to hear that; hope you will recover quickly and completely.

@Andhrabharati
Author

I have recovered, @funderburkjim; only a little weakness, no obstacle to working.

@Andhrabharati
Author

Andhrabharati commented Oct 30, 2021

Here are the 4 vol.s of the Calcutta ed. of Mahabharata (that are referred to by all the early European Sanskrit works)-

image
image
image
image

@Andhrabharati
Author

Andhrabharati commented Oct 30, 2021

And the 'associated' Harivamsa-

image

[All the 5 books above are digitised (scanned) by Google.]

@Andhrabharati
Author

I have two diff. scans of Gorresio volumes. Need to look into both, to decide which is the better one.

And, here are the 10 vol.s of Gorresio's Ramayana-
image
image
image
image
image
image
image
image
image
image

[All these are digitised (scanned) by Google.]

@Andhrabharati
Author

Andhrabharati commented Feb 13, 2022

@jmigliori

I have noticed many Greek words in Benfey dictionary having a Roman 'j' in between and found that it denotes some gliding sound.

And there is one word having a Roman 'y' in it - σαγyω, at the entry word 1. सञ्ज् (p. 996).
Does this also have some significance (as the 'j' above)?

Here is the page image for your reference-
image

@jmigliori

jmigliori commented Feb 13, 2022 via email

@Andhrabharati
Author

Andhrabharati commented May 2, 2022

@jmigliori

Lately I have been filling in the Greek text in BOPP's glossary, and found that the “j” character (mentioned in my post above) is not the Roman j (U+006A), but the Greek small letter 'yot' “ϳ” (U+03F3).

Now I would like to request you to please identify the character after τ, occurring in BOPP's work-
image

Is it 'σ' (as in the preceding greek word group), and is there any reference to this word somewhere?

@Andhrabharati
Author

Here is the BEN_main.txt with greek strings filled up--
BEN_main_L2a.txt

Now, this stands corrected for the j (u+006A) > ϳ (u+03F3), as mentioned above.
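
A minimal sketch of this kind of replacement, assuming the Greek strings are wrapped in `<lang n="greek">…</lang>` as in the earlier sample, so that a Roman j elsewhere in the text is left alone. The function name and the sample string are hypothetical.

```python
import re

LATIN_J = "j"        # U+006A
GREEK_YOT = "\u03f3"  # U+03F3, Greek letter yot

# Non-greedy match so each Greek span on a line is handled separately.
GREEK_SPAN = re.compile(r'(<lang n="greek">)(.*?)(</lang>)')

def fix_yot(line):
    """Replace Roman j with Greek yot, but only inside Greek-tagged spans."""
    def repair(match):
        opening, content, closing = match.groups()
        return opening + content.replace(LATIN_J, GREEK_YOT) + closing
    return GREEK_SPAN.sub(repair, line)

# Hypothetical sample: the j outside the span must survive.
fixed = fix_yot('just <lang n="greek">αjβ</lang>')
```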
