INM Study #34

Open
Andhrabharati opened this issue Oct 30, 2021 · 49 comments
Labels
documentation Improvements or additions to documentation

Comments

@Andhrabharati

Andhrabharati commented Oct 30, 2021

I started with the (xml) header file.

The Author's name has ø in place of ö (which is properly rendered at Köln, though).

Though it (ø) might be correct in some notation (which I am not aware of), it does look strange esp. when it is possible to use the letter 'ö'.

@Andhrabharati
Author

The Calcutta ed. of Mahabharata -

#32 (comment)

@Andhrabharati
Author

Andhrabharati commented Oct 30, 2021

This is what Sörensen says, in his preface-

The text on which the Index is based is, on the whole, the Calcutta edition. Readings from other sources (i.e. almost exclusively the Bombay edition) have only been noticed for special reasons.

Andhrabharati changed the title from "INM issues" to "INM Study" on Oct 30, 2021
@Andhrabharati
Author

Andhrabharati commented Oct 30, 2021

The superscripts:

  1. There are a few numbers which were marked thus, instead of their preceding numbers!
<sup>1133</sup>, <sup>1817</sup>, <sup>1843</sup>, <sup>1862</sup>
<sup>1925</sup>, <sup>1957</sup>, <sup>1963</sup>, <sup>2010</sup>
<sup>2019</sup>, <sup>2715</sup>, <sup>2836</sup>
  2. The small letters of Greek and (italic) English are wrongly enclosed in the bold tags ({@...@}) along with the verse numbers. They all are to be outside the bold tags.

  3. There is one place where the superscript character [paragraph mark (¶), the pilcrow] is "missed" (<sup></sup>), and also at its corresponding "FN string".

  4. At one entry (अङ्गिरस), the FNs for ‡ (12 times) and || (twice) are repeated. Is it (FN repetition) required to be so, even if the marks appear so many times in the main text? I've never come across such an instance so far!

  5. There are about 480 occurrences of Do.<sup>x</sup> under another HW (mostly with a <sup>), which could be split as new entries, replacing the "Do." with the Parent HW.
    [And there are some "Do." strings with a different gender or number wrt the HW entry; these could also be split.]

@Andhrabharati
Author

Andhrabharati commented Oct 31, 2021

Did the Concordance portion in the INM text.

Padded the Parva numbers (in braces) to the entries, and then noted that it is exactly how it is intended by the compiler(s) (as seen in the Additions & Corrections at p. 790).
[Glad that I could correctly grasp the (intended) theme, in my thinking!]

Also incorporated the corrections into the text, as appropriate.

The Small Cap. titles of the 100 upa-parvas are marked with ¤, as noted in my PWG ls working.
INM_Concordance.txt

@Andhrabharati
Author

Also did the listed abbr.s and updated the list from the A&C page.
INM_Abbr.txt

@Andhrabharati
Author

Andhrabharati commented Oct 31, 2021

Here is the Concordance as prepared by Hermann Jacobi in 1903.
Mahabharata Concordance (German) Hermann Jacobi.pdf

This has the references to the Madras edition, in addition to the Bombay and Calcutta editions.

@Andhrabharati
Author

Looked at the meta2 file.

{??} : Unreadable text. about 100 instances.

I've filled up all the {??} places in all the CDSL files sometime back, and they were incorporated by @drdhaval2785 into the resp. main (production) files. All the {??} in all the meta2 files could be removed.

Often line breaks have semantic significance, such as indicating the row of a table. Three types of line break divisions are indicated, although the distinction among these types is not currently understood.
<div n="lb"> Line break
<div n="P"> Line break with indentation
<div n="HI"> Line break - Possible list of headwords; 22 instances.

The <div n="HI"> entries indeed are the HW entries, but are presently merged with the prev. entries in the txt file. I have split them now, with a "xxx.1" L-number. Many of these are lists of words that are to be appropriately split as separate entries.

The <div n="P"> entries are mostly the synonyms (or section names, related words, etc.) of the HW entry, and can be appropriately handled. Many of these have just a "q.v.", but quite a few have full expansions. They could be treated as sub-entries, listed (with some marking, say a bullet, ●) under the main entry, without splitting them into separate entries like the <div n="HI"> or the "Do. xxx" entries (which I have split with a "xxx.1" L-number) mentioned in my post under "The superscripts" head above. I have marked them thus in my working now.

The <div n="lb"> are the regular line-breaks. As concluded in the "Line markers" issue (#26), these are to carry no marks at all (agreed by all the team; Jim asked Dhaval to do it at #26 (comment), but it does not seem to have happened yet), so I have removed all of them here.
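
As a rough illustration of the three division types discussed above, the sketch below tallies the `<div n="...">` markers and drops only the plain `lb` ones. The function names and the sample text are hypothetical. It assumes the markers appear literally as `<div n="lb">`, `<div n="P">` and `<div n="HI">`, as quoted from the meta2 file; whether anything should replace a removed marker is not settled here.

```python
import re
from collections import Counter

# Matches the division markers quoted from the meta2 file, e.g. <div n="lb">.
DIV = re.compile(r'<div n="([^"]+)">')

def count_div_types(text: str) -> Counter:
    """Tally division markers by their n attribute."""
    return Counter(DIV.findall(text))

def strip_plain_line_breaks(text: str) -> str:
    """Remove only the <div n="lb"> markers; P and HI markers are kept."""
    return text.replace('<div n="lb">', "")

# Hypothetical sample, not taken from inm.txt.
sample = (
    '<div n="lb">first line\n'
    '<div n="P">indented line\n'
    '<div n="HI">possible headword list\n'
    '<div n="lb">last line\n'
)

print(count_div_types(sample))
print(strip_plain_line_breaks(sample))
```

Counting first makes it easy to confirm that only the `lb` markers disappear and that the `P` and `HI` counts are unchanged afterwards.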

Also did some punctuation corrections in the txt file.

This is the basic file I have started with, for the main pages content.
inm-devanagari-main_L0.txt

All other portions are split as separate files- (i) Preface, (ii) Postscript , (iii) Abbreviations, (iv) Concordance & (v) Additions and Corrections.

Now, I will be filling up the ~10000 Greek letters (plain or accented) in the text, in the next couple of days.

And finally, the A&C content will be incorporated in the main text.

Estimating the whole exercise to be over in 10-15 days' time.

@Andhrabharati
Author

<C n="N"> (N=1,2,3,4,5,6). Indicates columns in a tabular display. 200 instances.

I guess, many of these instances can better be rendered as pictures, instead of tables (which they are not!).

I will be marking them as appropriate, during my working.

@gasyoun
Member

gasyoun commented Nov 1, 2021

The Author's name has ø in place of ö

And so should remain, see https://en.wikipedia.org/wiki/S._P._L._S%C3%B8rensen

[And there are some "Do." strings with a different gender or number wrt the HW entry; these could also be split.]

So you propose alternate headwords.

@Andhrabharati
Author

And so should remain, see https://en.wikipedia.org/wiki/S._P._L._S%C3%B8rensen

You mean the original print had it wrong?
image

image

image

@Andhrabharati
Author

This is what the Wikipedia says in "Danish orthography"-

image

The reform of 1948 abolished the capitalization of all nouns.

Interesting to know that Danish also has no capitalization, like the Indic scripts!

@Andhrabharati
Author

Whatever it is, the spelling should be kept uniform across all CDSL pages, isn't it?
[Except at this xml file, all other places seem to have the name spelt as in the print book.]

@Andhrabharati
Author

Andhrabharati commented Nov 1, 2021

5. There are about 480 occurrences of `Do.<sup>x</sup>` under another HW (mostly with a `<sup>`), which could be split as new entries, replacing the "Do." with the Parent HW.

The following 69 entries have no <sup> number at the header, but contain subsequent <sup> entries inside the matter-

<L>1929, <L>2075, <L>3039, <L>3074, <L>5112, <L>5403, <L>5404, <L>5708, <L>5978, <L>5981,
<L>6552, <L>6649, <L>7351, <L>7355, <L>7497, <L>7719, <L>7726, <L>7740, <L>7795, <L>7901,
<L>7948, <L>8084, <L>8108, <L>8139, <L>8192, <L>8312, <L>8426, <L>8543, <L>8797, <L>8812,
<L>8834, <L>8838, <L>9135, <L>9273, <L>9329, <L>9391, <L>9392, <L>9551, <L>9561, <L>9600,
<L>9624, <L>9657, <L>9671, <L>9679, <L>9778, <L>9808, <L>9831, <L>9848, <L>10411, <L>10647,
<L>10685, <L>10864, <L>10871, <L>10961, <L>11059, <L>11332, <L>11647, <L>11766, <L>11781, <L>11795,
<L>11950, <L>12009, <L>12371, <L>12394, <L>12408, <L>12412, <L>12456, <L>12494, <L>12543

These need to be with h-number in the metaline, as appropriate.
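
A check of this kind could be sketched as below. This is only an illustration under stated assumptions, not the procedure actually used for the list above: it assumes each entry starts with a metaline beginning `<L>nnn`, ends at `<LEND>`, and opens with a bold headword `{@...@}` that is optionally followed by `<sup>n</sup>`. The function name and the sample entries are hypothetical.

```python
import re

# Assumed entry layout: metaline "<L>nnn...", then the body, then "<LEND>".
ENTRY = re.compile(r"<L>([\d.]+).*?\n(.*?)<LEND>", re.S)
# Assumed header: leading bold headword, optionally followed by a <sup> number.
HEAD = re.compile(r"\{@.*?@\}(<sup>\d+</sup>)?")

def entries_missing_header_sup(text: str) -> list[str]:
    """Return L-numbers whose header lacks <sup> while the body has one."""
    flagged = []
    for lnum, body in ENTRY.findall(text):
        body = body.lstrip()
        head = HEAD.match(body)
        if head is None:
            continue  # entry does not start with a bold headword
        rest = body[head.end():]
        if head.group(1) is None and "<sup>" in rest:
            flagged.append(lnum)
    return flagged

# Hypothetical entries, not taken from inm.txt.
sample = (
    "<L>1<pc>001-a<k1>aa\n"
    "{@Aa@}<sup>1</sup>, text. Do.<sup>2</sup>, more.\n"
    "<LEND>\n"
    "<L>2<pc>001-a<k1>bb\n"
    "{@Bb@}, text. Do.<sup>2</sup>, more.\n"
    "<LEND>\n"
    "<L>3<pc>001-b<k1>cc\n"
    "{@Cc@}, plain text.\n"
    "<LEND>\n"
)

print(entries_missing_header_sup(sample))
```

Only the second sample entry is reported, since its headword has no number of its own but a numbered "Do." follows inside the entry.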

@gasyoun
Member

gasyoun commented Nov 1, 2021

10000 Greek letters (plain or accented) in the text, in the next couple of days.

So quick.

You mean the original print had it wrong?

You win.

@Andhrabharati
Author

Andhrabharati commented Nov 2, 2021

Here is the file with first phase corrections-
inm-devanagari-main_L1.txt
[I am trying to make the corrections (mostly) in a controlled manner now, for a change!]

Noticed that the Greek letters are in italics when after a verse number or adhyAya number, and the § numbers are followed by Roman (English) letters in italics (is it always so?); and some Greek strings are in normal type (not italics).

Now filling the Greek letters (italics) [without the <lang> tag] and the Greek strings [with the <lang> tag].

@Andhrabharati
Author

Here are the Preface and Postscriptum files-
INM_Preface.txt
INM_Postscript.txt

@Andhrabharati
Author

Andhrabharati commented Nov 2, 2021

Found 40 sh words (unconverted to ṣ) in the A&C pages content, out of the total 46 count (other 6 are in English).
words with sh remained unconverted to ṣ.txt

And then found 2 such words in the main pages- Ṛshabha & Ishupad.
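
A search like this one can be sketched as follows. It is a minimal illustration, not the method actually used: it simply lists every word containing "sh" and leaves out a caller-supplied set of English words, so the remaining hits still need a manual look. The function name, the `english` parameter and the sample sentence are hypothetical.

```python
import re
import unicodedata

# A "word" here is a run of letters (digits and underscore excluded).
WORD = re.compile(r"[^\W\d_]+")

def words_with_sh(text: str, english=()) -> list[str]:
    """List words containing 'sh', skipping the given English words."""
    skip = {w.lower() for w in english}
    # NFC keeps letters such as Ṛ as single precomposed characters,
    # so they are not split off from the rest of the word.
    text = unicodedata.normalize("NFC", text)
    return [
        w for w in WORD.findall(text)
        if "sh" in w.lower() and w.lower() not in skip
    ]

# Hypothetical sentence using the two words reported above.
sample = "Ṛshabha and Ishupad, she wished"
print(words_with_sh(sample, english=("she", "wished")))
```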

@Andhrabharati
Author

Here is the A&C file prepared to be incorporated into the main text.
INM_A & C.txt

It has 55 "Do. strings", which need to be split (as done in the main text) while merging.

@gasyoun
Member

gasyoun commented Nov 2, 2021

some Greek strings are in normal type (not italics).

But you mark the italic ones, right?

@Andhrabharati
Author

Andhrabharati commented Nov 2, 2021

will be doing at the end (by regex); but guess they can be done while displaying with the info I had given above

Noticed that the Greek letters are in italics when after a verse number or adhyAya number

@Andhrabharati
Author

@gasyoun
do you have any Danish friends, who can explain about this (from the above Wiki article)?-

although the orthographic dictionaries continued to use ø and ö (collated as if they were the same letter) until 1918.

@Andhrabharati
Author

Revised A&C file-
INM_A & C_L2.txt

gasyoun added the documentation label on Nov 2, 2021
@Andhrabharati
Author

some Greek strings are in normal type (not italics).

But you mark the italic ones, right?

you mean to mark these normal strings in italics?

how would I go against the print style, when they have some (underlying) theme?

@gasyoun
Member

gasyoun commented Nov 3, 2021

how would I go against the print style, when they have some (underlying) theme?

Not against, but to replicate what we observe. Normal should remain normal.

@Andhrabharati
Author

how would I go against the print style, when they have some (underlying) theme?

Not against, but to replicate what we observe. Normal should remain normal.

yes, I do replicate the book data.

@Andhrabharati
Author

Just happened to note the "spaced" names in INM as well, which are quite frequently occurring in the early generations of European works (esp. PWG, pwk); but these INM words are not marked as in the PWG/pwk digitised texts.

image

Is there any specific notation for such spaced words in those early works?

I presume that @thomasincambodia and/or @SergeA can throw some light on this, if in good moods.

@gasyoun
Member

gasyoun commented Nov 12, 2021

"spaced" names

Agree it's here.

@maltenth

maltenth commented Nov 13, 2021

spaced words are marked with {|...|}.
the original notation is this:
<>{%yoga%} and hymned {%Na1ra1yan2a%} (<g></g>) [i.e. {|Maha1purushastava|}]
there are 252 such markings in INM.
I don't know what purpose they are supposed to serve in INM, and there is no way of displaying s p a c e d words
in modern type setting, except by adding a space after each character. G o o d l u c k searching for them, or breaking t h e - m (note the spacing of m) at the end of a line, or displaying words with one o r two characters. Generally they are awkward customers, and I (note the spacing of I) had rather say N O to them.
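
For anyone wanting to list these markings, a minimal sketch is given below. It assumes the spaced words are delimited exactly as `{|...|}` with no nesting, as in the line quoted above; the function name is hypothetical and the sample is that quoted line.

```python
import re

# Non-greedy match of the text between {| and |}.
SPACED = re.compile(r"\{\|(.+?)\|\}")

def spaced_words(text: str) -> list[str]:
    """Return the contents of every {|...|} marking in the text."""
    return SPACED.findall(text)

# The line quoted above from the original notation.
sample = "<>{%yoga%} and hymned {%Na1ra1yan2a%} (<g></g>) [i.e. {|Maha1purushastava|}]"
print(spaced_words(sample))
```

Running the same function over a whole file and taking `len()` of the result gives a count that can be compared with the figure of 252 mentioned above.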

@Andhrabharati
Author

the original notation is this:
<>{%yoga%} and hymned {%Na1ra1yan2a%} (<g></g>) [i.e. {|Maha1purushastava|}]
there are 252 such markings in INM.

Thank you @thomasincambodia, for this info; so this is yet another case of losing some details from original digitisation (over time, for whatever reason).

It is surprising why some (present) Koeln texts have retained the notation and some have "lost" it.

I don't know what purpose they are supposed to serve in INM, and there is no way of displaying s p a c e d words in modern type setting

On a 2nd thought, these appear to denote the entry words in the book (so is the case, mostly, in PWG etc.) but without a "(q.)v." tag!

I do agree that they look awkward with spacing (at present times' printing style), but guess they could somehow be displayed separately; say with a different font (but without the spacing)- probably with a sans font, as against the serif font used throughout otherwise.

@Andhrabharati
Author

And with the clue from @thomasincambodia, I could re-introduce those 252 tags in the text, even if they are "not used" at the moment in any manner.

@maltenth

N o h a r m , I a m s u r e.

@gasyoun
Member

gasyoun commented Nov 13, 2021

there is no way of displaying s p a c e d words in modern type setting

There is in CSS.
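
To make the CSS remark concrete, here is a hedged sketch that turns a `{|...|}` marking into an HTML `<span>` using the CSS `letter-spacing` property, so the word stays searchable as ordinary text. The function name and the `0.2em` value are assumptions for illustration only; this is not how the Cologne displays currently render such words.

```python
import html
import re

SPACED = re.compile(r"\{\|(.+?)\|\}")

def render_spaced(text: str, spacing: str = "0.2em") -> str:
    """Replace each {|word|} with a letter-spaced HTML span."""
    def repl(match: re.Match) -> str:
        # Only the marked word is escaped; the rest of the text is left as is.
        word = html.escape(match.group(1))
        return f'<span style="letter-spacing: {spacing}">{word}</span>'
    return SPACED.sub(repl, text)

print(render_spaced("[i.e. {|Mahapurushastava|}]"))
```

Because the spacing is purely visual, the underlying text has no inserted blanks, which avoids the search and line-breaking problems raised earlier in the thread.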

On a 2nd thought, these appear to denote the entry words in the book (so is the case, mostly, in PWG etc.) but without a "(q.)v." tag!

Same thought I have.

@Andhrabharati
Author

Pl. see this-

image

It is customary to write the rṛ as र्ऋ, not as रृ in the Devanagari script.
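
In code point terms, the two spellings differ as shown in the sketch below: रृ is RA followed by the vowel sign for ṛ, while र्ऋ is RA, virama, and the independent vowel ऋ. The function names are hypothetical and the sample word is only an illustration; a blanket replacement like this assumes every such sequence in the data is meant as rṛ, which should be verified before applying it to a real file.

```python
WRONG = "\u0930\u0943"        # र + ृ  (RA + vowel sign vocalic R)
RIGHT = "\u0930\u094D\u090B"  # र + ् + ऋ (RA + virama + letter vocalic R)

def count_r_vocalic_r(text: str) -> int:
    """Count occurrences of the रृ spelling."""
    return text.count(WRONG)

def fix_r_vocalic_r(text: str) -> str:
    """Rewrite रृ as र्ऋ."""
    return text.replace(WRONG, RIGHT)

# Illustrative word (nirṛti) written with the रृ spelling.
sample = "\u0928\u093F\u0930\u0943\u0924\u093F"
print(count_r_vocalic_r(sample), fix_r_vocalic_r(sample))
```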

Request @drdhaval2785 & @funderburkjim to have a look at this in all the CDSL works.

Seen the same issue in PWG as well-

image

@Andhrabharati
Author

Now filling the Greek letters (italics) [without the <lang> tag] and the Greek strings [with the <lang> tag].

Completed filling the Greek letters.

Interestingly there is just one Greek word, in the whole book, under <L>93 (p. 8), <lang n="greek">Ἀσσακηνοί</lang> and it has only the starting capital letter as non-italic, all other letters of the word are in italics.

image

Can we ask the Greek expert (@jmigliori), if the full word can be put in italics?

@Andhrabharati
Author

Looked into the utf8 file, that was indicated by @funderburkjim sometime back.

It has 55 spaced word markings in the Concordance pages, 2 in the A&C pages (Concordance) and 301 in the Main pages (a total of 358, not 252).
spaced word markings in utf8 file.txt

Many of these are either captions or section (§) names, and others are proper names (initial cap., mostly with diacritics). Some words that are not proper names (no initial cap.), like manas, buddhi, nakshatra etc., are also indicated as spaced words.

And some of these included the English words also! So, we cannot presume that these are all of the AS type words as in PWG/pwk.

As plenty of the words in this list (the extracted file above) are seen elsewhere throughout the text without spacing or not in italic or bold letters, it is thought better to leave the matter untouched in the present Cologne file.

@jmigliori

jmigliori commented Nov 14, 2021 via email

@Andhrabharati
Author

Thanks @jmigliori.

Another small query-

image

Except here on p. 10, where the ̕ (U+0315) character is seen twice, all other places seem to use the ʹ (U+0374) character (183 times), which I understand is a numeral mark in Greek.
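
The two marks can be told apart programmatically, as in the sketch below, which counts each and prints its Unicode name. The function name and the sample string are hypothetical. The count is taken on the text as stored, since Unicode normalization (as far as I know) maps U+0374 to U+02B9 and would hide the distinction.

```python
import unicodedata
from collections import Counter

# U+0315 COMBINING COMMA ABOVE RIGHT, U+0374 GREEK NUMERAL SIGN
MARKS = ("\u0315", "\u0374")

def count_marks(text: str) -> dict[str, int]:
    """Count the two marks, keyed by code point and Unicode name."""
    counts = Counter(ch for ch in text if ch in MARKS)
    return {
        f"U+{ord(ch):04X} {unicodedata.name(ch)}": n
        for ch, n in counts.items()
    }

# Hypothetical sample: two numeral signs and one combining comma.
sample = "\u03B1\u0374 \u03B2\u0374 \u03B3\u0315"
for label, n in count_marks(sample).items():
    print(label, n)
```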

See for example, the screenshot from p. 82

image

Can you throw some light on this as well?

@Andhrabharati
Author

Also seen that αγʹ occurred twice and αδʹ thrice in the whole text.

See the screenshot from p. 396. for example-

image

@jmigliori

jmigliori commented Nov 14, 2021 via email

@gasyoun
Member

gasyoun commented Nov 15, 2021

I can’t see any reason aside from an error why that other symbol was used

Thanks

@Andhrabharati
Author

Andhrabharati commented Nov 19, 2021

I've started extending the Concordance, encompassing all major printed versions of Mahabharata- Calc. ed. (1834 onwards), Madras ed. (1855), Bomb. ed. (1863 onwards), PC Roy ed. (1882 onwards), Kumbhakonam Southern ed. (1906 onwards), Pune (Chitraśālā) ed. (1929 onwards), Telugu ed. (1929), PPS Southern Critical ed. (1931 onwards), BORI Critical ed. (1927 onwards, first parva completed in 1933), and the Gita Press ed. (1955); this is with the concordances by Jacobi (1903) & Sorensen (1904) as the starting point.
[I could not get hold of the śaka 1785 (1842 A.D.) or śaka 1799 (1856 A.D.) prints of Bomb. ed., but got only the 1863 A.D. print.]

Just happened to note the "spaced" names in INM as well, which are quite frequently occurring in the early generations of European works (esp. PWG, pwk); but these INM words are not marked as in the PWG/pwk digitised texts.

image

Is there any specific notation for such spaced words in those early works?

I presume that @thomasincambodia and/or @SergeA can throw some light on this, if in good moods.

And found that Sorensen himself had indicated this usage, in the preface

image

& also repeated in the concordance pages!

image

And then, observed that the chapter (adhyāya) names are indeed rendered in two different types (normal & spaced)-

image

what a minute distinction - needs a very clear mindset!

@gasyoun
Member

gasyoun commented Nov 19, 2021

what a minute distinction - needs a very clear mindset!

Well done, @Andhrabharati

@Andhrabharati
Author

Here is the final form of INM main text, consolidated with the A&C data (and also with quite a few corrections done in the digitization).
ΙΝΜ-devanagari_consolidated (Andhrabharati).txt

Wish @drdhaval2785 or @funderburkjim would have a look at this sometime soon.

I urge you to pay attention to my comment lines-

;; this is a geneological tree, not a table; as such, appropriate representation can be thought of.
;;<L>nnn is deleted as per Addendum.
;;<L>nnnn is deleted as per Addendum; instead a new entry {@Śaṅkhacakragadāpāṇi, Śaṅkhacakrāsipāṇi@} is added above.
;;<L>nnnn is deleted as per Addendum; instead a new entry {@Dhanya@} is added above.
;;<L>nnnn is modified/deleted as per the Addendum & relocated to <L>nnnn
;;<L>nnnn is wrongly rendered as a new HW; it just contains the list members of the prev. entry <L>nnnn; as such it is deleted and the data updated in the above entry.
<LEND>;; [This is a split of {@Ananta@}³ ᶠᶠ·¦ = Kṛṣṇa, Viṣṇu, Śiva, Mahāpuruṣa, Nārāyaṇa, the Sun (Sūrya); reason being no unicode for the superscript 'f']
<LEND>;; [Note. This is a new entry from the Addendum & updated]
<LEND>;; [Note. This is a new entry from the Addendum]
<LEND>;; [Note. This is an indicated entry, as a note.]
<LEND>;; [Note. This is split from the earlier entry & updated from the Addendum]
<LEND>;; [Note. This is split from the earlier entry]
<LEND>;; [Note. This is updated as a consequence to the above]
<LEND>;; [Note. This is updated from the Addendum & relocated from <L>nnnn]
<LEND>;; [Note. This is updated from the Addendum; and split from the earlier entry]
<LEND>;; [Note. This is updated from the Addendum; as per the next entry, this should not be with a <sup> tag.]
<LEND>;; [Note. This is updated from the Addendum]

The only remaining work for the Cologne team is to "handle" the grouped and variant form entries appropriately, as done (by Jim) in MW and others.

@Andhrabharati
Author

BTW, though I had split the "Do." words as separate entries, I did not (mostly) replace the "Do." with the Parent HW, as I had initially thought of.
[And this could also be done by the team.]

@funderburkjim

xml-header Ö and ø

Found this comment:

What is the difference between Ö and ø?
In Danish and Norwegian, ö was previously used in place of ø in older texts to distinguish between open and closed ö-sounds. It is also used when confusion with other symbols could occur, on maps for instance. The Dano-Norwegian ø is, like the German ö, a development of oe and can be compared with the French œ .

In our printed text, ö appears on Title page and in preface.

Also in inm.txt in the digitization of the preface, ö is used.

Thus, it seems reasonable to use ö in the xml header file, and I have changed it accordingly.

funderburkjim added a commit to sanskrit-lexicon/csl-orig that referenced this issue Dec 3, 2021
@funderburkjim

superscript corrections

The 'sup' errors (<sup>1133</sup>, etc in comment above) now corrected.
Interestingly, they all occur in a small collection of lines in one entry.

The remaining numbers marked by 'sup' have 2 digits (<sup>N</sup>) (290 instances) or 1 digit (6297 instances).
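
A tally like the one above can be reproduced with a short sketch such as this. It assumes superscript numbers are written as `<sup>digits</sup>`; the function names and the sample text are hypothetical, and the two-digit threshold simply reflects the counts reported above.

```python
import re
from collections import Counter

SUP_NUM = re.compile(r"<sup>(\d+)</sup>")

def sup_digit_lengths(text: str) -> Counter:
    """Count numeric <sup> markings by number of digits."""
    return Counter(len(n) for n in SUP_NUM.findall(text))

def suspicious_sups(text: str, max_digits: int = 2) -> list[str]:
    """Return <sup> numbers longer than max_digits (likely verse numbers)."""
    return [n for n in SUP_NUM.findall(text) if len(n) > max_digits]

# Hypothetical sample with one mis-marked verse number.
sample = "a<sup>1</sup> b<sup>12</sup> c<sup>1133</sup>"
print(sup_digit_lengths(sample))
print(suspicious_sups(sample))
```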

funderburkjim added a commit to sanskrit-lexicon/csl-orig that referenced this issue Dec 3, 2021
@funderburkjim

The small letters of Greek and (italic) English are wrongly enclosed in the bold tags ({@...@}) along with the verse numbers. They all are to be outside the bold tags.

Not true in general for Greek text.
Here is an example (page 159-b, line 2, brahman) where the verse and the following 'alpha' are bold.

<div n="lb">(do.): XIII, {@84<lang n="greek"></lang>,@} 3954 {%(Pitāmahaḥ).%}—§ 747{%b%} (Suvarṇot-

and
image

To my eye, this 'alpha' is bold. However, also my eye does not readily identify this as an 'alpha', but first thinks it to be Latin 'a'.

Here are two examples (from page 1-b) where it is ambiguous to my eye whether the (definitely) Greek letters are bold.
image

The markup (bold-digits+greek) occurs often in the digitization:
7945 matches in 7264 lines for "{@[0-9]+<lang n="greek"" in buffer: inm.txt
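
The same kind of count can be sketched outside the editor as below. The pattern is the one quoted above, with the opening brace escaped for Python's `re` module; the function name and sample lines are hypothetical (the first sample line is shortened from the example earlier in this comment).

```python
import re

# Bold-open, digits, then a Greek <lang> tag, as in the quoted search.
PATTERN = re.compile(r'\{@[0-9]+<lang n="greek"')

def count_matches(text: str) -> tuple[int, int]:
    """Return (total matches, number of lines with at least one match)."""
    per_line = [len(PATTERN.findall(line)) for line in text.splitlines()]
    return sum(per_line), sum(1 for n in per_line if n)

sample = (
    '(do.): XIII, {@84<lang n="greek"></lang>,@} 3954\n'
    '{@1<lang n="greek"></lang>@}, {@2<lang n="greek"></lang>@}\n'
    'no markup here\n'
)
print(count_matches(sample))
```

Reporting both numbers mirrors the "matches in lines" form of the figure above, since one line can carry several bold verse-plus-Greek markings.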

From this review, I don't see the need to remove bold from those Greek letters.

I don't find any English italic letters after digits marked as bold. Note the 747b in first image of this comment. Here there is an italic English letter, but neither the preceding digits nor the letter 'b' is bold, and this appears to correspond to the scan.

@gasyoun
Member

gasyoun commented Dec 5, 2021

To my eye, this 'alpha' is bold. However, also my eye does not readily identify this as an 'alpha', but first thinks it to be Latin 'a'.

Agree to both.

where it is ambiguous to my eye whether the (definitely) Greek letters are bold.

non-bold @jmigliori I believe

@Andhrabharati
Author

Andhrabharati commented Dec 12, 2021

To my eye, this 'alpha' is bold. However, also my eye does not readily identify this as an 'alpha', but first thinks it to be Latin 'a'.

From this review, I don't see the need to remove bold from those Greek letters.

Incidentally, this particular Greek letter (alpha) is the only one appearing as bold throughout the text, all others being in normal style.
Probably it is due to some typographical limitation, as @jmigliori suggested (above) on a different context sometime back.

And it is the reason why I had thought all the Greek letters are to be plain, not in bold style.
