Allow for common-sense manual improvements to punctuations and formatting #747

Open
vvasuki opened this issue Feb 17, 2022 · 21 comments
Labels
doc (Improvements or additions to documentation), question (Further information is requested)

Comments

@vvasuki

vvasuki commented Feb 17, 2022

I observed in a few threads some insistence on sticking to "what's in the printed text" - even with regard to punctuation and formatting!

Opening this thread so that it may be considered more fully. Some pertinent notes:

Given that git + dict system allows:

  • version control so as to retrieve "pristine" versions of files
  • distributed correction effort
  • checking of proposed changes with diffs
  • easy comparison with text images

why not let manual formatting improvements come through at whatever rate they do - as long as they don't affect future programmatic corrections?
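
One way to make the last condition checkable is sketched below. It is only an illustration, not something the project is known to use: the helper names `skeleton` and `is_formatting_only` are hypothetical, and the sketch assumes that a "formatting improvement" is a change that touches only whitespace and punctuation characters.

```python
import unicodedata


def skeleton(text: str) -> str:
    """Return the text with whitespace and punctuation removed.

    Assumption: characters in Unicode category P* (punctuation), plus
    whitespace, are the only ones a formatting-only edit may change.
    """
    return "".join(
        ch
        for ch in text
        if not ch.isspace() and not unicodedata.category(ch).startswith("P")
    )


def is_formatting_only(old: str, new: str) -> bool:
    """True if old and new differ only in whitespace and punctuation."""
    return skeleton(old) == skeleton(new)


if __name__ == "__main__":
    old = "BrAntiH . (yaTA, devIBAga-\nvate"
    new = 'BrAntiH .\n(yaTA, "devIBAgavate"'
    print(is_formatting_only(old, new))
```

A check like this could run on a proposed diff before it is merged; an edit that changes any letter would fail it and need ordinary review instead.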

@gasyoun
Member

gasyoun commented Feb 17, 2022

why not let manual formatting improvements come through at whatever rate they do - as long as they don't affect future programmatic corrections?

You want to go away from the original dictionary format?

@vvasuki
Author

vvasuki commented Feb 18, 2022

You want to go away from the original dictionary format?

Where it makes sense - yes! One has to use "common sense" and see things from the perspective of dict users. Not so hard. The constraints of printing on paper in a 2-column format 100+ years ago don't apply to computer screens. And users have come to adopt new equivalent notations and routinely use more punctuation.

@vvasuki
Author

vvasuki commented Feb 18, 2022

Also, in today's scenario, where users easily and routinely refer to dozens of dicts side by side, consistency in notation becomes a matter of concern (e.g. sanskrit-lexicon/csl-ldev#7 ). That too motivates harmless deviations from the original.

@vvasuki
Author

vvasuki commented Feb 18, 2022

Everyone, please read this (via @drdhaval2785 at sanskrit-lexicon/csl-ldev#7 (comment) ):

The creation of a TEI version of the Cologne Sanskrit Lexicon is part of the Lazarus Project and aims for long-time preservation of the data. It is based on the original digitisations and mark-up versions of the CSL and uses the TEI Guidelines, especially the dictionary module. The objective of the TEI Cologne Sanskrit Lexicon is to preserve all information contained in the original prints, as far as it was preserved in the digitisation process (Kapp and Malten, 1997, as described in), while using a well documented and standardised XML. The second objective is to display the information as consistent and faithfully as possible to the original prints, while allowing the user to choose the writing system in which the Sanskrit words are displayed.

So, no one needs to obsess over "keeping it close to original" here. Others have that aspect well under control. This project can move along to the objective of best serving today's users.

@vvasuki
Author

vvasuki commented Apr 16, 2022

Case in point - indic-dict/stardict-sanskrit#139

@drdhaval2785 added the question and doc labels on Jan 3, 2023
@vvasuki
Author

vvasuki commented Jan 13, 2023

The same dissatisfaction bothers me. Do I feel like reading the mess below?

image

It could be presented so much better. I hope this changes either here or in some project which will render all this obsolete.

@funderburkjim
Contributor

Link for TEI Sanskrit Lexicon: http://c-salt.uni-koeln.de/

There is no ongoing collaboration between the 'Github/sanskrit-lexicon' (CDSL) project at Cologne and the 'C-SALT' project at Cologne.

Maybe @fxru could provide a description of the relation between CDSL and C-SALT.

@funderburkjim
Contributor

... this mess could be so much better

Would you provide a mock-up of a better presentation? This would help others understand what is in your mind.

@vvasuki
Author

vvasuki commented Jan 15, 2023

There is no ongoing collaboration between the 'Github/sanskrit-lexicon' (CDSL) project at Cologne and the 'C-SALT' project at Cologne.

I didn't say there was; and that's a good thing too! That leaves both projects free to pursue their distinct goals without compromise. The goal of CDSL should be to present what the dict maker intended in the best possible way given the current non-paper media and tech.

... this mess could be so much better

Would you provide a mock-up of a better presentation? This would help others understand what is in your mind.

विकल्पः, पुं, (विरुद्धं कल्पनमिति । वि + कृप + घञ् ।) 

भ्रान्तिः ।
 (यथा, देवीभाग-वते । १ । १९ । ३२ ।
“विकल्पोपहतस्त्वं वै दूरदेशमुपागतः ।
न मे विकल्पसन्देहो निर्व्विकल्पोऽस्मि सर्व्वथा ॥”)

कल्पनम् । इति मेदिनी । पे, ॥
(यथा, भागवते । ५ । १६ । २ ।
“तत्रापि प्रितव्रतरथचरणपरिखातैः सप्तभिः सप्त सिन्धवः उपकॢप्ताः ।   
यत एतस्याः सप्तद्वीपविशेषविकल्पस्त्वया भगवन् खलु सूचितः ॥”)

संशयः । यथा, रघुः । १७ । ४९ ।
(“रात्रिन्दिवविभागेषु यथादिष्टं महीक्षिताम् ।
तत्सिषेवे नियोगेन स विकल्पपराङ्मुखः ॥”)

नानाविधः । यथा, मनुः । ९ । २२८ ।
(“प्रच्छन्नं वा प्रकाशं वा तन्निषेवेत यो नरः ।
तस्य दण्डविकल्पः स्याद्तथेष्टं नृपतेस्तथा ॥”)

विविधकल्पः । स च द्विविधः । व्यवस्थितः । एच्छिकश्च । सोऽप्याकाङ्क्षाविरहे युक्तः ।
 तथा च भविष्ये -

See how much more pleasant and readable that is?

@funderburkjim
Contributor

Certainly the format you show is pleasant.

From my naive perspective, I do not see how it derives from the vacaspatyam text -- there is almost no overlap between the two texts.

What am I missing?

@vvasuki
Author

vvasuki commented Jan 15, 2023

What am I missing?

That was kalpadruma. Compare with:

image

Also, please refer to sanskrit-lexicon/csl-ldev#3 (comment) linked in the first post above - there was even an objection to the addition of quotation marks around quotes because "Not traceable in the printed text"! Such robotic fidelity should be dropped.

@funderburkjim
Contributor

Markup can generate the nicer format.

image

@funderburkjim
Contributor

Here is the bit of the vikalpa digitization corresponding to the sample display:

OLD
<L>32332<pc>4-371-b<k1>vikalpaH<k2>vikalpaH
vikalpaH¦, puM, (virudDaM kalpanamiti . vi +
kfpa + GaY .) BrAntiH . (yaTA, devIBAga-
vate . 1 . 19 . 32 .
“vikalpopahatastvaM vE dUradeSamupAgataH .
na me vikalpasandeho nirvvikalpo'smi sarvvaTA ..”)
kalpanam . iti medinI . pe, .. (yaTA, BAga-
vate . 5 . 16 . 2 .
“tatrApi pritavrataraTacaraRapariKAtEH saptaBiH
sapta sinDavaH upakxptAH . yata etasyAH sapta-
dvIpaviSezavikalpastvayA Bagavan Kalu sUcitaH ..”

And the changes which generate the above:

NEW
vikalpaH¦, puM, (virudDaM kalpanamiti . vi +
kfpa + GaY .) <lb/><lb/>BrAntiH . <lb/>(yaTA, devIBAgavate <lbinfo n="devIBAga+vate"/>
. 1 . 19 . 32 .
<lb/>“vikalpopahatastvaM vE dUradeSamupAgataH .
<lb/>na me vikalpasandeho nirvvikalpo'smi sarvvaTA ..”)
<lb/><lb/>kalpanam . iti medinI . pe, .. <lb/>(yaTA, BAgavate <lbinfo n="BAga+vate"/>
. 5 . 16 . 2 .
<lb/>“tatrApi pritavrataraTacaraRapariKAtEH saptaBiH
sapta sinDavaH upakxptAH . <lb/>yata etasyAH saptadvIpaviSezavikalpastvayA <lbinfo n="sapta+dvIpaviSezavikalpastvayA"/>
Bagavan Kalu sUcitaH ..”

@funderburkjim
Contributor

As you see, there are only two pieces of markup:

  • <lb/> to generate a line break
  • <lbinfo n="X+Y"/> to resolve text with an extra '-' at line breaks.

The lbinfo is awkward to write, but could be simplified, for example:

kfpa + GaY .) <lb/><lb/>BrAntiH . <lb/>(yaTA, devIBAgavate <lbinfo n="devIBAga+vate"/>
SIMPLER, using a special character (such as '@')
kfpa + GaY .) <lb/><lb/>BrAntiH . <lb/>(yaTA, devIBAga@vate 
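
If the '@' shorthand were adopted, it could be expanded mechanically into the longer form. The following is a minimal sketch, not an existing CDSL script: the function name `expand_shorthand` is hypothetical, and it assumes that '@' never occurs in the digitization for any other purpose and that the two halves of a word contain only ASCII letters and apostrophes (as in the SLP1 text above).

```python
import re

# Hypothetical pattern: LEFT@RIGHT, where both halves are runs of
# ASCII letters or apostrophes (an assumption about the SLP1 text).
SHORTHAND = re.compile(r"([A-Za-z']+)@([A-Za-z']+)")


def expand_shorthand(line: str) -> str:
    """Rewrite LEFT@RIGHT as 'LEFTRIGHT <lbinfo n="LEFT+RIGHT"/>'."""

    def repl(match: re.Match) -> str:
        left, right = match.group(1), match.group(2)
        return f'{left}{right} <lbinfo n="{left}+{right}"/>'

    return SHORTHAND.sub(repl, line)


if __name__ == "__main__":
    print(expand_shorthand("<lb/>(yaTA, devIBAga@vate"))
```

Contributors would then type only the short form, and the longer markup would be generated from it.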

Thus, at least for skd, the digitization could be changed so that

  • the display is considerably easier to read, and
  • the 'sanctity' of the original digitization is maintained.
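
Both points can be illustrated with a short sketch. It is a rough illustration under stated assumptions, not the actual CDSL display code: the function names `to_display` and `to_original` are hypothetical, `<lb/>` is taken to mean "start a new display line", and `<lbinfo n="X+Y"/>` is taken to record that the printed text had `X-` at the end of one line and `Y` at the start of the next.

```python
import re

LB = "<lb/>"
LBINFO = re.compile(r'\s*<lbinfo n="[^"]*"/>')
# A joined word, its lbinfo tag, and the newline that follows it.
LBINFO_AT_EOL = re.compile(r'\S+ <lbinfo n="([^"+]+)\+([^"]+)"/>\n')


def to_display(marked: str) -> str:
    """Render marked-up text: only <lb/> produces a line break."""
    text = LBINFO.sub("", marked)         # drop lbinfo tags
    text = text.replace("\n", " ")        # source newlines are not shown
    text = text.replace(LB, "\n")         # <lb/> becomes a real line break
    return re.sub(r"[ \t]+\n", "\n", text)  # trim spaces before breaks


def to_original(marked: str) -> str:
    """Recover the hyphenated, print-like lines from marked-up text."""
    text = marked.replace(LB, "")
    return LBINFO_AT_EOL.sub(r"\1-\n\2 ", text)


if __name__ == "__main__":
    marked = (
        'kfpa + GaY .) <lb/><lb/>BrAntiH . <lb/>(yaTA, devIBAgavate '
        '<lbinfo n="devIBAga+vate"/>\n. 1 . 19 . 32 .'
    )
    print(to_display(marked))
    print("---")
    print(to_original(marked))
```

Under these assumptions the marked-up text serves both uses: `to_display` gives the more readable layout, and `to_original` gives back the lines as first digitized.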

@funderburkjim
Contributor

For comparison to the skd-dev example above,
here is the current display of vikalpaH in skd:

image

@vvasuki
Author

vvasuki commented Jan 16, 2023

the 'sanctity' of the original digitization is maintained.

Why put that burden on yourself? As mentioned, there is a separate project focused on "sanctity"-preservation.
Sure - I suppose that @drdhaval2785's scripts can insert such extra new-lines or quotes using your markup based on what users update (at csl-dev?) - it's just more (unnecessary) trouble; and is furthermore a cause for delay.

@funderburkjim
Contributor

it's just more (unnecessary) trouble; and is furthermore a cause for delay.

What is your proposed remedy? What is your proposed path ending in a better display of skd?

@funderburkjim
Contributor

Why put that burden on yourself? As mentioned there is a separate project focused on "sanctity"-preservation.

[Here is link to 'lazarus project' : https://cceh.uni-koeln.de/portfolio/lazarus/]

I think this ('sanctity ...') remains a responsibility of CDSL.
We can't just say 'Oh, someone else is taking care of this aspect.'

However, we are not restricted to only this task.
We are free to create better displays, for instance better displays for skd.
@vvasuki Are you interested in leading an effort for a better skd?

@vvasuki
Author

vvasuki commented Jan 21, 2023

We are free to create better displays, for instance better displays for skd. @vvasuki Are you interested in leading an effort for a better skd?

No. All I want is for users (myself included) to be able to add superior presentation markup wherever they care to while referring to the dict, and for maintainers not to reject such improvements out of hand. So, it should be written down in some contribution policy somewhere.

And, @drdhaval2785 - please clear the backlog at https://github.com/sanskrit-lexicon/csl-ldev/pulls - I recently thought of fixing a typo, but gave up upon seeing it.

I think this ('sanctity ...') remains a responsibility of CDSL. We can't just say 'Oh, someone else is taking care of this aspect.'

CDSL is free to burden itself of course, but I am curious why you think you can't just say 'Oh, someone else is taking care of this aspect.'

@drdhaval2785
Collaborator

Will clear backlog soon.

@vvasuki
Author

vvasuki commented Feb 7, 2024

Related, but insufficient - sanskrit-lexicon/COLOGNE#419

4 participants