slp1 to slp1_accented in transliteration-- Side effects!! #38
Comments
Even VCP is heavily affected by this change.
@vvasuki
Yes - Don't use slp1_accented on dictionaries which don't have accents! Garbage in - garbage out. Going back to a more basic design issue - why do you keep SLP1 encoding in the dicts in the first place? Maybe back in the day, unicode devanAgarI standard was not popular, so they had to do such monkey tricks. But in 2022, one can save devanAgarI data directly using devanAgarI unicode.
Can you specify which dictionaries have accents and which do not? I will make modifications accordingly. Keeping the data in Devanagari Unicode is not feasible without unnecessary hassle, so SLP1 is going to stay for a long time. I really look forward to a day when Devanagari Unicode would emulate Sanskrit consonants and vowels more naturally. It
Śabdakalpadruma and Vācaspatya. There may be many others too - those who know should say. Only select dictionaries show accents.
Sentence is broken in the middle? Anyway, use SLP1 or ISCII or ... for internal processing as needed however much you like. You don't need to store textual data in it - that's what leads to avoidable problems such as this. EDIT: If you digitized SKD or VSP, you would use devanAgarI unicode! (as you know from your kosha project)
Incidentally, even the tags and English text in those places (within the body matter) got converted to Devanagari, in those dictionaries. This point also needs to be addressed.
I would appreciate examples.
In VCP, In SKD, Also SKD (in contrast) has quite a few And interestingly KRM has no such tag conversion issue. As these are not related to the accent mark, I guess they need special attention even with slp1 conversion!
Surprisingly, even MW has fallen victim to this "tag conversion"!
These tag issues could be because @drdhaval2785's scripts are not passing some toggler arguments (which are no longer set by default) - https://github.com/indic-transliteration/indic_transliteration_py/blob/1ba2688d235eccc0c5ac629c46ac9df83ef331f7/indic_transliteration/sanscript/__init__.py#L189 . Also, suitable togglers can be used to leave non-svara-encoding
Yes, I understand that. I was pointing these tags out to him so that they can be marked suitably, like the many other tags that are outside the purview of transliteration.
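One way to keep tags outside the purview of transliteration is to split the text on tags and convert only the pieces in between. The following is a minimal sketch under stated assumptions, not the scripts used for these dictionaries: `transliterate_outside_tags` and the `convert` callback are hypothetical names, the tags are assumed to look like `<ab>` or `</ls>` with no `>` inside them, and the package's own toggler mechanism is not used here.

```python
import re
from typing import Callable

# Matches one markup tag such as <ab> or </ls>.
# Assumption: a tag never contains ">" inside it.
TAG = re.compile(r"(<[^>]*>)")


def transliterate_outside_tags(text: str, convert: Callable[[str], str]) -> str:
    """Apply `convert` to the text between tags and copy tags through unchanged."""
    parts = TAG.split(text)
    # With one capturing group, re.split places the tags at odd indices.
    return "".join(p if i % 2 else convert(p) for i, p in enumerate(parts))


# Stand-in converter for illustration only; a real run would pass a
# function that performs the SLP1 to Devanagari conversion.
print(transliterate_outside_tags("<ab>rAma</ab>", str.upper))
```

This protects only the tags themselves. English text between an opening and a closing tag would still be converted, so such elements would need separate handling.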
I am not aware of when the indic_transliteration package started to require explicit togglers. I never had a similar problem earlier. Maybe some version update introduced this artefact. Will correct soon.
Just FYI (if you didn't notice it earlier): this indic_transliteration package can generate IAST output as well, in addition to various other scripts (apart from Devanagari).
@drdhaval2785
Just noticed that this new transliteration code has unwanted effects, mainly in BOR & SKD, where no accent is involved and the slash is intended only in its normal sense.
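Since a literal slash is misread only when the accented scheme is applied to an unaccented dictionary, the source scheme could be chosen per dictionary. The sketch below is illustrative only: `source_scheme`, `UNACCENTED` and `known_accented` are hypothetical names, the unaccented set lists just the codes reported as affected in this thread, and the list of accented dictionaries is left to the caller because it is not settled here.

```python
# Dictionary codes reported in this thread as having no accent marks.
# Assumption: taken from the comments, not from a verified list.
UNACCENTED = {"BOR", "SKD", "VCP"}


def source_scheme(dict_code: str, known_accented: frozenset = frozenset()) -> str:
    """Return the source scheme name to use for one dictionary.

    Only dictionaries explicitly listed as accented get slp1_accented.
    Everything else falls back to plain slp1, so a literal "/" is not
    read as an accent mark.
    """
    code = dict_code.upper()
    if code in known_accented and code not in UNACCENTED:
        return "slp1_accented"
    return "slp1"


print(source_scheme("skd"))
```

Defaulting to plain slp1 means a dictionary missing from both lists loses accent rendering rather than gaining spurious accents, which seems the safer failure mode for the problem reported here.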