BUR Study #37
Here is my scan: https://vk.com/samskrtamru?w=wall-88831040_13310
Sounds like a pity.
I guess this is something you can do yourself, @Andhrabharati, with the GitHub pull request function? |
Just fyi, I have been doing all such stuff myself, and (unfortunately!) I do much more than what cologne team can 'accept' (when I feel it leads to a better 'presentation' of the text, I leave no stone unturned). |
@Andhrabharati In the current Cologne system, a given dictionary xxx exists in three related forms:
There is also a stardict dictionary form created in the https://github.com/sanskrit-lexicon/cologne-stardict repository (which Dhaval maintains completely). So, if you create a 'better' form of some xxx.txt, then that may be incompatible with make_xml.py or with the php display code. I think you find this incompatibility frustrating. On the other hand, many changes can be made to xxx.txt that ARE compatible with the Cologne system. In regard to your particular suggestions re Burnouf dictionary, I suggest you fill in the Greek text in csl-orig/v02/bur/bur.txt. This is a kind of change which should cause no compatibility problems. Once this is done, let's discuss further the idea of 'promoting' the |
@Andhrabharati Just realized you will likely be starting with bur.txt as it exists in this @drdhaval2785 Suppose AB adds greek text to this devanagari version of bur.txt. |
Invertibility is taken care of.
echo "Convert to Devanagari."
mkdir -p ../v02/$1
python3 to_devanagari.py $1
echo "Convert back to SLP1."
python3 to_slp1.py $1
echo "Store differences in ../diff/$1.txt."
diff ../slp1/$1.txt ../../csl-orig/v02/$1/$1.txt > ../diff/$1.txt
echo "Complete."
Once the script is run, manually check that every file in the diff folder is 0 bytes, i.e. there is no difference. This way invertibility is ensured.
When a change is made in csl-devanagari files
See carry_changes_to_cslorig.sh
dicts=(wil yat gst ben mw72 lan cae md mw shs ap90 mwe bor ae bur stc pwg gra pw ccs sch bop armh vcp skd inm vei pui bhs acc krm ieg snp pe pgn mci)
echo "STARTED TAKING CORRECTIONS FROM CSL-DEVANAGARI TO CSL-ORIG";
for dict in ${dicts[@]};
do
echo $dict
python3 to_slp1.py $dict
cp ../slp1/$dict.txt ../../csl-orig/v02/$dict/$dict.txt
echo "";
done
Hope this takes care of your concerns about invertibility, Jim. |
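The manual check described above (every file in the diff folder should be 0 bytes) could also be automated. The following is only a minimal sketch, not part of the csl-devanagari scripts: the function name `nonempty_diffs` is hypothetical, and the `../diff` path is assumed from the script quoted above.

```python
from pathlib import Path


def nonempty_diffs(diff_dir):
    """Return the names of diff files that are NOT empty.

    An empty list means every round trip (SLP1 -> Devanagari -> SLP1)
    reproduced the csl-orig file exactly.
    """
    return sorted(
        p.name for p in Path(diff_dir).glob("*.txt") if p.stat().st_size > 0
    )


if __name__ == "__main__":
    # "../diff" mirrors the folder used in the script above (an assumption
    # about where this check would be run from); adjust as needed.
    failures = nonempty_diffs("../diff")
    if failures:
        print("Round trip NOT clean for:", ", ".join(failures))
    else:
        print("No non-empty diff files found.")
```

Any dictionary code listed in the output would need a manual look at its diff file before changes are carried back to csl-orig.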
Thanks for docs. Looks eminently usable. Will give it a trial run if AB uploads a version of devanagari bur.txt with Greek text. |
Dear @Andhrabharati Please update the csl-devanagari repository and use the latest file. |
latest file? is this repo being updated? |
Not at all, I keep on doing what I feel better; it's just that CDSL is 'not willing' to 'accept' to undertake the changes, if they seem different to the 'style' adopted there-- having no scope for 'real improvements'. |
I could as well just use the latest (SLP1) file from csl-orig itself, if it is just filling the Greek stuff. But that's too little a portion of the work; I point to my recent INM work in this context, wherein I did quite some changes, apart from filling the Greek stuff all in one go. (of course, it did not attract the FULL attention of Jim.) |
These pages are even at csldoc, as 'Dictionary front matter'; a misnomer for these particular pages!! |
After many years of association with CDSL, I would like to paraphrase your viewpoint so that it correctly reflects the status of collective wisdom at CDSL. CDSL is 'not willing' to adopt major changes which do not allow programmatic conversion between the current version and the suggested version. |
I do understand the point well, @drdhaval2785. What I fail to understand is-- while programs are being modified or even developed for small changes, why the same is NOT being done for major changes. It's just beyond my comprehension! Anyway, let's not spend more time on this, but continue the efforts in bringing the texts to "correct form" first and fill the gaps (if any). ("Presentation" can be taken up by someone sometime, if it deserves!) |
If it is not in the book, we can't accept such an improvement. Even if we like it.
My scan quality is higher.
Are you ready to code it? Jim is busy with things only he can do. We do not have enough coders on board.
Exactly, thanks. |
I can show innumerable instances contradicting this, that are already present in the CDSL texts!
Yes, noticed this. How many such others do you have?
Yes, I can; but I won't (at least for time-being)! |
Yes, that is so. |
I am already halfway through my file, with many more changes already done. And I presumed giving just the ref. line ( |
From my perspective, the best form would be a copy of bur.txt with all the Greek text filled in. As a second choice, a file of changes to the lines of bur.txt. For example,
and a similar pair of 'old/new' lines for each of the other 667 lines with greek text. As a third choice, a file of the lines changed. For example, the first Greek text appears on line 19 of bur.txt, so a file 'bur-greek.txt' would have as its first line
and similarly for the other 667 lines with greek text. |
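To make the third choice concrete, here is a rough sketch of how a file of changed lines could be applied to bur.txt. The exact layout of 'bur-greek.txt' is not shown above, so the format used here (a 1-based line number, a tab, then the full replacement line, with `;` lines treated as comments) is purely an assumption for illustration, as is the function name `apply_line_changes`.

```python
def apply_line_changes(lines, changes):
    """Return a copy of `lines` with the listed lines replaced.

    lines   -- list of strings, one per line of the text file
    changes -- iterable of "lnum<TAB>new text" strings (hypothetical format);
               blank lines and lines starting with ';' are skipped
    """
    result = list(lines)
    for change in changes:
        change = change.rstrip("\n")
        if not change or change.startswith(";"):
            continue  # comment or blank line, nothing to apply
        lnum_text, new_text = change.split("\t", 1)
        lnum = int(lnum_text)  # 1-based line number in the original file
        if not 1 <= lnum <= len(result):
            raise ValueError(f"line number out of range: {lnum}")
        result[lnum - 1] = new_text
    return result
```

Because the input list is copied rather than modified, the original lines remain available for comparison after the changes are applied.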
My file has no line breaks now; all entries are in a single line. But, I prefer making the second form (but slightly different)- And few of them would be with Would this suit you? |
What about the few lines where there is more than one |
They would all be in the resp. line, unless a comment line mentions some merger (if any); otherwise all diff. strings would be present individually. |
Likely I can reliably convert your form to my second form. |
If you are interested, I can give the full etym. lines (all languages) as well, as many had undergone changes, like tagging or correcting. But probably sticking to Greek alone in the first step is preferable. |
Agree |
The front matter (p.3) clearly mentions points 1 and 6. [6] La barre horizontale -- sépare les mots dans un même article. (In English: "The horizontal bar -- separates the words within a single article.") This indicates that making a digital text of all the dictionaries' front matter (with Google OCR) and probably translating it into English (with DeepL) would help us understand the dictionaries well and plan the work on them properly. Any takers for this simple task from your 'new team', @gasyoun?
Indeed there are. I guess it would be a good idea to document them as we know them.
Not sure, not all volumes required, but will show in 2022 what I have.
Would love to see them myself.
Can you document the steps for them to be done, please? One by one. |
Hope @drdhaval2785 or @funderburkjim would be willing to give the steps. |
Just recalled that you also worked with Abbyy OCR, @gasyoun. So probably you yourself could get the first step done, by explaining to the team. Once a quick proofing for obvious errors in the OCRed text is done, translation (as and when required) could be taken up. |
@Andhrabharati Regarding Burnouf Front matter. Are you aware of |
@Andhrabharati Is your main point regarding Burnouf Front matter to make an English Translation of the front matter? |
yes, I do. In fact, I had already commented previously that even the "end matter" is lying here under the header of "front matter"! but these are just the images; and I am talking about searchable digital text. |
not really, my main intention is to have a digital text first. of course, having english text suits some people-- but there would be many people who might like to have the native language text as is. |
Yes, since 2002. https://www.youtube.com/watch?v=oXH65ISgZRo and https://www.youtube.com/c/MarcisGasuns/search?query=abbyy
Agree |
I finished filling in the Greek strings in BUR a few days back and was just waiting for you to be free from the MBh. linking task. Here are the lines (wrt the csl-orig file) as we discussed earlier (above); I hope you won't face many issues in using this data. BUR greek string lines (csl-org) filled.txt I would just suggest that you handle the ; commented ones first. |
There are NO ls candidates in BUR, but quite many abbr candidates are there. Here is the list that covers most of them. And here are the language abbr. items that could be tagged first, and expanded. |
As I doubt you would be interested in making any further changes, I am not posting my full observations, only some global corrections (in case you would like to apply them) below- |
One final comment before I move on to some other work- There are quite many grouped entries in this work as well (marked with et, au, ',' or otherwise), and these could be handled as done in MW. |
@Andhrabharati From first look at your greek text lines, the form should be readily useable. Will let you know when this is incorporated into bur.txt. |
Here is my full BUR file, for whatever use/worth it has to the cdsl team-- |
@funderburkjim did you have a chance to take a look at it since then? |
@drdhaval2785, @funderburkjim
Just thought of filling up the Greek strings in BUR, and had a quick look at the file & book contents.
There are almost 15000 <P> entries which either do not "appear" in the CDSL online searching as of now, or are part of the prev. <L> entry, though present in the text file. Seems most of these (if not all) have to be "promoted" to <L> status, being alternate HWs or derived HWs etc. wrt the prev. entry.
I would suggest marking them all with <L>xxx.n numbering, as separate entries.
There are two good lists of Anubandhas (5pp.) & Dhatus (15pp.) in the book, after the p.759 (where the text file ended), which could also be digitized and added to the search.
Do not know if this already done and lying somewhere "inaccessible". (Could not see them even in the bur_orig.txt)
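As a rough illustration of the `<L>xxx.n` numbering suggested above, the sketch below proposes an id for each `<P>` line based on the most recent `<L>` line. It assumes, as a simplification, that entry lines begin with `<L>` followed by a number and that the lines to promote begin with `<P>`; the real bur.txt layout may differ, and the function name `propose_sub_ids` is hypothetical.

```python
import re

# Assumed shape of an entry line: "<L>" followed by a number such as 12 or 12.1.
L_LINE = re.compile(r"^<L>(\d+(?:\.\d+)?)")


def propose_sub_ids(lines):
    """Yield (line_index, proposed_id) for every line that starts with <P>.

    The proposed id is "<parent L number>.<n>", where n counts the <P>
    lines seen since the most recent <L> line.
    """
    parent = None
    n = 0
    for index, line in enumerate(lines):
        match = L_LINE.match(line)
        if match:
            parent, n = match.group(1), 0
        elif line.startswith("<P>") and parent is not None:
            n += 1
            yield index, f"{parent}.{n}"
```

The sketch only reports proposed ids and does not rewrite the file, so the list could be reviewed before anything in bur.txt is changed.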