csl-inflect status #2 #3

Open
funderburkjim opened this issue Dec 13, 2019 · 12 comments
Labels
documentation Improvements or additions to documentation


@funderburkjim
Contributor

Some of the inflection coverage limitations mentioned in #1 have been reduced. These pertain to verb conjugations.

The file calc_distrib.txt contains counts of the number of inflected verb forms in various categories. The 'aggregated models' section of that file shows how many additional forms have been added in this round of enhancements.

Counts of previous forms

Previously, the verbal forms included:

42300 spcltense-a-am  Forms for special tenses (present, imperfect, imperative, optative),
                                      the 'a' conjugation classes of roots (i.e., classes 1, 4, 6, and 10),
                                      and active or middle voices
24840 spcltense-passive Forms for the four special tenses, with passive voice.

67140 total conjugational forms

Counts of additional forms

03377 spcltense-b-am Forms for special tenses (present, imperfect, imperative, optative),
                                      the other conjugation classes of roots (i.e., classes 2, 3, 5, 7, 8, 9),
                                      and active or middle voices
10521 fut future tense, active/middle voices
10611 pft periphrastic future tense, active/middle voices
10521 con conditional tense, active/middle voices

02169 ben benedictive tense, active/middle voices
01713 prf  perfect tense, active/middle voices
00263 ppf periphrastic perfect tense, active/middle voices
01150 aor aorist tense, active/middle voices

40325 total conjugational forms

107465 Total of previous and additional conjugational forms.
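As a quick arithmetic check on these totals, the category counts can be summed. This is a small illustrative sketch; the dictionary keys are just the category labels used above, and the numbers are copied from the two tables.

```python
# Category counts copied from the two tables above
# (the 'aggregated models' section of calc_distrib.txt).
previous = {
    "spcltense-a-am": 42300,
    "spcltense-passive": 24840,
}
additional = {
    "spcltense-b-am": 3377,
    "fut": 10521,
    "pft": 10611,
    "con": 10521,
    "ben": 2169,
    "prf": 1713,
    "ppf": 263,
    "aor": 1150,
}

previous_total = sum(previous.values())
additional_total = sum(additional.values())
grand_total = previous_total + additional_total
print(previous_total, additional_total, grand_total)  # 67140 40325 107465
```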

The following comments summarize the methodology used for the additional forms.

@funderburkjim
Contributor Author

All of the additional work was done with substantial guidance from the text A Sanskrit Primer by
Madhav M. Deshpande, 2003.

future tense

Future tense conjugation tables are computed by joining a base for the 'sya' future to endings that
are the same as those for the present tense (active or middle voice). This joining is computed
in the conjugate_from_bases.py program.

The base for the 'sya' future is computed by the bases_test2.py program. This program uses a previous algorithm to get a future base and then adds the 'sya' suffix to this base, taking into account whether an 'i' needs to be inserted. The previous algorithm is part of the very complicated test2.py program, which is based on Kale (Kale's Higher Sanskrit Grammar).
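The joining step can be sketched roughly as follows. This is a minimal sketch, not the code in conjugate_from_bases.py. It assumes that the 'sya' future base already includes the 'sya' suffix and so ends in 'a' (e.g. 'gamizya' for gam, consistent with the conditional form agamizyat shown in a later comment). Each present-tense ending is written with its own initial vowel and replaces the base's final 'a', which is how the 'a' lengthens to 'A' before the 1st person endings. The function name `conjugate_future` is hypothetical.

```python
from typing import List

# Present-tense endings for a base ending in 'a', each written with its own
# initial vowel; the base's final 'a' is dropped before joining.
# Rows: 3rd, 2nd, 1st person; columns: singular, dual, plural (SLP1 spelling).
PRESENT_ENDINGS = {
    "a": [["ati", "ataH", "anti"],      # active voice
          ["asi", "aTaH", "aTa"],
          ["Ami", "AvaH", "AmaH"]],
    "m": [["ate", "ete", "ante"],       # middle voice
          ["ase", "eTe", "aDve"],
          ["e", "Avahe", "Amahe"]],
}

def conjugate_future(sya_base: str, voice: str) -> List[List[str]]:
    """Join a 'sya' future base (ending in 'a') to the present-tense endings."""
    if not sya_base.endswith("a"):
        raise ValueError(f"expected a base ending in 'a': {sya_base}")
    stem = sya_base[:-1]  # drop the final 'a'; each ending supplies its own vowel
    return [[stem + ending for ending in row] for row in PRESENT_ENDINGS[voice]]

for row in conjugate_future("gamizya", "a"):
    print(" ".join(row))
```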

@funderburkjim
Contributor Author

periphrastic future tense

Following Deshpande's suggestion (p. 296), a base for the periphrastic future is formed by

  • computing the infinitive of the root, using a part of the test2.py program, and
  • then dropping the ending um of that infinitive.

This computation is done by the bases_test2.py program.
Then the conjugation is obtained by simply appending to this base the endings appropriate for the periphrastic future.

For example:

python3 conjugate_one_v2.py _,a,pft kzip md

Conjugation of _,a,pft kzip

Case S D P
3p kzeptA kzeptArO kzeptAraH
2p kzeptAsi kzeptAsTaH kzeptAsTa
1p kzeptAsmi kzeptAsvaH kzeptAsmaH
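The two steps (drop 'um' from the infinitive, then append the endings) amount to a few lines. The sketch below is only illustrative: the active endings are read off the kzip example above, the infinitive 'kzeptum' is supplied by hand (in the repository it comes from test2.py), and the function names are hypothetical. Middle voice endings are not covered.

```python
from typing import List

# Periphrastic future active endings, read off the kzip example above.
# Rows: 3rd, 2nd, 1st person; columns: singular, dual, plural.
PFT_ACTIVE_ENDINGS = [
    ["A", "ArO", "AraH"],
    ["Asi", "AsTaH", "AsTa"],
    ["Asmi", "AsvaH", "AsmaH"],
]

def pft_base(infinitive: str) -> str:
    """Drop the infinitive ending 'um' to get the periphrastic future base."""
    if not infinitive.endswith("um"):
        raise ValueError(f"expected an infinitive ending in 'um': {infinitive}")
    return infinitive[:-2]

def conjugate_pft(infinitive: str) -> List[List[str]]:
    base = pft_base(infinitive)
    return [[base + ending for ending in row] for row in PFT_ACTIVE_ENDINGS]

for row in conjugate_pft("kzeptum"):
    print(" ".join(row))
```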

@funderburkjim
Contributor Author

conditional tense

From p. 327 of Deshpande:

The conditional mood paradigms look like a combination of the '-sya' future base with the past
imperfect augment 'a' and terminations.

The bases_test2.py program computes the future base (as described above) and prefixes the augment 'a' to it.
The result is taken as the base for the conditional tense.

The conjugate_from_bases program then joins this base to the endings for the active or middle voice; these endings are the same as the imperfect active/middle endings.

For example:

python3 conjugate_one_v2.py _,a,con gam md

Conjugation of _,a,con gam

Case S D P
3p agamizyat agamizyatAm agamizyan
2p agamizyaH agamizyatam agamizyata
1p agamizyam agamizyAva agamizyAma
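A rough sketch of the same computation, assuming (as in the future tense sketch) a future base ending in 'a' such as 'gamizya'. The function names are hypothetical. One limitation: plain prefixing of 'a' is only right for consonant-initial roots; for vowel-initial roots the augment combines with the initial vowel by vṛddhi sandhi, which this sketch does not attempt.

```python
from typing import List

# Imperfect endings for a base ending in 'a', each written with its own initial
# vowel (the base's final 'a' is dropped before joining).
# Rows: 3rd, 2nd, 1st person; columns: singular, dual, plural.
IMPERFECT_ENDINGS = {
    "a": [["at", "atAm", "an"],
          ["aH", "atam", "ata"],
          ["am", "Ava", "Ama"]],
    "m": [["ata", "etAm", "anta"],
          ["aTAH", "eTAm", "aDvam"],
          ["e", "Avahi", "Amahi"]],
}

def conditional_base(future_base: str) -> str:
    """Prefix the augment 'a' (correct only for consonant-initial bases)."""
    return "a" + future_base

def conjugate_conditional(future_base: str, voice: str) -> List[List[str]]:
    stem = conditional_base(future_base)[:-1]  # drop the final 'a'
    return [[stem + ending for ending in row] for row in IMPERFECT_ENDINGS[voice]]

for row in conjugate_conditional("gamizya", "a"):
    print(" ".join(row))
```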

@funderburkjim
Contributor Author

benedictive tense

Benedictive conjugations are given only for those roots and voices given by Deshpande in Lesson 38.

benedictive base

For the benedictive bases, we begin with a digitization of the benedictive 3rd singular from Deshpande's table on pages 330-335; this digitization is in file benedictive_3s.txt.
From a 3rd singular form, we derive a base:

  • if the 3s form is for the active voice, then it ends with 'yAt'; we drop that 'yAt' and
    take the remainder to be the base.
  • if the 3s form is for the middle voice, then it ends with either 'sIzwa' or 'zIzwa' (recall
    that we are using SLP1 transliteration to spell Sanskrit); we drop those final 5 characters and
    take the remainder to be the base.
    • We also note whether the dropped suffix is 'sIzwa' or 'zIzwa', remembering 's' or 'z'. This
      will be needed when combining the base with the benedictive endings.

For example:
  • the benedictive 3s for root 'ad' in active voice is 'adyAt', and 'ad' is the base.
  • the benedictive 3s for root 'Ikz' in middle voice is 'IkzizIzwa', and 'Ikzi' is the base.
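The base derivation rules above can be expressed as a small function. This is a sketch with a hypothetical name (`benedictive_base`), not the repository's code; it returns the base together with the remembered 's' or 'z' (None for active forms).

```python
from typing import Optional, Tuple

def benedictive_base(form_3s: str) -> Tuple[str, Optional[str]]:
    """Derive (base, s_or_z) from a benedictive 3rd singular form in SLP1.

    s_or_z is None for active forms; for middle forms it is the first letter
    of the dropped suffix ('s' for 'sIzwa', 'z' for 'zIzwa').
    """
    if form_3s.endswith("yAt"):                 # active voice
        return form_3s[:-3], None
    if form_3s.endswith(("sIzwa", "zIzwa")):    # middle voice: drop 5 characters
        return form_3s[:-5], form_3s[-5]
    raise ValueError(f"not a recognized benedictive 3s form: {form_3s}")

print(benedictive_base("adyAt"))      # ('ad', None)
print(benedictive_base("IkzizIzwa"))  # ('Ikzi', 'z')
```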

benedictive endings

Benedictive endings active voice

Case S D P
3p yAt yAstAm yAsuH
2p yAH yAstam yAsta
1p yAsam yAsva yAsma

Benedictive endings middle voice

Case S D P
3p sIzwa sIyAstAm sIran
2p sIzWAH sIyAsTAm sIDvam
1p sIya sIvahi sImahi

combining benedictive base and endings

The combination of benedictive base and endings involves no sandhi in the active voice,
and at most one sandhi ('s' to 'z') in the middle voice.
Examples:

Conjugation of _,a,ben ad (base = ad)

Case S D P
3p adyAt adyAstAm adyAsuH
2p adyAH adyAstam adyAsta
1p adyAsam adyAsva adyAsma

Conjugation of _,m,ben Ikz (base = Ikzi, endings start with z)

Case S D P
3p IkzizIzwa IkzizIyAstAm IkzizIran
2p IkzizIzWAH IkzizIyAsTAm IkzizIDvam
1p IkzizIya IkzizIvahi IkzizImahi

Conjugation of _,m,ben kzip (base = kzip, endings start with s)

Case S D P
3p kzipsIzwa kzipsIyAstAm kzipsIran
2p kzipsIzWAH kzipsIyAsTAm kzipsIDvam
1p kzipsIya kzipsIvahi kzipsImahi
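The combination step can be sketched as follows, using the ending tables above. The function name is hypothetical. In the middle voice every ending begins with 's', and the only change is to replace that 's' with the letter remembered from the 3s form.

```python
from typing import List, Optional

# Benedictive endings from the tables above.
# Rows: 3rd, 2nd, 1st person; columns: singular, dual, plural.
BEN_ACTIVE = [["yAt", "yAstAm", "yAsuH"],
              ["yAH", "yAstam", "yAsta"],
              ["yAsam", "yAsva", "yAsma"]]
BEN_MIDDLE = [["sIzwa", "sIyAstAm", "sIran"],
              ["sIzWAH", "sIyAsTAm", "sIDvam"],
              ["sIya", "sIvahi", "sImahi"]]

def conjugate_benedictive(base: str, voice: str,
                          s_or_z: Optional[str] = None) -> List[List[str]]:
    """Join a benedictive base to its endings.

    Active voice: plain concatenation (no sandhi). Middle voice: each ending's
    initial 's' is replaced by s_or_z, the letter remembered from the 3s form.
    """
    if voice == "a":
        return [[base + e for e in row] for row in BEN_ACTIVE]
    initial = s_or_z or "s"
    return [[base + initial + e[1:] for e in row] for row in BEN_MIDDLE]

for row in conjugate_benedictive("Ikzi", "m", "z"):
    print(" ".join(row))
```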

@funderburkjim
Contributor Author

Perfect tense

Although test2.py has logic for computing perfect tense conjugations, that logic is extremely
complicated and difficult to 'tweak'. Thus, rather than using test2.py directly, we devised another,
simpler, though less algorithmic, method.

perfect_3p.txt

The file perfect_3p.txt is a digitization of the 3rd person perfect forms (singular, dual and plural, for the selected roots in active or middle voice) from Deshpande's table on pages 305-310.

This file is used to check the 3rd person forms of our derived perfect tense conjugations.
Also, we currently compute perfect conjugations only for the roots and voices appearing in
Deshpande's table. Note that this provides no independent confirmation of our derivations of
1st person and 2nd person perfect forms.

Strategy for derivation

According to my reading of Kale, pages 306-7, a perfect conjugation table can be
derived for a given root and voice (active/middle) from a table of endings and from
four pieces of information derived from the root:

  • a reduplicated base to be used before strong endings
    • the singular active voice endings
  • a (possibly different) reduplicated base to be used before weak endings
    • dual or plural active voice endings
    • singular, dual or plural middle voice endings (i.e., any middle voice ending)
  • a sew-code relevant for all endings EXCEPT the Ta ending of the 2nd person singular active voice.
    This sew-code has one of three values:
    • sew which means that 'i' is inserted between the base and ending
    • aniw which means that 'i' is NOT inserted between the base and ending
    • vew which means that 'i' is optionally inserted between the base and ending
  • a (possibly different) sew-code that applies just to the Ta ending of the 2nd person singular active voice.
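The four-part scheme above can be sketched as a small data structure plus a naive join. This is an illustrative sketch only, with several assumptions: it writes the 2nd singular active ending as bare 'Ta' and lets the separate code decide whether 'i' is inserted; it assumes 'i' is never inserted before vowel-initial endings; and it performs no sandhi at all (so, for example, middle forms in 'se' would still need the s to z change after 'i'). The class and function names and the sample values for kzip are hypothetical, not read from bases/perfect_bases.txt.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class PerfectBase:
    strong: str  # reduplicated base before strong endings (singular active)
    weak: str    # reduplicated base before all other endings
    sew: str     # 'sew', 'aniw' or 'vew' for every ending except 2s active Ta
    sew_ta: str  # 'sew', 'aniw' or 'vew' for the 2s active Ta ending only

# Rows: 3rd, 2nd, 1st person; columns: singular, dual, plural.
ACTIVE = [["a", "atuH", "uH"], ["Ta", "aTuH", "a"], ["a", "va", "ma"]]
MIDDLE = [["e", "Ate", "ire"], ["se", "ATe", "Dve"], ["e", "vahe", "mahe"]]
SLP1_VOWELS = set("aAiIuUfFxXeEoO")
INSERTS = {"sew": ["i"], "aniw": [""], "vew": ["i", ""]}  # vew: both alternatives

def conjugate_perfect(b: PerfectBase, voice: str) -> List[List[List[str]]]:
    """For each person/number slot, return the list of alternative forms."""
    table = []
    for p, row in enumerate(ACTIVE if voice == "a" else MIDDLE):
        out_row = []
        for n, ending in enumerate(row):
            base = b.strong if (voice == "a" and n == 0) else b.weak
            if ending[0] in SLP1_VOWELS:
                joins = [""]               # no 'i' before a vowel-initial ending
            elif voice == "a" and (p, n) == (1, 0):
                joins = INSERTS[b.sew_ta]  # the 2s active Ta ending
            else:
                joins = INSERTS[b.sew]
            out_row.append([base + i + ending for i in joins])
        table.append(out_row)
    return table

kzip = PerfectBase(strong="cikzep", weak="cikzip", sew="sew", sew_ta="vew")
print(conjugate_perfect(kzip, "a")[1][0])  # ['cikzepiTa', 'cikzepTa']
```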

@funderburkjim
Contributor Author

perfect tense implementation

initialization of models

We start with the roots and voices from Deshpande's table on pages 305-310,
in the file verb_cp_deshpande_305.txt.
From this, models/calc_models_prf.txt is constructed (see models/redo.sh).
Essentially, this models file contains the roots and voices from Deshpande's table.

initialization of bases

The perfect_bases_test2.py program is used once to initialize the 4-part base for the
perfect models.
It does this by referencing several parts of the test2.py program.
The result is the bases/perfect_bases.txt file.
This file was subsequently modified manually, as described below.

perfect tense endings

These are taken from Deshpande p. 303, or Kale pp. 306-7.

perfect active terminations (the singular endings are strong)

Person S D P
3p a atuH uH
2p iTa aTuH a
1p a va ma

perfect middle terminations

Person S D P
3p e Ate ire
2p se ATe Dve
1p e vahe mahe

@funderburkjim
Contributor Author

Perfect tense combination of base and endings

As with other parts of the derivation of perfect tense conjugations, the combination of base with
endings is itself intricate. In our programs:

  • tables/conjugate_from_bases.py reads a record (the 4-part base for a given root and voice) from
    the bases/perfect_bases.txt file and prepares to combine that base with each ending of that voice
  • then the perfect_join program actually carries out the generation of inflections by
    • adding an 'i' insert between base and ending when appropriate
    • performing needed sandhis.

testing the conjugation table

After completing the conjugation table, conjugate_from_bases compares the 3rd person forms
to those in the tables/perfect_3p.txt file (digitization of Deshpande's table of perfect forms).
Any differences are printed.

iteration

A process of iteration was used to resolve discrepancies between the 3rd person forms and those
of Deshpande. This involved a few changes to bases/perfect_bases.txt as well as refinement of
the perfect_join program. Currently, there are no discrepancies between the 3rd person forms and those of Deshpande.

@funderburkjim
Contributor Author

Periphrastic perfect tense

Although it was not mentioned in the above discussion of the perfect tense, not all roots take the
reduplicative perfect tense. If a root does not take the reduplicative perfect tense, then it will take
the periphrastic perfect tense. A few roots will take both the reduplicative and periphrastic perfect.

Currently, we restrict the periphrastic perfect to roots mentioned in Deshpande's perfect tense
tables on pages 305-310.

The bases are taken from the file bases/ppfactn.txt. This file was initialized programmatically:

python3 ppf_bases_test2.py ../models/calc_models_ppf.txt temp_ppfactn.txt

ppfactn.txt was then modified slightly to be in accordance with Deshpande.

Periphrastic perfect conjugation tables can be constructed for a given root and voice (active/middle) by prefixing the base to the reduplicative perfect conjugation table of the root kf in the corresponding voice.

For example, the base for the root Ikz is IkzAm. The middle voice periphrastic perfect conjugation
of Ikz joins the base to the middle voice perfect conjugation of kf:
Conjugation of _,m,prf kf

Person S D P
3p cakre cakrAte cakrire
2p cakfze cakrATe cakfQve
1p cakre cakfvahe cakfmahe

The resulting conjugation for Ikz is then:
Conjugation of _,m,ppf Ikz

Person S D P
3p IkzAYcakre IkzAYcakrAte IkzAYcakrire
2p IkzAYcakfze IkzAYcakrATe IkzAYcakfQve
1p IkzAYcakre IkzAYcakfvahe IkzAYcakfmahe

Note the final 'm' of the base IkzAm has a sandhi change to palatal nasal Y (slp1 spelling) before
the palatal c of cakre.
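The prefixing step with its single sandhi can be sketched as below. The kf middle perfect table is written with the 2nd person forms as they appear inside the Ikz forms above (cakfze, cakfQve). The function name is hypothetical, and the sandhi rule is generalized slightly to all palatal stops (c, C, j, J), although only 'c' is needed for kf.

```python
# Middle voice perfect of kf. Rows: 3rd, 2nd, 1st person; columns: S, D, P.
KF_PERFECT_MIDDLE = [
    ["cakre", "cakrAte", "cakrire"],
    ["cakfze", "cakrATe", "cakfQve"],
    ["cakre", "cakfvahe", "cakfmahe"],
]
PALATAL_STOPS = set("cCjJ")

def join_ppf(base: str, aux_form: str) -> str:
    """Prefix a periphrastic perfect base (ending in 'Am') to an auxiliary form.

    A final 'm' becomes the palatal nasal 'Y' before a palatal stop such as
    the 'c' of cakre; otherwise the two parts are simply concatenated.
    """
    if base.endswith("m") and aux_form[0] in PALATAL_STOPS:
        return base[:-1] + "Y" + aux_form
    return base + aux_form

for row in KF_PERFECT_MIDDLE:
    print(" ".join(join_ppf("IkzAm", form) for form in row))
```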

It is also the case that the perfect conjugations of as (to be) or BU (to become) may be
used instead of the perfect conjugations of kf.

Currently, we only use the perfect conjugations of kf.

@funderburkjim
Contributor Author

aorist tense

The previous coding of conjugation algorithms (pysanskritv1/test2.py) includes an attempt to
transcribe the material in Kale on aorist forms. However, this previous work is inadequate. Rather
than attempt to upgrade it, I have chosen simply to manually digitize the forms provided by Deshpande in Lesson 37.

These Deshpande aorist forms are in two files:

  • tables_aorist.txt contains
    • 13 full conjugation tables
    • 230 partial conjugation tables, with the 3rd person forms only. Unknown forms (2nd person
      and 1st person) appear with value '?'
  • tables_aorist_passive.txt contains 193 partial conjugation tables, with only the 3rd person singular passive form, ending in 'i'. Other forms
    appear with '?' to represent unknown values.
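Since partial tables mark unknown forms with '?', any consumer of these files needs to skip those placeholders. The sketch below assumes, hypothetically, that each table uses the same 3p/2p/1p row layout as the conjugation displays in this thread (the actual layout of tables_aorist.txt is not shown here). The sample forms for gam are supplied by hand for illustration, and the function names are hypothetical.

```python
from typing import List

def parse_table(lines: List[str]) -> List[List[str]]:
    """Parse '3p/2p/1p' rows (singular, dual, plural) into a 3x3 table."""
    rows = {}
    for line in lines:
        parts = line.split()
        if parts and parts[0] in ("3p", "2p", "1p"):
            rows[parts[0]] = parts[1:4]
    return [rows["3p"], rows["2p"], rows["1p"]]

def known_forms(table: List[List[str]]) -> List[str]:
    """Return the forms actually supplied, skipping the '?' placeholders."""
    return [form for row in table for form in row if form != "?"]

partial = parse_table([
    "Person S D P",
    "3p agamat agamatAm agaman",
    "2p ? ? ?",
    "1p ? ? ?",
])
print(known_forms(partial))  # ['agamat', 'agamatAm', 'agaman']
```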

@funderburkjim funderburkjim added the documentation Improvements or additions to documentation label Dec 14, 2019
@funderburkjim
Contributor Author

spcltense-b-am

These are the special tenses (pre, ipf, ipv, opt) in active and middle voices for roots in conjugational classes 2,3,5,7,8,9.

The derivations of conjugations for these cases are more complex than the corresponding derivations for roots of classes 1,4,6 and 10. Deshpande (p. 203) summarises the differences:

The conjugations 2, 3, 5, 7, 8 and 9 are different from the conjugations 1, 4, 6, and 10, in that the verbal base in the latter conjugations ends in -a, while the verbal base in the first group of conjugations does not end in -a. This fact leads to a greater sandhi impact of the final affixes on vowels and consonants of the verbal base in these conjugations. In order to appreciate this impact, the final affixes may be divided between those with strong bases and weak bases.
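The strong/weak division mentioned in this passage follows the standard rule of Sanskrit grammar: the strong base is used in the active singular of the present and imperfect, in all 1st person imperative forms (active and middle), and in the 3rd singular imperative active; every other slot, including the whole optative, takes the weak base. A minimal sketch of that classification (a hypothetical helper, not code from pysanskritv1), using the tense abbreviations of this thread:

```python
def is_strong(tense: str, voice: str, person: int, number: str) -> bool:
    """Whether a special-tense slot takes the strong base (classes 2,3,5,7,8,9).

    tense: 'pre', 'ipf', 'ipv' or 'opt'; voice: 'a' (active) or 'm' (middle);
    person: 1, 2 or 3; number: 'S', 'D' or 'P'.
    """
    if tense in ("pre", "ipf"):
        return voice == "a" and number == "S"
    if tense == "ipv":
        # all 1st persons (active and middle), plus the 3rd singular active
        return person == 1 or (voice == "a" and person == 3 and number == "S")
    return False  # optative: every slot is weak

strong_slots = [(p, n) for p in (3, 2, 1) for n in "SDP"
                if is_strong("ipv", "a", p, n)]
print(strong_slots)
```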

The approach taken currently is similar for each of the 6 conjugation classes:

  • Restrict the conjugations to the roots/voices presented by Deshpande.
    • There is a model file for each class; e.g., models_1_2.txt contains all the class 2 roots/voices for which Deshpande presents conjugations. In addition, this file contains other class 2 roots/voices
      from Monier-Williams dictionary, but commented out (i.e., conjugation tables are not currently
      prepared for these commented out roots).
  • The bases are generally just the root. These bases are not actually used.
  • Generate conjugation table candidates for the Deshpande models. This is done programmatically,
    using pysanskritv1/conjugate_one_v1.py program. This program uses previous work (i.e., parts of
    test2.py program) to generate the conjugations. This may be done by a script; for class 2:
    sh conjugate_one_v1.sh ../models/models_1_2.txt > temp_1_2.txt
    temp_1_2.txt is taken as the initial value of tables_1_2.txt.
    It includes conjugations for the 4 special tenses (pre, ipf, ipv, opt).
  • Compare the computed conjugations in tables_1_2.txt with those published by Deshpande;
    resolve differences. There are few differences, and usually these are resolved by choosing
    Deshpande's version. Also, comments are added.
  • After editing, the tables_1_c.txt files (c = class number, e.g. tables_1_2.txt) are used as the final result for these conjugations.

After working through the comparisons with Deshpande, I feel confident in the derivations from
prior work. One viable avenue for extending the conjugation tables to other roots not in
Deshpande would be to use conjugate_one_v1.sh for other sets of models.

@funderburkjim
Contributor Author

This concludes my initial documentary comments on the extension to the verbal forms provided by the csl-inflect repository.

@gasyoun
Member

gasyoun commented Oct 18, 2020

Detailed, as usual. Only now have I managed to read some of the older documentation. Without it, the code would be dead after a while.
