simple search, v1.1 #26

funderburkjim · 2021-01-25T01:21:37Z

A new version of simple search is currently available under a 'test' url:

https://sanskrit-lexicon.uni-koeln.de/simplet/

The previous version is also available, under https://www.sanskrit-lexicon.uni-koeln.de/simple/

I hope some users will experiment with the new version before it is made available under
https://sanskrit-lexicon.uni-koeln.de/simple/

funderburkjim · 2021-01-25T01:33:47Z

The new version can be called with parameters DICT and KEY: https://www.sanskrit-lexicon.uni-koeln.de/simplet/DICT/KEY.

It also accepts optional additional parameters: /SIMPLE_INPUT/OUTPUT/ACCENT

The SIMPLE_INPUT parameter specifies the assumed spelling of KEY, and this value is visible in another menu.

When not specified in the URL, SIMPLE_INPUT defaults to 'default'. This assumes a phonetic type of spelling, which
may include IAST-type diacritics. This spelling is also not case-sensitive (i.e., all letters are lower-cased before searching).
You can enter KEY in Devanagari with the 'default' SIMPLE_INPUT.

When SIMPLE_INPUT is one of the other values (slp1, hk, itrans), then the spelling of KEY is assumed to use the peculiarities
of the chosen transcoding.
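
As a rough illustration of the URL scheme described above, the following Python sketch assembles a simplet URL from its positional parts. The helper name `build_simplet_url` is hypothetical, and percent-encoding the KEY is an assumption about what the server accepts. The only things taken from this thread are the base URL, the DICT/KEY order, and the SIMPLE_INPUT names (default, slp1, hk, itrans). The accepted values for OUTPUT and ACCENT are not listed here, so the sketch passes through whatever it is given.

```python
from urllib.parse import quote

BASE_URL = "https://www.sanskrit-lexicon.uni-koeln.de/simplet"


def build_simplet_url(dict_code, key, simple_input=None, output=None, accent=None):
    """Build BASE/DICT/KEY[/SIMPLE_INPUT[/OUTPUT[/ACCENT]]] (hypothetical helper)."""
    optional = [simple_input, output, accent]
    # Drop trailing parameters that were not supplied.
    while optional and optional[-1] is None:
        optional.pop()
    # The parameters are positional, so a gap in the middle cannot be expressed.
    if any(value is None for value in optional):
        raise ValueError("an optional parameter requires all parameters before it")
    # Assumption: non-ASCII keys (IAST, Devanagari) are sent percent-encoded.
    parts = [dict_code, key, *optional]
    return BASE_URL + "/" + "/".join(quote(part, safe="") for part in parts)


print(build_simplet_url("mw", "azva", "hk"))
```

Leaving out SIMPLE_INPUT corresponds to the 'default' spelling assumption described above.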

In addition to the SIMPLE_INPUT parameter, some additional enhancements have been made to better model spelling
variations in SKD and other dictionaries. I expect some additional tweaks will be discovered that can be handled within
the current model of simple_search.

funderburkjim · 2021-01-25T02:17:21Z

Working on bug exemplified by 'rupa'. Problem is that 'ru' has variants, but then the variants of 'u' are lost!

funderburkjim · 2021-01-25T03:39:42Z

Some problems from sanskrit-lexicon/COLOGNE#167

resolved

vrisapha gives varṣapa AND vṛṣabha
vacakah -> vācaka, vacaka
rupa -> rupa rūpa arbha rūpā (former version, only rupa returned in mw)
KRISNA -> kṛṣṇa kṛṣṇā (formerly no results)
- KRISN -> kf kfzRa kfSa karza kfz kruS karSana kfS kfSana kfza (WEIRD -- not sure why all those others? )

STILL unresolved

yoginaḥ : This is an inflected form. The current model tweaks a few inflected forms (nom. singular: am, aH)
More common inflected forms could probably be recognized.
gṛhastha is recognized, but not gṛhatsha
- the difference is 'st' and 'ts'. This kind of spelling difference does not fit current search model. Currently hard to solve.
hariṇyagarbha instead of hiraṇyagarbha --- Right, current model doesn't know how to handle this
kūṭstha still no results - (whereas kūṭastha is found). --

funderburkjim · 2021-01-25T03:41:56Z

@gasyoun (or others) Please point out where some 'low-hanging fruit' improvements to simple search might be.

funderburkjim · 2021-01-25T03:51:52Z

transitions should be different for slp1 than default

guru with INPUT_SIMPLE = default:
- guru guRa guRin GuRa guRana gur guRI gurU gUr Gur GuR GUr guRi GuRi
guru with INPUT_SIMPLE = slp1
- guru guRa guRin GuRa guRana gur guRI gurU gUr Gur GuR GUr guRi GuRi

They are the same.
But, it probably would be reasonable, for slp1, not to use transitions like 'r R', 'g G', etc. Although these are reasonable
for default.
Agree?
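
To make the proposal concrete, here is a minimal Python sketch of transition tables that depend on the input spelling. It is not the simple-search code: the table contents and the names `TRANSITIONS` and `variants` are assumptions chosen for illustration, and which transitions (if any) should remain for slp1 is a guess. Letters are in SLP1, so 'R' is ṇ and 'G' is gh.

```python
from itertools import product

# Hypothetical transition tables, keyed by input spelling.
# 'default' allows loose phonetic alternates such as r/R and g/G;
# 'slp1' keeps only a vowel-length alternate (an assumption).
TRANSITIONS = {
    "default": {"r": ["r", "R"], "g": ["g", "G"], "u": ["u", "U"]},
    "slp1": {"u": ["u", "U"]},
}


def variants(key, simple_input):
    """Return every spelling reachable by applying transitions letter by letter."""
    table = TRANSITIONS[simple_input]
    # Each letter keeps its own list of alternates, so a choice made for one
    # letter never discards the alternates of a later letter.
    choices = [table.get(letter, [letter]) for letter in key]
    return ["".join(combo) for combo in product(*choices)]


print(len(variants("guru", "default")))  # 16 candidate spellings
print(variants("guru", "slp1"))          # ['guru', 'gurU', 'gUru', 'gUrU']
```

The candidates would still have to be checked against the dictionary headwords; the sketch only shows how the candidate set shrinks when the slp1 table is stricter.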

gasyoun · 2021-01-25T17:06:51Z

If I had a grandpa still alive, I wish it would be you - Jim the magician. The four resolved ones work perfectly.

> KRISN -> kf kfzRa kfSa karza kfz kruS karSana kfS kfSana kfza (WEIRD -- not sure why all those others? )

10 results: kṛ kṛṣṇa kṛśa karṣa kṛṣ kruś karśana kṛś kṛśana kṛṣa. Strange over-generation indeed, versus just KRISNA.

> More common inflected forms could probably be recognized.

Right, just a few more common ones.

> the difference is 'st' and 'ts'. This kind of spelling difference does not fit current search model. Currently hard to solve.

Yes, and it's not critical. It would be good to have, because Sanskrit words live in strange ways, but it is not critical, as it is hard to solve.

> 'low-hanging fruit' improvements

I would add a few inflected forms, to cover Nominative forms:
yoginaḥ
Koush for koṣa

> for slp1, not to use transitions like 'r R', 'g G', etc. Although these are reasonable for default.

Agree.

gasyoun · 2021-01-27T18:27:29Z

@Andhrabharati & @funderburkjim
Tamilised Sanskrit word checklist (https://youtu.be/BW4qa0lBZX4). If I enter the:

1. tamilish version of Sanskrit anantha, I get ananta as planned (and cologne symlink to apidev #1).
2. tamilish version of Sanskrit samudhra, I get samudra as planned (and cologne symlink to apidev #1).
3. tamilish version of Sanskrit thara, I get tāra as planned (and cologne symlink to apidev #1).
4. tamilish version of Sanskrit poojana, I do not get pūjanā as planned. Even pooja does not work for pūjā
5. namaḥ has no possible verb listed (5 results: nāman nāma namana nama nāmana)
6. tamilish version of Sanskrit varahsini, I do not get vārāśini as planned. So ah can equal to ā
7. tamilish version of Sanskrit dourbhagya, I do not get daurbhāgya as planned. So ou can equal to au.
8. tamilish version of Sanskrit dhoorikrutha, I do not get dūrīkṛta as planned. We have almost all the replacements, other than the oo and ee (= ī) I guess.
9. tamilish version of Sanskrit vipathitha, I do not get vipattita as planned, but vipatita and vipāṭita. So t, but would want tt
10. tamilish version of Sanskrit natha, I do not get natā as planned.
    Instead that 6 results: nata nātha naṭa naṭana naṭā nāṭa where natá is the closest, but does not contain natā.
11. tamilish version of Sanskrit kadachid, I do not get kadācid as planned, because it's not a base in any of the dictionaries.
    Compare:
    kaccid:BEN;2788,CAE;6847,CCS;4222,MD;5341,MW72;13322,MW;41751,PW;23308,PWG;14284,STC;7514
    kiMcid:MW;50367,PW;27806,PWG;70717,SCH;10618

Andhrabharati · 2021-01-27T19:31:58Z

@gasyoun
Are you trying for some sort of AI (artificial Intelligence, with self-learning/building patterns!) in the search process?

Coming to the subject matter of Tamilish Sanskrit, this is kind of a better style, I should say. I have seen far worse texts, even more unimaginable than the spellings of original DLI titles (now I see that various language-wise teams are working on cleaning those titles).

And finally, why am I addressed in this?!!

gasyoun · 2021-01-27T19:40:43Z

> Are you trying for some sort of AI (artificial Intelligence, with self-learning/building patterns!) in the search process?

non-AI, rule based.

> Coming to the subject matter of Tamilish Sanskrit, this is kind of a better style, I should say.

Oh, ok.

> I have seen far worse texts

Can you show me a sample?

> even more unimaginable than the spellings of original DLI titles

Let's document the worst ones?

> And finally, why am I addressed in this?!!

You might have some samples I have missed above.

Andhrabharati · 2021-01-28T05:04:17Z

11. tamilish version of Sanskrit `kadachid`, I do not get `kadācid` as planned, because it's not a base in any of the dictionaries.
    Compare:
    `kaccid:BEN;2788,CAE;6847,CCS;4222,MD;5341,MW72;13322,MW;41751,PW;23308,PWG;14284,STC;7514`
    `kiMcid:MW;50367,PW;27806,PWG;70717,SCH;10618`

The reason for not finding kadācid is that it is not a single word by grammatical rules (though in print, many books join the two words "कदा चित्" together).

Look at this in MW, for example-

कदा चित्, at some time or other, sometimes, once [ID=42894.45]

So are the words like "कदा चन"-

न कदा चन, never at any time, RV. ; AV. &c. [ID=42894.4]

gasyoun · 2021-01-28T06:32:39Z

> kadācid is- it's not a single word by grammatical rules

I do know that. But many people still look for it as a single word. I would want to have it as an entry point.

Andhrabharati · 2021-01-28T06:59:02Z

The only way to do this is to ignore the spaces in the "texts" to get such entries (and that was the way manuscript texts were written, before the punctuation system [space, quote marks, exclamation & question marks, comma, ... ... ...] was introduced in Indian texts).
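
A minimal Python sketch of the space-ignoring idea, for illustration only. The names `squash` and `build_index` and the three sample headwords are hypothetical, not taken from the Cologne code or data. The sketch does not handle the final t/d sandhi difference between the query kadācid and the entry कदा चित्, which would need a separate spelling alternate.

```python
import unicodedata


def squash(text):
    """Normalize to NFC and remove all whitespace."""
    return "".join(unicodedata.normalize("NFC", text).split())


def build_index(headwords):
    """Map each space-free spelling to the original headwords it came from."""
    index = {}
    for headword in headwords:
        index.setdefault(squash(headword), []).append(headword)
    return index


# Hypothetical sample entries in IAST; real data would come from the dictionaries.
index = build_index(["kadā cit", "kadā cana", "kaccid"])

print(index.get(squash("kadācit"), []))    # ['kadā cit']
print(index.get(squash("kadā cana"), []))  # ['kadā cana']
```

With such an index, a query typed as one word and a query typed with the space both reach the same entry.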

gasyoun · 2021-01-28T12:33:31Z

> and that was the way the manuscripts texts were, before the punctuation system

Not only Indian; the same was true of Latin until the Middle Ages.

Ref: #26 (comment)

funderburkjim · 2021-02-06T21:06:46Z

non-default spelling results limited on match

When using non-default input spelling, if the given spelling is found,
then the alternates are NOT shown. For example, azva with HK input spelling:

When using default-input spelling, all the dictionary matches are shown:

This change makes semantic sense to me. What do others think?
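
The rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the actual implementation: `select_results` is a hypothetical name, the key is assumed to be already transcoded into the spelling used for headwords (SLP1 here, so HK azva becomes aSva), and the alternates and headword set are made up for the example.

```python
def select_results(key, alternates, headwords, simple_input="default"):
    """Pick which headwords to show for a query (hypothetical helper)."""
    # Non-default input: the user chose an exact transcoding, so an exact
    # hit suppresses the alternates.
    if simple_input != "default" and key in headwords:
        return [key]
    # Otherwise show every candidate that is a headword, without duplicates.
    results = []
    for candidate in [key, *alternates]:
        if candidate in headwords and candidate not in results:
            results.append(candidate)
    return results


headwords = {"aSva", "ASva"}    # made-up sample headwords in SLP1
alternates = ["ASva", "asva"]   # made-up alternates for the key

print(select_results("aSva", alternates, headwords, "hk"))       # ['aSva']
print(select_results("aSva", alternates, headwords, "default"))  # ['aSva', 'ASva']
```

When the exact spelling is not a headword, the non-default case falls back to the alternates, the same as the default case.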

funderburkjim · 2021-02-06T21:31:38Z

tamilish alternates

Based on examples above:

These are 'solved':

poojana -> pūjana
pooja -> pūjā
dourbhagya -> daurbhāgya
dhoorikrutha -> dūrīkṛta
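
For illustration, the digraph equivalences mentioned in this thread (oo = ū, ee = ī, ou = au, and th/dh written for t/d) can be sketched as a small rewrite table in Python. The table and the name `tamilish_to_iast` are hypothetical. The sketch replaces unconditionally, whereas a real search would treat these as alternates next to the original spelling (otherwise genuine aspirates such as nātha would be lost), and it leaves vowel length and ru = ṛ to other rules.

```python
# Hypothetical rewrite rules, applied in order. 'th' is handled before 'dh'
# only for readability; the two never overlap.
TAMILISH_RULES = [
    ("oo", "ū"),
    ("ee", "ī"),
    ("ou", "au"),
    ("th", "t"),
    ("dh", "d"),
]


def tamilish_to_iast(word):
    """Apply each digraph rule to a lower-cased copy of the word."""
    result = word.lower()
    for source, target in TAMILISH_RULES:
        result = result.replace(source, target)
    return result


for word in ["pooja", "anantha", "samudhra", "dourbhagya", "dhoorikrutha"]:
    print(word, "->", tamilish_to_iast(word))
```

The outputs (for example pūja and daurbhagya) are only intermediate candidates; matching pūjā or daurbhāgya still depends on the vowel-length alternates of the search model.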

These are mentioned in comment, but not believed to be problems:

vipathitha -> vipatita vipāṭita. Note vipattita is not in MW (or any other current dictionary)
natha still does not give natA (natA not in any dictionary) -- gives words starting with 'm'.
namaḥ still no verb. Looking for nam? And, namaḥ is not a verb form AFAIK.
varahsini no results. Note vārāśini not in mw or any other dictionary

Still no matches:

kadachid no results.

Phrases as well as 'very common inflected forms' should yield results. How to make such an enhancement is not clear.

cut/paste good results

I've got good results with small test of cut/paste of words from wikipedia.
Capitalization no longer a problem.

unwanted substitutions

There are still sometimes too many results, as with 'natha':

16 results: mātṛ mata nātha mātā naṭa nata maṭha naṭā nāṭa maṭa matha mathan mathā māṭha māta mātha

Allowing initial 'n' to be replaced by 'm' is the main culprit in this example.
With current program design, solution to this not obvious.

many skd spelling differences now resolved.

skd usually (always?) shows the nominative singular for substantive headwords.
Many (all?) of these are now handled.
Examples:

search mw for kartri: get kartṛ (as expected, and some others)
search skd for kartri: get kāritā kartra karttā kartrī (was expecting karttā in skd)
search skd for brahman (expecting brahmā)
- 25 results: paramaḥ parama prāṇaḥ pramāṇaṃ vraṇa vraṇaḥ bhramaḥ bhrama pramā praṇāmaḥ bharaṇaḥ bharaṇaṃ bhramaṇaṃ varaṇaṃ varaṇaḥ prāṇanaṃ brahma varaṇā prāṇā braṇa varāṇaḥ vraṇahaḥ bhraṇa brahmā praṇaḥ
- got result, but lots of 'p' 'bh' and 'v' words also. Should these be removed from
  results?
brahman in mw: 28 results: brahman parama prāṇa pramāṇa praṇam vraṇa bhrama pramā praṇāma bharaṇa bhramaṇa varaṇa prāṇana brahma bhrāmaṇa varaṇā bhrāma praṇa vraṇaha vrahman praman vraṇana paramam paraṇa parāṇa varāṇa bharama vrāṇa

gasyoun · 2021-02-06T21:38:46Z

> This change makes semantic sense to me. What do others think?

Agree. As an additional option it makes sense - when you know what you actually search for.

funderburkjim · 2021-02-06T21:56:55Z

search time difference

There is a noticeable difference in search time between local machine and cologne.
Local machine (e.g. for brahman in mw) is almost instantaneous (< 1 second).
Cologne is about 8 seconds.

This is probably a combination of:

php 7.3.26 at Cologne, vs. php 8.0.0 on local machine
ssd differences

gasyoun · 2021-02-08T17:53:19Z

> is almost instantaneous (< 1 second). Cologne is about 8 seconds.

And that is even after the ngrams are turned off? Cologne seems really slow on this.

> brahman in mw: 28 results

Now we have a problem of over-generation.

> dhoorikrutha -> dūrīkṛta

A dream come true. Thanks, @funderburkjim

funderburkjim · 2021-02-08T21:57:46Z

p/b removed.

Removed this spelling equivalence in simplet. Now brahman (default/mw) gives 16 results:

brahman vraṇa bhrama bhramaṇa bharaṇa varaṇa brahma bhrāmaṇa varaṇā bhrāma vraṇana 
vraṇaha vrāṇa vrahman varāṇa bharama

Still 8+ seconds at Cologne (are you also seeing slow times in Cologne search?)

When using prior version (/simple), same search for brahman takes about 2 seconds, and
gives 3 results: brahman brāhmaṇa vrahman

With simple and simplet search engines it is hard to know the 'cause' of the differences.

Current comparisons between the prior version (simple) and dev version (simplet)

simple is faster at Cologne
simple interprets 3rd parameter (if present) as 'output spelling'
simplet interprets 3rd parameter (if present) as 'input_simple' spelling assumption
- simplet handles 'non-default' spellings such as simplet/mw/azva/hk
Capitalization: simplet (with input_simple = default) is better at cut-paste, especially with capitalization
- EXAMPLE: simple LAKSHMI (no results in mw); simplet LAKSHMI -> lakṣmī lakṣmi
simple gives better precision than simplet in some cases (like brahman above).

@gasyoun As SEO expert, how do you think we should proceed?
Should we make 'simplet' the current production version of simple-search?
If not, what needs to be done to simplet to get it ready for production?

gasyoun · 2021-02-09T05:52:45Z

> Still 8+ seconds at Cologne (are you also seeing slow times in Cologne search?)

No, it did not look like 8 to me; it was quicker, close to 3, similar to your experience with simple. Can we display the time in seconds for the SIMPLET queries, so one can compare?
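
One way to report the elapsed time per query is sketched below in Python. The server runs PHP, so this only illustrates the idea; `timed` and `fake_search` are hypothetical names, and `fake_search` is a stand-in for the real lookup.

```python
import time


def timed(search_fn, *args):
    """Run search_fn(*args) and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = search_fn(*args)
    elapsed = time.perf_counter() - start
    return results, elapsed


def fake_search(dict_code, key):
    # Stand-in for the real dictionary lookup; just echoes the key.
    return [key]


results, seconds = timed(fake_search, "mw", "brahman")
print(f"{len(results)} results in {seconds:.3f} s")
```

Showing this number next to the result count would let users at different locations compare like with like.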

> Should we make 'simplet' the current production version of simple-search?

I do not see any reason why not. Speeding it up might take longer than expected.

> If not, what needs to be done to simplet to get it ready for production?

Not only production - it's ready to go outside the simple folder, Jim.

funderburkjim · 2021-02-09T20:25:54Z

Change .htaccess:

/simple/ now goes to version 1.1 (formerly called /simplet/)

/simple1.0/ now goes to version 1.0 (formerly called /simple/).

> outside the simple folder

What does that mean?

gasyoun · 2021-02-10T06:51:12Z

> What does that mean?

add to all display dropdowns
make default

funderburkjim · 2021-02-15T01:42:46Z

While trivial to add a simple option to basicdisplay's input menu,
the implementation of the functionality is another matter.

It may be better to think of the whole system of basic, list, advanced-search displays as legacy applications, which will remain as is for the foreseeable future.

Indeed with simple-search, there is no need for basic or list, in my opinion.

However, basic, list, etc. do have the advantage that they can be easily installed as local applications.
By contrast, simple-search requires a lot more resources to do its job. Simple-search can be
installed locally but it's a much bigger commitment. When docker engines are as easy to
install as xampp, then we can replace the current local implementations of basic, etc. with
docker containers, and also perhaps have more flexibility to revise basic, etc. to include simple-search capabilities.

On the other side, Advanced search has some unique features that simple-search lacks:

substring matching for headwords
full-text searching (also has substring matching).

So I think I get your idea, but I think it is premature to spend much time on it.

gasyoun · 2021-02-15T06:32:47Z

> Indeed with simple-search, there is no need for basic or list, in my opinion.

Agree. But adding it will hurt in no way. A big remake is big. Let's do the trivial.

> Advanced search has some unique features that simple-search lacks

I do not see why these options can't be implemented in simple search.

gasyoun added the enhancement New feature or request label Jan 25, 2021

funderburkjim added a commit that referenced this issue Feb 6, 2021

A couple of 'tamilish' spellings.

21c8672

Ref: #26 (comment)
