Skip to content

A rest-api based webapp to access various dictionaries stored in a database.

License

Notifications You must be signed in to change notification settings

indic-dict/dict-api

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

74 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Table of contents generated using this app

Introduction

We seek to provide a functional rest-api based webapp to access various dictionaries stored in a database. We intend for the database to serve a huge, eclectic mix of dictionaries which are not already available in a single spot.

Motivations

(in decreasing order of importance)

  • Any web developer should be able to make simple REST-API calls to our backend to easily get entries for a lot of dictionaries (most of whom aren't already available in such manner).
    • Imagine being able to highlight a word and use the context menu while reading a text to get it's meanings, grammatical info (linga, vibhakti, vachana, puruSha) etc..
    • Imagine being able to make simple commentaries (like this) with a few mouse clicks per word (selecting the meaning most appropriate to the context).
  • End users ( sanskrit students and scholars ) should be able to thoroughly investigate a term in the dictionaries using just a browser, without having to install any software.
  • End users should be able to point out errors (eg. via a link).
  • Users should be able to log in and submit new words to a "user dictionary" (as in <sanskritdictionary.de> or wiktionary, but using convenience of openId and less fuss.)

Detailed design

Intro to the data

  • We have several projects which collect dictionaries for various languages (link). These dictionaries are further subcategorized by topic, entry language etc.. (eg. Ayurveda, sa-german etc..).
  • Simplest programmatic current way to access dictionary data is through babylon files (some here - open one to check out the format).
  • We ultimately have the (extensible) JSON structure exemplified below for each entry:
{
  "_id": "stardict-sanskrit__sa-head__sa-entries__vAchaspatyam-sa__vAchaspatyam-sa__22721",
  "_rev": "1-bce5a5d86829ba751b32be7839771d42",
  "entry": {
    "text": "{@कटि(टी)@}¦ स्त्री कट-इन्। <br><br>१ श्रोणिदेशे(काकांल)। कृदिकारान्तत्वात् वा ङीप्। <br>“सहासनमभिप्रेप्सुरुत्कृष्टस्पाप्यपकृ-ष्टजः। अट्यां कृताङ्कोनिर्वास्यः” मनुः। <br>“कटिश्च तस्या-तिकृतप्रमाणा” भा॰ व॰ <br><br>१० <br><br>५४ । <br>“सव्येन च कटीदेशेगृह्य वाससि पाण्डवः” भा॰ आ॰ <br><br>१६ <br><br>३ <br>“कटिस्तु हरतेमनः” सा॰ द॰। तत्र तच्छब्दस्य ग्राम्यत्वमुक्तम्। ङीबन्तः<br><br>२ पिप्पल्पां स्त्री मेदि॰"
  },
  "jsonClass": "DictEntry",
  "location": {
    "dictionaryId": "stardict-sanskrit__sa-head__sa-entries__vAchaspatyam-sa__vAchaspatyam-sa",
    "entryNumber": 22721,
    "jsonClass": "DictLocation",
  },
  "headwords": [
    {
      "text": "kaTi"
    },
    {
      "text": "कटि"
    }
  ]
}
  • For dictionary metadata, we have the following structure:
{
"_id": "amarakosha",
"name": "अमरकोशः",
"authors": ["अमरसिंहः"],
"licenseLink": "http://some-link",
"canonicalSource": "http://some-link",
"categories": ["sanskrit to sanskrit", "thesaurus"],
"issuePage": "https://github.com/sanskrit-coders/stardict-sanskrit/issues"
}

Web API

Actually, whatever couchdb provides is enough. Examples below are from the vedavaapi.org deployment (for https equivalents and other servers, see the "Deployment" section).

  • Querying dictionary entries: use the dict_entries database and the index_headwords index
    • Get a particular dictionary entry: link.
    • Get a grouped count of some words starting with ast: link .
    • Get a list of 10 entries starting with कटि : get all
      • This also gives you the offset for the starting result (useful for randomizing, stats etc..).

General reference and tips

Potential simplification

In the ideal case, we would have the following (from the view of simplicity) (in decreasing order of importance):

  • /words/xyz yields the appropriate entry if it exists; or returns a list of n=40 words starting with that substring 'xyz' - from all dictionaries.
  • /dictionaries/dictionaryId/words/xyz - same as above, restricted to one dictionary.
  • /dictionaries/dictionaryCategory/words/xyz does the same - except for all dictionaries in a given category.
  • /words_with_substring/xyz etc..

UI

(in decreasing order of importance)

  • Should be mobile friendly, with flowing text.
  • Searching:
    • User is able to search for a word in multiple dictionaries.
    • As a user types, a dropdown of suggestions appear, thereby avoiding unnecessary strain.
    • User can restrict search to certain dictionaries or dictionary categories.
    • Complex (ok if slower) searches based on substrings (or eventually regular expressions), matching headwords.
      • Complex search within entries.
  • Results:
    • A term can appear in many dictionaries, the user should be able to quickly navigate to the dictionary or dictionary-category of his choice.
    • Some history (what words were recently looked up).
  • Stats:
    • Collect site use stats (eg. using google analytics).
    • Submit lookup stats to the server.
  • Example from Goldendict (desktop) here.

Design choices

Ideas

  • Use no-sql database with good replication characteristics, like couchdb. couchdb already provides some simple API, and it is extensible.
  • Leverage the rest api that couchdb provides to the extant possible.
  • Use offline catching intelligently to minimize communicating with the server. Caching ideas:
    • Headword-dictionary mappings

Software choices

  • Pouchdb has size limitations due to its use of JS indexed Db, so we cannot use it heavily.
  • Using Polymer for example frontend because of its utility and its easy compatibility with web components standards.

Deployment

Database repilicas

UI deployments

  • Polymer app
  • You want to host copies (or even develop a superior UI?) and make things faster for folks in your geographical area? Just open an issue in this project and let us know. We'd love to list it here.

About

A rest-api based webapp to access various dictionaries stored in a database.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published