Merge pull request #28 from neocl/dev

Release version 0.1a9
neocl · Apr 19, 2021 · 1bd1332
2 parents b55f035 + c6c5d73
commit 1bd1332
Showing 10 changed files with 245 additions and 47 deletions.
diff --git a/CHANGES.md b/CHANGES.md
diff --git a/README.md b/README.md
@@ -18,24 +18,28 @@
 
 Homepage: [https://github.com/neocl/jamdict](https://github.com/neocl/jamdict)
 
-[Contributors](#contributors) are welcome! 🙇
+[Contributors](#contributors) are welcome! 🙇 If you want to help, please see the [Contributing](https://jamdict.readthedocs.io/en/latest/contributing.html) page.
+
+# Try Jamdict out
+
+A demo Jamdict virtual machine is available to try out on the web at Repl.it: https://replit.com/@tuananhle/jamdict-demo
 
 # Installation
 
 Jamdict & Jamdict database are both available on [PyPI](https://pypi.org/project/jamdict/) and can be installed using pip
 
 ```bash
-pip install jamdict jamdict-data
+pip install --upgrade jamdict jamdict-data
 ```
 
 # Sample jamdict Python code
 
 ```python
 from jamdict import Jamdict
-jmd = Jamdict()
+jam = Jamdict()
 
 # use wildcard matching to find anything that starts with 食べ and ends with る
-result = jmd.lookup('食べ%る')
+result = jam.lookup('食べ%る')
 
 # print all word entries
 for entry in result.entries:
@@ -108,21 +112,21 @@ The terminology of radicals/components used by Jamdict can be different from els
 
 By default jamdict provides two maps:
 
-- jmd.krad is a Python dict that maps characters to list of components.
-- jmd.radk is a Python dict that maps each available components to a list of characters.
+- jam.krad is a Python dict that maps each character to a list of its components.
+- jam.radk is a Python dict that maps each available component to a list of characters.
 
 ```python
 # Find all writing components (often called "radicals") of the character 雲
-print(jmd.krad['雲'])
+print(jam.krad['雲'])
 # ['一', '雨', '二', '厶']
 
 # Find all characters with the component 鼎
-chars = jmd.radk['鼎']
+chars = jam.radk['鼎']
 print(chars)
 # {'鼏', '鼒', '鼐', '鼎', '鼑'}
 
 # look up the characters info
-result = jmd.lookup(''.join(chars))
+result = jam.lookup(''.join(chars))
 for c in result.chars:
     print(c, c.meanings())
 # 鼏 ['cover of tripod cauldron']
@@ -136,7 +140,7 @@ for c in result.chars:
 
 ```python
 # Find all names with 鈴木 inside
-result = jmd.lookup('%鈴木%')
+result = jam.lookup('%鈴木%')
 for name in result.names:
     print(name)
 
@@ -154,30 +158,27 @@ for name in result.names:
 
 ## Exact matching
 
-Use exact matching for faster search
+Use exact matching for faster search.
 
-```python
-# Find an entry (word, name entity) by idseq
-result = jmd.lookup('id#5711308')
-print(result.names[0])
-# [id#5711308] すすき (鈴木) : Susuki (family or surname)
-result = jmd.lookup('id#1467640')
-print(result.entries[0])
-# ねこ (猫) : 1. cat 2. shamisen 3. geisha 4. wheelbarrow 5. clay bed-warmer 6. bottom/submissive partner of a homosexual relationship
+Find the word 花火 by idseq (1194580):
 
-# use exact matching to increase searching speed (thanks to @reem-codes)
-result = jmd.lookup('猫')
+```python
+>>> result = jam.lookup('id#1194580')
+>>> print(result.entries[0])
+[id#1194580] はなび (花火) : fireworks ((noun (common) (futsuumeishi)))
+```
 
-for entry in result.entries:
-    print(entry)
+Find the name 花火 by idseq (5170462):
 
-# [id#1467640] ねこ (猫) : 1. cat ((noun (common) (futsuumeishi))) 2. shamisen 3. geisha 4. wheelbarrow 5. clay bed-warmer 6. bottom/submissive partner of a homosexual relationship
-# [id#2698030] ねこま (猫) : cat ((noun (common) (futsuumeishi)))
+```python
+>>> result = jam.lookup('id#5170462')
+>>> print(result.names[0])
+[id#5170462] はなび (花火) : Hanabi (female given name or forename)
 ```
 
 See `jamdict_demo.py` and `jamdict/tools.py` for more information.
 
-# Official website
+# Useful links
 
 * JMdict: [http://edrdg.org/jmdict/edict_doc.html](http://edrdg.org/jmdict/edict_doc.html)
 * kanjidic2: [https://www.edrdg.org/wiki/index.php/KANJIDIC_Project](https://www.edrdg.org/wiki/index.php/KANJIDIC_Project)
@@ -189,4 +190,4 @@ See `jamdict_demo.py` and `jamdict/tools.py` for more information.
 - [Le Tuan Anh](https://github.com/letuananh) (Maintainer)
 - [Matteo Fumagalli](https://github.com/matteofumagalli1275)
 - [Reem Alghamdi](https://github.com/reem-codes)
-- [alt-romes](https://github.com/alt-romes)
+- [alt-romes](https://github.com/alt-romes)
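
Review note on the README lookups above: the `%` in `'食べ%る'` and `'%鈴木%'` looks like an SQL `LIKE` wildcard. Assuming that is how such queries are evaluated, an equality comparison can typically use an index, whereas a pattern with a leading `%` cannot, which would be consistent with the advice to prefer exact matching. The sketch below is not Jamdict code: it uses Python's built-in `sqlite3` module, and the one-column `words` table and the `find_words` helper are hypothetical names made up for illustration.

```python
import sqlite3

# Hypothetical toy table; this is NOT the Jamdict schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (text TEXT)")
conn.executemany(
    "INSERT INTO words (text) VALUES (?)",
    [("食べる",), ("食べさせる",), ("食べ物",), ("猫",)],
)
conn.execute("CREATE INDEX idx_words_text ON words (text)")


def find_words(conn, query):
    """Use LIKE only when the query contains a '%' wildcard, '=' otherwise."""
    operator = "LIKE" if "%" in query else "="
    rows = conn.execute(
        "SELECT text FROM words WHERE text {} ?".format(operator),
        (query,),
    )
    return [row[0] for row in rows]


wildcard_hits = find_words(conn, "食べ%る")  # starts with 食べ, ends with る
exact_hits = find_words(conn, "猫")          # plain equality comparison
print(wildcard_hits)
print(exact_hits)
```

Here `食べ物` is excluded from the wildcard result because it does not end with る, and `%` also matches an empty run of characters, so `食べる` itself is included.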
diff --git a/docs/_static/jamdict_db_schema.png b/docs/_static/jamdict_db_schema.png
diff --git a/docs/contributing.rst b/docs/contributing.rst
@@ -0,0 +1,87 @@
+.. _contributing:
+
+Contributing
+============
+
+There are many ways to contribute to the Jamdict project.
+The ones that the Jamdict development team is focusing on at the moment are:
+
+- Fixing :ref:`existing bugs <contrib_bugfix>`
+- Improving query functions
+- Improving :ref:`documentation <contrib_docs>`
+- Keeping jamdict database up to date
+
+If you have suggestions or bug reports, please share them on the `jamdict issue tracker <https://github.com/neocl/jamdict/issues>`_.
+
+.. _contrib_bugfix:
+
+Fixing bugs
+-----------
+
+If you find a bug, please report it at https://github.com/neocl/jamdict/issues
+
+When possible, please also describe how to reproduce the bug and include the output of the following command to help with debugging:
+
+.. code:: bash
+
+   python3 -m jamdict info
+
+Pull requests are welcome.
+
+.. _contrib_docs:
+
+Updating Documentation
+----------------------
+
+1. Fork the `jamdict <https://github.com/neocl/jamdict>`_ repository to your own GitHub account.
+
+#. Clone the `jamdict` repository to your local machine.
+
+   .. code:: bash
+      
+      git clone https://github.com/<your-account-name>/jamdict
+      
+#. Create a virtual environment (optional, but highly recommended)
+
+   .. code:: bash
+
+      # if you use virtualenvwrapper
+      mkvirtualenv jamdev
+      workon jamdev
+
+      # if you use Python venv
+      python3 -m venv .env
+      . .env/bin/activate
+
+      # in either case, install the tools needed to build the docs
+      python3 -m pip install --upgrade pip wheel Sphinx
+
+#. Build the docs
+
+   .. code:: bash
+
+      cd jamdict/docs
+      # compile the docs
+      make dirhtml
+      # serve the docs using Python3 built-in development server
+      # Note: this requires Python >= 3.7 to support --directory
+      python3 -m http.server 7000 --directory _build/dirhtml
+      # if you use earlier Python 3, you may use
+      cd _build/dirhtml
+      python3 -m http.server 7000
+
+#. The docs should now be available at http://localhost:7000. Open that URL in your browser to view them.
+
+#. More information:
+
+   - Sphinx tutorial: https://sphinx-tutorial.readthedocs.io/start/
+   - Using `virtualenvwrapper`: https://virtualenvwrapper.readthedocs.io/en/latest/install.html
+   - Using `venv`: https://docs.python.org/3/library/venv.html
+
+.. _contrib_dev:
+
+Development
+-----------
+
+Development contributions are welcome.
+Setting up a development environment for Jamdict should be similar to :ref:`contrib_docs`.
+
+Please contact the development team if you need more information: https://github.com/neocl/jamdict/issues
diff --git a/docs/index.rst b/docs/index.rst
@@ -21,7 +21,8 @@ Main features
    Hide this for now
    -  jamdol (jamdol-flask) - a Python/Flask server that provides Jamdict lookup via REST API (experimental state)
 
-:ref:`Contributors <contributors>` are welcome! 🙇
+:ref:`Contributors <contributors>` are welcome! 🙇
+If you want to help develop Jamdict, please visit the :ref:`contributing` page.
 
 Installation
 ------------
@@ -70,7 +71,7 @@ Looking up named entities
    [id#5053163] ディズニー : Disney (family or surname/company name)
    [id#5741091] ディズニーランド : Disneyland (place name)
 
-See :ref:`recipes` for more sample code.
+See :ref:`recipes` for more code samples.
 
 .. _commandline:
 
@@ -123,10 +124,16 @@ Documentation
    tutorials
    recipes
    api
+   contributing
 
 Other info
 ==========
 
+Release Notes
+-------------
+
+Release notes are available :ref:`here <updates>`.
+
 .. _contributors:
 
 Contributors

diff --git a/docs/recipes.rst b/docs/recipes.rst
@@ -99,3 +99,72 @@ Use exact matching for faster search
    # [id#1467640] ねこ (猫) : 1. cat ((noun (common) (futsuumeishi))) 2. shamisen 3. geisha 4. wheelbarrow 5. clay bed-warmer 6. bottom/submissive partner of a homosexual relationship
    # [id#2698030] ねこま (猫) : cat ((noun (common) (futsuumeishi)))
 
+Low-level data queries
+----------------------
+
+It’s possible to access the dictionary data by querying the database directly using lower-level APIs.
+However, these APIs are subject to change in future versions, so please keep that in mind.
+
+When you create a Jamdict object, you have direct access to the
+underlying databases via these properties:
+
+.. code:: python
+
+   from jamdict import Jamdict
+   jam = Jamdict()
+   jam.jmdict    # jamdict.JMDictSQLite object, for accessing the word dictionary
+   jam.kd2       # jamdict.KanjiDic2SQLite object, for accessing the kanji dictionary
+   jam.jmnedict  # jamdict.JMNEDictSQLite object, for accessing the named-entities dictionary
+
+You can perform database queries on each of these databases by obtaining
+a database cursor with the ``ctx()`` function (i.e. a database query context).
+
+For example, the following code lists all existing parts of speech
+in the database.
+
+.. code:: python
+
+   # returns a list of sqlite3.Row objects
+   pos_rows = jam.jmdict.ctx().select("SELECT DISTINCT text FROM pos")
+
+   # access columns in each query row by name
+   all_pos = [x['text'] for x in pos_rows]
+
+   # sort all POS
+   all_pos.sort()
+   for pos in all_pos:
+       print(pos)
+
+For more information, please see `Jamdict database schema </_static/jamdict_db_schema.png>`_.
+
+Say we want to get all irregular suru verbs. We can start by finding
+all Sense IDs with pos = ``suru verb - irregular``, and then find all the
+Entry idseqs connected to those Senses.
+
+Words (and also named entities) can be retrieved directly using their ``idseq``.
+Each word may have many Senses (meanings), and each Sense may have several pos.
+
+::
+
+   # Entry (idseq) --(has many)--> Sense --(has many)--> pos
+
+.. note::
+   Tip: Since we hit the database many times (to find the IDs, to retrieve
+   each word, etc.), we should also consider reusing the database
+   connection via a database context for better performance
+   (``with jam.jmdict.ctx() as ctx:`` and ``ctx=ctx`` in the code below).
+
+Here is the sample code:
+
+.. code:: python
+
+   # find the idseq of every lexical entry (i.e. word) that has at least one sense with pos = suru verb - irregular
+   with jam.jmdict.ctx() as ctx:
+       # query the idseqs of all matching words
+       rows = ctx.select(
+           query="SELECT DISTINCT idseq FROM Sense WHERE ID IN (SELECT sid FROM pos WHERE text = ?) LIMIT 10000",
+           params=("suru verb - irregular",))
+       for row in rows:
+           # reuse database connection with ctx=ctx for better performance
+           word = jam.jmdict.get_entry(idseq=row['idseq'], ctx=ctx)
+           print(word)
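
Review note: the Entry → Sense → pos traversal in the recipe above can be tried out without the Jamdict database. The sketch below builds a tiny in-memory SQLite database with Python's built-in `sqlite3` module. The `Sense(ID, idseq)` and `pos(sid, text)` columns are taken from the query in the recipe; everything else (the reduced table definitions, the row contents, and the idseq values) is made up for illustration and is not the real Jamdict schema or data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # allows row['idseq'] access, as in the recipe
conn.executescript("""
    CREATE TABLE Sense (ID INTEGER PRIMARY KEY, idseq INTEGER);
    CREATE TABLE pos (sid INTEGER, text TEXT);
""")

# Made-up data: entry 1 has two senses, entries 2 and 3 have one sense each.
conn.executemany(
    "INSERT INTO Sense (ID, idseq) VALUES (?, ?)",
    [(10, 1), (11, 1), (20, 2), (30, 3)],
)
conn.executemany(
    "INSERT INTO pos (sid, text) VALUES (?, ?)",
    [
        (10, "suru verb - irregular"),
        (11, "suru verb - irregular"),
        (20, "noun (common) (futsuumeishi)"),
        (30, "suru verb - irregular"),
    ],
)

# Same shape as the recipe's query: inner SELECT finds the Sense IDs,
# outer SELECT maps them back to distinct entry idseqs.
rows = conn.execute(
    "SELECT DISTINCT idseq FROM Sense "
    "WHERE ID IN (SELECT sid FROM pos WHERE text = ?) ORDER BY idseq",
    ("suru verb - irregular",),
).fetchall()
idseqs = [row["idseq"] for row in rows]
print(idseqs)  # [1, 3]
```

`DISTINCT` is what keeps entry 1 from appearing twice even though two of its senses carry the matching pos.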
diff --git a/docs/requirements.txt b/docs/requirements.txt
@@ -1 +1,2 @@
 jamdict
+Sphinx
diff --git a/docs/updates.rst b/docs/updates.rst
@@ -0,0 +1,49 @@
+.. _updates:
+
+Updates
+=======
+
+2021-04-19
+----------
+
+-  [Version 0.1a9]
+-  Fix data audit query
+-  Enhanced Jamdict() constructor. ``Jamdict('/path/to/jamdict.db')``
+   works properly.
+-  Code quality review
+-  Automated documentation build via
+   `readthedocs.org <https://jamdict.readthedocs.io/en/latest/>`__
+
+.. _section-1:
+
+2021-04-15
+----------
+
+-  Make ``lxml`` optional
+-  Data package can be installed via PyPI with ``jamdict_data`` package
+-  Make configuration file optional as data files can be installed via
+   PyPI.
+
+.. _section-2:
+
+2020-05-31
+----------
+
+-  [Version 0.1a7]
+-  Added Japanese Proper Names Dictionary (JMnedict) support
+-  Included built-in KRADFILE/RADKFile support
+-  Improved command line tools (json, compact mode, etc.)
+
+.. _section-3:
+
+2017-08-18
+----------
+
+-  Support KanjiDic2 (XML/SQLite formats)
+
+.. _section-4:
+
+2016-11-09
+----------
+
+-  Release first version to GitHub
diff --git a/jamdict/__version__.py b/jamdict/__version__.py
@@ -10,6 +10,6 @@
 __url__ = "https://github.com/neocl/jamdict"
 __maintainer__ = "Le Tuan Anh"
 __version_major__ = "0.1"
-__version__ = "{}a8".format(__version_major__)
+__version__ = "{}a9".format(__version_major__)
 __version_long__ = "{} - Alpha".format(__version_major__)
 __status__ = "Prototype"