Merge pull request #28 from neocl/dev
Release version 0.1a9
letuananh authored Apr 19, 2021
2 parents b55f035 + c6c5d73 commit 1bd1332
Showing 10 changed files with 245 additions and 47 deletions.
16 changes: 0 additions & 16 deletions CHANGES.md

This file was deleted.

55 changes: 28 additions & 27 deletions README.md
@@ -18,24 +18,28 @@

Homepage: [https://github.com/neocl/jamdict](https://github.com/neocl/jamdict)

[Contributors](#contributors) are welcome! 🙇
[Contributors](#contributors) are welcome! 🙇 If you want to help, please see the [Contributing](https://jamdict.readthedocs.io/en/latest/contributing.html) page.

# Try Jamdict out

A demo Jamdict virtual machine is available to try out on the web at Repl.it: https://replit.com/@tuananhle/jamdict-demo

# Installation

Jamdict and the Jamdict database are both available on [PyPI](https://pypi.org/project/jamdict/) and can be installed using pip:

```bash
pip install jamdict jamdict-data
pip install --upgrade jamdict jamdict-data
```

# Sample jamdict Python code

```python
from jamdict import Jamdict
jmd = Jamdict()
jam = Jamdict()

# use wildcard matching to find anything that starts with 食べ and ends with る
result = jmd.lookup('食べ%る')
result = jam.lookup('食べ%る')

# print all word entries
for entry in result.entries:
@@ -108,21 +112,21 @@ The terminology of radicals/components used by Jamdict can be different from els

By default jamdict provides two maps:

- jmd.krad is a Python dict that maps characters to lists of components.
- jmd.radk is a Python dict that maps each available component to a list of characters.
- jam.krad is a Python dict that maps characters to lists of components.
- jam.radk is a Python dict that maps each available component to a list of characters.
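
As a rough illustration of how the two maps relate, the sketch below inverts a character-to-components dict into a component-to-characters dict using only the standard library. The `invert_krad` helper and the two-entry `sample_krad` data are hypothetical and exist only for this example: the 雲 entry matches the README example, the 雪 entry is an illustrative assumption, and none of this is meant to describe how jamdict builds its maps internally.

```python
from collections import defaultdict


def invert_krad(krad):
    """Build a component -> set-of-characters map from a character -> components map."""
    radk = defaultdict(set)
    for char, components in krad.items():
        for component in components:
            radk[component].add(char)
    return dict(radk)


# Toy sample for illustration only; this is not the real KRADFILE data.
sample_krad = {
    '雲': ['一', '雨', '二', '厶'],
    '雪': ['雨', 'ヨ'],  # assumed decomposition, used only as sample input
}

sample_radk = invert_krad(sample_krad)

# Every character that lists 雨 as a component
print(sorted(sample_radk['雨']))
```

With real data you would read `jam.krad` and `jam.radk` directly rather than rebuilding one from the other.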

```python
# Find all writing components (often called "radicals") of the character 雲
print(jmd.krad['雲'])
print(jam.krad['雲'])
# ['一', '雨', '二', '厶']

# Find all characters with the component 鼎
chars = jmd.radk['鼎']
chars = jam.radk['鼎']
print(chars)
# {'鼏', '鼒', '鼐', '鼎', '鼑'}

# look up the characters info
result = jmd.lookup(''.join(chars))
result = jam.lookup(''.join(chars))
for c in result.chars:
    print(c, c.meanings())
# 鼏 ['cover of tripod cauldron']
@@ -136,7 +140,7 @@ for c in result.chars:

```python
# Find all names with 鈴木 inside
result = jmd.lookup('%鈴木%')
result = jam.lookup('%鈴木%')
for name in result.names:
    print(name)

@@ -154,30 +158,27 @@ for name in result.names:
## Exact matching
Use exact matching for faster search
Use exact matching for faster search.
```python
# Find an entry (word, name entity) by idseq
result = jmd.lookup('id#5711308')
print(result.names[0])
# [id#5711308] すすき (鈴木) : Susuki (family or surname)
result = jmd.lookup('id#1467640')
print(result.entries[0])
# ねこ (猫) : 1. cat 2. shamisen 3. geisha 4. wheelbarrow 5. clay bed-warmer 6. bottom/submissive partner of a homosexual relationship
Find the word 花火 by idseq (1194580)
# use exact matching to increase searching speed (thanks to @reem-codes)
result = jmd.lookup('猫')
```python
>>> result = jam.lookup('id#1194580')
>>> print(result.entries[0])
[id#1194580] はなび (花火) : fireworks ((noun (common) (futsuumeishi)))
```
for entry in result.entries:
    print(entry)
Find an exact name 花火 by idseq (5170462)
# [id#1467640] ねこ (猫) : 1. cat ((noun (common) (futsuumeishi))) 2. shamisen 3. geisha 4. wheelbarrow 5. clay bed-warmer 6. bottom/submissive partner of a homosexual relationship
# [id#2698030] ねこま (猫) : cat ((noun (common) (futsuumeishi)))
```python
>>> result = jam.lookup('id#5170462')
>>> print(result.names[0])
[id#5170462] はなび (花火) : Hanabi (female given name or forename)
```
See `jamdict_demo.py` and `jamdict/tools.py` for more information.
# Official website
# Useful links
* JMdict: [http://edrdg.org/jmdict/edict_doc.html](http://edrdg.org/jmdict/edict_doc.html)
* kanjidic2: [https://www.edrdg.org/wiki/index.php/KANJIDIC_Project](https://www.edrdg.org/wiki/index.php/KANJIDIC_Project)
@@ -189,4 +190,4 @@ See `jamdict_demo.py` and `jamdict/tools.py` for more information.
- [Le Tuan Anh](https://github.com/letuananh) (Maintainer)
- [Matteo Fumagalli](https://github.com/matteofumagalli1275)
- [Reem Alghamdi](https://github.com/reem-codes)
- [alt-romes](https://github.com/alt-romes)
- [alt-romes](https://github.com/alt-romes)
Binary file added docs/_static/jamdict_db_schema.png
87 changes: 87 additions & 0 deletions docs/contributing.rst
@@ -0,0 +1,87 @@
.. _contributing:

Contributing
============

There are many ways to contribute to the Jamdict project.
The ones that the Jamdict development team is focusing on at the moment are:

- Fixing :ref:`existing bugs <contrib_bugfix>`
- Improving query functions
- Improving :ref:`documentation <contrib_docs>`
- Keeping jamdict database up to date

If you have suggestions or bug reports, please share them on the `jamdict issue tracker <https://github.com/neocl/jamdict/issues>`_.

.. _contrib_bugfix:

Fixing bugs
-----------

If you find a bug, please report it at https://github.com/neocl/jamdict/issues

When possible, please also share how to reproduce the bug and a snapshot of the jamdict info output to help with the bug-finding process.

.. code:: bash

   python3 -m jamdict info

Pull requests are welcome.

.. _contrib_docs:

Updating Documentation
----------------------

1. Fork the `jamdict <https://github.com/neocl/jamdict>`_ repository to your own GitHub account.

#. Clone the `jamdict` repository to your local machine.

   .. code:: bash

      git clone https://github.com/<your-account-name>/jamdict

#. Create a virtual environment (optional, but highly recommended).

   .. code:: bash

      # if you use virtualenvwrapper
      mkvirtualenv jamdev
      workon jamdev

      # if you use Python venv
      python3 -m venv .env
      . .env/bin/activate
      python3 -m pip install --upgrade pip wheel Sphinx

#. Build the docs.

   .. code:: bash

      cd jamdict/docs

      # compile the docs
      make dirhtml

      # serve the docs using the Python 3 built-in development server
      # Note: this requires Python >= 3.7 to support --directory
      python3 -m http.server 7000 --directory _build/dirhtml

      # if you use an earlier Python 3, you may use
      cd _build/dirhtml
      python3 -m http.server 7000

#. The docs should now be ready to view at http://localhost:7000 . Open that URL in your browser to view them.

#. More information:

- Sphinx tutorial: https://sphinx-tutorial.readthedocs.io/start/
- Using `virtualenvwrapper`: https://virtualenvwrapper.readthedocs.io/en/latest/install.html
- Using `venv`: https://docs.python.org/3/library/venv.html

.. _contrib_dev:

Development
-----------

Development contributions are welcome.
Setting up a development environment for Jamdict should be similar to :ref:`contrib_docs`.

Please contact the development team if you need more information: https://github.com/neocl/jamdict/issues
11 changes: 9 additions & 2 deletions docs/index.rst
@@ -21,7 +21,8 @@ Main features
Hide this for now
- jamdol (jamdol-flask) - a Python/Flask server that provides Jamdict lookup via REST API (experimental state)
:ref:`Contributors <contributors>` are welcome! 🙇
:ref:`Contributors <contributors>` are welcome! 🙇
If you want to help develop Jamdict, please visit the :ref:`contributing` page.

Installation
------------
@@ -70,7 +71,7 @@ Looking up named entities
[id#5053163] ディズニー : Disney (family or surname/company name)
[id#5741091] ディズニーランド : Disneyland (place name)

See :ref:`recipes` for more sample code.
See :ref:`recipes` for more code samples.

.. _commandline:

@@ -123,10 +124,16 @@ Documentation
tutorials
recipes
api
contributing

Other info
==========

Release Notes
-------------

Release notes are available :ref:`here <updates>`.

.. _contributors:

Contributors
69 changes: 69 additions & 0 deletions docs/recipes.rst
@@ -99,3 +99,72 @@ Use exact matching for faster search
# [id#1467640] ねこ (猫) : 1. cat ((noun (common) (futsuumeishi))) 2. shamisen 3. geisha 4. wheelbarrow 5. clay bed-warmer 6. bottom/submissive partner of a homosexual relationship
# [id#2698030] ねこま (猫) : cat ((noun (common) (futsuumeishi)))
Low-level data queries
----------------------
It is possible to access the dictionary data by querying the database directly using lower-level APIs.
However, these APIs may change in future releases, so please keep that in mind.

When you create a Jamdict object, you have direct access to the
underlying databases via these properties:
.. code:: python

   from jamdict import Jamdict
   jam = Jamdict()
   >>> jam.jmdict    # jamdict.JMDictSQLite object for accessing word dictionary
   >>> jam.kd2       # jamdict.KanjiDic2SQLite object, for accessing kanji dictionary
   >>> jam.jmnedict  # jamdict.JMNEDictSQLite object, for accessing named-entities dictionary

You can perform database queries on each of these databases by obtaining
a database cursor with the ``ctx()`` function (i.e. a database query context).
For example, the following code lists all existing parts of speech
in the database.
.. code:: python

   # returns a list of sqlite3.Row objects
   pos_rows = jam.jmdict.ctx().select("SELECT DISTINCT text FROM pos")

   # access columns in each query row by name
   all_pos = [x['text'] for x in pos_rows]

   # sort all POS
   all_pos.sort()
   for pos in all_pos:
       print(pos)

For more information, please see `Jamdict database schema </_static/jamdict_db_schema.png>`_.
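
The row-access pattern above can be tried without a Jamdict installation. The following is a minimal sketch that uses only the standard-library ``sqlite3`` module against an in-memory database. The two-column ``pos`` table and its sample rows are assumptions made for illustration (the column names ``sid`` and ``text`` are taken from the queries in this page); the real schema has more tables and columns.

```python
import sqlite3

# In-memory stand-in for the dictionary database (illustration only).
conn = sqlite3.connect(":memory:")
# sqlite3.Row lets each result row be indexed by column name.
conn.row_factory = sqlite3.Row

conn.execute("CREATE TABLE pos (sid INTEGER, text TEXT)")
conn.executemany(
    "INSERT INTO pos (sid, text) VALUES (?, ?)",
    [
        (1, "noun (common) (futsuumeishi)"),
        (2, "suru verb - irregular"),
        (3, "noun (common) (futsuumeishi)"),
    ],
)

# DISTINCT collapses the duplicated noun label into one row.
pos_rows = conn.execute("SELECT DISTINCT text FROM pos").fetchall()

# Access the column by name, then sort the labels.
all_pos = sorted(row["text"] for row in pos_rows)
for pos in all_pos:
    print(pos)

conn.close()
```

Setting ``row_factory`` is what makes ``row["text"]`` work here; with the default factory, rows are plain tuples and must be indexed by position.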
Say we want to get all irregular suru verbs. We can start by finding
all Sense IDs with pos = ``suru verb - irregular``, and then find all the
Entry ``idseq`` values connected to those Senses.

Words (and also named entities) can be retrieved directly using their ``idseq``.
Each word may have many Senses (meanings), and each Sense may have a different pos.

::

   # Entry (idseq) --(has many)--> Sense --(has many)--> pos

.. note::

   Tip: Since we hit the database many times (to find the IDs, to retrieve
   each word, etc.), we should also consider reusing the database
   connection through a database context for better performance
   (``with jam.jmdict.ctx() as ctx:`` and ``ctx=ctx`` in the code below).

Here is the sample code:

.. code:: python

   # find all idseq of lexical entry (i.e. words) that have at least 1 sense with pos = suru verb - irregular
   with jam.jmdict.ctx() as ctx:
       # query all word's idseqs
       rows = ctx.select(
           query="SELECT DISTINCT idseq FROM Sense WHERE ID IN (SELECT sid FROM pos WHERE text = ?) LIMIT 10000",
           params=("suru verb - irregular",))
       for row in rows:
           # reuse database connection with ctx=ctx for better performance
           word = jam.jmdict.get_entry(idseq=row['idseq'], ctx=ctx)
           print(word)
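
To see how the subquery walks the Entry, Sense, pos relationship, here is a self-contained sketch using the standard-library ``sqlite3`` module. The toy ``Sense`` and ``pos`` tables keep only the columns named in the query above, and the ``idseq`` values are made up for illustration; an ``ORDER BY`` is added so the output order is deterministic. It is a simplification under those assumptions, not the actual Jamdict schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row

# Simplified tables: each Sense belongs to one entry (idseq),
# and each pos row belongs to one Sense (sid).
conn.executescript("""
    CREATE TABLE Sense (ID INTEGER PRIMARY KEY, idseq INTEGER);
    CREATE TABLE pos (sid INTEGER, text TEXT);
""")
conn.executemany(
    "INSERT INTO Sense (ID, idseq) VALUES (?, ?)",
    [(1, 1001), (2, 1001), (3, 1002), (4, 1003)],  # hypothetical idseq values
)
conn.executemany(
    "INSERT INTO pos (sid, text) VALUES (?, ?)",
    [
        (1, "suru verb - irregular"),
        (2, "noun (common) (futsuumeishi)"),
        (3, "noun (common) (futsuumeishi)"),
        (4, "suru verb - irregular"),
    ],
)

# Inner query: Sense IDs tagged with the wanted pos.
# Outer query: the distinct entries that own those Senses.
rows = conn.execute(
    "SELECT DISTINCT idseq FROM Sense "
    "WHERE ID IN (SELECT sid FROM pos WHERE text = ?) "
    "ORDER BY idseq",
    ("suru verb - irregular",),
).fetchall()

idseqs = [row["idseq"] for row in rows]
print(idseqs)

conn.close()
```

Entry 1001 is returned once even though it has two Senses, because only one of them carries the requested pos and ``DISTINCT`` removes duplicates in any case.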
1 change: 1 addition & 0 deletions docs/requirements.txt
@@ -1 +1,2 @@
jamdict
Sphinx
49 changes: 49 additions & 0 deletions docs/updates.rst
@@ -0,0 +1,49 @@
.. _updates:

Updates
=======

2021-04-19
----------

- [Version 0.1a9]
- Fix data audit query
- Enhanced Jamdict() constructor. ``Jamdict('/path/to/jamdict.db')``
  works properly.
- Code quality review
- Automated documentation build via
  `readthedocs.org <https://jamdict.readthedocs.io/en/latest/>`__

.. _section-1:

2021-04-15
----------

- Make ``lxml`` optional
- Data package can be installed via PyPI with ``jamdict_data`` package
- Make configuration file optional as data files can be installed via
  PyPI.

.. _section-2:

2020-05-31
----------

- [Version 0.1a7]
- Added Japanese Proper Names Dictionary (JMnedict) support
- Included built-in KRADFILE/RADKFile support
- Improved command line tools (json, compact mode, etc.)

.. _section-3:

2017-08-18
----------

- Support KanjiDic2 (XML/SQLite formats)

.. _section-4:

2016-11-09
----------

- Release first version to Github
2 changes: 1 addition & 1 deletion jamdict/__version__.py
@@ -10,6 +10,6 @@
__url__ = "https://github.com/neocl/jamdict"
__maintainer__ = "Le Tuan Anh"
__version_major__ = "0.1"
__version__ = "{}a8".format(__version_major__)
__version__ = "{}a9".format(__version_major__)
__version_long__ = "{} - Alpha".format(__version_major__)
__status__ = "Prototype"