Jellyfish is a python library for doing approximate and phonetic matching of strings.
Written by James Turk <[email protected]> and Michael Stephens.
See https://github.com/jamesturk/jellyfish/graphs/contributors for contributors.
See http://jellyfish.readthedocs.io for documentation.
Source is available at http://github.com/jamesturk/jellyfish.
String comparison:
- Levenshtein Distance
- Damerau-Levenshtein Distance
- Jaro Distance
- Jaro-Winkler Distance
- Match Rating Approach Comparison
- Hamming Distance
Phonetic encoding:
- American Soundex
- Metaphone
- NYSIIS (New York State Identification and Intelligence System)
- Match Rating Codex
>>> import jellyfish
>>> jellyfish.levenshtein_distance(u'jellyfish', u'smellyfish')
2
>>> jellyfish.jaro_distance(u'jellyfish', u'smellyfish')
0.89629629629629637
>>> jellyfish.damerau_levenshtein_distance(u'jellyfish', u'jellyfihs')
1
>>> jellyfish.metaphone(u'Jellyfish')
'JLFX'
>>> jellyfish.soundex(u'Jellyfish')
'J412'
>>> jellyfish.nysiis(u'Jellyfish')
'JALYF'
>>> jellyfish.match_rating_codex(u'Jellyfish')
'JLLFSH'
If you are interested in contributing to Jellyfish, you may want to run tests locally. Jellyfish uses tox to run tests, which you can setup and run as follows:
pip install tox # cd jellyfish/ tox