- Handle encoding errors #149
- Bump supported Python version to 3.8 - 3.12 #151
- Remove numpy dependency #156
- Feature: distance comparer interface #159
- Remove support for Python 3.6
- Use compiled regular expression in `create_dictionary()` (#129)
- Configure module logger instead of modifying root logger (#132, #133)
- Fix suggestion `count` in `lookup_compound` when `ignore_words=True` (#108)
- Log error message when loading dictionary fails (#109)
- Fix `replaced_words` not being updated when best match is a combi (closes #103)
- Implement a way to change the edit distance comparer algorithm via the `distance_algorithm` property. Available values are found in `DistanceAlgorithm`
- Update `editdistpy` dependency version
- Update `LevenshteinFast` and `DamerauOsaFast` to match the functionality of the `editdistpy` library
- Update `editdistpy` dependency version
- Fix typo of Dameruau to Damerau in various places. Can potentially break some setups that explicitly set `_distance_algorithm`
- Implement fast distance comparers with editdistpy
- Set `DamerauOsaFast` as the default distance comparer
- Updated `frequency_dictionary_en_82_765.txt` dictionary with common contractions
- Added `_below_threshold_words`, `_bigrams`, `_count_threshold`, `_max_dictionary_edit_distance`, and `_prefix_length` when saving to pickle. (closes #93)
- Implemented `to_bytes` and `from_bytes` options to save and load pickle with bytes string
- Updated `data_version` to 3
- Removed Python 3.4 and Python 3.5 support
- Removed numpy dependency
- `word_segmentation` now retains/preserves case.
- `word_segmentation` now keeps punctuation or apostrophe adjacent to previous word.
- `word_segmentation` now normalizes ligatures: "scientiﬁc" -> "scientific".
- `word_segmentation` now removes hyphens prior to word segmentation (untested).
- American English word forms added to dictionary in addition to British English, e.g. favourable & favorable.
- Modified `load_bigram_dictionary` to allow dictionary entries to be split into only 2 parts when using a custom separator
- Added dictionary files to wheels so `pkg_resources` could be used to access them
- Added `separator` argument to allow user to choose custom separator for `load_dictionary`
- Added `load_bigram_dictionary` and bigram dictionary `frequency_bigramdictionary_en_243_342.txt`
- Updated `lookup_compound` algorithm
- Added `Levenshtein` to compute edit distance
- Added `save_pickle_stream` and `load_pickle_stream` to save/load SymSpell data alongside other structure (contribution by marcoffee)
- Added `transfer_casing` to `lookup` and `lookup_compound`
- Fixed prefix length check in `_edits_prefix`
- Implemented `delete_dictionary_entry`
- Improved performance by using Python's built-in hashing
- Added versioning of the pickle
- Fixed `include_unknown` in `lookup`
- Removed unused `initial_capacity` argument
- Improved `_get_str_hash` performance
- Implemented `save_pickle` and `load_pickle` to avoid having to create the dictionary every time
- Added `create_dictionary()` feature
- Fixed `lookup_compound()` to return the correct `distance`
- Added `<self._replaced_words = dict()>` to track number of misspelled words
- Added `ignore_token` to `word_segmentation()` to ignore words with regular expression
- Added `word_segmentation()` feature
- Added `encoding` option to `load_dictionary()`
- Create a package for `symspellpy`
- Ported SymSpell v6.3
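
For readers unsure what the ligature normalization entry for `word_segmentation` means in practice, the sketch below shows the general idea using only the standard library. It assumes Unicode NFKC normalization is a reasonable stand-in; `normalize_ligatures` is a hypothetical helper written for illustration and is not part of the `symspellpy` API, whose internal approach may differ.

```python
import unicodedata


def normalize_ligatures(text: str) -> str:
    """Fold compatibility characters (e.g. the 'fi' ligature) into plain letters.

    Hypothetical helper for illustration only; not a symspellpy function.
    """
    return unicodedata.normalize("NFKC", text)


# "scienti\ufb01c" contains U+FB01 LATIN SMALL LIGATURE FI in place of "fi".
word_with_ligature = "scienti\ufb01c"
normalized = normalize_ligatures(word_with_ligature)

# The ligature counts as one code point, so the normalized word is one longer.
print(len(word_with_ligature), len(normalized))
print(normalized == "scientific")
```

Normalizing before segmentation matters because a dictionary keyed on plain ASCII spellings would otherwise never match a word that contains a ligature code point.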