- Added support for ISO-639 language codes (`en`, `de`, `sk`, ...). #106
- `TextRankSummarizer` uses an iterative algorithm. The previous algorithm is called `ReductionSummarizer`. #100
- Added support for Chinese. #93
- Dropped support for distutils when installing sumy.
- Added support for Japanese. #79
- Fixed incorrect n-gram computation for multiple sentences. #84
- Fixed NLTK dependency for Python 3.3. NLTK 3.2 dropped support for Python 3.3 so sumy needs 3.1.
- Fixed missing stopwords in SumBasic summarizer. #74
- Added "--text" CLI parameter to summarize text in Emacs and other tools. #67
- Fixed computation of cosine similarity in LexRank summarizer. #63
- Fixed resource searching in .egg packages. #53
- Added support for Portuguese and Spanish. #49 #51
- Better error message when NLTK tokenizers are missing.
- Dropped support for Python 2.6 and 3.2. Only 2.7/3.3+ are officially supported now. Time to move :)
- CLI: Better message for unknown format.
- LexRank: fixed power method computation.
- Added some extra abbreviations (English, German) into tokenizer for better output.
- SumBasic: Added new summarization method - SumBasic. Thanks to Julian Griggs.
- KL: Added new summarization method - KL. Thanks to Julian Griggs.
- Added `requests` as a dependency to fix issues with downloading pages.
- Better documentation of expected Plaintext document format.
- Added possibility to specify format of input document for URL & stdin. Thanks to @Lucas-C.
- Added possibility to specify custom file with stop-words in CLI. Thanks to @Lucas-C.
- Added support for French language (added stopwords & stemmer). Thanks to @Lucas-C.
- Function `sumy.utils.get_stop_words` raises `LookupError` instead of `ValueError` for unknown language.
- Exception `LookupError` is raised for unknown language of stemmer instead of falling silently to `null_stemmer`.
- Fixed installation of my own readability fork. Added `breadability` to the dependencies instead of it #8. Thanks to @pratikpoddar.
- First public release.