This repository has been archived by the owner on May 3, 2023. It is now read-only.
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge branch 'release0.11.1' into develop
Showing 14 changed files with 96 additions and 37 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,23 @@ | ||
name: Close inactive issues | ||
on: | ||
  schedule: | ||
    - cron: "23 3 * * *" | ||
|
||
jobs: | ||
  close-issues: | ||
    runs-on: ubuntu-latest | ||
    permissions: | ||
      issues: write | ||
      pull-requests: write | ||
    steps: | ||
      - uses: actions/stale@v3 | ||
        with: | ||
          days-before-issue-stale: 30 | ||
          days-before-issue-close: 14 | ||
          stale-issue-label: "stale" | ||
          stale-issue-message: "This issue is stale because it has been open for 30 days with no activity." | ||
          close-issue-message: "This issue was closed because it has been inactive for 14 days since being marked as stale." | ||
          days-before-pr-stale: -1 | ||
          days-before-pr-close: -1 | ||
          repo-token: ${{ secrets.GITHUB_TOKEN }} | ||
|
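To make the timing in the workflow above concrete, here is a minimal Python sketch of the resulting timeline. It only does date arithmetic with the two thresholds from the config (30 and 14 days). The function name `stale_timeline` is hypothetical, and the dates are the earliest possible ones, assuming no further activity on the issue and that the scheduled job actually runs each day. It is not how `actions/stale` is implemented.

```python
from datetime import date, timedelta

# Thresholds copied from the workflow config above.
DAYS_BEFORE_ISSUE_STALE = 30
DAYS_BEFORE_ISSUE_CLOSE = 14


def stale_timeline(last_activity):
    """Return (earliest stale date, earliest close date) for an issue.

    Assumes the issue sees no activity after `last_activity`.
    """
    stale_on = last_activity + timedelta(days=DAYS_BEFORE_ISSUE_STALE)
    close_on = stale_on + timedelta(days=DAYS_BEFORE_ISSUE_CLOSE)
    return stale_on, close_on


stale_on, close_on = stale_timeline(date(2022, 2, 1))
print("marked stale no earlier than:", stale_on)
print("closed no earlier than:", close_on)
```

With these settings an untouched issue is closed roughly 44 days after its last activity. Pull requests are unaffected because both PR thresholds are set to `-1`.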
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
""" | ||
Configuration for tests with pytest | ||
.. codeauthor:: Markus Konrad <[email protected]> | ||
""" | ||
|
||
from hypothesis import settings, HealthCheck | ||
|
||
# profile for CI runs on GitHub machines, which may be slow from time to time so we disable the "too slow" HealthCheck | ||
# and set the timeout deadline very high (60 sec.) | ||
settings.register_profile('ci', suppress_health_check=(HealthCheck.too_slow, ), deadline=60000) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,33 @@ | ||
""" | ||
A minimal example to showcase a few features of tmtoolkit. | ||
Markus Konrad <[email protected]> | ||
Feb. 2022 | ||
""" | ||
|
||
from tmtoolkit.corpus import Corpus, tokens_table, lemmatize, to_lowercase, dtm | ||
from tmtoolkit.bow.bow_stats import tfidf, sorted_terms_table | ||
|
||
|
||
# load built-in sample dataset and use 4 worker processes | ||
corp = Corpus.from_builtin_corpus('en-News100', max_workers=4) | ||
|
||
# investigate corpus as dataframe | ||
toktbl = tokens_table(corp) | ||
print(toktbl) | ||
|
||
# apply some text normalization | ||
lemmatize(corp) | ||
to_lowercase(corp) | ||
|
||
# build sparse document-token matrix (DTM) | ||
# document labels identify rows, vocabulary tokens identify columns | ||
mat, doc_labels, vocab = dtm(corp, return_doc_labels=True, return_vocab=True) | ||
|
||
# apply tf-idf transformation to DTM | ||
# operation is applied to the sparse matrix and uses little memory | ||
tfidf_mat = tfidf(mat) | ||
|
||
# show top 5 tokens per document ranked by tf-idf | ||
top_tokens = sorted_terms_table(tfidf_mat, vocab, doc_labels, top_n=5) | ||
print(top_tokens) |
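For readers who want to see what the tf-idf ranking step computes, the following is a small standard-library sketch on a hypothetical three-document corpus. It uses one common variant (term proportion times `log(N / df)`), which may differ from the weighting tmtoolkit's `tfidf` applies, and it works on plain dictionaries instead of a sparse matrix. The names `tfidf_scores` and `top_tokens_per_doc` are made up for this illustration.

```python
import math

# Hypothetical toy corpus: document label -> already normalized tokens.
docs = {
    "doc1": ["economy", "market", "market"],
    "doc2": ["economy", "election"],
    "doc3": ["election", "vote", "vote", "vote"],
}


def tfidf_scores(docs):
    """Compute tf-idf per document as (count / doc length) * log(N / df)."""
    n_docs = len(docs)
    # Document frequency: in how many documents does each token occur?
    df = {}
    for tokens in docs.values():
        for tok in set(tokens):
            df[tok] = df.get(tok, 0) + 1

    scores = {}
    for label, tokens in docs.items():
        counts = {}
        for tok in tokens:
            counts[tok] = counts.get(tok, 0) + 1
        scores[label] = {
            tok: (cnt / len(tokens)) * math.log(n_docs / df[tok])
            for tok, cnt in counts.items()
        }
    return scores


def top_tokens_per_doc(scores, top_n):
    """Return the top_n highest-scoring tokens for each document."""
    return {
        label: sorted(tok_scores, key=tok_scores.get, reverse=True)[:top_n]
        for label, tok_scores in scores.items()
    }


scores = tfidf_scores(docs)
print(top_tokens_per_doc(scores, top_n=1))
```

Tokens that are frequent in one document but rare across the corpus ("market", "vote") rank highest, while tokens shared by several documents are weighted down.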
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,7 @@ | ||
""" | ||
tmtoolkit setuptools based setup module | ||
.. codeauthor:: Markus Konrad <[email protected]> | ||
""" | ||
|
||
import os | ||
|
@@ -8,7 +10,7 @@ | |
from setuptools import setup, find_packages | ||
|
||
__title__ = 'tmtoolkit' | ||
-__version__ = '0.11.1.dev' | ||
+__version__ = '0.11.1' | ||
__author__ = 'Markus Konrad' | ||
__license__ = 'Apache License 2.0' | ||
|
||
|