0.5.5 #134

Merged Sep 1, 2023 (93 commits)

Commits
402faea
added ruff
yiwen-h Jun 16, 2023
e2b821e
autofix with ruff
yiwen-h Jun 16, 2023
133473f
added codecov to test gh action
yiwen-h Jun 16, 2023
62cd264
added ruff to toml
yiwen-h Jun 16, 2023
5f0732d
fixed unused loop variable in for loop
yiwen-h Jun 16, 2023
3944e40
fixed errors identified by ruff in multilabel_pipeline
yiwen-h Jun 16, 2023
b3908c2
fixed ruff identified errors in factory_data_load_and_split
yiwen-h Jun 16, 2023
50ee694
fixed ruff error in factory_model_performance
yiwen-h Jun 16, 2023
a252b19
fixed minor ruff identified errors in factory_predict_unlabelled_text…
yiwen-h Jun 16, 2023
bea9387
adding precommit
yiwen-h Jun 16, 2023
5aa23b2
added folders to exclude in precommit
yiwen-h Jun 16, 2023
c0a6627
fixing test GH action yaml
yiwen-h Jun 16, 2023
237bc4f
updated precommit yaml
yiwen-h Jun 16, 2023
6f240ab
added test_sklearn_pipeline_sentiment
yiwen-h Jun 16, 2023
9aa28a6
added ruff to GH action
yiwen-h Jun 16, 2023
badc9ac
modified workflow yaml so upload to codecov only happens once
yiwen-h Jun 19, 2023
773e812
added more tests for factory_pipeline
yiwen-h Jun 19, 2023
200dd0b
added test_process_data
yiwen-h Jun 19, 2023
470bafc
removed tf pipeline and functions
yiwen-h Jun 19, 2023
b25864e
added test_create_sklearn_pipeline
yiwen-h Jun 19, 2023
d6c8596
added test_search_sklearn_pipelines
yiwen-h Jun 19, 2023
437e6d1
added more tests for factory_pipeline
yiwen-h Jun 20, 2023
cc0762d
Merge pull request #114 from CDU-data-science-team/104-75_linting_and…
yiwen-h Jun 20, 2023
2a4c160
added tests for factory_model_performance
yiwen-h Jun 21, 2023
0b9e304
Merge pull request #116 from CDU-data-science-team/75_pipeline_tests
yiwen-h Jun 27, 2023
8556d2b
started basic framework for dockerizing sentiment endpoint
yiwen-h Jul 3, 2023
243c69b
working docker container - cant use Alpine
yiwen-h Jul 3, 2023
3ad8e6b
got docker container to mount data folder, accept filename as argument
yiwen-h Jul 4, 2023
7885fcc
added test_sklearn_pipeline
yiwen-h Jul 12, 2023
4fdd06c
parametrized target for sklearn pipeline. wrote svc pipeline test
yiwen-h Jul 12, 2023
b8f7ef7
added tests for bert pipeline
yiwen-h Jul 12, 2023
3360bcb
added test for two_layer_sklearn pipeline. finished multilabel_pipeli…
yiwen-h Jul 12, 2023
4a0a835
added test_sentiment_sklearn
yiwen-h Jul 13, 2023
4120193
sentiment_bert test added
yiwen-h Jul 13, 2023
3bfcea8
finished testing for sentiment pipeline
yiwen-h Jul 13, 2023
64fe780
json input file now deleted if NOT run locally
yiwen-h Jul 14, 2023
86414f2
Predictions now outputted as json file in data_out folder
yiwen-h Jul 14, 2023
b726388
added label to dockerfile
yiwen-h Jul 18, 2023
6ab83a3
added most tests for docker_run
yiwen-h Jul 18, 2023
aca3601
added phase2 data for permitted trusts plus readme
yiwen-h Jul 18, 2023
a47da37
added more readme info
yiwen-h Jul 19, 2023
da605dc
added larger json file - about 8000 comments
yiwen-h Jul 19, 2023
8ea5ca0
Merge pull request #120 from CDU-data-science-team/119_datasets
ChrisBeeley Jul 19, 2023
f8c87cc
Merge pull request #121 from CDU-data-science-team/75_pipeline_tests
ChrisBeeley Jul 20, 2023
c57357a
retrained with multilabel_230719 data
yiwen-h Jul 25, 2023
2a23864
fixed svcpipeline not taking additional_features = False
yiwen-h Jul 25, 2023
43c5622
Merge pull request #124 from CDU-data-science-team/122_230719_data
ChrisBeeley Jul 25, 2023
852b569
Merge pull request #125 from CDU-data-science-team/123_svcpipeline_bu…
ChrisBeeley Jul 25, 2023
525f754
testing increasing prob of class if certain words are present in text
yiwen-h Jul 27, 2023
72d673a
moved rules to params for more flexibility
yiwen-h Jul 27, 2023
f62ba5b
added rules to pipeline. testing with test_rules.py
yiwen-h Jul 31, 2023
0902caa
finished rules_dict (subject to tweaking)
yiwen-h Aug 1, 2023
f0a023e
added rules to bert predictions
yiwen-h Aug 1, 2023
cca07bd
get_multilabel_metrics now uses predict_multilabel_bert
yiwen-h Aug 1, 2023
669fdf0
added probs_dict for variable probabilities too
yiwen-h Aug 1, 2023
de20e3a
moved test_rules out of tests folder, reran coverage
yiwen-h Aug 1, 2023
b3d2969
Merge branch 'development' into 117_rulebased_model
yiwen-h Aug 3, 2023
0d366e9
Merge pull request #130 from CDU-data-science-team/117_rulebased_model
yiwen-h Aug 4, 2023
e853659
wrote get_y_score function
yiwen-h Aug 4, 2023
9f135e6
prediction dfs now include probabilities as well
yiwen-h Aug 4, 2023
439dece
prediction dfs now include probabilities for sklearn multilabel
yiwen-h Aug 4, 2023
78632fc
added macro roc auc score to model summary
yiwen-h Aug 4, 2023
db180e0
write_model_preds now uses probabilities from predict_multilabel df o…
yiwen-h Aug 4, 2023
6eb5259
added model_performance.additional_analysis which calculates confusio…
yiwen-h Aug 4, 2023
42b5ec6
confusion matrix info and roc_auc_score now in model analysis
yiwen-h Aug 7, 2023
368f9d8
Replaced macro roc_aoc score with average_precision_score in perf ana…
yiwen-h Aug 9, 2023
bfb6dca
renamed "support" column to be more userfriendly
yiwen-h Aug 9, 2023
40a3cfd
fixed ruff complaining about == instead of isinstance in tests
yiwen-h Aug 9, 2023
f1256ed
fixed ruff complaining about == instead of isinstance in test_factory…
yiwen-h Aug 9, 2023
301c8b7
some broken dependencies causing test to fail, trying to fix pyprojec…
yiwen-h Aug 9, 2023
c815088
Merge pull request #131 from CDU-data-science-team/126_ROC
yiwen-h Aug 9, 2023
18bf54d
started basic framework for dockerizing sentiment endpoint
yiwen-h Jul 3, 2023
d4f0831
working docker container - cant use Alpine
yiwen-h Jul 3, 2023
784c306
got docker container to mount data folder, accept filename as argument
yiwen-h Jul 4, 2023
98c1846
json input file now deleted if NOT run locally
yiwen-h Jul 14, 2023
72c5a24
Predictions now outputted as json file in data_out folder
yiwen-h Jul 14, 2023
1a30581
added label to dockerfile
yiwen-h Jul 18, 2023
dfe0907
added most tests for docker_run
yiwen-h Jul 18, 2023
f3549b0
added larger json file - about 8000 comments
yiwen-h Jul 19, 2023
12a0f79
added cache removal to dockerfile in bid to reduce size
yiwen-h Jul 19, 2023
eb94f9a
fewer layers, slim-debian, for smaller size
yiwen-h Jul 19, 2023
866154c
updated dockerfile to reduce size
yiwen-h Aug 9, 2023
ca2e483
mocking load_model in test_predict_sentiment
yiwen-h Aug 9, 2023
5894def
Merge pull request #132 from CDU-data-science-team/118_distilbert_API
yiwen-h Aug 10, 2023
c3a0c7f
updated packages using poetry to address dependabot security issues
yiwen-h Aug 10, 2023
f35e007
updated api requirements to match 0.5.5 package requirements
yiwen-h Aug 10, 2023
71ece91
added proper reqs file to api folder (with dev group)
yiwen-h Aug 10, 2023
0c0e7e5
Merge pull request #133 from CDU-data-science-team/security_fixes_16_…
ChrisBeeley Aug 11, 2023
60e1415
added more docstrings
yiwen-h Aug 29, 2023
44ac72b
updated docs for API page and sentiment_pipeline
yiwen-h Aug 29, 2023
955e42f
added tornado^6.3.3 to address security issue #27
yiwen-h Aug 29, 2023
bf43ede
updated version to 0.5.5
yiwen-h Aug 29, 2023
c7806c3
fixed ruff complaining about isinstance
yiwen-h Aug 29, 2023
5 changes: 5 additions & 0 deletions .coveragerc
@@ -4,6 +4,11 @@ omit = tests\*
    *\params.py
    api\test_api.py
    setup.py
    test_rules.py

source = api
    pxtextmining

[report]
exclude_lines =
    if __name__ == .__main__.:
8 changes: 8 additions & 0 deletions .github/workflows/test_package.yaml
@@ -6,6 +6,7 @@ on:
  pull_request:
    branches:
      - development
      - main

jobs:
  build:
@@ -20,6 +21,8 @@ jobs:
      - uses: actions/checkout@v3
      - name: Install poetry
        run: pipx install poetry
      - name: Ruff
        uses: chartboost/ruff-action@v1
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v4
        with:
@@ -29,3 +32,8 @@ jobs:
        run: poetry install --with dev
      - name: Run tests
        run: poetry run pytest tests/* -sx
      - name: Upload coverage reports to Codecov
        # Keep the comparison inside the expression; with only the matrix value
        # inside ${{ }}, the condition is a non-empty string and always passes.
        if: ${{ matrix.python-version == '3.10' }}
        uses: codecov/codecov-action@v3
        env:
          CODECOV_TOKEN: ${{ secrets.CODECOV_TOKEN }}
30 changes: 30 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,30 @@
exclude: '(build|datasets|current_best_multilabel|docs)/.*'

repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v2.3.0
    hooks:
      - id: check-added-large-files
        name: Check for files larger than 75 MB
        args: [ "--maxkb=750000" ]
      - id: end-of-file-fixer
        name: Check for a blank line at the end of scripts (auto-fixes)
        exclude: 'json'
      - id: trailing-whitespace
        name: Check for trailing whitespaces (auto-fixes)
  - repo: https://github.com/pycqa/isort
    rev: 5.12.0
    hooks:
      - id: isort
        name: isort - Sort Python imports (auto-fixes)
        args: [ "--profile", "black", "--filter-files" ]
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.0.272
    hooks:
      - id: ruff
        name: Ruff linting
  - repo: https://github.com/psf/black
    rev: 22.10.0
    hooks:
      - id: black
        name: black - consistent Python code formatting (auto-fixes)
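
Review note on the `check-added-large-files` hook: its name mentions 75 MB, while `--maxkb` is given in kilobytes. The conversion below is a small sketch to check what the configured value corresponds to. The helper name `maxkb_to_mib` is made up for illustration, and it assumes the hook counts a kilobyte as 1024 bytes.

```python
def maxkb_to_mib(maxkb: int) -> float:
    """Convert a --maxkb value to MiB, assuming 1 kB = 1024 bytes."""
    return maxkb / 1024


configured = 750000  # value passed in the config above
print(f"{configured} kB is roughly {maxkb_to_mib(configured):.0f} MiB")
print(f"75000 kB is roughly {maxkb_to_mib(75000):.0f} MiB")
```

Under that assumption the configured limit is roughly ten times the size named in the hook title. If 75 MB is the intended limit, a value near 75000 would be closer; the PR does not say which of the two is intended.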
14 changes: 14 additions & 0 deletions Dockerfile
@@ -0,0 +1,14 @@
FROM python:3.10.12-slim-bookworm
VOLUME /data

COPY docker-requirements.txt requirements.txt
RUN pip install --upgrade pip setuptools \
    && pip install -r requirements.txt \
    && rm -rf /root/.cache

COPY api/bert_sentiment bert_sentiment
COPY --chmod=755 docker_run.py docker_run.py

LABEL org.opencontainers.image.source=https://github.com/cdu-data-science-team/pxtextmining

ENTRYPOINT ["python3", "docker_run.py"]
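
The contents of `docker_run.py` are not shown in this diff. Going only by the commit messages (the container mounts a data folder, accepts a filename as argument, writes predictions as JSON to a `data_out` folder, and deletes the input file when not run locally), an entrypoint of this kind might look roughly like the sketch below. Every name here (`parse_args`, `run`, the `--local` flag, `predictions.json`, and placing `data_out` inside the mounted folder) is a hypothetical stand-in, not the project's actual code.

```python
import argparse
import json
from pathlib import Path


def parse_args(argv=None):
    """Parse the filename argument (hypothetical CLI, not the real one)."""
    parser = argparse.ArgumentParser(description="Sketch of a container entrypoint")
    parser.add_argument("json_file", help="input JSON file name inside the data folder")
    parser.add_argument("--local", action="store_true",
                        help="keep the input file after the run")
    return parser.parse_args(argv)


def run(data_dir, json_file, local, predict):
    """Read the input, apply `predict`, write the output, optionally delete the input."""
    in_path = Path(data_dir) / json_file
    records = json.loads(in_path.read_text())
    predictions = predict(records)  # stand-in for the real model call

    out_dir = Path(data_dir) / "data_out"  # assumed location of the output folder
    out_dir.mkdir(exist_ok=True)
    out_path = out_dir / "predictions.json"  # hypothetical output file name
    out_path.write_text(json.dumps(predictions))

    if not local:
        in_path.unlink()  # remove the input file when not run locally
    return out_path
```

Because `ENTRYPOINT` uses the exec form, any arguments given after the image name in `docker run` are appended to `python3 docker_run.py`, which is how a filename would reach a parser like the one sketched here.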
166 changes: 90 additions & 76 deletions api/requirements.txt

Large diffs are not rendered by default.
