Notebook demos #50
fexfl commented Nov 19, 2024
- Added notebook folder with demos for usage and for performance
- The usage demonstration output is currently just printed; a better way to visualize it should be found in the future
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@           Coverage Diff           @@
##             main      #50   +/-   ##
=======================================
  Coverage   93.45%   93.45%
=======================================
  Files           4        4
  Lines         382      382
=======================================
  Hits          357      357
  Misses         25       25

☔ View full report in Codecov by Sentry.
I added some remarks and a description for the demo notebook. Please address the remarks in the performance notebook. For performance, we would also like to try batching for the transformers pipeline (see the latest issue; that would be a new PR).
Can you make sure to use pre-commit hooks, so that the Jupyter notebook output is cleared when you commit to the repo? To set them up, run `pre-commit install` in your directory.
notebook/performance_demo.ipynb
" out_list.append(email_dict)\n",
"\n",
" # timestamp after this email\n",
" ts_list.append(time.time())"
Can we add something to plot the timestamps and get the average? And include the time for model loading / preprocessing.
For example, two bar plots:
- One with the time per email, to see if there are any emails that take longer.
- One with the time for model loading, the time for preprocessing (reading the eml files and spaCy sentencizing), and the average processing time per email (if that is legible on the same plot).
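A minimal sketch of what the two plots could look like, not the notebook's actual code. It assumes that `ts_list` holds one timestamp taken just before the first email plus one appended after each email, and that `load_time` and `preprocess_time` (hypothetical names) were measured separately in seconds. The helper functions are also hypothetical; only `ts_list` appears in the notebook.

```python
import statistics


def per_email_durations(ts_list):
    """Turn consecutive timestamps into per-email durations in seconds.

    Assumption: ts_list[0] was recorded just before the first email and
    one timestamp was appended after each processed email.
    """
    return [later - earlier for earlier, later in zip(ts_list, ts_list[1:])]


def summarize_timings(ts_list, load_time, preprocess_time):
    """Collect the three values for the summary plot (all in seconds)."""
    durations = per_email_durations(ts_list)
    return {
        "model loading": load_time,
        "preprocessing": preprocess_time,
        "mean per email": statistics.mean(durations),
    }


def plot_timings(ts_list, load_time, preprocess_time):
    """Draw both bar plots side by side and return the figure."""
    # Imported here so the helpers above stay usable without matplotlib.
    import matplotlib.pyplot as plt

    durations = per_email_durations(ts_list)
    summary = summarize_timings(ts_list, load_time, preprocess_time)

    fig, (ax_emails, ax_summary) = plt.subplots(1, 2, figsize=(10, 4))

    # Plot 1: time per email, with the mean as a dashed reference line.
    ax_emails.bar(range(1, len(durations) + 1), durations)
    ax_emails.axhline(summary["mean per email"], linestyle="--", color="gray")
    ax_emails.set_xlabel("email index")
    ax_emails.set_ylabel("time [s]")
    ax_emails.set_title("Processing time per email")

    # Plot 2: model loading, preprocessing, and mean time per email.
    ax_summary.bar(list(summary.keys()), list(summary.values()))
    ax_summary.set_ylabel("time [s]")
    ax_summary.set_title("Timing summary")

    fig.tight_layout()
    return fig
```

In the notebook this would be called once after the processing loop, for example `plot_timings(ts_list, load_time, preprocess_time)`. If model loading dominates and the per-email bar becomes illegible, a logarithmic y-axis on the second plot (`ax_summary.set_yscale("log")`) is one option.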
Quality Gate passed