Notebook demos #50

Open: fexfl wants to merge 10 commits into main


fexfl (Collaborator) commented Nov 19, 2024

  • Added a notebook folder with demos for usage and for performance.
  • The usage demo currently just prints its output; a better way to visualize it still needs to be found.

@fexfl fexfl requested a review from iulusoy November 19, 2024 14:09
fexfl (Collaborator, Author) commented Nov 19, 2024

This addresses #40 and #42, but further work is needed to visualize the results better.


codecov bot commented Nov 19, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.45%. Comparing base (b9d14f8) to head (446f34c).

Additional details and impacted files
@@           Coverage Diff           @@
##             main      #50   +/-   ##
=======================================
  Coverage   93.45%   93.45%           
=======================================
  Files           4        4           
  Lines         382      382           
=======================================
  Hits          357      357           
  Misses         25       25           


iulusoy (Member) left a comment

I added some remarks and a description to the demo notebook. Please address the remarks in the performance notebook. For performance, we would also like to try batching for the transformers pipeline (see the latest issue; that would be a new PR).

Can you make sure to use pre-commit hooks, so that the Jupyter notebook output is cleared when you commit to the repo? To set them up, run pre-commit install in your repository directory.
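To illustrate what "cleared output" means for a committed notebook, here is a minimal sketch that assumes the standard nbformat 4 JSON layout. It is not the hook this repository configures (the review does not name one); the function names `clear_outputs` and `clear_file` are hypothetical.

```python
import json


def clear_outputs(notebook):
    """Remove outputs and execution counts from all code cells (in place)."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return notebook


def clear_file(path):
    """Hypothetical helper: rewrite a .ipynb file with its outputs stripped."""
    with open(path, encoding="utf-8") as f:
        notebook = json.load(f)
    clear_outputs(notebook)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(notebook, f, indent=1, ensure_ascii=False)
        f.write("\n")
```

In practice the pre-commit hook does this automatically at commit time, so the sketch is only meant to show the effect on the notebook file.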

" out_list.append(email_dict)\n",
"\n",
" # timestamp after this email\n",
" ts_list.append(time.time())"

Can we add something that plots the timestamps and computes the average? It should also include the time for model loading and preprocessing.
For example, two bar plots:

  • One with the time per email, to see whether any emails take longer than the rest.
  • One with the time for model loading, the time for preprocessing (reading the eml files and spaCy sentencizing), and the average processing time per email (if that is legible on the same plot).
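The two plots suggested above could be sketched roughly as follows. This assumes `ts_list` starts with one timestamp taken just before the loop and then gains one entry after each email, and that the model loading and preprocessing times were measured separately; the function names and the `load_s` / `preprocess_s` parameters are hypothetical and not part of the notebook.

```python
from statistics import mean


def per_email_durations(ts_list):
    """Convert consecutive timestamps into per-email durations in seconds.

    Assumption: ts_list[0] was recorded just before the first email.
    """
    return [later - earlier for earlier, later in zip(ts_list, ts_list[1:])]


def summarize(load_s, preprocess_s, ts_list):
    """Collect the three values for the second (per-stage) bar plot."""
    durations = per_email_durations(ts_list)
    return {
        "model loading": load_s,
        "preprocessing": preprocess_s,
        "mean per email": mean(durations) if durations else 0.0,
    }


def plot_timings(load_s, preprocess_s, ts_list):
    """Draw both bar plots side by side and return the figure."""
    import matplotlib.pyplot as plt  # imported here; only needed for plotting

    durations = per_email_durations(ts_list)
    summary = summarize(load_s, preprocess_s, ts_list)

    fig, (ax_each, ax_stage) = plt.subplots(1, 2, figsize=(10, 4))
    # Left: one bar per email, so slow emails stand out.
    ax_each.bar(range(1, len(durations) + 1), durations)
    ax_each.set_xlabel("email index")
    ax_each.set_ylabel("time (s)")
    ax_each.set_title("Time per email")
    # Right: loading, preprocessing, and the per-email average.
    ax_stage.bar(list(summary.keys()), list(summary.values()))
    ax_stage.set_ylabel("time (s)")
    ax_stage.set_title("Time per stage")
    fig.tight_layout()
    return fig
```

If model loading takes much longer than a single email, a logarithmic y-axis on the second plot (`ax_stage.set_yscale("log")`) may keep the per-email average legible.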


sonarcloud bot commented Dec 3, 2024
