
Refactor c2st and adding a unit test for metrics #503

Merged: 20 commits into sbi-dev:main on Jan 28, 2022

Conversation

@psteinb (Contributor) commented Jun 3, 2021

I wanted to understand the ins and outs of c2st better, so I am putting this here; potentially my code can help to improve it. Also, c2st is currently untested.

@codecov-commenter commented Jun 3, 2021

Codecov Report

Merging #503 (cd68e87) into main (9ed18ca) will decrease coverage by 1.01%.
The diff coverage is 83.33%.

@@            Coverage Diff             @@
##             main     #503      +/-   ##
==========================================
- Coverage   67.96%   66.94%   -1.02%     
==========================================
  Files          56       67      +11     
  Lines        4117     4396     +279     
==========================================
+ Hits         2798     2943     +145     
- Misses       1319     1453     +134     
Flag Coverage Δ
unittests 66.94% <83.33%> (-1.02%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
sbi/utils/metrics.py 40.42% <83.33%> (+3.52%) ⬆️
sbi/analysis/plot.py 46.17% <0.00%> (-27.16%) ⬇️
sbi/analysis/conditional_density.py 51.51% <0.00%> (-23.49%) ⬇️
sbi/inference/snle/snle_base.py 81.81% <0.00%> (-15.78%) ⬇️
sbi/inference/snre/snre_base.py 82.56% <0.00%> (-14.24%) ⬇️
sbi/inference/snpe/snpe_a.py 61.86% <0.00%> (-4.65%) ⬇️
sbi/inference/snpe/snpe_base.py 83.80% <0.00%> (-3.32%) ⬇️
sbi/utils/user_input_checks_utils.py 93.78% <0.00%> (-2.94%) ⬇️
sbi/utils/torchutils.py 57.35% <0.00%> (-2.65%) ⬇️
sbi/inference/base.py 77.53% <0.00%> (-0.89%) ⬇️
... and 49 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@psteinb mentioned this pull request on Jun 3, 2021
@psteinb (Contributor, Author) commented Jul 15, 2021

It would be cool if anyone could have a look at this PR. In the metrics_test.py file, I propose an updated version of the c2st function which runs considerably faster than the current implementation. Moreover, the choice of classifier is flexible. If people like @michaeldeistler and @janfb agree, I'd go forward, swap in alt_c2st from the unit test file for the one in sbi/utils/metrics.py, and remove the Draft label.
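
(For readers unfamiliar with the metric, here is a minimal sketch of the classifier two-sample test idea, assuming only NumPy and scikit-learn; it illustrates the concept, not the PR's actual alt_c2st code.)

```python
# Minimal sketch of a classifier two-sample test (c2st); an illustration
# of the concept, not the PR's actual alt_c2st implementation.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score


def c2st_sketch(X, Y, classifier=None, n_folds=5, seed=1):
    """Mean cross-validated accuracy of telling samples X from samples Y.

    A score near 0.5 means the classifier cannot distinguish the two
    sample sets, i.e. the underlying distributions look identical; a
    score near 1.0 means they are easy to tell apart.
    """
    clf = RandomForestClassifier(random_state=seed) if classifier is None else classifier
    data = np.concatenate((X, Y))  # stack both sample sets
    labels = np.concatenate((np.zeros(len(X)), np.ones(len(Y))))
    scores = cross_val_score(
        clf,
        data,
        labels,
        cv=KFold(n_splits=n_folds, shuffle=True, random_state=seed),
        scoring="accuracy",
    )
    return scores.mean()
```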

@janfb (Contributor) commented Jul 15, 2021

Thanks for the ping @psteinb !

I think this is very valuable input you give 👍, but I would need to look into it a bit, which could happen next week at the earliest.

Also, I think it would make sense to get @jan-matthis's input on this topic.

@psteinb (Contributor, Author) commented Jul 15, 2021

Sure thing, I just found some time today to invest in this. The difference in runtime for the test cases I could conceive of (and there are likely more) is striking. Looking forward to your feedback!

@psteinb (Contributor, Author) commented Jul 16, 2021

I merged main into this branch so that the CI passes.

@psteinb (Contributor, Author) commented Nov 9, 2021

Just wanted to pick this up again.

@janfb and @jan-matthis, should I update this PR and bring the random-forest-based c2st into sbi by replacing the old NN-based implementation?

@psteinb marked this pull request as ready for review on January 24, 2022
@psteinb (Contributor, Author) commented Jan 25, 2022

This version of c2st is likely to speed up some unit tests; see also #575. I am currently studying the effect of using an RF instead of an MLP here.

@psteinb changed the title from "adding a unit test for metrics" to "draft: adding a unit test for metrics" on Jan 25, 2022
@psteinb changed the title to "Draft: adding a unit test for metrics" on Jan 25, 2022
@psteinb changed the title to "Refactor c2st and adding a unit test for metrics" on Jan 26, 2022
@psteinb (Contributor, Author) commented Jan 26, 2022

As suggested by @janfb, I ran a small analysis on a synthetic dataset with c2st using an NN (as is currently done) versus c2st using a random forest (with sklearn's default parameters). For the complete details of this analysis, consult this notebook.

Bottom line for me:

  • the RF-based implementation gives approximately the same results as the NN version
  • the speedup from using an RF can be up to 32x (a rough sketch of such a comparison follows below)
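
(A rough sketch of how such an NN-vs-RF timing comparison could be set up, with hypothetical sample sizes and reusing the c2st_sketch helper from the earlier comment; the actual study is in the linked notebook.)

```python
# Rough timing sketch (hypothetical sizes; the real study is in the
# notebook linked above). Assumes the c2st_sketch helper defined earlier.
import time

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(42)
X = rng.normal(0.0, 1.0, size=(1024, 10))  # samples from P
Y = rng.normal(0.5, 1.0, size=(1024, 10))  # samples from Q (shifted mean)

for clf in (RandomForestClassifier(), MLPClassifier(max_iter=1000)):
    start = time.perf_counter()
    score = c2st_sketch(X, Y, classifier=clf)
    elapsed = time.perf_counter() - start
    print(f"{type(clf).__name__}: score={score:.3f}, t={elapsed:.2f}s")
```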

@psteinb (Contributor, Author) commented Jan 26, 2022

@janfb, or anyone else: this is ready for review.

@michaeldeistler (Contributor) left a review comment


Thanks a lot for the effort, this is great! Two major points:

  1. Is there a reason to change the default classifier?
  2. Overall, these new tests should not exceed five minutes (3 minutes would be even better) (otherwise our total test time will explode in the long run). Could you check this and remove tests if necessary?

Aside: all tests are failing atm because of a missing import

(inline review comments on tests/metrics_test.py and sbi/utils/metrics.py, all resolved or outdated)
@psteinb (Contributor, Author) commented Jan 26, 2022

> 1. Is there a reason to change the default classifier?

Depends on the input data. There is no free lunch, so I thought having the classifier be configurable makes sense.

> 2. Overall, these new tests should not exceed five minutes (3 minutes would be even better) (otherwise our total test time will explode in the long run). Could you check this and remove tests if necessary?

Fair enough, I'll look into this. Based on my analysis here, using a smaller dataset size should be fine. Do those 3 minutes apply to all non-slow tests?

> Aside: all tests are failing atm because of a missing import

I'll see to it.

@michaeldeistler (Contributor) commented

Yes, making it configurable is great. But I would stick to the MLP as default.

3-5 minutes applies to all tests, including slow tests. (Sorry about being picky here, but I am really actively trying to keep test time manageable.)

Thanks a lot!

@janfb (Contributor) commented Jan 26, 2022

Thanks, this is great!

I think the speedup we gain from using a random forest vs. an MLP is a very strong argument for changing the defaults, @michaeldeistler. You do not? What is your argument for keeping the MLP classifier?

- reducing the sample count based on studies in github.com/psteinb/c2st
- expanding the bounds for assert statements to comply with the uncertainties obtained there
- renaming tests to better match their scope
@janfb (Contributor) left a review comment


Overall great, thanks a lot! Some things need to be changed as indicated in the comments, and I think the tests can be reduced a lot with a bit of refactoring.

(inline review comments on sbi/utils/metrics.py and tests/metrics_test.py, all resolved or outdated)
- 2 functions available: c2st and c2st_scores
- use early stopping for the MLP implementation
- c2st is the easier interface
  + allows changing the classifier via a string argument
  + returns the mean score from cross-validation
- c2st_scores is more versatile (see the usage sketch below)
  + can consume any sklearn-compatible classifier class
  + returns all scores from the cross-validation
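
(A usage sketch of the two entry points as described in this commit message; the argument names classifier= and clf_class= are assumptions for illustration, not necessarily the verbatim signatures in sbi/utils/metrics.py.)

```python
# Usage sketch of the two entry points; argument names are assumptions
# based on the commit message above.
import torch
from sklearn.ensemble import RandomForestClassifier

from sbi.utils.metrics import c2st, c2st_scores

X = torch.randn(1000, 2)
Y = torch.randn(1000, 2) + 1.0  # shifted, so scores should exceed 0.5

# Easy interface: pick the classifier by name, get the mean CV score.
mean_score = c2st(X, Y, classifier="rf")

# Versatile interface: pass any sklearn-compatible classifier class
# and inspect the individual cross-validation scores.
all_scores = c2st_scores(X, Y, clf_class=RandomForestClassifier)

print(mean_score, all_scores)
```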
@michaeldeistler (Contributor) commented

Yes, I'm perfectly fine with swapping out the classifier default. But please run all tests (including slow ones) and make sure that they pass. Thanks!

@psteinb (Contributor, Author) commented Jan 27, 2022

Thanks for the feedback. I introduced early stopping for the MLP classifier and am currently rerunning my mini-study. It does reduce the MLP runtime considerably iff the two ensembles are (far) apart from each other and hence easy to classify. If that is not the case, the runtime is still rather long. All tests pass on my machine and in the CI. Runtimes look good to me:

============================================================================= slowest durations =============================================================================
4.09s call     tests/metrics_test.py::test_same_distributions_mlp
3.92s call     tests/metrics_test.py::test_diff_distributions_flexible_mlp
2.00s call     tests/metrics_test.py::test_same_distributions
1.56s call     tests/metrics_test.py::test_c2st_scores
1.42s call     tests/metrics_test.py::test_onesigma_apart_distributions
0.63s call     tests/metrics_test.py::test_diff_distributions
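
(In scikit-learn terms, the early-stopping setup mentioned above might look like the following sketch; the concrete hyperparameters are illustrative, not the PR's exact configuration.)

```python
# Sketch of the early-stopping idea for the MLP classifier, using
# scikit-learn's MLPClassifier; hyperparameter values are illustrative.
from sklearn.neural_network import MLPClassifier

clf = MLPClassifier(
    hidden_layer_sizes=(100,),
    early_stopping=True,      # hold out a validation fraction ...
    validation_fraction=0.1,  # ... and stop once its score plateaus,
    n_iter_no_change=10,      # i.e. no improvement for 10 epochs
    max_iter=1000,
)
```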

I rerequested the review.

@psteinb requested a review from janfb on January 27, 2022
@janfb (Contributor) left a review comment


Looks very good, just suggested some refactoring, especially for the tests.

(inline review comments on sbi/utils/metrics.py and tests/metrics_test.py, all resolved or outdated)
@psteinb requested a review from janfb on January 27, 2022
@janfb (Contributor) commented Jan 27, 2022

I think it would be great to simplify the tests, and to simplify c2st(..) to justify the separation from c2st_scores (see the new comment above!).

Then I would also let @michaeldeistler look at this again, and finally I think it would be great if you could do an interactive rebase and group these rather many commits into just a couple of commits. Thanks!

@michaeldeistler (Contributor) left a review comment


Good to go from my side. Feel free to merge it @janfb

Thanks a lot @psteinb!

- refactored unit tests into parametrized tests (see the sketch below)
- all tests below 5 s runtime; one pytest call takes below 45 s on a 4-core laptop
- refactored c2st to offer a simple interface
- refactored c2st_scores to offer more control
- renamed the parameter scoring to metric to prevent confusion with z_score
- removed an elif statement
- added type hints
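
(A sketch of a parametrized test in the spirit of this commit message; the distributions, sample sizes, and tolerances are illustrative assumptions, not the values used in tests/metrics_test.py.)

```python
# Sketch of a parametrized metrics test; sizes and tolerances are
# illustrative assumptions.
import pytest
import torch

from sbi.utils.metrics import c2st


@pytest.mark.parametrize("classifier", ("rf", "mlp"))
@pytest.mark.parametrize(
    "shift, expected",
    ((0.0, 0.5), (20.0, 1.0)),  # same dist -> chance level; far apart -> ~1.0
)
def test_c2st_on_shifted_gaussians(classifier, shift, expected):
    X = torch.randn(512, 2)
    Y = torch.randn(512, 2) + shift
    score = c2st(X, Y, classifier=classifier)
    assert abs(float(score) - expected) < 0.05
```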
@janfb (Contributor) commented Jan 28, 2022

Thanks for improving the tests! Great effort 🚀

@psteinb (Contributor, Author) commented Jan 28, 2022

Do you want me to squash those commits in this PR and rebase onto main? I am totally happy if you want to just squash-merge. Just let me know.

@janfb (Contributor) commented Jan 28, 2022

No, it's fine as it is if we squash-merge. Only if we were to rebase and merge would I prefer some rewriting of the commits. 👍

@janfb merged commit ca61658 into sbi-dev:main on Jan 28, 2022
@psteinb (Contributor, Author) commented Jan 28, 2022

Hooray! 🕺
