
Refactor c2st and adding a unit test for metrics #503

Merged: 20 commits into sbi-dev:main on Jan 28, 2022

Conversation

@psteinb (Contributor) commented Jun 3, 2021

I wanted to understand the ins and outs of c2st better, so I am putting this here; potentially my code can help to improve it. Also, c2st is currently untested.

@codecov-commenter commented Jun 3, 2021

Codecov Report

Merging #503 (cd68e87) into main (9ed18ca) will decrease coverage by 1.01%.
The diff coverage is 83.33%.

@@            Coverage Diff             @@
##             main     #503      +/-   ##
==========================================
- Coverage   67.96%   66.94%   -1.02%     
==========================================
  Files          56       67      +11     
  Lines        4117     4396     +279     
==========================================
+ Hits         2798     2943     +145     
- Misses       1319     1453     +134     
Flag Coverage Δ
unittests 66.94% <83.33%> (-1.02%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
sbi/utils/metrics.py 40.42% <83.33%> (+3.52%) ⬆️
sbi/analysis/plot.py 46.17% <0.00%> (-27.16%) ⬇️
sbi/analysis/conditional_density.py 51.51% <0.00%> (-23.49%) ⬇️
sbi/inference/snle/snle_base.py 81.81% <0.00%> (-15.78%) ⬇️
sbi/inference/snre/snre_base.py 82.56% <0.00%> (-14.24%) ⬇️
sbi/inference/snpe/snpe_a.py 61.86% <0.00%> (-4.65%) ⬇️
sbi/inference/snpe/snpe_base.py 83.80% <0.00%> (-3.32%) ⬇️
sbi/utils/user_input_checks_utils.py 93.78% <0.00%> (-2.94%) ⬇️
sbi/utils/torchutils.py 57.35% <0.00%> (-2.65%) ⬇️
sbi/inference/base.py 77.53% <0.00%> (-0.89%) ⬇️
... and 49 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@psteinb mentioned this pull request on Jun 3, 2021
@psteinb (Contributor, Author) commented Jul 15, 2021

It would be cool if anyone could have a look at this PR. In the metrics_test.py file, I propose an updated version of the c2st function which runs considerably faster than the current implementation. Moreover, the choice of classifier is flexible. If people like @michaeldeistler and @janfb agree, I'd go forward, swap in alt_c2st from the unit test file for the one in sbi/utils/metrics.py, and remove the Draft label.
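
(For readers unfamiliar with the metric, here is a minimal sketch of the classifier two-sample test idea, assuming only NumPy and scikit-learn; it illustrates the concept, not the PR's actual alt_c2st code.)

```python
# Minimal sketch of a classifier two-sample test (c2st); an illustration
# of the concept, not the PR's actual alt_c2st implementation.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score


def c2st_sketch(X, Y, classifier=None, n_folds=5, seed=1):
    """Mean cross-validated accuracy of telling samples X from samples Y.

    A score near 0.5 means the classifier cannot distinguish the two
    sample sets, i.e. the underlying distributions look identical; a
    score near 1.0 means they are easy to tell apart.
    """
    clf = RandomForestClassifier(random_state=seed) if classifier is None else classifier
    data = np.concatenate((X, Y))  # stack both sample sets
    labels = np.concatenate((np.zeros(len(X)), np.ones(len(Y))))
    scores = cross_val_score(
        clf,
        data,
        labels,
        cv=KFold(n_splits=n_folds, shuffle=True, random_state=seed),
        scoring="accuracy",
    )
    return scores.mean()
```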

@janfb (Contributor) commented Jul 15, 2021

Thanks for the ping @psteinb !

I think this is very valuable input you give 👍, but I would need to look into it a bit, which could happen next week at the earliest.

Also, I think it would make sense to get @jan-matthis's input on this topic.

@psteinb (Contributor, Author) commented Jul 15, 2021

Sure thing, I just found some time today to invest in this. The difference in runtime for the test cases I could conceive of (and there are likely more) is striking. Looking forward to your feedback!

@psteinb (Contributor, Author) commented Jul 16, 2021

I merged main into this branch so that the CI passes.

@psteinb (Contributor, Author) commented Nov 9, 2021

Just wanted to pick this up again.

@janfb and @jan-matthis, should I update this PR and bring the random-forest-based c2st into sbi by replacing the old NN-based implementation?

@psteinb marked this pull request as ready for review on January 24, 2022
@psteinb (Contributor, Author) commented Jan 25, 2022

This version of c2st is likely to speed up some unit tests; see also #575. I am currently studying the effect of using an RF instead of an MLP here.

@psteinb changed the title from "adding a unit test for metrics" to "draft: adding a unit test for metrics" on Jan 25, 2022
@psteinb changed the title to "Draft: adding a unit test for metrics" on Jan 25, 2022
@psteinb changed the title to "Refactor c2st and adding a unit test for metrics" on Jan 26, 2022
@psteinb (Contributor, Author) commented Jan 26, 2022

As suggested by @janfb, I ran a small analysis on a synthetic dataset with c2st using an NN (as is currently done) versus c2st using a random forest (with sklearn's default parameters). For the complete details of this analysis, consult this notebook.

Bottom line for me:

  • the RF-based implementation gives approximately the same results as the NN version
  • the speedup from using an RF can be up to 32x (a rough sketch of such a comparison follows below)
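
(A rough sketch of how such an NN-vs-RF timing comparison could be set up, with hypothetical sample sizes and reusing the c2st_sketch helper from the earlier comment; the actual study is in the linked notebook.)

```python
# Rough timing sketch (hypothetical sizes; the real study is in the
# notebook linked above). Assumes the c2st_sketch helper defined earlier.
import time

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(42)
X = rng.normal(0.0, 1.0, size=(1024, 10))  # samples from P
Y = rng.normal(0.5, 1.0, size=(1024, 10))  # samples from Q (shifted mean)

for clf in (RandomForestClassifier(), MLPClassifier(max_iter=1000)):
    start = time.perf_counter()
    score = c2st_sketch(X, Y, classifier=clf)
    elapsed = time.perf_counter() - start
    print(f"{type(clf).__name__}: score={score:.3f}, t={elapsed:.2f}s")
```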

@psteinb (Contributor, Author) commented Jan 26, 2022

@janfb, or anyone else: this is ready for review.

@michaeldeistler (Contributor) left a review comment


Thanks a lot for the effort, this is great! Two major points:

  1. Is there a reason to change the default classifier?
  2. Overall, these new tests should not exceed five minutes (3 minutes would be even better) (otherwise our total test time will explode in the long run). Could you check this and remove tests if necessary?

Aside: all tests are failing atm because of a missing import

(inline review comments on tests/metrics_test.py and sbi/utils/metrics.py, all resolved or outdated)
@psteinb (Contributor, Author) commented Jan 26, 2022

> 1. Is there a reason to change the default classifier?

Depends on the input data. There is no free lunch, so I thought having the classifier be configurable makes sense.

> 2. Overall, these new tests should not exceed five minutes (3 minutes would be even better) (otherwise our total test time will explode in the long run). Could you check this and remove tests if necessary?

Fair enough, I'll look into this. Based on my analysis here, using a smaller dataset size should be fine. Do those 3 minutes apply to all non-slow tests?

> Aside: all tests are failing atm because of a missing import

I'll see to it.

@michaeldeistler (Contributor) commented

Yes, making it configurable is great. But I would stick to the MLP as default.

3-5 minutes applies to all tests, including slow tests. (Sorry about being picky here, but I am really actively trying to keep test time manageable.)

Thanks a lot!

@janfb (Contributor) commented Jan 26, 2022

Thanks, this is great!

I think the speedup we gain from using a random forest vs. an MLP is a very strong argument for changing the defaults, @michaeldeistler. You do not? What is your argument for keeping the MLP classifier?

- reducing the sample count based on studies in github.com/psteinb/c2st
- expanding the bounds for assert statements to comply with the uncertainties obtained there
- renaming tests to better match their scope
@janfb (Contributor) left a review comment


Overall great, thanks a lot! Some things need to be changed as indicated in the comments, and I think the tests can be reduced a lot with a bit of refactoring.

(inline review comments on sbi/utils/metrics.py and tests/metrics_test.py, all resolved or outdated)
- 2 functions available: c2st and c2st_scores
- use early stopping for the MLP implementation
- c2st is the easier interface
  + allows changing the classifier via a string argument
  + returns the mean score from cross-validation
- c2st_scores is more versatile (see the usage sketch below)
  + can consume any sklearn-compatible classifier class
  + returns all scores from the cross-validation
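
(A usage sketch of the two entry points as described in this commit message; the argument names classifier= and clf_class= are assumptions for illustration, not necessarily the verbatim signatures in sbi/utils/metrics.py.)

```python
# Usage sketch of the two entry points; argument names are assumptions
# based on the commit message above.
import torch
from sklearn.ensemble import RandomForestClassifier

from sbi.utils.metrics import c2st, c2st_scores

X = torch.randn(1000, 2)
Y = torch.randn(1000, 2) + 1.0  # shifted, so scores should exceed 0.5

# Easy interface: pick the classifier by name, get the mean CV score.
mean_score = c2st(X, Y, classifier="rf")

# Versatile interface: pass any sklearn-compatible classifier class
# and inspect the individual cross-validation scores.
all_scores = c2st_scores(X, Y, clf_class=RandomForestClassifier)

print(mean_score, all_scores)
```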
@michaeldeistler (Contributor) commented

Yes, I'm perfectly fine with swapping out the classifier default. But please run all tests (including slow ones) and make sure that they pass. Thanks!

@psteinb (Contributor, Author) commented Jan 27, 2022

Thanks for the feedback. I introduced early stopping for the MLP classifier and am currently rerunning my mini-study. It does reduce the MLP runtime considerably iff the two ensembles are (far) apart from each other and hence easy to classify. If that is not the case, the runtime is still rather long. All tests pass on my machine and in the CI. Runtimes look good to me:

============================================================================= slowest durations =============================================================================
4.09s call     tests/metrics_test.py::test_same_distributions_mlp
3.92s call     tests/metrics_test.py::test_diff_distributions_flexible_mlp
2.00s call     tests/metrics_test.py::test_same_distributions
1.56s call     tests/metrics_test.py::test_c2st_scores
1.42s call     tests/metrics_test.py::test_onesigma_apart_distributions
0.63s call     tests/metrics_test.py::test_diff_distributions
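
(In scikit-learn terms, the early-stopping setup mentioned above might look like the following sketch; the concrete hyperparameters are illustrative, not the PR's exact configuration.)

```python
# Sketch of the early-stopping idea for the MLP classifier, using
# scikit-learn's MLPClassifier; hyperparameter values are illustrative.
from sklearn.neural_network import MLPClassifier

clf = MLPClassifier(
    hidden_layer_sizes=(100,),
    early_stopping=True,      # hold out a validation fraction ...
    validation_fraction=0.1,  # ... and stop once its score plateaus,
    n_iter_no_change=10,      # i.e. no improvement for 10 epochs
    max_iter=1000,
)
```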

I rerequested the review.

@psteinb requested a review from janfb on January 27, 2022
@janfb (Contributor) left a review comment


Looks very good, just suggested some refactoring, especially for the tests.

(inline review comments on sbi/utils/metrics.py and tests/metrics_test.py, all resolved or outdated)
@psteinb requested a review from janfb on January 27, 2022
@janfb (Contributor) commented Jan 27, 2022

I think it would be great to simplify the tests, and to simplify c2st(..) to justify the separation from c2st_scores (see the new comment above!).

Then I would also let @michaeldeistler look at this again, and finally I think it would be great if you could do an interactive rebase and group these rather many commits into just a couple of commits. Thanks!

@michaeldeistler (Contributor) left a review comment


Good to go from my side. Feel free to merge it @janfb

Thanks a lot @psteinb!

- refactored unit tests into parametrized tests (see the sketch below)
- all tests below 5 s runtime; one pytest call takes below 45 s on a 4-core laptop
- refactored c2st to offer a simple interface
- refactored c2st_scores to offer more control
- renamed the parameter scoring to metric to prevent confusion with z_score
- removed an elif statement
- added type hints
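
(A sketch of a parametrized test in the spirit of this commit message; the distributions, sample sizes, and tolerances are illustrative assumptions, not the values used in tests/metrics_test.py.)

```python
# Sketch of a parametrized metrics test; sizes and tolerances are
# illustrative assumptions.
import pytest
import torch

from sbi.utils.metrics import c2st


@pytest.mark.parametrize("classifier", ("rf", "mlp"))
@pytest.mark.parametrize(
    "shift, expected",
    ((0.0, 0.5), (20.0, 1.0)),  # same dist -> chance level; far apart -> ~1.0
)
def test_c2st_on_shifted_gaussians(classifier, shift, expected):
    X = torch.randn(512, 2)
    Y = torch.randn(512, 2) + shift
    score = c2st(X, Y, classifier=classifier)
    assert abs(float(score) - expected) < 0.05
```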
@janfb (Contributor) commented Jan 28, 2022

Thanks for improving the tests! Great effort 🚀

@psteinb (Contributor, Author) commented Jan 28, 2022

Do you want me to squash those commits in this PR and rebase onto main? I am totally happy if you want to just squash-merge. Just let me know.

@janfb (Contributor) commented Jan 28, 2022

No, it's fine as it is if we squash-merge. Only if we were to rebase and merge would I prefer some rewriting of the commits. 👍

@janfb merged commit ca61658 into sbi-dev:main on Jan 28, 2022
@psteinb (Contributor, Author) commented Jan 28, 2022

Hooray! 🕺
