
Restructure explain() for iterative estimation with convergence detection, verbose arguments ++ #396

Merged: martinju merged 177 commits into shapr-1.0.0 on Oct 6, 2024

Conversation

@martinju (Member) commented on Jun 7, 2024

Very early draft. Lots of cleanup and reorganization remains, but the overall structure will probably be close to what we have here.

To be done in this PR (some items may be removed from this PR and handled in separate PRs):

  • Add iterative kernelSHAP estimation: sample some coalitions at a time and estimate the Shapley values, iterating until a convergence threshold is met (based on the estimated standard deviation). A schematic sketch of this loop is included after the note below.
    • for features
    • for groups
  • Add a bootstrapping approach for estimating the standard deviation
    • for features
    • for groups
  • Add paired sampling of coalitions
    • for features
    • for groups
  • Add reweighting of the sampled Shapley kernel weights
  • Restructure the non-iterative (classical) approach as a special case of the iterative approach, for code simplicity
  • Rewrite the batch computation to be performed within each iteration
  • Remove all traces of iterative arguments placed outside of the iter_list
  • Consider moving from n_batches to something like max_batch_size + min_n_batches as input arguments. The former controls the memory allocation (smaller means less memory consumption), while the latter ensures we get at least some progress updates while the computation is running. An internal function could then set the number of batches at each iteration to min(n_combinations, max(min_n_batches, ceiling(n_combinations/max_batch_size))) (a small sketch of such a helper follows this list).
  • Remove the stop() calls based on the number of features.
  • Add parallelization of the bootstrapping function
  • Check that the new code structure works with parallelization (via future.lapply)
  • Add a verbose argument: verbose = c("basic", "shapley", "vS_details"), with "basic" as the default, showing what is currently going on in the function, the filename of the tempfile, and which iteration we are at (plus, later, an estimate of the remaining computation time). NULL or "" should give no printout at all, "shapley" means printing intermediate Shapley value estimates, and "vS_details" means printing results while estimating the vS functions (where this is done in more than a single step).
  • Add save_intermediate = c("to_disk", "in_memory") to allow the user to save intermediate results while the estimation is running. One may choose to save everything to disk, or just what is needed to continue adding coalitions if we are not yet happy with the results.
  • Add tests for the iterative approach
  • Set new defaults in explain(): paired_sampling = TRUE, shapley_reweighting = "on_N" ++
  • Carefully update all test checking files in at least 3 steps: 1. ensure the classical Shapley estimates give the same answer (keeping the same order of setup_approach and shapley_setup), 2. update the Shapley values with the new ordering, 3. update the internal objects
  • Add new argument which is a list of iterative-specific arguments.
  • Update documentation
  • Update vignette
  • Some cleaning (fixing Annabelle's comments ++)
  • Add a few tests for verbose
  • Possibly change the locations of some of the internal parameters to make them easier to access (like the saving path)
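For the max_batch_size + min_n_batches item above, here is a minimal sketch of what such an internal helper could look like. The function name and signature are hypothetical and only illustrative; it simply encodes the formula from the checklist:

```r
# Hypothetical helper: choose the number of batches for an iteration so that no
# batch exceeds `max_batch_size` coalitions (memory), while still producing at
# least `min_n_batches` batches (progress updates), capped at `n_combinations`.
get_n_batches <- function(n_combinations, max_batch_size, min_n_batches) {
  min(n_combinations, max(min_n_batches, ceiling(n_combinations / max_batch_size)))
}

get_n_batches(n_combinations = 2000, max_batch_size = 100, min_n_batches = 10) # 20 batches
get_n_batches(n_combinations = 400,  max_batch_size = 100, min_n_batches = 10) # 10 batches
get_n_batches(n_combinations = 6,    max_batch_size = 100, min_n_batches = 10) # 6 batches (capped)
```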

Note: All non-exact methods fail now (also the Shapley value estimates) since shapley_setup is now called after setup_approach. All tests for Shapley values pass if these calls are put back in the original order (but we don't want that in the future).
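As a rough illustration of the iterative scheme in the first checklist item (sample a chunk of coalitions, estimate the Shapley values, bootstrap the standard deviation of the estimates, and stop once a convergence threshold is met), here is a schematic, self-contained R sketch. The estimator stub and all names are hypothetical stand-ins meant to show the control flow only, not the actual shapr internals:

```r
set.seed(1)

# Hypothetical stand-in for one chunk of kernelSHAP estimation: returns a matrix
# of per-coalition contributions (rows) for each feature (columns), based on
# `n_coal_per_iter` freshly sampled coalitions.
estimate_shapley_batch <- function(n_coal_per_iter, n_features) {
  matrix(rnorm(n_coal_per_iter * n_features, mean = 1:n_features, sd = 2),
         ncol = n_features, byrow = TRUE)
}

# Bootstrap the standard deviation of the running Shapley value estimates
# (the column means of the accumulated coalition-level contributions).
bootstrap_sd <- function(contrib, n_boot = 250) {
  n <- nrow(contrib)
  boot_means <- replicate(
    n_boot,
    colMeans(contrib[sample.int(n, replace = TRUE), , drop = FALSE])
  )
  apply(boot_means, 1, sd)
}

# Iterate: add a chunk of coalitions per iteration until the bootstrapped sd of
# every feature's estimate is below `sd_threshold`, or `max_iter` is reached.
iterative_estimate <- function(n_features = 4, n_coal_per_iter = 50,
                               sd_threshold = 0.1, max_iter = 100) {
  contrib <- NULL
  for (iter in seq_len(max_iter)) {
    contrib <- rbind(contrib, estimate_shapley_batch(n_coal_per_iter, n_features))
    est    <- colMeans(contrib)
    sd_est <- bootstrap_sd(contrib)
    message(sprintf("Iteration %d: n_coalitions = %d, max sd = %.3f",
                    iter, nrow(contrib), max(sd_est)))
    if (max(sd_est) < sd_threshold) break  # convergence threshold met
  }
  list(estimate = est, sd = sd_est, n_iterations = iter)
}

res <- iterative_estimate()
res$estimate
```

In the actual restructuring, the work inside each iteration is what gets split into batches (see the helper sketch above), and the bootstrap step is a candidate for parallelization, as listed in the checklist.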

@martinju (Member, Author) commented on Jul 4, 2024

Just some notes for myself on where to catch up after the holiday:

  1. Carefully check that the structure and content of the output from explain is as we want it to be at the current stage.
  2. Accept all tests with the new structure such that future edits can be checked against the tests
  3. Add some basic tests for the new convergence stuff.
  4. Consider updating all tests with the new defaults (paired sampling and reweighting)
  5. THEN go ahead and add the other features: parallelization of the bootstrapping, verbose argument, disk saving, the same functionality for groups, etc.

Depending on how things go when I get back to this in August, @LHBO might take a look at the code structure some time after point 3 is done.

@martinju (Member, Author) commented on Oct 4, 2024

Hi @aredelmeier @LHBO @jonlachmann

This is closing in on a merge. Just a few things missing now, I think. I hope to be able to merge this some time this weekend.

Here is what remains:

  • Update documentation
  • Update vignette
  • Some cleaning (fixing Annabelle's comments ++)
  • Add a few tests for verbose
  • Possibly change the locations of some of the internal parameters to make them easier to access (like the saving path)

@jonlachmann You may want to merge this into your forecast fixing branch.

@LHBO If you want to, I actually think you can safely start on the asymmetric stuff from the current stage. What remains will not change much of the code.

@martinju changed the base branch from master to shapr_1.0.0 on October 6, 2024 at 07:24
@martinju changed the base branch from shapr_1.0.0 to shapr-1.0.0 on October 6, 2024 at 13:23
@martinju marked this pull request as ready for review on October 6, 2024 at 13:23
@martinju changed the title from "Restructure explain() for iterative estimation with convergence detection ++" to "Restructure explain() for iterative estimation with convergence detection, verbose arguments ++" on Oct 6, 2024
@martinju merged commit 692be6b into shapr-1.0.0 on Oct 6, 2024
@LHBO mentioned this pull request on Oct 8, 2024
LHBO pushed a commit to LHBO/shapr that referenced this pull request Oct 13, 2024
…tion, verbose arguments ++ (NorskRegnesentral#396)

Dealing with the leftovers in separate PRs into NorskRegnesentral#402.