New features for estimating observed and predicted distributions (other than KDE) #400
Hey @yentyu. Thanks for proposing this. I totally agree that the change you are proposing is the best way to do things. @eecsu and I discussed this back in the summer. I hardcoded the method of using KDEs on the full space to calculate the ratios because that was the main way we were doing it in our work. I agree that it should instead be done using …
Step 1: I totally agree that we should have a proper KDE representation to go along with that.

Step 2: …

Step 3: …
Seeing your comment, I realize this is actually a bug that I did not catch while I was modifying the way the …
@eecsu @smattis
We would like to add different options for how the observed and predicted distributions are estimated (e.g., with a Bayesian GMM) when calculating R in the data-consistent framework. To do this in a way that allows for future development, I've come up with an outline of how these features can be added to the code smoothly and consistently. @eecsu and I just wanted to start a discussion to make sure the general approach seems reasonable and that we aren't overlooking anything that might break things or cause big problems with the package.
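As a rough illustration of why the estimator choice matters: R is the observed pdf divided by the predicted pdf, evaluated at the predicted QoI samples, so whichever method estimates the predicted pdf feeds directly into R. The sketch below assumes a made-up linear QoI map and an analytically known Gaussian observed density, and the helpers `estimate_pdf` and `r_values` are hypothetical (not BET functions). It contrasts `scipy.stats.gaussian_kde` on the full output space with `sklearn.mixture.BayesianGaussianMixture`.

```python
import numpy as np
from scipy.stats import gaussian_kde, multivariate_normal
from sklearn.mixture import BayesianGaussianMixture


def estimate_pdf(samples, method="kde", n_components=5, seed=0):
    """Hypothetical helper: return a callable pdf fitted to samples of shape (n, dim)."""
    if method == "kde":
        # Joint KDE on the full output space; gaussian_kde expects shape (dim, n)
        kde = gaussian_kde(samples.T)
        return lambda x: kde(x.T)
    if method == "gmm":
        gmm = BayesianGaussianMixture(n_components=n_components, random_state=seed)
        gmm.fit(samples)
        # score_samples returns the log-density, so exponentiate it
        return lambda x: np.exp(gmm.score_samples(x))
    raise ValueError(f"unknown method {method!r}")


def r_values(pred_samples, obs_pdf, method="kde"):
    """Hypothetical helper: R = observed pdf / estimated predicted pdf at the predicted samples."""
    pred_pdf = estimate_pdf(pred_samples, method=method)
    return obs_pdf(pred_samples) / pred_pdf(pred_samples)


# Toy example (assumed): uniform initial samples pushed through a made-up linear QoI map
rng = np.random.default_rng(0)
lam = rng.uniform(-1.0, 1.0, size=(2000, 2))
q = np.column_stack([lam[:, 0] + lam[:, 1], lam[:, 0] - lam[:, 1]])
obs = multivariate_normal(mean=[0.0, 0.0], cov=0.1 * np.eye(2))

r_kde = r_values(q, obs.pdf, method="kde")
r_gmm = r_values(q, obs.pdf, method="gmm")
```

When the observed density is supported inside the predicted one, the sample mean of R over the predicted samples should be close to 1, which gives a cheap sanity check when comparing estimators.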
Here's the general idea. Currently, the `discretization` object points to 3 different `sample_set_base` objects: an input (or initial), an output (or predicted), and an observed. We want to use these objects to save the estimated pdfs for the observed and predicted distributions to their respective `sample_set_base` objects using `_prob_type` and `_prob_parameters`, and then call the base object's `evaluate_pdf` method when computing R. This has some nice benefits:

- … `_prob_type` to the base object

Here is the specific outline of the changes this would involve:
Step 1: Update `sample_set_base` object

Update the `evaluate_pdf` function for KDEs so that it can evaluate multidimensional KDE pdfs (instead of relying on the evaluation of a set of marginals). `scipy.stats.gaussian_kde` objects will need to be saved as the `_prob_parameters` of the base object, with `_prob_type` set to "kde". For mixture modeling (Bayesian or otherwise), we just need to save the output parameters from `sklearn.mixture`, i.e. the means, covariances, and weights (in this order), as the `_prob_parameters`, with `_prob_type` set to "gmm".

Step 2: Create `generate_densities` function

(new) Create a function that takes the `discretization` object as input, with options for computing the observed or predicted distribution (or both) as well as the TYPE of pdf. Options will include: …

Options should also allow passing arguments related to utilizing these different distribution estimates.

The function should return the requested kdes and clusters (see the current implementation of `generate_output_kdes`).
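One possible shape for `generate_densities`, sketched against hypothetical stand-in classes (`SampleSet`, and a `Discretization` with `predicted`/`observed` attributes) rather than BET's actual `sample_set_base` and `discretization`. The `which` and `pdf_type` parameters and the `**fit_kwargs` pass-through are assumptions about how the options could look. For simplicity it returns a dict of fitted estimators instead of mirroring the kdes/clusters output mentioned above, and it saves `_prob_type`/`_prob_parameters` using the convention proposed in Step 1.

```python
import numpy as np
from scipy.stats import gaussian_kde
from sklearn.mixture import BayesianGaussianMixture


class SampleSet:
    """Hypothetical stand-in for sample_set_base: sample values plus a saved pdf."""

    def __init__(self, values):
        self._values = np.asarray(values, dtype=float)  # shape (n, dim)
        self._prob_type = None
        self._prob_parameters = None


class Discretization:
    """Hypothetical stand-in holding the predicted and observed sample sets."""

    def __init__(self, predicted, observed):
        self.predicted = predicted
        self.observed = observed


def generate_densities(disc, which=("predicted", "observed"), pdf_type="kde", **fit_kwargs):
    """Hypothetical: fit the requested densities, save them on the sample sets, return them."""
    fitted = {}
    for name in which:
        sset = getattr(disc, name)
        if pdf_type == "kde":
            # Joint KDE on the full space; extra kwargs go to gaussian_kde (e.g. bw_method)
            kde = gaussian_kde(sset._values.T, **fit_kwargs)
            sset._prob_type, sset._prob_parameters = "kde", kde
            fitted[name] = kde
        elif pdf_type == "gmm":
            # Extra kwargs go to BayesianGaussianMixture (e.g. n_components)
            gmm = BayesianGaussianMixture(**fit_kwargs).fit(sset._values)
            sset._prob_type = "gmm"
            # Saved in the order proposed in Step 1: means, covariances, weights
            sset._prob_parameters = (gmm.means_, gmm.covariances_, gmm.weights_)
            fitted[name] = gmm
        else:
            raise ValueError(f"unsupported pdf_type {pdf_type!r}")
    return fitted


# Usage: a GMM for the predicted set, a KDE for the observed set
rng = np.random.default_rng(1)
disc = Discretization(SampleSet(rng.normal(size=(500, 2))),
                      SampleSet(rng.normal(0.5, 0.2, size=(300, 2))))
fitted = generate_densities(disc, which=("predicted",), pdf_type="gmm",
                            n_components=3, random_state=0)
generate_densities(disc, which=("observed",), pdf_type="kde")
```

Keeping the estimator-specific keyword arguments in a single pass-through avoids growing the function signature every time a new density type is added.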
The function should also SAVE the parameters to the appropriate `sample_set_base` objects associated with the `discretization` object. For instance, if generating the predicted distribution, the function should use the following to save the pdf: …

Step 3: Change `invert_...` methods

(change) Most current inversion methods call `generate_output_kdes` and then call `invert`. But `invert` ALSO calls `generate_output_kdes`, which seems like unnecessary computation. We should remove the call to `generate_output_kdes` everywhere except in the `invert` function.

(new) Change the `invert` method to call the new `generate_densities` function defined above to generate the output distributions. … `sample_set_base` object of the discretization.

The `invert` method should use the saved observed and predicted pdfs on the `sample_set_base` objects associated with the `discretization` to compute the r-values.
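Taken together, Steps 1 and 3 amount to an `evaluate_pdf` that dispatches on `_prob_type`, plus an `invert` whose core just divides the two saved pdfs. The sketch below is a minimal, hypothetical version of that: `SampleSet` and `r_values` are stand-ins, not BET's classes. The "kde" branch evaluates a saved `scipy.stats.gaussian_kde` jointly on the full space, and the "gmm" branch evaluates the mixture density from the (means, covariances, weights) tuple proposed in Step 1.

```python
import numpy as np
from scipy.stats import gaussian_kde, multivariate_normal


class SampleSet:
    """Hypothetical stand-in for sample_set_base with a type-dispatched evaluate_pdf."""

    def __init__(self, values, prob_type=None, prob_parameters=None):
        self._values = np.asarray(values, dtype=float)  # shape (n, dim)
        self._prob_type = prob_type
        self._prob_parameters = prob_parameters

    def evaluate_pdf(self, x):
        """Evaluate the saved pdf at points x of shape (num_points, dim)."""
        x = np.asarray(x, dtype=float)
        if self._prob_type == "kde":
            # Saved gaussian_kde: one joint multidimensional KDE, not a product of marginals
            return self._prob_parameters(x.T)
        if self._prob_type == "gmm":
            means, covs, weights = self._prob_parameters
            # Mixture density: sum_k w_k * N(x; mean_k, cov_k)
            dens = sum(w * multivariate_normal(mean=m, cov=c).pdf(x)
                       for m, c, w in zip(means, covs, weights))
            return np.atleast_1d(dens)
        raise ValueError(f"no pdf saved (prob_type={self._prob_type!r})")


def r_values(predicted, observed):
    """Hypothetical invert core: observed pdf / predicted pdf at the predicted samples."""
    q = predicted._values
    return observed.evaluate_pdf(q) / predicted.evaluate_pdf(q)


# Usage with exact single-Gaussian "gmm" parameters so the result can be checked by hand
pred = SampleSet([[0.0, 0.0], [1.0, 0.0]], "gmm", ([[0.0, 0.0]], [np.eye(2)], [1.0]))
obs = SampleSet(np.empty((0, 2)), "gmm", ([[0.0, 0.0]], [0.25 * np.eye(2)], [1.0]))
r = r_values(pred, obs)  # r ≈ [4.0, 0.893], i.e. [4, 4 * exp(-1.5)]

# A saved KDE is evaluated the same way through evaluate_pdf
samples = np.random.default_rng(2).normal(size=(200, 2))
kde_set = SampleSet(samples, "kde", gaussian_kde(samples.T))
kde_vals = kde_set.evaluate_pdf(samples[:5])
```

Because `invert` only ever calls `evaluate_pdf`, supporting another density type would, under this design, only touch the dispatch inside `evaluate_pdf` and the fitting code in `generate_densities`.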