DM-45042: ComputeExposureSummaryStats needs to be able to turn off "update" code #977
I think you need to strip this down to only have "doX" configs for the slowest components. The rest of the basic values we should always try to calculate. There's no point in adding "doX" settings for things that we'll always want to compute.
doUpdatePsfStats = pexConfig.Field(
    dtype=bool,
    default=True,
    doc="Do update the psf statistics. Note that the PSF and apCorr model fidelity "
For all of the "doX", don't repeat the "do" in the docstring. For example:
- doc="Do update the psf statistics. Note that the PSF and apCorr model fidelity "
+ doc="Update the psf statistics? Note that the PSF and apCorr model fidelity "
    "delta metrics are tied up in the same function as the more basic PSF metrics (e.g. "
    "psfSigma and size/shape residuals). If the basic ones are still desired, but the "
    "computation-intesive delta metrics are not, leave this as true and set "
    "config.psfGridSampling to None. Set to False if speed is of the essence.",
Are the psfSigma metrics slow at all? I think we only need to turn off the calculations that really matter. psfSigma seems like such a basic value that it should always be calculated if possible. Otherwise, what else is even computed here?
Agreed. I removed that as an option and added a note in the comments about which parameters are always (i.e. non-optionally) run when this task is called. I also changed the grid-based ones to doUpdateX-style config options (which I do think is more obvious than setting the sampling sizes to None). The ap_pipe commit was updated to reflect this.
self.log.info("Note: not computing grid-based maxDistToNearestPsf model fidelity. Setting "
              "it to NaN.")
No need to say "setting it to NaN", that's the default value of all non-computed things.
self.log.info("Note: not computing grid-based PSF & ApCorr model fidelity metrics. Setting "
              "psfTraceRadiusDelta, psfApFluxDelta, & psfApCorrSigmaScaledDelta to NaN.")
else:
    if image_mask is not None:
Why is image_mask optional? I can't think of any case where we'd not have access to the mask, and certainly the three places that we call this method, we pass a mask.
The mask being an optional input is pre-existing from over 2 years ago (e654c55), so I'm not inclined to question or change it (making it non-optional would change the API, and since it can be None, we need to check for it). I added an info-level log message for the case where these metrics were not turned off but no mask was provided. If you're really bothered by it as is, please feel free to pursue a deprecation!
Please ask about that on Slack and file a ticket if there's no known reason: as I said, all three places in the stack that currently call this pass mask, and it doesn't make sense to me for it to be optional.
I'm gonna push back on this again. It's pre-existing and unrelated to this ticket. I am not bothered by it, so would not feel inclined to weigh in on, let alone initiate, a discussion about it. If you feel strongly about it, proceed as you will!
)
summary.maxDistToNearestPsf = float(maxDistToNearestPsf)
if self.config.psfSampling is None:
    self.log.info("Note: not computing grid-based maxDistToNearestPsf model fidelity. Setting "
I'm not sure I understand what "model fidelity" means here in context?
See above. I will make sure to fix both if we decide to change the descriptive.
summary.psfApCorrSigmaScaledDelta = float(psfApCorrSigmaScaledDelta)
if self.config.psfGridSampling is None:
    self.log.info("Note: not computing grid-based PSF & ApCorr model fidelity metrics. Setting "
                  "psfTraceRadiusDelta, psfApFluxDelta, & psfApCorrSigmaScaledDelta to NaN.")
No need to say you're setting them to NaN, that's the default for uncomputed values.
)
summary.psfApCorrSigmaScaledDelta = float(psfApCorrSigmaScaledDelta)
if self.config.psfGridSampling is None:
    self.log.info("Note: not computing grid-based PSF & ApCorr model fidelity metrics. Setting "
Not sure what "model fidelity" means here?
Yeah... I struggled with the best descriptive to use here. I'm looking for a concise way to describe the robustness/reliability/(I landed on) fidelity of the PSF/apCorr model. This is largely in the context of potentially dangerous extrapolations that seem to be allowed in some of the modeling algorithms and that reveal themselves as a larger-than-realistic variation of fit parameters across a single detector. Do you have a suggestion/preference?
Why not be explicit about it? "PSF & ApCorr model minimum/maximum range metrics"? From the docstrings on those functions, that seems like what they are, no?
doUpdateMaxDistToNearestPsfStats = pexConfig.Field(
    dtype=bool,
    default=True,
    doc="Update the grid-based maximun distance to the nearest PSF star fidelity statistic "
- doc="Update the grid-based maximun distance to the nearest PSF star fidelity statistic "
+ doc="Update the grid-based maximum distance to the nearest PSF star fidelity statistic "
doUpdateWcsStats = pexConfig.Field(
    dtype=bool,
    default=True,
    doc="Update the wcs statistics? Set to False if speed is of the essence.",
)
doUpdatePhotoCalibStats = pexConfig.Field(
    dtype=bool,
    default=True,
    doc="Update the photoCalib statistics? Set to False if speed is of the essence.",
)
doUpdateBackgroundStats = pexConfig.Field(
    dtype=bool,
    default=True,
    doc="Update the background statistics? Set to False if speed is of the essence.",
)
doUpdateMaskedImageStats = pexConfig.Field(
    dtype=bool,
    default=True,
    doc="Update the masked image (i.e. skyNoise & meanVar) statistics? Set to False "
        "if speed is of the essence.",
)
doUpdateEffectiveTimeStats = pexConfig.Field(
    dtype=bool,
    default=True,
    doc="Update the effective time statistics? Set to False if speed is of the essence.",
)
I think you should remove all of these except the grid-based do flags: the statistics computed on a grid are the ones that take a long time, but the rest should all be fast enough that we'd never want to disable them, I'd think.
Huh, just want to double check that you really don't want control over any but the grid-based ones. When we discussed this (way back when!), I was left with the strong feeling that you wanted control over almost all of them for AP (e.g. in the ticket description you mention update_wcs_stats and update_masked_image_deltas taking >1s per image as being a problem). Even if AP decides to keep many of them at present, does it hurt to have the option already available in case opinions on that change (for AP or another caller of this task)?
If you do want these removed, just for the record, can you specify on the ticket that you no longer consider any but the grid-based computations an issue?
I've waffled back and forth on this a bit. The reason to not have doX configs for things that there would be no real reason to turn off is that it makes the code and config listings more complicated for no real benefit. I'll go add a comment on the ticket about this.
I've been doing some profiling but wanted to run on some OR4 data (which is more realistic than DC2) to get a better sense of the slow points.
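For readers who want to repeat this kind of check, one way to find the slow points is the standard-library profiler. The following is only a rough sketch: fast_stat, slow_stat, and compute_all are hypothetical stand-ins, not functions from this PR, and the sleep simply simulates an expensive grid-based calculation.

```python
import cProfile
import io
import pstats
import time


def fast_stat():
    """Hypothetical cheap statistic."""
    return sum(range(100))


def slow_stat():
    """Hypothetical expensive statistic; the sleep simulates real work."""
    time.sleep(0.05)
    return 0.0


def compute_all():
    """Stand-in for a task that computes every statistic."""
    fast_stat()
    slow_stat()


def profile_report(func, limit=5):
    """Run func under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(limit)
    return stream.getvalue()


report = profile_report(compute_all)
print(report)
```

Sorting by cumulative time puts the functions that dominate the run at the top of the listing, which is the information needed to decide which computations deserve a doX flag.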
Ok, thanks. I’ll hang tight!
self.assertTrue(summary.ra, nan)
self.assertTrue(summary.dec, nan)
assertTrue? I don't think this is doing what you think it is:
https://docs.python.org/3/library/unittest.html#unittest.TestCase.assertTrue
Did you mean self.assertTrue(np.isnan(summary.ra)), and similarly for all the tests below? Note that you cannot test equality for NaN, because it's not equal to itself.
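To make the two pitfalls concrete, here is a small self-contained sketch. It uses math.isnan from the standard library in place of np.isnan so that it runs without NumPy, and the variable value is a hypothetical stand-in for a field such as summary.ra.

```python
import math
import unittest


class NanAssertionExample(unittest.TestCase):
    def test_pitfalls_and_fix(self):
        value = float("nan")  # stand-in for an uncomputed summary field

        # Pitfall 1: the second argument of assertTrue is only the failure
        # message, so this passes for any truthy first argument.
        self.assertTrue(1.0, math.nan)
        # NaN itself is truthy, so this passes without checking anything useful.
        self.assertTrue(value, math.nan)

        # Pitfall 2: NaN never compares equal to itself, so an equality
        # assertion against NaN cannot succeed.
        self.assertNotEqual(value, value)

        # The check that actually verifies the value is NaN.
        self.assertTrue(math.isnan(value))
```

The first assertion passes even though 1.0 is not NaN, which is why the original test lines could not catch a regression.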
Thanks for catching that!
Computation of these statistics can be time consuming, so this adds an option to turn any of them off for scenarios where they are not useful (which they would be for, e.g., downstream processing decision making) and/or processing time minimization is of the essence.
Finally following up on this after some ComCam profiling: I think the only things we need to put in doX blocks are maximum_nearest_psf_distance and compute_psf_image_deltas inside update_psf_stats, and update_masked_image_stats. Those take a second or so each. The remaining items don't even show up in a view of the profile, they're so quick.
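The pattern this thread converges on (gate only the slow computations behind doX flags, always run the cheap ones, and leave skipped values at their NaN default) might look roughly like the following. This is a minimal sketch, assuming plain dataclasses as stand-ins for pexConfig and the summary object; the flag names, placeholder values, and slow_grid_metric helper are hypothetical and are not the ones in the PR.

```python
import math
from dataclasses import dataclass


@dataclass
class SummaryConfig:
    # Hypothetical flags: one per slow, grid-based computation.
    doUpdateMaxDistToNearestPsf: bool = True
    doUpdatePsfGridStats: bool = True


@dataclass
class Summary:
    # Every field defaults to NaN, so anything left uncomputed stays NaN.
    psfSigma: float = math.nan
    maxDistToNearestPsf: float = math.nan
    psfTraceRadiusDelta: float = math.nan


def slow_grid_metric() -> float:
    """Placeholder for an expensive grid-based calculation."""
    return 1.5


def update_psf_stats(summary: Summary, config: SummaryConfig) -> Summary:
    # Cheap statistic: always computed, no flag (placeholder value).
    summary.psfSigma = 2.0
    # Slow statistics: computed only when the matching flag is set.
    if config.doUpdateMaxDistToNearestPsf:
        summary.maxDistToNearestPsf = slow_grid_metric()
    if config.doUpdatePsfGridStats:
        summary.psfTraceRadiusDelta = slow_grid_metric()
    return summary


fast = update_psf_stats(
    Summary(),
    SummaryConfig(doUpdateMaxDistToNearestPsf=False, doUpdatePsfGridStats=False),
)
full = update_psf_stats(Summary(), SummaryConfig())
```

Because the NaN default already marks a value as uncomputed, the skipped branches need no explicit assignment and no "setting to NaN" log message.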