Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-45042: ComputeExposureSummaryStats needs to be able to turn off "update" code #977

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

laurenam
Copy link
Contributor

No description provided.

Copy link
Contributor

@parejkoj parejkoj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need to strip this down to only have "doX" configs for the slowest components. The rest of the basic values we should always try to calculate. There's no point in adding "doX" settings for things that we'll always want to compute.

doUpdatePsfStats = pexConfig.Field(
dtype=bool,
default=True,
doc="Do update the psf statistics. Note that the PSF and apCorr model fidelity "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For all of the "doX", don't repeat the "do" in the docstring. For example:

Suggested change
doc="Do update the psf statistics. Note that the PSF and apCorr model fidelity "
doc="Update the psf statistics? Note that the PSF and apCorr model fidelity "

Comment on lines 47 to 50
"delta metrics are tied up in the same function as the more basic PSF metrics (e.g. "
"psfSigma and size/shape residuals). If the basic ones are still desired, but the "
"computation-intesive delta metrics are not, leave this as true and set "
"config.psfGridSampling to None. Set to False if speed is of the essence.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the psfSigma metrics slow at all? I think we only need to turn off the calculations that really matter. psfSigma seems like such a basic value that it should always be calculated if possible. Otherwise, what else is even computed here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I removed that as an option and added a note in the comments which parameters are always (i.e. non-optionally) run when this task is called. I also changed the grid-based ones to doUpdateX-style config options (which I do think is more obvious than setting the sampling sizes to None. The ap_pipe commit was updated to reflect this.)

Comment on lines 472 to 473
self.log.info("Note: not computing grid-based maxDistToNearestPsf model fidelity. Setting "
"it to NaN.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to say "setting it to NaN", that's the default value of all non-computed things.

self.log.info("Note: not computing grid-based PSF & ApCorr model fidelity metrics. Setting "
"psfTraceRadiusDelta, psfApFluxDelta, & psfApCorrSigmaScaledDelta to NaN.")
else:
if image_mask is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is image_mask optional? I can't think of any case where we'd not have access to the mask, and certainly the three places that we call this method, we pass a mask.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mask being an optional input is pre-existing from over 2 years ago (e654c55), so I'm not inclined to question or change it (making it non-optional would change the API, and since it can be None, we need to check for it). I added a log info to note if these metrics were not asked to be turned off but no mask was provided. If you're really bothered by it as is, please feel free to pursue a deprecation!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please ask about that on slack and file a ticket if there's no known reason: as I said, all three places in the stack that currently call this pass mask, and it doesn't make sense for it to be optional to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm gonna push back on this again. It's pre-existing and unrelated to this ticket. I am not bothered by it, so would not feel inclined to weigh in on, let alone initiate, a discussion about it. If you feel strongly about it, proceed as you will!

)
summary.maxDistToNearestPsf = float(maxDistToNearestPsf)
if self.config.psfSampling is None:
self.log.info("Note: not computing grid-based maxDistToNearestPsf model fidelity. Setting "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand what "model fidelity" means here in context?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above. I will make sure to fix both if we decide to change the descriptive.

summary.psfApCorrSigmaScaledDelta = float(psfApCorrSigmaScaledDelta)
if self.config.psfGridSampling is None:
self.log.info("Note: not computing grid-based PSF & ApCorr model fidelity metrics. Setting "
"psfTraceRadiusDelta, psfApFluxDelta, & psfApCorrSigmaScaledDelta to NaN.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to say you're setting them to NaN, that's the default for uncomputed values.

)
summary.psfApCorrSigmaScaledDelta = float(psfApCorrSigmaScaledDelta)
if self.config.psfGridSampling is None:
self.log.info("Note: not computing grid-based PSF & ApCorr model fidelity metrics. Setting "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what "model fidelity" means here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah...I struggled with the best descriptive to use here. I'm looking for a concise way to describe the robustness/reliability/(I landed on)fidelity of the PSF/apCorr model (largely in the context of potentially dangerous extrapolations that seem to be allowed in some of the modeling algorithms and reveals itself as a larger-than-realistic variation of fit parameters across a single detector). Do you have a suggestion/preference?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not be explicit about it? "PSF & ApCorr model minimum/maximum range metrics"? From the docstrings on those functions, that seems like what they are, no?

@laurenam laurenam force-pushed the tickets/DM-45042 branch 2 times, most recently from cf48856 to 53829ff Compare September 23, 2024 00:41
doUpdateMaxDistToNearestPsfStats = pexConfig.Field(
dtype=bool,
default=True,
doc="Update the grid-based maximun distance to the nearest PSF star fidelity statistic "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
doc="Update the grid-based maximun distance to the nearest PSF star fidelity statistic "
doc="Update the grid-based maximum distance to the nearest PSF star fidelity statistic "

Comment on lines 61 to 92
doUpdateWcsStats = pexConfig.Field(
dtype=bool,
default=True,
doc="Update the wcs statistics? Set to False if speed is of the essence.",
)
doUpdatePhotoCalibStats = pexConfig.Field(
dtype=bool,
default=True,
doc="Update the photoCalib statistics? Set to False if speed is of the essence.",
)
doUpdateBackgroundStats = pexConfig.Field(
dtype=bool,
default=True,
doc="Update the background statistics? Set to False if speed is of the essence.",
)
doUpdateMaskedImageStats = pexConfig.Field(
dtype=bool,
default=True,
doc="Update the masked image (i.e. skyNoise & meanVar) statistics? Set to False "
"if speed is of the essence.",
)
doUpdateEffectiveTimeStats = pexConfig.Field(
dtype=bool,
default=True,
doc="Update the effective time statistics? Set to False if speed is of the essence.",
)
Copy link
Contributor

@parejkoj parejkoj Sep 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should remove all of these except the grid-based do flags: the statistics computed on a grid are the ones that take a long time, but the rest should all be fast enough that we'd never want to disable them, I'd think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, just want to double check that you really don't want control over any but the grid-based ones. When we discussed this (way back when!), I was left with the strong feeling that you wanted control over almost all of them for AP (e.g. in the ticket description you mention update_wcs_stats and update_masked_image_deltas taking >1s per image as being a problem). Does even if AP decides to keep many of them at present, does it hurt to have the option already available in case opinions on that change (for AP or other caller of this task)?

If you do want these removed, just for the record, can you specify on the ticket that you no longer consider any but the grid-based computations an issue?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've waffled back and forth on this a bit. The reason to not have doX configs for things that there would be no real reason to turn off is that it makes the code and config listings more complicated for no real benefit. I'll go add a comment on the ticket about this.

I've been doing some profiling but wanted to run on some OR4 data (which is more realistic than DC2) to get a better sense of the slow points.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks. I’ll hang tight!

)
summary.psfApCorrSigmaScaledDelta = float(psfApCorrSigmaScaledDelta)
if self.config.psfGridSampling is None:
self.log.info("Note: not computing grid-based PSF & ApCorr model fidelity metrics. Setting "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not be explicit about it? "PSF & ApCorr model minimum/maximum range metrics"? From the docstrings on those functions, that seems like what they are, no?

Comment on lines 124 to 125
self.assertTrue(summary.ra, nan)
self.assertTrue(summary.dec, nan)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assertTrue? I don't think this is doing what you think it is:

https://docs.python.org/3/library/unittest.html#unittest.TestCase.assertTrue

Did you mean self.assertTrue(np.isnan(summary.ra)), and similarly for all the tests below? Note that you cannot test equality for NaN, because it's not equal to itself.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching that!

Computation of these statistics can be time consuming, so this adds
an option to turn any of them off for scenarios where they are not
useful (which they would be for, e.g., downstream processing decision
making) and/or processing time minimization is of the essence.
Copy link
Contributor

@parejkoj parejkoj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally following up on this after some ComCam profiling: I think the only things we need to put in doX blocks are maximum_nearest_psf_distance and compute_psf_image_deltas inside update_psf_stats, and update_masked_image_stats. Those take a second or so each. The remaining items don't even show up in a view of the profile, they're so quick.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants