Use essreduce loaders #114
Conversation
@nvaytet Can you target the PR on the namespace pkg branch? Then the diff only shows the relevant changes. Merging that branch will retarget the PR automatically onto main.
@jl-wynen I forgot to remove the old loaders...
src/ess/loki/io.py
Outdated
# Note here we specify the name instead of using
# ess.reduce.nexus.extract_detector_data because we need the name to put the
# events back into the original data group.
key = f'{detector_name}_events'
events = _preprocess_data(
    out[key],
    sample_position=raw_sample['position'],
    source_position=raw_source['position'],
)
if detector_name in DETECTOR_BANK_RESHAPING:
    events = DETECTOR_BANK_RESHAPING[detector_name](events)
out[key] = events
Can you explain your comment, and why we need this? Can't these transformations be applied after extracting the detector data?
So you're saying we could insert an additional step after the RawData[Sample] has been extracted, where we would reshape and patch (add necessary coordinates, etc.) the data, and output something called PreparedData that would be just before the MaskedData in the chain?
Yes.
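For illustration, here is a rough sketch of what such a PreparedData provider could look like. The function name, the DataGroup layout and the coordinate names are assumptions made for this sketch, not code from the PR; DETECTOR_BANK_RESHAPING stands for the existing mapping shown in the diff above.

import scipp as sc

DETECTOR_BANK_RESHAPING = {}  # placeholder for the existing mapping in ess.loki.io

def prepare_detector_data(raw: sc.DataGroup, detector_name: str) -> sc.DataArray:
    # Hypothetical 'PreparedData' step: runs after the raw detector data has
    # been extracted, reshapes the bank if needed, and attaches coordinates
    # required later in the chain, just before the masking step.
    events = raw[f'{detector_name}_events'].copy()
    if detector_name in DETECTOR_BANK_RESHAPING:
        events = DETECTOR_BANK_RESHAPING[detector_name](events)
    events.coords['sample_position'] = raw['sample_position']
    events.coords['source_position'] = raw['source_position']
    return events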
src/ess/loki/io.py
Outdated
out = nexus.load_monitor(file_path=file_path, monitor_name=monitor_name)
key = f'{monitor_name}_events'
out[key] = _preprocess_data(out[key], source_position=raw_source['position'])
Similar question to the one above: why do we need to preprocess before extracting the monitor data? This makes it brittle, as we still hard-code a naming convention.
ESSreduce has been released
74b5d6c to 38fcbac
…before the normalization
docs/user-guide/isis/zoom.ipynb
Outdated
"providers = providers + (\n", | ||
" isis.data.transmission_from_background_run,\n", | ||
" isis.data.transmission_from_sample_run,\n", | ||
")\n", | ||
"pipeline = sciline.Pipeline(providers, params=params)\n", | ||
"pipeline.set_param_table(masks)" | ||
"# pipeline.set_param_table(masks)\n", |
Commented code?
requirements/base.in
Outdated
 plopp
 pythreejs
-sciline>=24.2.0
+sciline>=23.9.1
Why the version regression?
I don't know, @jl-wynen?
Ok, I fixed it in the pyproject.toml.
src/ess/isissans/components.py
Outdated
@@ -33,7 +28,7 @@ def apply_component_user_offsets_to_raw_data(
     data: RawData[ScatteringRunType],
     sample_offset: Optional[SampleOffset],
     detector_bank_offset: Optional[DetectorBankOffset],
-) -> RawDataWithComponentUserOffsets[ScatteringRunType]:
+) -> PatchedData[ScatteringRunType]:
In #59 (comment) you argued that the naming I chose, ConfiguredRawData, was unclear. I would argue that PatchedData is considerably less clear than that, since "patching" has multiple meanings. In particular, we are not patching sections of data together. How about going back to something like ConfiguredData after all?
So we are adding missing coordinates, variances on the events, and possibly adding offsets to some of the coordinates (in the case of ISIS data).
Either we find a name that covers all 3 well enough (I don't really get the above from ConfiguredData, but PatchedData is indeed quite general), or we should make them into separate steps?
I don't like making separate steps, because it means that all of these steps could be optional, and we'd have to have e.g. a dummy step for applying position offsets in loki, just because it is needed in isis.
I basically wanted a name that would mean "ready for the rest of the reduction workflow", but was never able to find a good one :-(
How about just dropping the Raw prefix? This function would turn RawData into Data. But I still don't see what is so wrong with configured-data.
I tried ConfiguredAndCompleteData but I still don't really like it. Now I'm thinking of ConfiguredReducibleData to hint that it has the necessary pieces to proceed with the rest of the reduction.
But I still don't see what is so wrong with configured-data.
I think Configured captures the notion that some configurations were applied by the user, e.g. position offsets, but not so much the fetching of the sample/source positions so that we can later perform the conversion to wavelength. That's why I thought of us "patching" different parts of the NeXus file together. But I agree that PatchedData is just as vague.
The reason I thought we could be more specific in your PR was that it described only the step where user offsets were applied to the data. Now that different things can happen in this step, we do need to go back to something more general.
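For reference, a minimal sketch of how such a generic domain type could be declared with sciline (assuming the sciline.Scope pattern used elsewhere in these packages; the name follows the ConfiguredReducibleData suggestion above and is not final):

from typing import TypeVar
import sciline
import scipp as sc

ScatteringRunType = TypeVar('ScatteringRunType')

class ConfiguredReducibleData(sciline.Scope[ScatteringRunType, sc.DataArray], sc.DataArray):
    """Raw data with missing coordinates filled in, variances added to the
    events, and any user-configured component offsets applied; ready for the
    rest of the reduction (masking, conversion to wavelength, ...)."""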
src/ess/sans/i_of_q.py
Outdated
return FinalSummedQ[ScatteringRunType, IofQPart](
    reduce(_merge_events_or_histograms, data.values())
)
Consider using https://scipp.github.io/generated/functions/scipp.reduce.html.
I tried something like
reducer = sc.reduce(data.values())
out = reducer.bins.concat() if reducer.bins is not None else reducer.sum()
but reducer.bins is never None, even if the elements inside data.values() are dense data. So I went back to functools.reduce.
Why can't you use it, guarded by a check like if data.values()[0].bins is None? Or all(x.bins is None for x in data.values())?
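A rough sketch of that suggestion (merge_runs is a hypothetical helper written here for illustration; _merge_events_or_histograms stands for the existing pairwise merger from the diff above, and the dense branch assumes histograms can simply be summed with sc.reduce(...).sum()):

from functools import reduce
import scipp as sc

def _merge_events_or_histograms(a: sc.DataArray, b: sc.DataArray) -> sc.DataArray:
    ...  # existing pairwise merger in i_of_q.py

def merge_runs(data_arrays: list) -> sc.DataArray:
    if all(x.bins is None for x in data_arrays):
        # Dense histograms: sum all runs in a single call.
        return sc.reduce(data_arrays).sum()
    # Event (binned) data: fall back to merging pairwise.
    return reduce(_merge_events_or_histograms, data_arrays)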
To be clear: Doing it in the way you chose is extremely inefficient if you have many files (please give it a try!). You are basically adding an
Ok, I thought about getting the first element in data.values(), but I didn't think it was worth it. I was under the impression that using either functools or sc.reduce was equivalent, also in terms of performance.
Actually, I have to qualify my statement: when summing, I think the current implementation of sc.reduce is actually at a disadvantage compared to functools.reduce, but for concatenation it is
 def apply_pixel_masks(
-    data: RawData[ScatteringRunType],
-    masks: Optional[sciline.Series[PixelMaskFilename, PixelMask]],
+    data: TofData[ScatteringRunType],
+    masked_ids: Optional[sciline.Series[PixelMaskFilename, MaskedDetectorIDs]],
 ) -> MaskedData[ScatteringRunType]:
Not totally happy with this change, as it will prevent using manual masks that are constructed, e.g., using a condition on the pixel positions, etc. Can the MaskedDetectorIDs -> PixelMask conversion be done in a separate step?
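A rough sketch of what such a separate step could look like (the provider below is illustrative, not code from this PR; in the PR's domain-type language, data corresponds to TofData[ScatteringRunType], masked_ids to MaskedDetectorIDs and the return value to PixelMask, and 'detector_id' is assumed to be the coordinate holding the pixel IDs):

import numpy as np
import scipp as sc

def to_pixel_mask(data: sc.DataArray, masked_ids: sc.Variable) -> sc.Variable:
    # Boolean mask over the detector pixels: True where the pixel's
    # detector_id appears in the list of masked IDs read from file.
    ids = data.coords['detector_id']
    mask = sc.zeros(sizes=ids.sizes, dtype='bool')
    mask.values[np.isin(ids.values, masked_ids.values)] = True
    return mask

A hand-written mask based on, e.g., pixel positions could then be supplied directly as a PixelMask without going through MaskedDetectorIDs.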
The issue was that it was using the RawData[SampleRun] to get the detector ids, and using it also for the BackgroundRun. This led to an infinite-recursion error when trying to set a parameter table for using events from multiple runs, I think because it was trying to get a sciline.Series[Filename[BackgroundRun], RawData[SampleRun]], which it could not materialize.
I could have a PixelMask[ScatteringRunType] as an alternative? I think that would work?
it will prevent using manual masks that are constructed, e.g., using a condition of the pixel positions
I think in this case you would have to use a different provider anyway, because the masks are sciline.Series[PixelMaskFilename, MaskedDetectorIDs], i.e. they depend on PixelMaskFilename, so they most probably wouldn't be created from a condition?
FYI, I'm still running into issues with PixelMask[ScatteringRunType].
…, just include the provider by default
Fixes #112
Note: CI is failing because essreduce does not exist as a package yet.
EDIT by JL: Also fixes #106, #109, #110
Note:
We have changed the place where we merge events from multiple files in the workflow.
We used to merge the events early, into a single data array. This meant we had to make sure that sample and detector positions were the same for all runs.
Now we compute Q and wavelength for numerator and denominator for each file, and merge just before normalization.
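For illustration, a toy sketch of the new ordering (all names here are stand-ins, not the package's actual providers or domain types):

from functools import reduce
import scipp as sc

def per_file_contribution(scale: float) -> sc.DataArray:
    # Stand-in for the numerator or denominator computed for a single file,
    # already as a function of Q (or wavelength).
    return sc.DataArray(sc.array(dims=['Q'], values=[scale, scale]))

files = ['run1', 'run2', 'run3']
numerators = [per_file_contribution(1.0) for _ in files]
denominators = [per_file_contribution(2.0) for _ in files]

# Merge the per-file contributions only now, just before normalization.
numerator = reduce(lambda a, b: a + b, numerators)
denominator = reduce(lambda a, b: a + b, denominators)
iofq = numerator / denominator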