Completely general binning system with fractional binning #54
Conversation
Just a couple of comments about binning - I apologise in advance if this isn't relevant, of use, or is indeed completely wrong! In terms of treating resolution:
I also have a particular concern about including fractions of a voxel:
See some toy code here (warranty not included, may be bugs!)
Prints out
To be clear, I don't know if the above code is actually what you're doing here; my point is that this can go wrong and it's worth checking!
Another aspect that wasn't mentioned by @RichardWaiteSTFC: whenever you divide pixels to do fractional rebinning, the signal in adjacent pixels is no longer independent but becomes correlated. Thus, to be 100% correct, the error propagation needs to take covariance into account. Here's a demo of how it arises:

```python
import numpy as np
import uncertainties as unc
from uncertainties import ufloat
from uncertainties import unumpy as unp

a = ufloat(15, unp.sqrt(15))
b = ufloat(20, unp.sqrt(20))
c = ufloat(25, unp.sqrt(25))

d = 0.5*a + 0.5*b
e = 0.5*b + 0.5*c

print(repr(d))
print(repr(e))
print(np.array(unc.covariance_matrix([d, e])))
```

gives:
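The matrix that comes out can be checked by hand with first-order propagation; here's a quick sketch using only the variances above (just illustrative, nothing PR-specific assumed):

```python
import numpy as np

# variances of the independent raw counts a, b, c
var_a, var_b, var_c = 15.0, 20.0, 25.0

# d = 0.5*a + 0.5*b and e = 0.5*b + 0.5*c
var_d = 0.25 * var_a + 0.25 * var_b   # 8.75
var_e = 0.25 * var_b + 0.25 * var_c   # 11.25
cov_de = 0.5 * 0.5 * var_b            # 5.0 -- d and e share half of b

print(np.array([[var_d, cov_de], [cov_de, var_e]]))
```

The non-zero off-diagonal term (0.25·Var(b)) is exactly the correlation introduced by splitting b between the two output values, and it should match what unc.covariance_matrix([d, e]) prints.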
Another consequence of taking these kinds of correlations into account is that actually all datapoints in a reduced SAS measurement are correlated! As well as this kind of fractional rebinning, the reduction typically uses steps like normalising images/spectra by dividing by a common monitor count. Since this monitor count has an uncertainty, all resultant datapoints become correlated:

```python
a = ufloat(100, unp.sqrt(100))
b = ufloat(201, unp.sqrt(201))
monitor = ufloat(80, unp.sqrt(80))

print(repr(a / monitor))
print(repr(b / monitor))
# again, non-zero off-diagonal terms indicate that a/monitor and b/monitor are correlated
print(np.array(unc.covariance_matrix([a / monitor, b / monitor])))
```

gives:
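The correlation here can also be written down directly with first-order propagation; a quick check (again just illustrative, not code from any reduction):

```python
import numpy as np

# counts and monitor, Poisson statistics (variance = value)
a, var_a = 100.0, 100.0
b, var_b = 201.0, 201.0
m, var_m = 80.0, 80.0

# first-order propagation for a/m and b/m
var_am = var_a / m**2 + (a / m**2) ** 2 * var_m
var_bm = var_b / m**2 + (b / m**2) ** 2 * var_m
# both quotients share the same monitor, hence the covariance term
cov = (a / m**2) * (b / m**2) * var_m

print(np.array([[var_am, cov], [cov, var_bm]]))
```

Relative to the two normalised values, the covariance scales as Var(m)/m², which is why greatly increasing the monitor counts suppresses the correlation, as noted below.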
In many cases the correlation may be sufficiently small to be ignorable, but one should at least know about it. Its effects can be surprisingly large. In the monitor example I gave, one can minimise the correlation by greatly increasing the monitor counts. Given that reduction contains many steps, one has to take good care of how the error propagation is carried out; correlation is something that is conveniently ignored. Heybrock et al. wrote a paper on these kinds of effects.
I only just saw this, Andy. Sure, I get that. I have spoken to Simon and read the paper. I even have a project plan at ISIS to fix some of these things (it won't happen, but hey!). Fact is, we don't have that information.
Correlation is taken into account wherever it is known about, though.
Merged 209ba83 into 85-add-interpolationrebinning
This should not have been closed, weird git stuff!
I've completed the basic work needed to do fractional binning: it can take any kind of input binning and convert it to any kind of output binning. I have only implemented the -1th-order (existing rebinning system) and 0th-order (fractional binning) interpolation so far, but there is room for higher orders in the future; most of the hard work is done.
As apparently there is some controversy over whether this is useful, I thought I would hold off on actually implementing all the new slicers in this way until at least @butlerpd can see the results below. As you can see, the 0th-order ("fractional binning") approach is much more stable, though as things are completely general, it's a bit slower. The main cost of computation for orders >= 0 is working out the mapping between input and output bins; I have implemented various caching optimisations which mean that these calculations only need to be performed once, so applying a slicer to the same input/output mapping will be relatively fast.
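To illustrate the difference in 1D, here is a toy sketch of the two orders (the helper functions are just illustrative, not the code in this PR):

```python
import numpy as np

def rebin_whole(old_edges, values, new_edges):
    """-1th order: each input bin's content goes entirely to the output bin
    that contains the input bin's centre (no splitting)."""
    centres = 0.5 * (old_edges[:-1] + old_edges[1:])
    idx = np.searchsorted(new_edges, centres, side="right") - 1
    out = np.zeros(len(new_edges) - 1)
    for i, v in zip(idx, values):
        if 0 <= i < out.size:
            out[i] += v
    return out

def rebin_fractional(old_edges, values, new_edges):
    """0th order: each input bin's content is shared among output bins in
    proportion to the overlap between the input and output intervals."""
    out = np.zeros(len(new_edges) - 1)
    for j, v in enumerate(values):
        lo, hi = old_edges[j], old_edges[j + 1]
        for k in range(out.size):
            overlap = max(0.0, min(hi, new_edges[k + 1]) - max(lo, new_edges[k]))
            out[k] += v * overlap / (hi - lo)
    return out

# tiny example: three unit input bins summed into two output bins whose
# boundary falls half-way through the middle input bin
old = np.array([0.0, 1.0, 2.0, 3.0])
new = np.array([0.0, 1.5, 3.0])
vals = np.array([10.0, 10.0, 10.0])
print(rebin_whole(old, vals, new))       # [10. 20.] -- the middle bin jumps wholesale
print(rebin_fractional(old, vals, new))  # [15. 15.] -- the middle bin is split 50/50
```

The point the demos below make is the same: whole-bin assignment jumps discontinuously as bin boundaries move, while the fractional version varies smoothly.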
slicer_demo.py contains the code used to generate these plots. Here's a demo of the region sum applied to data sampled at random points:
As you can see, the output of order 0 is pretty smooth compared with order -1.
Data:
Example regions (output bin)
Output for -1 and 0 order, value against bin size
Here's a demo of the region average for data sampled on a grid
As you can see, the output of order 0 is much smoother than order -1. By symmetry, we know that the ideal result would be a constant value.
Data:
Example regions (output bin)
Output for -1 and 0 order, value against bin size (should read average, but I forgot to change the title)