Implementing tension statistics #333
base: master
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@            Coverage Diff            @@
##            master     #333    +/-   ##
=========================================
  Coverage   100.00%   100.00%
=========================================
  Files           36        37     +1
  Lines         3076      3104    +28
=========================================
+ Hits          3076      3104    +28
Hi @DilyOng, many thanks for taking charge of incorporating this. Let's get it plumbed into anesthetic first, and then get feedback from others on whether anything is missing. At the moment, this code is specialised to a specific naming scheme (which is what the union and intersection functions are doing), and for a wider grid. I think we should reorganise this so that in the first instance it is more similar to @AdamOrmondroyd's suspiciousness package, but retaining the class/caching structure of Tasks:
I think after that it would also be good to implement a function in addition to (or possibly in place of!) the class for producing a |
Please remember to remove ( |
…ation and testing it with correlated Gaussian likelihoods. Found a problem with the function anesthetic.examples.perfect_ns.correlated_gaussian. The generated likelihood, which is Gaussian in the parameters, is not normalised, so the evidence is not unity. Need to take into account the LogLmax.
…ed_gaussian. Within the correlated_gaussian function, changed the logLike function. Changed the function's description to reflect the fact that the evidence is not unity.
…kelihood test case with the tests folder.
…nction tension_stats() for calculating tension statistics. Rewrote test_tension_stats.py in tests to match the format of other files. It tests mock datasets with Gaussian likelihoods. Both compatible and incompatible datasets pass the test.
…dd a file for datasets pairwise_comparison, but not completed
…the theoretical logR, logS and logI values sit within 3 std of the numerical solution's distribution from anesthetic, instead of testing between minimum and maximum values of the distribution.
… computation to save computing time for high-nsamples runs
…ng stats to tension stats
…d, including anesthetic/tension.py and tests/test_tension.py.
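The 3-std criterion mentioned in the commits above can be sketched as follows. This is a minimal stand-alone illustration using only the standard library: the samples are synthetic draws rather than anesthetic output, and `theory`, `samples` and `within_3_sigma` are hypothetical names that do not come from tests/test_tension.py.

```python
import random
import statistics

random.seed(0)

# Hypothetical analytic value of a tension statistic (e.g. logR).
theory = -3.0

# Synthetic stand-in for the distribution of that statistic across
# evidence samples; a real test would take these from the tension stats.
samples = [random.gauss(theory, 0.2) for _ in range(1000)]

mean = statistics.mean(samples)
std = statistics.stdev(samples)

# The check: the analytic value lies within 3 standard deviations
# of the mean of the numerical distribution.
within_3_sigma = abs(theory - mean) < 3 * std
```

Compared with testing against the minimum and maximum of the distribution, this kind of check does not loosen automatically as the number of samples grows.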
OK @DilyOng they seem to be running for me -- try correcting the version number in the README and __init__.py back to 2.10.0 to see if it's working now.
…nesthetic/_version.py
@williamjameshandley It's all done - All checks have passed!
OK, I think this is ready for merging -- any further suggestions can be presented as additional pull requests.
Many thanks @DilyOng. Please press 'squash and merge'.
(Actually I think @AdamOrmondroyd also has to confirm the changes before merging)
Hi @williamjameshandley and @DilyOng,
I have already been testing this branch out a bit. In that context I realised that the current state is not quite flexible enough. Currently, we consider two datasets A and B separately, and the joint dataset AB. However, there are places where we will want to look at more than two datasets at the same time, e.g. A, B, and C. So we might want a more flexible... Thoughts?
I already have some local modifications that could address this. Would it be OK for me to hijack this PR with these changes?
Our comments crossed. The suggestions in my previous comment would be a major change (in semantic versioning lingo) to the user interface of these tension functions...
anesthetic/tension.py
Outdated
- ``I``: information ratio

.. math::
    I = exp(D_{KL}^{A} + D_{KL}^{B} - D_{KL}^{AB})
This is not quite what I meant. I was suggesting the following re-definition of equation (9) in the Quantifying tensions paper (note the lack of exp):
I = D_A + D_B - D_AB
such that equation (10) becomes:
logS = logR - I
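A minimal numeric sketch of this redefinition, assuming made-up values for the log-evidences and KL divergences (the numbers are purely illustrative and come from no real run):

```python
# Hypothetical summary values for datasets A, B and the joint AB.
logZ = {"A": -10.0, "B": -12.0, "AB": -25.0}
D_KL = {"A": 3.0, "B": 4.0, "AB": 5.5}

# logR as written in the anesthetic/tension.py diff of this PR.
logR = logZ["AB"] - logZ["A"] - logZ["B"]

# Redefined information term: already a log-quantity, so no exp.
I = D_KL["A"] + D_KL["B"] - D_KL["AB"]

# Equation (10) then reads logS = logR - I, with no np.log needed.
logS = logR - I
```

With this convention `I` adds and subtracts directly alongside `logR` and `logS`.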
@lukashergt if we're doing arbitrary numbers of datasets, then we'll need to tweak these equations too, something like
?
Sidenote: oooh, neat, I didn't know that Markdown can by now handle math input :)
That said, for docstrings I would go for maximal readability even without rendering, so I'd say a simple math example is enough. Leave the rest to papers, or if really necessary, write a dedicated documentation page...?
@lukashergt Thank you very much for clarifying. I have changed the relevant lines.
anesthetic/tension.py
Outdated
samples['logR'] = statsAB['logZ'] - statsA['logZ'] - statsB['logZ']
samples.set_label('logR', r'$\ln\mathcal{R}$')

samples['I'] = np.exp(statsA['D_KL'] + statsB['D_KL'] - statsAB['D_KL'])
Accordingly, this should be without exp, too.
@lukashergt Updated!
tests/test_tension.py
Outdated
- assert s.logS.mean() == approx(s.logR.mean() - s.logI.mean(),
+ assert s.logS.mean() == approx(s.logR.mean() - np.log(s.I).mean(),
And accordingly this should not have the np.log.
@lukashergt Updated.
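For illustration, the consistency check discussed in this thread can be sketched without anesthetic. The lists below are synthetic stand-ins for the `logR`, `I` and `logS` columns, and `math.isclose` is used here in place of the `approx` helper from the test; this is an assumption-laden sketch, not the test in tests/test_tension.py.

```python
import math
import random
import statistics

random.seed(1)

# Synthetic stand-ins for per-sample tension statistics.
logR = [random.gauss(-3.0, 0.2) for _ in range(500)]
I = [random.gauss(1.5, 0.1) for _ in range(500)]

# With I defined without exp, logS is a plain difference per sample.
logS = [r - i for r, i in zip(logR, I)]

# The mean is linear, so the means satisfy the same relation
# up to floating-point rounding.
lhs = statistics.mean(logS)
rhs = statistics.mean(logR) - statistics.mean(I)
consistent = math.isclose(lhs, rhs, rel_tol=1e-9, abs_tol=1e-12)
```

Had `I` kept the exp, the right-hand side would need `log(I)` per sample before averaging, which is what the redefinition avoids.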
import numpy as np


def stats(A, B, AB, nsamples=None, beta=None):  # noqa: D301
@lukashergt @DilyOng for multiple datasets, I think unpacking will neatly handle arbitrary numbers of datasets, something like

def stats(h0, *h1, nsamples=None, beta=None):
    """h0 = null hypothesis = AB, h1stats = alternative hypothesis = A, B etc"""
    ...
    samples['logR'] = h0stats['logZ'] - sum(_h1stats['logZ'] for _h1stats in h1stats)
    ...

which can be called as tension.stats(abcde, a, b, c, d, e)
Yes, that's about what I have. I call them joint and separate, which I find a bit more descriptive...
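A self-contained sketch of the unpacking idea, using the `joint`/`separate` naming. It operates on plain dicts instead of anesthetic samples, the function name `tension_summary` is hypothetical, and the N-dataset form of `I` is an assumed generalisation of `I = D_A + D_B - D_AB` that was not settled in this thread.

```python
def tension_summary(joint, *separate):
    """Combine per-run summaries into logR, I and logS.

    Each argument is a plain dict with keys 'logZ' and 'D_KL', standing
    in for the stats of one nested sampling run (an assumption of this
    sketch, not the anesthetic API).
    """
    if len(separate) < 2:
        raise ValueError("need at least two separate datasets")
    # Mirrors the logR line suggested above: joint minus sum of separates.
    logR = joint["logZ"] - sum(s["logZ"] for s in separate)
    # Assumed generalisation of I to an arbitrary number of datasets.
    I = sum(s["D_KL"] for s in separate) - joint["D_KL"]
    return {"logR": logR, "I": I, "logS": logR - I}


# Illustrative numbers only.
a = {"logZ": -10.0, "D_KL": 3.0}
b = {"logZ": -12.0, "D_KL": 4.0}
c = {"logZ": -8.0, "D_KL": 2.0}
abc = {"logZ": -33.0, "D_KL": 7.5}

result = tension_summary(abc, a, b, c)
```

Putting the joint run first keeps the call signature unambiguous however many separate runs follow.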
Parameters
----------
A : :class:`anesthetic.samples.Samples`
    :class:`anesthetic.samples.NestedSamples` object from a sampling run
This is contradictory: does A have to be Samples or NestedSamples?
…he previous log I, the logarithm is incorporated in I. Updated all relevant lines in anesthetic/tension.py and tests/test_tension.py.
Description
This is a work-in-progress pull request aiming to address #325, and also a learning exercise in how to make a pull request.
Checklist:
flake8 anesthetic tests
pydocstyle --convention=numpy anesthetic
python -m pytest