Implementing tension statistics #333

Open · wants to merge 69 commits into base: master

Conversation

@DilyOng (Collaborator) commented Aug 24, 2023

Description

This is a work-in-progress pull request aiming to address #325, and a learning exercise in how to do a pull request.

Checklist:

  • I have performed a self-review of my own code
  • My code is PEP8 compliant (flake8 anesthetic tests)
  • My code contains compliant docstrings (pydocstyle --convention=numpy anesthetic)
  • New and existing unit tests pass locally with my changes (python -m pytest)
  • I have added tests that prove my fix is effective or that my feature works
  • I have appropriately incremented the semantic version number in both README.rst and anesthetic/_version.py

@codecov bot commented Aug 24, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (ccb2e76) to head (e42b048).

Additional details and impacted files
@@            Coverage Diff            @@
##            master      #333   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           36        37    +1     
  Lines         3076      3104   +28     
=========================================
+ Hits          3076      3104   +28     


@williamjameshandley (Collaborator):

Hi @DilyOng, many thanks for taking charge of incorporating this. Let's get it plumbed into anesthetic first, and then get feedback from others on whether anything is missing.

At the moment, this code is specialised to a specific naming scheme (which is what the union and intersection functions are doing), and to a wider grid.

I think we should re-organise this so that in the first instance it is more similar to @AdamOrmondroyd's suspiciousness package, but retaining the class/caching structure of tension_calculator.

Tasks:

  • Create a class TensionCalculator in a new file anesthetic/tension.py (note CamelCase rather than under_score naming)
  • This class should __init__ with A, B and AB, which are assumed to be NestedSamples, and cache self.A = A.stats(nsamples) (and the same for B and AB), which computes the nested sampling statistics (see also the docs); a sketch of such a class is given after this list
  • This should then implement logR, logS, d, D_KL and p
  • You should then create a tests/test_tension.py in the same style as the other test files, which uses anesthetic.examples.perfect_ns.correlated_gaussian functions to create mock A, B and AB, alongside equations 14 to 25 from 1902.04029 to test that the tension statistics code gets the correct answer.
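
For concreteness, here is a minimal sketch of what such a class could look like, assuming the 'logZ', 'D_KL' and 'd_G' columns produced by NestedSamples.stats(nsamples); the method bodies follow the relations discussed further down this thread and in arXiv:1902.04029, and are illustrative rather than the final implementation:

```python
from scipy.stats import chi2


class TensionCalculator:
    """Sketch only: cache nested sampling statistics and derive tension quantities."""

    def __init__(self, A, B, AB, nsamples=None):
        # A, B and AB are assumed to be NestedSamples; stats() returns samples
        # of 'logZ', 'D_KL' and 'd_G' (see NestedSamples.stats in the docs)
        self.A = A.stats(nsamples)
        self.B = B.stats(nsamples)
        self.AB = AB.stats(nsamples)

    def logR(self):
        # log Bayes (evidence) ratio
        return self.AB['logZ'] - self.A['logZ'] - self.B['logZ']

    def D_KL(self):
        # difference of KL divergences (the information ratio discussed below)
        return self.A['D_KL'] + self.B['D_KL'] - self.AB['D_KL']

    def logS(self):
        # log suspiciousness
        return self.logR() - self.D_KL()

    def d(self):
        # Bayesian model dimensionality of the tension
        return self.A['d_G'] + self.B['d_G'] - self.AB['d_G']

    def p(self):
        # tension probability: chi^2 survival function with d degrees of
        # freedom, evaluated at d - 2*logS (as in arXiv:1902.04029)
        return chi2.sf(self.d() - 2 * self.logS(), self.d())
```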

@williamjameshandley (Collaborator) commented Aug 24, 2023

I think after that it would also be good to implement a function in addition to (or possibly in place of!) the class for producing a Samples object containing columns of logR, D_KL, d, S, p, as an analogue to the output of NestedSamples.stats.
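
A rough sketch of the intended usage (the module path anesthetic.tension, the function name stats, the chain paths and the exact column names are all assumptions at this stage, mirroring the column list above):

```python
from anesthetic import read_chains
from anesthetic.tension import stats  # hypothetical module/function name

# hypothetical chain files for runs on A, B and the joint dataset AB
A = read_chains('chains/runA')
B = read_chains('chains/runB')
AB = read_chains('chains/runAB')

samples = stats(A, B, AB, nsamples=1000)
samples[['logR', 'D_KL', 'd', 'S', 'p']].mean()  # point estimates of each statistic
```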

Review thread on tension_calculator.py (outdated, resolved)
@AdamOrmondroyd (Collaborator):

Please remember to remove (git rm) the .DS_Store files you've added (they are to do with macOS file management, so not relevant to the repo)

lukashergt and others added 12 commits September 29, 2023 17:20
…ation and testing it with correlated Gaussian likelihoods. Found a problem with the function anesthetic.examples.perfect_ns.correlated_gaussian: the generated Gaussian likelihood in the parameters is not normalised and the evidence is not unity. Need to take into account the LogLmax.
…ed_gaussian. Within the correlated_gaussian function, changed the logLike function. Changed the function's description to match the fact that the evidence is not unity.
…nction tension_stats() for calculating tension statistics. Rewrote test_tension_stats.py in tests to match the format of the other test files. It tests mock datasets with a Gaussian likelihood. Both compatible and incompatible datasets have passed the test.
…dd a file for datasets pairwise_comparison, but not yet completed
@williamjameshandley (Collaborator):

It would be good to get this finalised and merged now that #348 is complete -- any thoughts, @DilyOng?

DilyOng and others added 3 commits March 4, 2024 18:42
…the theoretical logR, logS and logI values sit within 3 std of the numerical solution's distribution from anesthetic, instead of testing between minimum and maximum values of the distribution.
Resolved review threads (outdated): anesthetic/tension.py (×3), anesthetic/tension_pvalue.py
@williamjameshandley (Collaborator):

OK @DilyOng they seem to be running for me -- try correcting the version number in the README and init.py back to 2.10.0 to see if it's working now.

@DilyOng (Collaborator, Author) commented Feb 4, 2025

> OK @DilyOng they seem to be running for me -- try correcting the version number in the README and init.py back to 2.10.0 to see if it's working now.

@williamjameshandley It's all done - All checks have passed!

@williamjameshandley (Collaborator) left a review comment:

OK, I think this is ready for merging -- any further suggestions can be presented as additional pull requests.

Many thanks @DilyOng. Please press 'squash and merge'.

@williamjameshandley (Collaborator):

(Actually I think @AdamOrmondroyd also has to confirm the changes before merging)

@lukashergt (Collaborator) left a review comment:

Hi @williamjameshandley and @DilyOng,

I have already been testing this branch out a bit. In that context I realised that the current state is not quite flexible enough. Currently, we consider two datasets A and B separately, and the joint dataset AB. However, there are places where we will want to look at more than two datasets at the same time, e.g. A, B, and C. So we might want a more flexible interface. Thoughts?

I already have some local modifications that could address this. Would it be ok for me to hijack this PR with these changes?

@lukashergt (Collaborator):

> OK, I think this is ready for merging -- any further suggestions can be presented as additional pull requests.

Our comments crossed. The suggestions in my previous comment would be a major change (in semantic versioning lingo) to the user interface of these tension functions...

- ``I``: information ratio

.. math::
I = exp(D_{KL}^{A} + D_{KL}^{B} - D_{KL}^{AB})
Collaborator review comment:

This is not quite what I meant. I was suggesting the following re-definition of equation (9) in the Quantifying tensions paper (note the lack of exp):

I = D_A + D_B - D_AB

such that equation (10) becomes:

logS = logR - I
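
Rendered in the markdown math notation used just below (no new content, simply the two relations above in LaTeX):

$$I = {\mathcal{D}_\mathrm{KL}}^{A} + {\mathcal{D}_\mathrm{KL}}^{B} - {\mathcal{D}_\mathrm{KL}}^{AB}, \qquad \log S = \log R - I$$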

Collaborator review comment:

@lukashergt if we're doing arbitrary numbers of datasets, then we'll need to tweak these equations too, something like

$$I = \sum_i {\mathcal{D}_\mathrm{KL}}_i - {\mathcal{D}_\mathrm{KL}}_{H_0}$$

?

Collaborator review comment:

Sidenote: oooh, neat, I didn't know that Markdown can handle math input by now :)

That said, for docstrings I would go for maximal readability even without rendering, so I'd say a simple math example is enough. Leave the rest to papers, or if really necessary, write a dedicated documentation page...?

@DilyOng (Collaborator, Author):

@lukashergt Thank you very much for clarifying. I have changed the relevant lines.

samples['logR'] = statsAB['logZ'] - statsA['logZ'] - statsB['logZ']
samples.set_label('logR', r'$\ln\mathcal{R}$')

samples['I'] = np.exp(statsA['D_KL'] + statsB['D_KL'] - statsAB['D_KL'])
Collaborator review comment:

Accordingly, this should be without exp, too.
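
Concretely, the corrected lines would presumably look something like this (a sketch of the suggestion only; the label string is illustrative):

```python
# information ratio kept as a log-quantity, no exponential
samples['I'] = statsA['D_KL'] + statsB['D_KL'] - statsAB['D_KL']
samples.set_label('I', r'$\mathcal{I}$')
```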

@DilyOng (Collaborator, Author):

@lukashergt Updated!


assert s.logS.mean() == approx(s.logR.mean() - s.logI.mean(),
assert s.logS.mean() == approx(s.logR.mean() - np.log(s.I).mean(),
Collaborator review comment:

And accordingly this should not have the np.log.

@DilyOng (Collaborator, Author):

@lukashergt Updated.

import numpy as np


def stats(A, B, AB, nsamples=None, beta=None): # noqa: D301
Collaborator review comment:

@lukashergt @DilyOng for multiple datasets, I think unpacking will neatly handle arbitrary numbers of datasets, something like

def stats(h0, *h1, nsamples=None, beta=None):
    """h0 = null hypothesis = AB, h1 = alternative hypotheses = A, B, etc."""
    h0stats = h0.stats(nsamples=nsamples, beta=beta)
    h1stats = [h.stats(nsamples=nsamples, beta=beta) for h in h1]
    ...
    samples['logR'] = h0stats['logZ'] - sum(s['logZ'] for s in h1stats)
    ...

which can be called tension.stats(abcde, a, b, c, d, e)
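
Filling in the ellipses along those lines (again just a sketch: the Samples construction and the 'D_KL' and 'd_G' column names are assumptions based on the output of NestedSamples.stats):

```python
from anesthetic import Samples


def stats(h0, *h1, nsamples=None, beta=None):
    """h0 = joint (null-hypothesis) run, h1 = the separate runs (A, B, ...)."""
    h0stats = h0.stats(nsamples=nsamples, beta=beta)
    h1stats = [h.stats(nsamples=nsamples, beta=beta) for h in h1]
    samples = Samples(index=h0stats.index)
    # log evidence ratio
    samples['logR'] = h0stats['logZ'] - sum(s['logZ'] for s in h1stats)
    # information ratio (in its log form, as discussed above)
    samples['I'] = sum(s['D_KL'] for s in h1stats) - h0stats['D_KL']
    # log suspiciousness
    samples['logS'] = samples['logR'] - samples['I']
    # Bayesian model dimensionality of the tension
    samples['d_G'] = sum(s['d_G'] for s in h1stats) - h0stats['d_G']
    return samples
```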

Collaborator review comment:

Yes, that's about what I have. I call them joint and separate, which I find a bit more descriptive...

Parameters
----------
A : :class:`anesthetic.samples.Samples`
    :class:`anesthetic.samples.NestedSamples` object from a sampling run
Collaborator review comment:

This is contradictory: does A have to be Samples or NestedSamples?

…he previous log I, the logarithm is incorporated in I. Updated all relevant lines in anesthetic/tension.py and tests/test_tension.py.
Labels: enhancement (New feature or request), good first issue (Good for newcomers)
4 participants