Implementing tension statistics #333
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,7 @@ | ||
"""Tension statistics between two datasets.""" | ||
from anesthetic.samples import Samples | ||
from scipy.stats import chi2 | ||
import numpy as np | ||
|
||
|
||
def stats(A, B, AB, nsamples=None, beta=None): # noqa: D301 | ||
|
@@ -13,10 +14,10 @@ def stats(A, B, AB, nsamples=None, beta=None): # noqa: D301 | |
.. math:: | ||
\log R = \log Z_{AB} - \log Z_{A} - \log Z_{B} | ||
|
||
- ``logI``: information ratio | ||
- ``I``: information ratio | ||
|
||
.. math:: | ||
\log I = D_{KL}^{A} + D_{KL}^{B} - D_{KL}^{AB} | ||
I = exp(D_{KL}^{A} + D_{KL}^{B} - D_{KL}^{AB}) | ||
This is not quite what I meant. I was suggesting the following re-definition of equation (9) in the Quantifying tensions paper (note the lack of
such that equation (10) becomes:

@lukashergt if we're doing arbitrary numbers of datasets, then we'll need to tweak these equations too, something like ?

Sidenote: oooh, neat, I didn't know that Markdown can by now handle math input :) That said, for docstrings I would go for maximal readability even without rendering, so I'd say a simple math example is enough. Leave the rest to papers, or if really necessary, write a dedicated documentation page...?

@lukashergt Thank you very much for clarifying. I have changed the relevant lines.
||
|
||
- ``logS``: suspiciousness | ||
|
||
|
@@ -65,7 +66,7 @@ def stats(A, B, AB, nsamples=None, beta=None): # noqa: D301 | |
------- | ||
samples : :class:`anesthetic.samples.Samples` | ||
DataFrame containing the following tension statistics in columns: | ||
['logR', 'logI', 'logS', 'd_G', 'p'] | ||
['logR', 'I', 'logS', 'd_G', 'p'] | ||
""" | ||
columns = ['logZ', 'D_KL', 'logL_P', 'd_G'] | ||
if set(columns).issubset(A.drop_labels().columns): | ||
|
@@ -89,8 +90,8 @@ def stats(A, B, AB, nsamples=None, beta=None): # noqa: D301 | |
samples['logR'] = statsAB['logZ'] - statsA['logZ'] - statsB['logZ'] | ||
samples.set_label('logR', r'$\ln\mathcal{R}$') | ||
|
||
samples['logI'] = statsA['D_KL'] + statsB['D_KL'] - statsAB['D_KL'] | ||
samples.set_label('logI', r'$\ln\mathcal{I}$') | ||
samples['I'] = np.exp(statsA['D_KL'] + statsB['D_KL'] - statsAB['D_KL']) | ||
Accordingly, this should be without

@lukashergt Updated!
||
samples.set_label('I', r'$\mathcal{I}$') | ||
|
||
samples['logS'] = statsAB['logL_P'] - statsA['logL_P'] - statsB['logL_P'] | ||
samples.set_label('logS', r'$\ln\mathcal{S}$') | ||
|
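For reference, the arithmetic in this hunk can be sketched without anesthetic. The snippet is only an illustration: `tension_summary`, `summary`, and the plain-dict inputs are hypothetical stand-ins for the `statsA`, `statsB` and `statsAB` frames, and it assumes that `logL_P` is the posterior average of the log-likelihood, so that `logL_P = logZ + D_KL`. Under that assumption `logS = logR - logI`, which is the relation the tests assert.

```python
import math


def tension_summary(stats_a, stats_b, stats_ab):
    """Combine per-dataset summaries into tension statistics.

    Each argument is a plain dict with keys 'logZ', 'D_KL' and 'logL_P'
    (hypothetical stand-ins for the statsA/statsB/statsAB frames).
    """
    log_r = stats_ab['logZ'] - stats_a['logZ'] - stats_b['logZ']
    log_i = stats_a['D_KL'] + stats_b['D_KL'] - stats_ab['D_KL']
    log_s = stats_ab['logL_P'] - stats_a['logL_P'] - stats_b['logL_P']
    # Both the log form and the exponentiated form are returned, since the
    # review thread discusses which of the two should be stored.
    return {'logR': log_r, 'logI': log_i, 'I': math.exp(log_i), 'logS': log_s}


def summary(log_z, d_kl):
    # Assumption: logL_P is the posterior mean log-likelihood, logZ + D_KL.
    return {'logZ': log_z, 'D_KL': d_kl, 'logL_P': log_z + d_kl}


# Made-up numbers, purely for illustration.
out = tension_summary(summary(-10.0, 2.0),
                      summary(-12.0, 3.0),
                      summary(-21.0, 4.5))
```

With these inputs `out['logS']` equals `out['logR'] - out['logI']`, mirroring the consistency check in the test file.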
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -46,14 +46,14 @@ def test_tension_stats_compatible_gaussian(): | |
logS_exact = d / 2 - dmu_cov_dmu_AB / 2 | ||
assert s.logS.mean() == approx(logS_exact, abs=3*s.logS.std()) | ||
|
||
logI_exact = logV - d / 2 - slogdet(2*np.pi*(covA+covB))[1] / 2 | ||
assert s.logI.mean() == approx(logI_exact, abs=3*s.logI.std()) | ||
I_exact = np.exp(logV - d / 2 - slogdet(2*np.pi*(covA+covB))[1] / 2) | ||
assert s.I.mean() == approx(I_exact, abs=3*s.I.std()) | ||
|
||
assert s.logS.mean() == approx(s.logR.mean() - s.logI.mean(), | ||
assert s.logS.mean() == approx(s.logR.mean() - np.log(s.I).mean(), | ||
And accordingly this should not have the

@lukashergt Updated.
||
abs=3*s.logS.std()) | ||
|
||
assert s.get_labels().tolist() == ([r'$\ln\mathcal{R}$', | ||
r'$\ln\mathcal{I}$', | ||
r'$\mathcal{I}$', | ||
r'$\ln\mathcal{S}$', | ||
r'$d_\mathrm{G}$', | ||
r'$p$']) | ||
|
@@ -106,14 +106,14 @@ def test_tension_stats_incompatible_gaussian(): | |
logS_exact = d / 2 - dmu_cov_dmu_AB / 2 | ||
assert s.logS.mean() == approx(logS_exact, abs=3*s.logS.std()) | ||
|
||
logI_exact = logV - d / 2 - slogdet(2*np.pi*(covA+covB))[1] / 2 | ||
assert s.logI.mean() == approx(logI_exact, abs=3*s.logI.std()) | ||
I_exact = np.exp(logV - d / 2 - slogdet(2*np.pi*(covA+covB))[1] / 2) | ||
assert s.I.mean() == approx(I_exact, abs=3*s.I.std()) | ||
|
||
assert s.logS.mean() == approx(s.logR.mean() - s.logI.mean(), | ||
assert s.logS.mean() == approx(s.logR.mean() - np.log(s.I).mean(), | ||
abs=3*s.logS.std()) | ||
|
||
assert s.get_labels().tolist() == ([r'$\ln\mathcal{R}$', | ||
r'$\ln\mathcal{I}$', | ||
r'$\mathcal{I}$', | ||
r'$\ln\mathcal{S}$', | ||
r'$d_\mathrm{G}$', | ||
r'$p$']) | ||
|
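One point worth keeping in mind when reading the updated assertions: averaging `I` and averaging `log I` are not interchangeable, because the exponential is convex (Jensen's inequality). A small standard-library sketch with made-up values, not taken from the test suite:

```python
import math
from statistics import mean

# Made-up log-information-ratio samples, purely for illustration.
log_i_samples = [-1.0, 0.0, 1.0]
i_samples = [math.exp(x) for x in log_i_samples]

# Taking the log before averaging recovers mean(log I).
mean_of_logs = mean(math.log(x) for x in i_samples)

# Averaging I directly gives a value at least as large as exp(mean(log I)).
exp_of_mean_log = math.exp(mean(log_i_samples))
mean_of_i = mean(i_samples)
```

This is why `np.log(s.I).mean()` can stand in for the old `s.logI.mean()` in the consistency check, while `s.I.mean()` is only compared with the exponentiated analytic value within the `3*s.I.std()` tolerance.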
@lukashergt @DilyOng for multiple datasets, I think unpacking will neatly handle arbitrary numbers of datasets, something like
which can be called
tension.stats(abcde, a, b, c, d, e)
Yes, that's about what I have. I call them `joint` and `separate`, which I find a bit more descriptive...
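The unpacking idea discussed above could be sketched roughly as follows. This is an illustration under stated assumptions rather than the code from the PR: the function name `stats_unpacked`, the plain-dict inputs, and the straightforward sum over datasets in each formula are all hypothetical.

```python
def stats_unpacked(joint, *separate):
    """Tension statistics for one joint fit and any number of separate fits.

    Hypothetical sketch: every argument is a dict with keys
    'logZ', 'D_KL' and 'logL_P'.
    """
    if len(separate) < 2:
        raise ValueError("need at least two separate datasets")
    # Assumed generalisation: replace the two-dataset terms by sums.
    log_r = joint['logZ'] - sum(s['logZ'] for s in separate)
    log_i = sum(s['D_KL'] for s in separate) - joint['D_KL']
    log_s = joint['logL_P'] - sum(s['logL_P'] for s in separate)
    return {'logR': log_r, 'logI': log_i, 'logS': log_s}


# Made-up summaries for three datasets and their joint fit.
a = {'logZ': -10.0, 'D_KL': 2.0, 'logL_P': -8.0}
b = {'logZ': -12.0, 'D_KL': 3.0, 'logL_P': -9.0}
c = {'logZ': -11.0, 'D_KL': 1.0, 'logL_P': -10.0}
abc = {'logZ': -32.0, 'D_KL': 5.0, 'logL_P': -27.0}

result = stats_unpacked(abc, a, b, c)
```

In a signature like this, any options placed after `*separate` (for example `nsamples` and `beta` from the current signature) become keyword-only.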