
Mutual Information nodes - questions #112

Open
niederle opened this issue Jul 31, 2018 · 2 comments

@niederle
Collaborator

@Meyenhofer

I played around with both MI-nodes and I have a few theory questions (I hope you can remember):

  1. When would I need to change the method, and what does the bias result value tell me?
  2. What does the sigma result value tell me?
  3. When would I change the log value?

And one question concerning the 'Group MI' node:
I had expected that the MI between two groups would give an idea of how similar the histograms are. I created a dummy dataset: 3000 entries for 'library' with Gaussian data (m=0, sd=1) and 100 entries for 'control' with the same Gaussian parameters. I would expect a high MI, as both histograms should be very similar (shape-wise). Instead, I get a low value. I made another test with 3000 entries for each group, and the value is still low. But if I sort the values within the groups from low to high, I get a high MI. From the calculation point of view that makes sense (as it is based on the joint histogram). But I don't get how it could be useful for finding out whether one group carries information the reference does not (as mentioned in the help of that node).
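The effect described above can be reproduced with a quick histogram-based MI sketch in plain NumPy (this is an illustration, not the node's actual implementation; the bin count and seed are arbitrary choices):

```python
import numpy as np

def mutual_info(x, y, bins=20):
    """Plug-in MI estimate in bits from the joint 2-D histogram (no bias correction)."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y
    nz = pxy > 0                          # avoid log(0) on empty bins
    return float((pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum())

rng = np.random.default_rng(0)
library = rng.normal(0, 1, 3000)   # 'library' group, m=0, sd=1
control = rng.normal(0, 1, 3000)   # 'control' group, same distribution

# Independent samples from identical distributions: MI stays near 0,
# even though the two histograms look almost the same.
mi_unsorted = mutual_info(library, control)

# Sorting both groups creates a monotone pairing between the rows,
# which the joint histogram sees as a strong (near-bijective) relation.
mi_sorted = mutual_info(np.sort(library), np.sort(control))

print(mi_unsorted, mi_sorted)
```

This matches the observation: row-wise MI measures the dependence between paired values, not the similarity of the two marginal histograms.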

Any help to understand these nodes is appreciated :-)

@fmeyenhofer
Collaborator

I probably should also have linked the MATLAB implementation by the author somewhere...
As Moddemeijer explains in this article:

  • MI tends towards 0 if X and Y are independent
  • MI between X and Y reaches its maximum if there is a bijective relation between the two variables (i.e. fully dependent)
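Both properties can be checked with a tiny discrete sketch (plain Python with made-up data, not the node's implementation), using the identity I(X;Y) = H(X) + H(Y) - H(X,Y):

```python
from collections import Counter
from math import log2

def entropy(xs):
    """Empirical Shannon entropy in bits."""
    n = len(xs)
    return -sum(c / n * log2(c / n) for c in Counter(xs).values())

def mutual_info(xs, ys):
    # I(X;Y) = H(X) + H(Y) - H(X,Y), from empirical frequencies
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

x = [0, 1, 2, 3] * 100
# Pair each value of x with every y value equally often -> independence.
indep = [0, 1, 2, 3, 1, 2, 3, 0, 2, 3, 0, 1, 3, 0, 1, 2] * 25

print(round(mutual_info(x, indep), 6))                 # → 0.0 (independent)
print(round(mutual_info(x, [v * 10 for v in x]), 6))   # → 2.0 = H(X) (bijection)
```

For the bijective case the MI equals the full entropy of X (2 bits for four equally likely values), the maximum attainable here.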

Concerning your questions:

when would I need to change the method and what does the bias result value tell me?

The bias is the difference between the expected value and the estimated value (see Wikipedia). So the unbiased method tries to compensate for the bias by subtracting it from the estimate.
MSSE (Mean Square Error estimate) further transforms the initial unbiased estimate. For what that means in detail, you will need to dive into the literature and make a few more tests.

what does the sigma result value tell me

That is the MI's standard error.

when would I change the log-value

The log base can be understood as the unit of measurement. Common choices are e (nats), 2 (bits), and 10 (hartleys).
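Since the base only changes the unit, an MI value can be converted between bases by dividing by the natural log of the target base (the numeric value below is an arbitrary example):

```python
from math import log

mi_nats = 0.69314718          # an MI value in nats (≈ ln 2, chosen for illustration)
mi_bits = mi_nats / log(2)    # same quantity in bits
mi_hart = mi_nats / log(10)   # same quantity in hartleys

print(round(mi_bits, 4))      # → 1.0
```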

Concerning your experiment:
What is low?
The problem with MI is that it depends heavily on the binning, and the maximum attainable MI may differ for different parameters. So the first thing to try is to add a case with very different distributions to your experiment and compare the MI values of both cases.
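The binning dependence is easy to see with a small NumPy sketch (again an illustration, not the node's code; for this Gaussian pair the true MI is 0.5 bit, but the plug-in estimate shifts with the bin count):

```python
import numpy as np

def mutual_info(x, y, bins):
    """Plug-in MI estimate in bits from a 2-D histogram (no bias correction)."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum())

rng = np.random.default_rng(1)
x = rng.normal(0, 1, 3000)
y = x + rng.normal(0, 1, 3000)   # genuinely dependent pair (true MI = 0.5 bit)

# The same data yields different estimates for different bin counts:
# coarse bins lose dependence, fine bins inflate the estimate (finite-sample bias).
estimates = {b: mutual_info(x, y, b) for b in (5, 20, 80)}
for b, mi in estimates.items():
    print(b, round(mi, 3))
```

This is why comparing the absolute value against a contrasting case (same binning, very different distributions) is more informative than reading a single MI number in isolation.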

@niederle
Collaborator Author

niederle commented Aug 1, 2018

Thanks a lot for all your answers! That helps already.

Concerning my test: I will link a workflow which calculates the MI for 8 different conditions:

  1. library 3000 values, control 3000 values OR library 3000 values, control 100 values
  2. values sorted within each group OR unsorted
  3. library and control drawing values from the same distribution (m=0, sd=1) OR library distribution (m=0, sd=1) and control distribution (m=10, sd=1) differing

I used the biased calculation, but the relation between the 8 MI values is more or less the same for the unbiased calculation (though in a different range).
Would you mind having a look at it to check whether these values meet your expectations?

https://cloud.mpi-cbg.de/index.php/s/YeCzUmGzIBTaf29
