
Mutual Information nodes - questions #112

Open
niederle opened this issue Jul 31, 2018 · 2 comments

@niederle
Collaborator

@Meyenhofer

I played around with both MI-nodes and I have a few theory questions (I hope you can remember):

  1. When would I need to change the method, and what does the bias result value tell me?
  2. What does the sigma result value tell me?
  3. When would I change the log value?

And one question concerning the 'Group MI' node:
I had expected that the MI between two groups would give an idea of how similar the histograms are. I created a dummy dataset: 3000 entries for 'library' with Gaussian data (m=0, sd=1) and 100 entries for 'control' with the same Gaussian parameters. I would expect a high MI, as both histograms should be very similar (shape-wise). Instead, I get a low value. I made another test with 3000 entries for each group, and the value is still low. But if I sort the values within the groups from low to high, I get a high MI. From the calculation point of view that makes sense (as it is based on the joint histogram). But I don't get how it could be useful for finding out whether one group carries information the reference does not (as mentioned in the help of that node).
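The effect described above can be reproduced with a quick histogram-based MI sketch in plain NumPy (this is an illustration, not the node's actual implementation; the bin count and seed are arbitrary choices):

```python
import numpy as np

def mutual_info(x, y, bins=20):
    """Plug-in MI estimate in bits from the joint 2-D histogram (no bias correction)."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y
    nz = pxy > 0                          # avoid log(0) on empty bins
    return float((pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum())

rng = np.random.default_rng(0)
library = rng.normal(0, 1, 3000)   # 'library' group, m=0, sd=1
control = rng.normal(0, 1, 3000)   # 'control' group, same distribution

# Independent samples from identical distributions: MI stays near 0,
# even though the two histograms look almost the same.
mi_unsorted = mutual_info(library, control)

# Sorting both groups creates a monotone pairing between the rows,
# which the joint histogram sees as a strong (near-bijective) relation.
mi_sorted = mutual_info(np.sort(library), np.sort(control))

print(mi_unsorted, mi_sorted)
```

This matches the observation: row-wise MI measures the dependence between paired values, not the similarity of the two marginal histograms.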

Any help to understand these nodes is appreciated :-)

@fmeyenhofer
Collaborator

I probably should also have linked the MATLAB implementation by the author somewhere...
As Moddemeijer explains in this article:

  • MI tends towards 0 if X and Y are independent
  • MI between X and Y reaches its maximum if there is a bijective relation between the two variables (i.e. fully dependent)
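Both properties can be checked with a tiny discrete sketch (plain Python with made-up data, not the node's implementation), using the identity I(X;Y) = H(X) + H(Y) - H(X,Y):

```python
from collections import Counter
from math import log2

def entropy(xs):
    """Empirical Shannon entropy in bits."""
    n = len(xs)
    return -sum(c / n * log2(c / n) for c in Counter(xs).values())

def mutual_info(xs, ys):
    # I(X;Y) = H(X) + H(Y) - H(X,Y), from empirical frequencies
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

x = [0, 1, 2, 3] * 100
# Pair each value of x with every y value equally often -> independence.
indep = [0, 1, 2, 3, 1, 2, 3, 0, 2, 3, 0, 1, 3, 0, 1, 2] * 25

print(round(mutual_info(x, indep), 6))                 # → 0.0 (independent)
print(round(mutual_info(x, [v * 10 for v in x]), 6))   # → 2.0 = H(X) (bijection)
```

For the bijective case the MI equals the full entropy of X (2 bits for four equally likely values), the maximum attainable here.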

Concerning your questions:

when would I need to change the method and what does the bias result value tell me?

The bias is the difference between the expected value and the estimated value (see Wikipedia). So the unbiased method tries to compensate for the bias by subtracting it from the estimate.
MSSE (Mean Square Error estimate) further transforms the initial unbiased estimate. For what that means in detail, you will need to dive into the literature and make a few more tests.

what does the sigma result value tell me

That is the MI's standard error.

when would I change the log-value

The log base can be understood as the unit of measurement. Common choices are e (nats), 2 (bits), and 10 (hartleys).
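Since the base only changes the unit, an MI value can be converted between bases by dividing by the natural log of the target base (the numeric value below is an arbitrary example):

```python
from math import log

mi_nats = 0.69314718          # an MI value in nats (≈ ln 2, chosen for illustration)
mi_bits = mi_nats / log(2)    # same quantity in bits
mi_hart = mi_nats / log(10)   # same quantity in hartleys

print(round(mi_bits, 4))      # → 1.0
```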

Concerning your experiment:
What is low?
The problem with MI is that it depends heavily on the binning, and the maximum attainable MI may differ for different parameters. So the first thing to try is to add a case with very different distributions to your experiment and compare the MI values of both cases.
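The binning dependence is easy to see with a small NumPy sketch (again an illustration, not the node's code; for this Gaussian pair the true MI is 0.5 bit, but the plug-in estimate shifts with the bin count):

```python
import numpy as np

def mutual_info(x, y, bins):
    """Plug-in MI estimate in bits from a 2-D histogram (no bias correction)."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum())

rng = np.random.default_rng(1)
x = rng.normal(0, 1, 3000)
y = x + rng.normal(0, 1, 3000)   # genuinely dependent pair (true MI = 0.5 bit)

# The same data yields different estimates for different bin counts:
# coarse bins lose dependence, fine bins inflate the estimate (finite-sample bias).
estimates = {b: mutual_info(x, y, b) for b in (5, 20, 80)}
for b, mi in estimates.items():
    print(b, round(mi, 3))
```

This is why comparing the absolute value against a contrasting case (same binning, very different distributions) is more informative than reading a single MI number in isolation.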

@niederle
Collaborator Author

niederle commented Aug 1, 2018

Thanks a lot for all your answers! That helps already.

Concerning my test: I will link a workflow which calculates the MI for 8 different conditions:

  1. library 3000 values, control 3000 values OR library 3000 values, control 100 values
  2. values sorted within each group OR unsorted
  3. library and control drawing values from the same distribution (m=0, sd=1) OR library distribution (m=0, sd=1) and control distribution (m=10, sd=1) differing

I used the biased calculation, but the relation between the 8 MI values is more or less the same for the unbiased calculation (though in a different range).
Would you mind having a look at it to check whether these values meet your expectations?

https://cloud.mpi-cbg.de/index.php/s/YeCzUmGzIBTaf29
