Calculation of LvR and correlation coefficients of ground state #14
Could it be the condition that neurons must have spiked at least once where this goes wrong? I.e., are you still including some silent neurons in the calculation?
Well spotted!
So it's working, but not leaving 2000 neurons?
Yes, that sounds right. I agree the relevant wording in the paper is ambiguous at best...
So were the stats calculated using a longer simulation?
No, I don't think so. The paper states that for chi=1, the simulation duration was 10.5 s. I think the calculation started with 2000 neurons, from which the ones that spiked at least once were taken, leaving fewer than 2000 neurons in some cases.
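If that reading is right, the selection would look roughly like this (a minimal sketch with my own variable names and a 0-based id scheme, not the repository's code; it assumes the spike files used later in this thread, with neuron ids in column 0 and spike times in column 1):

```python
import numpy as np

# Sketch only: sample 2000 of the neurons in a population, then keep
# only the ones that spiked at least once.
spikes = np.load("33fb5955558ba8bb15a3fdce49dfd914682ef3ea-spikes-FEF-5I.npy")
rng = np.random.default_rng(1)               # arbitrary seed
all_ids = np.arange(3721)                    # assumes 0-based ids for FEF 5I
sampled_ids = rng.choice(all_ids, size=2000, replace=False)
spiking_ids = np.unique(spikes[:, 0])        # column 0 holds neuron ids
kept_ids = np.intersect1d(sampled_ids, spiking_ids)
print(len(kept_ids))  # < 2000 whenever some of the sampled neurons stayed silent
```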
Ahh ok, so how come our recalculation of the stats using the same code and (presumably) the same data doesn't match? I don't see any non-determinism in that code.
I think that @albada is right in saying that for some populations, there were probably fewer than 2000 neurons entering the calculation, simply because these populations are small in the first place and then have low rates. For what it's worth, I wouldn't call these deviations significant. Keep in mind that these values are very low, 10^-3 to 10^-4, on a scale of 0 to 1. If your simulations do not produce exactly the same spikes, these deviations can easily occur, but they're not significant, in my opinion. I think that unless you're using the exact same configuration for your compute system (MPI processes, threads) and the same NEST version (plus some other dependencies that influence the random numbers), it's unlikely that you can produce the same spikes.
Hey @mschmidt87 - thanks for looking at this. My concern is that, as a test, we're calculating these metrics from the published spiking data using the published code, and we don't get the published correlation coefficients.
Okay, sorry, I missed that point, I thought you had run your own simulations. Let me take a look at that tomorrow then, I'll get back to you.
Thank you so much!
I can reproduce your finding. I couldn't find the problem in the analysis code (I tried to change the part where we subsample etc., but no success). Since I produced the json files with the code that is in the repo and the file hasn't been modified, I am suspecting that this might perhaps be a version problem of the dependencies. I am now using Python 3.8 and numpy 1.18.1 as well as the latest master of `correlation_toolbox`.
Glad to hear you can reproduce. Wouldn't it be possible for different spike data to produce the same mean rates but different correlation and irregularity (in the extreme case you could generate spike trains from populations of Poisson sources with the same mean rates)? Nonetheless, I can try and investigate older versions today. `correlation_toolbox` doesn't look to have changed a huge amount, at least. The data was pushed on 8/1/2018 so, assuming you created it around then, I can try and bisect numpy versions.
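For illustration, here is a throwaway sketch of that extreme case (a binned Bernoulli approximation of Poisson spiking; all numbers are arbitrary):

```python
import numpy as np

# Two populations with identical mean rates but very different
# second-order statistics.
rng = np.random.default_rng(0)
rate, dt, steps, n = 10.0, 1e-3, 10_000, 50  # 10 Hz, 1 ms bins, 10 s, 50 trains

# Population A: independent trains -> pairwise correlations near zero
a = (rng.random((n, steps)) < rate * dt).astype(float)

# Population B: every train copies one common train -> correlation exactly 1
mother = (rng.random(steps) < rate * dt).astype(float)
b = np.tile(mother, (n, 1))

print(a.mean(), b.mean())                          # both ~= rate * dt
print(np.corrcoef(a)[0, 1], np.corrcoef(b)[0, 1])  # ~= 0 vs exactly 1.0
```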
I've tried using numpy 1.13.3 and 1.11.3 (which seems like a good guess for era-appropriate bleeding edge and less bleeding edge) on both Python 3 and Python 2, and using the older version of `correlation_toolbox`, but without success.
Thanks for your tests. Of course, it is possible to produce the exact same rates with different 2nd-order statistics, but to achieve that with two different runs of the same simulation (with different RNG seeds), which is what I would have suspected, is extremely unlikely, i.e. it can be excluded. I'll think about it a little more.
That is very true.
Hey @mschmidt87, any thoughts?
I've done a little bit of code archeology and found a change in the LvR calculation. Here is what I get if I calculate the LvR with and without this change:

```python
from correlation_toolbox import helper as ch  # imported by the original analysis code; unused below
import numpy as np

# **NOTE** this is heavily based off the analysis code from the paper
def load(filename, duration_s, num, check):
    tmin = 500.
    subsample = 2000  # kept from the original analysis code; unused below
    resolution = 1.   # kept from the original analysis code; unused below
    tref = 2.0        # refractory period (ms) entering the LvR formula
    spikes = np.load(filename)

    # Calculate LvR (revised local variation, cf. Shinomoto et al. 2009)
    # from the inter-spike intervals of every neuron that spiked between
    # tmin and the end of the simulation
    i_min = np.searchsorted(spikes[:, 1], tmin)
    i_max = np.searchsorted(spikes[:, 1], duration_s * 1000.0)
    LvR = np.array([])
    data_array = spikes[i_min:i_max]
    for i in np.unique(data_array[:, 0]):
        intervals = np.diff(data_array[np.where(data_array[:, 0] == i)[0], 1])
        if intervals.size > 1:
            val = np.sum((1. - 4 * intervals[0:-1] * intervals[1:] /
                          (intervals[0:-1] + intervals[1:]) ** 2) *
                         (1 + 4 * tref / (intervals[0:-1] + intervals[1:])))
            LvR = np.append(LvR, val * 3 / (intervals.size - 1.))
        else:
            LvR = np.append(LvR, 0.0)

    # CHANGE HERE: pad the result with 0.0 for every neuron that never spiked
    if check and len(LvR) < num:
        LvR = np.append(LvR, np.zeros(num - len(LvR)))
    return np.mean(LvR)

# Values from the json file on gnode
dataset_fef_6e_lvr = 0.4178813296671444
dataset_fef_5i_lvr = 0.9737456740769203
duration_s = 10.5

# Population sizes
num_fef_5i = 3721
num_fef_6e = 16128

# Load data
nest_fef_5i_lvr = load("33fb5955558ba8bb15a3fdce49dfd914682ef3ea-spikes-FEF-5I.npy", duration_s, num_fef_5i, False)
nest_fef_6e_lvr = load("33fb5955558ba8bb15a3fdce49dfd914682ef3ea-spikes-FEF-6E.npy", duration_s, num_fef_6e, False)
print("FEF 5I LvR - NEST:%f, Dataset:%f" % (nest_fef_5i_lvr, dataset_fef_5i_lvr))
print("FEF 6E LvR - NEST:%f, Dataset:%f" % (nest_fef_6e_lvr, dataset_fef_6e_lvr))
```

With `check=True` (i.e. with the change) and `check=False` (the behaviour before the change) I get different values, and the `check=False` results are significantly closer to the values in the published JSON. As this change was made after the original submission date, might the published LvR data have been calculated prior to this change?
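For what it's worth, a quick way to see both conventions side by side is to flip the last argument of the `load` sketch above (same hypothetical usage as in the snippet):

```python
for check in (False, True):
    lvr = load("33fb5955558ba8bb15a3fdce49dfd914682ef3ea-spikes-FEF-6E.npy",
               duration_s=10.5, num=16128, check=check)
    print("check=%s: FEF 6E LvR = %f" % (check, lvr))
```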
I think you are probably right that this padding of the results with 0.0 values for silent neurons had not been used in the publication and the published data. The methods section of the manuscript does not mention the padding, and your calculations indicate that at least the new results are closer to the published ones with `check=False`. I guess it's a question of definition how silent neurons should be accounted for in the calculations, but I would now say that assigning them a value of 0 does not make much sense. I think that @akorgor applied your code to more populations and can confirm your findings.
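To make the two conventions concrete (toy numbers only, nothing from the actual data):

```python
import numpy as np

lvr_spiking = np.array([0.9, 1.1, 1.0, 0.8])  # hypothetical per-neuron LvR values
n_total = 8                                   # population size including silent neurons

mean_spiking_only = lvr_spiking.mean()        # 0.95: average over spiking neurons
mean_padded = np.append(lvr_spiking,
                        np.zeros(n_total - len(lvr_spiking))).mean()  # 0.475

# Padding drags the population mean towards 0, even though LvR is simply
# undefined for a neuron that never spiked.
print(mean_spiking_only, mean_padded)
```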
Thanks for looking into this some more, @mschmidt87! It's still weird that the value for FEF 5I fits exactly but the one for FEF 6E doesn't, though - again, that seems unlikely to be caused by two different runs of the same simulation.
@akorgor has kindly shared the full set of spike trains with me, so I thought I'd share this (very bad) plot of the distribution of LvR metrics for 6E: [two-panel plot of LvR distributions for the 6E populations]
I found that if […] is changed in the LvR code snippet above, the recalculated values match the published ones.
For the correlation coefficient calculation, however, I could not find the reason for the deviations so far. The gnode values are larger than the ones from the recalculation for the majority of the populations (15/24). The differences range from -0.00025 to +0.00025 and do not seem to depend on the total number of neurons in the population or on the number of spiking neurons. Except for FEF 6E, there are always more than […]
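For reference, a generic version of such a calculation might look like this (my own binning choices and function name; not necessarily what the repository or `correlation_toolbox` does):

```python
import numpy as np

# Bin each neuron's spike times, then average the off-diagonal pairwise
# correlation coefficients of the binned spike counts.
def mean_pairwise_cc(spikes, t_min=500.0, t_max=10500.0, bin_ms=1.0):
    ids = np.unique(spikes[:, 0])
    edges = np.arange(t_min, t_max + bin_ms, bin_ms)
    counts = np.stack([np.histogram(spikes[spikes[:, 0] == i, 1], edges)[0]
                       for i in ids])
    cc = np.corrcoef(counts)
    # NaNs (from neurons with no spikes in the window) are ignored
    return np.nanmean(cc[~np.eye(len(ids), dtype=bool)])
```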
Thank you @akorgor for that detective work (and obviously sorry for the misleading code and the confusion it created). I think these choices are definitely debatable; however, since the deviations from the data shown in the paper are not too large, I would suggest just editing the code in the repo so that the results match, and sticking to the algorithm used for the paper.
What does the lower panel show? I think it can't be the values from the published paper, because the values for 6E are mostly well below 1 (Fig. 3F).
The lower panel shows the values in the json file on gnode (from the simulation where chi=1.9).
Thanks to your awesome help with #12, I've been comparing some of our simulation runs with the `33fb5955558ba8bb15a3fdce49dfd914682ef3ea` dataset and am having some issues, especially with the populations with low firing rates. To try and narrow down where this is coming from, I tried re-using the analysis code from the repository on the FEF spike data from the `33fb5955558ba8bb15a3fdce49dfd914682ef3ea` dataset. The output shows fairly significant differences from the values in the json file in the same directory.
Am I doing something dumb or missing something?
Thanks in advance for your help!