Formula for Gini coefficient #92
-
Hi Peter, I wanted to ask about the calculation used in the code to compute, for example, the Gini coefficient of firing rates (https://github.com/petersenpeter/CellExplorer/blob/master/ProcessCellMetrics.m#L1412). The formula looks simple, and I am trying to work out how to derive it from the original definition. I believe it diverges from the Wikipedia formula for the Gini coefficient calculated on sorted values, or from its unbiased estimator (https://en.wikipedia.org/wiki/Gini_coefficient#Alternative_expressions). I appreciate your work on the tool and find it extremely helpful for viewing and reasoning about my data! Kind regards,
-
Hi Prez,
The Gini calculation follows the definition on the Wikipedia page (I use the third definition, 1 − 2B):
Gini = A/(A + B) = 2A = 1 − 2B
where B is the cumulative sorted distribution of X values across the population, which could be the average firing rates across the population (L1408-L1412).
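As a reading aid for the identity above, here is a short sketch of why the three forms agree. It uses the standard Lorenz-curve picture from the Wikipedia page; the symbol L(p) is notation introduced here and does not appear in the reply. A is the area between the line of equality and the Lorenz curve, and B is the area under the Lorenz curve.

```latex
\begin{aligned}
A + B &= \tfrac{1}{2}
  && \text{(area under the line of equality on the unit square)} \\
G &= \frac{A}{A + B} = 2A = 1 - 2B \\
B &= \int_0^1 L(p)\,\mathrm{d}p
  && \text{(approximated in practice by averaging the cumulative sorted shares)}
\end{aligned}
```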
-
Thanks for clarifying; I can understand the reasoning now. A concern with the current implementation might be that the calculated values are somewhat off. For example:
The values get closer to the expected coefficient as the vector x gets longer, so this might be less of an issue in real-world scenarios.
-
Hi again Prez,
I gave this another go. Please see the new calc_gini function:
You are correct that equal bins with perfect equality should give 0. I believe I made a mistake in the normalization by the number of samples: I normalized the sum (line 3 below) by the number of cumulated values, n, where I should have normalized by n+1.
After this correction, perfect equality gives a Gini coefficient of 0. Perfect inequality can only be reached with a very large n, and you will always have a small binning error. Let me know what you think.
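To make the effect of the normalization concrete, here is a minimal Python sketch reconstructed only from the description above. It is not the calc_gini MATLAB function from the repository; the function name gini_from_cumsum and the denominator_offset parameter are hypothetical, introduced purely for illustration.

```python
from itertools import accumulate


def gini_from_cumsum(values, denominator_offset=1):
    """Estimate Gini as 1 - 2B from the cumulative sorted shares.

    Hypothetical helper, not the CellExplorer implementation.
    denominator_offset=0 divides the summed shares by n (the earlier
    normalization described in the thread); denominator_offset=1
    divides by n + 1 (the corrected normalization).
    """
    ordered = sorted(values)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("values must have a positive sum")
    # Cumulative share of the total held by the i smallest values.
    cumulative_shares = [c / total for c in accumulate(ordered)]
    n = len(cumulative_shares)
    area_b = sum(cumulative_shares) / (n + denominator_offset)
    return 1 - 2 * area_b


equal = [1, 1, 1, 1]
print(gini_from_cumsum(equal, denominator_offset=0))  # -0.25, i.e. -1/n
print(gini_from_cumsum(equal))                        # 0.0 with n + 1

unequal = [0, 0, 0, 1]
print(gini_from_cumsum(unequal))                      # about 0.6, i.e. (n - 1)/(n + 1)
```

Under this reading, dividing by n gives −1/n for perfectly equal values, which shrinks as the vector grows, while dividing by n+1 gives exactly 0. A single nonzero value gives (n − 1)/(n + 1), which approaches 1 only for large n.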