Join ComplexityMeasures.jl? #6
Warmest congratulations on the release of ComplexityMeasures.jl and on the preprint of your paper about the package. As highlighted in the preprint, it is a wonderful resource that has been cleverly designed and will offer many advantages to its users. You and your team should be commended for the monumental amount of time and effort that has gone into it.

1: Comparison

In Table 1, it states that there are 38 complexity measures in EntropyHub. Perhaps we've misinterpreted how the like-for-like comparison is made, but we believe this number could be slightly inaccurate. Three more examples have been added to the EntropyHub docs recently, bringing the total to 13 (not 10).

2: Joining ComplexityMeasures.jl

It's very kind of you to invite the EntropyHub developers to join ComplexityMeasures.jl. Like you, we have invested an immense amount of time and effort to produce a resource that makes complexity analysis more efficient. Personally speaking, my time is so limited these days that any contribution I could give to CM.jl would be too insignificant to have any productive influence. Abandoning EntropyHub is not something I wish to do, for a few reasons. As you note, there is a publication associated with EntropyHub, and we wish to keep all things consistent with that paper. Additionally, EntropyHub already

We sincerely thank you again for your proposal and wish you every success with CM.jl!!
Hi there,
with this issue we would like to do three things:
1: Comparison
Our paper was just put on arXiv: https://arxiv.org/abs/2406.05011. The associated code base that does the performance comparison is here: https://github.com/Datseris/ComplexityMeasuresPaper . We compared against the Julia version of EntropyHub, assuming it would be the most performant implementation of the software. Please provide us with numbers from the Python or MATLAB versions if you believe those should be faster.
Please also let us know whether you think the overall comparison with EntropyHub in our Table 1 is accurate, or whether you believe it is unfair (and if so, how we should fix it). In particular, it was difficult for us to compare against and accurately estimate the total number of measures in EntropyHub.
Mainly, this was because it was unclear to us what "Multivariate" entropies or "Bidimensional" entropies are (some personal feedback here from George Datseris: it would be nice if the docs explained what these quantities actually are, beyond citing the articles).
Bidimensional
In ComplexityMeasures.jl we have spatial outcome spaces, so that one can estimate the entropy of permutation patterns on spatial data (2D or arbitrarily high-D), given a stencil. This is done in the standard way one estimates the permutation entropy, but instead of sliding over a timeseries, the "stencil" (which for normal permutation entropy would be a length-m view) is iterated over the 2D image. We were confused by the statements in the docs that one would "run out of memory". In our implementation of spatial complexity measures, practically no memory is allocated, and one can estimate the spatial permutation entropy of arbitrarily large matrices. This makes us suspect: perhaps we are not talking about the same thing...?
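To make the above concrete, here is a minimal sketch (not the ComplexityMeasures.jl source; function name and the 2x2 stencil are illustrative) of how a stencil can be iterated over a matrix while allocating only a small pattern-count dictionary, independent of the matrix size:

```julia
# Sketch: spatial permutation entropy with a 2x2 stencil. The only
# persistent allocation is the small counts dictionary; memory use does
# not grow with the size of the input matrix.
function spatial_permutation_entropy(A::AbstractMatrix)
    counts = Dict{NTuple{4,Int},Int}()
    for j in 1:size(A, 2)-1, i in 1:size(A, 1)-1
        window = [A[i, j], A[i+1, j], A[i, j+1], A[i+1, j+1]]
        pattern = Tuple(sortperm(window))  # ordinal pattern of this window
        counts[pattern] = get(counts, pattern, 0) + 1
    end
    N = sum(values(counts))
    # Shannon entropy of the pattern frequencies
    return -sum(c / N * log(c / N) for c in values(counts))
end

h = spatial_permutation_entropy(rand(200, 200))
# 0 ≤ h ≤ log(4!) = log(24) ≈ 3.18 for a 2x2 stencil
```

The same loop generalizes to arbitrary stencil shapes and dimensions; only the window-gathering line changes.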
Multivariate
In ComplexityMeasures.jl we make no distinction between uni-variate and multi-variate timeseries. Anything that can be discretized / cast into symbols, from which probabilities are estimated and passed to the Shannon formula, is valid input that we simply call a "timeseries". When estimating the permutation entropy, we cast the input timeseries into a sequence of ordinal patterns.
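The "cast into ordinal patterns" step can be sketched in a few lines (an illustrative helper, not the package source): each length-m window is replaced by the permutation that sorts it, and probabilities are then estimated from the pattern frequencies.

```julia
# Sketch: map a timeseries to its sequence of ordinal patterns of order m.
ordinal_patterns(x::AbstractVector, m::Int) =
    [Tuple(sortperm(x[i:i+m-1])) for i in 1:length(x)-m+1]

x = [4.0, 7.0, 9.0, 10.0, 6.0, 11.0, 3.0]  # example series from Bandt & Pompe (2002)
patterns = ordinal_patterns(x, 3)
# the first window (4, 7, 9) is increasing, so its pattern is (1, 2, 3)
```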
In EntropyHub, we are not sure what "multivariate permutation entropy" means, exactly, from a computing point of view. We are mainly confused by the documentation statement "the permutation entropy for the M multivariate sequences in Data". A multivariate input has only a single multivariate sequence; that is kind of why it's called "multivariate". Otherwise the input would be many univariate timeseries from which one picks "M" multivariate sequences. Can you please explain what we have misunderstood, and whether, or how, the multivariate permutation entropy is different from the univariate permutation entropy?
2: Join ComplexityMeasures.jl
More than 7 years ago, the DynamicalSystems.jl project started with the goal of making nonlinear dynamics and nonlinear timeseries analysis accessible: giving people well-tested, well-documented, universal software that is also easy to contribute to as a new developer, all while following as open a development approach as possible. ComplexityMeasures.jl shares these goals. In our paper we highlight our approach to open code development, and we take special care to design the source code so that it is very easy for a newcomer to contribute.
There is clearly a lot of overlap between ComplexityMeasures.jl and EntropyHub: most measures are implemented in both. We believe it is best for the wider community of academics and non-academics interested in complexity and entropy applications that a single, overarching software exists, one that includes all the positive aspects of the current disparate packages while addressing and eliminating as many negative aspects as possible.
EntropyHub clearly had a lot of thought and development effort go into it, as reflected by the numerous measures it provides, some of which are not yet in ComplexityMeasures.jl. Additionally, pending clarification of the questions in the previous section, there may be much more functionality and methods from the literature in EntropyHub that don't yet exist in ComplexityMeasures.jl and that we've simply missed. The developer(s) of EntropyHub have clearly spent their fair share of time on entropic/complexity timeseries analysis and have built significant expertise. Having such a developer in ComplexityMeasures.jl would increase productivity for the community as a whole, and generate even more discussions and suggestions for new measures in the already extensive open list of possible additions to the software (https://github.com/JuliaDynamics/ComplexityMeasures.jl/issues). Lastly, EntropyHub has experience in providing software for multiple languages, which we have little experience with. Unifying the two would have a big positive impact on users of Python and MATLAB. Note: there is currently a large amount of code duplication, not only within the EntropyHub.jl repo itself; in essence, the code is duplicated thrice when the Python and MATLAB versions are considered too. In the unified future we envision, the Python version would be a wrapper of the Julia code via PythonCall.jl, and similarly for MATLAB. This would allow Python and MATLAB users to harness the power of the highly performant Julia code, while severely reducing maintenance effort across three separate libraries (this would be especially impactful for code reliability, as only one code base would need to be tested).
ComplexityMeasures.jl, on the other hand, has the main advantage of its fundamentally novel design of orthogonalizing how one creates/computes a complexity measure, which is the main point of Section 2 of our paper. It took us about 2 years of intense research to arrive at this design. Please do have a read of Section 2, as we believe this design has some truly unique advantages over the traditional approach of one function per estimator.
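The orthogonal design can be illustrated with a short usage sketch (assuming the documented ComplexityMeasures.jl API; exact constructor names and keywords may differ across package versions):

```julia
using ComplexityMeasures  # assumed API, per the package docs

x = rand(10_000)

# One generic `information` function: any information measure can be
# combined with any outcome space (discretization), instead of one
# hand-written function per estimator.
h_perm   = information(Shannon(),    OrdinalPatterns(m = 3), x)
h_renyi  = information(Renyi(q = 2), OrdinalPatterns(m = 3), x)
h_binned = information(Shannon(),    ValueBinning(RectangularBinning(10)), x)
```

Adding a new outcome space or a new measure then automatically yields every combination of the two, which is what "orthogonalizing" buys.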
Some other advantages of ComplexityMeasures.jl are the large performance improvements over EntropyHub, smaller source code (per measure), larger documentation, an extensive test suite, and software development knowledge (stemming from 7+ years of building JuliaDynamics). For example, not having a hard plotting dependency makes compilation of the package much faster, and plotting can now be managed with package extensions (c.f. #5).
Porting all of these advantages over to EntropyHub would take much more effort, and more redesigning, than porting EntropyHub's advantages to ComplexityMeasures.jl, which already implements an extensible design.
We definitely believe that unifying our efforts, instead of both of us constantly "reinventing the wheel" by implementing features that already exist in the other software, is really the best outcome for the community. That is why we hope you will consider joining ComplexityMeasures.jl instead of further developing EntropyHub.
3: The alternative
Being realistic, we understand that the above is unlikely to happen, because EntropyHub already has a publication associated with it. That is why we propose a middle-ground solution that stops the wheel-reinvention problem and allows us to join hands, while still allowing you to keep and promote EntropyHub.
Make EntropyHub.jl a wrapper of ComplexityMeasures.jl. All of its source code would be wiped out and replaced by wrapper functions that keep their current names but call the corresponding ComplexityMeasures.jl implementation. For example, the roughly 200 lines of permutation entropy source code would shrink to a few lines of wrapper code.
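For illustration, such a wrapper might look like the following (a hypothetical sketch: the EntropyHub-style name `PermEn`, its keywords, and the ComplexityMeasures.jl call are assumptions, not verified signatures from either code base):

```julia
using ComplexityMeasures  # assumed API

# Hypothetical wrapper: keep the EntropyHub-style name and interface,
# but delegate the actual computation to ComplexityMeasures.jl.
function PermEn(x::AbstractVector; m::Int = 2, tau::Int = 1)
    return information(Shannon(), OrdinalPatterns(m = m, τ = tau), x)
end

# PermEn(rand(1000); m = 3)  # same call site, single underlying implementation
```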
Additionally, in this way, the actual estimation of the permutation entropy would be tested against our existing test suite.
When you want to add a new method, you implement it as usual in ComplexityMeasures.jl, following our Developer's Documentation, and then add a wrapper function in EntropyHub. We follow agile development practices: even the tiniest addition to ComplexityMeasures.jl instantly generates a new package version, so any change in ComplexityMeasures.jl would be immediately reflected in EntropyHub.
But despite this possible solution, we still believe that a single unified software is the best way forward.
We hope you consider our proposal, and we stress again that we want to make the comparison in our paper as fair as possible; if we missed anything, or misrepresented EntropyHub in any way, please do let us know.
best,
George and Kristian (cc @kahaaga)