Join ComplexityMeasures.jl? #6

Closed
Datseris opened this issue Jun 10, 2024 · 1 comment
@Datseris

Hi there,

with this issue we would like to do three things:

    1. share with you the paper we just wrote about ComplexityMeasures.jl, in which we compare against EntropyHub, and ask for your feedback and corrections,
    2. invite you to join ComplexityMeasures.jl, and
    3. offer an alternative if 2) isn't really an option for you!

1: Comparison

Our paper was just put on arXiv: https://arxiv.org/abs/2406.05011. The associated code base that does the performance comparison is here: https://github.com/Datseris/ComplexityMeasuresPaper . We compared performance of the Julia version, assuming that this would be the most performant implementation of the software. Please provide us with numbers from Python or MATLAB versions if you believe these should be faster.

Please also let us know whether you think the overall comparison with EntropyHub in our Table 1 is accurate, or whether you believe it is unfair (and if so, how we should fix it). In particular, it was difficult for us to compare against EntropyHub and accurately estimate its total number of measures, mainly because it was unclear to us what "Multivariate" or "Bidimensional" entropies are (some personal feedback here from George Datseris: it would be nice if the docs explained what these quantities actually are, beyond citing the articles).

Bidimensional

In ComplexityMeasures.jl we have spatial outcome spaces, so that one can estimate the entropy of permutation patterns on spatial data (2D or arbitrarily high-dimensional), given a stencil. This is done in the standard way one estimates the permutation entropy, but instead of a timeseries, the "stencil" (which for normal permutation entropy would be a length-m view) is iterated over the 2D image. We were confused by the statements in the docs that one would "run out of memory". In our implementations of spatial complexity measures, practically no memory is allocated, and one can estimate the spatial permutation entropy of arbitrarily large matrices. This makes us suspect: perhaps we are not talking about the same thing...?
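For concreteness, here is a minimal, language-agnostic sketch of the idea in Python (a hypothetical toy, not ComplexityMeasures.jl's actual implementation): the stencil is a set of relative offsets, and only a histogram of the observed ordinal patterns is kept, so extra memory is independent of the image size:

```python
from collections import Counter
from math import log

def spatial_permutation_entropy(img, stencil, base=2):
    # `img` is a 2D array (list of rows); `stencil` is a list of (di, dj)
    # offsets defining which pixels form one ordinal pattern.
    rows, cols = len(img), len(img[0])
    max_di = max(di for di, _ in stencil)
    max_dj = max(dj for _, dj in stencil)
    counts = Counter()
    # Slide the stencil over every valid position; only the pattern
    # histogram is stored, so memory is O(number of distinct patterns).
    for i in range(rows - max_di):
        for j in range(cols - max_dj):
            vals = [img[i + di][j + dj] for di, dj in stencil]
            counts[tuple(sorted(range(len(vals)), key=vals.__getitem__))] += 1
    n = sum(counts.values())
    return -sum(c / n * log(c / n, base) for c in counts.values())
```

For example, a constant image yields a single pattern and therefore zero entropy, regardless of the image size.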

Multivariate

In ComplexityMeasures.jl we make no distinction between univariate and multivariate timeseries. Everything that can be discretized / cast into a symbol, from which probabilities are estimated and given to the Shannon formula, is valid input that we simply call a "timeseries". When estimating the permutation entropy, we cast the input timeseries into a sequence of ordinal patterns.
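As an illustration of that pipeline, a minimal sketch in Python (again hypothetical, not the package's actual code): each delayed length-m view is mapped to the permutation that sorts it, and the relative frequencies of those patterns go into the Shannon formula:

```python
from collections import Counter
from math import log, factorial

def ordinal_patterns(x, m=3, tau=1):
    # Map each length-m delayed view of x to the permutation that sorts it.
    span = (m - 1) * tau
    return [tuple(sorted(range(m), key=lambda i: x[t + i * tau]))
            for t in range(len(x) - span)]

def permutation_entropy(x, m=3, tau=1, base=2, normalize=False):
    pats = ordinal_patterns(x, m, tau)
    counts = Counter(pats)
    n = len(pats)
    h = -sum(c / n * log(c / n, base) for c in counts.values())
    if normalize:
        # Maximum entropy corresponds to m! equiprobable patterns.
        h /= log(factorial(m), base)
    return h
```

A monotonically increasing series produces a single pattern and hence zero entropy; the normalized variant divides by the entropy of the uniform distribution over the m! possible patterns.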

In EntropyHub, we are not sure what "multivariate permutation entropy" means, exactly, from a computational point of view. We are mainly confused by the documentation statement "the permutation entropy for the M multivariate sequences in Data". A multivariate input has only a single multivariate sequence; that's kind of why it's called "multivariate". Otherwise the input would be many univariate timeseries from which one picks "M" multivariate sequences. Can you please explain what we have misunderstood, and whether, or how, the multivariate permutation entropy differs from the univariate permutation entropy?

2: Join ComplexityMeasures.jl

More than 7 years ago, the DynamicalSystems.jl project started with the goal of making nonlinear dynamics and nonlinear timeseries analysis accessible, in the sense of giving people well-tested, well-documented, universal software that is also easy to contribute to and become a developer of, all while following as open a development approach as possible. ComplexityMeasures.jl shares these goals. In our paper we highlight our approach to open development of code, and we take special care to design the source code so that it is very easy for a newcomer to contribute.

There is clearly a lot of overlap between ComplexityMeasures.jl and EntropyHub: most measures are implemented in both. We believe it is best for the wider community of academics and non-academics interested in complexity and entropy applications, that a single, overarching software exists, that includes all positive aspects of all current disparate software while addressing and eliminating as many negative aspects as possible.

EntropyHub clearly had a lot of thought and development effort go into it, as reflected by the numerous measures it provides, some of which are not in ComplexityMeasures.jl yet. Additionally, pending clarification of the questions in the previous section, there may be much more functionality from the literature in EntropyHub that doesn't yet exist in ComplexityMeasures.jl and that we've simply missed. The developer(s) of EntropyHub have clearly spent their fair share of time on entropic/complexity timeseries analysis and have generated significant expertise. Having such a developer in ComplexityMeasures.jl would increase productivity for the community as a whole, and generate even more discussions and suggestions for new measures in the already extensive open list of possible additions to the software (https://github.com/JuliaDynamics/ComplexityMeasures.jl/issues). Lastly, EntropyHub has experience in providing software for multiple languages, which we have little experience with. Unifying the two would have a big positive impact on users of Python and Matlab.

Note: there is currently a large amount of code duplication, not only within the EntropyHub.jl repo itself, but in essence the code is duplicated thrice when the Python and Matlab versions are considered too. In the unified future we envision, the Python version would be a wrapper of the Julia code via PythonCall.jl, and similarly for Matlab. This would allow Python and Matlab users to harness the power of the highly performant Julia code, while severely reducing maintenance effort across three separate libraries (this would be especially impactful for code reliability, as only one code base would need to be tested).

ComplexityMeasures.jl, on the other hand, has the main advantage of its fundamentally novel design, which orthogonalizes how a complexity measure is created/computed; this is the main point of our paper, Section 2. It took us about 2 years of intense research to arrive at this design. Please do read Section 2, as we believe this design has some truly unique advantages over the traditional approach of one function per estimator.
Some other advantages of ComplexityMeasures.jl are the large performance improvements over EntropyHub, smaller source code (per measure), larger documentation, an extensive test suite, and software development knowledge (stemming from 7+ years of building JuliaDynamics). For example, not having a plotting dependency makes compilation of the package much faster, and such functionality can now be managed with package extensions (cf. #5).

Porting all of these advantages over to EntropyHub would take much more effort, and more redesigning, than porting EntropyHub's advantages to ComplexityMeasures.jl, which already has implemented an extendable design.

We firmly believe that unifying our efforts, instead of both of us "reinventing the wheel" by re-implementing features that already exist in the other software, is the best outcome for the community. That is why we hope you will consider joining ComplexityMeasures.jl instead of further developing EntropyHub.

3: The alternative

Being realistic, we understand that the above is unlikely to happen, because EntropyHub already has a publication associated with it. That is why we propose a middle-ground solution that stops the wheel-reinvention problem and allows us to join hands, while still letting you maintain and promote EntropyHub.

Make EntropyHub.jl a wrapper of ComplexityMeasures.jl. All of its source code would be wiped out and replaced by wrapper functions that keep their current names but call the corresponding ComplexityMeasures.jl implementation. For example, the 200 lines of source code of the permutation entropy would be replaced by

function PermEn(Sig::AbstractArray{T,1} where T<:Real; m::Int=2, tau::Int=1, 
        Typex::String="none", tpx::Union{Real,Nothing}=nothing, Logx::Real=2, Norm::Bool=false)
    
    # decide the type based on the string
    if Typex == "none"
        ospace = OrdinalPatterns(; m, τ = tau)
    else
        # more here, utilizing tpx
    end
    est = Shannon(; base = Logx)
    if Norm
        return information_normalized(est, ospace, Sig)
    else
        return information(est, ospace, Sig)
    end
end

Additionally, in this way, the actual estimation of the permutation entropy would be tested against our existing test suite.

When you want to add a new method, you implement it normally in ComplexityMeasures.jl following our Developer's Documentation, and then make a wrapper function in EntropyHub. We follow agile development practices: even the tiniest addition to ComplexityMeasures.jl instantly generates a new package version, so any change in ComplexityMeasures.jl would immediately be reflected in EntropyHub.

But despite this possible solution, we still believe that having a single unified software is the best way forward.


We hope you consider our proposal, and we stress again that we want to make the comparison in our paper as fair as possible; if we missed anything, or mis-represented EntropyHub in any way, please do let us know.

best,
George and Kristian (cc @kahaaga)

@MattWillFlood (Owner)

Warmest congratulations on the release of ComplexityMeasures.jl and on the preprint of your paper about the package. As highlighted in the preprint, it is a wonderful resource that has been cleverly designed and will offer many advantages to its users. You and your team should be commended for the monumental amount of time and effort that has gone into it.

1: Comparison

Bidimensional
The bidimensional entropies in EntropyHub (e.g. SampEn2D or PermEn2D) are just the same as the measures in ComplexityMeasures.jl for 2D outcome spaces. Instead of the term stencil, we use template. The memory issue we mention is related to the way our functions are written (apologies for any confusion there). We'll have to look at CM.jl to see how we can reduce memory allocation as you've achieved!

Multivariate
Functions in EntropyHub are differentiated by the type of data they process. So the Base functions process univariate sequences, Cross functions process two univariate sequences, and Bidimensional functions process 2D univariate matrices. Multivariate functions take an NxM matrix representing the M multiple sequences of N samples making up the multivariate series (e.g. M channels of an EEG recording). We'll correct that phrasing in the docs. Again, apologies for the confusion.


Overall comparison
Generally speaking, I think the comparisons made with EntropyHub are fair. There are only a couple of things that I would suggest to consider.

In Table 1, it states that there are 38 complexity measures in EntropyHub. Perhaps we've misinterpreted how the like-for-like comparison is made, but we believe this number could be slightly inaccurate.
In EntropyHub, there are currently 20 standard complexity measures (what we've termed "Base" entropies). Each of these 20 measures can be combined with each of the 4 multiscale entropy functions (in just two lines of code) to estimate its multiscale variant. In our view, that means there are at least 80 potential complexity measures that can be implemented. Similarly, each of the 5 multivariate entropies can be combined with the 2 multivariate multiscale functions to add another 10 measures, and there are also 6 bidimensional entropy measures.
Not including the cross-entropy and multiscale cross-entropy measures, for absolute consistency with CM.jl, there are at present 96 (80 + 10 + 6) measures rather than 38.
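The arithmetic behind that count, spelled out (counts taken directly from the paragraph above):

```python
# Measure counts as stated above.
base_entropies, multiscale_variants = 20, 4
multivariate_entropies, mv_multiscale_variants = 5, 2
bidimensional_entropies = 6

# Every base entropy combines with every multiscale function, and every
# multivariate entropy with every multivariate multiscale function.
total = (base_entropies * multiscale_variants
         + multivariate_entropies * mv_multiscale_variants
         + bidimensional_entropies)
print(total)  # 96
```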

Three more examples have been added to the EntropyHub docs recently, bringing the total to 13 (not 10).


2: Joining ComplexityMeasures.jl

It's very kind of you to invite the EntropyHub developers to join ComplexityMeasures.jl. Like you, we have invested an immense amount of time and effort to produce a resource that makes complexity analysis more efficient. Personally speaking, my time is so limited these days that any contribution I could give to CM.jl would be too insignificant to have any productive influence.

Abandoning EntropyHub is not something I wish to do, for a few reasons. As you note, there is a publication associated with EntropyHub, and we wish to keep all things consistent with that paper. Additionally, EntropyHub already forms a substantial core of other packages (e.g. NeuroKit2 has copied nearly all entropy functions from EntropyHub into their toolkit: https://neuropsychology.github.io/NeuroKit/functions/complexity.html#entropy).
In any case, given the claims made in the preprint about the power of CM.jl, then with or without EntropyHub, a single unified software for complexity analysis may materialise regardless...


We sincerely thank you again for your proposal and wish you every success with CM.jl!!
