Fitting user-specified PDF, e.g. power spectral density #62

smartass101 · 2018-10-02T14:20:44Z

I really like your software, it makes it easier to judge the hype of powerlaws in datasets.
However, right now it focuses on fitting full datasets, creating their PDF and CDF on the fly. I'd like to use it in situations where I already have a PDF (defined at several points), or more generally a distribution function of some sort, and fit its shape in some range.
An example is the power spectral density of fluctuations in turbulent plasmas, where there is an [ongoing discussion](https://dx.doi.org/10.1103/PhysRevLett.107.185003) whether they are powerlaws or exponentials.

I'd be willing to contribute modifications to powerlaw which would make this optional use case possible, but I would greatly appreciate it if you could point out how best to approach this issue.

jeffalstott · 2018-10-02T23:12:10Z

Thanks, Ondrej! For your needs, this is the relevant paper: https://projecteuclid.org/euclid.aoas/1396966280 I haven't implemented it, but there may be an implementation somewhere here: http://tuvalu.santafe.edu/~aaronc/powerlaws/ If a good implementation were created for powerlaw, I'd happily bring it on board.


smartass101 · 2018-10-03T06:44:35Z

Thank you for the reply.
My naive hope was that it would suffice to simply enable the user to specify the cdf and bins directly, i.e. set self.fitting_cdf_bins, self.fitting_cdf without the actual data as done [here](self.fitting_cdf_bins, self.fitting_cdf). Then I would probably have to change operations later on to operate on the CDF instead of the data itself.
Perhaps a reasonable approach would be to wrap the data in some object which would expose methods such as cdf. This would separate the source of the information about the data distribution from the actual calculations done with the distribution.
But perhaps I have missed some part where access to actual data is necessary.
What do you think about this approach?
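
To make the wrapper idea above concrete, here is a minimal sketch. The class names `SampleSource` and `TabulatedSource` are hypothetical and are not part of powerlaw; the only assumption is that downstream fitting code would need nothing more than a `cdf(x)` method.

```python
import numpy as np

class SampleSource:
    """Hypothetical wrapper: distribution information comes from raw data points."""
    def __init__(self, data):
        self.data = np.sort(np.asarray(data, dtype=float))

    def cdf(self, x):
        # Fraction of samples less than or equal to x
        return np.searchsorted(self.data, x, side="right") / self.data.size

class TabulatedSource:
    """Hypothetical wrapper: distribution information comes from a PDF given at points."""
    def __init__(self, points, pdf):
        self.points = np.asarray(points, dtype=float)
        pdf = np.asarray(pdf, dtype=float)
        # Cumulative trapezoidal integral of the tabulated PDF, normalized to end at 1
        areas = 0.5 * (pdf[1:] + pdf[:-1]) * np.diff(self.points)
        cum = np.concatenate(([0.0], np.cumsum(areas)))
        self.cum = cum / cum[-1]

    def cdf(self, x):
        # Linear interpolation between tabulated points; clamps to 0 and 1 outside the range
        return np.interp(x, self.points, self.cum)

from_samples = SampleSource([1.0, 2.0, 3.0, 4.0])
from_table = TabulatedSource([0.0, 1.0, 2.0], [1.0, 1.0, 1.0])
```

Both objects answer `cdf(x)` the same way, so code written against that method would not need to know where the distribution came from. Whether the rest of the fitting procedure can work without the raw data is exactly the open question.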

smartass101 · 2018-10-03T06:51:10Z

I also found out that their implementation of the operations on binned data is available at http://tuvalu.santafe.edu/~aaronc/powerlaws/bins/

jeffalstott · 2018-10-03T19:16:56Z

The methods currently in powerlaw do not do fitting based on the binned data; they work directly on the data points themselves. Binning is done only for visualizing PDFs (in a sense there is no binning for CDFs, which is actually a major reason to use them for visualization, as they do less damage to the data in presentation).


smartass101 · 2018-10-05T06:29:28Z

I've been reading that article and I began to realize that it may not be directly applicable to the PSD case. The reason is that most algorithms (FFT or wavelet) do not give the PSD as a histogram, but rather actual point-wise estimates, i.e. PSD(f_k) for all f_k. The f_k can be spaced either linearly (usually the case with FFT-based algorithms) or logarithmically (often the case in continuous wavelet analysis).

A dirty workaround (probably not completely wrong, but not right either) would be to generate surrogate datasets based on the PDF given by the PSD. I've seen it done e.g. here.
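
A rough sketch of that surrogate workaround, assuming the PSD is treated as an unnormalized PDF over the frequencies f_k. The function name `surrogate_from_psd` and the choice of `n_samples` are illustrative assumptions, not something taken from powerlaw or from the paper.

```python
import numpy as np

def surrogate_from_psd(freqs, psd, n_samples, rng=None):
    """Draw surrogate samples from the f_k with probability proportional to PSD(f_k) * width_k."""
    rng = np.random.default_rng(rng)
    freqs = np.asarray(freqs, dtype=float)
    psd = np.asarray(psd, dtype=float)
    # Weight each point by the frequency interval it represents, so that
    # linearly and logarithmically spaced f_k are treated consistently
    widths = np.gradient(freqs)
    weights = psd * widths
    probs = weights / weights.sum()
    return rng.choice(freqs, size=n_samples, p=probs)

# Synthetic example: a PSD that falls off as f**-2 on a linear frequency grid
freqs = np.linspace(1.0, 100.0, 200)
psd = freqs ** -2.0
samples = surrogate_from_psd(freqs, psd, n_samples=10_000, rng=0)
```

The resulting `samples` array could then be handed to `powerlaw.Fit` like any other dataset. Note that the samples only take the values f_k and that `n_samples` is arbitrary, so any goodness-of-fit or significance figures derived from them should be treated with caution.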

Perhaps I should get in touch with Clauset and ask him for guidance in this.

smartass101 · 2018-10-05T08:23:11Z

Clauset seems to be on sabbatical. I had another idea: perhaps I could simply use the Kolmogorov-Smirnov test to determine the "distance" between the PSD and a given distribution. Chi^2 might be an alternative. But that would mean determining the fitted parameters and f_k_min at the same time, and I'm not sure whether that would be a problem.
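
A minimal sketch of the Kolmogorov-Smirnov idea above, assuming the PSD is integrated into a CDF on each candidate range and compared with a power law truncated to the same range. All function names, the exponent grid, and the `min_points` guard are assumptions made for illustration; this is not how powerlaw fits data.

```python
import numpy as np

def tabulated_cdf(freqs, psd):
    """Normalized cumulative trapezoidal integral of the PSD over freqs."""
    areas = 0.5 * (psd[1:] + psd[:-1]) * np.diff(freqs)
    cum = np.concatenate(([0.0], np.cumsum(areas)))
    return cum / cum[-1]

def truncated_powerlaw_cdf(freqs, alpha):
    """CDF of p(f) ~ f**-alpha restricted to [freqs[0], freqs[-1]]; requires alpha != 1."""
    a = 1.0 - alpha
    return (freqs**a - freqs[0]**a) / (freqs[-1]**a - freqs[0]**a)

def scan_ks(freqs, psd, alphas, min_points=10):
    """Grid search over the starting frequency and alpha, minimizing the KS distance."""
    freqs = np.asarray(freqs, dtype=float)
    psd = np.asarray(psd, dtype=float)
    best = (np.inf, None, None)
    for i in range(len(freqs) - min_points + 1):
        f, p = freqs[i:], psd[i:]
        empirical = tabulated_cdf(f, p)
        for alpha in alphas:
            # KS distance: largest absolute gap between the two CDFs
            d = np.max(np.abs(empirical - truncated_powerlaw_cdf(f, alpha)))
            if d < best[0]:
                best = (d, f[0], alpha)
    return best

# Synthetic example: an exact f**-2.5 PSD on a logarithmic frequency grid
freqs = np.geomspace(1.0, 100.0, 200)
psd = freqs ** -2.5
d_best, f_min_best, alpha_best = scan_ks(freqs, psd, np.linspace(1.5, 3.5, 21))
```

Minimizing over both quantities at once may favor short ranges, since fewer points are easier to match; the `min_points` guard is one crude way to limit that. On exact synthetic data like this, many starting frequencies tie, so only the recovered exponent is meaningful here.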

smartass101 · 2018-10-05T08:26:20Z

Mentioning directly @aaronclauset in case you have time (and interest) to comment.
