Fitting user-specified PDF, e.g. power spectral density #62

Open
smartass101 opened this issue Oct 2, 2018 · 7 comments
Comments

smartass101 commented Oct 2, 2018

I really like your software, it makes it easier to judge the hype of powerlaws in datasets.
However, right now it focuses on fitting full datasets, creating their PDF and CDF on the fly. I'd like to use it in situations where I already have a PDF (defined at several points), or more generally a distribution function of some sort, and fit its shape over some range.
An example is the power spectral density of fluctuations in turbulent plasmas, where there is an ongoing discussion whether they are powerlaws or exponentials.

I'd be willing to contribute modifications to powerlaw that would make this optional use case possible, but I would greatly appreciate it if you could point out how best to approach this.

Owner

jeffalstott commented Oct 2, 2018 via email

@smartass101
Author

Thank you for the reply.
My naive hope was that it would suffice to let the user specify the CDF and bins directly, i.e. set `self.fitting_cdf_bins` and `self.fitting_cdf` without the actual data, as done here. I would then probably have to change the later operations to work on the CDF instead of the data itself.
Perhaps a reasonable approach would be to wrap the data in some object that exposes methods such as `cdf`. This would separate the source of the information about the data distribution from the actual calculations done with that distribution.
But perhaps I have missed some part where access to actual data is necessary.
What do you think about this approach?
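To make the wrapper idea a bit more concrete, here is a minimal sketch of what such an object might look like. Everything in it is hypothetical: `TabulatedDistribution`, `restricted`, and `cdf` are illustrative names and not part of powerlaw, and the trapezoidal integration is just one possible assumption about how a CDF could be derived from point-wise density values.

```python
from itertools import accumulate


class TabulatedDistribution:
    """Hypothetical wrapper for a distribution known only as density values at points."""

    def __init__(self, x, density):
        if len(x) != len(density) or len(x) < 2:
            raise ValueError("need at least two matching (x, density) points")
        pairs = sorted(zip(x, density))  # keep the points ordered by x
        self.x = [p[0] for p in pairs]
        self.density = [p[1] for p in pairs]

    def restricted(self, xmin, xmax=None):
        """Return a new object limited to xmin <= x <= xmax (the fitting range)."""
        keep = [(xi, di) for xi, di in zip(self.x, self.density)
                if xi >= xmin and (xmax is None or xi <= xmax)]
        return TabulatedDistribution([k[0] for k in keep], [k[1] for k in keep])

    def cdf(self):
        """Return (x, CDF) via trapezoidal integration, normalised to end at 1."""
        areas = [0.5 * (self.density[i] + self.density[i - 1])
                 * (self.x[i] - self.x[i - 1])
                 for i in range(1, len(self.x))]
        cumulative = [0.0] + list(accumulate(areas))
        total = cumulative[-1]
        return self.x, [c / total for c in cumulative]


# Example: a flat density on three points gives a linear CDF.
x, c = TabulatedDistribution([1, 2, 3], [1, 1, 1]).cdf()
```

Code that currently works on raw data would then ask such an object for its CDF, regardless of whether it came from samples or from a tabulated density.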

@smartass101
Author

I also found out that their implementation of the operations on binned data is available at http://tuvalu.santafe.edu/~aaronc/powerlaws/bins/

Owner

jeffalstott commented Oct 3, 2018 via email

@smartass101
Author

I've been reading that article and have begun to realize that it may not be directly applicable to the PSD case. The reason is that most algorithms (FFT or wavelet) do not give the PSD as a histogram, but rather as actual point-wise estimates, i.e. PSD(f_k) for all f_k. The f_k can be spaced either linearly (usually the case with FFT-based algorithms) or logarithmically (often the case in continuous wavelet analysis).

A dirty workaround (probably not completely wrong, but not right either) would be to generate surrogate datasets based on the PDF given by the PSD. I've seen it done e.g. here.
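A rough sketch of that surrogate idea, under the assumption that each f_k is treated as a discrete outcome whose probability is proportional to PSD(f_k) times the width of the frequency cell around it (so that linear and logarithmic spacing are handled alike). The helper names `cell_widths` and `surrogate_samples` are made up for illustration.

```python
import random


def cell_widths(freqs):
    """Width of the frequency cell around each point (half-way to each neighbour)."""
    edges = ([freqs[0]]
             + [0.5 * (a + b) for a, b in zip(freqs, freqs[1:])]
             + [freqs[-1]])
    return [hi - lo for lo, hi in zip(edges, edges[1:])]


def surrogate_samples(freqs, psd, n_samples, seed=None):
    """Draw frequencies with probability proportional to PSD(f_k) * cell width."""
    rng = random.Random(seed)
    weights = [p * w for p, w in zip(psd, cell_widths(freqs))]
    return rng.choices(freqs, weights=weights, k=n_samples)


# Example: 1000 surrogate "observations" from a PSD falling off as f**-2.
freqs = [float(k) for k in range(1, 101)]
psd = [f ** -2.0 for f in freqs]
samples = surrogate_samples(freqs, psd, 1000, seed=0)
```

The resulting list could then be handed to `powerlaw.Fit` like any other dataset. Note that the samples only ever take the values f_k, so they are heavily tied, which is part of why this is a workaround and not a proper solution.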

Perhaps I should get in touch with Clauset and ask him for guidance in this.

@smartass101
Author

Clauset seems to be on sabbatical. I had another idea: perhaps I could simply use the Kolmogorov-Smirnov test to determine the "distance" between the PSD and a given distribution. Chi^2 might be an alternative. But that would mean determining the fitted parameters and f_k_min at the same time, and I'm not sure whether that would be a problem.
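A minimal sketch of what such a joint scan might look like, assuming the PSD values are normalised into a discrete CDF over the retained f_k and compared against a pure power law f**-alpha evaluated on the same points. The names `ks_distance` and `scan_fit`, the grid of exponents, and the `min_points` guard are all illustrative assumptions. This also differs from the usual procedure, where the exponent is estimated by maximum likelihood for each candidate cutoff and the KS distance is only used to choose the cutoff. For unevenly spaced f_k the values should presumably be multiplied by the cell widths first.

```python
from itertools import accumulate


def normalised_cdf(weights):
    """Cumulative sum of the weights, scaled so that the last value is 1."""
    total = sum(weights)
    return [c / total for c in accumulate(weights)]


def ks_distance(freqs, psd, alpha):
    """Largest gap between the PSD-based CDF and that of f**-alpha on the same points."""
    empirical = normalised_cdf(psd)
    model = normalised_cdf([f ** -alpha for f in freqs])
    return max(abs(e - m) for e, m in zip(empirical, model))


def scan_fit(freqs, psd, alphas, min_points=5):
    """Grid-search alpha and the lower cutoff jointly; return (distance, f_min, alpha)."""
    best = None
    for start in range(len(freqs) - min_points + 1):
        f, p = freqs[start:], psd[start:]
        for alpha in alphas:
            d = ks_distance(f, p, alpha)
            if best is None or d < best[0]:
                best = (d, freqs[start], alpha)
    return best


# Example: an exact f**-2 spectrum is recovered with zero distance.
freqs = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
psd = [f ** -2.0 for f in freqs]
distance, f_min, alpha = scan_fit(freqs, psd, [1.0, 1.5, 2.0, 2.5])
```

The same loop could compare against an exponential model by swapping the model weights, which would give a like-for-like distance for the powerlaw-versus-exponential question.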

@smartass101
Author

Mentioning directly @aaronclauset in case you have time (and interest) to comment.
