Confusing in use of searchsorted #39

Open
yiyuezhuo opened this issue Mar 26, 2017 · 1 comment

Comments

@yiyuezhuo

https://github.com/jeffalstott/powerlaw/blob/master/powerlaw.py#L1890
I think the use of `searchsorted` would make sense if you kept the original positions of the sorted data. But the code after it also uses `unique_indices` to take a subset of `CDF`. I find this confusing, since for sorted data the result is equivalent to calling `arange(n)/n` directly, and the `arange` way is faster than the `searchsorted` way, as the test below shows:

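```python
import numpy as np


def f1(data, n):
    # Current approach: left insertion point of every element, divided by n
    CDF = np.searchsorted(data, data, side='left') / n
    # Index of the first occurrence of each distinct value
    unique_data, unique_indices = np.unique(data, return_index=True)
    data = unique_data  # mirrors the library code; has no effect here
    CDF = CDF[unique_indices]
    return CDF


def f2(data, n):
    # Proposed approach: skip searchsorted and index arange(n)/n directly
    unique_data, unique_indices = np.unique(data, return_index=True)
    return (np.arange(n) / n)[unique_indices]


data = [0, 1, 1, 2, 2, 2, 3, 3, 3, 3]  # must already be sorted

n = len(data)
```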

```
%timeit f1(data,n)
The slowest run took 4.35 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 85.3 µs per loop

%timeit f2(data,n)
10000 loops, best of 3: 54 µs per loop

f1(data,n)
Out[47]: array([ 0. ,  0.1,  0.3,  0.6])

f2(data,n)
Out[48]: array([ 0. ,  0.1,  0.3,  0.6])
```
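The reason the two outputs agree can be traced step by step on the same example. For sorted data, the left insertion point that `np.searchsorted` returns for each element is the index of the first occurrence of its value, and `np.unique(..., return_index=True)` returns exactly those first-occurrence indices, so indexing the `searchsorted` result with them just hands the indices back. A minimal sketch of that check, assuming NumPy and already-sorted input (the variable names are illustrative):

```python
import numpy as np

# Same example as above; the data must already be sorted
data = np.array([0, 1, 1, 2, 2, 2, 3, 3, 3, 3])
n = len(data)

# Left insertion point of each element = index of the first occurrence of its value
positions = np.searchsorted(data, data, side='left')
print(positions)  # [0 1 1 3 3 3 6 6 6 6]

# First-occurrence index of each distinct value
unique_data, unique_indices = np.unique(data, return_index=True)
print(unique_indices)  # [0 1 3 6]

# Indexing the insertion points at the first occurrences returns those indices unchanged
assert np.array_equal(positions[unique_indices], unique_indices)

# Hence both f1 and f2 reduce to unique_indices / n
cdf = unique_indices / n
```

This also shows that `(np.arange(n)/n)[unique_indices]` in `f2` can be written as `unique_indices / n`, which avoids building the length-`n` array at all.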

So I don't see how the 'clever' claim in the comment holds. Am I missing some corner case?
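One condition does matter: both versions assume the input is already sorted. `np.searchsorted` requires a sorted array, and for unsorted input the indices from `np.unique(..., return_index=True)` point into the unsorted array, so neither result is a CDF. With sorted input, repeated values and float data should not break the equivalence. A randomized check along those lines, assuming NumPy 1.17+ for `np.random.default_rng` (the helper names `cdf_searchsorted` and `cdf_first_index` are hypothetical):

```python
import numpy as np


def cdf_searchsorted(sorted_data):
    # The searchsorted-based version (f1 above), for sorted input
    n = len(sorted_data)
    cdf = np.searchsorted(sorted_data, sorted_data, side='left') / n
    _, first = np.unique(sorted_data, return_index=True)
    return cdf[first]


def cdf_first_index(sorted_data):
    # The direct version: first-occurrence index divided by n
    n = len(sorted_data)
    _, first = np.unique(sorted_data, return_index=True)
    return first / n


rng = np.random.default_rng(0)
for _ in range(1000):
    size = int(rng.integers(1, 50))
    # Small value pools guarantee many ties, for both integer and float data
    ints = np.sort(rng.integers(0, 5, size=size))
    floats = np.sort(rng.choice([0.5, 1.25, 2.0, 3.75], size=size))
    for sample in (ints, floats):
        assert np.array_equal(cdf_searchsorted(sample), cdf_first_index(sample))
print("all checks passed")
```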

@jeffalstott
Owner

I think you're right! Can you make the change, make sure all the examples look right, then make a pull request and we'll put it in?
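A minimal sketch of what the change could look like, assuming the surrounding code sorts `data` and only needs the distinct values and their CDF. The function names `cdf_at_unique_values` and `cdf_at_unique_values_searchsorted` are hypothetical and not part of powerlaw's API. The timing comparison uses a larger input than the 10-element example, since at that size call overhead dominates; the numbers it prints will vary by machine and NumPy version.

```python
import timeit

import numpy as np


def cdf_at_unique_values(data):
    """Hypothetical helper: distinct values of `data` and the fraction of
    samples strictly below each one (the same values as
    searchsorted(data, data, side='left') / n taken at the first occurrences)."""
    data = np.sort(np.asarray(data))
    n = len(data)
    unique_data, first_indices = np.unique(data, return_index=True)
    # In sorted data, the first-occurrence index counts the smaller samples
    return unique_data, first_indices / n


def cdf_at_unique_values_searchsorted(data):
    # The current searchsorted-based approach, kept for comparison
    data = np.sort(np.asarray(data))
    n = len(data)
    CDF = np.searchsorted(data, data, side='left') / n
    unique_data, unique_indices = np.unique(data, return_index=True)
    return unique_data, CDF[unique_indices]


rng = np.random.default_rng(1)
# Many ties: 100,000 samples drawn from 1,000 distinct integers
sample = rng.integers(0, 1000, size=100_000)

new_values, new_cdf = cdf_at_unique_values(sample)
old_values, old_cdf = cdf_at_unique_values_searchsorted(sample)
assert np.array_equal(new_values, old_values)
assert np.array_equal(new_cdf, old_cdf)

for fn in (cdf_at_unique_values_searchsorted, cdf_at_unique_values):
    seconds = timeit.timeit(lambda: fn(sample), number=20)
    print(f"{fn.__name__}: {seconds / 20 * 1e3:.2f} ms per call")
```

Sorting inside the helper keeps it safe for unsorted input; if the caller already guarantees sorted data, that step could be dropped.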

Thanks!
