Confusing in use of `searchsorted` #39

yiyuezhuo · 2017-03-26T02:24:16Z

https://github.com/jeffalstott/powerlaw/blob/master/powerlaw.py#L1890
I think `searchsorted` would be useful if the original positions of the sorted data were kept.
But the code below also uses `unique_indices` to take a subset of the CDF.
I find this confusing, since the result is equivalent to calling `arange(n)/n` directly,
and the `arange` approach is faster than the `searchsorted` approach, as the test below shows:

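import numpy as np


def f1(data, n):
    # searchsorted approach: leftmost insertion index of each value, divided by n
    CDF = np.searchsorted(data, data, side='left') / n
    unique_data, unique_indices = np.unique(data, return_index=True)
    data = unique_data
    CDF = CDF[unique_indices]
    return CDF


def f2(data, n):
    # arange approach: positions 0..n-1 divided by n, subset to first occurrences
    unique_data, unique_indices = np.unique(data, return_index=True)
    return (np.arange(n) / n)[unique_indices]


data = [0, 1, 1, 2, 2, 2, 3, 3, 3, 3]

n = len(data)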

%timeit f1(data,n)
The slowest run took 4.35 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 85.3 µs per loop

%timeit f2(data,n)
10000 loops, best of 3: 54 µs per loop

f1(data,n)
Out[47]: array([ 0. ,  0.1,  0.3,  0.6])

f2(data,n)
Out[48]: array([ 0. ,  0.1,  0.3,  0.6])

So I can't see how the 'clever' claim in the comment holds. Perhaps I'm missing some corner cases?
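
One way to look for corner cases is to compare the two approaches on many random sorted inputs with ties. The sketch below is only an illustration: the names `cdf_searchsorted` and `cdf_arange` are made up here and are not part of powerlaw, and it assumes the input is already sorted in ascending order, as `searchsorted` requires.

```python
import numpy as np


def cdf_searchsorted(data):
    # Hypothetical helper mirroring f1: leftmost insertion index of each value.
    data = np.asarray(data)
    n = len(data)
    cdf = np.searchsorted(data, data, side='left') / n
    _, unique_indices = np.unique(data, return_index=True)
    return cdf[unique_indices]


def cdf_arange(data):
    # Hypothetical helper mirroring f2: plain positions, subset to first occurrences.
    data = np.asarray(data)
    n = len(data)
    _, unique_indices = np.unique(data, return_index=True)
    return (np.arange(n) / n)[unique_indices]


# Compare both versions on random sorted integer samples containing repeats.
rng = np.random.default_rng(0)
for _ in range(1000):
    size = rng.integers(1, 50)
    sample = np.sort(rng.integers(0, 10, size=size))
    assert np.array_equal(cdf_searchsorted(sample), cdf_arange(sample))
```

If this holds, the reason is probably that `unique(..., return_index=True)` returns the index of each value's first occurrence, and for sorted data `searchsorted(..., side='left')` returns that same index, so both versions reduce to `unique_indices / n`. Unsorted input or data containing NaN are not covered by this check.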

jeffalstott · 2017-03-26T23:50:42Z

I think you're right! Can you make the change, make sure all the examples look right, then make a pull request and we'll put it in?

Thanks!
