https://github.com/jeffalstott/powerlaw/blob/master/powerlaw.py#L1890
If the original positions in the sorted data were kept, I could see why searchsorted is useful.
But the code right after it uses unique_indices to take a subset of the CDF as well.
That confuses me, because the result is then equivalent to indexing arange(n)/n directly,
and the arange version is faster than the searchsorted version, at least in the small test below:
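import numpy as np

def f1(data, n):
    # Approach used in powerlaw.py: searchsorted, then subset by unique_indices
    CDF = np.searchsorted(data, data, side='left') / n
    unique_data, unique_indices = np.unique(data, return_index=True)
    data = unique_data  # kept to mirror the library code; unused here
    CDF = CDF[unique_indices]
    return CDF

def f2(data, n):
    # Alternative: index arange(n)/n by the first-occurrence indices
    unique_data, unique_indices = np.unique(data, return_index=True)
    return (np.arange(n) / n)[unique_indices]

data = [0, 1, 1, 2, 2, 2, 3, 3, 3, 3]  # already sorted, with ties
n = len(data)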
%timeit f1(data,n)
The slowest run took 4.35 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 85.3 µs per loop
%timeit f2(data,n)
10000 loops, best of 3: 54 µs per loop
f1(data,n)
Out[47]: array([ 0. , 0.1, 0.3, 0.6])
f2(data,n)
Out[48]: array([ 0. , 0.1, 0.3, 0.6])
So I don't understand in what sense the searchsorted approach is 'clever', as the comment in the code claims. Perhaps I am missing some corner case?
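
To make the claim easier to check, here is a minimal sketch that compares the two approaches on many random sorted samples with ties. The function names cdf_searchsorted and cdf_first_index are mine, not from powerlaw, and the sketch assumes the input is already sorted, as it appears to be at that point in the library. The reasoning is that for sorted data, searchsorted(data, data, side='left') returns the index of the first occurrence of each value, and unique(..., return_index=True) returns those same first-occurrence indices, so the subset of the CDF is just unique_indices / n.

```python
import numpy as np

def cdf_searchsorted(data):
    """CDF at each unique value, computed via searchsorted (same idea as f1)."""
    data = np.asarray(data)
    n = len(data)
    cdf = np.searchsorted(data, data, side='left') / n
    _, first = np.unique(data, return_index=True)
    return cdf[first]

def cdf_first_index(data):
    """CDF at each unique value, taken directly from first-occurrence indices."""
    data = np.asarray(data)
    _, first = np.unique(data, return_index=True)
    return first / len(data)

# Compare both versions on random sorted integer samples containing ties.
rng = np.random.default_rng(0)
for _ in range(100):
    size = int(rng.integers(1, 50))
    sample = np.sort(rng.integers(0, 20, size=size))
    assert np.array_equal(cdf_searchsorted(sample), cdf_first_index(sample))
```

If the assertions pass, the two versions agree on these inputs. That does not rule out a corner case I have not thought of, such as unsorted input or NaN values.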