python - Divide certain columns by another column in pandas - Stack Overflow #97

Open
kemistep opened this issue Nov 19, 2020 · 0 comments

@kemistep (Contributor)

For performance, I would suggest working with the underlying array data and array slicing. Since the two columns to be modified are adjacent, a slice gives a view into the array instead of a copy -

```python
a = df.values
# a[:,0,None] has shape (n, 1), so column 0 divides columns 1 and 2 row by row
df.iloc[:,1:3] = a[:,1:3]/a[:,0,None]
```
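The `a[:,0,None]` part is what makes the division row-wise. A minimal NumPy sketch, using the first two rows of the sample data as a hypothetical array `a`, shows the shapes involved:

```python
import numpy as np

# Hypothetical 2-row array laid out like the question's columns: prev, open, close, volume
a = np.array([[20.77, 20.87, 19.87, 962816.0],
              [19.87, 19.89, 19.56, 668076.0]])

cols = a[:, 1:3]        # shape (2, 2): the 'open' and 'close' columns
prev = a[:, 0, None]    # shape (2, 1): indexing with None inserts a new axis (np.newaxis)
ratio = cols / prev     # (2, 2) / (2, 1) broadcasts each row's 'prev' across that row

# Dropping the extra axis gives shape (2,), which broadcasts along the last axis instead,
# so column j is divided by prev[j]. Here n == 2 happens to equal the number of columns,
# so this runs silently but is wrong; for other n it raises a broadcasting error.
wrong = cols / a[:, 0]

print(cols.shape, prev.shape, ratio.shape)  # (2, 2) (2, 1) (2, 2)
```

Writing `a[:,0,None]` is equivalent to `a[:,0,np.newaxis]` or `a[:,0].reshape(-1, 1)`; all three produce the `(n, 1)` column needed here.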

To elaborate a bit more on the array-slicing part: indexing with a[:,[1,2]] would have forced a copy (advanced indexing with a list always copies in NumPy), which would have slowed it down. On the DataFrame side, a[:,[1,2]] corresponds to df[['open','close']], and I am guessing that is slowing things down too. df.iloc[:,1:3] thus improves upon it.
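The view-versus-copy distinction can be checked directly with `np.shares_memory`. This is a small sketch on a hypothetical array rather than the question's data:

```python
import numpy as np

a = np.arange(12, dtype=float).reshape(4, 3)  # hypothetical 4x3 array

sliced = a[:, 1:3]    # basic slicing: a view into a's buffer
fancy = a[:, [1, 2]]  # advanced (list) indexing: always a fresh copy

print(np.shares_memory(a, sliced))  # True
print(np.shares_memory(a, fancy))   # False

# Writes through the view are visible in a; writes to the copy are not
sliced[0, 0] = -1.0
fancy[1, 0] = -2.0
print(a[0, 1], a[1, 1])  # -1.0 4.0
```

Note that the division itself still allocates a new result array; the slice only avoids an extra copy of the inputs.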

Sample run -

```
In [64]: df
Out[64]:
    prev   open  close  volume
0  20.77  20.87  19.87  962816
1  19.87  19.89  19.56  668076
2  19.56  19.96  20.10  578987
3  20.10  20.40  20.53  418597

In [65]: a = df.values
    ...: df.iloc[:,1:3] = a[:,1:3]/a[:,0,None]
    ...:

In [66]: df
Out[66]:
    prev      open     close  volume
0  20.77  1.004815  0.956668  962816
1  19.87  1.001007  0.984399  668076
2  19.56  1.020450  1.027607  578987
3  20.10  1.014925  1.021393  418597
```

Runtime test

Approaches -

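```python
def numpy_app(df):
    # Divide via the underlying array; modifies df in place and returns it
    a = df.values
    df.iloc[:,1:3] = a[:,1:3]/a[:,0,None]
    return df

def pandas_app1(df):
    # Pure-pandas version: row-wise division with DataFrame.div(axis=0); also modifies df
    df[['open','close']] = df[['open','close']].div(df['prev'].values,axis=0)
    return df
```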
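Before comparing speed, it is worth checking that the two approaches agree. A minimal sketch on a hypothetical random frame, giving each function its own copy because both modify their argument in place:

```python
import numpy as np
import pandas as pd


def numpy_app(df):
    a = df.values
    df.iloc[:, 1:3] = a[:, 1:3] / a[:, 0, None]
    return df


def pandas_app1(df):
    df[['open', 'close']] = df[['open', 'close']].div(df['prev'].values, axis=0)
    return df


# Hypothetical test data shaped like the timing example (seeded for repeatability)
rng = np.random.default_rng(0)
data = rng.integers(15, 25, (1000, 4)).astype(float)
base = pd.DataFrame(data, columns=['prev', 'open', 'close', 'volume'])

out_numpy = numpy_app(base.copy())
out_pandas = pandas_app1(base.copy())

# Raises AssertionError if values, dtypes, index or columns differ
pd.testing.assert_frame_equal(out_numpy, out_pandas)
```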

Timings -

```
In [44]: data = np.random.randint(15, 25, (100000,4)).astype(float)
    ...: df1 = pd.DataFrame(data, columns=(('prev','open','close','volume')))
    ...: df2 = df1.copy()
    ...:

In [45]: %timeit pandas_app1(df1)
    ...: %timeit numpy_app(df2)
    ...:
100 loops, best of 3: 2.68 ms per loop
1000 loops, best of 3: 885 µs per loop
```
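One caveat with the `%timeit` runs: both functions divide `df1`/`df2` in place, so every repeated call divides values that were already divided, which could in principle skew the numbers once the values get very small. A sketch with the standard `timeit` module that hands each timed call a fresh copy (copy time excluded) might look like this; `best_time` is a hypothetical helper, and absolute numbers will depend on the machine and the pandas/NumPy versions:

```python
import timeit

import numpy as np
import pandas as pd


def numpy_app(df):
    a = df.values
    df.iloc[:, 1:3] = a[:, 1:3] / a[:, 0, None]
    return df


def pandas_app1(df):
    df[['open', 'close']] = df[['open', 'close']].div(df['prev'].values, axis=0)
    return df


def best_time(func, base, repeat=50):
    """Best-of-`repeat` time in seconds for one call of func on a fresh copy of base."""
    # The setup string runs once per repeat and is not timed; with number=1 each
    # timed call therefore sees the original, undivided data.
    times = timeit.repeat(
        "func(df)",
        setup="df = base.copy()",
        repeat=repeat,
        number=1,
        globals={"func": func, "base": base},
    )
    return min(times)


data = np.random.randint(15, 25, (100000, 4)).astype(float)
base = pd.DataFrame(data, columns=['prev', 'open', 'close', 'volume'])

t_pandas = best_time(pandas_app1, base)
t_numpy = best_time(numpy_app, base)
print(f"pandas_app1: {t_pandas * 1e3:.3f} ms, numpy_app: {t_numpy * 1e3:.3f} ms")
```

Taking the minimum over repeats, as `%timeit` does, reduces noise from other processes.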

Source: https://stackoverflow.com/questions/45899613/divide-certain-columns-by-another-column-in-pandas
