Part of the problem seems to be that you're writing a serial algorithm on the GPU: your outer loop has `ti.loop_config(serialize=True)`. For it to be fast on the GPU, the outer loop needs to be the parallel one.
Another thing that can help (though I don't think it's relevant here) is to keep the data on the GPU to avoid transfers; e.g. passing a torch tensor that already lives on the device avoids copying.
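To illustrate the point about the outer loop: in the LCS recurrence, `f[i,j]` depends only on `f[i-1,j-1]`, `f[i-1,j]` and `f[i,j-1]`, so all cells on the same anti-diagonal (`i + j == d`) are independent of each other. Below is a minimal pure-Python sketch of that "wavefront" ordering, not a Taichi implementation; the function names `lcs_rowwise` and `lcs_wavefront` are made up for this example. Only the loop over `d` has to stay serial; the loop over `i` is the one that could be run in parallel.

```python
def lcs_rowwise(a, b):
    """Reference: classic row-by-row DP, same recurrence as in the issue."""
    n, m = len(a), len(b)
    f = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            f[i][j] = max(f[i - 1][j - 1] + (a[i - 1] == b[j - 1]),
                          f[i - 1][j], f[i][j - 1])
    return f[n][m]


def lcs_wavefront(a, b):
    """Same DP, but visiting cells one anti-diagonal (i + j == d) at a time."""
    n, m = len(a), len(b)
    f = [[0] * (m + 1) for _ in range(n + 1)]
    # Serial part: diagonal d reads only diagonals d-1 and d-2.
    for d in range(2, n + m + 1):
        lo = max(1, d - m)   # keeps j = d - i <= m
        hi = min(n, d - 1)   # keeps j = d - i >= 1
        # Parallelizable part: no iteration reads a cell written by another.
        for i in range(lo, hi + 1):
            j = d - i
            f[i][j] = max(f[i - 1][j - 1] + (a[i - 1] == b[j - 1]),
                          f[i - 1][j], f[i][j - 1])
    return f[n][m]
```

One possible way to map this to Taichi (an assumption, not tested here) is to keep the loop over `d` in Python scope and launch a kernel per diagonal whose outermost loop runs over `i`, since Taichi parallelizes only the outermost loop of a kernel. Short diagonals near the corners have little work, so this would only be expected to pay off for large inputs.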
Hi, just to add to @oliver-batchelor's comment: the benefits of the GPU are only really felt with large data sizes. For small amounts of data the CPU may actually be faster, because it has a higher clock speed. Basically, the GPU is faster at doing massive numbers of small tasks in parallel, but each individual task will likely be no faster than on the CPU (and may be slower).
Try increasing your data size by a lot and see how the performance changes :)
import taichi as ti
import numpy as np

ti.init(arch=ti.gpu)

benchmark = True
N = 15000

if benchmark:
    # Random sequences of length N for timing.
    a_numpy = np.random.randint(0, 100, N, dtype=np.int32)
    b_numpy = np.random.randint(0, 100, N, dtype=np.int32)
else:
    # Small fixed input for checking correctness.
    a_numpy = np.array([0, 1, 0, 2, 4, 3, 1, 2, 1], dtype=np.int32)
    b_numpy = np.array([4, 0, 1, 4, 5, 3, 1, 2], dtype=np.int32)

# DP table: f[i, j] is the LCS length of a[:i] and b[:j].
f = ti.field(dtype=ti.i32, shape=(N + 1, N + 1))

@ti.kernel
def compute_lcs(a: ti.types.ndarray(), b: ti.types.ndarray()) -> ti.i32:
    len_a, len_b = a.shape[0], b.shape[0]
    # The outermost loop is forced to run serially.
    ti.loop_config(serialize=True)
    for i in range(1, len_a + 1):
        for j in range(1, len_b + 1):
            f[i, j] = ti.max(f[i - 1, j - 1] + (a[i - 1] == b[j - 1]),
                             ti.max(f[i - 1, j], f[i, j - 1]))
    return f[len_a, len_b]

print(compute_lcs(a_numpy, b_numpy))
![image](https://private-user-images.githubusercontent.com/7225969/366388448-17764816-c358-4699-b5fa-d86d6a9bcba6.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk4ODU2MTgsIm5iZiI6MTczOTg4NTMxOCwicGF0aCI6Ii83MjI1OTY5LzM2NjM4ODQ0OC0xNzc2NDgxNi1jMzU4LTQ2OTktYjVmYS1kODZkNmE5YmNiYTYucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI1MDIxOCUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNTAyMThUMTMyODM4WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9OWM4YjA4NjZhN2Y3ZGUyOTk2ZWJiZjBlMzBhN2RjNmZkMTUzYzczYTczNTQwZDA2MTI1YzkzMGJlODFjMDY2YSZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QifQ.GrcQCm2fvxv2xV07omivUVRnMDn0QmkuxHY1apYgTPw)
The following is the result when the CPU backend is not used:
<img width="528" alt="image" src="https://github.com/user-attachments/assets/03d62b88-11b2-4a14-a4b3-