CUDA Debug/Performance TODOS #12
@richard-evans according to this:
Jxx_vals_h.push_back( -::atoms::v_exchange_list[iid].Jij[0]);
Jyy_vals_h.push_back( -::atoms::v_exchange_list[iid].Jij[1]);
Jzz_vals_h.push_back( -::atoms::v_exchange_list[iid].Jij[2]);
I can guess the xx, yy, and zz values of the exchange for the vector case, but what would be the indices for xy, xz, etc. in the tensor case? |
IIRC there are separate exchange lists for the isotropic, vector and tensor cases. So to do the case for xy, xz, ... you probably want something like: I would need to check the src files though. |
@mattoaellis since you're here, did you try CUSP to do the matrix multiplication? I'm looking at this at the moment: http://cusplibrary.github.io/classcusp_1_1csr__matrix__view.html#a2a5a8f5d37b23c657002ad4c31509688 |
Yeah, I've used CUSP in the past for matrix multiplication. It's pretty neat: easy to set up and easy to do calculations with. It has both the diagonal (DIA) format and ELLPACK, which are both useful for the exchange calculation. |
Hmm, it seems to be installed on the clusters, so no extra downloads are required. Furthermore, it seems like DIA is the way to go, and format transformations are very easy to do with CUSP. |
I think that was because we were using it before. Out of interest, what version is it? I would certainly recommend it, and downloading an extra library is not that hard, but I don't think we can count on it being pre-installed on every cluster.
In CUSP it is super easy to set up the matrix on the host in a simple format, i.e. CSR (which I think the serial version uses anyway), and then just convert to our chosen format.
|
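As an illustration of the "build in CSR on the host, then convert" idea above, here is a minimal sketch in plain C++. It does not use CUSP; the `CsrMatrix` and `DiaMatrix` structs and the `csr_to_dia` and `dia_multiply` functions are hypothetical names invented for this example. It assumes a square matrix and shows why DIA suits an exchange matrix whose nonzeros sit on a few diagonals.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical host-side CSR matrix (square, n x n).
struct CsrMatrix {
    std::size_t n;
    std::vector<std::size_t> row_ptr;  // size n + 1
    std::vector<std::size_t> col_idx;  // size nnz
    std::vector<double> values;        // size nnz
};

// Hypothetical DIA storage: one dense array of length n per occupied diagonal.
struct DiaMatrix {
    std::size_t n;
    std::vector<long> offsets;               // col - row of each stored diagonal
    std::vector<std::vector<double>> diags;  // diags[d][row]
};

DiaMatrix csr_to_dia(const CsrMatrix& csr) {
    assert(csr.row_ptr.size() == csr.n + 1);

    // First pass: find every diagonal that holds at least one entry.
    std::map<long, std::size_t> slot_of;
    for (std::size_t row = 0; row < csr.n; ++row)
        for (std::size_t k = csr.row_ptr[row]; k < csr.row_ptr[row + 1]; ++k)
            slot_of.emplace(static_cast<long>(csr.col_idx[k]) - static_cast<long>(row), 0);

    // Number the diagonals in ascending offset order.
    DiaMatrix dia;
    dia.n = csr.n;
    for (auto& entry : slot_of) {
        entry.second = dia.offsets.size();
        dia.offsets.push_back(entry.first);
    }
    dia.diags.assign(dia.offsets.size(), std::vector<double>(csr.n, 0.0));

    // Second pass: scatter each value into its diagonal, indexed by row.
    for (std::size_t row = 0; row < csr.n; ++row)
        for (std::size_t k = csr.row_ptr[row]; k < csr.row_ptr[row + 1]; ++k) {
            const long off = static_cast<long>(csr.col_idx[k]) - static_cast<long>(row);
            dia.diags[slot_of.at(off)][row] += csr.values[k];
        }
    return dia;
}

// y = A * x with A in DIA form; entries that fall outside the matrix are skipped.
std::vector<double> dia_multiply(const DiaMatrix& dia, const std::vector<double>& x) {
    assert(x.size() == dia.n);
    std::vector<double> y(dia.n, 0.0);
    for (std::size_t d = 0; d < dia.offsets.size(); ++d)
        for (std::size_t row = 0; row < dia.n; ++row) {
            const long col = static_cast<long>(row) + dia.offsets[d];
            if (col >= 0 && col < static_cast<long>(dia.n))
                y[row] += dia.diags[d][row] * x[static_cast<std::size_t>(col)];
        }
    return y;
}
```

The storage cost is n values per occupied diagonal, so this layout only pays off when the number of distinct diagonals is small compared with n.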
Hehe, I'm on it. I'll just amalgamate the library now, as we did with CUB; we can clean up the mess before we merge to development 💯. Is it safe to do this: cusp::array1d_view <...> spin_view (some_thrust_device_vector.begin(), some_thrust_device_vector.end()); |
Yeah, the cusp arrays are just slightly expanded thrust vectors so they are all compatible. Annoyingly thrust is (mostly) compatible with cusp arrays but not vice-versa. Why the array view in this case? Since the array goes from begin() to end() why not array1d? |
Ah, I think I get it. Is it because the view does not copy the data but just references the underlying data? |
Yeah, just to keep the spin arrays intact so that the changes do not conflict with other pieces of the code 😸 |
I've just tested that and it seems to work fine: creating an array view from a thrust device vector and passing that to cusp::multiply. |
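The view-versus-copy distinction discussed above can be sketched in plain C++ without CUSP or Thrust. `ArrayView`, `make_view` and `scale` are hypothetical names for this illustration only; the point is that a view holds a pointer range into storage owned elsewhere, so writes through it reach the original vector and no data is copied.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an array view: a pointer range with no ownership.
struct ArrayView {
    double* first;
    double* last;

    std::size_t size() const { return static_cast<std::size_t>(last - first); }

    double& operator[](std::size_t i) const {
        assert(i < size());
        return first[i];
    }
};

// Wraps an existing vector without copying its elements.
ArrayView make_view(std::vector<double>& v) {
    return ArrayView{v.data(), v.data() + v.size()};
}

// Scales in place through the view; the owning vector sees the change.
void scale(ArrayView view, double factor) {
    for (std::size_t i = 0; i < view.size(); ++i)
        view[i] *= factor;
}
```

A view of this kind is only valid while the owning container keeps its storage: resizing or destroying the vector leaves the view dangling.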
I'm integrating it, but it seems like including the cusp headers conflicts with curand. Did you run into those errors?
They are not there if I omit the cusp headers 👎 |
This seems like it could be the issue: http://stackoverflow.com/questions/23352122/including-thrust-sort-h-before-curand-kernel-h-gives-compilation-error However, we are working with CUDA 6.5, and reordering the includes here and there does not help 😢 |
I've never had any problem with curand, cusp and thrust all together. As the post says it is a version issue; I remember hearing some worrying reports about bugs like that. Can you drop back to CUDA 5 to test out? Otherwise it may require moving the order the headers are included in. |
Ah, I do remember having some issues with CUSP at one point (not the same as yours) when I moved to a newer version, but they were just warnings rather than errors and it ran just fine. Check whether the version of CUSP you are including is the latest one. If not, you could download the library locally. |
I downloaded the |
I've tested out cuda 5.5 and 6.5 using the latest version of CUSP v0.5.1 and I don't seem to have any problems. Which CUSP and Curand headers are you using exactly? |
@mattoaellis is there a nice way to do the |
@odarbelaeze CuRAND or CuSparse? There is a more generalised multiply routine which allows you to specify details about the operation. That is in CUSP v0.5.1, but I don't know if it is in v0.4. Mind you, for the operation you specify you can just fold b*B into the diagonals of A. I have done that in the past for computing the anisotropy in addition to the exchange interaction. |
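The exact operation being asked about is not shown in the thread, so the following is only a sketch of the "fold it into the diagonal" idea under an assumption: the extra term is a per-site multiple of the same vector, y = A*x + d .* x (as a uniaxial anisotropy field acting on one spin component would be). In that case the term can be absorbed into the matrix once, as A' = A + diag(d), leaving a single multiply per step. `Triplet`, `multiply` and `fold_diagonal` are hypothetical names, and the matrix is kept in simple coordinate form for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One nonzero of a sparse matrix in coordinate (COO) form.
struct Triplet {
    std::size_t row;
    std::size_t col;
    double value;
};

// y = A * x; duplicate (row, col) entries simply accumulate.
std::vector<double> multiply(const std::vector<Triplet>& a, const std::vector<double>& x) {
    std::vector<double> y(x.size(), 0.0);
    for (const Triplet& t : a) {
        assert(t.row < x.size() && t.col < x.size());
        y[t.row] += t.value * x[t.col];
    }
    return y;
}

// Returns A + diag(d), so that multiply(result, x) equals A*x + d .* x.
std::vector<Triplet> fold_diagonal(std::vector<Triplet> a, const std::vector<double>& d) {
    for (std::size_t i = 0; i < d.size(); ++i)
        a.push_back({i, i, d[i]});
    return a;
}
```

The folding is done once at setup, so the per-step cost stays at one sparse matrix-vector product.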
Now it runs and yields essentially the same results. However, it's considerably slower: although all the data is already on the GPU, CUSP decides that it needs to move stuff to host memory and back 👎 while performing the field calculation, that is:
|
Yuck - that's not what we wanted. So is the solution to do DIA by hand and then pass this to cusparse? Or can cusp do the conversion once and store that, and then just pass this as a DIA matrix?
Dr. Richard F L Evans www-users.york.ac.uk/~rfle500/ |
That is rather strange. |
That could be possible, since CUSP handles the transformations easily. However, it seems the DIA format is not available in CuSPARSE (http://docs.nvidia.com/cuda/cusparse/#matrix-formats). We could experiment with the ELL format, or try to set up the arrays as |
@mattoaellis the |
Yep, that has worked in the past for me. |
@mattoaellis @richard-evans Good news, changing the types of the vectors to |
Good news! Let's see how things turn out...
|
I did a real quick Curie temperature benchmark:
# With CuSPARSE CSR
real 3m21.703s
user 2m16.917s
sys 1m3.840s
# With CUSP DIA
real 2m5.137s
user 1m30.466s
sys 0m34.178s

real 19m29.640s
user 19m25.569s
sys 0m0.468s
I'll edit this with the CPU serial results when they're available. Furthermore, we are putting a lot of nice flags on the |
OK, nice! So ~5x faster for the exchange calculation. Yes, I deleted the NVCC variable in my PR. So maybe the next thing is to reprofile and see where the new hotspots are. R
|
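For reference, the wall-clock ratios implied by the `real` timings quoted in the benchmark can be worked out with a small helper. This is a sketch in plain C++; `to_seconds` and `speedup` are hypothetical names, and the parser assumes the `XmY.YYYs` format printed by `time`. With the quoted numbers, CuSPARSE CSR against CUSP DIA comes out at about 1.6x in whole-run wall time. The third, unlabeled block (presumably the CPU serial run, though the source does not label it) is about 5.8x slower than the CSR run and about 9.3x slower than the DIA run. These are whole-run ratios, not timings of the exchange calculation alone.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Converts a `time`-style duration such as "3m21.703s" to seconds.
double to_seconds(const std::string& t) {
    const std::size_t m = t.find('m');
    assert(m != std::string::npos && t.back() == 's');
    const double minutes = std::stod(t.substr(0, m));
    // Everything between 'm' and the trailing 's' is the seconds part.
    const double seconds = std::stod(t.substr(m + 1, t.size() - m - 2));
    return 60.0 * minutes + seconds;
}

// How many times longer the baseline run took than the candidate run.
double speedup(const std::string& baseline, const std::string& candidate) {
    return to_seconds(baseline) / to_seconds(candidate);
}
```

For example, `speedup("3m21.703s", "2m5.137s")` compares the two labeled GPU runs.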
I need better cpus in Colombia 😞
|
cu::grid_size variable at initialisation (according to the plan)
That's one thing; let's see what else comes up and register it here.