GPU_COMPILE_NOTES.txt

# Compile instructions
The entire project can be built using CMake. There is also an INSTALL target
that will copy the public headers and all executables and libraries according
to the CMake variable CMAKE_INSTALL_PREFIX. Note that this does not copy
the required DLLs, which must be done manually.

## GPU
The cache variable CUDA_NVCC_FLAGS is used to define NVCC compilation flags.
By default,it is set up for JIT (just-in-time) compilation for compute_30 
(Kepler) and compute_50 (Maxwell). Currently, Kepler is the lowest supported
architecture because of usage of grids with large x-dimensions. (If necessary,
it should be easy to modify the code for older hardware).

It is also possible to change this to compile for a specific combination of
virtual and real architecture, e.g. for K4000M (Kepler, sm_30):
    CUDA_NVCC_FLAGS=-gencode=arch=compute_30,code=sm_30

NOTE: The compute_XX are virtual architectures, while sm_XX are real arch.
When compiling for a specific architecture, it is also possible to add the
flag --resource-usage to show details about hardware resources (this is not
possible when using JIT).

NOTE: The flags in the CMake cache variable CUDA_NVCC_FLAGS must be delimited
with semicolons.

The flag -use_fast_math is also added by default.