CUDA Patterns

- GEMM
- Transpose
- Convolution
- Histogram
- Reduction

Todo

- Add OpenMP CPU version for comparison.
- CUB
- CUTLASS