Ocelot edited this page Jan 13, 2024 · 2 revisions

Veil includes a full OpenCL runtime pipeline for running general compute on different devices. Unlike compute shaders, OpenCL is not specific to the GPU: it can also target other device types, such as CPUs, and it can allow faster, more feature-rich compute programs.

Setup

The API allows fairly low-level access to OpenCL, but there are some general conventions that should be followed. Most methods throw a CLException so that the caller decides how errors are handled. This is a deliberate design choice: a failure surfaces at the call that caused it rather than being silently ignored, which makes problems much easier to track down during development.

  • Each mod should generally request a single static OpenCL environment. This isn't required, but it prevents unexpectedly running code on multiple devices.
  • The environment should never be freed by the mod. Veil will free the environment when needed.
  • Most calls should be wrapped in a try/catch to handle CL errors.
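The try/catch convention above can be sketched without any Veil classes. In this minimal sketch, `CLException` is a hypothetical stand-in for Veil's exception type, and `ClCall`, `tryRun` and `LOG` are names invented for illustration. Only the shape of the error handling is the point.

```java
import java.util.ArrayList;
import java.util.List;

class ClErrorHandlingSketch {

    // Hypothetical stand-in for Veil's CLException, modelled here as a checked exception
    static class CLException extends Exception {
        CLException(String message) {
            super(message);
        }
    }

    // Hypothetical shape of any call that may fail with a CL error
    interface ClCall {
        void run() throws CLException;
    }

    // Collected error messages; a real mod would use its logger instead
    static final List<String> LOG = new ArrayList<>();

    // Runs the call and lets the mod decide what a failure means.
    // Here a failure is recorded and reported as false instead of crashing.
    static boolean tryRun(String description, ClCall call) {
        try {
            call.run();
            return true;
        } catch (CLException e) {
            LOG.add("Failed to " + description + ": " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        boolean ok = tryRun("load program", () -> {
            throw new CLException("CL_BUILD_PROGRAM_FAILURE");
        });
        System.out.println(ok + " " + LOG);
    }
}
```

Whether a failure should be logged, retried, or turned into a CPU fallback is up to the mod. The sketch only shows that the decision sits with the caller.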

Environment

The environment provides access to all other functionality. VeilOpenCL provides several getEnvironment methods for requesting environments with specific properties. The default method is fine for most use cases.

NOTE: Some computers may not have any OpenCL-capable devices. In that case, no environment can be created and getEnvironment() returns null.

A request for specific features may also return null if no device supports them. In that case, a more general environment should be requested.

Example

import foundry.veil.opencl.CLEnvironment;
import foundry.veil.opencl.VeilOpenCL;

public class ModClass {

    public static final CLEnvironment ENVIRONMENT = VeilOpenCL.get().getEnvironment();
}
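Because a request can return null, it may help to try the most specific request first and fall back to more general ones. The following is a minimal sketch of that idea, assuming nothing about Veil's actual overloads: `firstAvailable` is a hypothetical helper, and plain strings stand in for CLEnvironment instances.

```java
import java.util.function.Supplier;

class EnvironmentFallbackSketch {

    // Returns the first non-null result, trying the requests in the order given.
    // Returns null when every request fails, e.g. when no OpenCL device is present.
    @SafeVarargs
    static <E> E firstAvailable(Supplier<E>... requests) {
        for (Supplier<E> request : requests) {
            E environment = request.get();
            if (environment != null) {
                return environment;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Strings stand in for CLEnvironment instances in this sketch
        String environment = firstAvailable(
                () -> null,       // hypothetical feature-specific request, unsupported here
                () -> "default"); // the default request
        if (environment == null) {
            System.out.println("OpenCL unavailable, using a CPU code path");
        } else {
            System.out.println("Using environment: " + environment);
        }
    }
}
```

In a mod, each supplier would wrap one of the getEnvironment calls, and the null branch would disable or replace the OpenCL-backed feature.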

Using the Environment

Veil does not have any built-in API for loading CL programs from a resource pack. This makes it much easier to run the same program on the client and server. There is nothing stopping the user from implementing their own version though.

CLEnvironment#loadProgram must be called before CLEnvironment#createKernel; creating a kernel from a program that has not been loaded raises an exception. These two calls are the most error-prone, so their exceptions must be handled properly.

Kernels should be freed when they are no longer needed. Once all kernels for a specific program have been freed, the program is freed as well. Programs can also be freed explicitly with CLEnvironment#freeProgram.

NOTE: Loading another program under the same name will free the old program and all associated kernels.

Example

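import foundry.veil.opencl.CLEnvironment;
import foundry.veil.opencl.CLException;
import foundry.veil.opencl.CLKernel;
import net.minecraft.resources.ResourceLocation;

public class ModCL {

    // CLException is propagated here so the caller decides how to handle it
    public static void doThing() throws CLException {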
        CLEnvironment environment = ModClass.ENVIRONMENT;
        environment.loadProgram(new ResourceLocation("modid", "coolprogram"), """
                void kernel cool_kernel(global const int* A, global const int* B, global const int* C, global int* D) {
                    int i = get_global_id(0);
                    D[i] = A[i] + B[i] + C[i];
                }
                                
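                void kernel cooler_kernel(global const int* A, global int* B) {
                    int i = get_global_id(0);
                    // B is the output, so it must not be const. sqrt has no integer overload, so cast explicitly.
                    B[i] = (int) sqrt((float) A[i]);
                }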
                """);

        CLKernel kernel1 = environment.createKernel(new ResourceLocation("modid", "coolprogram"), "cool_kernel");
        // Multiple kernels can be created from the same program
        CLKernel kernel2 = environment.createKernel(new ResourceLocation("modid", "coolprogram"), "cooler_kernel");

        // do stuff

        // These should be freed when they aren't needed anymore. A try-with-resources block is the best way to create a single-use kernel
        kernel1.free();
        kernel2.free();
    }
}
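The lifetime rules described above (a program must be loaded before kernels are created, it is freed with its last kernel, and reloading under the same name replaces it) can be modelled in a few lines. This is a toy model only: `ProgramLifetimeModel` and its methods are invented for illustration and are not Veil code.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class ProgramLifetimeModel {

    // Toy bookkeeping: program name -> names of kernels that are still alive
    private final Map<String, Set<String>> liveKernels = new HashMap<>();

    void loadProgram(String name) {
        // Loading under an existing name replaces the old program and drops its kernels
        liveKernels.put(name, new HashSet<>());
    }

    void createKernel(String program, String kernel) {
        Set<String> kernels = liveKernels.get(program);
        if (kernels == null) {
            throw new IllegalStateException("Program not loaded: " + program);
        }
        kernels.add(kernel);
    }

    void freeKernel(String program, String kernel) {
        Set<String> kernels = liveKernels.get(program);
        if (kernels != null && kernels.remove(kernel) && kernels.isEmpty()) {
            // The last kernel is gone, so the program goes with it
            liveKernels.remove(program);
        }
    }

    boolean isLoaded(String program) {
        return liveKernels.containsKey(program);
    }

    public static void main(String[] args) {
        ProgramLifetimeModel model = new ProgramLifetimeModel();
        model.loadProgram("modid:coolprogram");
        model.createKernel("modid:coolprogram", "cool_kernel");
        model.createKernel("modid:coolprogram", "cooler_kernel");
        model.freeKernel("modid:coolprogram", "cool_kernel");
        System.out.println(model.isLoaded("modid:coolprogram")); // true, one kernel is still alive
        model.freeKernel("modid:coolprogram", "cooler_kernel");
        System.out.println(model.isLoaded("modid:coolprogram")); // false, the program must be loaded again
    }
}
```

One practical consequence, if the rules hold as stated: a single-use kernel created in a try-with-resources block may take its program with it, so the program would need to be loaded again before the next kernel is created.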

Kernels

Kernels allow code from a program to be executed. They let the user upload data to and read data from the CL device using CL buffers and loaded parameters. Single primitives can be loaded into parameter slots with the methods in CLKernel.

To load more than one value into a parameter slot, a CLBuffer must be used. These function as arbitrary blocks of memory on the CL device that can be read from and written to.

Example

import foundry.veil.opencl.CLEnvironment;
import foundry.veil.opencl.CLException;
import foundry.veil.opencl.CLKernel;
import foundry.veil.opencl.CLBuffer;
import foundry.veil.opencl.VeilOpenCL;
import net.minecraft.resources.ResourceLocation;
import org.lwjgl.system.MemoryStack;

import java.nio.IntBuffer;

import static org.lwjgl.opencl.CL10.*;

public class Test {

    public static void run() throws CLException {
        CLEnvironment environment = ModClass.ENVIRONMENT;
        environment.loadProgram(new ResourceLocation("modid", "testprogram"), """
                void kernel example_kernel(const int EpicParameter, global const int* A, global const int* B, global const int* C, global int* D) {
                    int i = get_global_id(0);
                    D[i] = A[i] + B[i] + C[i];
                }
                """);

        try (CLKernel kernel = environment.createKernel(new ResourceLocation("modid", "testprogram"), "example_kernel")) {
            // These can either be freed manually, or automatically when the kernel is freed
            CLBuffer bufferA = kernel.createBuffer(CL_MEM_READ_ONLY, Integer.BYTES * 4); // size is in bytes
            CLBuffer bufferB = kernel.createBuffer(CL_MEM_READ_ONLY, Integer.BYTES * 4);
            CLBuffer bufferC = kernel.createBuffer(CL_MEM_READ_ONLY, Integer.BYTES * 4);
            CLBuffer bufferD = kernel.createBuffer(CL_MEM_WRITE_ONLY, Integer.BYTES * 4);

            try (MemoryStack stack = MemoryStack.stackPush()) {
                IntBuffer dataA = stack.ints(1, 2, 3, 4);
                IntBuffer dataB = stack.ints(5, 6, 7, 8);
                IntBuffer dataC = stack.ints(9, 10, 11, 12);
                IntBuffer dataD = stack.mallocInt(4); // This is the output, so we don't fill with data

                // The async methods are better, but the synchronous call can be used to wait until complete
                bufferA.writeAsync(0, dataA, null);
                bufferB.writeAsync(0, dataB, null);
                bufferC.writeAsync(0, dataC, null);

                // Set the scalar parameter, then bind the device buffers (not the host data) to the pointer parameters
                kernel.setInt(0, 4);
                kernel.setPointers(1, bufferA);
                kernel.setPointers(2, bufferB);
                kernel.setPointers(3, bufferC);
                kernel.setPointers(4, bufferD);

                // Run the actual program
                kernel.execute(4, 1);

                // This is not required, but will block until all commands have been processed
                environment.finish();

                // This will block until the data is present
                bufferD.read(0, dataD);

                System.out.printf("%d, %d, %d, %d%n", dataD.get(0), dataD.get(1), dataD.get(2), dataD.get(3));
            }
        }
    }
}
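When checking a kernel's output, a plain-Java reference of the same computation is a convenient comparison point. The sketch below mirrors `D[i] = A[i] + B[i] + C[i]` on the CPU with the same inputs as the example above. `KernelReference` and `sum3` are names chosen for illustration.

```java
import java.util.Arrays;

class KernelReference {

    // Plain-Java equivalent of D[i] = A[i] + B[i] + C[i].
    // Each loop iteration corresponds to one work item with global id i.
    static int[] sum3(int[] a, int[] b, int[] c) {
        int[] d = new int[a.length];
        for (int i = 0; i < d.length; i++) {
            d[i] = a[i] + b[i] + c[i];
        }
        return d;
    }

    public static void main(String[] args) {
        int[] expected = sum3(new int[]{1, 2, 3, 4}, new int[]{5, 6, 7, 8}, new int[]{9, 10, 11, 12});
        System.out.println(Arrays.toString(expected)); // [15, 18, 21, 24]
    }
}
```

If the device run succeeds, the values read back into dataD should match this reference element by element.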

Listening to Events

OpenCL provides a way of subscribing to events fired by the compute device. CLEnvironment#getEventDispatcher() returns the event dispatcher for the environment, which can be used to subscribe to any CL event returned by an enqueue call. In most cases this functionality can be ignored, but it is exposed to allow full programmer freedom.

Example

import foundry.veil.opencl.CLEnvironment;
import foundry.veil.opencl.CLException;
import foundry.veil.opencl.VeilOpenCL;
import org.jetbrains.annotations.Nullable;
import org.lwjgl.PointerBuffer;
import org.lwjgl.system.MemoryStack;

import java.nio.IntBuffer;

import static org.lwjgl.opencl.CL10.CL_COMPLETE;
import static org.lwjgl.opencl.CL10.clEnqueueWriteBuffer;

// Code from CLBuffer.java
public void writeAsync(long offset, IntBuffer data, @Nullable Runnable onComplete) throws CLException {
    try (MemoryStack stack = MemoryStack.stackPush()) {
        PointerBuffer event = onComplete != null ? stack.mallocPointer(1) : null;
        VeilOpenCL.checkCLError(clEnqueueWriteBuffer(this.environment.getCommandQueue(), this.pointer, false, offset, data, null, event));
        if (event != null) {
            // This subscribes to the event, and fires the specified runnable callback
            this.environment.getEventDispatcher().listen(event.get(0), CL_COMPLETE, onComplete);
        }
    }
}
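A Runnable completion callback like the one above is often easier to compose when adapted to a CompletableFuture. The following is a minimal sketch of that adaptation using only the JDK. `AsyncWrite` and `toFuture` are hypothetical names, and a background thread plays the role of the CL device.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

class CompletionCallbackSketch {

    // Hypothetical stand-in for any method shaped like writeAsync(offset, data, onComplete)
    interface AsyncWrite {
        void start(Runnable onComplete);
    }

    // Adapts the Runnable callback into a future that completes when the callback fires
    static CompletableFuture<Void> toFuture(AsyncWrite write) {
        CompletableFuture<Void> future = new CompletableFuture<>();
        write.start(() -> future.complete(null));
        return future;
    }

    public static void main(String[] args) throws Exception {
        // A background thread plays the role of the CL device finishing the write
        CompletableFuture<Void> done = toFuture(onComplete -> new Thread(onComplete).start());
        done.get(5, TimeUnit.SECONDS);
        System.out.println("write complete");
    }
}
```

The source does not say which thread runs the callback, so it is safest not to assume it is the game's main or render thread. Work that must happen there should be handed off explicitly.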