
NKI Compiler sbuf.mod_alloc Unhandled Exception #1036

Open
MBshara opened this issue Nov 19, 2024 · 1 comment
Labels
documentation Improvements or additions to documentation NKI

Comments


MBshara commented Nov 19, 2024

neuronxcc.nki.compiler.sbuf.mod_alloc raises an unhandled exception when the base_addr parameter is too large, i.e. when no more physical tiles can be allocated.

ERROR:
[NLA001] Unhandled exception with message: Allocated memory out of bound {b_tile_sub0}@SB<0,400000>(128x4000)#Internal DebugInfo: <b_tile||UNDEF||[128, 1000, 1]> - Please open a support ticket at https://github.com/aws-neuron/aws-neuron-sdk/issues/new. You may also be able to obtain more information using the 'XLA_IR_DEBUG' and 'XLA_HLO_DEBUG' environment variables.

CODE:

# a_vec, b_vec are 256000 np.float32 ndarrays
@nki.jit
def vector_add_direct_allocation(a_vec, b_vec):
    # Create output tensor in HBM 
    out = nl.ndarray(a_vec.shape, dtype=a_vec.dtype, buffer=nl.shared_hbm)
    
    # Define constants for free dimension, physical tile count, and partition dimension
    FREE_DIM = 1000
    FREE_DIM_TILES = 100
    PARTITION_DIM = 128

    # Get the total number of vector rows
    M = a_vec.shape[0]

    # Define the size of each tile
    TILE_M = PARTITION_DIM * FREE_DIM

    # Reshape the input vectors
    a_vec = a_vec.reshape((M // TILE_M, PARTITION_DIM, FREE_DIM))
    b_vec = b_vec.reshape((M // TILE_M, PARTITION_DIM, FREE_DIM))
    out = out.reshape((M // TILE_M, PARTITION_DIM, FREE_DIM))

    # Get the total number of tiles
    N_TILES = M // TILE_M

    # Initialize the starting byte offset for a_tensor
    current_offset = 0
    
    # Allocate space for the entire reshaped a_tensor with FREE_DIM_TILES (100) physical tiles
    a_tile = nl.ndarray((N_TILES, nl.par_dim(PARTITION_DIM), FREE_DIM), dtype=a_vec.dtype,
            buffer=ncc.sbuf.mod_alloc(base_addr=current_offset, num_free_tiles=(FREE_DIM_TILES,)))

    # Increment the starting byte offset for b_tensor based on tile and feature size
    current_offset += FREE_DIM_TILES * FREE_DIM * 4
    
    # Allocate space for the entire reshaped b_tensor with FREE_DIM_TILES (100) physical tiles
    b_tile = nl.ndarray((N_TILES, nl.par_dim(PARTITION_DIM), FREE_DIM), dtype=b_vec.dtype,
            buffer=ncc.sbuf.mod_alloc(base_addr=current_offset, num_free_tiles=(FREE_DIM_TILES,)))
@aws-serina-tan

Hi, thanks for the ticket! We can definitely improve the compiler error here.

The root cause of this exception is that SBUF on trn1 has only 192KiB per partition (i.e., 192KiB in the free dimension), as documented in the arch guide: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/general/nki/trainium_inferentia2_arch.html. In this code example, b_tile is allocated at free-dimension byte offset 400,000 = 100 * 1000 * 4, which is beyond the 192KiB (196,608 bytes) of available memory.
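As a back-of-the-envelope check (plain Python, not NKI code; the capacity constant comes from the arch guide figure quoted above), the offset arithmetic from the issue's code sample shows that a_tile alone already exceeds the per-partition SBUF capacity, so b_tile's base address is necessarily out of bounds:

```python
# SBUF free-dimension capacity per partition on trn1 (192 KiB)
SBUF_BYTES_PER_PARTITION = 192 * 1024  # 196,608 bytes

# Constants from the code sample in the issue
FREE_DIM = 1000
FREE_DIM_TILES = 100
ITEMSIZE = 4  # np.float32

# a_tile starts at offset 0 and spans num_free_tiles * FREE_DIM * itemsize
# bytes in the free dimension; b_tile is placed immediately after it.
a_bytes = FREE_DIM_TILES * FREE_DIM * ITEMSIZE  # 400,000 bytes
b_base = 0 + a_bytes                            # b_tile base_addr = 400,000

print(a_bytes > SBUF_BYTES_PER_PARTITION)   # a_tile alone overflows SBUF
print(b_base > SBUF_BYTES_PER_PARTITION)    # so b_tile's base is out of bound

# For both tensors to fit, each can use at most half the partition:
max_tiles_per_tensor = (SBUF_BYTES_PER_PARTITION // 2) // (FREE_DIM * ITEMSIZE)
print(max_tiles_per_tensor)  # 24 physical tiles per tensor at most
```

With FREE_DIM = 1000 and float32 elements, choosing num_free_tiles of at most 24 for each tensor would keep both allocations within the 192KiB free-dimension budget.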

@JonathanHenson JonathanHenson added documentation Improvements or additions to documentation compiler NKI and removed compiler labels Nov 19, 2024