
NKI Compiler sbuf.mod_alloc Unhandled Exception #1036

Open
MBshara opened this issue Nov 19, 2024 · 1 comment
Labels
documentation Improvements or additions to documentation NKI

Comments


MBshara commented Nov 19, 2024

neuronxcc.nki.compiler.sbuf.mod_alloc raises an unhandled exception when the base_addr parameter is too large, i.e. when no more physical tiles can be allocated.

ERROR:
[NLA001] Unhandled exception with message: Allocated memory out of bound {b_tile_sub0}@SB<0,400000>(128x4000)#Internal DebugInfo: <b_tile||UNDEF||[128, 1000, 1]> - Please open a support ticket at https://github.com/aws-neuron/aws-neuron-sdk/issues/new. You may also be able to obtain more information using the 'XLA_IR_DEBUG' and 'XLA_HLO_DEBUG' environment variables.

CODE:

# a_vec, b_vec are 256000 np.float32 ndarrays
@nki.jit
def vector_add_direct_allocation(a_vec, b_vec):
    # Create output tensor in HBM 
    out = nl.ndarray(a_vec.shape, dtype=a_vec.dtype, buffer=nl.shared_hbm)
    
    # Define constants for free dimension, physical tile count, and partition dimension
    FREE_DIM = 1000
    FREE_DIM_TILES = 100
    PARTITION_DIM = 128

    # Get the total number of vector rows
    M = a_vec.shape[0]

    # Define the size of each tile
    TILE_M = PARTITION_DIM * FREE_DIM

    # Reshape the input vectors
    a_vec = a_vec.reshape((M // TILE_M, PARTITION_DIM, FREE_DIM))
    b_vec = b_vec.reshape((M // TILE_M, PARTITION_DIM, FREE_DIM))
    out = out.reshape((M // TILE_M, PARTITION_DIM, FREE_DIM))

    # Get the total number of tiles
    N_TILES = M // TILE_M

    # Initialize the starting byte offset for a_tensor
    current_offset = 0
    
    # Allocate space for the entire reshaped a_tensor with FREE_DIM_TILES (100) physical tiles
    a_tile = nl.ndarray((N_TILES, nl.par_dim(PARTITION_DIM), FREE_DIM), dtype=a_vec.dtype,
            buffer=ncc.sbuf.mod_alloc(base_addr=current_offset, num_free_tiles=(FREE_DIM_TILES,)))

    # Increment the starting byte offset for b_tensor based on tile and feature size
    current_offset += FREE_DIM_TILES * FREE_DIM * 4
    
    # Allocate space for the entire reshaped b_tensor with FREE_DIM_TILES (100) physical tiles
    b_tile = nl.ndarray((N_TILES, nl.par_dim(PARTITION_DIM), FREE_DIM), dtype=b_vec.dtype,
            buffer=ncc.sbuf.mod_alloc(base_addr=current_offset, num_free_tiles=(FREE_DIM_TILES,)))
@aws-serina-tan

Hi, thanks for the ticket! We can definitely improve the compiler error here.

The root cause of this exception is that SBUF on trn1 has only 192KiB per partition (i.e., 192KiB in the free dimension), as documented in the arch guide: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/general/nki/trainium_inferentia2_arch.html. In this code example, b_tile is allocated at free-dimension byte offset 400,000 = 100 * 1000 * 4, which is beyond the 192KiB (196,608 bytes) of available memory.
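As a back-of-the-envelope check (plain Python, not NKI code; the capacity constant comes from the arch guide figure quoted above), the offset arithmetic from the issue's code sample shows that a_tile alone already exceeds the per-partition SBUF capacity, so b_tile's base address is necessarily out of bounds:

```python
# SBUF free-dimension capacity per partition on trn1 (192 KiB)
SBUF_BYTES_PER_PARTITION = 192 * 1024  # 196,608 bytes

# Constants from the code sample in the issue
FREE_DIM = 1000
FREE_DIM_TILES = 100
ITEMSIZE = 4  # np.float32

# a_tile starts at offset 0 and spans num_free_tiles * FREE_DIM * itemsize
# bytes in the free dimension; b_tile is placed immediately after it.
a_bytes = FREE_DIM_TILES * FREE_DIM * ITEMSIZE  # 400,000 bytes
b_base = 0 + a_bytes                            # b_tile base_addr = 400,000

print(a_bytes > SBUF_BYTES_PER_PARTITION)   # a_tile alone overflows SBUF
print(b_base > SBUF_BYTES_PER_PARTITION)    # so b_tile's base is out of bound

# For both tensors to fit, each can use at most half the partition:
max_tiles_per_tensor = (SBUF_BYTES_PER_PARTITION // 2) // (FREE_DIM * ITEMSIZE)
print(max_tiles_per_tensor)  # 24 physical tiles per tensor at most
```

With FREE_DIM = 1000 and float32 elements, choosing num_free_tiles of at most 24 for each tensor would keep both allocations within the 192KiB free-dimension budget.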

@JonathanHenson JonathanHenson added documentation Improvements or additions to documentation compiler NKI and removed compiler labels Nov 19, 2024