[AD] Fix missing masks on symbolic calls inside loops #338

njroussel · 2025-02-06T23:34:56Z

When running AD through evaluated-mode loops, any dynamic dispatch will not have the loop's mask applied.

This PR fixes the issue by saving the loop's mask (the top of the mask stack) into the `CallOp` custom operation at the time the dynamic dispatch is traced. The custom operation then applies that saved mask whenever it is traversed.

Without this patch, the issue would typically manifest itself as a CUDA error or segfault during kernel execution, usually because lanes that should have been masked were accessing out-of-bounds memory.

A regression test was added in `test_switch.py::test18_apply_loop_mask`.

When the `CallOp` node is traversed in forward or reverse mode, the mask stack may no longer match what it was during the initial tracing. The `CallOp` therefore stores an extra reference to the mask at the top of the mask stack and applies it when it is traversed.
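The idea of capturing the mask at trace time can be illustrated with a small pure-Python toy model. This is only a sketch and not the actual Dr.Jit implementation: `mask_stack`, `masked_gather`, `ToyCallOp`, and its methods are hypothetical names invented for this example, and lanes are modelled as plain Python lists.

```python
from typing import List

# Hypothetical stand-in for the JIT's mask stack. The top entry is the
# mask of the innermost loop that is currently being traced.
mask_stack: List[List[bool]] = []


def masked_gather(source: List[float], index: List[int],
                  mask: List[bool]) -> List[float]:
    """Read source[index[i]] on active lanes; masked lanes yield 0.0
    and never touch memory."""
    return [source[i] if m else 0.0 for i, m in zip(index, mask)]


class ToyCallOp:
    """Toy analogue of a custom AD operation recording a dynamic dispatch."""

    def __init__(self, source: List[float], index: List[int]):
        self.source = source
        self.index = index
        # Capture the top of the mask stack *while tracing*. If no loop
        # is active, every lane is considered enabled.
        if mask_stack:
            self.saved_mask = list(mask_stack[-1])
        else:
            self.saved_mask = [True] * len(index)

    def traverse(self) -> List[float]:
        # Apply the saved mask, regardless of what the mask stack
        # contains at traversal time.
        return masked_gather(self.source, self.index, self.saved_mask)

    def traverse_unmasked(self) -> List[float]:
        # Models the behavior without the fix: all lanes are active.
        return masked_gather(self.source, self.index,
                             [True] * len(self.index))


source = [10.0, 20.0, 30.0]
index = [0, 2, 7]                       # lane 2 points out of bounds

mask_stack.append([True, True, False])  # loop mask disables lane 2
op = ToyCallOp(source, index)           # "tracing" inside the loop
mask_stack.pop()                        # loop has ended before AD traversal

result = op.traverse()                  # [10.0, 30.0, 0.0]
```

In this toy model, `op.traverse_unmasked()` raises an `IndexError` because the disabled lane reads index 7, which loosely mirrors the out-of-bounds accesses described above.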

njroussel added 2 commits February 6, 2025 17:57

Test implicit loop masks on calls

ac48ab9
