Hi team, we are currently testing a migration of our MQTransformer model from NVIDIA to Trainium instances, and we are encountering the error Internal tensorizer error: BirCodeGenLoop during training. Could you help look into what happened?
2024-12-27 22:53:48.000781: 1268479 ERROR ||NEURON_CC_WRAPPER||: Failed compilation with ['neuronx-cc', 'compile', '--framework=XLA', '/tmp/ubuntu/neuroncc_compile_workdir/398c613e-505d-4256-8f66-7c1634efa75b/model.MODULE_796301862831790987+e30acd3a.hlo_module.pb', '--output', '/tmp/ubuntu/neuroncc_compile_workdir/398c613e-505d-4256-8f66-7c1634efa75b/model.MODULE_796301862831790987+e30acd3a.neff', '--target=trn1', '--verbose=35']: 2024-12-27T22:53:48Z [TEN404] (aten__scatter_scatter.134) Internal tensorizer error: BirCodeGenLoop: - Please open a support ticket at https://github.com/aws-neuron/aws-neuron-sdk/issues/new. You may also be able to obtain more information using the 'XLA_IR_DEBUG' and 'XLA_HLO_DEBUG' environment variables.
We use trn1.32xlarge and NeuronXLAStrategy /
Thank you for reaching out! This could be caused by a variety of issues (it appears related to scatter), but it is difficult to tell exactly what is happening from the error message alone. Could you provide a minimal example that reproduces the error? This is important because it lets us reproduce the exact behavior you see without making assumptions about your code and model usage.
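As a starting point, a minimal reproduction could look roughly like the sketch below. It is only a scaffold under assumptions: the tensor shapes, the scatter dimension, and the function name run_scatter_repro are hypothetical and are not taken from the MQTransformer code, so they would need to be replaced with the real values from the failing aten::scatter call. It also sets the XLA_IR_DEBUG and XLA_HLO_DEBUG environment variables named in the error message, and falls back to CPU when torch_xla is not available.

```python
import os

# Enable the extra debug metadata suggested by the compiler error message.
# These are set before torch_xla is imported so they take effect.
os.environ["XLA_IR_DEBUG"] = "1"
os.environ["XLA_HLO_DEBUG"] = "1"


def run_scatter_repro():
    """Run one isolated scatter op; shapes here are hypothetical placeholders."""
    try:
        import torch
    except ImportError:
        return "torch not installed"

    try:
        # On a Trainium instance with torch-neuronx installed this yields
        # the XLA device; otherwise fall back to CPU.
        import torch_xla.core.xla_model as xm
        device = xm.xla_device()
    except (ImportError, RuntimeError):
        device = torch.device("cpu")

    # Hypothetical shapes: replace with the ones from the failing model.
    base = torch.zeros(4, 8, device=device)
    index = torch.tensor([[0, 1, 2, 3, 0, 1, 2, 3]], device=device)
    src = torch.ones(1, 8, device=device)

    # Writes src[0][j] into out[index[0][j]][j] for each column j.
    out = base.scatter(0, index, src)

    # Moving the result to CPU forces the lazy XLA graph to compile and run.
    return out.cpu()


if __name__ == "__main__":
    print(run_scatter_repro())
```

If the isolated op compiles cleanly, the next step would be to match the dtypes and shapes of the real call more closely, since the failure may depend on them.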