
0.4.0

@YLGH released this 15 Mar 20:48

Train pipeline improvements

- The train pipeline now lets the user specify whether all pipelined batches should be executed after the dataloader iterator is exhausted. By default, when StopIteration is raised, the pipeline halts with the last two pipelined batches still unexecuted (see the sketch after this list).
- Core train pipeline logic has been refactored for better readability and maintainability.
- The memcpy and data_dist streams are now set to high priority. We had seen kernel launches get scheduled late even with nothing on the GPU blocking them, which stalled the CPU unnecessarily; raising the stream priority yields measurable perf gains.
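
A minimal usage sketch of the drain-on-StopIteration behavior. The flag name execute_all_batches and the setup of model, optimizer, and dataloader are assumptions here; verify them against the TrainPipelineSparseDist signature in your torchrec version.

```python
import torch
from torchrec.distributed.train_pipeline import TrainPipelineSparseDist

# Hedged sketch: `model`, `optimizer`, and `dataloader` are assumed to be
# built elsewhere (e.g. a DistributedModelParallel-wrapped model).
pipeline = TrainPipelineSparseDist(
    model=model,
    optimizer=optimizer,
    device=torch.device("cuda"),
    execute_all_batches=True,  # assumed flag: drain remaining pipelined batches
)

batches = iter(dataloader)
while True:
    try:
        pipeline.progress(batches)  # advances the pipeline by one batch
    except StopIteration:
        # With execute_all_batches=True, the final pipelined batches have
        # already run by the time StopIteration propagates here.
        break
```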

FX + Script Inference Module

Sharded quantized EmbeddingBagCollection and EmbeddingCollection are now torch.fx-traceable and TorchScript-scriptable (i.e., torch.jit.script applied to a torch.fx-traced module), so they can be served with TorchScript.
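
Below is a minimal, self-contained sketch of the trace-then-script flow on a toy module; building a sharded quantized EmbeddingBagCollection is out of scope here, but such modules now pass through the same two calls.

```python
import torch
import torch.fx

class Toy(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * 2

# Trace to an fx.GraphModule, then compile the traced graph with TorchScript.
graph_module = torch.fx.symbolic_trace(Toy())
scripted = torch.jit.script(graph_module)
print(scripted(torch.ones(3)))  # tensor([2., 2., 2.])
```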

RecMetrics

- Added an include_logloss option to the NE metric, reporting the raw log loss (cross entropy) alongside NE (a sketch follows the AUC example below).
- Added a grouped AUC metric option. To use it, set grouped_auc=True when instantiating the AUC metric, and pass an additional grouping_keys tensor to update() that specifies the group id for each element along the batch dimension. Grouped AUC then computes an AUC per group and returns their average.

```python
from torchrec.metrics.auc import AUCMetric  # import path assumed; verify for your version

# Enable grouped_auc during metric instantiation
auc = AUCMetric(world_size=4, my_rank=0, batch_size=64, tasks=["t1"], grouped_auc=True)
# Provide grouping keys during update
auc.update(predictions=predictions, labels=labels, weights=weights, grouping_keys=grouping_keys)
```
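
A similarly hedged sketch for the include_logloss option mentioned above; the NEMetric import path and constructor arguments are assumed to mirror the AUC example and should be verified against your torchrec version.

```python
from torchrec.metrics.ne import NEMetric  # assumed import path

# Assumed constructor mirroring AUCMetric above; include_logloss reports
# log loss (cross entropy) on top of NE.
ne = NEMetric(world_size=4, my_rank=0, batch_size=64, tasks=["t1"], include_logloss=True)
ne.update(predictions=predictions, labels=labels, weights=weights)
```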

Full Changelog: https://github.com/pytorch/torchrec/commits/v0.4.0