[RMP] Quick Start for Session-Based Recommendation #927

EvenOldridge · 2023-04-25T16:54:46Z

Problem:

Merlin provides documentation and a number of example notebooks on how to use tools like NVTabular, Dataloader and Merlin Models. To build a pipeline for training and evaluation, a Data Scientist has to analyze that material, copy and paste code snippets demonstrating the API, and glue that code together into scripts for experimentation and benchmarking.
It might also not be clear to users which advanced API options in Merlin Models can be treated as hyperparameters and potentially improve model accuracy.

Goal:

This RMP provides a Quick-start for building a training pipeline for session-based recommendation.
It addresses the ranking models part of the larger RMP #732, in particular steps 4-7 of the Data Scientist journey when experimenting with Merlin.

The Quick-start is composed of:

Template scripts

Generic template script for preprocessing data for session-based recommendation
Generic template script for building and training models for session-based recommendation, exposing the main hyperparameters for ranking models.
It includes support for sequential models like YouTubeDNN, RNNs and Transformers (backed by the HuggingFace library).

Documentation

Documentation of the scripts' command-line arguments
Documentation of best practices learned from our experimentation:
- Hyperparameter tuning: search space, most important hyperparameters and best hparams for the REES46 dataset
- Intuitions about API options (building blocks, arguments) that can improve model accuracy

Constraints:

Preprocessing - The preprocessing template notebook will perform some basic feature encoding for categorical variables (e.g. categorify) and continuous variables (e.g. standardization). It will also group interactions by session, sorted by timestamp.
The customer can expand the template with the advanced preprocessing ops demonstrated in our examples.
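The three preprocessing steps named above (categorify, standardization, grouping by session sorted by timestamp) can be sketched in plain Python. This is only an illustration of the logic, not the NVTabular API the template would use; the column names (`session_id`, `timestamp`, `item_id`, `price`) and the toy rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Toy interaction log; column names and values are hypothetical.
interactions = [
    {"session_id": "s1", "timestamp": 3, "item_id": "shoe", "price": 30.0},
    {"session_id": "s1", "timestamp": 1, "item_id": "hat", "price": 10.0},
    {"session_id": "s2", "timestamp": 2, "item_id": "shoe", "price": 30.0},
    {"session_id": "s2", "timestamp": 5, "item_id": "bag", "price": 50.0},
]


def categorify(values):
    """Map each distinct value to a contiguous integer id (0 is left free for padding)."""
    return {value: index + 1 for index, value in enumerate(sorted(set(values)))}


def standardize(values):
    """Rescale values to zero mean and unit variance (population standard deviation)."""
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # avoid division by zero for constant columns
    return [(value - mu) / sigma for value in values]


def preprocess(rows):
    """Encode features, then group interactions by session, sorted by timestamp."""
    vocab = categorify(row["item_id"] for row in rows)
    scaled_prices = standardize([row["price"] for row in rows])

    grouped = defaultdict(list)
    for row, price in zip(rows, scaled_prices):
        grouped[row["session_id"]].append((row["timestamp"], vocab[row["item_id"]], price))

    sessions = {}
    for session_id, events in grouped.items():
        events.sort(key=lambda event: event[0])  # chronological order within the session
        sessions[session_id] = {
            "item_id_seq": [event[1] for event in events],
            "price_seq": [event[2] for event in events],
        }
    return sessions


sessions = preprocess(interactions)
```

Each session ends up as one row holding list features, which is the shape sequential models expect as input.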
Training - The training and evaluation script for Merlin Models should be fully configurable, taking as input the parquet files and schema, plus a number of hyperparameters exposed via command-line arguments. The output of this script should be the evaluation metrics, optionally logged to Weights & Biases and TensorBoard.
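A minimal sketch of what such a command-line interface could look like, using only `argparse` from the standard library. Every argument name, default and model choice below is hypothetical and chosen for illustration; the actual script may expose different options.

```python
import argparse


def build_parser():
    """Build a parser for a hypothetical training/evaluation script."""
    parser = argparse.ArgumentParser(
        description="Train and evaluate a session-based recommendation model."
    )
    # Inputs: preprocessed parquet files and the schema describing their columns.
    parser.add_argument("--train_path", required=True, help="Training parquet files")
    parser.add_argument("--eval_path", required=True, help="Evaluation parquet files")
    parser.add_argument("--schema_path", required=True, help="Schema file")
    # Hyperparameters exposed for tuning (names and defaults are assumptions).
    parser.add_argument(
        "--model_type",
        choices=["youtube_dnn", "rnn", "transformer"],
        default="transformer",
    )
    parser.add_argument("--lr", type=float, default=1e-3)
    parser.add_argument("--batch_size", type=int, default=128)
    parser.add_argument("--epochs", type=int, default=1)
    # Optional metric logging targets.
    parser.add_argument("--log_to_wandb", action="store_true")
    parser.add_argument("--log_to_tensorboard", action="store_true")
    return parser


def parse_args(argv=None):
    """Parse argv (defaults to sys.argv[1:] when argv is None)."""
    return build_parser().parse_args(argv)


# Example invocation with an explicit argument list.
args = parse_args(
    [
        "--train_path", "train.parquet",
        "--eval_path", "eval.parquet",
        "--schema_path", "schema.pbtxt",
        "--lr", "0.01",
        "--log_to_wandb",
    ]
)
```

Keeping every hyperparameter on the command line lets an HPO tool drive the same script without code changes.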

Investigations

Starting Point:

Script for Transformers4Rec paper reproducibility using the Transformers4Rec library
Script that @sararb has been porting from the T4Rec repo to use the Merlin Models API

Tasks

Dataset choice

Choose the dataset for the Quick-start for session-based recs #951

Preprocessing script

Modeling script

Experiments

Quick-Start experiments: Run a session-based recommendation HPO for the selected dataset for next-item prediction task #947

Documentation

Deployment and inference with Triton

Create a Quick-start script to build and export a Triton ensemble (NVT + Models) using Merlin Systems
Create a Quick-start notebook demonstrating how to prepare an inference request to Triton
Create markdown documentation on how to use the quick-start deployment script
Starting point: https://github.com/NVIDIA-Merlin/models/blob/main/examples/usecases/transformers-next-item-prediction.ipynb
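As a rough sketch of what preparing an inference request could involve, the snippet below builds a JSON body in the shape of Triton's HTTP/REST inference protocol (KServe v2), which is sent with a POST to `/v2/models/<model_name>/infer`. The input name `item_id_seq`, the `INT64` datatype and the `[1, sequence_length]` shape are assumptions for illustration; the real names, types and shapes come from the exported ensemble's configuration.

```python
import json


def build_infer_request(session_items, input_name="item_id_seq"):
    """Build a v2 inference request body for a single session.

    `input_name` is a hypothetical tensor name; check the exported
    ensemble's config for the inputs it actually expects.
    """
    return {
        "inputs": [
            {
                "name": input_name,
                "shape": [1, len(session_items)],  # batch of one session
                "datatype": "INT64",
                "data": list(session_items),  # flattened tensor contents
            }
        ]
    }


# Encoded item ids of one session, in chronological order (toy values).
payload = build_infer_request([2, 3, 1])
body = json.dumps(payload)
```

The serialized `body` can then be posted with any HTTP client, or the same tensors can be sent through the `tritonclient` package instead of hand-built JSON.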

Tasks moved from #433 - Tensorflow support for session based recommendations integration in Merlin

Reproducibility of Transformers4Rec results and integration tests (23.01)

Reproduce selected results from Transformers4Rec paper with Merlin Models API models#806 - Reproduce selected results from Transformers4Rec paper to ensure the implementation is correct
Create integration tests for session-based recommendation models#807 - Integration tests based on #806 to track regressions in the API, accuracy and performance

Support of advanced sequential tasks and the definition of examples (22.11)

[RMP] Support of advanced sequential tasks and the definition of examples #472 - The main objective is to support advanced session-based tasks and create examples of common session-based and sequential-based architectures.

EvenOldridge added this to the Merlin 23.05 milestone Apr 25, 2023

EvenOldridge assigned gabrielspmoreira Apr 25, 2023

EvenOldridge added the roadmap label Apr 25, 2023

gabrielspmoreira changed the title ~~[RMP] Quick Start for Session Based Models~~ [RMP] Quick Start for Session-Based Recommender models Apr 25, 2023

gabrielspmoreira changed the title ~~[RMP] Quick Start for Session-Based Recommender models~~ [RMP] Quick Start for Session-Based Recommendation Apr 25, 2023

EvenOldridge assigned sararb Apr 26, 2023

EvenOldridge modified the milestones: Merlin 23.05, Merlin 23.06 Apr 26, 2023

viswa-nvidia modified the milestones: Merlin 23.05, Merlin 23.06, Merlin 23.07 May 2, 2023

EvenOldridge modified the milestones: Merlin 23.07, Merlin 23.08 Jun 27, 2023


EvenOldridge commented Apr 25, 2023 • edited by oliverholworthy
