[RMP] Quick Start for Session-Based Recommendation #927

Open
21 tasks
EvenOldridge opened this issue Apr 25, 2023 · 0 comments

EvenOldridge commented Apr 25, 2023

Problem:

Merlin provides documentation and a number of example notebooks showing how to use tools like NVTabular, Dataloader and Merlin Models. To build a pipeline for training and evaluation, a data scientist has to study that material, copy and paste the code snippets that demonstrate the API, and glue them together into scripts for experimentation and benchmarking.
It may also be unclear to users which advanced API options in Merlin Models can be exposed as hyperparameters and potentially improve model accuracy.

Goal:

This RMP provides a Quick Start for building a training pipeline for session-based recommendation.
It addresses the ranking models part of the larger RMP #732, in particular steps 4-7 of the Data Scientist journey when experimenting with Merlin.

The Quick Start for ranking consists of:

Template scripts

  • Generic template script for preprocessing data for session-based recommendation
  • Generic template script for building and training models for session-based recommendation, exposing the main hyperparameters for ranking models.
    It includes support for sequential models such as YouTubeDNN, RNNs and Transformers (backed by the HuggingFace library).
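
As a rough illustration of what "exposing the main hyperparameters" could look like, here is a minimal sketch built on Python's standard `argparse` module. Every flag name, default value, and model-type label below is a hypothetical placeholder chosen for this example; none of them is taken from the actual Merlin scripts.

```python
import argparse


def build_parser():
    """Build a parser for a hypothetical training script (all flag names are assumptions)."""
    parser = argparse.ArgumentParser(
        description="Sketch of a training script for session-based recommendation"
    )
    # Input data (hypothetical flag names).
    parser.add_argument("--train_path", required=True, help="Training parquet files")
    parser.add_argument("--eval_path", required=True, help="Evaluation parquet files")
    # Model family; the labels are placeholders for the sequential models named above.
    parser.add_argument(
        "--model_type",
        choices=["youtube_dnn", "rnn", "transformer"],
        default="transformer",
    )
    # A few typical hyperparameters (defaults are arbitrary placeholders).
    parser.add_argument("--lr", type=float, default=1e-3, help="Learning rate")
    parser.add_argument("--batch_size", type=int, default=128)
    parser.add_argument("--epochs", type=int, default=1)
    return parser


def parse_config(argv=None):
    """Parse arguments into a plain dict that a training function could consume."""
    return vars(build_parser().parse_args(argv))


# Example invocation with an explicit argument list instead of sys.argv.
config = parse_config(
    ["--train_path", "train/", "--eval_path", "eval/", "--model_type", "rnn", "--lr", "0.01"]
)
```

Returning a plain dict keeps the parsed hyperparameters easy to log or to hand over to a tuning tool.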

Documentation

  • Documentation of the scripts' command-line arguments
  • Documentation of best practices learned from our experimentation:
    • Hyperparameter tuning: search space, most important hyperparameters and best hyperparameters for the REES46 dataset
    • Intuitions about API options (building blocks, arguments) that can improve model accuracy

Constraints:

  • Preprocessing - The preprocessing template will perform basic feature encoding for categorical variables (e.g. categorify) and continuous variables (e.g. standardization). It will also group interactions by session, sorted by timestamp.
    The customer can extend the template with the advanced preprocessing ops demonstrated in our examples.
  • Training - The training and evaluation script for Merlin Models should be fully configurable, taking as input the parquet files and schema, plus a number of hyperparameters exposed via command-line arguments. The output of this script should be the evaluation metrics, optionally logged to Weights&Biases and Tensorboard.
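
The preprocessing steps named above (categorical encoding, standardization, and grouping interactions by session in timestamp order) can be sketched in plain Python. This is only an illustration of the transformations, not the NVTabular API: the toy data, the function names, and the choice to assign category ids in order of first appearance (with 0 reserved for padding) are all assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical interaction log: (session_id, item, price, timestamp).
interactions = [
    ("s1", "shoe", 50.0, 3),
    ("s2", "hat", 10.0, 1),
    ("s1", "sock", 5.0, 1),
    ("s1", "hat", 10.0, 2),
    ("s2", "shoe", 50.0, 2),
]


def categorify(values):
    """Map each distinct value to an integer id, starting at 1 (0 is kept for padding)."""
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping) + 1
    return mapping


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # avoid division by zero for constant columns
    return [(v - mu) / sigma for v in values]


def preprocess(rows):
    """Encode features, then group rows by session, ordered by timestamp."""
    item_ids = categorify(row[1] for row in rows)
    prices = standardize([row[2] for row in rows])
    grouped = defaultdict(list)
    for (session_id, item, _, timestamp), price in zip(rows, prices):
        grouped[session_id].append((timestamp, item_ids[item], price))
    result = {}
    for session_id, events in grouped.items():
        events.sort(key=lambda event: event[0])  # chronological order within the session
        result[session_id] = {
            "item_id": [event[1] for event in events],
            "price": [event[2] for event in events],
        }
    return result


sessions = preprocess(interactions)
```

Each session ends up as one record holding list-valued features, which is the general shape sequential models consume.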

Investigations

Starting Point:

Tasks

Dataset choice

Preprocessing script

Modeling script

Experiments

Documentation

Deployment and inference with Triton

Tasks moved from #433 - Tensorflow support for session based recommendations integration in Merlin

Reproducibility of Transformers4Rec results and integration tests (23.01)

Support of advanced sequential tasks and the definition of examples (22.11)

@EvenOldridge EvenOldridge added this to the Merlin 23.05 milestone Apr 25, 2023
@gabrielspmoreira gabrielspmoreira changed the title [RMP] Quick Start for Session Based Models [RMP] Quick Start for Session-Based Recommender models Apr 25, 2023
@gabrielspmoreira gabrielspmoreira changed the title [RMP] Quick Start for Session-Based Recommender models [RMP] Quick Start for Session-Based Recommendation Apr 25, 2023