How to reproduce the rec performance on INSPIRED dataset? #7

Open · dayuyang1999 opened this issue Aug 22, 2023 · 3 comments
dayuyang1999 commented Aug 22, 2023

Dear Author,

I am trying to reproduce the recommendation performance on the INSPIRED dataset.

[screenshot: reported INSPIRED recommendation results]

I used the hyperparameters you recommend and the "best" checkpoint as the prompt encoder. Unfortunately, I was not able to reproduce the performance reported in the paper.

---- Here I have attached the loss and Recall@1 on the test set for the prompt pre-training, conversational training, and recommendation training steps:

[plots: prompt pre-training loss and Recall@1]
prompt pre-training

[plot: conversational training loss]
conversational training

[plots: recommendation training loss and Recall@1]
recommendation training (as you can see, the best Recall@1 I got is around 0.04, far from the reported 0.09)

---- and here are the configurations I used for the prompt pre-training, conversational training, and recommendation training steps:

python3 train_pre.py \
    --dataset inspired \
    --tokenizer microsoft/DialoGPT-small \
    --model microsoft/DialoGPT-small \
    --text_tokenizer roberta-base \
    --text_encoder roberta-base \
    --num_train_epochs 5 \
    --gradient_accumulation_steps 1 \
    --per_device_train_batch_size 64 \
    --per_device_eval_batch_size 128 \
    --num_warmup_steps 168 \
    --max_length 200 \
    --prompt_max_length 200 \
    --entity_max_length 32 \
    --learning_rate 6e-4 \
    --output_dir UniCRS/src/result_promptpretraining_inspired \
    --use_wandb \
    --project crs-prompt-pre-inspired \
    --name exp1 \
    --gpu 0 

prompt pre-training

python3 train_conv.py \
    --dataset inspired \
    --tokenizer microsoft/DialoGPT-small \
    --model microsoft/DialoGPT-small \
    --text_tokenizer roberta-base \
    --text_encoder roberta-base \
    --n_prefix_conv 20 \
    --prompt_encoder UniCRS/src/result_promptpretraining_inspired/best/ \
    --num_train_epochs 10 \
    --gradient_accumulation_steps 1 \
    --ignore_pad_token_for_loss \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 16 \
    --num_warmup_steps 976 \
    --context_max_length 200 \
    --resp_max_length 183 \
    --prompt_max_length 200 \
    --entity_max_length 32 \
    --learning_rate 1e-4 \
    --output_dir UniCRS/src/result_convprompt_inspired \
    --use_wandb \
    --project crs-prompt-conv-inspired \
    --name exp1 \
    --gpu 0

conv training

python3 infer_conv.py \
    --dataset inspired \
    --split test \
    --tokenizer microsoft/DialoGPT-small \
    --model microsoft/DialoGPT-small \
    --text_tokenizer roberta-base \
    --text_encoder roberta-base \
    --n_prefix_conv 20 \
    --prompt_encoder UniCRS/src/result_convprompt_inspired/best \
    --per_device_eval_batch_size 64 \
    --context_max_length 200 \
    --resp_max_length 183 \
    --prompt_max_length 200 \
    --entity_max_length 32 \
    --gpu 1

conv infer

python3 train_rec.py \
    --dataset inspired_gen \
    --tokenizer microsoft/DialoGPT-small \
    --model microsoft/DialoGPT-small \
    --text_tokenizer roberta-base \
    --text_encoder roberta-base \
    --n_prefix_rec 10 \
    --prompt_encoder UniCRS/src/result_promptpretraining_inspired/best \
    --num_train_epochs 5 \
    --per_device_train_batch_size 64 \
    --per_device_eval_batch_size 64 \
    --gradient_accumulation_steps 1 \
    --num_warmup_steps 33 \
    --context_max_length 200 \
    --prompt_max_length 200 \
    --entity_max_length 32 \
    --learning_rate 1e-4 \
    --output_dir UniCRS/src/result_rec_inspired \
    --use_wandb \
    --project crs-prompt-rec-inspired \
    --name exp1 \
    --gpu 0

rec training
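
Since train_rec.py here reads the inspired_gen split, it can be worth double-checking that the generated data it expects actually exists before launching recommendation training. A minimal sketch, assuming the split lives under the repo's data directory; the exact path is my guess, not something stated in this thread:

# Hypothetical sanity check: the directory below is an assumption,
# adjust it to wherever the generated inspired_gen split was written.
ls UniCRS/src/data/inspired_gen/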

Thank you!

wxl1999 (Owner) commented Nov 27, 2023

Sorry for the late reply! The problem comes from the pre-training stage. You should observe a continuous increase in performance and a drop in loss, since the answer is actually provided in the response.

  • loss: [plot of the pre-training loss curve]
  • Recall: [plot of the pre-training Recall curve]
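
For reference, a minimal way to probe this, assuming the gap really does come from an under-trained prompt encoder, is to rerun the pre-training command from above with a longer schedule and confirm that the test Recall@1 rises continuously and the loss keeps dropping before moving on to the later stages. The epoch count below is my own assumption, not a value recommended in this thread:

# Illustrative rerun only: all flags are copied from the earlier command except
# --num_train_epochs (raised as an assumption) and --name; --num_warmup_steps
# may also need rescaling for the longer schedule.
python3 train_pre.py \
    --dataset inspired \
    --tokenizer microsoft/DialoGPT-small \
    --model microsoft/DialoGPT-small \
    --text_tokenizer roberta-base \
    --text_encoder roberta-base \
    --num_train_epochs 10 \
    --gradient_accumulation_steps 1 \
    --per_device_train_batch_size 64 \
    --per_device_eval_batch_size 128 \
    --num_warmup_steps 168 \
    --max_length 200 \
    --prompt_max_length 200 \
    --entity_max_length 32 \
    --learning_rate 6e-4 \
    --output_dir UniCRS/src/result_promptpretraining_inspired \
    --use_wandb \
    --project crs-prompt-pre-inspired \
    --name exp_longer_pretrain \
    --gpu 0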

invoker-LL commented
> Sorry for the late reply! The problem comes from the pre-training stage. You should observe a continuous increase in performance and a drop in loss, since the answer is actually provided in the response.
>   • loss: [plot]
>   • Recall: [plot]

Though my pre-training results are as good as the ones above, the recommendation result is still much lower than the results in the paper.

pre-training:
[screenshot of pre-training results]

recommendation:
[screenshot of recommendation results]

But I can easily reproduce the UniCRS results with the CFCRS code; what is the difference between them?

wxl1999 (Owner) commented Dec 7, 2023

@invoker-LL Hi, your problem is interesting. The metric test/loss is abnormal, since it keeps increasing. You can compare the code with UniCRS for more details. As far as I remember, the main difference is that the length of the soft token prompt is set to 0 in CFCRS.
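
To make that comparison concrete, one quick ablation is to rerun the recommendation stage from this thread with the soft prompt length set to 0 and see whether the score moves toward what the CFCRS code gives. This is only an illustrative sketch: whether train_rec.py accepts 0 for --n_prefix_rec is an assumption on my part, and the output directory and run name below are made up for the ablation.

# Hypothetical ablation: identical to the rec-training command above except
# --n_prefix_rec (set to 0 to mimic the CFCRS setting mentioned here),
# --output_dir, and --name.
python3 train_rec.py \
    --dataset inspired_gen \
    --tokenizer microsoft/DialoGPT-small \
    --model microsoft/DialoGPT-small \
    --text_tokenizer roberta-base \
    --text_encoder roberta-base \
    --n_prefix_rec 0 \
    --prompt_encoder UniCRS/src/result_promptpretraining_inspired/best \
    --num_train_epochs 5 \
    --per_device_train_batch_size 64 \
    --per_device_eval_batch_size 64 \
    --gradient_accumulation_steps 1 \
    --num_warmup_steps 33 \
    --context_max_length 200 \
    --prompt_max_length 200 \
    --entity_max_length 32 \
    --learning_rate 1e-4 \
    --output_dir UniCRS/src/result_rec_inspired_nprefix0 \
    --use_wandb \
    --project crs-prompt-rec-inspired \
    --name exp_nprefix0 \
    --gpu 0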
