How to reproduce the rec performance on INSPIRED dataset? #7

Open · dayuyang1999 opened this issue Aug 22, 2023 · 3 comments
dayuyang1999 commented Aug 22, 2023

Dear Author,

I am trying to reproduce the recommendation performance on the INSPIRED dataset.

[screenshot: reported INSPIRED recommendation results]

I used the hyperparameters you recommend and the "best" checkpoint as the prompt encoder. Unfortunately, I was not able to reproduce the performance reported in the paper.

---- Here I have attached the loss and Recall@1 on the test set for the prompt pre-training, conversational training, and recommendation training steps:

[plots: prompt pre-training loss and Recall@1]
prompt pre-training

[plot: conversational training loss]
conversational training

[plots: recommendation training loss and Recall@1]
recommendation training (as you can see, the best Recall@1 I got is around 0.04, far from the reported 0.09)

---- and here are the configurations I used for the prompt pre-training, conversational training, and recommendation training steps:

python3 train_pre.py \
    --dataset inspired \
    --tokenizer microsoft/DialoGPT-small \
    --model microsoft/DialoGPT-small \
    --text_tokenizer roberta-base \
    --text_encoder roberta-base \
    --num_train_epochs 5 \
    --gradient_accumulation_steps 1 \
    --per_device_train_batch_size 64 \
    --per_device_eval_batch_size 128 \
    --num_warmup_steps 168 \
    --max_length 200 \
    --prompt_max_length 200 \
    --entity_max_length 32 \
    --learning_rate 6e-4 \
    --output_dir UniCRS/src/result_promptpretraining_inspired \
    --use_wandb \
    --project crs-prompt-pre-inspired \
    --name exp1 \
    --gpu 0 

prompt pre-training

python3 train_conv.py \
    --dataset inspired \
    --tokenizer microsoft/DialoGPT-small \
    --model microsoft/DialoGPT-small \
    --text_tokenizer roberta-base \
    --text_encoder roberta-base \
    --n_prefix_conv 20 \
    --prompt_encoder UniCRS/src/result_promptpretraining_inspired/best/ \
    --num_train_epochs 10 \
    --gradient_accumulation_steps 1 \
    --ignore_pad_token_for_loss \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 16 \
    --num_warmup_steps 976 \
    --context_max_length 200 \
    --resp_max_length 183 \
    --prompt_max_length 200 \
    --entity_max_length 32 \
    --learning_rate 1e-4 \
    --output_dir UniCRS/src/result_convprompt_inspired \
    --use_wandb \
    --project crs-prompt-conv-inspired \
    --name exp1 \
    --gpu 0

conv training

python3 infer_conv.py \
    --dataset inspired \
    --split test \
    --tokenizer microsoft/DialoGPT-small \
    --model microsoft/DialoGPT-small \
    --text_tokenizer roberta-base \
    --text_encoder roberta-base \
    --n_prefix_conv 20 \
    --prompt_encoder UniCRS/src/result_convprompt_inspired/best \
    --per_device_eval_batch_size 64 \
    --context_max_length 200 \
    --resp_max_length 183 \
    --prompt_max_length 200 \
    --entity_max_length 32 \
    --gpu 1

conv infer

python3 train_rec.py \
    --dataset inspired_gen \
    --tokenizer microsoft/DialoGPT-small \
    --model microsoft/DialoGPT-small \
    --text_tokenizer roberta-base \
    --text_encoder roberta-base \
    --n_prefix_rec 10 \
    --prompt_encoder UniCRS/src/result_promptpretraining_inspired/best \
    --num_train_epochs 5 \
    --per_device_train_batch_size 64 \
    --per_device_eval_batch_size 64 \
    --gradient_accumulation_steps 1 \
    --num_warmup_steps 33 \
    --context_max_length 200 \
    --prompt_max_length 200 \
    --entity_max_length 32 \
    --learning_rate 1e-4 \
    --output_dir UniCRS/src/result_rec_inspired \
    --use_wandb \
    --project crs-prompt-rec-inspired \
    --name exp1 \
    --gpu 0

rec training
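
Since train_rec.py here reads the inspired_gen split, it can be worth double-checking that the generated data it expects actually exists before launching recommendation training. A minimal sketch, assuming the split lives under the repo's data directory; the exact path is my guess, not something stated in this thread:

# Hypothetical sanity check: the directory below is an assumption,
# adjust it to wherever the generated inspired_gen split was written.
ls UniCRS/src/data/inspired_gen/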

Thank you!

wxl1999 (Owner) commented Nov 27, 2023

Sorry for the late reply! The problem comes from the pre-training stage. You should observe a continuous increase in performance and a drop in loss, since the answer is actually provided in the response.

  • loss: [plot of the pre-training loss curve]
  • Recall: [plot of the pre-training Recall curve]
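
For reference, a minimal way to probe this, assuming the gap really does come from an under-trained prompt encoder, is to rerun the pre-training command from above with a longer schedule and confirm that the test Recall@1 rises continuously and the loss keeps dropping before moving on to the later stages. The epoch count below is my own assumption, not a value recommended in this thread:

# Illustrative rerun only: all flags are copied from the earlier command except
# --num_train_epochs (raised as an assumption) and --name; --num_warmup_steps
# may also need rescaling for the longer schedule.
python3 train_pre.py \
    --dataset inspired \
    --tokenizer microsoft/DialoGPT-small \
    --model microsoft/DialoGPT-small \
    --text_tokenizer roberta-base \
    --text_encoder roberta-base \
    --num_train_epochs 10 \
    --gradient_accumulation_steps 1 \
    --per_device_train_batch_size 64 \
    --per_device_eval_batch_size 128 \
    --num_warmup_steps 168 \
    --max_length 200 \
    --prompt_max_length 200 \
    --entity_max_length 32 \
    --learning_rate 6e-4 \
    --output_dir UniCRS/src/result_promptpretraining_inspired \
    --use_wandb \
    --project crs-prompt-pre-inspired \
    --name exp_longer_pretrain \
    --gpu 0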

invoker-LL commented
> Sorry for the late reply! The problem comes from the pre-training stage. You should observe a continuous increase in performance and a drop in loss, since the answer is actually provided in the response.
>   • loss: [plot]
>   • Recall: [plot]

Though my pre-training results are as good as the ones above, the recommendation result is still much lower than the results in the paper.

pre-training:
[screenshot of pre-training results]

recommendation:
[screenshot of recommendation results]

But I can easily reproduce the UniCRS results with the CFCRS code; what is the difference between them?

wxl1999 (Owner) commented Dec 7, 2023

@invoker-LL Hi, your problem is interesting. The metric test/loss is abnormal, since it keeps increasing. You can compare the code with UniCRS for more details. As far as I remember, the main difference is that the length of the soft token prompt is set to 0 in CFCRS.
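
To make that comparison concrete, one quick ablation is to rerun the recommendation stage from this thread with the soft prompt length set to 0 and see whether the score moves toward what the CFCRS code gives. This is only an illustrative sketch: whether train_rec.py accepts 0 for --n_prefix_rec is an assumption on my part, and the output directory and run name below are made up for the ablation.

# Hypothetical ablation: identical to the rec-training command above except
# --n_prefix_rec (set to 0 to mimic the CFCRS setting mentioned here),
# --output_dir, and --name.
python3 train_rec.py \
    --dataset inspired_gen \
    --tokenizer microsoft/DialoGPT-small \
    --model microsoft/DialoGPT-small \
    --text_tokenizer roberta-base \
    --text_encoder roberta-base \
    --n_prefix_rec 0 \
    --prompt_encoder UniCRS/src/result_promptpretraining_inspired/best \
    --num_train_epochs 5 \
    --per_device_train_batch_size 64 \
    --per_device_eval_batch_size 64 \
    --gradient_accumulation_steps 1 \
    --num_warmup_steps 33 \
    --context_max_length 200 \
    --prompt_max_length 200 \
    --entity_max_length 32 \
    --learning_rate 1e-4 \
    --output_dir UniCRS/src/result_rec_inspired_nprefix0 \
    --use_wandb \
    --project crs-prompt-rec-inspired \
    --name exp_nprefix0 \
    --gpu 0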
