[🐛BUG] RuntimeError: shape mismatch with DiffRec and LDiffRec #1981

Closed
lukas-wegmeth opened this issue Jan 25, 2024 · 2 comments
Labels
bug Something isn't working

lukas-wegmeth commented Jan 25, 2024

Describe the bug
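Training DiffRec or LDiffRec on MovieLens-100K crashes with the exception "RuntimeError: shape mismatch: value tensor of shape [4040, 4040] cannot be broadcast to indexing result of shape [4040]". In the LDiffRec log, training itself runs; the crash occurs at the first validation pass (after epoch 4, with eval_step = 5).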
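
The figure 4040 in the error message is consistent with the evaluation settings of this run. A minimal arithmetic sketch, assuming that uni100 scores each user against one held-out positive plus 100 sampled negatives and that the evaluation batch is rounded down to a whole number of users (both are assumptions about RecBole's data loader, not something stated in this report):

```python
# Values taken from the configuration in this report.
eval_batch_size = 4096
negatives_per_user = 100  # from eval mode "uni100"

# Assumption: leave-one-out splitting ("LS") holds out one positive item per user.
positives_per_user = 1

# Candidate (user, item) pairs scored for each user.
rows_per_user = positives_per_user + negatives_per_user  # 101

# Assumption: the loader keeps only whole users in a batch.
users_per_batch = eval_batch_size // rows_per_user  # 40
rows_per_batch = users_per_batch * rows_per_user  # 4040

print(rows_per_user, users_per_batch, rows_per_batch)
```

Under these assumptions the failing batch holds 40 users with 101 candidates each, so the value tensor should have had 4040 entries rather than 4040 x 4040.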

CUDA available: True
command line args [--data_set_name MovieLens-100K --model_name LDiffRec] will not be used in RecBole
24 Jan 15:52    INFO  
General Hyper Parameters:
gpu_id = 0
use_gpu = True
seed = 42
state = INFO
reproducibility = True
data_path = ./data_sets/MovieLens-100K
checkpoint_dir = ./data_sets/MovieLens-100K/recbole_checkpoints/
show_progress = True
save_dataset = False
dataset_save_path = None
save_dataloaders = False
dataloaders_save_path = None
log_wandb = False

Training Hyper Parameters:
epochs = 50
train_batch_size = 2048
learner = adam
learning_rate = 0.001
train_neg_sample_args = {'distribution': 'uniform', 'sample_num': 1, 'alpha': 1.0, 'dynamic': False, 'candidate_num': 0}
eval_step = 5
stopping_step = 10
clip_grad_norm = None
weight_decay = 0.0
loss_decimal_place = 4

Evaluation Hyper Parameters:
eval_args = {'split': {'LS': 'valid_and_test'}, 'order': 'RO', 'group_by': 'user', 'mode': {'valid': 'uni100', 'test': 'uni100'}}
repeatable = False
metrics = ['Recall', 'MRR', 'NDCG', 'Hit', 'MAP', 'Precision', 'GAUC', 'ItemCoverage', 'AveragePopularity', 'GiniIndex', 'ShannonEntropy', 'TailPercentage']
topk = [1, 3, 5, 10, 20]
valid_metric = NDCG@10
valid_metric_bigger = True
eval_batch_size = 4096
metric_decimal_place = 4

Dataset Hyper Parameters:
field_separator = 	
seq_separator =  
USER_ID_FIELD = user_id
ITEM_ID_FIELD = item_id
RATING_FIELD = rating
TIME_FIELD = timestamp
seq_len = {}
LABEL_FIELD = label
threshold = None
NEG_PREFIX = neg_
load_col = {'inter': ['user_id', 'item_id', 'rating']}
unload_col = {}
unused_col = {}
additional_feat_suffix = []
rm_dup_inter = None
val_interval = {}
filter_inter_by_user_or_item = True
user_inter_num_interval = [0, inf)
item_inter_num_interval = [0, inf)
alias_of_user_id = None
alias_of_item_id = None
alias_of_entity_id = None
alias_of_relation_id = None
preload_weight = {}
normalize_field = []
normalize_all = False
ITEM_LIST_LENGTH_FIELD = item_length
LIST_SUFFIX = _list
MAX_ITEM_LIST_LENGTH = 50
POSITION_FIELD = position_id
HEAD_ENTITY_ID_FIELD = head_id
TAIL_ENTITY_ID_FIELD = tail_id
RELATION_ID_FIELD = relation_id
ENTITY_ID_FIELD = entity_id
benchmark_filename = None

Other Hyper Parameters: 
worker = 0
wandb_project = recbole
shuffle = True
require_pow = False
enable_amp = False
enable_scaler = False
transform = None
n_cate = 1
reparam = True
in_dims = [300]
out_dims = []
ae_act_func = tanh
lamda = 0.03
anneal_cap = 0.005
anneal_steps = 1000
vae_anneal_cap = 0.3
vae_anneal_steps = 200
noise_schedule = linear
noise_scale = 0.1
noise_min = 0.001
noise_max = 0.005
sampling_noise = False
sampling_steps = 0
reweight = True
mean_type = x0
steps = 5
history_num_per_term = 10
beta_fixed = True
dims_dnn = [300]
embedding_size = 10
mlp_act_func = tanh
time-aware = False
w_max = 1
w_min = 0.1
numerical_features = []
discretization = None
kg_reverse_r = False
entity_kg_num_interval = [0, inf)
relation_kg_num_interval = [0, inf)
MODEL_TYPE = ModelType.GENERAL
encoding = utf-8
training_neg_sample_args = {'distribution': 'uniform', 'sample_num': 1, 'dynamic': False, 'candidate_num': 0}
MODEL_INPUT_TYPE = InputType.LISTWISE
eval_type = EvaluatorType.RANKING
single_spec = True
local_rank = 0
device = cuda
valid_neg_sample_args = {'distribution': 'uniform', 'sample_num': 100}
test_neg_sample_args = {'distribution': 'uniform', 'sample_num': 100}


24 Jan 15:52    INFO  MovieLens-100K
The number of users: 944
Average actions of users: 106.04453870625663
The number of items: 1683
Average actions of items: 59.45303210463734
The number of inters: 100000
The sparsity of the dataset: 93.70575143257098%
Remain Fields: ['user_id', 'item_id', 'rating']
24 Jan 15:52    INFO  [Training]: train_batch_size = [2048] train_neg_sample_args: [{'distribution': 'uniform', 'sample_num': 1, 'alpha': 1.0, 'dynamic': False, 'candidate_num': 0}]
24 Jan 15:52    INFO  [Evaluation]: eval_batch_size = [4096] eval_args: [{'split': {'LS': 'valid_and_test'}, 'order': 'RO', 'group_by': 'user', 'mode': {'valid': 'uni100', 'test': 'uni100'}}]
24 Jan 15:52    WARNING  Max value of users history interaction records has reached 43.672014260249554% of the total.
24 Jan 15:52    INFO  LDiffRec(
  (mlp): DNN(
    (emb_layer): Linear(in_features=10, out_features=10, bias=True)
    (mlp_layers): MLPLayers(
      (mlp_layers): Sequential(
        (0): Dropout(p=0, inplace=False)
        (1): Linear(in_features=310, out_features=300, bias=True)
        (2): Tanh()
        (3): Dropout(p=0, inplace=False)
        (4): Linear(in_features=300, out_features=300, bias=True)
      )
    )
    (drop): Dropout(p=0.5, inplace=False)
  )
  (autoencoder): AutoEncoder(
    (dropout): Dropout(p=0.1, inplace=False)
    (encoder): MLPLayers(
      (mlp_layers): Sequential(
        (0): Dropout(p=0.0, inplace=False)
        (1): Linear(in_features=1683, out_features=600, bias=True)
        (2): Tanh()
      )
    )
    (decoder): MLPLayers(
      (mlp_layers): Sequential(
        (0): Dropout(p=0.0, inplace=False)
        (1): Linear(in_features=300, out_features=1683, bias=True)
      )
    )
  )
)
Trainable parameters: 1700693
24 Jan 15:52    INFO  epoch 0 training [time: 2.65s, train loss: 1853.4353]
24 Jan 15:52    INFO  epoch 1 training [time: 0.18s, train loss: 1684.0792]
24 Jan 15:52    INFO  epoch 2 training [time: 0.14s, train loss: 1610.4366]
24 Jan 15:52    INFO  epoch 3 training [time: 0.13s, train loss: 1545.5997]
24 Jan 15:52    INFO  epoch 4 training [time: 0.14s, train loss: 1487.6795]
Traceback (most recent call last):
  File "/mnt/./run_recbole_test.py", line 158, in <module>
    best_valid_score, best_valid_result = trainer.fit(train_data, valid_data)
  File "/usr/local/lib/python3.10/site-packages/recbole/trainer/trainer.py", line 464, in fit
    valid_score, valid_result = self._valid_epoch(
  File "/usr/local/lib/python3.10/site-packages/recbole/trainer/trainer.py", line 283, in _valid_epoch
    valid_result = self.evaluate(
  File "/usr/local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/recbole/trainer/trainer.py", line 616, in evaluate
    interaction, scores, positive_u, positive_i = eval_func(batched_data)
  File "/usr/local/lib/python3.10/site-packages/recbole/trainer/trainer.py", line 558, in _neg_sample_batch_eval
    scores[row_idx, col_idx] = origin_scores
RuntimeError: shape mismatch: value tensor of shape [4040, 4040] cannot be broadcast to indexing result of shape [4040]
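
The failing line assigns origin_scores into scores at one (row, column) position per candidate pair, so it needs a 1-D tensor with one score per pair. The sketch below reproduces the same error in isolation with small stand-in sizes. The way the square tensor is produced here (column-indexing a full score matrix with the item vector) is only a guess at how a [4040, 4040] tensor could arise; the thread does not confirm the actual cause, and the names full_scores, square and per_pair are hypothetical.

```python
import torch

n_pairs, n_items = 4, 7  # small stand-ins for 4040 pairs and 1683 items
item = torch.tensor([2, 5, 0, 3])  # one candidate item id per (user, item) pair
full_scores = torch.rand(n_pairs, n_items)  # hypothetical scores of every row over all items

# Column-indexing with a vector selects those columns for EVERY row: shape [n_pairs, n_pairs].
square = full_scores[:, item]

# Pairing each row with its own item gives one score per pair: shape [n_pairs].
per_pair = full_scores[torch.arange(n_pairs), item]

# Same assignment pattern as the line in the traceback.
row_idx = torch.arange(n_pairs)
col_idx = item
scores = torch.full((n_pairs, n_items), float("-inf"))

try:
    scores[row_idx, col_idx] = square  # square value tensor: raises RuntimeError
    error = None
except RuntimeError as exc:
    error = str(exc)

scores[row_idx, col_idx] = per_pair  # 1-D value tensor: succeeds
print(square.shape, per_pair.shape, error)
```

The assignment only succeeds when the value tensor can be broadcast to the 1-D indexing result, which is why a model whose predict step returns a square matrix fails at this line.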

To Reproduce
Run the following script with the arguments --data_set_name MovieLens-100K --model_name LDiffRec (or --model_name DiffRec):

import argparse
from logging import getLogger
from recbole.config import Config
from recbole.data import create_dataset, data_preparation
from recbole.utils import ModelType, get_model, get_trainer, init_seed, init_logger
import torch

parser = argparse.ArgumentParser("Evaluate RecBole")
parser.add_argument('--data_set_name', dest='data_set_name', type=str, required=True)
parser.add_argument('--model_name', dest='model_name', type=str, required=True)
args = parser.parse_args()
print(f"CUDA available: {torch.cuda.is_available()}")
config_dict = {
    # environment settings
    "gpu_id": 0,  # default: 0
    "worker": 0,  # default: 0
    "seed": 42,  # default: "2020"
    "state": "INFO",  # default: "INFO"
    "encoding": "utf-8",  # default: "utf-8"
    "reproducibility": True,  # default: True
    "data_path": "./data_sets/",  # default: "dataset/"
    "checkpoint_dir": f"./data_sets/{args.data_set_name}/recbole_checkpoints/",  # default: "saved/"
    "show_progress": True,  # default: True
    "save_dataset": False,  # default: False
    "dataset_save_path": None,  # default: None
    "save_dataloaders": False,  # default: False
    "dataloaders_save_path": None,  # default: None
    "log_wandb": False,  # default: False
    "wandb_project": "recbole",  # default: "recbole"
    "shuffle": True,  # default: True
    # data settings
    # atomic file format
    "field_separator": "\t",  # default: "\t"
    "seq_separator": " ",  # default: " "
    # basic information
    # common features
    "USER_ID_FIELD": "user_id",  # default: "user_id"
    "ITEM_ID_FIELD": "item_id",  # default: "item_id"
    "RATING_FIELD": "rating",  # default: "rating"
    "TIME_FIELD": "timestamp",  # default: "timestamp"
    "seq_len": {},  # default: {}
    # label for point-wise dataloader
    "LABEL_FIELD": "label",  # default: "label"
    "threshold": None,  # default: None
    # negative sampling prefix for pair-wise dataloader
    "NEG_PREFIX": "neg_",  # default: "neg_"
    # sequential model needed
    "ITEM_LIST_LENGTH_FIELD": "item_length",  # default: "item_length"
    "LIST_SUFFIX": "_list",  # default: "_list"
    "MAX_ITEM_LIST_LENGTH": 50,  # default: 50
    "POSITION_FIELD": "position_id",  # default: "position_id"
    # knowledge-based model needed
    "HEAD_ENTITY_ID_FIELD": "head_id",  # default: "head_id"
    "TAIL_ENTITY_ID_FIELD": "tail_id",  # default: "tail_id"
    "RELATION_ID_FIELD": "relation_id",  # default: "relation_id"
    "kg_reverse_r": False,  # default: False
    "entity_kg_num_interval": "[0, inf)",  # default: "[0, inf)"
    "relation_kg_num_interval": "[0, inf)",  # default: "[0, inf)"
    # selectively loading
    "load_col": {"inter": ["user_id", "item_id", "rating"]},  # default: {inter: [user_id, item_id]}
    "unload_col": {},  # default: {}
    "unused_col": {},  # default: {}
    "additional_feat_suffix": [],  # default: []
    "numerical_features": [],  # default: []
    # filtering
    # remove duplicated user-item interactions
    "rm_dup_inter": None,  # default: None
    # filter by value
    "val_interval": {},  # default: {}
    # remove interaction by user or item
    "filter_inter_by_user_or_item": True,  # default: True
    # filter by number of interactions
    "user_inter_num_interval": "[0, inf)",  # default: "[0, inf)"
    "item_inter_num_interval": "[0, inf)",  # default: "[0, inf)"
    # preprocessing
    "alias_of_user_id": None,  # default: None
    "alias_of_item_id": None,  # default: None
    "alias_of_entity_id": None,  # default: None
    "alias_of_relation_id": None,  # default: None
    "preload_weight": {},  # default: {}
    "normalize_field": [],  # default: []
    "normalize_all": False,  # default: False
    "discretization": None,  # default: None
    # benchmark file
    "benchmark_filename": None,  # default: None
    # training settings
    "epochs": 50,  # default: 300
    "train_batch_size": 2048,  # default: 2048
    "learner": "adam",  # default: "adam"
    "learning_rate": 0.001,  # default: 0.001
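    # RecBole 1.2.0 lists "train_neg_sample_args" under its training parameters; the original key
    # "training_neg_sample_args" was logged under "Other Hyper Parameters" (values match the defaults)
    "train_neg_sample_args":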
        {
            "distribution": "uniform",  # default: "uniform"
            "sample_num": 1,  # default: 1
            "dynamic": False,  # default: False
            "candidate_num": 0,  # default: 0
        },
    "eval_step": 5,  # default: 1
    "stopping_step": 10,  # default: 10
    "clip_grad_norm": None,  # default: None
    "loss_decimal_place": 4,  # default: 4
    "weight_decay": 0.0,  # default: 0.0
    "require_pow": False,  # default: False
    "enable_amp": False,  # default: False
    "enable_scaler": False,  # default: False
    # evaluation settings
    "eval_args":
        {
            "group_by": "user",  # default: "user"
            "order": "RO",  # default: "RO"
            "split":
                {
                    # "RS": [8, 1, 1] # default: {"RS": [8, 1, 1]}
                    "LS": "valid_and_test"
                },
            "mode":
                {
                    "valid": "uni100",  # default: "full"
                    "test": "uni100",  # default: "full"
                },
        },
    "repeatable": False,  # default: False
    "metrics": ["Recall", "MRR", "NDCG", "Hit", "MAP", "Precision", "GAUC", "ItemCoverage", "AveragePopularity",
                "GiniIndex", "ShannonEntropy", "TailPercentage"],
    # default: ["Recall", "MRR", "NDCG", "Hit", "Precision"]
    "topk": [1, 3, 5, 10, 20],  # default: 10
    "valid_metric": "NDCG@10",  # default: "MRR@10"
    "eval_batch_size": 4096,  # default: 4096
    "metric_decimal_place": 4,  # default: 4
    # misc settings
    "model": args.model_name,
    "MODEL_TYPE": ModelType.GENERAL
}
config = Config(model=args.model_name, dataset=args.data_set_name, config_dict=config_dict)
init_seed(config['seed'], config['reproducibility'])
init_logger(config)
logger = getLogger()
logger.info(config)
dataset = create_dataset(config)
logger.info(dataset)
train_data, valid_data, test_data = data_preparation(config, dataset)
model = get_model(config["model"])(config, train_data.dataset).to(config['device'])
logger.info(model)
trainer = get_trainer(config["MODEL_TYPE"], config["model"])(config, model)
best_valid_score, best_valid_result = trainer.fit(train_data, valid_data)
test_result = trainer.evaluate(test_data)
print(test_result)

Expected behavior
DiffRec and LDiffRec should train and evaluate on the MovieLens-100K data set without crashing.

Desktop:

  • OS: Linux
  • RecBole Version: 1.2.0
  • Python Version: 3.10
  • PyTorch Version: 2.1.1
  • cudatoolkit Version: 12.1
lukas-wegmeth added the bug label Jan 25, 2024
BishopLiu (Collaborator) commented

@lukas-wegmeth Hi! Thanks for your careful check! We have fixed the above bugs in #1999.

lukas-wegmeth (Author) commented

@BishopLiu #1999 fixes this issue, thanks!
