SPINN example not batching and never testing fold #5

Open
fac2003 opened this issue Aug 2, 2018 · 0 comments
fac2003 commented Aug 2, 2018

The SPINN example reads:

    for batch_idx, batch in enumerate(train_iter):
        opt.zero_grad()

        all_logits, all_labels = [], []
        fold = torchfold.Fold(device=device)
        # TODO: incorrect logic here, the for loop goes through the entire dataset, not the batch:
        for example in batch.dataset:

The line for example in batch.dataset: does not iterate through a batch. Instead, it iterates through the entire training set, which means the fold code after the loop does not run until the whole training set has been scanned.
If I accumulate up to batch_size examples to test the fold code after the loop, I get an exception:

File "/Users/fac2003/PycharmProjects/torchfold/examples/snli/spinn-example.py", line 54, in leaf
embedded = self.embeddings(word_id)
File "/Users/fac2003/pytorch_env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in call
result = self.forward(*input, **kwargs)
File "/Users/fac2003/pytorch_env/lib/python3.6/site-packages/torch/nn/modules/sparse.py", line 108, in forward
self.norm_type, self.scale_grad_by_freq, self.sparse)
File "/Users/fac2003/pytorch_env/lib/python3.6/site-packages/torch/nn/functional.py", line 1076, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: index out of range at /Users/soumith/code/builder/wheel/pytorch-src/aten/src/TH/generic/THTensorMath.c:343
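
For reference, this RuntimeError is what nn.Embedding raises whenever a lookup index is greater than or equal to num_embeddings. A minimal reproduction (illustrative only, not code from the example):

    import torch
    import torch.nn as nn

    # An embedding table sized for a 100-word vocabulary.
    emb = nn.Embedding(num_embeddings=100, embedding_dim=8)

    emb(torch.tensor([5, 42]))   # fine: all indices < num_embeddings
    emb(torch.tensor([150]))     # RuntimeError: index out of range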

Presumably, the word id is larger than the vocabulary size. This is likely a consequence of initializing the tree from a non-vectorized example. I am wondering why the torchtext tensors in batch are not used instead: they already have the correct batch dimensionality.
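
For what it's worth, here is a minimal sketch of what I mean, assuming train_iter is a torchtext BucketIterator over SNLI with premise, hypothesis and label fields; the field names and the per-example tree-building step are assumptions for illustration, not the example's actual code:

    for batch_idx, batch in enumerate(train_iter):
        opt.zero_grad()

        all_logits, all_labels = [], []
        fold = torchfold.Fold(device=device)

        # batch.premise and batch.hypothesis are numericalized LongTensors of
        # shape (seq_len, batch_size), so every value is a valid vocabulary
        # index and only the examples of the current batch are visited.
        for i in range(batch.batch_size):
            premise_ids = batch.premise[:, i]
            hypothesis_ids = batch.hypothesis[:, i]
            label = batch.label[i]
            # ... build the tree / queue fold ops for this single example ...
            # all_logits.append(...); all_labels.append(label)

        # the fold then runs once per batch rather than once per dataset pass
        # logits, labels = fold.apply(model, [all_logits, all_labels])

This way each example is already expressed as vocabulary indices, so the leaf embedding lookup cannot go out of range.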

fac2003 added a commit to CampagneLaboratory/torchfold that referenced this issue Aug 2, 2018