
Can you provide your WebQSP KG embedding file? #39

Open
ToneLi opened this issue Oct 12, 2020 · 11 comments

Comments

@ToneLi

ToneLi commented Oct 12, 2020

I found that your WebQSP KG embedding file is None; can you supply it? When I run your embedding training code on your KG (the one for WebQSP), I get an error: CUDA out of memory.
Can you supply the relevant embedding file, as you did for MetaQA?

@sharon-gao

> I found that your WebQSP KG embedding file is None; can you supply it? When I run your embedding training code on your KG (the one for WebQSP), I get an error: CUDA out of memory. Can you supply the relevant embedding file, as you did for MetaQA?

I encountered the same out-of-memory error. Have you fixed this problem?

@apoorvumang
Collaborator

Can you tell me the exact command you executed, along with your GPU configuration? @ShuangNYU @ToneLi

@sharon-gao

I used the command below.

```
python3 main.py --dataset fbwq_full --num_iterations 1500 --batch_size 256 \
  --lr 0.0005 --dr 1.0 --edim 200 --rdim 200 --input_dropout 0.2 \
  --hidden_dropout1 0.3 --hidden_dropout2 0.3 --label_smoothing 0.1 \
  --valid_steps 10 --model ComplEx --loss_type BCE --do_batch_norm 1 \
  --l3_reg 0.001 --outfile /scratch/ComplEx_fbwq_half
```

The error is:

```
Number of training data points: 11560492
Entities: 1886683
Relations: 1144
Model is ComplEx
Starting training...
Traceback (most recent call last):
  File "main.py", line 327, in <module>
    experiment.train_and_eval()
  File "main.py", line 230, in train_and_eval
    loss.backward()
  File "/home/sg5963/.local/lib/python3.6/site-packages/torch/tensor.py", line 118, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/sg5963/.local/lib/python3.6/site-packages/torch/autograd/__init__.py", line 93, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: CUDA out of memory. Tried to allocate 1.41 GiB (GPU 0; 11.17 GiB total capacity; 9.62 GiB already allocated; 823.31 MiB free; 429.74 MiB cached)
```

Interestingly, I am training with TuckER now, and it has been running for 5 hours...
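A rough back-of-envelope estimate (a sketch only, assuming fp32 parameters, dense gradients, and an Adam-style optimizer; the repository's actual optimizer and allocator behavior may differ) suggests why an ~11 GiB card runs out with these counts:

```python
# Lower-bound GPU memory estimate for ComplEx embedding training,
# using the sizes printed in the log above and --edim 200.
n_entities = 1_886_683
n_relations = 1_144
dim = 200
floats_per_emb = 2 * dim          # ComplEx stores real + imaginary parts

entity_params = n_entities * floats_per_emb
relation_params = n_relations * floats_per_emb

# Adam-style optimizers keep ~2 extra states per parameter, plus the
# gradient itself: roughly 4x the parameter memory in total.
bytes_fp32 = 4
total_bytes = (entity_params + relation_params) * bytes_fp32 * 4

print(f"~{total_bytes / 1024**3:.1f} GiB for embeddings + optimizer state")
# → ~11.3 GiB, before activations and the batch's scoring matrix
```

This ignores activations entirely, so the true peak is higher, which matches the OOM at 9.62 GiB already allocated.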

@ToneLi
Author

ToneLi commented Oct 18, 2020

I ran the experiment on fbwq_half and hit the same CUDA out-of-memory problem as ShuangNYU, even with the batch size set to 64. When I use TuckER, the problem goes away.

@apoorvumang
Collaborator

Please set the batch size to 32 so that the out-of-memory error doesn't happen. Also, I'm not sure how training with TuckER will work out, since the pre-trained embeddings are for ComplEx, not TuckER.
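If a smaller batch hurts convergence, gradient accumulation can keep the effective batch at 256 while only materializing 32 examples at a time. A minimal PyTorch sketch (the `train_epoch` helper and the "model returns the loss" convention are hypothetical, not the repository's code):

```python
import torch

def train_epoch(model, loader, optimizer, accum_steps=8):
    """Accumulate gradients over several small batches before stepping,
    mimicking a large batch without its peak memory cost. Hypothetical
    sketch: assumes model(inputs, targets) returns a scalar loss."""
    optimizer.zero_grad()
    for step, (inputs, targets) in enumerate(loader):
        loss = model(inputs, targets) / accum_steps  # scale so the sum averages
        loss.backward()                              # gradients add up across steps
        if (step + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```

With `batch_size 32` and `accum_steps=8`, each optimizer step sees gradients equivalent to a batch of 256.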

@ToneLi
Author

ToneLi commented Oct 20, 2020

Can you tell me what GPU server you used? I trained on the half KG for WebQSP; it consumed 24059 MiB of GPU memory and ran for a day, but I could not get a result. Also, I would like to know whether ShuangNYU got a result.

@apoorvumang
Collaborator

@ToneLi We trained on a single 1080 Ti with 12 GB memory. I will let you know the exact command and exact output tomorrow.

@ToneLi
Author

ToneLi commented Oct 20, 2020

Thanks!!

@sharon-gao

> Can you tell me what GPU server you used? I trained on the half KG for WebQSP; it consumed 24059 MiB of GPU memory and ran for a day, but I could not get a result. Also, I would like to know whether ShuangNYU got a result.

I didn't get the results yet either. I ran it on our school's cluster:

```
RuntimeError: CUDA out of memory. Tried to allocate 1.80 GiB (GPU 0; 15.90 GiB total capacity; 11.19 GiB already allocated; 1.80 GiB free; 2.14 GiB cached)
```

@ToneLi
Author

ToneLi commented Oct 24, 2020

I rewrote the TuckER source code to add ComplEx to it, and that version runs. I tested it on the half KG for WebQSP with 6 epochs; the results are:

```
Hits@10:   0.4956
Hits@3:    0.3396
Hits@1:    0.21595
Mean rank: 99493.86215
```

I will test with more epochs next. If needed, you can contact me.
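For reference, Hits@k and mean rank are computed from the rank of the correct entity for each test triple; a generic sketch (not the repository's evaluation code, which may also apply filtered ranking):

```python
def ranking_metrics(ranks, ks=(1, 3, 10)):
    """Compute Hits@k and mean rank from a list of 1-based ranks of the
    correct entity for each test triple. Generic link-prediction metrics."""
    n = len(ranks)
    hits = {k: sum(r <= k for r in ranks) / n for k in ks}
    mean_rank = sum(ranks) / n
    return hits, mean_rank

# Example: three test triples whose correct entities rank 1st, 4th, 12th
hits, mr = ranking_metrics([1, 4, 12])
# hits = {1: 0.333..., 3: 0.333..., 10: 0.666...}, mr = 5.666...
```

A very large mean rank with moderate Hits@10, as above, usually means a long tail of badly ranked entities rather than uniformly poor predictions.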

@ToneLi
Author

ToneLi commented Nov 1, 2020

@apoorvumang, I would like to know your knowledge embedding results on WebQSP. My code has now run for five days (400 epochs), and so far the results are not good.
