Other embeddings except for ComplEx? #41

sharon-gao · 2020-10-17T16:14:18Z

Hi, thanks for providing your code!

I was wondering whether you have ever tried any other embeddings like RESCAL like you included in your 'model.py'.
Did you select 'ComplEx' because it achieves the best performance?

apoorvumang · 2020-10-17T16:21:45Z

We did, and all multiplicative models performed similarly for us. We chose ComplEx because it was slightly better.

sharon-gao · 2020-10-17T16:28:26Z

I also had a try on MetaQA. It seems weird to me because they perform different on the link prediction task, so I thought an embedding with better quality (performing much better on link prediction) would also dominate in MetaQA task. However, they performed similarly. Without relation matching, they all achieved ~0.95 on the 1hop QA.
Do you have any idea why this happens?

apoorvumang · 2020-10-17T17:04:44Z

You might want to look into this paper: https://openreview.net/pdf?id=BkxSmlBFvr People have pointed out that rather than the scoring function, the training scheme (eg. negative sampling, loss) matters more.

sharon-gao · 2020-10-17T22:23:22Z

Thanks so much for sharing this paper! I thinks it helps a lot to explain!
BTW, when training embeddings over the half-version MetaQA KG, I got poor results (~0.12) on link prediction, even after training 1500 epochs. Do you have any suggestions on hyper-parameter tuning?

apoorvumang · 2020-10-18T05:40:25Z

@ShuangNYU What command did you use, and what was the exact output?

sharon-gao · 2020-10-20T21:32:22Z

For example, I used
command = 'python3 main.py --dataset MetaQA_half --num_iterations 500 --batch_size 256 '
'--lr 0.0005 --dr 1.0 --edim 200 --rdim 200 --input_dropout 0.2 '
'--hidden_dropout1 0.3 --hidden_dropout2 0.3 --label_smoothing 0.1 '
'--valid_steps 10 --model ComplEx '
'--loss_type BCE --do_batch_norm 1 --l3_reg 0.001 '
'--outfile /scratch/embeddings'

I doubt whether it's related to hyper-parameters. What configurations you used to train the embeddings?

mug2mag · 2021-11-03T09:05:03Z

I also had a try on MetaQA. It seems weird to me because they perform different on the link prediction task, so I thought an embedding with better quality (performing much better on link prediction) would also dominate in MetaQA task. However, they performed similarly. Without relation matching, they all achieved ~0.95 on the 1hop QA. Do you have any idea why this happens?

You might want to look into this paper: https://openreview.net/pdf?id=BkxSmlBFvr People have pointed out that rather than the scoring function, the training scheme (eg. negative sampling, loss) matters more.

@apoorvumang I also met this problem, and have read this paper mentioned above. But I can't exactly understand why. Could you please further explain why the better performance in knowledge embedding can not benefit the downstream KGQA task? Looking forward to your reply. Thanks very much!

apoorvumang · 2021-11-03T10:21:58Z

The point they make in the paper is that all scoring functions perform roughly the same. The reason that earlier models reported lesser numbers than modern reproduction is mostly due to not doing enough hyperparameter search as well as optimisation techniques not available at that time (Adam, dropout, batch norm)

Could you please further explain why the better performance in knowledge embedding can not benefit the downstream KGQA task?

For this you need to convincingly show that 1 model is better than another for KGC on MetaQA (not on other datasets). The models we obtained performed similarly on KGC as well

mug2mag · 2021-11-03T10:55:11Z

@apoorvumang Thanks very much for your quick reply.

The point they make in the paper is that all scoring functions perform roughly the same. The reason that earlier models reported lesser numbers than modern reproduction is mostly due to not doing enough hyperparameter search as well as optimisation techniques not available at that time (Adam, dropout, batch norm)

Could you please further explain why the better performance in knowledge embedding can not benefit the downstream KGQA task?

For this you need to convincingly show that 1 model is better than another for KGC on MetaQA (not on other datasets). The models we obtained performed similarly on KGC as well

Sorry, maybe I didn't make myself clear. I will explain additionally: I tried EmbedKGQA over my own dataset. By dealing with the dataset, I could get different performances (e.g., sometimes high hit@1, sometimes low hit@1) in the knowledge embedding stage using ComplEx. But I got similar results in the QA task with evident different ComplEx performances. I don't know why.

apoorvumang · 2021-11-03T11:34:36Z

Oh got it. Did you do this in the full KG setting?

If so, I think the performance would be same for all models since to answer questions you need to reason over train triples mostly, not unseen triples. I would suggest you check MRR on train split for these models, if its close to 1 (or equal to 1) for both, it might explain the similar performance.

I hope this is clear? It would be interesting to see your results

mug2mag · 2021-11-03T12:17:24Z

@apoorvumang hi again

Oh got it. Did you do this in the full KG setting?

I do this over the full KG setting but also with training set, valid set and test set both in Complex embedding stage and QA task.

If so, I think the performance would be same for all models since to answer questions you need to reason over train triples mostly, not unseen triples. I would suggest you check MRR on train split for these models, if its close to 1 (or equal to 1) for both, it might explain the similar performance.

Sorry for that I do not understand the sentence " if its close to 1 (or equal to 1) for both, it might explain the similar performance", and I have not searched out the related knowledge. [facepalm] Hope for more information.

apoorvumang · 2021-11-03T12:29:43Z

Could you evaluate KGC (ie MRR) for train set on your data? If the train KG is too large, maybe take a few lines from that to make a test2.txt and evaluate on that? That would give better insight (and then I can make my argument)

mug2mag · 2021-11-03T12:45:03Z

@apoorvumang
Different results with different split strategies in knowledge embedding learning stage.

training set: validation set: test set = 3:1:1
training set: validation set: test set = 5:1:1. In this setting, data in validation set and test set are all included in the training set to hope for a better result in the QA task. But I got similar hit@1 result about 66% with these two distinct performances in ComplEx.

apoorvumang · 2021-11-03T12:55:33Z

So in the 2nd setting the performance is pretty bad even though triples that you are running evaluation on are the ones you have trained on? This seems very weird.

When I trained for MetaQA, I got MRR 1.0 on train set (since it has already seen all the triples) but here it seems much worse.

What library are you using for training KG Embeddings? And what are your dataset stats (ie entities, relations, triples)? Also did you try any hyperparameter tuning?

mug2mag · 2021-11-03T13:04:52Z

@apoorvumang
Yes, it's very weird.
Specific information about my KG:

train:  965250
test:  193050
valid:  193050

The toolkit is importing datasets.
The total of relations is 250.
The total of entities is 113446.
The total of train triples is 950341.

The total of test triples is 193050.
The total of valid triples is 193050.

I use the OpenKE library (https://github.com/thunlp/OpenKE) for training embedding, but I can't recognize any incorrect operations in my code

apoorvumang · 2021-11-03T13:15:31Z

Embedding size used?

mug2mag · 2021-11-03T13:23:18Z

Embedding size is 200

apoorvumang · 2021-11-03T13:41:15Z

Hmm can't really say why this is happening. I would expect filtered MRR to be much higher when evaluating on train set. The graph seems pretty dense but that would still not affect filtered rankings.

Also I wouldn't expect 2 ComplEx models having such different MRR (on train) to give similar hits@1 on QA.

Right now only thing I can suggest is to see if with public datasets you can get MRR close to 1.0 on train set. This should in theory rule out implementation bugs.

Second thing would be hyperparameter tuning. LibKGE has inbuilt hyperparam search so you could use that. But I would do that only after doing the above mentioned sanity check

mug2mag · 2021-11-03T13:49:39Z

@apoorvumang Thank you very much! I will try your suggestions. if there are some findings, I will let you know. Thanks again.

yhshu · 2022-02-18T02:02:55Z

May I know how did translational models perform instead of multiplicative models for EmbedKGQA? Thanks. @apoorvumang

apoorvumang · 2022-02-18T11:46:48Z

@yhshu We only performed some preliminary experiments with TransE where performance was pretty bad. It has not been tested thoroughly though

sudarshan77 · 2022-04-01T06:33:51Z

@apoorvumang Different results with different split strategies in knowledge embedding learning stage.

training set: validation set: test set = 3:1:1

training set: validation set: test set = 5:1:1. In this setting, data in validation set and test set are all included in the training set to hope for a better result in the QA task. But I got similar hit@1 result about 66% with these two distinct performances in ComplEx.

Dear @mug2mag,
Good work! I was thinking to implement Hit@3, Hit@10. Could you please throw the light on the implementation details (may be pseudocode or a snippet) will do!

Thanks

lihuiliullh mentioned this issue Nov 25, 2021

How to use ComplEx pretrain MetaQA_half? #109

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Other embeddings except for ComplEx? #41

Other embeddings except for ComplEx? #41

sharon-gao commented Oct 17, 2020

apoorvumang commented Oct 17, 2020

sharon-gao commented Oct 17, 2020

apoorvumang commented Oct 17, 2020

sharon-gao commented Oct 17, 2020

apoorvumang commented Oct 18, 2020

sharon-gao commented Oct 20, 2020

mug2mag commented Nov 3, 2021 •

edited

Loading

apoorvumang commented Nov 3, 2021 •

edited

Loading

mug2mag commented Nov 3, 2021 •

edited

Loading

apoorvumang commented Nov 3, 2021 •

edited

Loading

mug2mag commented Nov 3, 2021

apoorvumang commented Nov 3, 2021

mug2mag commented Nov 3, 2021

apoorvumang commented Nov 3, 2021

mug2mag commented Nov 3, 2021 •

edited

Loading

apoorvumang commented Nov 3, 2021

mug2mag commented Nov 3, 2021

apoorvumang commented Nov 3, 2021

mug2mag commented Nov 3, 2021

yhshu commented Feb 18, 2022

apoorvumang commented Feb 18, 2022

sudarshan77 commented Apr 1, 2022

Other embeddings except for ComplEx? #41

Other embeddings except for ComplEx? #41

Comments

sharon-gao commented Oct 17, 2020

apoorvumang commented Oct 17, 2020

sharon-gao commented Oct 17, 2020

apoorvumang commented Oct 17, 2020

sharon-gao commented Oct 17, 2020

apoorvumang commented Oct 18, 2020

sharon-gao commented Oct 20, 2020

mug2mag commented Nov 3, 2021 • edited Loading

apoorvumang commented Nov 3, 2021 • edited Loading

mug2mag commented Nov 3, 2021 • edited Loading

apoorvumang commented Nov 3, 2021 • edited Loading

mug2mag commented Nov 3, 2021

apoorvumang commented Nov 3, 2021

mug2mag commented Nov 3, 2021

apoorvumang commented Nov 3, 2021

mug2mag commented Nov 3, 2021 • edited Loading

apoorvumang commented Nov 3, 2021

mug2mag commented Nov 3, 2021

apoorvumang commented Nov 3, 2021

mug2mag commented Nov 3, 2021

yhshu commented Feb 18, 2022

apoorvumang commented Feb 18, 2022

sudarshan77 commented Apr 1, 2022

mug2mag commented Nov 3, 2021 •

edited

Loading

apoorvumang commented Nov 3, 2021 •

edited

Loading

mug2mag commented Nov 3, 2021 •

edited

Loading

apoorvumang commented Nov 3, 2021 •

edited

Loading

mug2mag commented Nov 3, 2021 •

edited

Loading