
[Question] Which is better for a Cross-Lingual Classification Task: LASER or XLM? #106

Open
Ayush-iitkgp opened this issue Nov 8, 2019 · 16 comments

Comments

@Ayush-iitkgp

Hello,
I have training data with labels in English. Now I want to use this data to make predictions for other languages. I saw that XLM and LASER both support cross-lingual classification. However, they are not benchmarked on the same dataset, so it is difficult to know which model is better. Can someone help me determine which one (XLM or LASER) is better for cross-lingual classification?

@Ayush-iitkgp
Author

XLM is better than LASER

  1. XLM Benchmark: https://github.com/facebookresearch/XLM#ii-cross-lingual-language-model-pretraining-xlm
  2. LASER Benchmark: https://github.com/facebookresearch/LASER/tree/master/tasks/xnli#results

@PiotrCzapla

You can consider further improving the results for regular document classification by following this approach: http://nlp.fast.ai/classification/2019/09/10/multifit.html. We used LASER because XLM wasn't available when we were testing MultiFiT. I would be very interested to see how it works with XLM.

@MastafaF

XLM does not cover all 100 languages, to the best of my knowledge. Which model/implementation did you use, @Ayush-iitkgp?

@MastafaF

Indeed, XLM with MLM+TLM only covers 15 languages currently...

@Ayush-iitkgp
Author

@MastafaF I am starting with the XLM model with 15 languages. XLM does support 100 languages, see here.

@MastafaF

@Ayush-iitkgp From my reading of their paper a few weeks ago, my understanding is that the MLM+TLM version is the one that achieves the best results in terms of multilingual embeddings and can outperform LASER. Indeed, multilingual BERT already uses MLM for a large number of languages, and the quality of its multilingual embeddings is not optimal.

@hoschwenk
Contributor

hoschwenk commented Nov 20, 2019

Hello,
Which approach is better depends on the classification task, and perhaps on the languages you want to transfer to. Also, you may need a "deeper" classifier for LASER than for XLM.
The best option is to try both approaches :-)
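For the LASER route, here is a minimal sketch of such a "deeper" classifier head, assuming 1024-dimensional sentence embeddings have already been computed with the LASER encoder; the hidden sizes and number of classes are illustrative, not taken from this thread.

```python
import torch
import torch.nn as nn

class LaserClassifier(nn.Module):
    """Small feed-forward classifier on top of precomputed LASER sentence embeddings."""
    def __init__(self, embed_dim=1024, hidden_dim=256, num_classes=3, dropout=0.3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, embeddings):
        # embeddings: (batch, 1024) LASER sentence vectors -> (batch, num_classes) logits
        return self.net(embeddings)

# Random tensors stand in for real LASER embeddings here, just to show the shapes.
model = LaserClassifier(num_classes=3)
dummy_batch = torch.randn(8, 1024)
logits = model(dummy_batch)
```

Since the LASER encoder is language-agnostic, the same head trained on English embeddings can then be applied directly to embeddings of other languages.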

@Bachstelze

@Ayush-iitkgp How is your approach doing?
I will try classification with model freezing and an extra layer on top of XLM-R for low-resource languages. FastText could be interesting if time matters.
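For reference, a minimal sketch of that freeze-and-add-a-layer setup using the Hugging Face transformers implementation of XLM-R (not allennlp); the model name, pooling choice, and number of classes are illustrative assumptions, and a recent transformers version is assumed.

```python
import torch
import torch.nn as nn
from transformers import XLMRobertaModel, XLMRobertaTokenizer

tokenizer = XLMRobertaTokenizer.from_pretrained("xlm-roberta-base")
encoder = XLMRobertaModel.from_pretrained("xlm-roberta-base")

# Freeze the pretrained multilingual encoder so that only the new head is trained.
for param in encoder.parameters():
    param.requires_grad = False

num_classes = 3  # illustrative
head = nn.Linear(encoder.config.hidden_size, num_classes)

def classify(sentences):
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        outputs = encoder(**batch)
    # Use the representation of the first (<s>) token as the sentence embedding.
    cls_embeddings = outputs.last_hidden_state[:, 0, :]
    return head(cls_embeddings)

logits = classify(["This is a test sentence.", "Das ist ein Testsatz."])
```

Only the new head would be optimized during training, which keeps the multilingual representations intact for zero-shot transfer.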

@Ayush-iitkgp
Author

@Bachstelze My approach involved fine-tuning the XLM model on English data and using zero-shot classification to predict on German and Spanish. However, the model's accuracy on German and Spanish is below 20%, so I am still figuring out what can be done. The problem in my case is that I only have labeled English data and a very small amount of non-English data for performance measurement. Do you have any recommendations?

@Bachstelze

How is the performance of the model on English?
The following two recommendations could help with the German and Spanish accuracy problem:

  • Preserve the multilingual layers and only train an additional layer on top. For example, set the parameter 'trainable' to 'false' in allennlp.
  • Translate the labeled data into the other languages, for example with transformer.wmt19.en-de (see the sketch after this list).
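For the second recommendation, a rough sketch of translating the labeled English data with the fairseq torch.hub interface for transformer.wmt19.en-de; the checkpoint and tokenizer arguments follow the fairseq hub examples, and the example sentences and labels are made up.

```python
import torch

# Load the pretrained WMT'19 English-German translation model from fairseq via torch.hub.
en2de = torch.hub.load(
    "pytorch/fairseq",
    "transformer.wmt19.en-de",
    checkpoint_file="model1.pt",
    tokenizer="moses",
    bpe="fastbpe",
)
en2de.eval()

# Translate the labeled English sentences, carrying the original labels over.
english_examples = [("The service was excellent.", "positive")]  # illustrative data
german_examples = [(en2de.translate(text), label) for text, label in english_examples]
print(german_examples)
```

The translated sentences keep their English labels, so they can be mixed into the training set to reduce the gap between zero-shot and in-language performance.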

Hopefully you give the power of your knowledge back to the people.

@loretoparisi

loretoparisi commented Feb 3, 2020

  1. https://github.com/facebookresearch/XLM#ii-cross-lingual-language-model-pretraining-xlm

Now XLM-R covers 100 languages, so it makes a lot of sense to replace LASER with XLM-R:

XLM-R is the new state-of-the-art XLM model. XLM-R shows the possibility of training one model for many languages while not sacrificing per-language performance. It is trained on 2.5 TB of CommonCrawl data, in 100 languages
https://github.com/facebookresearch/XLM#ii-cross-lingual-language-model-pretraining-xlm

@MastafaF

MastafaF commented Feb 14, 2020

Hi @loretoparisi,
My experiments on the WMT2012 task (similarity search), comparing XLM trained with MLM on 100 languages against LASER, show that LASER clearly outperforms XLM in this case.
I haven't tried XLM-R yet, since on my side it is still quite buggy, but I will be happy to share more about it.

Cheers,
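For anyone who wants to reproduce this kind of comparison, here is a rough sketch of a similarity-search evaluation, assuming sentence embeddings for aligned source and target sentences have already been computed with the encoder under test (LASER, XLM, ...); the metric shown is plain cosine nearest-neighbor retrieval accuracy.

```python
import numpy as np

def similarity_search_accuracy(src_embeddings, tgt_embeddings):
    """Fraction of source sentences whose nearest target sentence (by cosine
    similarity) is the aligned translation at the same index.

    Both inputs are (n_sentences, dim) arrays produced by the encoder under test.
    """
    src = src_embeddings / np.linalg.norm(src_embeddings, axis=1, keepdims=True)
    tgt = tgt_embeddings / np.linalg.norm(tgt_embeddings, axis=1, keepdims=True)
    scores = src @ tgt.T                    # pairwise cosine similarities
    predictions = scores.argmax(axis=1)     # most similar target per source sentence
    return (predictions == np.arange(len(src))).mean()

# Illustrative usage with random vectors standing in for real encoder outputs.
rng = np.random.default_rng(0)
fake_src = rng.normal(size=(100, 1024))
fake_tgt = fake_src + 0.1 * rng.normal(size=(100, 1024))  # noisy "translations"
print(similarity_search_accuracy(fake_src, fake_tgt))
```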

@loretoparisi

loretoparisi commented Feb 15, 2020

@MastafaF Pretty interesting test! We have not tested XLM-R yet. I wonder why the XLM model does not outperform LASER's bi-LSTM encoder, because according to the results they presented it should be the opposite; in any case, we did not replace LASER, for several other reasons.
At this point, a test with XLM-R must be done!

@MastafaF

MastafaF commented Feb 17, 2020

Hi @loretoparisi, XLM-R gives poor results at the moment. Stay tuned for further experiments; I will post the link soon for replication 😄

@MastafaF

MastafaF commented Mar 10, 2020

Hi @loretoparisi, you can check some tests here on WMT2012, reproducing the experiments from LASER and doing a comparative study against other multilingual architectures.
I plan to maintain it as often as possible to compare SOTA solutions.
Feel free to raise an issue or send a PR if need be. Hope this helps! 😃

@loretoparisi

@MastafaF thank you very much! It's a very comprehensive and rigorous analysis 💯
