When were the most recent publications of pre-training data included? #116

kaisugi · 2021-05-29T03:03:44Z

I know that SciBERT is pre-trained on the Semantic Scholar corpus. I also know that the Semantic Scholar corpus is not publicly available.

I am wondering how recent the newest papers in the pre-training data are. For example, are papers from ACL 2018 included?
The Semantic Scholar Corpus paper was published around 2018, so I'm guessing that's right around the borderline between having a paper...
