Commit

research
ElisaWirsching committed Jul 5, 2024
1 parent d35c621 commit 1b9d033
Showing 2 changed files with 2 additions and 2 deletions.
2 changes: 1 addition & 1 deletion research.html
@@ -439,7 +439,7 @@ <h2>
<div class="color-button">
resources
</div>
-</a><a href="https://codeocean.com/capsule/4082319/tree/v1">
+</a><a href="https://codeocean.com/capsule/4082319/tree/v2">
<div class="color-button">
replication
</div>
2 changes: 1 addition & 1 deletion research.md
@@ -24,7 +24,7 @@ layout: page
<summary>Abstract</summary>
Word embeddings are now a vital resource for social science research. Unfortunately, it can be difficult to obtain high quality embeddings for non-English languages, and it may be computational expensive to do so. In addition, social scientists typically want to make statistical comparisons and do hypothesis tests on embeddings, but this is non-trivial with current approaches. We provide three new data resources designed to ameliorate the union of these issues: (1) a new version of <tt>fastText</tt> model embeddings, fit to Wikipedia corpora; (2) a multi-language "a la carte" (ALC) embedding version of the <tt>fastText</tt> model fit to Wikipedia corpora; (3) a multi-language ALC embedding version of the well-known <tt>GloVe</tt> model fit to Wikipedia corpora. These materials are aimed at "low resource" users who lack access to large corpora in their language of interest, or who lack access to the computational resources required to produce high-quality vector representations. We make these resources available for 30 languages, along with a code pipeline for another 127 languages available from Wikipedia corpora. We provide extensive validation of the materials, via reconstruction tests and some translation proofs-of-concept. We also conduct and report on human crowdworker tests, for our embeddings for Arabic, French, (traditional, Mandarin) Chinese, Japanese, Korean, Russian and Spanish. <br>
</details>
-<a href="https://alcembeddings.org/assets/img/RSSW_paper_january_2024.pdf"><div class="color-button">pdf</div></a> <a href="http://alcembeddings.org/index.html"><div class="color-button">resources</div></a><a href="https://codeocean.com/capsule/4082319/tree/v1"><div class="color-button">replication</div></a>
+<a href="https://alcembeddings.org/assets/img/RSSW_paper_january_2024.pdf"><div class="color-button">pdf</div></a> <a href="http://alcembeddings.org/index.html"><div class="color-button">resources</div></a><a href="https://codeocean.com/capsule/4082319/tree/v2"><div class="color-button">replication</div></a>
</li><br>
</ul>
