This is supplementary data and code to help reproduce the results of a paper of the same name. It is currently available in somewhat unpolished form, but will be fully documented in the near future.
- `populations`: contains .tsv files with language-population data and other population-relevant data.
- `task_results`: contains .tsv files with the aggregated results for each NLP task.
- `economic_indicators_data`: contains files from WITS (under `wits_en_trade_summary_allcountries_allyears`) and versions of them converted to map to languages instead of countries. Important files are:
  - `languages_to_gdp.tsv` for the monolingual mapping of languages to associated GDP estimates.
  - `bilingual_indicators.tsv` for the bilingual mapping of languages to estimates of the associated bilingual indicators (Imports, Exports). Also includes the triangulated BLEU scores for the language pair.
- `figs`: contains correlation figures, created with the `plot_*_correlations.py` scripts.
- `area-classifier`: contains data and code for a classifier of areas.

- `counterfactuals.py`: computes the counterfactual scenarios presented in the paper.
- `constants.py`: contains functions to read in all necessary data, which are used in other files to run the metric estimations and produce the plots.
- `economic_indicators.py`: contains a function to read in economic indicators (called by `constants.py`).
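
The .tsv files listed above can be inspected without running the repository code. The following is a minimal sketch, not the loader used in `constants.py`: the helper name `read_tsv` is hypothetical, and it assumes each file is UTF-8 and starts with a header row, which this README does not confirm.

```python
import csv
from pathlib import Path


def read_tsv(path):
    """Read a tab-separated file into a list of dicts, one per data row.

    Assumption: the first line of the file is a header row naming the columns.
    All values are returned as strings; convert them as needed.
    """
    with Path(path).open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))
```

For example, `read_tsv("economic_indicators_data/languages_to_gdp.tsv")`, run from the repository root, would return one dict per row, keyed by whatever column names the file's header actually uses.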
- TODO: add a general metric calculation script.
- TODO: data paths are currently all absolute and need to be made relative to the repository.
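
One possible way to address the absolute-path issue is to resolve every data path against the location of the code itself. This is only a sketch under stated assumptions: the names `REPO_ROOT` and `data_path` are hypothetical and do not exist in the repository, and the module is assumed to sit at the repository root next to the data folders.

```python
from pathlib import Path

# Assumption: this module lives at the repository root, next to the data folders.
# When __file__ is not defined (e.g. in an interactive session), fall back to
# the current working directory.
try:
    REPO_ROOT = Path(__file__).resolve().parent
except NameError:
    REPO_ROOT = Path.cwd()


def data_path(*parts):
    """Build a path under the repository root from relative components."""
    return REPO_ROOT.joinpath(*parts)
```

With such a helper, a script could refer to `data_path("economic_indicators_data", "languages_to_gdp.tsv")` instead of a machine-specific absolute path, so the code would run from any checkout location.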