Current Version = V0.1
Check history for other versions.
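# Download the metadata first (in a notebook): !wget https://raw.githubusercontent.com/cisnlp/GlotScript/main/metadata/GlotScript.tsv
import pandas as pd
# na_filter=False keeps empty cells as empty strings instead of NaN
df = pd.read_csv('GlotScript.tsv', na_filter=False, sep='\t')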
- MAIN or CORE: Given a language l identified by an ISO 639 code, we categorize a script for l as MAIN if it is supported by at least two of the three sources.
- AUXILIARY (aux): If only one metadata source supports a script and the other two do not, the script is placed in the auxiliary category specific to that source: Wiki-aux, LREC2800-aux, and SIL-aux for Wikipedia, LREC_2800, and SIL, respectively. SIL2-aux is used exclusively for discrepancies between ScriptSource and LangTag.
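The two rules above can be sketched as a small voting function. This is a minimal illustration, not the code used to build the dataset: the function name `categorize_scripts`, the `AUX_LABEL` mapping, and the example input are hypothetical, and the SIL2-aux case (ScriptSource vs. LangTag discrepancies) is left out.

```python
from collections import Counter

# Hypothetical mapping from source name to its aux label (labels taken from the text above).
AUX_LABEL = {"Wikipedia": "Wiki-aux", "LREC_2800": "LREC2800-aux", "SIL": "SIL-aux"}


def categorize_scripts(scripts_by_source):
    """Assign each script of one language to MAIN or a source-specific aux label.

    scripts_by_source maps a source name to the set of script codes
    (e.g. "Latn") that the source reports for the language.
    """
    # Count how many sources report each script.
    votes = Counter()
    for scripts in scripts_by_source.values():
        votes.update(set(scripts))

    categories = {}
    for source, scripts in scripts_by_source.items():
        for script in scripts:
            if votes[script] >= 2:
                # Supported by at least two of the three sources.
                categories[script] = "MAIN"
            else:
                # Supported only by this source.
                categories[script] = AUX_LABEL[source]
    return categories


# Made-up input for illustration only; not taken from GlotScript.tsv.
example = {
    "Wikipedia": {"Latn", "Cyrl"},
    "LREC_2800": {"Latn"},
    "SIL": {"Latn", "Arab"},
}
result = categorize_scripts(example)
print(result)
```

In this made-up example, `Latn` is reported by all three sources and lands in MAIN, while `Cyrl` and `Arab` each have a single supporting source and fall into that source's aux category.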
This dataset is available under the CC BY-SA 4.0 license, permitting modification and redistribution.
- Wikipedia: Since Wikipedia writing system metadata is not easily redistributed, we provide our crawled version of the Writing System Text from Wikipedia in the sources folder.
- ScriptSource
- Unicode CLDR
- LangTag
- LREC_2800
- Omniglot