Configuration file
The file config.yml is the primary configuration file for your application. It contains the following fields:
app_name: The name of your application. It should be relevant to your research question and target terms, and must not contain any spaces.
description: A brief description of what your application does and what kind of result it produces.
user: Your full name.
email: The email address you would like us to use to contact you.
language: The primary programming language your application is written in. It is used to run the proper dependency install command on the GeoDeepDive infrastructure. For a list of currently supported languages, please see the supported languages page.
The following two fields are used for culling the corpus to be more relevant for your application. In the majority of cases, applications run against a high-signal subset of the corpus will produce better results than those run against the entire corpus. For example, if you are interested in "coffee", but "coffee" occurs in only 1% of all documents, the application will run much faster and produce a better result if only that 1% of documents is used.
The idea is not to completely eliminate noise, but rather to increase signal. Choose terms whose presence in a document is a good indication that the document may contain content of interest.
dictionaries: One or more comma-separated dictionaries to use for culling the corpus. GeoDeepDive contains categorized lists of preindexed terms to make subsetting the corpus easier. Current dictionaries include a list of all taxa from the Paleobiology Database and all stratigraphic names from Macrostrat. For a list of all available dictionaries, please see https://geodeepdive.org/api/dictionaries?all.
terms: One or more comma-separated terms to use for culling the corpus.
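As an illustration of how the fields above fit together, here is a minimal sketch of a config.yml. Every value is a hypothetical placeholder: the application name, user, email address, and dictionary names are invented for the example, "python" is only assumed to be a supported language, and writing the comma-separated lists as quoted strings is likewise an assumption. Check the dictionaries endpoint and the supported languages page for the values that are actually accepted.

```yaml
# Hypothetical example; all values below are placeholders, not real settings.
app_name: "coffee_mentions"            # no spaces, relevant to the research question
description: "Finds sentences that mention coffee and returns them as a table."
user: "Jane Doe"                        # your full name
email: "jane.doe@example.org"           # address to be used for contacting you
language: "python"                      # assumed to be on the supported languages list

# Corpus culling: pick dictionaries and terms whose presence signals relevant documents.
dictionaries: "example_dictionary_a,example_dictionary_b"   # placeholder names, comma-separated
terms: "coffee,espresso"                # comma-separated terms
```

Quoting each value keeps a comma-separated list as a single string rather than leaving its interpretation to the YAML parser.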