Possible solution
Treat the MDIM yaml as a template, where we fill in some variables at the time we ship it.
Ideally, we would avoid hardcoding paths (in steps and yaml files), and all dependencies would be specified in the DAG.
After a discussion with @lucasrodes, we thought of a possible solution. The config yaml file (for example, that of the covid mdim step) could use a special placeholder for a dataset path, `{ds:short_name}` (e.g. `{ds:covid_cases}`), where the short name identifies a dataset listed as a dependency of the mdim step in the DAG.
Then, the function `paths.load_mdim_config` would read the config yaml file and replace those placeholders with the full URIs of the corresponding datasets.
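To make the proposal concrete, here is a minimal sketch of the substitution step, not the actual implementation. The function name `fill_dataset_placeholders`, the example URI, and the `catalogPath` line are hypothetical; the sketch also assumes that a dataset's short name is the last segment of its URI.

```python
import re

# Matches placeholders of the form {ds:short_name}.
PLACEHOLDER = re.compile(r"\{ds:([A-Za-z0-9_]+)\}")


def fill_dataset_placeholders(config_text, dependency_uris):
    """Replace each {ds:short_name} with the URI of the matching dependency.

    Assumption: the short name of a dataset is the last segment of its URI.
    """
    by_short_name = {
        uri.rstrip("/").rsplit("/", 1)[-1]: uri for uri in dependency_uris
    }

    def substitute(match):
        short_name = match.group(1)
        if short_name not in by_short_name:
            raise KeyError(f"No dependency with short name {short_name!r}")
        return by_short_name[short_name]

    return PLACEHOLDER.sub(substitute, config_text)


# Hypothetical config line and dependency list, for illustration only.
config_text = "catalogPath: {ds:covid_cases}/covid_cases#weekly_cases"
dependencies = ["grapher/covid/latest/covid_cases"]
filled = fill_dataset_placeholders(config_text, dependencies)
print(filled)
```

The substitution runs on the raw text before any YAML parsing, because an unquoted value starting with `{` would otherwise be read by a YAML parser as a flow mapping rather than as a string.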
Possible rabbit holes or related issues
However, multiple dependencies of an mdim step may share the same short name, and we may also want to create an mdim that compares different versions of the same dataset. For such cases, we could define custom placeholders, e.g. `{ds:custom_short_name}`, and pass a dictionary to `paths.load_mdim_config` mapping those custom short names to the corresponding dataset URIs.
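A possible shape for that disambiguation is sketched below, under the same assumption that the short name is the last segment of a URI. The function `resolve_placeholders`, its `custom_uris` parameter, and the dataset versions are hypothetical and only illustrate the idea: custom names take precedence, and an ambiguous short name raises an error instead of silently picking one dataset.

```python
import re
from collections import defaultdict

PLACEHOLDER = re.compile(r"\{ds:([A-Za-z0-9_]+)\}")


def resolve_placeholders(config_text, dependency_uris, custom_uris=None):
    """Fill {ds:name} placeholders, using custom_uris to break ties."""
    custom_uris = custom_uris or {}

    # Group dependencies by short name so that clashes can be detected.
    candidates = defaultdict(list)
    for uri in dependency_uris:
        candidates[uri.rstrip("/").rsplit("/", 1)[-1]].append(uri)

    def substitute(match):
        name = match.group(1)
        if name in custom_uris:
            return custom_uris[name]  # explicit mapping wins
        matches = candidates.get(name, [])
        if len(matches) == 1:
            return matches[0]
        if not matches:
            raise KeyError(f"No dependency with short name {name!r}")
        raise ValueError(f"Short name {name!r} is ambiguous: {matches}")

    return PLACEHOLDER.sub(substitute, config_text)


# Two made-up versions of the same dataset, compared in one mdim.
dependencies = [
    "grapher/covid/2024-01-01/covid_cases",
    "grapher/covid/2025-01-01/covid_cases",
]
custom = {"cases_old": dependencies[0], "cases_new": dependencies[1]}
resolved = resolve_placeholders(
    "old: {ds:cases_old}\nnew: {ds:cases_new}", dependencies, custom
)
print(resolved)
```

Failing loudly on an ambiguous short name keeps the DAG as the single source of truth: the step author has to state which version each placeholder refers to.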
We also noticed that it is inconvenient that `Table` does not have a URI, so we rely on `Table.metadata.dataset.uri`. Maybe tables should also have a URI attribute.
We may need an additional function in `paths` to get the URI of a table in a dataset. Currently, we would do that with something like `paths.load_dataset("dataset_path...") + "/table_name..."`.
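Such a helper could be as small as the sketch below. The name `table_uri` is hypothetical, and it assumes, as the concatenation above suggests, that a table URI is the dataset URI followed by a slash and the table name.

```python
def table_uri(dataset_uri: str, table_name: str) -> str:
    """Build the URI of a table from its dataset URI and table name.

    Assumption: a table URI is "<dataset_uri>/<table_name>".
    """
    if not table_name or "/" in table_name:
        raise ValueError(f"Invalid table name: {table_name!r}")
    # Tolerate a trailing slash on the dataset URI.
    return f"{dataset_uri.rstrip('/')}/{table_name}"


# Made-up dataset URI and table name, for illustration only.
example = table_uri("grapher/covid/latest/covid_cases/", "covid_cases")
print(example)
```

Keeping the join in one place would mean steps no longer concatenate path strings by hand.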
Impact
We are not hitting this problem often yet, but we are currently setting precedents for how a large amount of future work will be done, so getting this right now should save us effort later.
Context
Currently, mdim steps require a config yaml file, which includes full paths of indicators in tables of grapher steps.
Potential problems