Here are some notes from the WB discussion about WDI and from our chat prior to it:
Clarify where we get metadata from. It looks like we're fetching it dynamically in the garden step, but in reality, we get it from a .zip file.
(Pablo) My suggestion would be that, in the meadow step, we store two tables: one for the main data, and one for the metadata. This way, in the garden step, there's no need to load_snapshot again. The garden step can simply depend on meadow step, which has two tables. We do something similar with FAOSTAT (although there the metadata is a separate dataset, which I think is a worse option).
Rethink how we cite WB and their underlying sources. Right now, we take the full source from WDI and shorten it with the help of GPT (see wdi.sources.json and update_metadata.ipynb), but it's inconsistent and doesn't follow our best practices (which have changed since switching from sources to origins). We could also extract more information into the origins fields.
(Pablo) Creating an appropriate short citation is an important point, but it sounds like it has already been handled? Additionally, we could copy their long citations into our citation_full, with as much detail as they provide.
The Statistical Capacity Indicator was replaced by "Statistical Performance" (migrate indicators from the old version).
(Pablo) Not sure what the action item is here. Maybe simply mention it in the docstring of the snapshot step?
Describe the updating process in the README or in the snapshot docstring. If we find any quality issues, we should report them to [email protected] and [email protected].
(Pablo) Good idea. We don't have any readme for WDI in docs/data, so I suppose the appropriate place to describe the update procedure and contact persons could be the docstring of the snapshot .py file.
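To make the two-table meadow suggestion above more concrete, here is a minimal sketch using only the Python standard library. It is not the actual ETL code: the member names `data.csv` and `metadata.csv`, the column names, the table names `wdi` and `wdi_metadata`, and the helper functions are all hypothetical, and the real WDI archive will differ. The idea is just that the .zip is opened once, both tables are parsed there, and a downstream step only needs those two tables.

```python
import csv
import io
import zipfile


def read_tables_from_zip(zip_bytes, data_name, metadata_name):
    """Parse two CSV members of a zip archive into lists of row dicts.

    Returns (data_rows, metadata_rows). Member names are passed in by the
    caller because the real archive layout is not assumed here.
    """
    tables = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for member in (data_name, metadata_name):
            with archive.open(member) as raw:
                # Wrap the binary stream so csv can read it as text.
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                tables.append(list(csv.DictReader(text)))
    return tables[0], tables[1]


def build_example_zip():
    """Build a tiny in-memory archive with made-up content for illustration."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(
            "data.csv",
            "country,year,indicator_code,value\n"
            "Exampleland,2020,EX.CODE,1.5\n",
        )
        archive.writestr(
            "metadata.csv",
            "indicator_code,indicator_name,source\n"
            'EX.CODE,"Example indicator, total",Example source\n',
        )
    return buffer.getvalue()


# "Meadow": read the archive once and keep both tables side by side.
data, metadata = read_tables_from_zip(build_example_zip(), "data.csv", "metadata.csv")
meadow_tables = {"wdi": data, "wdi_metadata": metadata}

# "Garden": look up metadata by indicator code without touching the zip again.
metadata_by_code = {row["indicator_code"]: row for row in meadow_tables["wdi_metadata"]}
```

With this layout the garden step would depend only on the meadow output, which also makes it obvious that the metadata comes from the same archive as the data rather than being fetched dynamically.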
Hi @Marigold, thanks for summarizing the main conclusions in this issue. I'm waiting for others' feedback on this thread before writing back to WDI.
I've added a few comments on the description above. Feel free to take over those tasks, given that you understand the details of this pipeline better. Otherwise I can delve into it in the coming weeks. And let me know if I can help, thanks!
PS: I'll merge my PR later today, to avoid blocking ETL in production.