may use SDMX interface #39

epogrebnyak · 2024-04-21T19:27:19Z

See the tools developed by the @ONEcampaign, @jm-rivera, @lpicci96 team:

https://github.com/ONEcampaign/bblocks/blob/main/bblocks/import_tools/imf_weo.py

lpicci96 · 2024-04-23T09:19:49Z

Thanks for opening this issue @epogrebnyak

The SDMX approach has some advantages, most importantly standardisation of data and metadata. It also brings national and regional data together, which the package currently lacks. There are some limitations. SDMX data is not available before 2017, which can be a limiting factor because a big advantage of the package is access to historical releases. To keep this advantage you would need to integrate both SDMX and xls data. I also came across an issue with one of the 2021 releases (corrupted files), and I'm not sure whether the IMF team has fixed the files.
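One way to combine the two formats is to choose the source by release year. This is a minimal sketch, not existing weo-reader or bblocks code: `pick_source` and `FIRST_SDMX_YEAR` are hypothetical names, and the 2017 cutoff is an assumption taken from the comment above.

```python
# Assumption: SDMX files exist for releases from 2017 onwards.
FIRST_SDMX_YEAR = 2017


def pick_source(year: int) -> str:
    """Return which file format to request for a given release year."""
    return "sdmx" if year >= FIRST_SDMX_YEAR else "xls"
```

With a switch like this, older releases keep using the xls path and only newer ones go through SDMX.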

In terms of implementation, this could be problematic because the field names are slightly different between the SDMX data and the xls data. There is also some renaming and reformatting done by the package that would need to be refactored. I'm not sure of everything that would need to change, but I imagine there would be some breaking changes to the UI. One example is the country functionality. The data downloaded by the package also has a Country column, which isn't suitable when country and regional data are combined. Generating and handling iso3 codes would also need to be amended to handle regions.

In terms of the priorities for our work at the ONE Campaign, we are most interested in the data extraction bit, which becomes a component of our ETL. We would also need to be quite reactive to new releases, so we would still likely rely on some of our own tooling in that process, in case of breakages and the need for maintenance of the tool. The advantages of weo-reader are: 1. access to historical releases; 2. interactivity with data in a user-friendly way; 3. the potential for more advanced analysis tools (which could make the package eligible for JOSS).

To benefit both of our purposes, I propose I repackage the tool we created into a thin api for the SDMX data, which weo-reader can wrap. This way it is easier to start integrating the SDMX data while keeping all the existing functionality.

There are some other enhancements to weo-reader that I think could be interesting to pursue. Of course there is the addition of regional data, even through the xls files. The other is handling the downloaded data. Having all the raw data saved to disk is useful, but at times users don't need the raw data file and may prefer not to go through the download step. There are ways to bypass saving the data to disk, and to use caching to prevent redundant downloads. There could be a save_to_disk method of some kind to save the raw data. I think this could be a useful feature and I'm happy to help.
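A rough sketch of that idea, using only the standard library. The names `make_cached_fetcher` and `fetch` are hypothetical, and `save_to_disk` is shown here as a plain function rather than a method; none of this is current weo-reader code.

```python
from functools import lru_cache
from pathlib import Path
from typing import Callable, Union


def make_cached_fetcher(fetch: Callable[[str], bytes]) -> Callable[[str], bytes]:
    """Wrap a download function so each URL is fetched at most once per session."""

    @lru_cache(maxsize=None)
    def cached(url: str) -> bytes:
        return fetch(url)

    return cached


def save_to_disk(content: bytes, path: Union[str, Path]) -> Path:
    """Write the raw bytes to a file only when the user asks for it."""
    path = Path(path)
    path.write_bytes(content)
    return path
```

The download function is passed in, so the cache stays in memory and nothing is written to disk unless `save_to_disk` is called explicitly.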

Let me know your thoughts on my proposal and whether you have other ideas.

epogrebnyak · 2024-04-23T14:58:25Z

All good ideas. What is the entry point for SDMX, and how is it documented?

lpicci96 · 2024-04-23T16:23:59Z

The releases come with an SDMX Data Structure Definition. I would start there. This helper class we created parses the data into a DataFrame. You can look at our implementation there.

epogrebnyak · 2024-04-23T20:57:48Z

> You can look at our implementation there

Is the main action happening here? Is the SDMX release a zip file that you then process into a dataframe? Is it roughly URL -> ZipFile -> pd.DataFrame?

https://github.com/ONEcampaign/bblocks/blob/93da6b0175c0efdf9530826b64f747d5d6085d8e/bblocks/import_tools/imf_weo.py#L142-L149

lpicci96 · 2024-04-24T06:59:38Z

Partly, yes. The full extraction pipeline is run by the function extract_data, which takes a WEO version as a parameter. First it finds the href for the SDMX data. We use some web scraping instead of hardcoding the URL in case it changes. If the href is found, it makes a request and stores the response content as a ZipFile object. The ZipFile object is then parsed using the Parser helper class to get the data as a DataFrame. extract_data is injected into the WEO class, which is the main UI. When the data needs to be downloaded (either because it has never been downloaded or because the user wants to refresh), the extraction function is called.
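The steps described above can be sketched with the standard library alone. This is an illustration under assumptions, not the bblocks implementation: `find_sdmx_href`, `to_zipfile` and `extract_from_page` are hypothetical names, the rule "link text contains SDMX" is a guess about the release page, and the `fetch` and `parse` callables stand in for the HTTP request and the Parser helper class.

```python
import io
import zipfile
from html.parser import HTMLParser
from typing import Any, Callable, List, Optional, Tuple


class _LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from every <a> tag on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Tuple[str, str]] = []
        self._href: Optional[str] = None
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def find_sdmx_href(html: str) -> Optional[str]:
    """Return the first href whose link text mentions SDMX (assumed page layout)."""
    collector = _LinkCollector()
    collector.feed(html)
    for href, text in collector.links:
        if "sdmx" in text.lower():
            return href
    return None


def to_zipfile(content: bytes) -> zipfile.ZipFile:
    """Open downloaded bytes as a ZipFile without writing them to disk."""
    return zipfile.ZipFile(io.BytesIO(content))


def extract_from_page(
    page_html: str,
    fetch: Callable[[str], bytes],
    parse: Callable[[zipfile.ZipFile], Any],
) -> Any:
    """Page HTML -> href -> ZipFile -> parsed data, with fetch and parse injected."""
    href = find_sdmx_href(page_html)
    if href is None:
        raise ValueError("no SDMX link found on the release page")
    return parse(to_zipfile(fetch(href)))
```

Passing `fetch` and `parse` in as arguments mirrors the injection mentioned above: a wrapper such as weo-reader could swap in its own downloader or parser without changing the pipeline.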
