may use SDMX interface #39

Open
epogrebnyak opened this issue Apr 21, 2024 · 5 comments

Comments

@epogrebnyak
Owner

See the tools developed by the @ONEcampaign, @jm-rivera and @lpicci96 team:

https://github.com/ONEcampaign/bblocks/blob/main/bblocks/import_tools/imf_weo.py

@lpicci96

lpicci96 commented Apr 23, 2024

Thanks for opening this issue @epogrebnyak

The SDMX approach has some advantages, most importantly the standardisation of data and metadata. It also brings national and regional data together, which the package currently lacks. There are some limitations, however. SDMX data is not available before 2017, which can be a limiting factor because a big advantage of the package is that it gives access to historical releases. To keep this advantage, you would need to integrate the SDMX and xls data. I also came across an issue with one of the 2021 releases (corrupted files), and I'm not sure whether the IMF team has since fixed those files.

In terms of implementation, this could be problematic because the field names differ slightly between the SDMX data and the xls data. The package also does some renaming and reformatting that would need to be refactored. I'm not sure exactly what would need to change, but I imagine there would be some breaking changes to the UI. One example is the country functionality: the data downloaded by the package has a Country column, which isn't suitable for holding country and regional data together. Generating and handling ISO3 codes would also need to be amended to handle regions.

In terms of our priorities at the ONE Campaign, we are most interested in the data extraction part, which becomes a component of our ETL. We also need to react quickly to new releases, so we would likely still rely on some of our own tooling in that process, in case of breakages and the need to maintain the tool. The advantages of weo-reader are: 1. access to historical releases; 2. interacting with the data in a user-friendly way; 3. the potential for more advanced analysis tools (which could make the package eligible for JOSS).

To serve both of our purposes, I propose repackaging the tool we created into a thin API for the SDMX data, which weo-reader can wrap. This would make it easier to start integrating the SDMX data while keeping all the existing functionality.
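One way to picture the proposed split, assuming (as stated above) that SDMX files exist only for releases from 2017 onward: weo-reader keeps its existing xls path for older releases and delegates newer ones to a thin SDMX API. This is a hypothetical sketch; `CombinedSource`, `fetch_sdmx` and `fetch_xls` are illustrative names, not part of weo-reader or bblocks.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical row type: one observation as a plain dict.
Records = List[Dict[str, object]]

# Per the discussion, SDMX files are not available before 2017.
FIRST_SDMX_YEAR = 2017


@dataclass
class CombinedSource:
    """Route a WEO release to an SDMX or xls backend (illustrative only)."""

    fetch_sdmx: Callable[[int, int], Records]  # (year, release) -> rows via a thin SDMX API
    fetch_xls: Callable[[int, int], Records]   # (year, release) -> rows via the existing xls path

    def backend_for(self, year: int) -> str:
        # Older releases only exist as xls files.
        return "sdmx" if year >= FIRST_SDMX_YEAR else "xls"

    def fetch(self, year: int, release: int) -> Records:
        if self.backend_for(year) == "sdmx":
            return self.fetch_sdmx(year, release)
        return self.fetch_xls(year, release)


# Usage with stand-in backends:
source = CombinedSource(
    fetch_sdmx=lambda y, r: [{"source": "sdmx", "year": y, "release": r}],
    fetch_xls=lambda y, r: [{"source": "xls", "year": y, "release": r}],
)
print(source.fetch(2023, 1))  # [{'source': 'sdmx', 'year': 2023, 'release': 1}]
print(source.fetch(2010, 2))  # [{'source': 'xls', 'year': 2010, 'release': 2}]
```

Because both backends are plain callables, the routing rule could also be extended to send a release with problematic SDMX files (such as the corrupted 2021 files mentioned above) to the xls path instead.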

There are some other enhancements to weo-reader that I think could be worth pursuing. One is, of course, the addition of regional data, even through the xls files. Another is how downloaded data is handled. Having all the raw data saved to disk is useful, but sometimes users don't need the raw data file and would prefer to skip the download step. There are ways to avoid saving the data to disk, and caching can prevent repeated, redundant downloads. There could be a save_to_disk method of some kind to save the raw data. I think this could be a useful feature, and I'm happy to help.
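A minimal sketch of the caching and opt-in saving idea, assuming downloads are keyed by URL and the actual HTTP call is injected as a plain `fetch` callable (in practice perhaps `requests.get(url, timeout=60).content`). `CachedDownloader` and this `save_to_disk` signature are hypothetical, not an existing weo-reader API.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable, Dict


class CachedDownloader:
    """Download each URL at most once per session; save raw files only on request.

    `fetch` is an injected callable mapping a URL to raw bytes (hypothetical;
    for example a wrapper around requests.get(url, timeout=60).content).
    """

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch
        self._cache: Dict[str, bytes] = {}
        self.download_count = 0  # number of real downloads performed

    def get(self, url: str) -> bytes:
        # First call downloads; later calls reuse the in-memory copy.
        if url not in self._cache:
            self._cache[url] = self._fetch(url)
            self.download_count += 1
        return self._cache[url]

    def save_to_disk(self, url: str, path: str | Path) -> Path:
        # Opt-in persistence of the raw file, reusing the cache when possible.
        target = Path(path)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(self.get(url))
        return target
```

Keeping the cache in memory means repeated calls within one session download a file only once, while writing the raw file to disk becomes an explicit choice rather than a side effect of every download.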

Let me know your thoughts on my proposal and whether you have other ideas.

@epogrebnyak
Owner Author

All good ideas, what is the entry point for SDMX and how is it documented?

@lpicci96

The releases come with an SDMX Data Structure Definition; I would start there. The helper class we created parses the data into a dataframe. You can look at our implementation there.

@epogrebnyak
Owner Author

epogrebnyak commented Apr 23, 2024

You can look at our implementation there

Is the main action happening here? The SDMX data is a zip file, and you then process it into a dataframe? Is it roughly URL -> ZipFile -> pd.DataFrame?

https://github.com/ONEcampaign/bblocks/blob/93da6b0175c0efdf9530826b64f747d5d6085d8e/bblocks/import_tools/imf_weo.py#L142-L149
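The URL -> ZipFile -> DataFrame step could look roughly like this. It is a sketch, not the bblocks Parser: it assumes the zip holds SDMX-ML data in a compact-style layout, where each `<Series>` element carries its dimension values as attributes and each `<Obs>` child carries attributes such as `TIME_PERIOD` and `OBS_VALUE`. The actual element and attribute names should be checked against the release's Data Structure Definition; `parse_sdmx_zip` is a hypothetical name.

```python
import io
import xml.etree.ElementTree as ET
import zipfile
from typing import Dict, List


def _local(tag: str) -> str:
    # Drop an XML namespace prefix such as "{http://...}Series".
    return tag.rsplit("}", 1)[-1]


def parse_sdmx_zip(zip_bytes: bytes) -> List[Dict[str, str]]:
    """Flatten SDMX-ML series/observations from a zip into row dicts.

    Assumes a compact-style layout: <Series> elements hold dimension values
    as attributes and contain <Obs> children with observation attributes.
    """
    rows: List[Dict[str, str]] = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            if not name.lower().endswith(".xml"):
                continue  # skip schema (.xsd) and other non-XML files
            root = ET.fromstring(zf.read(name))
            for series in root.iter():
                if _local(series.tag) != "Series":
                    continue
                for obs in series:
                    if _local(obs.tag) == "Obs":
                        # One row per observation: series keys + observation values.
                        rows.append({**series.attrib, **obs.attrib})
    return rows
```

With a download in place, usage would be along the lines of `pd.DataFrame(parse_sdmx_zip(requests.get(url, timeout=60).content))`, where `url` stands in for the SDMX href. All values come back as strings, so a column like `OBS_VALUE` would still need `pd.to_numeric`.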

@lpicci96

Partly, yes. The full extraction pipeline is run by the function extract_data, which takes a WEO version as a parameter. First, it finds the href for the SDMX data; we use some web scraping instead of hardcoding the URL, in case it changes. If the href is found, it makes a request and stores the response content as a ZipFile object. The ZipFile object is then parsed by the Parser helper class to get the data as a DataFrame. extract_data is injected into the WEO class, which is the main UI. When the data needs to be downloaded (either because it has never been downloaded or because the user wants to refresh it), the extraction function is called.
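The flow described above (scrape the release page for the SDMX href, download the zip, parse it, and inject the extractor into a UI class that downloads lazily or on refresh) could be sketched with the standard library as follows. Everything here is hypothetical: the rule that the SDMX link is the first `<a>` whose text mentions "SDMX", the injected `page_url_for`/`get_text`/`get_bytes`/`parse` callables, and the `WEOSketch` class are illustrations, not bblocks' actual `extract_data` or `WEO` code.

```python
from __future__ import annotations

import re
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional
from urllib.parse import urljoin

Records = List[Dict[str, str]]


class _LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[tuple[str, str]] = []
        self._href: Optional[str] = None
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def find_sdmx_href(html: str) -> Optional[str]:
    # Hypothetical rule: the first link whose visible text mentions "SDMX".
    collector = _LinkCollector()
    collector.feed(html)
    collector.close()
    for href, text in collector.links:
        if re.search(r"\bSDMX\b", text, re.IGNORECASE):
            return href
    return None


def make_extract_data(
    page_url_for: Callable[[str], str],  # version -> release page URL
    get_text: Callable[[str], str],      # URL -> HTML
    get_bytes: Callable[[str], bytes],   # URL -> zip bytes
    parse: Callable[[bytes], Records],   # zip bytes -> rows
) -> Callable[[str], Records]:
    def extract_data(version: str) -> Records:
        page_url = page_url_for(version)
        href = find_sdmx_href(get_text(page_url))
        if href is None:
            raise LookupError(f"no SDMX link found for version {version!r}")
        # Resolve relative links against the release page.
        return parse(get_bytes(urljoin(page_url, href)))

    return extract_data


class WEOSketch:
    """Hypothetical UI class: downloads on first access or on refresh."""

    def __init__(self, version: str, extract_data: Callable[[str], Records]):
        self.version = version
        self._extract = extract_data
        self._data: Optional[Records] = None

    def data(self, refresh: bool = False) -> Records:
        if self._data is None or refresh:
            self._data = self._extract(self.version)
        return self._data
```

Injecting the extractor keeps the UI class independent of where the data comes from: tests can pass stand-in callables, and an SDMX or xls backend could be swapped in without changing `WEOSketch`.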
