Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prequester #136

Open
wants to merge 11 commits into
base: main
Choose a base branch
from
Open

Prequester #136

wants to merge 11 commits into from

Conversation

xbarra
Copy link
Collaborator

@xbarra xbarra commented Jan 22, 2025

I will actually split it as many pr as there are (thanks github)

But I promised this several weeks* ago.

victorparfima and others added 8 commits October 17, 2024 19:38
Add indicator_model.onboard_single and implement it in the hazards.

Co-authored-by: Arfima Dev <[email protected]>
Co-authored-by: Violeta Crespo Cobas <[email protected]>
Co-authored-by: Virginia Morales <[email protected]>
Signed-off-by: Víctor de Luna Pérez <[email protected]>
Co-authored-by: Arfima Dev <[email protected]>
Co-authored-by: Carlos San Millán Carpintero <[email protected]>
Co-authored-by: Violeta Crespo Cobas <[email protected]>
Signed-off-by: Virginia Morales <[email protected]>
Co-authored-by: Arfima Dev <[email protected]>
Signed-off-by: Xavier Barrachina Civera <[email protected]>
Co-authored-by: Arfima Dev <[email protected]>
Co-authored-by: Violeta Crespo Cobas <[email protected]>
Signed-off-by: Víctor de Luna Pérez <[email protected]>
Add the corresponding group_id or indicator_model_id.
Change colormaps.
Fix bounds for water_temp, wet_bulb, winter_storm and jrc_subsidence.
Change jrc_landsiled hazard_type.

Co-authored-by: Arfima Dev <[email protected]>
Co-authored-by: Violeta Crespo Cobas <[email protected]>
Signed-off-by: Virginia Morales <[email protected]>
Co-authored-by: Arfima Dev <[email protected]>
Co-authored-by: Violeta Crespo Cobas <[email protected]>
Signed-off-by: Víctor de Luna Pérez <[email protected]>
This import takes several seconds and causes other imports to take too long.

Co-authored-by: Arfima Dev <[email protected]>
Signed-off-by: Xavier Barrachina Civera <[email protected]>
Co-authored-by: Arfima Dev <[email protected]>
Signed-off-by: Carlos San Millán Carpintero <[email protected]>
@joemoorhouse
Copy link
Collaborator

joemoorhouse commented Feb 12, 2025

Hi @xbarra,

Thanks for this: certainly handy to have all changes for review even if we decide to split. I think the main point to discuss is the changes in indicator_model.py to the interface of IndicatorModel. I think the changes make sense, but we need to split these up into two classes / protocols. The classes are just doing different things.

Originally/currently the IndicatorModel is designed to create hazard indicator data sets in a generic way where there is a 'source' that provides the input XArrays Datasets and a 'target' that writes the result data set. The main entry point is run_all where we specify the source and target. It's really designed for doing modelling, e.g. calculations of, e.g. mean degree days or drought indicators where we can have different sources. Additional code is needed to define the calculation to be run (and therefore the data set to be created), as a minimum specifying the source to be used (e.g. downscaled CMIP6 of different types or something else).

IndicatorModel
__init__: the init can be used to pass options that control how the calculation is run
run_all: run the calculation for all batch items using specified source and target
batch_items: the items in the batch
inventory: creates inventory entry
run_single: convenience method to run single batch

In some cases we reused this for 'onboarding' jobs where we are doing a variation on sourcing the data, transforming and writing. It can be fit into the pattern, but is is not ideal, and I think that's why this PR added new bits into the interface:

__init__: specify the base directory (and AbstractFileSystem so we can use, e.g. S3) that contains the source files. The class creates its own directory/prefix.
prepare: create the directory of source files in the first place by downloading and unzipping as needed and performing any one-off steps
onboard_all: on-board the whole data set by reading the source files
onboard_single: not obvious to me that this is needed actually (especially I see this calling run_all)

I propose these should be split into a separate ABC or protocol (I prefer protocols!) called 'Onboarder' or similar. I see that in many cases it is only the new bits of interfaces that are being used.

I think we can also have an Onboarder that wraps up an IndicatorModel by specifying the source to be used. I see that this is what was intended in degree_days.py:

    def _onboard_single(self, target, download_dir):
        source = NexGddpCmip6()
        self.run_all(source, target)

But I think this should rather be done by composition.

If you broadly agree, I suggest we make the interface changes and take just a few examples and then merge that. We can then do others once we are happy with the design?

Thanks,
Joe

@j08lue
Copy link
Contributor

j08lue commented Feb 25, 2025

This is mostly cosmetics but touches a lot of files.

Should probably try to get this merged soon, before conflicts pile up.

Signed-off-by: Violeta Crespo Cobas <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants