[Meta] ML performance and optimisation #69

Open
4 tasks
ma595 opened this issue Oct 15, 2024 · 6 comments

@ma595
Collaborator

ma595 commented Oct 15, 2024

Explore the following issues. Either Daniel/Mandresy or ICCS will do the following:

Optimise training data / ML bias (or better RERUN time)

  • Remove LAI and NPP from predictors (from pre-run)
    • If the re-run extension is shorter than the duration of the pre-run -> WIN
  • Optimising clustering by selecting pixels that have more PFTs.
  • Add more variables that might not be fully equilibrated to the training.
  • Vary number of Ncc:
    • Model dependent duration of site-runs ( duration known )
    • Total time of site-runs; Ncc x duration
  • How to optimise the trade-off between rerun time and the number of pixels for site runs

The issues addressing the above are as follows:

@ma595 ma595 added this to the ML optimisation milestone Oct 15, 2024
@ma595
Collaborator Author

ma595 commented Oct 15, 2024

Break this down into subtasks and prioritise.

Without LAI and NPP performance was found to be poor. But they are expensive to obtain.

  • provide an option to remove these from training.
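
A minimal sketch of what such an option could look like, assuming the predictors are handled as a list of column names. The function name `select_predictors`, the flag `exclude_expensive`, and the exact names `LAI` and `NPP` as column labels are illustrative assumptions, not the project's actual API.

```python
# Assumed labels for the expensive predictors; the real column names may differ.
EXPENSIVE_PREDICTORS = ("LAI", "NPP")


def select_predictors(predictors, exclude_expensive=False,
                      expensive=EXPENSIVE_PREDICTORS):
    """Return the predictor names to train on.

    With exclude_expensive=True, the names listed in `expensive` are dropped,
    so training can be compared with and without them.
    """
    if not exclude_expensive:
        return list(predictors)
    return [name for name in predictors if name not in expensive]
```

Keeping the default at `exclude_expensive=False` would leave existing runs unchanged while making the cheaper configuration a one-flag switch.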

@dsgoll123
Collaborator

dsgoll123 commented Oct 15, 2024

Here are the subtasks.
We need to decide jointly which ones fit within the project time window.

(1) Without LAI and NPP performance was found to be poor. But they are expensive to obtain.

  • provide an option to remove these from training. [P1, but easy to do I assume]

(2) Optimising clustering by selecting pixels that have more PFTs. [P0]

  • pixel selection is done for each PFT separately, while many pixels contain multiple PFTs
  • adjust the clustering to select pixels which can provide information for multiple PFTs in order to reduce the total number of pixels
  • the cover fraction of the PFT should be at least 5 (10)% for a given pixel to be considered here
    Untitled document.pdf
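
One way to picture the idea in (2) is a greedy selection that prefers pixels serving several PFTs at once. This is only a sketch under stated assumptions: it swaps the existing clustering for a simple greedy cover, and the function name `select_pixels`, the input layout (pixel id mapped to PFT cover fractions in 0..1), and the `per_pft` quota are all hypothetical.

```python
def select_pixels(cover, per_pft=1, min_cover=0.05):
    """Greedily pick pixels until each PFT is represented `per_pft` times.

    cover: dict mapping pixel id -> dict mapping PFT id -> cover fraction (0..1).
    min_cover: minimum cover fraction for a PFT to count in a pixel
               (0.05 here; 0.10 is the stricter alternative).
    """
    # PFTs each pixel can inform, after applying the cover threshold.
    usable = {
        pixel: {pft for pft, frac in fracs.items() if frac >= min_cover}
        for pixel, fracs in cover.items()
    }

    # Demand per PFT, capped by how many pixels can actually supply it.
    supply = {}
    for pfts in usable.values():
        for pft in pfts:
            supply[pft] = supply.get(pft, 0) + 1
    need = {pft: min(per_pft, count) for pft, count in supply.items()}

    selected = []
    remaining = dict(usable)
    while any(n > 0 for n in need.values()):
        # Pick the pixel that serves the most PFTs still short of their quota.
        best = max(remaining,
                   key=lambda p: sum(need[pft] > 0 for pft in remaining[p]))
        for pft in remaining.pop(best):
            if need[pft] > 0:
                need[pft] -= 1
        selected.append(best)
    return selected
```

A greedy cover is not guaranteed to give the smallest possible pixel set, and it ignores how representative each pixel is of its cluster, so it would need to be combined with the existing clustering criteria rather than replace them.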

(3) If we find LAI and NPP are critical, we could provide additional variables from the same file as predictors ( n = n of response variables ) [P2]

  • the associated file contains state variables of ORCHIDEE from early in the spinup (e.g. year 100 vs 3000k when equilibrium is reached)

(4) a routine that scans for the optimal Ncc [P0]

  • systematically test how Ncc affects stats of ML predictions
  • Ncc scales linearly with the computational demand of ORCHIDEE
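
The scan in (4) could be sketched roughly as below, using the "Ncc x duration" cost from the first comment. The names `scan_ncc` and `cheapest_acceptable`, the `evaluate` callback (assumed to train the ML model for a given Ncc and return a non-negative error metric), and the relative `tolerance` rule are assumptions for illustration only.

```python
def scan_ncc(candidates, evaluate, duration_per_run):
    """Evaluate each candidate Ncc and record its error and site-run cost.

    evaluate: hypothetical callable, Ncc -> error metric (lower is better).
    duration_per_run: duration of one site run, in any consistent unit.
    """
    results = []
    for ncc in sorted(candidates):
        results.append({
            "ncc": ncc,
            "error": evaluate(ncc),
            # Total site-run time grows linearly with Ncc.
            "cost": ncc * duration_per_run,
        })
    return results


def cheapest_acceptable(results, tolerance=0.05):
    """Return the lowest-cost entry whose error is within `tolerance`
    (relative) of the best error found in the scan."""
    best_error = min(r["error"] for r in results)
    acceptable = [r for r in results
                  if r["error"] <= best_error * (1 + tolerance)]
    return min(acceptable, key=lambda r: r["cost"])
```

Separating the scan from the selection rule keeps the raw statistics available, so the trade-off between rerun time and the number of pixels can be revisited without rerunning the site runs.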

@ma595
Collaborator Author

ma595 commented Oct 15, 2024

Thanks @dsgoll123, I realise that (3) is not high priority but would you be able to provide these additional variables as soon as you can so that we are ready to use them when we have some time? I'll create individual issues so we can put them on the roadmap.

@ma595
Collaborator Author

ma595 commented Oct 16, 2024

@ma595 asked why Ncc scales linearly with the computational demand of ORCHIDEE. If each pixel can be treated independently, why is this not embarrassingly parallel?

@ma595 ma595 changed the title ML performance and optimisation [Meta] ML performance and optimisation Oct 16, 2024
@ma595 ma595 added the meta label Oct 16, 2024
@dsgoll123
Collaborator

> Thanks @dsgoll123, I realise that (3) is not high priority but would you be able to provide these additional variables as soon as you can so that we are ready to use them when we have some time? I'll create individual issues so we can put them on the roadmap.

Yes, we can use the other variables in that file for a test. It is a mixed bag which should contain useful variables but also some without much useful information. Can you work with this or do you want me to pre-select variables?

@dsgoll123
Collaborator

> @ma595 asked why Ncc scales linearly with the computational demand of ORCHIDEE. If each pixel can be treated independently, why is this not embarrassingly parallel?

We are working on getting there and should be there soon.
