When creating zarr archives with MDP, I suspected that something odd was going on when --dask-distributed-local-core-fraction is set: downloading and preprocessing the data took an unreasonably long time. Here is an MRE describing the issue:
The attached archive.zip contains a few files that download ERA5 data manually and preprocess it locally with MDP, for comparison with end-to-end preprocessing by MDP. To reproduce the results, unzip the archive, cd into the extracted directory, make the script executable, and run ./test_runtime.sh.
era5_local.datastore.yaml: Configuration file for processing ERA5 weather data from the local source (the manually downloaded copy).
era5.datastore.yaml: Configuration file for direct download and processing of ERA5 weather data from the remote source.
retrieve.py: Python script that manually downloads ERA5 data to local storage using xarray.
runtime_results.txt: Results file containing benchmark measurements comparing the different processing methods and core fractions.
test_runtime.sh: Shell script that runs performance tests comparing direct download and processing against manual download plus local processing, at different CPU core fraction settings.
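The contents of test_runtime.sh are not reproduced in this issue, so the following is only a hypothetical Python sketch of the same kind of timing harness: it runs MDP once per (config, core fraction) pair and records the wall-clock time. It assumes MDP is invoked as `python -m mllam_data_prep <config>`; the helper names `mdp_command`, `time_command` and `benchmark` are made up for illustration.

```python
import subprocess
import sys
import time


def mdp_command(config_path, core_fraction=None):
    """Build a hypothetical mllam-data-prep invocation for one config file."""
    # Assumption: MDP is run as a module, `python -m mllam_data_prep <config>`.
    cmd = [sys.executable, "-m", "mllam_data_prep", config_path]
    if core_fraction is not None:
        # The flag under investigation in this issue.
        cmd += ["--dask-distributed-local-core-fraction", str(core_fraction)]
    return cmd


def time_command(cmd):
    """Run a command to completion and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)  # raises if the command fails
    return time.perf_counter() - start


def benchmark(configs, fractions):
    """Time every (config, core fraction) pair; None means the flag is not set."""
    results = {}
    for fraction in fractions:
        for config in configs:
            results[(config, fraction)] = time_command(mdp_command(config, fraction))
    return results
```

For example, `benchmark(["era5.datastore.yaml", "era5_local.datastore.yaml"], [None, 0.1, 0.25, 0.5])` would mirror the four settings reported below, assuming retrieve.py has already populated the local copy that era5_local.datastore.yaml points to.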
Here is a snapshot of the results, showing how differently the local and remote approaches scale:
Runtime Benchmark Results
========================
Test Results - 2025-01-15 12:34:19
Core Fraction: none
Direct download and processing: 33 seconds
Manual download: 30 seconds
Local data processing: 5 seconds
----------------------------------------
Test Results - 2025-01-15 12:36:31
Core Fraction: 0.1
Direct download and processing: 91 seconds
Manual download: 32 seconds
Local data processing: 9 seconds
----------------------------------------
Test Results - 2025-01-15 12:39:00
Core Fraction: 0.25
Direct download and processing: 102 seconds
Manual download: 32 seconds
Local data processing: 15 seconds
----------------------------------------
Test Results - 2025-01-15 12:41:27
Core Fraction: 0.5
Direct download and processing: 96 seconds
Manual download: 32 seconds
Local data processing: 19 seconds
----------------------------------------
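To make the scaling easier to compare, the sketch below parses the results above (assuming the exact layout of runtime_results.txt shown here) and expresses each timing relative to the run without a core fraction. The functions `parse_results` and `slowdown` are illustrative helpers, not part of MDP.

```python
import re

# Benchmark output copied from runtime_results.txt above.
RESULTS = """\
Test Results - 2025-01-15 12:34:19
Core Fraction: none
Direct download and processing: 33 seconds
Manual download: 30 seconds
Local data processing: 5 seconds
Test Results - 2025-01-15 12:36:31
Core Fraction: 0.1
Direct download and processing: 91 seconds
Manual download: 32 seconds
Local data processing: 9 seconds
Test Results - 2025-01-15 12:39:00
Core Fraction: 0.25
Direct download and processing: 102 seconds
Manual download: 32 seconds
Local data processing: 15 seconds
Test Results - 2025-01-15 12:41:27
Core Fraction: 0.5
Direct download and processing: 96 seconds
Manual download: 32 seconds
Local data processing: 19 seconds
"""

FRACTION = re.compile(r"^Core Fraction:\s*(?P<frac>\S+)$")
SECONDS = re.compile(r"^(?P<label>[^:]+):\s*(?P<secs>\d+) seconds$")


def parse_results(text):
    """Return one dict per 'Test Results' block: core fraction plus timings."""
    runs = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Test Results"):
            runs.append({})  # start a new block
            continue
        if not runs:
            continue
        m = FRACTION.match(line)
        if m:
            frac = m.group("frac")
            runs[-1]["core_fraction"] = None if frac == "none" else float(frac)
            continue
        m = SECONDS.match(line)
        if m:
            runs[-1][m.group("label")] = int(m.group("secs"))
    return runs


def slowdown(runs, label):
    """Runtime for `label` divided by the runtime without a core fraction."""
    baseline = next(r[label] for r in runs if r["core_fraction"] is None)
    return {r["core_fraction"]: round(r[label] / baseline, 2) for r in runs}


runs = parse_results(RESULTS)
print(slowdown(runs, "Direct download and processing"))
print(slowdown(runs, "Local data processing"))
```

On these numbers, setting any core fraction makes direct download and processing roughly 2.8 to 3.1 times slower than the baseline, while local processing slows down steadily from 1.8 times (0.1) to 3.8 times (0.5).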
Some information about the system I was using:
System: Linux
Kernel: 5.14.21-150500.55.65_13.0.74-cray_shasta_c_64k
Memory: 854Gi
OS: SUSE Linux Enterprise Server 15 SP5
Conda Env: mllam/* installed with pdm
sadamov changed the title from "Download and/or processing time increases a lot with higher --dask-distributed-local-core-fraction" to "Download and/or processing time increases a lot with --dask-distributed-local-core-fraction" on Jan 15, 2025.