Add optional caching to AreaDefinition.get_area_slices #553
Conversation
Codecov Report

@@            Coverage Diff             @@
##             main     #553      +/-   ##
==========================================
+ Coverage   94.11%   94.13%   +0.02%
==========================================
  Files          82       84       +2
  Lines       13078    13188     +110
==========================================
+ Hits        12308    12415     +107
- Misses        770      773       +3
Nice refactoring! Just a couple of comments, but it looks good otherwise.
> small for many cached results.
>
> When setting this as an environment variable, this should be set with the
> string equivalent of the Python boolean values ``="True"`` or ``="False"``.
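As an illustration of that note, here is a minimal sketch of enabling the option both ways; the setting name `cache_geometry_slices` and the `PYRESAMPLE_CACHE_GEOMETRY_SLICES` variable are assumptions for this sketch, not names confirmed in this hunk:

```python
# Sketch only: the setting/env-var names below are assumptions based on
# this PR's discussion, not confirmed pyresample names.
import os

# As an environment variable, the boolean must be the *string* "True"/"False":
os.environ["PYRESAMPLE_CACHE_GEOMETRY_SLICES"] = "True"

# Or from Python, via pyresample's donfig-based configuration object:
import pyresample

with pyresample.config.set(cache_geometry_slices=True):
    ...  # calls to AreaDefinition.get_area_slices would be cached here
```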
I think we need to add a sentence or two on what kind of performance improvement we can expect and in which situations.
Oh, your comment reminds me: do you have a go-to benchmark for gradient search that you can point me to, or run yourself with this PR and this caching enabled? I think gradient search is the only other part of pyresample that uses the area slices directly, and I don't want to make it unnecessarily slow. That said, if it caches the slices for an area -> area resampling (the only thing gradient search supports right now), then it'd probably make all future operations fast, especially since, iirc, Satpy doesn't do reduce_data for gradient search.
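For anyone wanting to measure this, a hypothetical micro-benchmark of just the slicing step discussed here; the areas below are placeholders, not an actual test case from this thread:

```python
# Hypothetical micro-benchmark of the slicing step (not a full gradient
# search run); the two areas are arbitrary placeholder definitions.
import time

from pyresample import create_area_def

# A large source grid and a smaller, partially overlapping target grid.
src = create_area_def("src", "EPSG:4326", shape=(2000, 4000),
                      area_extent=(-180, -90, 180, 90))
dst = create_area_def("dst", "EPSG:3857", shape=(512, 512),
                      area_extent=(-2e6, -2e6, 2e6, 2e6))

start = time.perf_counter()
x_slice, y_slice = src.get_area_slices(dst)
elapsed = time.perf_counter() - start
print(f"get_area_slices: {elapsed:.3f}s -> x={x_slice}, y={y_slice}")
```

Running it twice (with and without the caching enabled) should show whether the second call drops to near zero.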
OK, I added more information, but I got a little wordy, so let me know what you think.
@pnuu is using gradient search a lot; maybe he can help here?
Looks good with the explanation.
I don't have my usual test script at hand, but I can test this on Monday. If I remember...
Timings using Satpy: [timing results attached as images]
As a comparison, with Dask graphs: [graph images attached]
Thanks @pnuu, this looks good!
Thanks for testing, @pnuu!
While profiling some Satpy computations (ABI full disk -> nearest neighbor resampling), I noticed that a decent amount of time at the beginning of processing was spent outside of dask computations and was using a single core. After some print-statement debugging, I discovered it was Satpy's `reduce_data` functionality and the `AreaDefinition.get_area_slices` method that were taking the most time. The majority of that time is spent in the polygon intersection operation that determines whether, and where, the two areas intersect.

This PR adds a decorator and a couple of configuration settings for caching the results of `AreaDefinition.get_area_slices` to on-disk JSON files. In my test case, `get_area_slices` took roughly 10-12 seconds per area definition pair. I was using two resolutions of ABI data and one target area, so that was ~22 seconds total. With this caching enabled, that time basically disappears.

This PR is only a proof of concept at this point and I will continue to improve it. I just wanted to get the initial commits up on GitHub for others to see.
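For readers unfamiliar with the pattern, here is a simplified sketch of the on-disk JSON caching idea described above. It is not the actual decorator added by this PR; the function name, cache directory, and key scheme are illustrative assumptions:

```python
# Simplified sketch of caching function results to on-disk JSON files;
# NOT the decorator from this PR. Names and paths are illustrative.
import hashlib
import json
import os
from functools import wraps


def cache_slices_to_json(cache_dir="~/.cache/pyresample"):
    """Cache a function's result on disk as JSON, keyed on its arguments.

    Assumes the result is JSON-serializable; real slice objects would
    first need converting to (start, stop) pairs and back.
    """
    cache_dir = os.path.expanduser(cache_dir)

    def decorator(func):
        @wraps(func)
        def wrapper(*args):
            # Hash the repr of the arguments (e.g. the two area
            # definitions) into a stable cache filename.
            key = hashlib.sha1(repr(args).encode("utf-8")).hexdigest()
            path = os.path.join(cache_dir, f"{func.__name__}_{key}.json")
            if os.path.isfile(path):
                with open(path) as cache_file:
                    return json.load(cache_file)
            result = func(*args)
            os.makedirs(cache_dir, exist_ok=True)
            with open(path, "w") as cache_file:
                json.dump(result, cache_file)
            return result
        return wrapper
    return decorator
```

JSON is a sensible fit here since slice bounds are just small integers, and the files stay human-inspectable and tiny even for many cached results.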
`git diff origin/main **/*py | flake8 --diff`