Cache NUMBA kernels between CI runs #279

sjperkins · 2023-06-15T13:48:21Z

Closes #278

JSKenyon · 2023-06-15T14:26:32Z

This looks awesome! I will probably move this to the dev branch before merging.

sjperkins · 2023-06-15T14:31:36Z

> This looks awesome! I will probably move this to the dev branch before merging.

Cool. Need to prod it a bit to see if it works.

sjperkins · 2023-06-15T14:39:28Z

.github/workflows/ci.yaml

+      - name: Cache Numba Kernels
+        uses: actions/cache@v3
+        with:
+          key: numba-cache-${{ steps.numba-cache-key.outputs.date }}


Constructing the key out of the date may be overkill. I suspect we could just use numba-cache and it would propagate and be updated between runs.

I guess the downside is that it might accumulate a bunch of crufty old kernels. Note that, AFAICT, there's a 10GB cache limit per repo and cache entries that go unused for a week are evicted, so it may not be a big deal.

@bennahugo suggested we add the numba version to the cache key. I wonder if numba is clever enough to trigger recompiles on new numba versions.

The Python version may also be relevant, given that a codex-africanus __pycache__ dir looks as follows:

    __init__.cpython-310.pyc
    __init__.cpython-39.pyc
    bda_avg.cpython-310.pyc
    bda_avg.cpython-39.pyc
    bda_avg.row_average-23.py310.1.nbc
    bda_avg.row_average-23.py310.2.nbc
    bda_avg.row_average-23.py310.nbi
    bda_avg.row_average-23.py39.1.nbc
    bda_avg.row_average-23.py39.2.nbc
    bda_avg.row_average-23.py39.nbi
    bda_avg.row_chan_average-313.py310.1.nbc
    bda_avg.row_chan_average-313.py310.2.nbc
    bda_avg.row_chan_average-313.py310.3.nbc
    bda_avg.row_chan_average-313.py310.4.nbc
    bda_avg.row_chan_average-313.py310.5.nbc
    bda_avg.row_chan_average-313.py310.nbi
    bda_avg.row_chan_average-313.py39.1.nbc
    bda_avg.row_chan_average-313.py39.2.nbc
    bda_avg.row_chan_average-313.py39.3.nbc
    bda_avg.row_chan_average-313.py39.4.nbc
    bda_avg.row_chan_average-313.py39.nbi
    bda_mapping.bda_mapper-341.py310.1.nbc
    bda_mapping.bda_mapper-341.py310.nbi
    bda_mapping.bda_mapper-341.py39.1.nbc
    bda_mapping.bda_mapper-341.py39.2.nbc
    bda_mapping.bda_mapper-341.py39.nbi
    bda_mapping.cpython-310.pyc
    bda_mapping.cpython-39.pyc
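The listing above suggests the Python version is baked into each cache file name (`py39`, `py310`). A minimal sketch of how one might pull those tags out and fold them into a cache key. `python_tags` and `cache_key` are hypothetical helpers, and the key layout is an assumption rather than what the workflow actually uses:

```python
import re
import sys

# Matches the ".py39." / ".py310." tag seen in the .nbi/.nbc names above.
_TAG = re.compile(r"\.(py\d+)\.(?:\d+\.)?nb[ci]$")


def python_tags(filenames):
    """Return the sorted pyXY tags found in numba cache file names."""
    tags = set()
    for name in filenames:
        match = _TAG.search(name)
        if match:
            tags.add(match.group(1))
    return sorted(tags)


def cache_key(numba_version, date, version_info=sys.version_info):
    """Build a hypothetical cache key from Python version, numba version and date."""
    py = f"py{version_info[0]}{version_info[1]}"
    return f"numba-cache-{py}-numba{numba_version}-{date}"
```

Keying on both versions would keep caches from different interpreters or numba releases apart, at the cost of more cache entries per repo.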

sjperkins · 2023-06-15T14:41:17Z

Hmmm, I was fairly sure I mkdir'd that

sjperkins · 2023-06-20T06:56:22Z

So the kernel caching does not seem to be improving the test suite run time, even though kernel caches are created: https://github.com/ratt-ru/QuartiCal/actions/caches. This would also seem to suggest NUMBA_CACHE_DIR is respected.
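One way to double-check that the directory named by NUMBA_CACHE_DIR is really being populated is to count the index (`.nbi`) and data (`.nbc`) files in it after a test run. A small sketch; `summarise_cache` is a hypothetical helper, not something in the repo:

```python
from pathlib import Path


def summarise_cache(cache_dir):
    """Count numba index (.nbi) and data (.nbc) files and their total size in bytes."""
    counts = {".nbi": 0, ".nbc": 0}
    total_bytes = 0
    for path in Path(cache_dir).rglob("*"):
        if path.is_file() and path.suffix in counts:
            counts[path.suffix] += 1
            total_bytes += path.stat().st_size
    return counts[".nbi"], counts[".nbc"], total_bytes
```

Printing the result before and after the test step would show whether a restored cache was reused or regenerated.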

JSKenyon · 2023-06-20T07:13:51Z

> So the kernel caching does not seem to be improving the test suite run time, even though kernel caches are created: https://github.com/ratt-ru/QuartiCal/actions/caches. This would also seem to suggest NUMBA_CACHE_DIR is respected.

Is respected, or isn't? Might need to rerun the tests a few times - I have muddied the waters by merging in main. I do think that there is probably something which can be done - will take a closer look at the end of the week.

sjperkins · 2023-06-20T07:22:22Z

> > So the kernel caching does not seem to be improving the test suite run time, even though kernel caches are created: https://github.com/ratt-ru/QuartiCal/actions/caches. This would also seem to suggest NUMBA_CACHE_DIR is respected.

> Is respected, or isn't?

I think it is respected: the caches are about 11MB.

Another thought occurred: the cached kernels' modification times are probably earlier than those of the checked-out Python code, which might trigger recompilation: https://numba.readthedocs.io/en/stable/developer/caching.html

> The cache is invalidated when the corresponding source file is modified.

Edit: Referenced the main article on caching, rather than the cuda article.
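To make the concern concrete: as far as I can tell, numba stores a "source stamp" of the file's modification time and size next to the cache index and compares it on load. The sketch below mimics that check under this assumption; `source_stamp` and `stamp_matches` are illustrative names, not numba's API:

```python
import os


def source_stamp(path):
    """Return (mtime, size), which is what numba appears to record per source file."""
    st = os.stat(path)
    return st.st_mtime, st.st_size


def stamp_matches(path, cached_stamp):
    """True if the file's current stamp equals the one stored alongside the cache."""
    return source_stamp(path) == tuple(cached_stamp)
```

In CI, a fresh checkout writes every file with the current time, so the stamp would differ from the one recorded in a previous run even when the content is identical.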

sjperkins · 2023-06-20T07:55:12Z

> Another thought occurred: the cached kernels' modification times are probably earlier than those of the checked-out Python code, which might trigger recompilation: https://numba.readthedocs.io/en/stable/developer/caching.html
>
> > The cache is invalidated when the corresponding source file is modified.

Unfortunately, it looks like the source file timestamp is indeed an input to the cache index (at least as of Aug 2022): https://numba.discourse.group/t/cache-behaviour/1520

> I would like to propose to move away from invalidating the cache index based on the timestamp of the file, and use only the code+closure signature of the function itself. Would anyone see a problem with that? When I say code+closure signature I mean the exact same information that is used to select the overload from within the cache.

So this approach doesn't seem viable.
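Unless the timestamps themselves are made reproducible, that is. If the stamp really is (mtime, size), one conceivable workaround is to pin the source mtimes to a deterministic value before the tests run, so that a fresh checkout looks unchanged to the cache. A rough sketch; `pin_python_mtimes` is a hypothetical helper, the single fixed timestamp is a simplification (a per-file commit time would be more realistic), and this is untested against numba's cache:

```python
import os
from pathlib import Path


def pin_python_mtimes(root, timestamp):
    """Set atime and mtime of every .py file under root to a fixed timestamp."""
    pinned = 0
    for path in Path(root).rglob("*.py"):
        os.utime(path, (timestamp, timestamp))
        pinned += 1
    return pinned
```

This would have to cover the installed package sources that numba actually compiles from, not just the repository checkout.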

JSKenyon · 2023-06-20T08:04:39Z

> So this approach doesn't seem viable.

Ah unfortunate. Perhaps there will be progress upstream at some point.



sjperkins added 2 commits June 15, 2023 15:47

Cache NUMBA kernels between CI runs

eb8c568

Use actions/cache@v3

dd6bbeb

sjperkins and others added 9 commits June 15, 2023 16:57

Cache per python version

d4cba34

runner.tmp -> runner.temp

3ab69ad

Debugging

353ae63

Fix

a60360d

Run entire test suite

707b176

timestamp needed otherwise cache hit occurs and cache not updated

5c3aed0

Fix output

edc58e8

Add revert_me.txt

7ef4776

Merge branch 'main' into cache-numba-kernels

55fd7ba

JSKenyon and others added 10 commits January 26, 2024 10:58

Use nearest-neighbour interpolation in regions where extrapolation is required. (#285)

d9f69ec

* Fix version drift. * Bump to 0.2.0 * Use nearest-neighbour interpolation for points requiring extrapolation.

Utilise environment variable when dask.address is unset. (#288)

0c58b33

* Fix version drift. * Bump to 0.2.0 * Inspect envvar for scheduler address when one isn't specified. * Encode environment variable as ascii. * Simplify.

Fix #293 - OOB access caused by output.subtract_directions (#294)

7cd19eb

* Fix version drift. * Bump to 0.2.0 * Fix #293.

Selectively disable MAD flagging criteria (#298)

6bed6f2

* Fix version drift. * Bump to 0.2.0 * Setting MAD threshold to zero will disable flagging on a given statistic.

Disable mad flagging on off-diagonals by default (#300)

7630360

* Fix version drift. * Bump to 0.2.0 * Disable flagging based on off-diagonal correlations in the mad flagger by default. This should make the mad flagger less aggressive on data with unmodelled polarised emission.

Fix bug affecting non-standard columns in input_ms.data_column (#301)

eaa7515

* Fix version drift. * Bump to 0.2.0 * Fix a bug affecting the use of non-standard columns in data column input.

Fix for summary reporting SOURCE_ID as FIELD_ID (#309)

5478340

* Fix version drift. * Bump to 0.2.0 * Make summary correctly report FIELD_ID and SOURCE_ID.

JSKenyon changed the base branch from main to v0.2.1-dev January 29, 2024 12:24

JSKenyon and others added 11 commits January 29, 2024 14:30

Merge branch 'v0.2.1-dev' into cache-numba-kernels

8008ab3

Attempt very dodgy solution to caching problem.

2522829

Look for code in the correct place.

6e13b25

Update pyproject.toml. Add poetry.lock. Update docs. (#323)

3eb7c5d

* Drop 3.8. Commit poetry lock file. * Update stimela requirement. * Update docs. * Set min and max versions in pyproject.toml. * Remove python3.8 from test matrix.

Some debugging.

79158e2

Merge v0.2.1-dev.

92da770

Fix unsaved file.

e83fd40

More debugging.

6348e00

Temporarily make test suite much smaller.

92326a7

Fix path.

d1c67a9

Actually fix path.

16bdee5

Base automatically changed from v0.2.1-dev to main January 30, 2024 06:46

Attempt at safer caching.

c2a3b29

JSKenyon changed the base branch from main to v0.2.2-dev January 31, 2024 09:24

JSKenyon added 11 commits January 31, 2024 11:28

Merge in v0.2.2-dev

0e97a1c

More fiddling with paths.

32f5950

Fix bad tabbing.

3db9902

Try to find out where things are failing.

61f61b5

More fiddling.

4d1dca0

More fiddling.

3d21087

More fiddling.

6318f47

Try restore time action.

f37fa83

Tidy up caching approach. Use action. Restore matrix and test everything.

d270cff

Remove tmp file.

1da4879

Reword CI step name.

65c5efa

JSKenyon merged commit 572cdfb into v0.2.2-dev Jan 31, 2024
6 checks passed

JSKenyon deleted the cache-numba-kernels branch January 31, 2024 11:51

sjperkins mentioned this pull request Jan 31, 2024

Cache numba kernels ratt-ru/codex-africanus#294

Merged

2 tasks
