Cache NUMBA kernels between CI runs #279
This looks awesome! I will probably move this to the dev branch before merging.
Cool. Need to prod it a bit to see if it works.
.github/workflows/ci.yaml
- name: Cache Numba Kernels
  uses: actions/cache@v3
  with:
    key: numba-cache-${{ steps.numba-cache-key.outputs.date }}
Constructing the key out of the date may be overkill. I suspect we could just use numba-cache and it would propagate and be updated between runs.
I guess the downside is that it might accumulate a bunch of crufty old kernels. Note that, AFAICT, there's a 10GB cache limit per repo and cache entries expire weekly, so it may not be a big deal.
@bennahugo suggested we add the numba version to the cache key. I wonder if numba is clever enough to trigger recompiles on new numba versions.
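For illustration, a key of the kind suggested here could be assembled as in the sketch below. This is only a sketch under assumptions: the function name `build_cache_key`, the key layout, and the `"none"` fallback are hypothetical and are not taken from the workflow in this PR. The interpreter version is included as well, since the cache file names appear to be interpreter-specific.

```python
import sys
from importlib import metadata


def build_cache_key(prefix="numba-cache", package="numba"):
    """Build a cache key from a prefix, the Python version and a package version.

    The layout "<prefix>-py<major>.<minor>-<package><version>" is an
    assumption made for this example, not the key used by the workflow.
    """
    py_tag = f"py{sys.version_info.major}.{sys.version_info.minor}"
    try:
        pkg_version = metadata.version(package)
    except metadata.PackageNotFoundError:
        # Package is not installed here; keep the key well-formed anyway.
        pkg_version = "none"
    return f"{prefix}-{py_tag}-{package}{pkg_version}"


if __name__ == "__main__":
    print(build_cache_key())
```

A CI step could print this value and pass it on as a step output, so that a numba upgrade or a different interpreter yields a different cache entry rather than reusing a stale one.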
The python version may also be relevant, given that a codex __pycache__ dir looks as follows:
__init__.cpython-310.pyc
__init__.cpython-39.pyc
bda_avg.cpython-310.pyc
bda_avg.cpython-39.pyc
bda_avg.row_average-23.py310.1.nbc
bda_avg.row_average-23.py310.2.nbc
bda_avg.row_average-23.py310.nbi
bda_avg.row_average-23.py39.1.nbc
bda_avg.row_average-23.py39.2.nbc
bda_avg.row_average-23.py39.nbi
bda_avg.row_chan_average-313.py310.1.nbc
bda_avg.row_chan_average-313.py310.2.nbc
bda_avg.row_chan_average-313.py310.3.nbc
bda_avg.row_chan_average-313.py310.4.nbc
bda_avg.row_chan_average-313.py310.5.nbc
bda_avg.row_chan_average-313.py310.nbi
bda_avg.row_chan_average-313.py39.1.nbc
bda_avg.row_chan_average-313.py39.2.nbc
bda_avg.row_chan_average-313.py39.3.nbc
bda_avg.row_chan_average-313.py39.4.nbc
bda_avg.row_chan_average-313.py39.nbi
bda_mapping.bda_mapper-341.py310.1.nbc
bda_mapping.bda_mapper-341.py310.nbi
bda_mapping.bda_mapper-341.py39.1.nbc
bda_mapping.bda_mapper-341.py39.2.nbc
bda_mapping.bda_mapper-341.py39.nbi
bda_mapping.cpython-310.pyc
bda_mapping.cpython-39.pyc
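To make the per-interpreter split in the listing above explicit, here is a small sketch that counts the Numba cache files by their Python tag. The file-name pattern is inferred from this listing only and is an assumption, not a documented format; `count_by_python` is a hypothetical helper name.

```python
import re
from collections import Counter

# Pattern inferred from the listing above (an assumption, not a documented
# format): <qualified name>-<number>.py<version>.<n>.nbc for data files and
# <qualified name>-<number>.py<version>.nbi for index files.
CACHE_FILE = re.compile(r"^.+-\d+\.py(?P<py>\d+)\.(?:\d+\.nbc|nbi)$")


def count_by_python(filenames):
    """Count Numba cache files (.nbi and .nbc) per Python tag, e.g. 'py310'."""
    counts = Counter()
    for name in filenames:
        match = CACHE_FILE.match(name)
        if match:
            counts["py" + match.group("py")] += 1
    return counts


sample = [
    "bda_avg.row_average-23.py310.1.nbc",
    "bda_avg.row_average-23.py310.2.nbc",
    "bda_avg.row_average-23.py310.nbi",
    "bda_avg.row_average-23.py39.1.nbc",
    "bda_avg.row_average-23.py39.nbi",
    "bda_avg.cpython-310.pyc",  # ordinary bytecode, not a Numba cache file
]

if __name__ == "__main__":
    print(count_by_python(sample))
```

Since each interpreter writes its own set of files, a cache keyed per Python version would avoid carrying the other version's kernels around.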
So the kernel caching does not seem to be improving the test suite run time, even though kernel caches are created: https://github.com/ratt-ru/QuartiCal/actions/caches. This would also seem to suggest NUMBA_CACHE_DIR is respected.
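One way to check this directly in a CI step would be to summarise what ends up under the cache directory after the tests have run. The sketch below assumes the step can read NUMBA_CACHE_DIR from its environment; `summarise_cache` is a hypothetical helper, not something in this repository.

```python
import os
from pathlib import Path


def summarise_cache(cache_dir):
    """Return (index file count, data file count, total bytes) under cache_dir.

    Only .nbi (index) and .nbc (data) files are counted; anything else,
    such as .pyc bytecode, is ignored.
    """
    n_index = n_data = total = 0
    for path in Path(cache_dir).rglob("*"):
        if not path.is_file():
            continue
        if path.suffix == ".nbi":
            n_index += 1
        elif path.suffix == ".nbc":
            n_data += 1
        else:
            continue
        total += path.stat().st_size
    return n_index, n_data, total


if __name__ == "__main__":
    # NUMBA_CACHE_DIR is the environment variable Numba uses to override
    # the cache location; the directory may not exist before a first run.
    cache_dir = os.environ.get("NUMBA_CACHE_DIR")
    if cache_dir and Path(cache_dir).is_dir():
        print(summarise_cache(cache_dir))
```

A non-zero count after a run would support the directory being respected; it would not, on its own, show that the cached kernels are actually reused.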
Is respected, or isn't? Might need to rerun the tests a few times - I have muddied the waters by merging in main. I do think that there is probably something which can be done - will take a closer look at the end of the week.
I think it is respected -- the caches are about 11MB. Another thought occurred: the cached kernel modification times are probably earlier than those of the checked-out python code -- this might trigger recompilation: https://numba.readthedocs.io/en/stable/developer/caching.html
Edit: Referenced the main article on caching, rather than the cuda article.
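The effect suspected here can be illustrated without Numba. The sketch below assumes, as the linked caching notes suggest, that a source file's modification time and size are recorded alongside the cached kernels; `source_stamp` and the file name `kernel.py` are hypothetical and only stand in for that bookkeeping.

```python
import os
import tempfile


def source_stamp(path):
    """Return (modification time, size in bytes) for a source file."""
    st = os.stat(path)
    return st.st_mtime, st.st_size


def stamp_changes_on_touch():
    """Show that a later mtime changes the stamp although the bytes are identical."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "kernel.py")  # hypothetical source file
        with open(src, "w") as f:
            f.write("def f(x):\n    return x + 1\n")
        before = source_stamp(src)
        # Simulate a fresh checkout: same content, later modification time.
        later = before[0] + 100
        os.utime(src, (later, later))
        after = source_stamp(src)
        return before, after


if __name__ == "__main__":
    before, after = stamp_changes_on_touch()
    print("size unchanged:", before[1] == after[1])
    print("mtime changed:", before[0] != after[0])
```

If cache validity depends on such a stamp, every CI checkout would look like a modified source file, which would be consistent with the unchanged run times reported above.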
Unfortunately, it looks like the timestamp is the only input to the cache key (at least as of Aug 22): https://numba.discourse.group/t/cache-behaviour/1520
So this approach doesn't seem viable.
Ah, unfortunate. Perhaps there will be progress upstream at some point.
… required. (#285) * Fix version drift. * Bump to 0.2.0 * Use nearest-neighbour interpolation for points requiring extrapolation.
* Fix version drift. * Bump to 0.2.0 * Inspect envvar for scheduler address when one isn't specified. * Encode environment variable as ascii. * Simplify.
* Fix version drift. * Bump to 0.2.0 * Initial commit of basic plotting functionality. * Change naming convention. * Improve transform argument. * Simplify transform selection. * Add rudimentary time and frequency selection. * Checkpoint plotter changes. Can now handle scans and spws, but is very slow. * More work on plotter - can now plot datasets in parallel. * Some tidying. * Slightly improve plot speed. Dominant cost is still saving the figures. * Commit some minor changes which speed up figure saving. * Lots of tiny fixes. * Tiny cosmetic changes. * Add custom tick formatter so that plots are the same size regardless. * Add matplotlib dependency. * Rework construction of plotting dictionary. Add a few utility functions which will likely be useful in other places in QC. * Rename variable to avoid confusion. * Fix bug affecting recursive grouping. * Avoid copies in grouping code. * Checkpoint work on extending functionality. * Make plotter more powerful. Add colourization option. Begin simplifying interface. * Allow user specification of colourmap. * Add plotsize parameter.
* Fix version drift. * Bump to 0.2.0 * Fix #293.
* Fix version drift. * Bump to 0.2.0 * Add optional label and single field selection to backup app * remove item instead of pop@index * do not .remove() from xds_list * Simplify using some existing functionality. --------- Co-authored-by: JSKenyon <[email protected]> Co-authored-by: landmanbester <[email protected]>
* Fix version drift. * Bump to 0.2.0 * Setting MAD threshold to zero will disable flagging on a given statistic.
* Fix version drift. * Bump to 0.2.0 * Disable flagging based on off-diagonal correlations in the mad flagger by default. This should make the mad flagger less aggressive on data with unmodelled polarised emission.
* Fix version drift. * Bump to 0.2.0 * Fix a bug affecting the use of non-standard columns in data column input.
* assign to ms to avoid over-writing metadata in restore app * zip datasets in enumerate * add comment to document failure case * use backup_column_name in restore app * Apply OCD. --------- Co-authored-by: landmanbester <[email protected]> Co-authored-by: JSKenyon <[email protected]>
* Fix version drift. * Bump to 0.2.0 * Make summary correctly report FIELD_ID and SOURCE_ID.
* Drop 3.8. Commit poetry lock file. * Update stimela requirement. * Update docs. * Set min and max versions in pyproject.toml. * Remove python3.8 from test matrix.
* Cache NUMBA kernels between CI runs (#279) * Cache NUMBA kernels between CI runs * Use actions/cache@v3 * Cache per python version * runner.tmp -> runner.temp * Debugging * Fix * Run entire test suite * timestamp needed otherwise cache hit occurs and cache not updated * Fix output * Add revert_me.txt * Use nearest-neighbour interpolation in regions where extrapolation is required. (#285) * Fix version drift. * Bump to 0.2.0 * Use nearest-neighbour interpolation for points requiring extrapolation. * Utilise environment variable when dask.address is unset. (#288) * Fix version drift. * Bump to 0.2.0 * Inspect envvar for scheduler address when one isn't specified. * Encode environment varraible as ascii. * Simplify. * Add plotting functionality (#290) * Fix version drift. * Bump to 0.2.0 * Initial commit of basic plotting functionality. * Change naming convention. * Improve transform argument. * Simplify transform selection. * Add rudimentary time and frequency selection. * Checkpoint ploter changes. Can now handle scans and spws, but is very slow. * More work on plotter - can now plot datasets in parallel. * Some tidying. * Slightly improve plot speed. Dominant cost is still saving the figures. * Commit some minor changes which speed up figure saving. * Lots of tiny fixes. * Tiny cosmetic changes. * Add custom tick formatter so that plots are the same size regardless. * Add matplotlib dependency. * Rework construction of plotting dictionary. Add a few utility functions which will likely be useful in other places in QC. * Rename variable to avoid confusion. * Fix bug affecting recursive grouping. * Avoid copies in grouping code. * Checkpoint work on extending functionality. * Make plotter more powerful. Add colourization option. Begin simplifying interface. * Allow user specification of colourmap. * Add plotsize parameter. * Fix #293 - OOB access caused by `output.subtract_directions` (#294) * Fix version drift. * Bump to 0.2.0 * Fix #293. 
* Namedbackups (#296) * Fix version drift. * Bump to 0.2.0 * Add optional label and single field selection to backup app * remove item instead of pop@index * do not .remove() from xds_list * Simplify using some existing functionality. --------- Co-authored-by: JSKenyon <[email protected]> Co-authored-by: landmanbester <[email protected]> * Selectively disable MAD flagging criteria (#298) * Fix version drift. * Bump to 0.2.0 * Setting MAD threshold to zero will disable flagging on a given statistic. * Disable mad flagging on off-diagonals by default (#300) * Fix version drift. * Bump to 0.2.0 * Disable flagging based on off-diagonal correlations in the mad flagger by default. This should make the mad flagger less agressive on data with unmodelled polarised emission. * Fix bug affecting non-standard columns in `input_ms.data_column` (#301) * Fix version drift. * Bump to 0.2.0 * Fix a bug afecting the use of non-standard columns in data column input. * Don't allow restore app to overwrite metadata (#307) * assign to ms to avoid over-writing metadata in restore app * zip datasets in enumerate * add comment to document failure case * use backup_column_name in restore app * Apply OCD. --------- Co-authored-by: landmanbester <[email protected]> Co-authored-by: JSKenyon <[email protected]> * Fix for summary reporting SOURCE_ID as FIELD_ID (#309) * Fix version drift. * Bump to 0.2.0 * Make summary correctly report FIELD_ID and SOURCE_ID. * Fix receptor summary (#310) * Fix version drift. * Bump to 0.2.0 * Fix incorrect assumption that FEED substable will always have 2 receptors. * Fix similar problem affecting parallactic angle construction. * Update missing column selection for compatibility with upsteam changes. * Fix xarray dims (#318) * Fix version drift. * Bump to 0.2.0 * Move all usage of xds.dims[dim] to xds.sizes[dim] in preparation for change of return type in xds.dims. * Fixes for changes relating to Numba error types. 
(#319) * Move now-deprecated graph metrics function into the scheduler plugin code. (#320) * Make small changes to enable 3.11 compatibilty. Requires changes in stimela + a release. (#321) * Restringify keys in scheduler plugin. (#322) * Attempt very dodgy solution to caching problem. * Look for code in the correct place. * Update pyproject.toml. Add poetry.lock. Update docs. (#323) * Drop 3.8. Commit poetry lock file. * Update stimela requirement. * Update docs. * Set min and max versions in pyproject.toml. * Remove python3.8 from test matrix. * Some debugging. * Fix unsaved file. * More debugging. * Temporarily make test suite much smaller. * Fix path. * Actually fix path. * Attempt at safer caching. * More fiddling with paths. * Fix bad tabbing. * Try to find out where things are failing. * More fiddling. * More fiddling. * More fiddling. * Try restore time action. * Tidy up caching approach. Use action. Restore matrix and test everything. * Remove tmp file. * Reword CI step name. --------- Co-authored-by: JSKenyon <[email protected]> Co-authored-by: Landman Bester <[email protected]> Co-authored-by: JSKenyon <[email protected]> Co-authored-by: landmanbester <[email protected]> * Bump dask-ms and codex-africanus dependencies. Update lock. --------- Co-authored-by: Simon Perkins <[email protected]> Co-authored-by: Landman Bester <[email protected]> Co-authored-by: landmanbester <[email protected]>
Closes #278