Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

independent km caches for distinct cases; clear the Aesara cache if it's large #1127

Merged
merged 5 commits into from
Jul 27, 2021

Conversation

1fish2
Copy link
Contributor

@1fish2 1fish2 commented Jul 22, 2021

  • For Parca for operons #1123 : Put a checksum into the KmcountsCached cache filename so different cases get independent cache files, e.g. for mono/polycistronic operons or switching git branches. This succeeded with Parca options in a parameter optimization run. It's limited by needing to compute the checksum before reading KmcountsCached but it's still backed up by the content check.

    It could be made more picky by checksumming more inputs or less picky by rounding Kmcounts.astype(np.float16).

    This renames the cache file from fixtures/endo_km/km3.cPickle to (e.g.) cache/parca-km-1918837868.cPickle.

    The cache files will accumulate until make clean.

    Q. Does anyone prefer the fixtures/endo_km/ directory name?

  • Fix Can Aesara cache be cleared automatically? #1120 : make clean will clear the current Aesara cache if it's larger than a threshold -- 30MB seems fair since the cache is 11MB locally and 350 MB in ~tahorst on Sherlock.

    Aesara's cache management is not straightforward, partly to handle multiple processes and NFS.

Put a checksum into the `KmcountsCached` cache filename so different cases get independent cache files, e.g. when switching git branches, Parca options during parameter optimization, or mono/polycistronic operons.

This renames the cache file from `fixtures/endo_km/km3.cPickle` to `parca-km-1918837868.cPickle`, for instance.

Q. Does anyone prefer the "fixtures" directory name?

The cache files `cache/parca-km-*.cPickle` will accumulate until `make clean`.

Does this succeed in distinguishing current cases?

We could make this more sensitive by checksumming more inputs or less picky by rounding `Kmcounts.astype(np.float16)`.

See #1123
This `make clean` will clear the current Aesara cache if it's larger than a threshold. 30MB seems fair the cache is 11MB locally and 350 MB in `~tahorst` on Sherlock.

Aesara's cache management is not straightforward, partly to handle multiple processes and NFS.
@1fish2 1fish2 requested a review from tahorst July 22, 2021 22:29
Copy link
Member

@tahorst tahorst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! This should be a very helpful option!

This should also fix #996 - kmcounts accounts for the first 3 args to the loss function calculation but to be completely sure we don't need to recalculate, maybe we should add isEndoRnase and alpha to the checksum calculation and explicitly include the other inputs to the loss function (totalEndoRnaseCapacity, rnaConc and degradationRates)?

1/2 of code review feedback: add isEndoRnase and alpha to the cache checksum calculation to account for all the loss function's inputs.
This should fix #996 by putting more cross-checks in the Km counts cache, but I don't have a test case for that.

Ideally we'd factor out this part of the very long `setKmCooperativeEndoRNonLinearRNAdecay()` function and add test cases for it.

Q. Should `arrays_differ()` set the absolute and/or relative tolerances?

Adding to the cache file grew it from 37K to 110K. That should be fine but
@1fish2
Copy link
Contributor Author

1fish2 commented Jul 27, 2021

@tahorst does this #996 fix look right?

I don't have a test case for that but at least this doesn't change the parca output.

@tahorst
Copy link
Member

tahorst commented Jul 27, 2021

Nice! I think that should work. I don't know exactly when the issue in #996 was caused but since we're now checking everything it should be good to go. Thanks for tacking on the extra work on this PR!

Only thing to note is a typo: degredation_rates_s -> degradation_rates_s in a few places

... which affects the cache file content so it's esp. good to fix before merging into master.
@1fish2 1fish2 merged commit eddb632 into master Jul 27, 2021
@1fish2 1fish2 deleted the km-caches branch July 27, 2021 05:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Can Aesara cache be cleared automatically?
2 participants