Parca for operons #1123
Comments
Thanks for all the discussions and for writing this up, Jerry! To propose my opinions on some of the questions here:
My hope is that eventually the default option would be (P), though in the early stages of this work the default would be set to (M) to minimize interference with other people's work. How quickly we can switch over to (P) would depend on how disruptive adding operon structures would be to how the model runs in general.
I'd say its uses are limited to a specific instance where we need to compare the outputs of the simulation with/without operons. I don't see this being used often once we finish up a publication on the operon integration and actually move over to default (P), except for specific debugging purposes.
Maybe. It's hard to anticipate what changes we would be bringing to the model in the future. If the necessary parameter changes for the variant require that we go all the way back to raw_data and rerun the parca, we would need such a variant to change those parameters.
Yes, that would probably be a better way to do this, though in this specific case
Lots of great discussion points here! Thanks for detailing everything, Jerry! I agree with Gwanggyu's responses. Overall, I think improving the variant approach (adding a parca variant level and composing simulation variants) would be useful as long as it doesn't unnecessarily complicate the workflow and runscripts. A few more points on those are in the responses below.
I would expect a framework for parca variants to be useful moving forward. We already have some parca options like variable elongation and capacity fitting that could be analyzed as variants. A parca variant framework could also be helpful for the work I added in #1108 if we consider different raw_data inputs and modified sim_data outputs as a variant in order to simplify the directory structure with each iteration and make it easier to apply simulation variants on top.
There would be a lot of utility in allowing variant composition in general. I think in particular, combining the condition variant with other variants would be useful right now.
I like this idea! Oftentimes I will be working on multiple branches that each have to rerun the nonlinear optimization when I switch between them. This could be avoided if they each had their own cache files.
Since more Kmcounts caches would help right away and it's an independent step for operons, I'll start there. Any objection to renaming it from [...]
So variant types & indexes that set different Parca options would help more generally, as would composing variants.
Put a checksum into the `KmcountsCached` cache filename so different cases get independent cache files, e.g. when switching git branches, Parca options during parameter optimization, or mono/polycistronic operons. This renames the cache file from `fixtures/endo_km/km3.cPickle` to `parca-km-1918837868.cPickle`, for instance. Q. Does anyone prefer the "fixtures" directory name? The cache files `cache/parca-km-*.cPickle` will accumulate until `make clean`. Does this succeed in distinguishing current cases? We could make this more sensitive by checksumming more inputs or less picky by rounding `Kmcounts.astype(np.float16)`. See #1123
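As a rough illustration of the checksum-in-filename idea, here is a minimal sketch. It assumes the inputs to the optimization are available as NumPy arrays; the helper names `array_checksum` and `cache_filename` are hypothetical and are not the project's actual code. It hashes each array's dtype, shape, and bytes with CRC-32, and optionally rounds first (e.g. to `np.float16`) to make the checksum less picky.

```python
import zlib

import numpy as np


def array_checksum(*arrays, round_to=None):
    """CRC-32 over the dtype, shape, and raw bytes of each array, in order.

    round_to: optional dtype (e.g. np.float16) applied before hashing so that
    tiny numeric differences do not change the checksum.
    """
    crc = 0
    for a in arrays:
        a = np.asarray(a)
        if round_to is not None:
            a = a.astype(round_to)
        # Include dtype and shape so reshaped or retyped data hashes differently.
        header = f"{a.dtype.str}{a.shape}".encode()
        crc = zlib.crc32(header, crc)
        crc = zlib.crc32(a.tobytes(), crc)
    return crc


def cache_filename(*arrays, round_to=None):
    """Hypothetical filename pattern following the `parca-km-*.cPickle` example."""
    return f"parca-km-{array_checksum(*arrays, round_to=round_to)}.cPickle"


# Two inputs with the same shape and the same sum(abs(...)) but different contents.
a = np.array([1.0, -2.0, 3.0])
b = np.array([3.0, 2.0, 1.0])
print(np.abs(a).sum() == np.abs(b).sum())
print(cache_filename(a), cache_filename(b))
```

Checksumming more inputs only requires passing more arrays to `cache_filename`. Rounding makes nearby values usually map to the same file, but values that straddle a rounding boundary can still land in different cache files.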
Design sketch after brainstorming with @ggsun:
- [...] `composed_variant_index * 2 + PM_index`. (The fw_queue, wcm, and manual runscripts require that you always supply a contiguous range of variant indexes. We could change that if needed.)
- [...] (`simData.cPickle`, `rawData.cPickle`, `validationData.cPickle`, save-intermediates, ...) into separate kb/ subdirectories.
- [...] `rawData.cPickle` and `validationData.cPickle` (or create symlinks?). Make the analysis code read those copies.
- [...] `apply_variant()` a little, maintains a single kb/ directory, allows sharing identical leaf nodes (ndarrays etc.) between the two sim_data trees, and could save some duplicate computation. But @ggsun points out that the two cases diverge pretty early in the Parca workflow because the operon structure affects the transcription probabilities and their modulation by transcription factors, which takes up the bulk of the Parca calculations. So this sounds like more development work and less runtime parallelism.
- [...] `make clean`.
- [...] `Kmcounts.shape` and `sum(abs(R_aux(KmcountsCached)))`. Would a CRC checksum be more selective than `sum(abs())`?
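The index formula `composed_variant_index * 2 + PM_index` from the design sketch can be written as a pair of encode/decode helpers. This is only a sketch: the function names are hypothetical, and the mapping of 0 to monocistronic (M) and 1 to polycistronic (P) is an assumption, not something the thread specifies.

```python
N_PM = 2  # number of operon modes; assumed 0 = monocistronic (M), 1 = polycistronic (P)


def to_variant_index(composed_variant_index, pm_index):
    """Pack a composed variant index and a P/M index into one variant index."""
    if composed_variant_index < 0:
        raise ValueError("composed_variant_index must be non-negative")
    if pm_index not in range(N_PM):
        raise ValueError(f"pm_index must be in range({N_PM})")
    return composed_variant_index * N_PM + pm_index


def from_variant_index(variant_index):
    """Inverse of to_variant_index: returns (composed_variant_index, pm_index)."""
    if variant_index < 0:
        raise ValueError("variant_index must be non-negative")
    return divmod(variant_index, N_PM)


# Enumerating every (composed, PM) pair yields a contiguous index range,
# which is what the runscripts expect.
indexes = [to_variant_index(c, pm) for c in range(3) for pm in range(N_PM)]
print(indexes)
print([from_variant_index(i) for i in indexes])
```

With this packing, running both operon modes for composed variants 0 through n-1 corresponds to the contiguous variant index range 0 through 2n-1.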