Required Steps for Depositing Profiles #4
Comments
@gwaygenomics Can you remind me about the input you need on this? I'll use cytotools/annotate as a reference to provide inputs.
Ah, that is a good reference, thanks for the pointer. I wasn't sure about the cytominer strategy of splitting core functionality from cyto-specific functionality, so I put cytominer progress on hold. The primary reason for putting it on hold was so that the lincs data could be processed with a more stable (and thus more reproducible) tool. However, it sounds like cytominer (and pycytominer) will take longer to stabilize than we can wait for the lincs profiles. A potential intermediate solution could be to freeze a pycytominer version using conda (after confirming floating point differences) for lincs-specific processing. What do you think?
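One way to "confirm floating point differences" between profiles produced by two tools could look like the sketch below. It is only an illustration: the function names, the list-of-rows layout, and the tolerances are assumptions, not anything taken from cytominer or pycytominer.

```python
import math


def max_abs_difference(profiles_a, profiles_b):
    """Return the largest absolute difference between two equally shaped tables.

    Each table is a list of rows and each row is a list of floats
    (a hypothetical layout chosen for this sketch).
    """
    if len(profiles_a) != len(profiles_b):
        raise ValueError("profile tables have different numbers of rows")
    largest = 0.0
    for row_a, row_b in zip(profiles_a, profiles_b):
        if len(row_a) != len(row_b):
            raise ValueError("rows have different numbers of features")
        for a, b in zip(row_a, row_b):
            largest = max(largest, abs(a - b))
    return largest


def profiles_match(profiles_a, profiles_b, rel_tol=1e-9, abs_tol=1e-12):
    """Check that every value agrees within the given tolerances (assumed values)."""
    max_abs_difference(profiles_a, profiles_b)  # reuse the shape checks
    return all(
        math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
        for row_a, row_b in zip(profiles_a, profiles_b)
        for a, b in zip(row_a, row_b)
    )
```

If the two outputs only differ by rounding noise, `profiles_match` returns `True` and `max_abs_difference` reports how large that noise is; anything larger suggests a real processing difference worth looking into before freezing a version.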
Going forward, we will very likely produce at least two different Level 4a profiles
We will then produce corresponding 4b (normalized, feature-selected) versions of the two 4a profiles, as well as corresponding 4w (normalized and whitened) versions. Which of these profiles is best for a given application is still an open research question; until it is settled, we just produce them all. @gwaygenomics Does that sound reasonable? This does complicate the analysis for cell-health because you now need to decide which of the two 4a profiles to use for predictions. For that case, I'd go with whole-plate because that makes it similar to the way you've processed the CRISPR data, IIRC.
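To make the whole-plate versus DMSO-specific choice concrete, here is a minimal sketch of robust (median/MAD) normalization for a single feature on a single plate. It is not pycytominer's implementation: the `normalize_plate` helper, the `treatment` key, the `1.4826` scaling constant, and the `epsilon` guard are all assumptions made for illustration.

```python
from statistics import median

# Assumption: scale the MAD so it is comparable to a standard deviation
# for normally distributed data.
MAD_SCALE = 1.4826


def robust_mad_params(values):
    """Return (median, scaled MAD) of a list of numbers."""
    center = median(values)
    mad = median([abs(v - center) for v in values])
    return center, mad * MAD_SCALE


def normalize_plate(wells, feature, reference="all", epsilon=1e-18):
    """Normalize one feature across the wells of one plate.

    wells: list of dicts with a 'treatment' key and a numeric feature value
           (hypothetical layout for this sketch).
    reference: "all" for whole-plate normalization, or a treatment name
               such as "DMSO" to estimate the parameters from those wells only.
    """
    if reference == "all":
        ref_values = [w[feature] for w in wells]
    else:
        ref_values = [w[feature] for w in wells if w["treatment"] == reference]
    if not ref_values:
        raise ValueError(f"no reference wells found for {reference!r}")
    center, scale = robust_mad_params(ref_values)
    # epsilon avoids division by zero when the reference wells are constant
    return [(w[feature] - center) / (scale + epsilon) for w in wells]
```

The only difference between the two 4a variants in this sketch is which wells feed `robust_mad_params`; every well on the plate is transformed either way.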
That sounds good to me, and will very likely be the strategy we will use for all data processing using pycytominer, right?
@shntnu and I chatted about this offline. I will summarize our decisions below:
Also, here are answers to the specific questions:
I normalize profiles by
Similar, but not exactly the same. Eventually pycytominer will be traditionally versioned on PyPI and conda. Currently, pycytominer is versioned by GitHub hash (see here). It is also worth noting that we can always reprocess the profiles again. This is the beauty of versioned data!
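As a rough illustration of what versioning by GitHub hash can look like in practice, the sketch below builds a pip-style direct reference pinned to a commit. The `pinned_requirement` helper is hypothetical, and the commit shown in the usage note is a made-up placeholder, not the hash actually used for these profiles.

```python
import re

# Repository location of pycytominer on GitHub.
PYCYTOMINER_REPO = "https://github.com/cytomining/pycytominer"


def pinned_requirement(package, repo_url, commit):
    """Build a requirement string that pins a package to one git commit.

    Hypothetical helper for this sketch; it only formats a string and
    does not install anything.
    """
    if not re.fullmatch(r"[0-9a-f]{7,40}", commit):
        raise ValueError("commit must be a 7 to 40 character hex hash")
    return f"{package} @ git+{repo_url}@{commit}"
```

For example, `pinned_requirement("pycytominer", PYCYTOMINER_REPO, "0123456789abcdef0123456789abcdef01234567")` yields a line that could go into a requirements file or the pip section of a conda environment file, so the processing environment records exactly which commit produced a given set of profiles.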
@shntnu I have a couple of follow-up questions now that I've started adding the processing code in #21 (cc @niranjchandrasekaran). Question 1 - Should we use z-score normalization or
Yes. Rationale: mostly empirical –
Yes, definitely ok.
Your plan sounds good. There's an incompatibility that I need to address in the handbook: cytomining/profiling-handbook#53. Ugh. So glad we are thinking through provenance and reproducibility via this project!
Closing this issue in favor of project management in https://github.com/broadinstitute/lincs-cell-painting/projects/1
I am working towards processing all Drug Repurposing data and adding the results in this repository. The cell health project (https://github.com/broadinstitute/cell-health) now requires that the data are uniformly processed, documented, and made available here.
Below, I outline the steps required to get the data and processing pipelines uploaded.
robustize_mad normalization strategy, which will also require a decision on whole-plate or DMSO-specific normalization.
4.apply module in cell-health
4.apply module
lincs-cell-painting profile repository a submodule of the cell-health project