Create consensus spherized profiles #72
Comments
+1 on this. Mattias was very confused about this as well! |
Actually, I think they should go here: |
So I assume this means that the files in |
Very glad to have you both digging into this repo to uncover what is clear and what is not.
Correct, the profiles here are not spherized. We generate consensus signatures from the traditional level 4a normalized profiles. From
file_bases = {
    "whole_plate": {
        "input_file_suffix": "_normalized.csv.gz",
        "output_file_suffix": ".csv.gz",
    },
    "dmso": {
        "input_file_suffix": "_normalized_dmso.csv.gz",
        "output_file_suffix": "_dmso.csv.gz",
    },
}
We use these suffixes to load specific data levels.
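To make the suffix lookup concrete, here is a minimal sketch of how such a dictionary could be used to build the file name for one plate. The helper `input_path_for`, the flat directory layout, and the plate name `PLATE_A` are assumptions for illustration only, not the repository's actual code.

```python
from pathlib import Path

# Suffix table as quoted in the thread.
file_bases = {
    "whole_plate": {
        "input_file_suffix": "_normalized.csv.gz",
        "output_file_suffix": ".csv.gz",
    },
    "dmso": {
        "input_file_suffix": "_normalized_dmso.csv.gz",
        "output_file_suffix": "_dmso.csv.gz",
    },
}


def input_path_for(profile_dir, plate, normalization):
    """Hypothetical helper: pick the input suffix for a normalization
    scheme and append it to the plate name."""
    suffix = file_bases[normalization]["input_file_suffix"]
    return Path(profile_dir) / f"{plate}{suffix}"


# "profiles" and "PLATE_A" are placeholder names.
for key in file_bases:
    print(key, input_path_for("profiles", "PLATE_A", key))
```

Swapping the dictionary key is then the only change needed to switch between whole-plate and DMSO-normalized inputs.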
They typically don't mean the same thing, but I am not sure what context you're referring to. In that context, it's possible we weren't entirely accurate! (plate normalization could be something like normalizing profiles only to DMSO controls per plate for a goal of aligning profiles across plates, while batch normalization might normalize multiple plates together across multiple batches for a goal of aligning profiles across batches) |
"Correct, the profiles here are not spherized. We generate consensus signatures from the traditional level 4a normalized profiles."
About the batches: I agree, they don't necessarily mean the same thing but then that means that there are two possible types of normalization and it is not clear which one is applied (again, talking from the level of someone going through the repository description without reading the actual pycytominer source code). Since there are always differences between plates it's the first thing I think about when reading about normalization. |
gotcha. Thanks! Spherizing is in fact just one normalization method, but it happens at a different level. Level 4a data (MAD robustize normalization) comes from per-plate profiles. Spherized data come from all level 4a profiles. @FloHu - can you see if our discussion in #73 improves clarity on this specific point? And if not, can you describe it in the issue so that we can make all changes at once? Let's stay on track with this issue specifically being about creating consensus spherized profiles (which I agree is tightly related to #73 and can probably be fixed in the same PR!) |
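As a rough illustration of what "per-plate" means for the level 4a step, the sketch below applies a simplified MAD robustization within each plate using plain pandas. It is not pycytominer's implementation: the function name, the column names `Metadata_Plate` and `feat`, the 1.4826 scaling constant, and the `epsilon` guard are all assumptions made for the example.

```python
import pandas as pd


def mad_robustize_per_plate(df, plate_col, feature_cols, epsilon=1e-18):
    """Simplified sketch: center each feature on its per-plate median and
    scale by the per-plate median absolute deviation (MAD)."""

    def _robustize(x):
        med = x.median()
        mad = (x - med).abs().median()
        # 1.4826 makes the MAD comparable to a standard deviation
        # for normally distributed data; epsilon avoids division by zero.
        return (x - med) / (mad * 1.4826 + epsilon)

    out = df.copy()
    # transform() applies _robustize to each feature column within each plate.
    out[feature_cols] = df.groupby(plate_col)[feature_cols].transform(_robustize)
    return out


# Toy data: two plates on very different scales (values are made up).
toy = pd.DataFrame(
    {
        "Metadata_Plate": ["A", "A", "A", "B", "B", "B"],
        "feat": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
    }
)
normalized = mad_robustize_per_plate(toy, "Metadata_Plate", ["feat"])
print(normalized)
```

Because statistics are computed plate by plate, both toy plates end up on the same scale. A spherizing step, by contrast, would be fit on all of these normalized profiles together.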
@michaelbornholdt @FloHu or @shntnu - is anyone working on this currently or partially in the past? I might need this for an analysis in https://github.com/broadinstitute/lincs-profiling-comparison |
I haven't worked on it |
Completed in #76 |
I also haven't worked on this. |
Given that we create a single CSV file for spherized profiles in this notebook, it will be easiest to compute consensus in the same notebook.
The output should be stored at
lincs-cell-painting/spherized_profiles/consensus
and be named
2016_04_01_a549_48hr_batch1_dmso_spherized_profiles_with_input_normalized_by_dmso_consensus_median.csv.gz
2016_04_01_a549_48hr_batch1_dmso_spherized_profiles_with_input_normalized_by_whole_plate_consensus_median.csv.gz
2016_04_01_a549_48hr_batch1_dmso_spherized_profiles_with_input_normalized_by_dmso_consensus_modz.csv.gz
2016_04_01_a549_48hr_batch1_dmso_spherized_profiles_with_input_normalized_by_whole_plate_consensus_modz.csv.gz
i.e. median and modz consensus for each of the two Batch 1 files in this directory.
And the same for Batch 2 (2017_12_05_Batch2).
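A minimal sketch of the request, assuming plain pandas rather than the repository's own tooling: the first helper generates the four Batch 1 output names following the pattern above, and the second computes a median consensus by grouping replicates. The function names and the toy column names (`Metadata_broad_sample`, `feat`) are hypothetical, and the modz operation is not implemented here.

```python
import pandas as pd


def consensus_file_name(batch, normalization, operation):
    """Build an output name following the pattern listed in the issue."""
    return (
        f"{batch}_dmso_spherized_profiles_with_input_normalized_by_"
        f"{normalization}_consensus_{operation}.csv.gz"
    )


def median_consensus(profiles, replicate_cols, feature_cols):
    """Collapse replicate profiles to one row per group using the median."""
    return profiles.groupby(replicate_cols, as_index=False)[feature_cols].median()


# The four Batch 1 names requested in the issue.
batch1_names = [
    consensus_file_name("2016_04_01_a549_48hr_batch1", normalization, operation)
    for operation in ("median", "modz")
    for normalization in ("dmso", "whole_plate")
]

# Toy replicate profiles (values are made up).
toy_profiles = pd.DataFrame(
    {
        "Metadata_broad_sample": ["cpd_1", "cpd_1", "cpd_2"],
        "feat": [1.0, 3.0, 5.0],
    }
)
toy_consensus = median_consensus(toy_profiles, ["Metadata_broad_sample"], ["feat"])
print(toy_consensus)
```

Keeping the naming in one helper means Batch 2 only needs a different `batch` argument once its file prefix is decided.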