Create consensus spherized profiles #72

Closed
shntnu opened this issue Jul 7, 2021 · 11 comments · Fixed by #76

Comments

@shntnu
Collaborator

shntnu commented Jul 7, 2021

Given that we create a single CSV file for spherized profiles in this notebook, it will be easiest to compute the consensus in the same notebook.

The output should be stored at lincs-cell-painting/spherized_profiles/consensus and be named

  • 2016_04_01_a549_48hr_batch1_dmso_spherized_profiles_with_input_normalized_by_dmso_consensus_median.csv.gz
  • 2016_04_01_a549_48hr_batch1_dmso_spherized_profiles_with_input_normalized_by_whole_plate_consensus_median.csv.gz
  • 2016_04_01_a549_48hr_batch1_dmso_spherized_profiles_with_input_normalized_by_dmso_consensus_modz.csv.gz
  • 2016_04_01_a549_48hr_batch1_dmso_spherized_profiles_with_input_normalized_by_whole_plate_consensus_modz.csv.gz

i.e. median and modz consensus for each of the two Batch 1 files in this directory.

And same for Batch 2 (2017_12_05_Batch2)
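As a rough sketch of the naming scheme above (not code from the repository), the four names per batch can be generated from the two input normalizations and the two consensus operations. The function name `consensus_file_names` is hypothetical, and reusing the same template for Batch 2 is an assumption.

```python
from itertools import product

# The two input variants and two consensus operations named in this issue.
NORMALIZATIONS = ("dmso", "whole_plate")
OPERATIONS = ("median", "modz")


def consensus_file_names(batch_prefix):
    """Return the four consensus file names for one batch prefix (hypothetical helper)."""
    template = (
        "{prefix}_dmso_spherized_profiles_with_input_normalized_by_"
        "{norm}_consensus_{op}.csv.gz"
    )
    # Iterate operations first so the order matches the list in this issue.
    return [
        template.format(prefix=batch_prefix, norm=norm, op=op)
        for op, norm in product(OPERATIONS, NORMALIZATIONS)
    ]


batch1_names = consensus_file_names("2016_04_01_a549_48hr_batch1")
for name in batch1_names:
    print(name)
```

Calling the helper with a Batch 2 prefix would give the corresponding four names, assuming that batch follows the same pattern.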

@michaelbornholdt
Contributor

+1 on this. Mattias was very confused about this as well!

@michaelbornholdt
Contributor

Actually, I think they should go here:
lincs-cell-painting/consensus

@FloHu

FloHu commented Jul 15, 2021

So I assume this means that the files in lincs-cell-painting/consensus are not spherized. Then which normalization strategy was applied there? This is not clear from the respective notebook (consensus/build-consensus-signatures.ipynb). Also, do "plate normalization" and "batch normalization" refer to the same procedure (as I would think)?

@gwaybio
Member

gwaybio commented Jul 15, 2021

Very glad to have you both digging into this repo to uncover what is clear and what is not.

"So I assume this means that the files in lincs-cell-painting/consensus are not spherized. Then which normalization strategy was applied there?"

Correct, the profiles here are not spherized. We generate consensus signatures from the traditional level 4a normalized profiles.
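For readers new to the term, a median consensus signature collapses all replicate profiles of a perturbation into one profile by taking the per-feature median. The following is a minimal pure-Python sketch of that idea, not the notebook's implementation; the function name, the grouping column `Metadata_broad_sample`, and the toy values are illustrative assumptions.

```python
from collections import defaultdict
from statistics import median


def median_consensus(profiles, group_key, feature_names):
    """Collapse replicate profiles into one median profile per group."""
    grouped = defaultdict(list)
    for row in profiles:
        grouped[row[group_key]].append(row)

    consensus = []
    for group, rows in sorted(grouped.items()):
        signature = {group_key: group}
        for feature in feature_names:
            # Per-feature median across all replicates of this group.
            signature[feature] = median(r[feature] for r in rows)
        consensus.append(signature)
    return consensus


# Made-up replicate profiles; the column names are assumptions for illustration.
replicates = [
    {"Metadata_broad_sample": "cpd_A", "feature_1": 1.0, "feature_2": -0.5},
    {"Metadata_broad_sample": "cpd_A", "feature_1": 3.0, "feature_2": 0.5},
    {"Metadata_broad_sample": "cpd_A", "feature_1": 2.0, "feature_2": 1.5},
    {"Metadata_broad_sample": "cpd_B", "feature_1": 0.0, "feature_2": 4.0},
]

signatures = median_consensus(
    replicates, "Metadata_broad_sample", ["feature_1", "feature_2"]
)
```

The same aggregation applies whether the input rows are level 4a profiles or spherized profiles; only the input file changes.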

From build-consensus-signatures.ipynb, cell 5:

file_bases = {
    "whole_plate": {
        "input_file_suffix": "_normalized.csv.gz",
        "output_file_suffix": ".csv.gz",
    },
    "dmso": {
        "input_file_suffix": "_normalized_dmso.csv.gz",
        "output_file_suffix": "_dmso.csv.gz",
    },
}

We use these suffixes to load specific data levels.
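A minimal sketch of how such suffixes might select files, assuming one profile file per plate: the helper names, the `profiles` directory, the plate name `PLATE_1`, and the output base name are all hypothetical and not taken from the notebook.

```python
from pathlib import Path

file_bases = {
    "whole_plate": {
        "input_file_suffix": "_normalized.csv.gz",
        "output_file_suffix": ".csv.gz",
    },
    "dmso": {
        "input_file_suffix": "_normalized_dmso.csv.gz",
        "output_file_suffix": "_dmso.csv.gz",
    },
}


def input_profile_path(profile_dir, plate, normalization):
    """Build the path of one plate's normalized profile file (hypothetical layout)."""
    suffix = file_bases[normalization]["input_file_suffix"]
    return Path(profile_dir) / f"{plate}{suffix}"


def output_consensus_name(base_name, normalization):
    """Append the output suffix for the chosen normalization to a base name."""
    return f"{base_name}{file_bases[normalization]['output_file_suffix']}"


dmso_input = input_profile_path("profiles", "PLATE_1", "dmso")
whole_plate_input = input_profile_path("profiles", "PLATE_1", "whole_plate")
dmso_output = output_consensus_name("consensus_median", "dmso")
```

Keeping both suffixes in one dictionary means the same loop can process the whole-plate and DMSO-normalized variants without duplicating path logic.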

"Also, do 'plate normalization' and 'batch normalization' refer to the same procedure (as I would think)?"

They typically don't mean the same thing, but I am not sure what context you're referring to. In that context, it's possible we weren't entirely accurate!

(plate normalization could be something like normalizing profiles only to DMSO controls per plate for a goal of aligning profiles across plates, while batch normalization might normalize multiple plates together across multiple batches for a goal of aligning profiles across batches)
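To make the distinction concrete, here is a toy sketch under stated assumptions. It uses a simplified robust z-score (median and MAD, without the usual consistency constant) and made-up values for a single feature. It only illustrates how the choice of reference set changes the result and is not the pycytominer implementation.

```python
from statistics import median


def robust_z(values, reference):
    """Center and scale values by the median and MAD of a reference set."""
    center = median(reference)
    mad = median(abs(v - center) for v in reference)
    if mad == 0:
        raise ValueError("reference set has zero MAD; cannot scale")
    return [(v - center) / mad for v in values]


# Toy data: plate 2 carries a constant offset of +10 relative to plate 1.
plates = {
    "plate_1": {"dmso": [1.0, 2.0, 3.0], "treated": [5.0]},
    "plate_2": {"dmso": [11.0, 12.0, 13.0], "treated": [15.0]},
}

# Per-plate normalization: each plate is scaled against its own DMSO wells.
per_plate = {
    name: robust_z(wells["treated"], wells["dmso"])
    for name, wells in plates.items()
}

# Pooled normalization: one reference built from every well on every plate.
pooled_reference = [
    v for wells in plates.values() for v in wells["dmso"] + wells["treated"]
]
pooled = {
    name: robust_z(wells["treated"], pooled_reference)
    for name, wells in plates.items()
}
```

In this toy case the per-plate version gives the treated well the same score on both plates, while the pooled version leaves the plate offset in place, which is why the two terms are worth keeping apart.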

@FloHu

FloHu commented Jul 15, 2021

"Correct, the profiles here are not spherized. We generate consensus signatures from the traditional level 4a normalized profiles."

  • Then my confusion stems from the fact that I thought spherizing is one normalization method, just as the Cytominer figure suggests.

About the batches: I agree that they don't necessarily mean the same thing, but that means there are two possible types of normalization, and it is not clear which one is applied (again, speaking as someone going through the repository description without reading the actual pycytominer source code). Since there are always differences between plates, plate normalization is the first thing I think of when reading about normalization.

@gwaybio
Member

gwaybio commented Jul 15, 2021

gotcha. Thanks!

Spherizing is in fact just one normalization method, but it happens at a different level. Level 4a data (mad robustize normalization) comes from per-plate profiles. Spherized data come from all level 4a profiles.

@FloHu - can you see if our discussion in #73 improves clarity on this specific point? And if not, can you describe it in the issue so that we can make all changes at once.

Let's stay on track with this issue specifically being about creating consensus spherized profiles (which I agree is tightly related to #73 and can probably be fixed in the same PR!)

@gwaybio
Member

gwaybio commented Aug 11, 2021

@michaelbornholdt @FloHu or @shntnu - is anyone working on this currently, or has anyone partially worked on it in the past? I might need this for an analysis in https://github.com/broadinstitute/lincs-profiling-comparison

@shntnu
Collaborator Author

shntnu commented Aug 11, 2021

I haven't worked on it

@gwaybio
Member

gwaybio commented Aug 11, 2021

Completed in #76

@michaelbornholdt
Contributor

I also haven't worked on this.
Can't say I know where they are now. I assume they are 'hidden' with lfs in the consensus folder?

@gwaybio
Member

gwaybio commented Aug 11, 2021

here you go: https://github.com/broadinstitute/lincs-cell-painting/tree/e9737c3e4e4443eb03c2c278a145f12efe255756/spherized_profiles/consensus
