I've noticed that downloading the raw data and running the provided R scripts to generate the RDS files produces results that differ from the files on the website. Specifically, the `logcounts` don't match up everywhere.
I discovered this while reading the scmap paper and going through the data sets. The problem only occurs for the data sets with CPM normalization (see Supplementary Table 2 of the scmap paper), namely: Goolam, Li, Kolodziejczyk, Baron, Segerstolpe, Klein, Zeisel, Shekhar and Macosko.
I've gone through how the `logcounts` are computed in `create_sce.R`, but what the script produces doesn't match the published values. For example, take the Li data set.
Hi Pavlin, thanks for your question. All of the datasets were created from the scripts provided in the repository. In fact, this was done automatically; I wasn't involved. The only explanation I can see for the difference is a newer version of R/Bioconductor/scater/limma in which the `calculateCPM` function has been updated. I think `calculateCPM` is from the limma package. The datasets on our website were last generated on February 27, 2018, which was quite a long time ago, and many things have been updated since then. So my recommendation would be to use the scripts with the newest versions of all of the packages. You will get slightly different numbers, but I am sure the final results won't change dramatically. Hope this helps.
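To test the package-version explanation, it helps to record which versions are installed locally before regenerating the files. The following is a minimal sketch, not part of the repository scripts; the package list is an assumption based on the packages named in this thread, and `report_versions` is a hypothetical helper.

```r
# Hypothetical helper: report the installed version of each package,
# or NA if the package is not installed.
report_versions <- function(pkgs) {
  vapply(pkgs, function(p) {
    if (requireNamespace(p, quietly = TRUE)) {
      as.character(utils::packageVersion(p))
    } else {
      NA_character_
    }
  }, character(1))
}

# Assumed package list, taken from the packages mentioned in the discussion.
versions <- report_versions(c("scater", "limma", "SingleCellExperiment"))
print(R.version.string)
print(versions)
```

Saving this output next to each generated RDS file would make it possible to tell later whether two files were built with the same software.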
I am experiencing the same issue. I don't think the cause is the `calculateCPM` function (which comes from scater). It is a pretty straightforward transformation, equivalent to `t(t(counts)*1e6/colSums(counts))`, and I can't find any record of it changing in the scater documentation.
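The transformation above can be checked by hand in base R, without scater. This is only a sketch on a made-up count matrix: the `log2(x + 1)` step is an assumption about how the log values are derived from CPM, and `compare_logcounts` is a hypothetical helper for measuring how far two matrices disagree.

```r
# Toy count matrix (made-up values): genes in rows, cells in columns.
counts <- matrix(
  c(10,  0,  5,
     0, 20,  5,
    90, 80, 40),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:3))
)

# CPM as written in the comment above: scale every cell (column) to sum to 1e6.
cpm <- t(t(counts) * 1e6 / colSums(counts))

# Assumption: log values are log2 of CPM with a pseudocount of 1.
logcounts_manual <- log2(cpm + 1)

# Hypothetical helper: largest absolute difference between two matrices.
compare_logcounts <- function(a, b, tol = 1e-8) {
  stopifnot(identical(dim(a), dim(b)))
  max_diff <- max(abs(a - b))
  list(max_abs_diff = max_diff, equal = max_diff <= tol)
}

# The same normalization written with sweep(), used here as a cross-check.
cpm_sweep <- sweep(counts, 2, colSums(counts), "/") * 1e6
result <- compare_logcounts(logcounts_manual, log2(cpm_sweep + 1))
print(result)
```

The same helper could be pointed at the matrix produced by the scripts and the matrix stored in a downloaded RDS file, which would show whether the two differ by rounding noise or by something larger.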
> All of the datasets were created from the scripts provided in the repository. In fact, this was done automatically, I wasn't involved.

I'm unable to run your scripts without the changes in ForrestCKoch@09074c1. Are these the same scripts you used?
For the Li data set, the `logcounts` produced by `create_sce.R` differ from those obtained by loading `li.rds`, available at https://hemberg-lab.github.io/scRNA.seq.datasets/human/tissues/. How were these `logcounts` computed?