error on selectFeatures #31

kehuangke · 2021-08-11T06:35:10Z

Hello,

I use scmap to annotate cell types based on a reference annotation dataset downloaded from celldex. However, I get an error when I choose HumanPrimaryCellAtlasData. selectFeatures runs properly when I choose DatabaseImmuneCellExpressionData. Unfortunately, I want to use HumanPrimaryCellAtlasData for future analysis.

Here is the code that does not work (HumanPrimaryCellAtlasData):

library(SingleCellExperiment)  # attaches colData(), rowData(), assays()

ref <- celldex::HumanPrimaryCellAtlasData()
colData(ref)$cell_type1 <- colData(ref)$label.fine
rowData(ref)$feature_symbol <- rownames(ref)
ref_sce <- SingleCellExperiment::SingleCellExperiment(
    assays = list(logcounts = Matrix::Matrix(assays(ref)$logcounts)),
    colData = colData(ref), rowData = rowData(ref))
ref_sce <- scmap::selectFeatures(ref_sce, suppress_plot = TRUE)

The error is

Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
0 (non-NA) cases
In addition: Warning message:
In linearModel(object, n_features) :
Your object does not contain counts() slot. Dropouts were calculated using logcounts() slot...

The same code runs correctly if I use DatabaseImmuneCellExpressionData:

ref <- celldex::DatabaseImmuneCellExpressionData()
colData(ref)$cell_type1 <- colData(ref)$label.fine
rowData(ref)$feature_symbol <- rownames(ref)
ref_sce <- SingleCellExperiment::SingleCellExperiment(assays=list(logcounts=Matrix::Matrix(assays(ref)$logcounts)), colData=colData(ref), rowData=rowData(ref))
ref_sce <- scmap::selectFeatures(ref_sce,suppress_plot=TRUE)

I also tried storing the values as counts instead of logcounts and back-transforming them with a power of 10, but that did not work.

I suspect the error is caused by the logcounts values.

[Screenshot: logcounts values for HumanPrimaryCellAtlasData, which reports the error]

[Screenshot: logcounts values for DatabaseImmuneCellExpressionData, which runs properly]

Could you please help me solve this problem?

Thanks

rersister · 2021-08-24T05:29:14Z

I ran into the same problem and got the same error, but I don't know how to solve it. Could you help us solve this problem?

FionaMoon · 2022-03-25T06:36:12Z

I converted logcounts into counts, but the error persists:

a <- assay(sce, "logcounts")
b <- expm1(a)
assays(sce)$counts <- b
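
A possible reason this conversion does not help, assuming the reference contains no exact zeros in logcounts (as suggested later in this thread): expm1() maps only 0 to 0, so strictly positive log values stay strictly positive after back-transformation, and the zero-based dropout rate remains 0% for every gene. A minimal base R sketch with made-up values:

```r
# Toy log-expression matrix (made-up values) with no exact zeros,
# 3 genes x 4 samples. This is an illustration, not celldex data.
log_count <- matrix(c(0.5, 1.2, 3.0, 0.1,
                      2.2, 2.9, 4.1, 0.7,
                      1.0, 0.3, 0.9, 5.5),
                    nrow = 3, byrow = TRUE)

# Back-transform the same way as in the snippet above.
count <- expm1(log_count)

# Dropout rate (% of exact-zero entries per gene) before and after.
dropouts_log   <- rowSums(log_count == 0) / ncol(log_count) * 100
dropouts_count <- rowSums(count == 0) / ncol(count) * 100
```

Both vectors are all zeros here, so adding a counts assay this way leaves the dropout-based filter with nothing to work on.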

pcantalupo · 2022-07-28T19:34:29Z

I'm getting the same error when trying to use celldex's MouseRNAseq reference with scmap. Is there any update on this issue?

Cristinex · 2022-11-13T04:26:15Z

Same error. Any update?

jordan841220 · 2022-11-17T09:47:39Z

@kehuangke @Cristinex @pcantalupo
After looking into the code of selectFeatures(), I think the linearModel() function might be the cause.
In linearModel(), the dropout rate of a feature is defined as the percentage of cells whose log_count for that feature is exactly 0.
However, this does not work if the reference contains no 0 log_count values at all, because the dropout rate is then 0 for every feature.

linearModel <- function(object, n_features) {
    log_count <- as.matrix(logcounts(object))
    cols <- ncol(log_count)
    if (!"counts" %in% assayNames(object)) {
        warning("Your object does not contain counts() slot. Dropouts were calculated using logcounts() slot...")
        dropouts <- rowSums(log_count == 0)/cols * 100    
    } else {
        count <- as.matrix(counts(object))
        dropouts <- rowSums(count == 0)/cols * 100
    }
    # do not consider genes with 0 and 100 dropout rate
    dropouts_filter <- dropouts != 0 & dropouts != 100         
    dropouts_filter <- which(dropouts_filter)
    dropouts <- log2(dropouts[dropouts_filter])
    expression <- rowSums(log_count[dropouts_filter, ])/cols
    
    fit <- lm(dropouts ~ expression)

If the dropout rates are all 0%, fit <- lm(dropouts ~ expression) in linearModel() cannot work, because genes with a 0% dropout rate are filtered out beforehand. No features are left, so lm() receives empty vectors.
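
The failure can be reproduced outside scmap with base R alone. The sketch below mirrors the steps of the excerpt above on a made-up matrix without exact zeros (the variable names follow the excerpt; the data and the tryCatch wrapper are illustrative assumptions). It should fail in lm() with the same "0 (non-NA) cases" message reported at the top of this thread.

```r
# Toy reference (made-up values): 5 genes x 6 samples, no exact zeros.
set.seed(1)
log_count <- matrix(runif(30, min = 0.5, max = 10), nrow = 5)
cols <- ncol(log_count)

# Same dropout definition as in linearModel(): % of exact zeros per gene.
dropouts <- rowSums(log_count == 0) / cols * 100

# Genes with 0% or 100% dropout are discarded; here that is every gene.
dropouts_filter <- which(dropouts != 0 & dropouts != 100)

dropouts_kept <- log2(dropouts[dropouts_filter])
expression <- rowSums(log_count[dropouts_filter, , drop = FALSE]) / cols

# lm() on empty vectors raises an error; capture its message instead of stopping.
fit_error <- tryCatch(
    {
        lm(dropouts_kept ~ expression)
        NA_character_
    },
    error = function(e) conditionMessage(e)
)
```

Checking length(dropouts_filter) before calling selectFeatures() on a new reference is a cheap way to see whether the same problem applies.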

In conclusion, copying the function from the source code and changing this line
dropouts <- rowSums(log_count == 0)/cols * 100
to something like
dropouts <- rowSums(log_count <= 3)/cols * 100
should work. (The value that defines the dropout cutoff depends on your reference.)

Alternatively, one can modify this line so that a suitable number of features passes the filter:
dropouts_filter <- dropouts != 0 & dropouts != 100

I hope the authors can explain more about this issue.
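
To pick a cutoff for a given reference, it may help to count how many genes would pass the filter at each candidate value. The sketch below is an illustration under assumptions: dropout_rates() and usable() are hypothetical helpers, not scmap functions, and the matrix holds made-up values. It uses <= so that a cutoff of 0 matches the exact-zero definition for non-negative data.

```r
# Hypothetical helper (not part of scmap): dropout rate per gene, where a
# value at or below `cutoff` is treated as "not expressed".
dropout_rates <- function(log_count, cutoff = 0) {
    rowSums(log_count <= cutoff) / ncol(log_count) * 100
}

# Toy reference (made-up values): 4 genes x 5 samples, no exact zeros.
log_count <- matrix(c(0.5, 4.0, 6.1, 2.0, 7.3,
                      1.1, 1.9, 2.5, 0.8, 2.2,
                      5.0, 6.2, 7.7, 8.1, 9.0,
                      3.5, 0.4, 4.4, 2.9, 6.0),
                    nrow = 4, byrow = TRUE)

# Hypothetical helper: number of genes that survive the
# "not 0% and not 100% dropout" filter for a given cutoff.
usable <- function(cutoff) {
    d <- dropout_rates(log_count, cutoff)
    sum(d != 0 & d != 100)
}

usable_at_0 <- usable(0)  # exact-zero definition: no gene survives
usable_at_3 <- usable(3)  # cutoff of 3: two of the four toy genes survive
```

On real data the same loop over a few cutoffs shows whether enough genes remain for the linear fit; the right value depends on the reference.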

rersister mentioned this issue Aug 24, 2021

error on selectFeatures BaderLab/CellAnnotationTutorial#2

Closed
