error on selectFeatures #31

Open
kehuangke opened this issue Aug 11, 2021 · 5 comments

Comments

@kehuangke

kehuangke commented Aug 11, 2021

Hello,

I am using scmap to annotate cell types against a reference dataset downloaded from celldex. selectFeatures() fails with an error when the reference is HumanPrimaryCellAtlasData, but it runs properly with DatabaseImmuneCellExpressionData. Unfortunately, HumanPrimaryCellAtlasData is the reference I need for my downstream analysis.

This is the code that fails (HumanPrimaryCellAtlasData):

# Attaches SummarizedExperiment too, which provides colData(), rowData(), assays()
library(SingleCellExperiment)

ref <- celldex::HumanPrimaryCellAtlasData()
colData(ref)$cell_type1 <- colData(ref)$label.fine
rowData(ref)$feature_symbol <- rownames(ref)
ref_sce <- SingleCellExperiment::SingleCellExperiment(
    assays = list(logcounts = Matrix::Matrix(assays(ref)$logcounts)),
    colData = colData(ref), rowData = rowData(ref))
ref_sce <- scmap::selectFeatures(ref_sce, suppress_plot = TRUE)

The error is

Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
0 (non-NA) cases
In addition: Warning message:
In linearModel(object, n_features) :
Your object does not contain counts() slot. Dropouts were calculated using logcounts() slot...

The same code runs correctly if I use DatabaseImmuneCellExpressionData:

ref <- celldex::DatabaseImmuneCellExpressionData()
colData(ref)$cell_type1 <- colData(ref)$label.fine
rowData(ref)$feature_symbol <- rownames(ref)
ref_sce <- SingleCellExperiment::SingleCellExperiment(assays=list(logcounts=Matrix::Matrix(assays(ref)$logcounts)), colData=colData(ref), rowData=rowData(ref))
ref_sce <- scmap::selectFeatures(ref_sce,suppress_plot=TRUE)

I also tried converting the logcounts to counts by raising 10 to the power of each value, but that did not help.

I suspect the error is related to the logcounts values.
These are the logcounts values for HumanPrimaryCellAtlasData, which reports the error:

image

These are the logcounts values for DatabaseImmuneCellExpressionData, which runs properly:

image

Could you please help me solve this problem?

Thanks

@rersister

I ran into the same problem and got the same error, but I don't know how to solve it. Could you help us solve this problem?

@FionaMoon

FionaMoon commented Mar 25, 2022

I converted logcounts into counts, but the error still occurs.

a <- assay(sce, "logcounts")
b <- expm1(a)
assays(sce)$counts <- b

@pcantalupo

I'm getting the same error when trying to use celldex's MouseRNAseq reference with scmap. Is there any update on this issue?

@Cristinex

Same error. Any update?

@jordan841220

jordan841220 commented Nov 17, 2022

@kehuangke @Cristinex @pcantalupo
After looking into the code of selectFeatures(), I think the linearModel() function is the cause.
In linearModel(), the dropout rate of a feature is defined as the percentage of cells whose log_count for that feature is exactly 0.
This breaks down if the reference contains no 0 values in log_count at all, because the dropout rate is then 0 for every feature.

linearModel <- function(object, n_features) {
    log_count <- as.matrix(logcounts(object))
    cols <- ncol(log_count)
    if (!"counts" %in% assayNames(object)) {
        warning("Your object does not contain counts() slot. Dropouts were calculated using logcounts() slot...")
        dropouts <- rowSums(log_count == 0)/cols * 100    
    } else {
        count <- as.matrix(counts(object))
        dropouts <- rowSums(count == 0)/cols * 100
    }
    # do not consider genes with 0 and 100 dropout rate
    dropouts_filter <- dropouts != 0 & dropouts != 100         
    dropouts_filter <- which(dropouts_filter)
    dropouts <- log2(dropouts[dropouts_filter])
    expression <- rowSums(log_count[dropouts_filter, ])/cols
    
    fit <- lm(dropouts ~ expression)
    # ... (rest of the function omitted)
}

If the dropout rates are all 0%, fit <- lm(dropouts ~ expression) in linearModel() cannot work: genes with a 0% dropout rate are filtered out first, so no features are left to fit.
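
To make the failure mode concrete, here is a minimal base-R sketch that replays the dropout steps from linearModel() on a made-up matrix with no exact zeros. The matrix, its dimensions, and the variable names n_kept and dropouts_kept are assumptions for illustration only; nothing here is taken from celldex or from the rest of scmap.

```r
# Toy stand-in for a reference whose logcounts contain no exact zeros
# (values are made up; they are not celldex data).
set.seed(1)
log_count <- matrix(runif(20, min = 1, max = 10), nrow = 5,
                    dimnames = list(paste0("gene", 1:5), paste0("cell", 1:4)))
cols <- ncol(log_count)

# Same steps as the logcounts branch of linearModel()
dropouts <- rowSums(log_count == 0) / cols * 100
dropouts_filter <- which(dropouts != 0 & dropouts != 100)
n_kept <- length(dropouts_filter)

dropouts_kept <- log2(dropouts[dropouts_filter])
expression <- rowSums(log_count[dropouts_filter, , drop = FALSE]) / cols

# With no genes left to fit, lm() stops with an error; capture it
fit <- tryCatch(lm(dropouts_kept ~ expression), error = function(e) e)

print(n_kept)                 # number of genes that survive the filter
print(conditionMessage(fit))  # the error raised by lm()
```

On a real reference, the equivalent quick check would be to count the exact zeros in the logcounts matrix before calling selectFeatures(); if there are none, the filter removes every gene.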

In conclusion, copying the function from the source code and changing this line
dropouts <- rowSums(log_count == 0)/cols * 100
to something like
dropouts <- rowSums(log_count <= 3)/cols * 100
would work. (The value that defines the dropout cutoff depends on your reference.)

Alternatively, one can modify this line so that a proper number of features passes the filter:
dropouts_filter <- dropouts != 0 & dropouts != 100
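
As a rough illustration of the cutoff idea, the sketch below compares how many genes pass the filter with a cutoff of 0 versus 3. The helper names dropout_rate() and keep() are hypothetical (they are not part of scmap), the matrix is made up, and 3 is only the example value from above, not a recommended setting. Note that using <= with a cutoff of 0 matches the original == 0 test only when the values are non-negative.

```r
# Hypothetical helper (not part of scmap): percentage of cells per gene
# whose log-expression is at or below `cutoff`.
dropout_rate <- function(log_count, cutoff = 0) {
    rowSums(log_count <= cutoff) / ncol(log_count) * 100
}

# Same filter as in linearModel(): drop genes at 0% or 100% dropout
keep <- function(d) d != 0 & d != 100

# Made-up log-expression matrix with no zeros (4 genes x 4 cells)
log_count <- rbind(
    g1 = c(1.0, 2.0, 5.0, 6.0),
    g2 = c(4.0, 5.0, 6.0, 7.0),
    g3 = c(2.5, 3.5, 1.0, 8.0),
    g4 = c(1.0, 2.0, 2.5, 3.0)
)

kept_at_0 <- names(which(keep(dropout_rate(log_count, cutoff = 0))))
kept_at_3 <- names(which(keep(dropout_rate(log_count, cutoff = 3))))

print(kept_at_0)  # no genes: every gene has a 0% dropout rate
print(kept_at_3)  # "g1" "g3": g2 stays at 0% and g4 reaches 100%
```

A cutoff that is too high pushes genes to a 100% dropout rate, which the filter also removes, so it may help to look at the distribution of values in the reference (for example with quantile()) before picking one.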

I hope the authors can explain more about this issue.
