Skip to content

Commit

Permalink
Merge pull request #50 from ccb-hms/ms_refresh2
Browse files Browse the repository at this point in the history
ms refresh 2
  • Loading branch information
andrewGhazi authored Oct 3, 2024
2 parents 12a699d + c553a88 commit 74fee19
Showing 1 changed file with 10 additions and 13 deletions.
23 changes: 10 additions & 13 deletions episodes/multi-sample.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -199,16 +199,13 @@ summed

The main advantage of using pseudo-bulk samples is the possibility to use
well-tested methods for differential analysis like `edgeR` and `DESeq2`, we will
focus on the former for this analysis. `edgeR` uses a Negative Binomial
Generalized Linear Model.
focus on the former for this analysis. `edgeR` and `DESeq2` both use negative binomial models under the hood, but differ in their normalization strategies and other implementation details.

First, let's start with a specific cell type, for instance the "Mesenchymal stem
cells", and look into differences between this cell type across conditions.
cells", and look into differences between this cell type across conditions. We put the counts table into a `DGEList` container called `y`, along with the corresponding metadata.

```{r}
label <- "Mesenchyme"
current <- summed[,label == summed$celltype.mapped]
current <- summed[, summed$celltype.mapped == "Mesenchyme"]
y <- DGEList(counts(current), samples = colData(current))
Expand Down Expand Up @@ -241,7 +238,7 @@ y <- y[keep,]
summary(keep)
```

We can now proceed to normalize the data There are several approaches for
We can now proceed to normalize the data. There are several approaches for
normalizing bulk, and hence pseudo-bulk data. Here, we use the Trimmed Mean of
M-values method, implemented in the `edgeR` package within the `calcNormFactors`
function. Keep in mind that because we are going to normalize the pseudo-bulk
Expand Down Expand Up @@ -274,7 +271,7 @@ par(mfrow = c(1,1))
Furthermore, we want to check if the samples cluster together based
on their known factors (like the tomato injection in this case).

To do so, we use the MDS plot, which is very close to a PCA representation.
In this case, we'll use the multidimensional scaling (MDS) plot. Multidimensional scaling (which also goes by principal *coordinate* analysis (PCoA)) is a dimensionality reduction technique that's conceptually similar to principal *component* analysis (PCA).

```{r}
limma::plotMDS(cpm(y, log = TRUE),
Expand Down Expand Up @@ -308,7 +305,6 @@ Additionally, the Common and Trend BCV are shown in `red` and `blue`.
plotBCV(y)
```


We then fit a Quasi-Likelihood (QL) negative binomial generalized linear model for each gene.
The `robust = TRUE` parameter avoids distortions from highly variable clusters.
The QL method includes an additional dispersion parameter, useful to handle the uncertainty and variability of the per-gene variance, which is not well estimated by the NB dispersions, so the two dispersion types complement each other in the final analysis.
Expand Down Expand Up @@ -419,9 +415,10 @@ data means by the total number of cells in each sample (cell type).
On the other hand, this can lead our data to be susceptible to compositional
effect. "Compositional" refers to the fact that the cluster abundances in a
sample are not independent of one another because each cell type is effectively
competing for space in the sample. They behave like proportions. If cell type A
abundance increases in a new condition, that means we'll observe less of
everything else, even if everything else is unaffected by the new condition.
competing for space in the sample. They behave like proportions in that they
must sum to 1. If cell type A abundance increases in a new condition, that means
we'll observe less of everything else, even if everything else is unaffected by
the new condition.

Compositionality means that our conclusions can be biased by the amount of cells
present in each cell type. And this amount of cells can be totally unbalanced
Expand Down Expand Up @@ -466,7 +463,7 @@ topTags(res2, n = 10)

### Testing against a log-fold change threshold

This other approach assumes that the composition bias introduces a spurious
A second approach assumes that the composition bias introduces a spurious
log2-fold change of no more than a \tau quantity for a non-DA label.

In other words, we interpret this as the maximum log-fold change in the total
Expand Down

0 comments on commit 74fee19

Please sign in to comment.