T cells

Identifying T cell populations by single-cell RNA sequencing (scRNAseq)

Kane Foster, Teerapon Sahwangarrom, Gordon Beattie, James Reading (30-05-2022)

The objective of this Standard Operating Procedure (SOP) is to provide a clear outline of how to identify populations of T cells by single-cell RNA sequencing (scRNAseq). It assumes that no accompanying CITEseq data are available and that TCR information may be available.

1. Computationally isolating T cells

This applies either to a dataset of the whole tumour micro-environment (e.g. T cells alongside myeloid cells and B cells) or to removing contaminating non-T cells from a CD3+ flow-sorted dataset.

Strategies:

Filter cells by sufficient expression of individual T cell marker genes (CD3D, CD3E, CD3G), or by a UCell score of these genes.
Run a standard clustering workflow, then identify clusters enriched for T cell marker expression.

This can be done across a range of clustering resolutions. Higher resolutions are recommended in order to resolve differences between cell types; similar clusters can be merged afterwards if desired.

If clustering is confounded by ribosomal, TCR, mitochondrial, stress or other confounding gene types, we recommend removing these genes prior to calculating highly variable genes.

Filter for cells with a TCR (if available)

Useful if absolute certainty of T cell identity is required. Depending on the quality of the dataset (and the efficiency of scTCRseq), this may remove a proportion of true T cells whose TCR was not sequenced for technical reasons.
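The strategies above can be sketched in a few lines of plain Python. This is a minimal illustration, not a replacement for a full clustering workflow: the gene-name patterns are rough assumptions for human gene symbols and are not exhaustive, and the function names, the `min_cd3_genes` threshold and the `require_tcr` switch are hypothetical.

```python
import re

# Rough, assumed patterns for human gene symbols; heuristic and not exhaustive.
CONFOUNDER_PATTERNS = [
    re.compile(r"^RP[SL]\d+[A-Z]?\d*$"),  # ribosomal proteins, e.g. RPS27, RPL13A
    re.compile(r"^MT-"),                  # mitochondrial genes, e.g. MT-CO1
    re.compile(r"^TR[ABGD][VDJ]\d"),      # TCR V/D/J segments, e.g. TRBV11-2
    re.compile(r"^TR[ABGD]C\d?$"),        # TCR constant genes, e.g. TRAC, TRBC1
]

CD3_GENES = ("CD3D", "CD3E", "CD3G")


def genes_for_hvg(gene_names):
    """Drop confounding genes before highly variable gene selection."""
    return [g for g in gene_names
            if not any(p.search(g) for p in CONFOUNDER_PATTERNS)]


def is_t_cell(counts, has_tcr=None, min_cd3_genes=1, require_tcr=False):
    """Call a T cell from the raw counts of one cell (dict gene -> count).

    has_tcr: True/False if scTCRseq was run, None if unavailable.
    min_cd3_genes: hypothetical threshold on the number of detected CD3 genes.
    require_tcr: stricter mode; drops true T cells whose TCR was not captured.
    """
    n_detected = sum(1 for g in CD3_GENES if counts.get(g, 0) > 0)
    if n_detected < min_cd3_genes:
        return False
    if require_tcr and not has_tcr:
        return False
    return True
```

Stress-associated genes are dataset dependent, so they are left out of the patterns here and would need to be supplied as an explicit list.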

Issues:

It is particularly important here to separate cytotoxic T cells from NK cells, which have very similar transcriptomes.
These approaches can also capture doublets of T cells with other cell types, so care should be taken not to select cells with T cell features alongside, e.g., myeloid genes. A TCR+ or CD3-expressing single cell located in or alongside a cluster of other cells is likely a doublet.
For advice on removing undesirable T cell-like lymphocytes (e.g. MAITs, iNKTs, γδ T cells), see below.
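As a rough sketch of the doublet check described above, cells that co-detect CD3 genes and markers of another lineage can be flagged for inspection. The marker sets below are commonly used but are an assumption here, and the function name and `min_genes` threshold are hypothetical.

```python
# Assumed, illustrative lineage markers; adapt to the dataset at hand.
T_MARKERS = {"CD3D", "CD3E", "CD3G"}
OTHER_LINEAGE_MARKERS = {
    "myeloid": {"LYZ", "CD14", "CSF1R"},
    "B cell": {"MS4A1", "CD79A"},
}


def suspected_doublet_lineages(counts, min_genes=2):
    """Return the non-T lineages co-detected with CD3 in one cell.

    counts: dict gene -> raw count for a single cell.
    min_genes: hypothetical number of lineage markers required to flag a cell.
    """
    detected = {g for g, c in counts.items() if c > 0}
    if not detected & T_MARKERS:
        return []
    return sorted(name for name, markers in OTHER_LINEAGE_MARKERS.items()
                  if len(detected & markers) >= min_genes)
```

A flagged cell is only a candidate: ambient RNA can produce the same pattern, so the cell's position relative to other clusters should still be checked.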

2. Computationally separating CD4 and CD8 cells

Separating the CD4 and CD8 lineage by scRNAseq is more difficult than by cytometry. When the transcriptome is sampled by relatively low-depth 10X scRNAseq, lineage-separating transcripts can be overpowered by more dominant transcriptional programs, such as the ribosome-enriched naïve state or the cytolytic program.

Furthermore, CD4+ T cells express low levels of the CD4 transcript but very high levels of CD4 protein (see Fig 3B).

Strategies:

Standard clustering workflow, then identify CD4+ and CD8+ clusters.

After clustering, visualise CD4 and CD8 expression on the UMAP and look for regions where the two overlap.

Deliberately over-cluster, label the many small clusters as CD4+ or CD8+, then re-merge clusters sharing a CD4 or CD8 label.
Targeted re-clustering of a single mixed CD4+ and CD8+ cluster.
Utilise TCR clone labels to assign each clonotype to either the CD4 or the CD8 subset. If a clone lacks detectable CD8 and CD4 transcripts but has a transcriptome similar to other detected (or suspected) CD4 cells, it is likely CD4+, given the low detection of the CD4 transcript.
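The clone-based strategy can be sketched as a vote among the cells of each clonotype. This is a minimal illustration under stated assumptions: per-cell calls use simple detection of CD4, CD8A and CD8B transcripts, and the function names and the `min_fraction` agreement threshold are hypothetical.

```python
from collections import Counter, defaultdict


def call_cell_lineage(counts):
    """Per-cell call from transcript detection (dict gene -> count)."""
    cd4 = counts.get("CD4", 0) > 0
    cd8 = counts.get("CD8A", 0) > 0 or counts.get("CD8B", 0) > 0
    if cd4 and cd8:
        return "ambiguous"
    if cd4:
        return "CD4"
    if cd8:
        return "CD8"
    return "undetected"


def assign_clone_lineages(cells, min_fraction=0.8):
    """Assign each clonotype a lineage from the calls of its member cells.

    cells: iterable of (clone_id, counts) pairs.
    min_fraction: hypothetical agreement required among informative cells.
    """
    votes = defaultdict(Counter)
    for clone_id, counts in cells:
        votes[clone_id][call_cell_lineage(counts)] += 1
    labels = {}
    for clone_id, tally in votes.items():
        informative = tally["CD4"] + tally["CD8"]
        if informative == 0:
            # No lineage transcript detected; fall back to transcriptome similarity.
            labels[clone_id] = "undetected"
            continue
        lineage, n = max((("CD4", tally["CD4"]), ("CD8", tally["CD8"])),
                         key=lambda kv: kv[1])
        labels[clone_id] = lineage if n / informative >= min_fraction else "mixed"
    return labels
```

Clones returned as "undetected" are the ones for which the transcriptome-similarity argument above applies; "mixed" clones are worth checking for doublets or clonotype-calling errors.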

CD3+ lymphoid cells that are neither CD4+ nor CD8+ can emerge

For example, MAITs, iNKTs and γδ T cells.
These have distinct marker genes and TCR usage so can be separated from αβ T cells, as explained here.
Confusingly, γδ T cells can also express TRAC and TRBC1/2. Furthermore, it is possible for an αβ T cell to express a TRG chain (PMID:21734177).
Practically, this means one cannot select cells by the presence of TRAC or TRBC1/2, nor remove cells by the presence of TRD or TRG chains.
Instead, I recommend selecting cells with non-zero expression of TRA and TRB genes (e.g. TRBV11-2), as these represent real αβ T cells.
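A minimal sketch of this selection is given below. It assumes that the relevant TRA and TRB genes are the variable genes, recognised by the `TRAV`/`TRBV` prefixes of human gene symbols; that reading, the function name and the `require_both` switch are assumptions rather than part of the SOP.

```python
import re

# Assumed naming convention: human TRA/TRB variable genes start with TRAV/TRBV.
TRAV = re.compile(r"^TRAV\d")
TRBV = re.compile(r"^TRBV\d")


def is_alpha_beta_t_cell(counts, require_both=False):
    """Flag a cell as an alpha-beta T cell from V gene detection.

    counts: dict gene -> raw count for a single cell.
    require_both: hypothetical stricter mode needing both a TRAV and a TRBV gene.
    """
    has_trav = any(c > 0 and TRAV.match(g) for g, c in counts.items())
    has_trbv = any(c > 0 and TRBV.match(g) for g, c in counts.items())
    if require_both:
        return has_trav and has_trbv
    return has_trav or has_trbv
```

Constant genes (TRAC, TRBC1/2) and TRD/TRG genes are deliberately ignored, for the reasons given above.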

Issues:

There is a possibility of capturing CD4+ and CD8+ doublets, which may be the cause of a small cluster of cells that cannot be sufficiently resolved into CD4+ and CD8+ cells. CD4+CD8+ double-positive T cells do also exist.
Several transcriptional programs massively overpower lineage separation (e.g. cell cycle and the interferon response).

Cells in such clusters could be separated by cell-wise detection of CD4 and CD8 transcripts, with the caveat of losing some cells to dropout.

Alternatively, they could be retained as a mixed CD4+/CD8+ cluster, with the contribution of each lineage inferred via TCR sharing between this and other clusters.

The MAIT transcriptional state is very similar to that of CD4+ Th17 or CD8+ Tc17 cells.
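Inferring the lineage contribution of a mixed cluster via TCR sharing can be sketched as follows. It assumes that each clonotype has already been given a lineage from cells in other, well-resolved clusters; the function name and the "unshared" and "no TCR" labels are hypothetical.

```python
from collections import Counter


def lineage_contribution(mixed_cluster_clones, clone_lineage):
    """Estimate the lineage make-up of a mixed cluster from shared clonotypes.

    mixed_cluster_clones: clone id per cell in the mixed cluster (None if no TCR).
    clone_lineage: clone id -> "CD4" or "CD8", derived from cells in other clusters.
    Returns the fraction of cells per label.
    """
    tally = Counter(
        "no TCR" if clone is None else clone_lineage.get(clone, "unshared")
        for clone in mixed_cluster_clones
    )
    total = sum(tally.values())
    return {label: n / total for label, n in tally.items()} if total else {}
```

For example, `lineage_contribution(["a", "a", "b", None, "z"], {"a": "CD8", "b": "CD4"})` attributes two of five cells to CD8 and one to CD4, leaving one cell without a TCR and one with a clonotype not seen elsewhere. Large "unshared" or "no TCR" fractions mean the estimate rests on few cells.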

3. Identifying CD4 and CD8 subsets

The phenotype of a CD4 or CD8 subset as sampled by scRNAseq is essentially the transcriptional state of a cell. Connecting transcriptional state to known T cell biology while retaining information about which genes define a cluster of cells is difficult, and there is consequently little consensus in the literature on this topic.

Strategies:

UCell scores for phenotype gene lists.
Gene Set Enrichment Analysis and/or GO Term (Biological Process) analysis of marker genes.
Quantitative ON or OFF status of lineage-defining transcription factors via SCENIC.
Trajectory inference tools have potential for phenotyping, e.g. activation status may vary across multiple clusters.
Some phenotypes fall onto a diffuse trajectory rather than into discrete groups (e.g. pre-dysfunctional and late dysfunctional CD8+ T cells, or CD4+ Th subsets).
A useful way to examine whether a cluster represents a discrete subset, or is instead one of several clusters with a fairly similar phenotype, could be plotting the marker genes of each cluster in turn across all clusters, or plotting pairwise cluster similarity.
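To make the idea of a rank-based signature score concrete, here is a minimal sketch of a Mann-Whitney-U-style score computed within a single cell. It is written for illustration only and is not the UCell implementation; the function name is hypothetical, and any gene list passed to it would come from the sources discussed below.

```python
def signature_score(expression, signature):
    """Rank-based enrichment of signature genes within one cell.

    expression: dict gene -> expression value for a single cell.
    signature: iterable of gene names.
    Returns a value in [0, 1]: 1 if the signature genes are the top-ranked
    genes of the cell, 0 if they are the bottom-ranked ones, or None if the
    score is undefined.
    """
    genes = sorted(expression, key=expression.get, reverse=True)
    # Average ranks over ties so that all dropout zeros share one rank.
    ranks = {}
    i = 0
    while i < len(genes):
        j = i
        while j + 1 < len(genes) and expression[genes[j + 1]] == expression[genes[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie block
        for g in genes[i:j + 1]:
            ranks[g] = avg_rank
        i = j + 1
    present = [g for g in set(signature) if g in ranks]
    n, total = len(present), len(genes)
    if n == 0 or n == total:
        return None
    # U statistic: rank sum minus its minimum possible value.
    u = sum(ranks[g] for g in present) - n * (n + 1) / 2
    return 1 - u / (n * (total - n))
```

Because the score depends only on within-cell ranks, it is insensitive to sequencing depth differences between cells, which is the main appeal of rank-based scoring for phenotype gene lists.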

Issues:

While transcriptional states may be recurrent among publications, labels are not. Most authors are eager to ascribe a known phenotype (T-CM, T-EM, etc) based off prior work using biological intuition, but unless they specifically use CITE-seq or flow-sorting these labels (and associated subset behaviour) are not necessarily equivalent.

Ascribe labels to define the observed transcriptional states by highly significant marker genes or transcription factors

If the same transcriptional state is found across studies but is labelled differently, it should instead be renamed to a consensus label.

Studies with strong sampling (e.g. PBMC and TIL) can be used to validate the accuracy of a label for a specific T cell subset.

The source of phenotype gene lists must be considered: quality of research, similarity of dataset, strength of their testing regimen (testing approach, n samples) and clustering granularity.
A phenotype gene list may not perfectly align with a dataset's clustering, e.g. a T-RM signature may be partially enriched across multiple clusters. The clustering may be tweaked to try to align the two (by slightly increasing or decreasing the clustering resolution), but one should always trust the clusters output by the data. It is possible to simply report multiple clusters as partially enriched in a specific phenotype.
Marker gene testing results depend on the composition of the entire dataset. For example, in a dataset with no late dysfunctional CD8+ T cells, an earlier subset of pre-dysfunctional CD8+ T cells would appear relatively more exhausted versus other naïve or memory subsets, but can quantitatively lack features of later exhaustion. This should also be considered when deriving phenotype gene lists.
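The dependence of marker testing on dataset composition can be illustrated with a toy calculation. The expression values below are invented purely for illustration, PDCD1 stands in for any exhaustion-associated gene, and the function name and pseudocount are hypothetical; real marker testing would use a proper statistical test.

```python
import math


def log2_fold_change(cluster_means, target, gene, pseudocount=1.0):
    """log2FC of a gene in the target cluster vs the unweighted mean of all others."""
    rest = [m[gene] for name, m in cluster_means.items() if name != target]
    rest_mean = sum(rest) / len(rest)
    return math.log2((cluster_means[target][gene] + pseudocount)
                     / (rest_mean + pseudocount))


# Hypothetical mean expression per cluster (values invented for illustration).
means = {
    "naive": {"PDCD1": 0.0},
    "memory": {"PDCD1": 1.0},
    "pre_dysfunctional": {"PDCD1": 3.0},
    "late_dysfunctional": {"PDCD1": 15.0},
}

# Same pre-dysfunctional cluster, tested with and without the late subset present.
with_late = log2_fold_change(means, "pre_dysfunctional", "PDCD1")
without = {k: v for k, v in means.items() if k != "late_dysfunctional"}
without_late = log2_fold_change(without, "pre_dysfunctional", "PDCD1")
```

With these toy numbers the fold change of the same cluster is negative when late dysfunctional cells are present and positive when they are absent, even though its own expression has not changed.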

4. Independent Confirmation: Query-Reference Prediction Tools

This involves using computational approaches to acquire a predicted cell type label for your data. The simplest tools take only your data and a list of marker genes, whereas the more complicated projection approaches map your data (the query) to an annotated atlas of cell types (the reference).

Strategies:

Non-projection based tools such as scPred and SingleR are useful as they are computationally non-demanding and can provide an initial rough estimate of cell identity.
The simplest projection tool is Azimuth.
Numerous custom projection tools are available (this is the same computational pipeline as batch correction)
Using multiple orthogonal projection tools or multiple datasets can strengthen this analysis.
Prediction should be performed late in the analysis to avoid biasing cell annotation, although it can also be useful for a very first glance to identify major lineages (T, B, myeloid, etc.).
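Combining the labels from several tools can be sketched as a simple vote per cell or per cluster. This assumes the labels have already been harmonised to a shared vocabulary; the function name, the `min_agreement` threshold and the "discordant" and "unassigned" labels are hypothetical, and the tool names are only dictionary keys, not calls to those packages.

```python
from collections import Counter


def consensus_label(predictions, min_agreement=2):
    """Combine predicted labels from several tools into one call.

    predictions: dict tool name -> predicted label (None if unassigned).
    min_agreement: hypothetical number of tools that must agree.
    """
    votes = Counter(label for label in predictions.values() if label is not None)
    if not votes:
        return "unassigned"
    (top, n), *others = votes.most_common()
    tied = bool(others) and others[0][1] == n
    if n < min_agreement or tied:
        return "discordant"
    return top
```

For example, `consensus_label({"SingleR": "CD8 TEM", "Azimuth": "CD8 TEM", "scPred": "CD8 TCM"})` returns `"CD8 TEM"`, whereas two tools that disagree yield `"discordant"`. Discordant calls are often the most informative, as they point at states that the references label inconsistently.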

Issues:

The predicted label is only as good as the original phenotyping and labels of the reference. Azimuth allows you to view or download the Reference for yourself for inspection.
If possible, project to a dataset whose labelling you have a lot of faith in (e.g. a CITEseq dataset), using the same criteria as were used to select publications as a source for phenotype gene lists.
However, you should not use the same dataset for projection as you used for phenotyping, as the two would not be independent.

scRNAseq studies of tumour-reactive T cells

Studies which utilised scRNAseq to assign a single-cell phenotype to tumour-reactive T cell clones, identified through various experimental assays
