ctg-lund/ctg-sc-rna-10x

ctg-sc-rna-10x

Nextflow pipeline for preprocessing 10x Chromium single-cell RNA-seq (sc-RNA) data with cellranger.

  • Designed to handle multiple projects in one sequencing run (but also works with only one project)
  • Supports mm10 and hg38 references, but can also be run with a custom reference genome and annotation (must be added via nextflow.config). See Custom genome below.
  • Supports nuclei samples

USAGE

  1. Clone and build the Singularity container for this pipeline: https://github.com/perllb/ctg-sc-rna-10x/tree/master/container/sc-rna-10x.v6
  2. Edit your samplesheet to match the example samplesheet (see Samplesheet requirements below).
  3. Edit the nextflow.config file to fit your project and system.
  4. Run the pipeline:
nohup nextflow run pipe-sc-rna-10x.nf > log.pipe-sc-rna-10x.txt &

Cron

  • /projects/fs1/shared/ctg-cron/ctg-pipe-cron/
  • Looks for complete runfolders that contain CTG_SampleSheet.sc-rna-10x.csv.
    • Complete means that both the sync and the run have finished.
  • Starts the pipeline only if neither ctg.sc-rna-10x.done nor ctg.sc-rna-10x.start is present in the runfolder.
  • To restart the pipeline, delete ctg.sc-rna-10x* from the runfolder.
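
The trigger rule above can be sketched in a few lines of Python. This is only an illustration, not the actual cron script: the function name `should_start` is hypothetical, and the check that the sync and run are complete is left out because this README does not describe how completeness is detected.

```python
from pathlib import Path

SAMPLESHEET = "CTG_SampleSheet.sc-rna-10x.csv"
MARKERS = ("ctg.sc-rna-10x.start", "ctg.sc-rna-10x.done")


def should_start(runfolder: str) -> bool:
    """Return True if the runfolder has the samplesheet and no marker files.

    Hypothetical helper: the sync/run completeness checks are omitted.
    """
    folder = Path(runfolder)
    if not (folder / SAMPLESHEET).is_file():
        return False
    # Either marker file means the pipeline was already started or finished.
    return not any((folder / marker).exists() for marker in MARKERS)
```

Under this sketch, deleting both marker files makes `should_start` return True again, which mirrors the restart instruction above.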

Input Files

The following files must be in the runfolder for the pipeline to start successfully.

  1. Samplesheet (CTG_SampleSheet.sc-rna-10x.csv)

(Note that if running without demux, another samplesheet is needed! See below https://github.com/perllb/ctg-sc-rna-10x/blob/master/README.md#running-without-demux-with-existing-fastq-files)

Samplesheet requirements:

Note: One samplesheet per project!
Note: Must be in comma-separated values format (.csv)

[Data],,,,,,,,,
Lane,Sample_ID,index,Sample_Project,Sample_Species,nuclei,force,agg,email,deliver
,Si1,SI-GA-D9,proj_2021_012,human,n,n,y,[email protected];[email protected],y
,Si2,SI-GA-H9,proj_2021_192,hs-mm,y,5000,n,[email protected],n

The Nextflow pipeline reads the following samplesheet columns into channels:

  • Sample_ID : ID of the sample. Sample_ID can only contain a-z, A-Z and "_". For example, spaces and hyphens ("-") are not allowed. If 'Sample_Name' is present, it will be ignored.
  • Sample_Project : Project ID. E.g. 2021_033, 2021_192.
  • Sample_Species : Only 'human', 'mouse', 'hs-mm' and 'custom' are accepted. To run the mixed GRCh38+mm10 genome, set 'hs-mm'. If the species is not human, mouse or mixed ('hs-mm'), or if you use an alternative reference (e.g. with an added gene/sequence), set 'custom'. The custom reference genome must then be specified in the nextflow config file (see Custom genome below). Alternatively, when running the driver, you can specify the path on the command line with the -c flag: sc-rna-10x-driver -c /full/path/to/reference
  • nuclei : Set to 'y' if the sample is nuclei, otherwise 'n'.
  • force : Set to 'n' if NOT running with --force-cells. To force cells for the sample, set this to the number of cells you want to force.
  • agg : Set to 'y' for all samples that you want to aggregate (per project).
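
The column rules above lend themselves to a quick pre-flight check before the pipeline is started. The sketch below is not part of the pipeline; `check_row` is a hypothetical helper. It also accepts digits in Sample_ID, which is an assumption based on the example IDs in this README (Si1, a1) rather than on the stated rule.

```python
import re

ALLOWED_SPECIES = {"human", "mouse", "hs-mm", "custom"}
# Assumption: digits are accepted because the README examples (Si1, a1) use them.
SAMPLE_ID_RE = re.compile(r"^[A-Za-z0-9_]+$")


def check_row(row: dict) -> list:
    """Return a list of problems found in one samplesheet row (empty if none)."""
    problems = []
    if not SAMPLE_ID_RE.match(row.get("Sample_ID", "")):
        problems.append("Sample_ID may only contain letters, digits and '_'")
    if row.get("Sample_Species") not in ALLOWED_SPECIES:
        problems.append("Sample_Species must be human, mouse, hs-mm or custom")
    if row.get("nuclei") not in ("y", "n"):
        problems.append("nuclei must be 'y' or 'n'")
    force = row.get("force", "")
    # force is either 'n' or the number of cells to force.
    if force != "n" and not (force.isdecimal() and int(force) > 0):
        problems.append("force must be 'n' or a positive cell count")
    if row.get("agg") not in ("y", "n"):
        problems.append("agg must be 'y' or 'n'")
    return problems
```

A row such as `{"Sample_ID": "Si-2", ...}` would be reported because of the hyphen, which catches the most common samplesheet mistake before any compute time is spent.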

Delivery-email generation:

  • email : Email addresses of the recipients of the delivery mail. Separate multiple addresses with ";".
  • deliver : Set to 'y' if data should be automatically transferred to lfs603 and an email sent to the customer (defined in email) after the pipeline has finished. Otherwise, set to 'n'.

Demux

  • index : Must be the index ID (10x ID) if dual index. For single index, the index sequence works too.
  • Lane : Only needed if the project is sequenced on a specific lane. Otherwise, this column can be omitted.

METAID

  • To define a specific metaid for the run/analysis, specify it above the [Data] section in the samplesheet. See the template below.
  • If not specified, the sc-rna-10x-driver will automatically generate a metaid, based on runfolder date and ID.

Samplesheet template (.csv)

Name : CTG_SampleSheet.sc-rna-10x.csv

metaid,2021_012
[Data]
Lane,Sample_ID,Sample_Name,index,Sample_Project,Sample_Species,nuclei,force,agg,email,deliver
,a1,a,SI-TT-A11,2021_Test2_Aydan,human,n,n,n,[email protected],y
,b2,b,SI-TT-A12,2021_Test2_Aydan,human,n,n,n,[email protected],y

Or, without specifying metaid (it is then generated automatically):

[Data]
Lane,Sample_ID,Sample_Name,index,Sample_Project,Sample_Species,nuclei,force,agg,email,deliver
,a1,a,SI-TT-A11,2021_Test2_Aydan,human,n,n,n,[email protected],y
,b2,b,SI-TT-A12,2021_Test2_Aydan,human,n,n,n,[email protected],y
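
To make the layout concrete, here is a minimal sketch of how such a samplesheet could be split into the settings above [Data] (such as metaid) and the sample rows below it. It is not the driver's actual parser; `parse_samplesheet` is a hypothetical name, and the sheet in the usage example is made up, with a placeholder email address.

```python
import csv
import io


def parse_samplesheet(text: str):
    """Split a samplesheet into header settings (e.g. metaid) and data rows."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Locate the [Data] marker; it may carry trailing commas.
    # Raises StopIteration if the sheet has no [Data] line.
    data_idx = next(i for i, ln in enumerate(lines) if ln.startswith("[Data]"))
    settings = {}
    for ln in lines[:data_idx]:
        key, _, value = ln.partition(",")
        settings[key.strip()] = value.strip().rstrip(",")
    # The first line after [Data] is the column header.
    rows = list(csv.DictReader(io.StringIO("\n".join(lines[data_idx + 1:]))))
    return settings, rows


# Made-up sheet for illustration only.
sheet = """metaid,2021_012
[Data]
Lane,Sample_ID,index,Sample_Project,Sample_Species,nuclei,force,agg,email,deliver
,Si1,SI-GA-D9,2021_012,human,n,n,y,user@example.com,y
"""
settings, rows = parse_samplesheet(sheet)
print(settings.get("metaid"), rows[0]["Sample_ID"])
```

When the metaid line is absent, `settings` is simply empty, which corresponds to the case where the driver generates the metaid itself.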

Running without demux (with existing fastq files)

The main difference is that fastqpath is added to the samplesheet header:

metaid,2021_012
fastqpath,/path/to/fastq
[Data]
Lane,Sample_ID,index,Sample_Project,Sample_Species,nuclei,force,email,deliver
,Si1,SI-GA-D9,2021_012,human,n,n,[email protected];[email protected],y
,Si2,SI-GA-H9,2021_012,hs-mm,y,5000,[email protected],y

  • The fastqpath must point to a directory with a "/sid...fastq" structure. That is, the fastqpath folder must contain all fastq files for every sample, with each file name starting with the corresponding Sample_ID:
__ fastqpath
           |__ Sample_ID*R1*fastq
           |__ Sample_ID*R2*fastq
           |__ Sample_ID*I1*fastq
           |__ Sample_ID*I2*fastq
            ....
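
Before starting a run without demux, it can help to confirm that the fastqpath folder really follows this layout. The sketch below is a hypothetical check, not something shipped with the pipeline. It assumes that only R1 and R2 are strictly required and that file names follow the `Sample_ID*R1*fastq` pattern shown above.

```python
from pathlib import Path


def missing_reads(fastqpath: str, sample_ids: list) -> dict:
    """Map each Sample_ID to the read types (R1/R2) with no matching fastq file."""
    folder = Path(fastqpath)
    missing = {}
    for sid in sample_ids:
        # Prefix match, as in the layout above; note that "Si1" also matches
        # files that belong to a sample named "Si10".
        absent = [
            read for read in ("R1", "R2")
            if not any(folder.glob(f"{sid}*{read}*fastq*"))
        ]
        if absent:
            missing[sid] = absent
    return missing
```

An empty result means every listed Sample_ID has at least one R1 and one R2 file in the folder.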

The driver can be executed from any directory.

Pipeline steps:

Cellranger version: cellranger v6.0

Output:

Container

https://github.com/perllb/ctg-containers/tree/main/sc-rna-10x/sc-rna-10x.v6

Custom genome

If a custom genome (not hg38 or mm10) is used:

  1. Set "Sample_Species" column to 'custom' in samplesheet:

Example:

Sample_ID,Sample_Name,index,Sample_Project,Sample_Species,nuclei
Si1,Sn1,SI-GA-D9,proj_2021_012,custom,y
Si2,Sn2,SI-GA-H9,proj_2021_012,custom,y

  2. In nextflow.config, set custom_genome=/PATH/TO/CUSTOMGENOME

Add custom genes (e.g. reporters) to cellranger annotation

Use this script to add custom genes to the cellranger reference: https://github.com/perllb/ctg-cellranger-add2ref
