-
Notifications
You must be signed in to change notification settings - Fork 4
/
07-genomics.Rmd
128 lines (101 loc) · 5.47 KB
/
07-genomics.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
# Genomics
```{r , include=FALSE}
knitr::opts_chunk$set(
message = FALSE,
warning = FALSE
)
```
## Single Cell Analysis
### Minimising the size of a Seurat Object
Single cell analyses are ofetn collated in Seurat objects which can be huge. We reduced the size of our Seurat object by pulling the relevant sections using the code below reducing the object to <20% of its original size. This works by only including the sparse matrices. This is not a workable example, but the PBMC dataset could be used if necessary.
```{r , eval = FALSE}
#library(Seurat)
#library(dplyr)
#Load in the data
pathtoglobaldata <-"mac_shiny/data/Global_sc_object"
object_global<-readRDS(pathtoglobaldata)
# Get sparse assay data
sizetest_global<-GetAssayData(object = object_global, slot = "data") # counts don’t work, scaled breaks my pc with memory errors, not sure it matters that much
object_global_small<- CreateSeuratObject(sizetest_global)
# Add in classifications.
object_global_small@reductions$tsne<-object_global@reductions$tsne # reduction embeddings for tsnegraph
object_global_small$Phenotype<- object_global$Phenotype
[email protected]$final_classification<- [email protected]$final_classification
object_global_small$final_classificaiton<- object_global$final_classification #cell classifications
Idents(object = object_global_small) <- "final_classificaiton" #set forever
#same
saveRDS(object_global_small,"mac_shiny/data/global_object_sml")
```
## Nextflow
Nextflow enables scalable and reproducible scientific workflows using software containers. These can either work using singularity (HPC and secure compute), docker (insecure compute - requires root) and using conda. Singularity is the preferred option.
The core workflows can be found at [here](https://nf-co.re/).
For a basic alignment job for RNASeq or DNASeq, you will require a machine with around 200GB memory and 24 cores. Fast scratch space is advantageous (NVMe fast disk is ideal). Fast disk space is needed for the working directory (cached files and staged files) and for the results storage. At least three times the size of the input data is ideal.
### RNASeq example with sbatch (slurm)
```{bash, eval = FALSE}
#!/bin/bash --login
#
#SBATCH --job-name=single-node-multicore ### Replace with the name of your job
#SBATCH [email protected] ### Your email address
#SBATCH --mail-type=END,FAIL # Conditions when an email is sent
#SBATCH --output=%x.%j.out ### output file: <job-name.job-id.out>
#SBATCH --nodes=1 ### Keep the job on one computer/node.
#SBATCH --ntasks=1 ### single task
#SBATCH --cpus-per-task=64 ### each task uses up to 64 cpu cores (defaults to 1 if absent)
#SBATCH --time=10-00:00:00 ### WallTime (10 days)
#SBATCH --mem=480G
#SBATCH --partition=compute ### Partition Name
date
hostname
# Set OMP_NUM_THREADS to the same value as --cpus-per-task
# with a fallback option in case it isn't set.
module load nextflow-22.04.0
module load singularity/singularity-3.8.7
nextflow run nf-core/rnaseq -w "/nfs/sharedscratch2/users/tdrake/nf_work/" --input "/nfs/sharedscratch2/users/tdrake/sample_sheet_rna_hpc.csv" --genome GRCm38 --remove_ribo_rna -profile singularity --outdir "/nfs/sharedscratch2/users/tdrake/rna_seq_output/"
date
```
### DNASeq example with sbatch (slurm)
```{bash, eval = FALSE}
#!/bin/bash --login
#
#SBATCH --job-name=single-node-multicore ### Replace with the name of your job
#SBATCH [email protected] ### Your email address
#SBATCH --mail-type=END,FAIL # Conditions when an email is sent
#SBATCH --output=%x.%j.out ### output file: <job-name.job-id.out>
#SBATCH --nodes=1 ### Keep the job on one computer/node.
#SBATCH --ntasks=1 ### single task
#SBATCH --cpus-per-task=64 ### each task uses up to 64 cpu cores (defaults to 1 if absent)
#SBATCH --time=10-00:00:00 ### WallTime (10 days)
#SBATCH --mem=480G
#SBATCH --partition=compute ### Partition Name
date
hostname
# Set OMP_NUM_THREADS to the same value as --cpus-per-task
# with a fallback option in case it isn't set.
module load nextflow-22.04.0
module load singularity/singularity-3.8.7
nextflow run nf-core/sarek -w "/nfs/sharedscratch2/users/tdrake/nf_work/" --input "/mnt/data/R09/seq_store/izy_22_2_r_wes_nextflow_sarek/izy_wes_samplesheet.csv" -profile docker --genome GRCm38 --outdir "/mnt/data/R09/seq_store/izy_22_2_r_wes_nextflow_sarek/" --skip_tools baserecalibrator --tools vep,snpeff,sarek,cnvkit
date
```
### ATACSeq example with sbatch (slurm)
```{bash, eval = FALSE}
#!/bin/bash --login
#
#SBATCH --job-name=single-node-multicore ### Replace with the name of your job
#SBATCH [email protected] ### Your email address
#SBATCH --mail-type=END,FAIL # Conditions when an email is sent
#SBATCH --output=%x.%j.out ### output file: <job-name.job-id.out>
#SBATCH --nodes=1 ### Keep the job on one computer/node.
#SBATCH --ntasks=1 ### single task
#SBATCH --cpus-per-task=64 ### each task uses up to 64 cpu cores (defaults to 1 if absent)
#SBATCH --time=10-00:00:00 ### WallTime (10 days)
#SBATCH --mem=480G
#SBATCH --partition=compute ### Partition Name
date
hostname
# Set OMP_NUM_THREADS to the same value as --cpus-per-task
# with a fallback option in case it isn't set.
module load nextflow-22.04.0
module load singularity/singularity-3.8.7
nextflow run nf-core/atacseq -w "/nfs/sharedscratch2/users/tdrake/nf_work/" --input "/nfs/sharedscratch2/users/tdrake/atac_seq_ctnnb_p53_myc/sample_atac.csv" --genome GRCm38 -profile singularity --outdir "/nfs/sharedscratch2/users/tdrake/atac_seq_output_broad/"
date
```