-
Notifications
You must be signed in to change notification settings - Fork 6
/
Copy pathpacbio.txt
152 lines (99 loc) · 5.39 KB
/
pacbio.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
================
PacBio tutorials
================
Smrtportal AMI
-------------------
Smrtportal/smrtanalysis is the software developed by Pacific Biosciences
* start a new amazon instance:
* select m3.2xlarge to get 8 cpus (instead of 4 with m1.xlarge) but
NOT with the beacon AMI; instead, follow the instructions for setting up the smrtportal AMI from `this pdf <https://s3.amazonaws.com/files.pacb.com/software/smrtanalysis/2.1.0/Doc/Running%20SMRT%20Analysis%20on%20Amazon.pdf>`__
* **NOTE** skip the filezilla step
Downloading data
-------------------
We'll use data from a single smrtcell based on a size selected 20kb *E. coli* library
Do::
cd /opt/smrtanalysis/common/inputs_dropbox
wget http://files.pacb.com/datasets/secondary-analysis/ecoli-k12-P4C2-20KSS/ecoliK12.tar.gz
tar -vxzf ecoliK12.tar.gz
To import into smrtportal:
* in smrtportal, go to Home --> Import and Manage --> Import SMRT Cells
* click on common/inputs_dropbox
* click 'Scan' and OK
HGAP for assembly
-------------------
* in smrtportal, go to Home --> Create New
* add your imported smrtcell by selecting it and clicking the triangle pointing to the right
* give your job a name
* for protocol, select RS_HGAP_Assembly.2 (*not* '.1')
* click 'Save'
* click 'Start'
Monitor the run, it will probably go overnight
**Where is the data**
Check /opt/smrtanalysis/common/jobs/016/016XXX, where 016XXX is the job ID from smrtportal.
Results appear in the *data* folder
Base modification detection
---------------------------
Use:
* the same smrtcell data
* the RS_Modification_and_Motif_Analysis.1 protocol
* the 'e coli K12 MG1655' reference from the dropdown menu
Running smrtanalysis from the commandline
-----------------------------------------
See this part of the Pacific Biosciences wiki: https://github.com/PacificBiosciences/SMRT-Analysis/wiki/SMRT-Pipe-Reference-Guide-v2.0#-using-the-command-line
* You'll need a settings xml file, which can only be obtained by setting up a smrtportal job with the correct protocol, and grabbing the settings.xml from /opt/smrtanalysis/common/jobs/016/016XXX, where 016XXX is the job ID from smrtportal
* You'll also need an input.fofn file ('file-of-filenames', which contains the full path to the bax.h5 (or bas.h5) file(s)
* use screen!
Do::
. /opt/smrtanalysis/etc/setup.sh
fofnToSmrtpipeInput.py input.fofn > input.xml
smrtpipe.py --params=settings.xml xml:input.xml
Or customise smrtpipe to use more cpus (if available)::
smrtpipe.py -D NPROC=24 --params=settings.xml xml:input.xml
Results appear in the *data* folder
Running blasr to map reads
--------------------------
* upload the reference (e.g., the velvet assembly we used for QC/Validation)
* we will run blasr on a subset of the reads (using only one of the three bax.h5 files)
* samtools is included in the smrtportal distribution
* use screen!
Do::
. /opt/smrtanalysis/etc/setup.sh
blasr /opt/smrtanalysis/common/inputs_dropbox/ecoliK12/Analysis_Results/m130404_014004_sidney_c100506902550000001823076808221337_s1_p0.2.bax.h5 \
/path/to/velvet_pe+mp.fa \
-minSubreadLength 1000 -bestn 1 -nproc 8 -sam -out pacbio_2_velvet_pe+mp_71.sam
samtools view -buS pacbio_2_velvet_pe+mp_71.sam | samtools sort - pacbio_2_velvet_pe+mp_71.sorted
samtools index pacbio_2_velvet_pe+mp_71.sorted
The bam and bai files can be added to the IGV browser
Running bridgemapper
--------------------
Check out https://github.com/PacificBiosciences/Bioinformatics-Training/wiki/Bridgemapper
First, fix a 'bug' which makes one of the steps single-cpu instead of parallel.
Edit /opt/smrtanalysis/analysis/lib/python2.7/pbbridgemapper/bridgemapper.py:
change lines **426 + 427** from::
stdout, stderr = runBlasr(affixesFastq, args.referenceFasta,
affixBlasrOutputPath)
To::
stdout, stderr = runBlasr(affixesFastq, args.referenceFasta,
affixBlasrOutputPath, nproc=args.nproc)
You can now run Bridgemapper through the smrtportal, which gives the optimal output for viewing in the PacBio genome browser SMRTview. However, this will take a long time.
To do that, add the velvet assembly as reference through 'Home --> Import and Manage --> Manage Reference Sequences --> New'
**Alternatively,** run bridgemapper on a subset of the reads through the command line. Use screen!
Do::
. /opt/smrtanalysis/etc/setup.sh
pbbridgemapper --debug --nproc 8 /opt/smrtanalysis/common/inputs_dropbox/ecoliK12/Analysis_Results/m130404_014004_sidney_c100506902550000001823076808221337_s1_p0.2.bax.h5 \
/path/to/velvet_pe+mp.fa bridgemapper_out
* --debug allows for seeing what bridgemapper is doing
To use SMRTview with the results, you'll need to make a smrtportal reference repo::
referenceUploader -c -p smrtpipe_references -n velvet_pe+mp -f /path/to/velvet_pe+mp.fa --saw='sawriter -welter'
* -c: create
* -p: folder for references
* -n: name of the reference
* -f: fasta file
**Viewing with SMRTView**
* download the split_reads.bridgemapper.gz and complete smrtpipe_references folder to your harddrive
* install SMRTView from https://github.com/PacificBiosciences/DevNet/wiki/SMRT-View
* choose 'File --> Open data from server' and select 'Files of type --> Reference Metadata'
* find the relevant reference.info.xml in the smrtpipe_references folder
* choose 'File --> add Tracks from server" and select 'Files of type --> As above and also gzipped files'
* add the split_reads.bridgemapper.gz file
Start browsing!