Normal ShapeMapper execution automatically performs a series of analysis steps internally. This is usually the most convenient option for general users. However, some users may wish to perform individual analyses semi-manually in isolation, allowing sequence alignment with a different aligner, or the introduction of specialized filtering stages, or the inclusion of mutation processing steps within a larger workflow.
Under the hood, ShapeMapper is fairly modular. This page attempts
to summarize the main steps power users will be most interested in.
Core shapemapper binary executables and scripts are located in internals/bin
.
Python scripts are designed for python>=3.5. Thirdparty executables are in various
locations within internals/thirdparty
.
To temporarily put bundled binaries and thirdparty executables
into the active shell PATH, run source internals/paths/bin_paths.sh
. This
will enable, for example, simply typing make_reactivity_profiles.py
instead of
internals/thirdparty/miniconda/bin/python3 internals/bin/make_reactivity_profiles.py
.
To aid in understanding the various files, executables, and commandline
parameters used in a run, we recommend executing run_example_modular.sh
, or
running shapemapper on a small dataset with the following additional parameters:
--serial
--verbose
--render-flowchart
--output-processed-reads
--output-aligned-reads
--output-parsed-mutations
--output-counted-mutations
Examine the log file to see the exact commands used by shapemapper to run
each module, and examine primary intermediate and output files in the shapemapper_out
folder. Additional intermediate files will be present within the
shapemapper_temp
folder. A workflow graphic will be written to shapemapper_out/*flowchart.svg
;
this can be inspected in software such as Inkscape.
ShapeMapper performs sequence alignment using Bowtie2 by default, or optionally STAR.
Running an alignment typically consists of two separate commands: building an index, then aligning reads using the index. There are many tutorials online on how to perform these steps.
See Alignment to reference sequences for an overview geared toward SHAPE-MaP and see Aligner parameters for commandline details.
- Usage of a gapped aligner is highly recommended, since sequence deletions are a large component of the MaP signal (at least for backbone adducts read out with the SuperScript II reverse transcriptase under relaxed-fidelity conditions). See Fig. 1C in Busan and Weeks, 2018.
- Alignments must be in SAM format for processing with ShapeMapper.
- ShapeMapper requires the
MD
field to be present in each SAM read - Paired reads (if present) must be located on adjacent lines for correct handling.
- Sorting reads by mapped location is not required.
This stage parses a single .sam
alignment file as input and produces a single .mut
file as output containing
read mapping information and processed mutations for each read passing quality filters.
To run this module in isolation, use internals/bin/shapemapper_mutation_parser
. Run with
no arguments for commandline help.
See Parsed mutations for output file format.
Analyses performed by shapemapper_mutation_parser
are documented in
Analysis steps:
- Primer trimming and enforcement of read location requirements
- Ambiguously aligned mutation handling
- Multinucleotide mutation handling
- Post-alignment basecall quality filtering
- Chemical adduct location inference
This stage takes a single .mut
parsed mutations file as input and outputs a table of
counted mutations and read depths.
Use shapemapper_mutation_counter
to run this module. Run with no arguments
for commandline help.
See Mutation counts for output file format.
This stage takes mutation counts and read depth tables from one, two, or three samples (modified, untreated, denatured control), computes an overall reactivity profile, normalizes that profile, generates output figures, and performs quality control checks.
To calculate a reactivity profile from mutation counts tables, run
make_reactivity_profiles.py
. Run with --help for usage.
This script produces a single reactivity profile table as output.
See column descriptions. Also see Calculation of mutation rates and Reactivity profile calculation.
To normalize a reactivity profile, run normalize_profiles.py
.
For most situations, provide a single reactivity profile table as input with
--tonorm (the other arguments can be safely ignored).
The file will be overwritten with added columns for normalized reactivity and reactivity stderr.
See Reactivity profile normalization.
To convert a reactivity profile into various output formats convenient for
downstream software, run tab_to_shape.py
.
See File formats.
To perform quality control checks and render summary figures, run
render_figures.py
.