linearham

     __      _.._
  .-'__`-._.'.--.'.__.,
 /--'  '-._.'    '-._./
/__.--._.--._.'``-.__/
'._.-'-._.-._.-''-..'

References

Dhar A, Ralph DK, Minin VN, Matsen FA IV (2020) A Bayesian phylogenetic hidden Markov model for B cell receptor sequence analysis. PLoS Comput Biol 16(8): e1008030 (https://doi.org/10.1371/journal.pcbi.1008030).

installation
running
output files
naive sequence comparisons

Installation

Linearham's dependencies are described in the Dockerfile. We recommend that you run linearham inside a Docker container, since this will make installation much easier (if you're new to Docker, read this). However, you can also install the dependencies by hand, in which case you should clone the repository and run each command in the Dockerfile that's on a line starting with RUN (treat WORKDIR as cd). The more similar your system is to that described by the Dockerfile's FROM line (at the moment, debian), the easier this will be.

Using Docker

It's best to start by running an interactive session in the container:

docker run -it quay.io/matsengrp/linearham /bin/bash.

You can also create a shell script with your linearham commands for docker to run, and put it in place of /bin/bash (use test.sh as an example). If you want your run to be reproducible, choose a tag of the form v<stuff> from quay.io, then specify it like so: quay.io/matsengrp/linearham:v<stuff>.

To access your own data from within the container, or to persist your output beyond the container, you must use volumes by specifying -v. We recommend using this convention (other paths may not work):

Choose a directory outside of Docker that contains both your input data directory and your desired output directory
Mount it as a volume to /linearham/work inside the container: -v /your/local/path:/linearham/work
All commands inside the container referencing paths inside that directory should do so via /linearham/work, e.g. setting --outdir=/linearham/work/output inside the container will persist your output outside the container in /your/local/path/output.

Note that because Docker must run as root, this means that you will be writing to the directory on your host machine as root, so a) be very careful and b) don't choose anything anywhere near / (something like /home/user/several/sub/dirs is good).

Running linearham

All linearham actions are run using scons in the main linearham directory. Available actions are --run-partis, --run-linearham, and --build-partis-linearham. Note that because of the way scons parses arguments, you must always use an = sign: --arg=val.

Linearham uses a partis output file as its input. If you've already run partis to create this file, skip to --run-linearham; if not, you can have linearham run partis for you with --run-partis.

`--run-partis`

This runs partis on an input sequence file:

scons --run-partis --fasta-path=<file> --locus={igh|igk|igl} --outdir=<dir> [--parameter-dir=<dir>]

--fasta-path can be any file type that partis --infname handles (see partis help). Partis requires a directory with fitted sample-specific parameters, which if not already present will be automatically inferred based on the sequences in the fasta file. Because linearham runs on only a single family at a time, but these parameters are more accurate if inferred on the entire repertoire of many clonal families, it is better if you use parameters cached in a previous partis run on the entire repertoire, and then pass them to linearham with --parameter-dir. However, if you don't, the automatically-inferred parameters will still work fine, they'll just be somewhat less accurate (since they'll only be based on the one family).

Other partis-related arguments:

Name	Description
`--all-clonal-seqs`	If set, attempts to force all sequences in the fasta file into the same clonal family; otherwise it runs partis partition to infer the clonal families
`--locus`	Which immunoglobulin locus (defaults to `igh`)?
`--outdir`	The output directory (defaults to `output`).

`--run-linearham`

Once you have a partis output file, whether you made it separately or with linearham, you can run linearham itself:

scons --run-linearham --outdir=<dir> [--partis-yaml-file=<file>] [--parameter-dir=<dir>]

If there is one clonal family (i.e. cluster) in the partis output file, linearham will run on that. If there is more than one, you'll have to select a cluster to run on using the following options. Partis performs clustering hierarchically, so its output stores a list of partitions, where each partition divides the sequences in the repertoire into clonal families (clusters). By default, linearham looks in the best (most likely) of these partitions, but you can specify the (zero-based) index of a different one with --partition-index. Within a partition, you can specify a cluster either by (zero-based) index with --cluster-index, or with the unique id of a particular sequence in the cluster with --cluster-seed-unique-id (see partis --seed-unique-id for more info). Other options:

Command	Description
`--partis-yaml-file`	Path to the partis output file that is linearham's input. Defaults to the location in `--outdir` to which the linearham `--run-partis` action writes it
`--outdir`	The output directory (defaults to `output`).
`--lineage-unique-ids`	`,`-separated list of sequence ids, each of whose lineages will be analyzed separately, and in somewhat more detail than by default (see below).
`--parameter-dir`	Directory from which linearham reads partis hmm files. If not set, it defaults to the location in `--outdir` used by `--run-partis`. As for `--run-partis` (above), parameters will be much more accurate if you cache them with partis beforehand on the entire repertoire, but if this isn't possible they'll be inferred automatically on the one family on which you're running linearham, which should be fine.

If you don't have a clonal family/cluster of interest, or are not sure how to identify it using these options, you can run scripts/parse_cluster.py to work it out.

For example, running:

./scripts/parse_cluster.py lib/partis/test/reference-results/partition-new-simu.yaml --fasta-output-file parsed_cluster.fa --yaml-output-file parsed_cluster.yaml | less -RS

will print a table of available clusters in the best partition similar to this:

 available clusters in partition at index 29 (best):
index   size    unique_ids
0       71      [...]
1       11      [...]
2       262     [...]
3       4       [...]

Using the indices from this table, you can specify the corresponding clusters to Linearham. Running on the cluster with 262 sequences from the above table would look like:

scons --run-linearham --cluster-index=2 <args.. >

You can also figure out which sequences are in which clusters with the partis view-output action piped to less -RS.

Other linearham-related arguments:

Command	list?	Description
`--template-path`	no	The RevBayes template path (defaults to templates/revbayes_template.rev).
`--mcmc-iter`	yes	How many RevBayes MCMC iterations should we use (defaults to 10000)?
`--mcmc-thin`	yes	What RevBayes MCMC thinning frequency should we use (defaults to 10)?
`--tune-iter`	yes	How many RevBayes tuning iterations should we use (defaults to 5000)?
`--tune-thin`	yes	What RevBayes tuning thinning frequency should we use (defaults to 100)?
`--num-rates`	yes	The number of gamma rate categories (defaults to 4).
`--burnin-frac`	yes	What fraction of MCMC burnin should we use (defaults to 0.1)?
`--subsamp-frac`	yes	What bootstrap sampling fraction should we use (defaults to 0.05)?
`--rng-seed`	yes	The random number generator (RNG) seed (defaults to 0).
`--asr-pfilters`	no	The ancestral sequence posterior probability thresholds (defaults to 0.1).

If some aguments are specified as ,-separated lists (middle column), linearham will run revbayes separately, writing to separate output directories, for all combinations of all such parameters. For more information on these arguments, run scons --help.

`--build-partis-linearham`

This compiles linearham, partis, and other dependencies. You'll only need to run this if you've either modified some source code or you're installing without docker.

Output files

A variety of output files are written to --outdir. In the top-level dir are:

the partis output file that was used as input for linearham (e.g. partis_run.yaml), which contains partis-inferred clonal families (clusters) and annotations (including inferred naive sequence)
sub dirs for each cluster on which linearham was run, with name of form cluster-N/ for the cluster index N

Within each cluster's subdir cluster-N/ are:

a fasta file cluster-N/cluster_seqs.fasta with each of the cluster's input sequences, as well as its partis-inferred naive sequence
- NOTE the input sequences have SHM indels "reversed" (reverted to their state in the naive rearrangement), and non-variable regions (V/J framework) trimmed off
a partis yaml output file cluster.yaml resulting from pulling just this cluster out of the original partis output file that was used as input
the linearham output dir cluster-N/mcmciter<stuff> where <stuff> records the exact options of the run e.g. mcmciter10000_mcmcthin10_tuneiter5000_tunethin100_numrates4_rngseed0/

XXX why are there two levels of subdir, i.e. mcmcmiterXXX as well as burninfracYYY?

Within the linearham output dir `mcmciter<stuff>`

e.g. mcmciter10000_mcmcthin10_tuneiter5000_tunethin100_numrates4_rngseed0/:

file	format	description
lh_revbayes_run.trees	XXX	XXX
revbayes_run.log	XXX	XXX
revbayes_run.rev	XXX	XXX
revbayes_run.stdout.log	XXX	XXX
revbayes_run.trees	XXX	XXX

Within `burninfrac<stuff>`

e.g. burninfrac0.1_subsampfrac0.05/

file	format	description
linearham_run.log	tsv	posterior samples of the naive sequence annotation and the phylogenetic substitution model and rate variation parameters
linearham_run.trees	newick	posterior tree samples with ancestral sequence annotations (formatted for use by Dendropy
linearham_run.ess	tsv	approximate effective sample sizes for each field in linearham_run.log
aa_naive_seqs.fasta	fasta	each sampled naive amino acid sequence and its associated posterior probability
aa_naive_seqs.dnamap	fasta (ish)	map from each sampled naive amino acid sequence to its corresponding set of nucleotide naive sequences and posterior probabilities
aa_naive_seqs.png	png	logo plot of naive amino acid sequence posterior probability using WebLogo to visualize per-site uncertainties
linearham_annotations_all.yaml	yaml	XXX
linearham_annotations_best.yaml	yaml	XXX

If --lineage-unique-ids is specified, there will also be additional lineage-specific output files in subdirectories (one for each sequence id specified) like lineage_<uid>/. These include:

file	format	description
aa_lineage_seqs.fasta	fasta	for each intermediate ancestor in the lineage of the sequence with the specified id, the sampled amino acid sequence and its associated posterior probability
aa_lineage_seqs.dnamap	fasta(ish)	for each intermediate ancestor of the lineage of the sequence with the specified id, map from sampled amino acid sequence to its corresponding set of nucleotide sequences and posterior probabilities
aa_lineage_seqs.pfilterX.png	png	posterior probability lineage graphic made with Graphviz, where `X` is the posterior probability cutoff for the sampled sequences.

Naive sequence comparisons

One way to visualize the various output naive sequences and their probabilities is with lib/partis/bin/cf-linearham.py, which takes as input a linearham output dir and a partis output file (the latter preferably created with the --calculate-alternative-annotations option set). It then prints an ascii-art comparison of the amino acid and nucleotide naive sequences, as well as (for partis) a rundown of the alternative gene calls and their probabilities (the most likely of which was presumably input to linearham). More info here.

Name		Name	Last commit message	Last commit date
Latest commit History 263 Commits
.github/workflows		.github/workflows
data		data
doc_images		doc_images
lib		lib
scripts		scripts
src		src
templates		templates
test		test
.clang-format		.clang-format
.gitignore		.gitignore
.gitmodules		.gitmodules
Dockerfile		Dockerfile
Doxyfile		Doxyfile
README.md		README.md
SConstruct		SConstruct
clean.sh		clean.sh
requirements.txt		requirements.txt
test.sh		test.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

linearham

References

Installation

Using Docker

Running linearham

`--run-partis`

`--run-linearham`

`--build-partis-linearham`

Output files

Within the linearham output dir `mcmciter<stuff>`

Within `burninfrac<stuff>`

Naive sequence comparisons

About

Releases

Packages

Languages

naveenjasti/linearham

Folders and files

Latest commit

History

Repository files navigation

linearham

References

Installation

Using Docker

Running linearham

--run-partis

--run-linearham

--build-partis-linearham

Output files

Within the linearham output dir mcmciter<stuff>

Within burninfrac<stuff>

Naive sequence comparisons

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

`--run-partis`

`--run-linearham`

`--build-partis-linearham`

Within the linearham output dir `mcmciter<stuff>`

Within `burninfrac<stuff>`

Packages