Skip to content

Commit

Permalink
Merge pull request #3 from pdimens/anchor
Browse files Browse the repository at this point in the history
pull next release with updated LM3
  • Loading branch information
pdimens authored Jun 3, 2021
2 parents 3bc16d6 + 4424105 commit e25e23b
Show file tree
Hide file tree
Showing 60 changed files with 1,110 additions and 855 deletions.
21 changes: 16 additions & 5 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
/.snakemake
.vscode
/.snakemake
/1_ParentCall
/2_Filtering
/3_SeparateChromosomes
Expand All @@ -11,10 +11,21 @@
/7_Intervals
/8_RepeatMask
/9_Chain
/10_Anchoring
/11_MareyMaps
/10_PlaceAndOrientContigs
/11_AGP
/12_Fasta
/13_MareyMapsUntrimmed
/14_NewIntervals
/15_Trim
/16_MareyMapsTrimmed
/JoinSingles2All_iter
/RefineMap
/pedigree.txt
map.master
/*.vcf
LOD.master
snps.txt
/*.paf
/*.vcf
/*.fasta
/*.fa
/*.fa.gz
/*.fasta.gz
Binary file modified .misc/LepAnchor.rulegraph.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
38 changes: 28 additions & 10 deletions LepWrap
Original file line number Diff line number Diff line change
Expand Up @@ -4,24 +4,42 @@ if which snakemake &>/dev/null; then
foo=1
else
echo -e "ERROR:\nSnakemake installation is required to run LepWrap, but not found in the current environment."
echo -e "If [ana|mini]conda are installed, created a pre-configred environment with:\n"
printf "\033[01;32m"
printf "conda env create -f conda_setup.yml\n"
printf "\033[0m"
exit 1
fi

if [ ! -f config.yaml ]; then
echo -e "ERROR:\nThe file 'config.yaml' is required to run LepWrap, but not found in the current directory."
exit 1
fi

if [[ -z "$1" ]]; then
echo "Perform the modules of Lep-Map3. Make sure config.yaml is properly configured for how you intend to run things."
echo "Perform the modules of Lep-Map3 and optionally Lep-Anchor"
echo "Make sure config.yaml is properly configured for how you intend to run things."
echo ""
echo "[usage] LepWrap <number of threads to use>"
echo "[example] LepWrap 16"
exit 1
fi

echo "Running Lep-Map3"
sleep 2s
snakemake --cores $1 --snakefile ./rules/LepMap3.smk --directory .
lepmap(){
echo "Running Lep-Map3"
sleep 2s
snakemake --cores $1 --snakefile ./rules/LepMap3/LepMap3.smk --directory .
}

LA=$(grep "run_lepanchor" config.yaml | cut -d":" -f2 | xargs | tr '[:upper:]' '[:lower:]')
if [ $LA == "true" ]; then
echo -e "\nRunning Lep-Anchor"
sleep 2s
snakemake --cores $1 --snakefile ./rules/LepAnchor.smk --directory .
fi
lepanchor(){
LA=$(grep "run_lepanchor" config.yaml | cut -d":" -f2 | xargs | tr '[:upper:]' '[:lower:]')
if [ $LA == "true" ]; then
echo -e "\nRunning Lep-Anchor"
sleep 2s
snakemake --cores $1 --snakefile ./rules/LepAnchor/LepAnchor.smk --directory .
else
exit 1
fi
}

lepmap $1 && lepanchor $1
15 changes: 9 additions & 6 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
![logo](.misc/logo.png)


_It's Lep-Map3, but with snakes_ 🐍🐍

[![alt text](https://img.shields.io/badge/docs-wiki-75ae6c?style=for-the-badge&logo=Read%20The%20Docs)](https://github.com/pdimens/LepWrap/wiki)
Expand All @@ -11,7 +12,7 @@ LepWrap is a reusable pipeline to use the linkage map software [Lep-Map3](https:


### How to install
You will need a `conda` installation (Anaconda or Miniconda), along with cloning this repository locally.
You will need a `conda` installation ([Anaconda](https://docs.anaconda.com/anaconda/install/) or [Miniconda](https://docs.conda.io/en/latest/miniconda.html)), along with cloning this repository locally. I recommend Miniconda.

#### 1. Cloning LepWrap
Download a zip of this repository using the "Code" button on the top-right and unzip it on your machine or:
Expand All @@ -23,7 +24,9 @@ git clone https://github.com/pdimens/LepWrap.git
Assuming you have `anaconda` or `miniconda` installed:
```bash
cd LepWrap
conda create -f conda_setup.yml -n lepwrap

conda env create -f conda_setup.yml

```
This will create an environment called `lepwrap` that can be activated with:
```bash
Expand All @@ -38,13 +41,13 @@ You will need to modify `config.yaml` to suit your needs, then you can simply ru
where `<number_of_cores>` is an integer of the maximum number of cores/threads you want the pipeline to use.

### Something to keep in mind

Lep-Map3/Anchor are **very** comprehensive software, and LepWrap doesn't incorporate all the features and nuances. Your study is unique, so you are encouraged to clone or fork this repo and adapt LepWrap to your needs! All of the code in LepWrap is written as human-readable as possible and aggressively annotated, so give it a shot and adapt it to your workflow. If using LepWrap in a publication, cite **Pasi Rastas** for his work and please include a link to it in your publication. If you like using it, star this repository and give me (Pavel) a shout out on Twitter [@pvdimens](https://twitter.com/PVDimens) [![alt text](http://i.imgur.com/wWzX9uB.png)](https://twitter.com/PVDimens) 😊
LepWrap does things a certain way, employing the most common/reasonable way of using Lep-Map3 (and LepAnchor more or less). Version `3.2+` is **a lot** more flexible that its predecessors, but might still lack something you're looking for. Your study is unique, and I encourage youto clone/fork this repository and adapt LepWrap to your needs! All of the code in LepWrap is written in human-readable bash or aggressively annotated R, so give it a shot and adapt it to your workflow. PR's always welcome!



## Citation
If using LepWrap in a publication, cite **Pasi Rastas** for their work on Lep-Map3/Lep-Anchor and please include a link to this repository. If you like using it, give me (Pavel) a shout out on Twitter [@pvdimens](https://twitter.com/PVDimens) [![alt text](http://i.imgur.com/wWzX9uB.png)](https://twitter.com/PVDimens) =)

Pasi Rastas, Lep-MAP3: robust linkage mapping even for low-coverage whole genome sequencing data, Bioinformatics, Volume 33, Issue 23, 01 December 2017, Pages 3726–3732,https://doi.org/10.1093/bioinformatics/btx494
> Pasi Rastas, Lep-MAP3: robust linkage mapping even for low-coverage whole genome sequencing data, Bioinformatics, Volume 33, Issue 23, 01 December 2017, Pages 3726–3732,https://doi.org/10.1093/bioinformatics/btx494
Pasi Rastas, Lep-Anchor: automated construction of linkage map anchored haploid genomes, Bioinformatics, Volume 36, Issue 8, 15 April 2020, Pages 2359–2364, https://doi.org/10.1093/bioinformatics/btz978
> Pasi Rastas, Lep-Anchor: automated construction of linkage map anchored haploid genomes, Bioinformatics, Volume 36, Issue 8, 15 April 2020, Pages 2359–2364, https://doi.org/10.1093/bioinformatics/btz978
103 changes: 78 additions & 25 deletions config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -8,33 +8,45 @@
# ParentCall2 #
#---------------#
# the filtered VCF file with your genotype likelihoods:
vcf: "out.14.missrecode.vcf"
vcf: "out.15.miss95recode.vcf"

# Instructions to create pedigree file: https://sourceforge.net/p/lep-map3/wiki/software/LepMap3 Home/#parentcall2
# the pedigree file associated with your data
pedigree: "pedigree.txt"

# If there are any additional parameters you would like to use for ParentCall2 (e.g. halfSibs=1), add them here
extra_params_ParentCall: "removeNonInformative=1"


#---------------#
# Filtering2 #
#---------------#
# data tolerance value-- set this to 0 if you want to skip the Filtering2 module
data_tol: 0.1
data_tol: 0

# If there are any additional parameters you would like to use for Filtering2 (e.g. convert2Biallelic=1), add them here
extra_params_Filtering: ""


#------------------------#
# SeperateChromosomes2 #
#------------------------#
# LepMak3r will iteratively perform SeperateChromosomes2 for each
# LepWrap will iteratively perform SeperateChromosomes2 for each
# LOD score in the range of lod_min to lod_max

# The minimum LOD for SeperateChromosomes2
lod_min: 5
lod_min: 20

# The maximum LOD for SeperateChromosomes2
lod_max: 30
lod_max: 40

# Use only markers with informative father (1), mother(2), both parents(3) or neither parent(0)
informative: "informativeMask=123"

# If there are any additional parameters you would like to use for SeparateChromosomes2 (e.g. distrotionLOD=1), add them here
extra_params_SeparateChromosomes: "sizeLimit=5 distortionLOD=1"


#-------------------#
# JoinSingles2ALL #
#-------------------#
Expand All @@ -48,6 +60,10 @@ lod_limit: "lodLimit=2"
# start with lower values for lod_limit and increase as necessary
lod_difference: "lodDifference=2"

# If there are any additional parameters you would like to use for JoinSingles2All (e.g. iterate=1), add them here
extra_params_JoinSingles: "iterate=1 distortionLOD=1"


#-----------------#
# OrderMarkers2 #
#-----------------#
Expand All @@ -59,32 +75,39 @@ exp_lg: 24
# Set iterations to the number of iterations you want per chromosome (more is better). Recommend 30 or more
iterations: 100

# Set threads_per to the number of threads you would like to use per linkage group for ordering
threads_per: 2

# Choose your distance method by commenting the line you dont want
dist_method: "useKosambi=1"
#dist_method: "useMorgan=1"
# If there are any additional parameters you would like to use for OrderMarkers2 (e.g. hyperPhaser=1), add them here
extra_params_OrderMarkers: "hyperPhaser=1 useKosambi=1 phasingIterations=2"

# number of iterations to use of OrderMarkers2 phasing.
# this number is typically 1-2
phase_iterations: 2

#-----------------#
# Edge Trimming #
#-----------------#
# Edge trimming will examine the first and last X% of markers in a linkage group
# and remove clusters that are N centimorgans away from the next marker
# you can "skip" trimming by making edge_length really low (e.g. 1)
# and trim_cutoff really high (e.g. 50)
# and remove clusters that are N% centimorgans (of the total cM span) away from
# the next marker. You can "skip" trimming by setting trim_cutoff really high (e.g. 80-100).

# Set edge_length to the percent number of markers you would like to examine from either end of the linkage group
# Value can be an integer or decimal, i.e. 15 is the same as 0.15, which both mean "15%"
edge_length: 1
# Value can be an integer or decimal, i.e. 15 is the same as 0.15, which both mean "15%" (10-20 is reasonable)
edge_length: 20

# Set trim_cuttoff to the centiMorgan distance cutoff (5 is reasonable)
trim_cutoff: 100

# Set trim_cuttoff to the centiMorgan distance cutoff (10 is reasonable)
trim_cutoff: 40

#--------------------#
# Re-OrderMarkers2 #
#--------------------#
# The second round of OrderMarkers will use the same basic parameters as the first round (but not the extra params)
# If there are additional parameters you would like to use, add them here:
extra_params_reOrderMarkers: "improveOrder=1 useKosambi=1"


#-----------------------#
# Calculate Distances #
#-----------------------#
# If you used useKosambi=1 or useMorgan=1 for Ordering/reOrdering, add that same
# parameter to distance_method:
distance_method: ""


####################
Expand All @@ -104,7 +127,7 @@ lg_count: 24
PAF_file: "alnPB.paf"

# if you have a proximity file add it here, otherwise leave the text as "/dev/null".
# This isn't yet implemented in the official Lep-Anchor wrapper, so don't change this.
# This isn't yet implemented in Lep-Anchor, so don't change this.
proximity_file: "/dev/null"

# Which system is appropriate for running the HaploMerger2 section? Comment out the one that doesn't apply.
Expand All @@ -116,16 +139,46 @@ OS_info: "Ubuntu"
#OS_info: "CentOS6"


#------------#
# CleanMap #
#------------#
# If there are any additional parameters you would like to use for CleanMap (e.g. chimericDistance=500), add them here
extra_params_CleanMap: ""


#-----------#
# Map2Bed #
#-----------#
# If there are any additional parameters you would like to use for Map2Bed (e.g. markerSupport=4), add them here
extra_params_Map2Bed: ""


#-------------------------#
# PlaceAndOrientContigs #
#-------------------------#
# choose which of the input types you want to generate by uncommenting the other
# LepAnchor uses intervals by default, but either works.
#lepanchor_input: "noIntervals=0" # uses intervals input
lepanchor_input: "noIntervals=1" # uses map position input

# The size limit for detecting potential haplotype contigs (default: ~5000)
# Set this value really high (50000+) to ignore haplotype removal in between PlaceOrient iterations
haplotype_limit: 5000

# If there are any additional parameters you would like to use for PlaceAndOrientContigs (e.g. randomOrder=1), add them here

extra_params_PlaceOrient: "keepEmptyIntervals=1 numRuns=10"


#-----------------#
# Edge Trimming #
#-----------------#
# Edge trimming will examine the first and last X% of markers in a linkage group
# and remove clusters that are N% centimorgans (of the total cM span) away from
# the next marker. You can "skip" trimming by making edge_length really low (e.g. 1)
# and trim_cutoff really high (e.g. 50)
# the next marker. You can "skip" trimming by setting LA_trim_cutoff really high (e.g. 80-100)

# Set edge_length to the percent number of markers you would like to examine from either end of the linkage group
# Value can be an integer or decimal, i.e. 15 is the same as 0.15, which both mean "15%"
# Value can be an integer or decimal, i.e. 15 is the same as 0.15, which both mean "15%" (10-15 is reasonable)
LA_edge_length: 20

# Set trim_cuttoff to the centiMorgan distance cutoff (5 is reasonable)
Expand Down
9 changes: 0 additions & 9 deletions rules/LAtest.smk

This file was deleted.

Loading

0 comments on commit e25e23b

Please sign in to comment.