Optimized compare_ref function from loop logic to dataframe logic #226

Open
wants to merge 1 commit into
base: master
44 changes: 44 additions & 0 deletions bamsurgeon.def
@@ -0,0 +1,44 @@
Bootstrap: docker
From: ubuntu:20.04

# This definition file builds a Singularity image, much as a Dockerfile builds a Docker image

%environment
export BAMSURGEON_PICARD_JAR=/picard.jar
export PATH=/app/bin:$PATH # Put the container's /app/bin first on PATH

%post
# Avoid interactive prompts during the build
export DEBIAN_FRONTEND=noninteractive

# Preconfigure tzdata
echo "tzdata tzdata/Areas select US" | debconf-set-selections
echo "tzdata tzdata/Zones/US select Eastern" | debconf-set-selections

# Install dependencies
apt-get update && apt-get upgrade -y
apt-get install -y python3 python3-pip git wget build-essential libz-dev \
libglib2.0-dev libbz2-dev liblzma-dev default-jre autoconf samtools bwa

mkdir -p /app/bin # Create a custom directory for binaries
wget https://github.com/dzerbino/velvet/archive/refs/tags/v1.2.10.tar.gz
tar -xvzf v1.2.10.tar.gz
cd velvet-1.2.10
make
cp velvetg /app/bin
cp velveth /app/bin

git clone https://github.com/adamewing/exonerate.git
cd exonerate
autoreconf -fi
./configure
make
make install

wget https://github.com/broadinstitute/picard/releases/download/2.27.3/picard.jar -O /picard.jar

# Install Python dependencies: pysam, pandas
pip install pysam pandas

# Bamsurgeon
git clone https://github.com/adamewing/bamsurgeon.git
31 changes: 25 additions & 6 deletions bin/bamsurgeon/replace_reads.py
@@ -7,7 +7,7 @@
from collections import defaultdict

from bamsurgeon.common import rc

import pandas as pd
import logging
FORMAT = '%(levelname)s %(asctime)s %(message)s'
logging.basicConfig(format=FORMAT)
@@ -68,16 +68,35 @@ def get_excluded_reads(file):
ex.add(line.strip())
return ex

# Changed from loop logic to dataframe logic
# def compare_ref(targetbam, donorbam):
#     ''' if targetbam and donorbam are aligned to different references
#         and the references are in a different order it's a problem
#     '''
#     for ref in targetbam.references:
#         if ref not in donorbam.references or donorbam.gettid(ref) != targetbam.gettid(ref):
#             logger.error("contig mismatch: %s\n" % ref)
#             return False
#     return True

## Modified on 05/22/2024: changed loop logic to dataframe logic
def compare_ref(targetbam, donorbam):
    ''' if targetbam and donorbam are aligned to different references
        and the references are in a different order it's a problem
    '''
    target_df = pd.DataFrame({"targetbam": list(targetbam.references)})
    # Align the donor contigs to the target's positions; positions the donor lacks become NaN
    donor_df = pd.DataFrame({"donorbam": list(donorbam.references)}).reindex(target_df.index)

    # A target contig mismatches when the donor has a different name (or nothing) at the same position
    mismatched_refs = target_df.loc[target_df["targetbam"] != donor_df["donorbam"], "targetbam"]

    for ref in mismatched_refs:
        logger.error("contig mismatch: %s\n" % ref)

    return mismatched_refs.empty



def replace_reads(origbamfile, mutbamfile, outbamfile, fasta_ref, nameprefix=None, excludefile=None, allreads=False, keepqual=False, progress=False, keepsecondary=False, keepsupplementary=False, seed=None, quiet=False):
''' outputbam must be writeable and use targetbam as template
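The rule that `compare_ref` enforces (every target contig must appear in the donor at the same position) can be written as a small dependency-free reference, which may be useful as an oracle when testing a vectorized version. This is only a sketch: the helper name `mismatched_contigs` is hypothetical, and plain tuples stand in for the `.references` attribute of the BAM handles, on the assumption that contig names are unique so a contig's `gettid` value equals its position.

```python
from itertools import zip_longest


def mismatched_contigs(target_refs, donor_refs):
    """Return target contigs that are missing from the donor or sit at a different position.

    target_refs and donor_refs are ordered sequences of contig names,
    standing in for targetbam.references and donorbam.references.
    """
    missing = object()  # sentinel that never equals a contig name
    pairs = zip_longest(target_refs, donor_refs, fillvalue=missing)
    # Skip positions that exist only in the donor: only target contigs are checked.
    return [t for t, d in pairs if t is not missing and t != d]


# Same contigs, different order: the two swapped entries are reported.
print(mismatched_contigs(("chr1", "chr2", "chr3"), ("chr1", "chr3", "chr2")))
# → ['chr2', 'chr3']
```

An empty result corresponds to `compare_ref` returning `True`. Extra contigs at the end of the donor are ignored, as in the loop version, because only the target's contigs are iterated.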