Optimized compare_ref function from loop logic to dataframe logic #226

Open
TrangNg-Th wants to merge 1 commit into base: master

Description

This pull request optimizes the compare_ref function so that the BAM merging step, which runs after the mutated BAM file is created, finishes faster. In this version, I:

  • Rewrote the function to use pandas DataFrame operations instead of an explicit loop, so that references are handled and compared faster.
  • Added a bamsurgeon.def Singularity definition file, allowing users to build a Singularity image for executing the code.
  • The command to build the Singularity image is: singularity build bamsurgeon.sif bamsurgeon.def
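
For readers unfamiliar with the loop-versus-dataframe distinction, here is a minimal sketch of the idea. It is not the code in this pull request. It assumes that comparing references means checking that every contig name in the target BAM header appears at the same position in the donor BAM header, and it uses plain lists of names in place of BAM headers. The function names refs_match_loop and refs_match_df are hypothetical.

```python
import pandas as pd


def refs_match_loop(target_refs, donor_refs):
    """Loop version (illustrative): one linear scan of donor_refs per contig."""
    for tid, ref in enumerate(target_refs):
        # `in` and `.index()` on a list or tuple are linear, so the whole
        # check is quadratic in the number of contigs.
        if ref not in donor_refs or donor_refs.index(ref) != tid:
            return False
    return True


def refs_match_df(target_refs, donor_refs):
    """DataFrame version (illustrative): one hash-based join, then one comparison."""
    target = pd.DataFrame(
        {"ref": list(target_refs), "tid_target": range(len(target_refs))}
    )
    donor = pd.DataFrame(
        {"ref": list(donor_refs), "tid_donor": range(len(donor_refs))}
    )
    # Left join keeps every target contig; contigs missing from the donor
    # get NaN in tid_donor, which never compares equal to tid_target.
    merged = target.merge(donor, on="ref", how="left")
    return bool((merged["tid_target"] == merged["tid_donor"]).all())
```

With a few hundred thousand contigs, the quadratic scan in the loop version is the kind of cost a single join avoids. This sketch assumes contig names are unique within each header.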

Additional Dependencies

  • pandas
  • (optional) Singularity

Benefits

  • Reduced runtime by processing references more efficiently (see Benchmark below).
  • Improved code readability and maintainability by using dataframe operations.

Benchmark

Test Setup:

  • Donor BAM file size: 3 GB
  • Reference FASTA file: Mmul 8.0.1, consisting of 284,727 contigs (including 22 chromosomes)
  • Single SNP

Performance Improvement:

  • With the optimized compare_ref function, the execution time of the addsnv.py script dropped from approximately 4 hours to 2-3 minutes.

TrangNg-Th changed the title from "Optimized compare_ref function from loop logic to dataframe logic. Also" to "Optimized compare_ref function from loop logic to dataframe logic." on May 23, 2024
TrangNg-Th changed the title from "Optimized compare_ref function from loop logic to dataframe logic." to "Optimized compare_ref function from loop logic to dataframe logic" on May 23, 2024