Add straightforward --input #29

Merged: 11 commits merged into wdecoster:master on Apr 18, 2024

JMencius
Contributor

Hi @wdecoster,
I managed to modify the original code to implement the --input option mentioned in issue #10.
Briefly, I added an --input (or -i) option that accepts an input filename, reads the file as a stream, and passes that stream to the filter function.
I also tested it with:

cat {FILEPATH}/test.fastq | ./chopper -q 10 > testQ10_old.fastq;
Kept 207 reads out of 250 reads

./chopper -q 10 -i {FILEPATH}/test.fastq > testQ10_new.fastq;
Kept 207 reads out of 250 reads

Both runs give the same result. I hope you like my modifications.
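For readers following along in Rust, here is a minimal sketch of the idea described above: an optional input path selects between a file and stdin, so the existing pipe-based usage keeps working. This is not chopper's actual code; `open_input` and `count_records` are hypothetical names, and the record count assumes plain 4-line FASTQ records.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};

/// Hypothetical helper (not chopper's code): return a buffered reader over the
/// file given with `-i/--input`, or over stdin when no file is given.
fn open_input(input: Option<&str>) -> io::Result<Box<dyn BufRead>> {
    match input {
        // `-i reads.fastq`: read from the named file
        Some(path) => Ok(Box::new(BufReader::new(File::open(path)?))),
        // no `-i`: fall back to stdin, so `cat reads.fastq | chopper` keeps working
        None => Ok(Box::new(BufReader::new(io::stdin()))),
    }
}

/// Count FASTQ records by counting lines (assumes the simple 4-line layout).
fn count_records(reader: Box<dyn BufRead>) -> io::Result<usize> {
    let mut lines = 0;
    for line in reader.lines() {
        line?; // propagate read errors
        lines += 1;
    }
    Ok(lines / 4)
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("open_input_demo.fastq");
    std::fs::write(&path, "@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n")?;
    let n = count_records(open_input(path.to_str())?)?;
    println!("{} records", n);
    Ok(())
}
```

Returning `Box<dyn BufRead>` lets the downstream filtering code stay the same regardless of where the bytes come from.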

JMencius and others added 7 commits April 18, 2024 12:19
add -i, --input parameter and processing codes for input file
change help message for --input
change help info for --input
adding input to Cli in tests
@wdecoster
Owner

Thank you! I had to add input: None to the Cli structs in the tests, but otherwise this looks good. Am I correct to think that this won't work for (gz) compressed input files?
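The tests needed `input: None` because a Rust struct literal must name every field. A minimal sketch, using a hypothetical trimmed-down `Cli` (chopper's real struct is generated with clap and has different and more fields; `quality` here is only an illustrative placeholder):

```rust
/// Hypothetical, trimmed-down stand-in for chopper's clap-derived `Cli` struct.
/// Field names other than `input` are illustrative only.
#[derive(Debug)]
struct Cli {
    quality: f64,
    // Added by this PR in spirit: `None` means "read FASTQ from stdin".
    input: Option<String>,
}

fn main() {
    // Every field must be named in a struct literal, so tests written before
    // `input` existed fail to compile until `input: None` is added.
    let args = Cli { quality: 10.0, input: None };
    println!("{:?}", args);

    let from_file = Cli { quality: 10.0, input: Some("test.fastq".to_string()) };
    println!("reads from file: {}", from_file.input.is_some());
}
```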

@JMencius
Contributor Author

Yes, it won't work for compressed input files. There are many compression formats, such as .gz, .tar.gz, or .zip; for compressed files, users can simply decompress them and pipe the output into chopper, using whatever tool fits the format.

@JMencius
Contributor Author

@wdecoster If you want something like chopper -i test.fastq.gz -q 10 > test_q10.fastq to work, I can add some more code.

@wdecoster
Owner

That would be great, as I would hope most people keep their fastqs compressed, and gz is definitely the most frequent compression format for such files. But as you said, anyone could just use stdin for things like that :-)

@JMencius
Contributor Author

JMencius commented Apr 18, 2024

@wdecoster I have added support for .gz compressed files.
I also used the FASTQ file in the /test-data directory to run some tests; the compressed and uncompressed files give the same results.
You may want to update readme.md accordingly: when I was new to chopper, I was confused about having to use a pipe to run it, LOL. I hope these modifications make chopper more user-friendly.
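One way to add .gz support, sketched here as an assumption rather than a description of the merged code, is to look at the first two bytes of the file: gzip streams start with 0x1f 0x8b (RFC 1952). If they match, the file would then be wrapped in a decompressor such as flate2's `MultiGzDecoder`, which, unlike `GzDecoder`, also reads files made of several concatenated gzip members. That crate is left out below so the example stays dependency-free; `is_gzip` is a hypothetical helper.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Hypothetical helper: true if the file starts with the gzip magic bytes
/// 0x1f 0x8b. Checking content rather than the `.gz` extension also catches
/// misnamed files.
fn is_gzip(path: &Path) -> io::Result<bool> {
    let mut magic = [0u8; 2];
    let mut file = File::open(path)?;
    match file.read_exact(&mut magic) {
        Ok(()) => Ok(magic == [0x1f, 0x8b]),
        // A file shorter than two bytes cannot be gzip.
        Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => Ok(false),
        Err(e) => Err(e),
    }
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir();
    let plain = dir.join("is_gzip_demo.fastq");
    let gz = dir.join("is_gzip_demo.fastq.gz");
    std::fs::write(&plain, "@r1\nACGT\n+\nIIII\n")?;
    // Only the header bytes matter for detection; this is not a full gzip stream.
    std::fs::write(&gz, &[0x1f_u8, 0x8b, 0x08, 0x00])?;
    println!("{}: {}", plain.display(), is_gzip(&plain)?);
    println!("{}: {}", gz.display(), is_gzip(&gz)?);
    Ok(())
}
```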

use triple backticks
@wdecoster
Owner

Looks great, thanks so much!

@wdecoster wdecoster merged commit 271f9a9 into wdecoster:master Apr 18, 2024
1 check passed
@JMencius
Contributor Author

JMencius commented Apr 19, 2024

@wdecoster There is a typo in the EXAMPLE section I added to readme.md:
chopper -q 10 -l 500 -i reads.fastq.gz | gzip > filtered_reads.fast f q.gz
Please remember to fix it. Sorry about that.

@incoherentian

incoherentian commented Aug 19, 2024

Hi both! I only saw this today and was excited to remove a couple of pipes. I tried it without success on an 8-core allocation, while piping still worked:

[$USER@ccs0045(hawk) d161_FBA18768_IB_CHAHIGH_ALL_merge]$ module load miniforge3
[$USER@ccs0045(hawk) d161_FBA18768_IB_CHAHIGH_ALL_merge]$ source activate
[$USER@ccs0045(hawk) d161_FBA18768_IB_CHAHIGH_ALL_merge]$ conda activate chopper080
[$USER@ccs0045(hawk) d161_FBA18768_IB_CHAHIGH_ALL_merge]$ chopper --threads 8 -q 17 --headcrop 20 -i /scratch/$USER/bactopia/in_bactopia/d161_FBA18768_IB_CHAHIGH_ALL/FBA18768_pass_barcode11_Q10_all.fastq.gz > /scratch/$USER/bactopia/in_bactopia/d161_FBA18768_IB_CHAHIGH_ALL_merge/FBA18768_pass_barcode11_Q17_all.fastq.gz
Kept 0 reads out of 1 reads
[$USER@ccs0045(hawk) d161_FBA18768_IB_CHAHIGH_ALL_merge]$ ls
FBA18768_pass_barcode11_Q17_all.fastq.gz  FBA18768_pass_barcode11_Q19_all.fastq.gz
FBA18768_pass_barcode11_Q18_all.fastq.gz  FBA18768_pass_barcode11_Q20_all.fastq.gz
[$USER@ccs0045(hawk) d161_FBA18768_IB_CHAHIGH_ALL_merge]$ rm FBA18768_pass_barcode11_Q17_all.fastq.gz
[$USER@ccs0045(hawk) d161_FBA18768_IB_CHAHIGH_ALL_merge]$ gunzip -c /scratch/$USER/bactopia/in_bactopia/d161_FBA18768_IB_CHAHIGH_ALL/FBA18768_pass_barcode11_Q10_all.fastq.gz | chopper --threads 8 -q 17 --headcrop 20 | gzip > /scratch/$USER/bactopia/in_bactopia/d161_FBA18768_IB_CHAHIGH_ALL_merge/FBA18768_pass_barcode11_Q17_all.fastq.gz
Kept 164021 reads out of 562377 reads

Am I making a silly mistake here?

@wdecoster
Owner

I wonder if that is fixed by a later PR that hasn't made it into a release yet. I will post a new binary here later today (I hope?) to debug this further...

@wdecoster
Owner

Can you try with v0.9.0?
https://github.com/wdecoster/chopper/releases/tag/v0.9.0

@incoherentian

-i is working identically to the old pipes for me with 0.9.0. Thanks @wdecoster @JMencius and @sharkLoc too!

@JMencius
Contributor Author

Hi @wdecoster, would you mind changing this sentence in readme.md, since the performance is now similar between -i and a Linux pipe (|)?

  • Note that the tool may be substantially slower in the third example above, and piping while decompressing is recommended (as in the first example).

@wdecoster
Owner

Done!


3 participants