Hostile with no options classifies reads differently than with --invert #42

jannikseidelQBiC · 2024-09-09T06:13:57Z

Hi, and first of all, thanks for the great work.

I ran Hostile twice on the same Illumina paired-end input: once to get the filtered (clean) output files and once with `--invert` to get the removed read pairs. What caught my eye is that the two results do not match. reads_removed from the first run should equal reads_out from the second run, and vice versa.

| Mode | reads_removed | reads_out |
|---|---|---|
| No options | 19870638 | 42475288 |
| `--invert` | 42896358 | 19449568 |
| Difference (no options minus the opposite `--invert` column) | 421070 | -421070 |
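If the two modes are exact complements, each run's counts should sum to reads_in, and what one run removes the other should keep, so both cross-differences in the table should be zero. A minimal sketch of that arithmetic check, using the counts copied from the two log files in this issue; `check_complementary` is a hypothetical helper, not part of Hostile.

```python
def check_complementary(reads_in: int, default: dict, inverted: dict) -> dict:
    """Compare a default run with an --invert run on the same input.

    Assumes both runs should partition the input: every read removed by the
    default run should be kept by the --invert run, and vice versa.
    """
    # Each run on its own should account for every input read.
    for name, run in (("default", default), ("invert", inverted)):
        if run["reads_out"] + run["reads_removed"] != reads_in:
            raise ValueError(f"{name} run does not sum to reads_in")
    # Both differences are 0 when the two runs are exact complements.
    return {
        "removed_vs_invert_out": default["reads_removed"] - inverted["reads_out"],
        "out_vs_invert_removed": default["reads_out"] - inverted["reads_removed"],
    }


# Counts taken from the two logs posted in this issue.
default_run = {"reads_out": 42475288, "reads_removed": 19870638}
invert_run = {"reads_out": 19449568, "reads_removed": 42896358}
print(check_complementary(62345926, default_run, invert_run))
# → {'removed_vs_invert_out': 421070, 'out_vs_invert_removed': -421070}
```

Both runs individually sum to 62345926, so each run is internally consistent; the mismatch only appears when the two runs are compared with each other.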

The commands I used (Hostile 1.1.0, installed via conda):

```
hostile clean --fastq1 <file_forward>.fq.gz --fastq2 <file_reverse>.fq.gz --out-dir filtered_1 > log1_filtered.log
hostile clean --fastq1 <file_forward>.fq.gz --fastq2 <file_reverse>.fq.gz --out-dir removed_1 --invert > log1_removed.log
```

It seems that running with the --invert flag classifies reads differently than running without it. Am I missing an option I need to set to get consistent results?
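One way to narrow this down without sharing the data is to compare read IDs across the two runs: a read that appears in both outputs, or in neither, was classified inconsistently by the two modes. A minimal sketch, assuming gzipped four-line FASTQ records, that the read ID is the first whitespace-delimited token of the header (a trailing `/1` or `/2` mate suffix is stripped), and that Hostile writes the original read headers unchanged in both runs. `read_ids` and `compare_outputs` are hypothetical helpers, not part of Hostile.

```python
from __future__ import annotations

import gzip
from pathlib import Path


def read_ids(path: str | Path) -> set[str]:
    """Collect read IDs from a gzipped four-line FASTQ file."""
    ids = set()
    with gzip.open(path, "rt") as handle:
        for line_no, line in enumerate(handle):
            if line_no % 4 == 0:  # header line of each record
                read_id = line[1:].split()[0]  # drop '@' and any comment
                if read_id.endswith(("/1", "/2")):
                    read_id = read_id[:-2]  # assumed mate suffix
                ids.add(read_id)
    return ids


def compare_outputs(fastq_in, clean_out, inverted_out) -> dict[str, set[str]]:
    """Find reads kept by both runs, or by neither run."""
    all_ids = read_ids(fastq_in)
    clean = read_ids(clean_out)
    inverted = read_ids(inverted_out)
    return {
        "in_both_outputs": clean & inverted,
        "in_neither_output": all_ids - clean - inverted,
    }
```

Calling `compare_outputs` on `<file_forward>.fq.gz`, `filtered_1/<file_forward>.clean_1.fastq.gz` and `removed_1/<file_forward>.clean_1.fastq.gz`, then printing the size of each set, would show whether the discrepant reads are missing from both outputs or present in both.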

Thanks in advance!

PS: Here are the log files.

```json
[
    {
        "version": "1.1.0",
        "aligner": "bowtie2",
        "index": "human-t2t-hla",
        "options": [],
        "fastq1_in_name": "<file_forward>.fq.gz",
        "fastq1_in_path": "<path_to_files>/<file_forward>.fq.gz",
        "fastq1_out_name": "<file_forward>.clean_1.fastq.gz",
        "fastq1_out_path": "filtered_1/<file_forward>.clean_1.fastq.gz",
        "reads_in": 62345926,
        "reads_out": 42475288,
        "reads_removed": 19870638,
        "reads_removed_proportion": 0.31872,
        "fastq2_in_name": "<file_reverse>.fq.gz",
        "fastq2_in_path": "<path_to_files>/<file_reverse>.fq.gz",
        "fastq2_out_name": "<file_reverse>.clean_2.fastq.gz",
        "fastq2_out_path": "filtered_1/<file_reverse>.clean_2.fastq.gz"
    }
]
```

```json
[
    {
        "version": "1.1.0",
        "aligner": "bowtie2",
        "index": "human-t2t-hla",
        "options": [
            "invert"
        ],
        "fastq1_in_name": "<file_forward>.fq.gz",
        "fastq1_in_path": "<path_to_files>/<file_forward>.fq.gz",
        "fastq1_out_name": "<file_forward>.clean_1.fastq.gz",
        "fastq1_out_path": "removed_1/<file_forward>.clean_1.fastq.gz",
        "reads_in": 62345926,
        "reads_out": 19449568,
        "reads_removed": 42896358,
        "reads_removed_proportion": 0.68804,
        "fastq2_in_name": "<file_reverse>.fq.gz",
        "fastq2_in_path": "<path_to_files>/<file_reverse>.fq.gz",
        "fastq2_out_name": "<file_reverse>.clean_2.fastq.gz",
        "fastq2_out_path": "removed_1/<file_reverse>.clean_2.fastq.gz"
    }
]
```


bede · 2024-09-09T19:40:03Z

Hi Jannik, thank you, this is interesting. From your data there certainly appears to be a problem with how --invert is implemented. By any chance are you able to send me some (or all) of your test data?

Bede

jannikseidelQBiC · 2024-09-11T06:39:14Z

Hi Bede,
I cannot share the dataset. Could you try to reproduce the behavior with another dataset? If it only occurs with this dataset, that would also be highly interesting.

Best,
Jannik

bede · 2024-09-11T06:44:52Z

Thank you – that's understandable. I will investigate using other data.


bede added the `bug` label ("Something isn't working") on Sep 9, 2024

bede added this to the 1.2.0 milestone on Sep 12, 2024
