Informative error when samplesheet contains incorrect primer sequences #19

jackscanlan · 2024-08-13T00:39:00Z

Currently, if incorrect primer sequences are specified in the samplesheet, SPLIT_LOCI will yield empty files and the pipeline will end early without error. Because READ_TRACKING won't have run, there will be no obvious sign to an inexperienced user that the primer sequences may be wrong. This is partly because READ_FILTER has optional outputs, which I would rather keep at this stage.

One way to handle this could be to check for the presence of primer sequences at the start and/or end of the reads; if they are found in fewer than a specified proportion of reads, the pipeline could fail with an informative error. Alternatively, if either SPLIT_LOCI or PRIMER_TRIM outputs too many empty read files, the pipeline could fail (perhaps a pipeline parameter could toggle this behaviour on/off).
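A minimal sketch of the first option in Python, assuming paired-end amplicon reads where R1 begins with the forward primer and R2 with the reverse primer. The names `match_fraction`, `check_primers` and `PrimerCheckError`, the exact-match rule (no mismatches allowed) and the default 50% threshold are all hypothetical illustrations, not part of the pipeline. Degenerate primers are handled by translating IUPAC codes into regex character classes.

```python
import re
from typing import Sequence, Tuple

# IUPAC nucleotide codes mapped to regex character classes,
# so degenerate primers (e.g. containing R or N) can be matched.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


class PrimerCheckError(ValueError):
    """Hypothetical error raised when a primer is found in too few reads."""


def match_fraction(reads: Sequence[str], primer: str) -> float:
    """Fraction of reads that begin with `primer` (exact IUPAC match, no mismatches)."""
    if not reads:
        return 0.0
    pattern = re.compile("".join(IUPAC[base] for base in primer.upper()))
    # re.match only matches at the start of the string, i.e. the 5' end of the read
    hits = sum(1 for read in reads if pattern.match(read.upper()))
    return hits / len(reads)


def check_primers(
    r1: Sequence[str],
    r2: Sequence[str],
    fwd: str,
    rev: str,
    min_fraction: float = 0.5,
    sample: str = "sample",
) -> Tuple[float, float]:
    """Return (fwd_fraction, rev_fraction), or raise if either is below min_fraction."""
    fwd_frac = match_fraction(r1, fwd)
    rev_frac = match_fraction(r2, rev)
    problems = []
    if fwd_frac < min_fraction:
        problems.append(f"forward primer {fwd} starts {fwd_frac:.1%} of R1 reads")
    if rev_frac < min_fraction:
        problems.append(f"reverse primer {rev} starts {rev_frac:.1%} of R2 reads")
    if problems:
        raise PrimerCheckError(
            f"{sample}: " + "; ".join(problems)
            + f" (threshold {min_fraction:.0%}). "
            "Check the primer sequences in the samplesheet."
        )
    return fwd_frac, rev_frac
```

For example, with `r1 = ["ACGTTTTT", "ACGTAAAA", "GGGGGGGG"]`, the degenerate forward primer `ACRT` matches two of three reads and passes at the default threshold, whereas a wrong forward primer raises `PrimerCheckError` naming the sample and the observed fraction. In a real run this would probably be applied to a subsample of reads per sample, with `min_fraction` exposed as a (hypothetical) pipeline parameter.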

In addition, the pipeline could check for common sequences at the starts and ends of reads and display them to the user in the error message, which might help them realise what the true primers are and adjust the inputs accordingly.
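A minimal sketch of how such a hint could be built, assuming a fixed k-mer length and a simple frequency count. `common_kmers`, `format_hint` and the defaults are hypothetical. Read ends are reverse-complemented so they are shown 5'->3', the way a primer would be written in the samplesheet; this is only meaningful when reads run through the opposite primer (e.g. short amplicons).

```python
from collections import Counter
from typing import List, Sequence, Tuple

# Complement table for reverse-complementing read ends
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def common_kmers(
    reads: Sequence[str], length: int = 20, top: int = 3, from_end: bool = False
) -> List[Tuple[str, float]]:
    """Most frequent `length`-bp read starts (or reverse-complemented read ends).

    Fractions are relative to reads long enough to contribute a full k-mer.
    """
    kmers: Counter = Counter()
    for read in reads:
        if len(read) < length:
            continue  # too short to contribute a full k-mer
        kmer = read[-length:] if from_end else read[:length]
        kmers[revcomp(kmer) if from_end else kmer.upper()] += 1
    total = sum(kmers.values())
    if total == 0:
        return []
    return [(kmer, n / total) for kmer, n in kmers.most_common(top)]


def format_hint(reads: Sequence[str], length: int = 20, top: int = 3) -> str:
    """Text block that could be appended to an error message."""
    lines = [f"Most common {length} bp read starts:"]
    lines += [f"  {k}  {f:.1%}" for k, f in common_kmers(reads, length, top)]
    lines.append(f"Most common {length} bp read ends (reverse complemented):")
    lines += [
        f"  {k}  {f:.1%}"
        for k, f in common_kmers(reads, length, top, from_end=True)
    ]
    return "\n".join(lines)
```

If one start k-mer dominates (say, present in 90% of reads) but does not match the samplesheet primer, that is a strong hint of what the real primer is. Degenerate positions would show up as several similar top k-mers rather than one.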

We would also have to think about scenarios where one primer pair is correct but another isn't.
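One way to handle mixed cases could be to score each primer pair separately rather than the sample as a whole. A minimal sketch, assuming each R1 read in a multiplexed sample is assigned to the first locus whose forward primer matches its 5' end. The function `per_pair_report`, the locus names and the 5% cut-off are hypothetical, and only forward primers are checked for brevity. A low fraction can also be genuine (a locus that amplified poorly), so this might be better suited to a warning than a hard failure.

```python
import re
from typing import Dict, List, Sequence, Tuple

# IUPAC nucleotide codes mapped to regex character classes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def per_pair_report(
    r1_reads: Sequence[str], fwd_primers: Dict[str, str], min_fraction: float = 0.05
) -> Tuple[Dict[str, float], float, List[str]]:
    """Per-locus fraction of R1 reads starting with that locus' forward primer.

    Returns (fractions by locus, fraction of reads matching no primer,
    loci whose fraction is below min_fraction).
    """
    patterns = {
        locus: re.compile("".join(IUPAC[b] for b in primer.upper()))
        for locus, primer in fwd_primers.items()
    }
    counts = {locus: 0 for locus in fwd_primers}
    unmatched = 0
    for read in r1_reads:
        seq = read.upper()
        # Assign the read to the first locus whose primer matches at the 5' end
        locus = next((name for name, pat in patterns.items() if pat.match(seq)), None)
        if locus is None:
            unmatched += 1
        else:
            counts[locus] += 1
    total = max(len(r1_reads), 1)  # avoid division by zero for empty input
    fractions = {locus: n / total for locus, n in counts.items()}
    suspect = [locus for locus, frac in fractions.items() if frac < min_fraction]
    return fractions, unmatched / total, suspect
```

A sample where one locus has almost no matches while a large share of reads match no primer at all would point towards that locus' primer being wrong, rather than the whole samplesheet.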

jackscanlan added the invalid ("This doesn't seem right") and rework ("Redoing or refining something") labels and removed the invalid label on Aug 13, 2024.