Sample_ID validation #125

Open
nathanweeks opened this issue Feb 17, 2023 · 0 comments

The Illumina Sequencing Sample Sheet Format Specifications document cited in the sample-sheet code:

# From the section "Character Encoding" in the Illumina format specification.
#
# https://www.illumina.com/content/dam/illumina-marketing/
# documents/products/technotes/
# sequencing-sheet-format-specifications-technical-note-970-2017-004.pdf

explicitly mentions additional restrictions on Sample_ID column values:

The field for the Sample_ID column has special character restrictions as only alphanumeric (ASCII codes 48-57, 65-90, and 97-122), dash (ASCII code 45), and underscore (ASCII code 95) are permitted. The Sample_ID length is limited to 100 characters maximum.

The sample_sheet validation code currently allows some invalid Sample_ID values (e.g., containing +) that some tools (like bcl2fastq) reject. Could the sample_sheet validation code be enhanced to detect Sample_IDs that don't conform to the Illumina spec?
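
A minimal sketch of what such a check could look like, based only on the restrictions quoted above. The helper name `is_valid_sample_id` is hypothetical and not part of the sample-sheet API, and rejecting an empty Sample_ID is an assumption, since the quoted text only gives a maximum length.

```python
import re

# Hypothetical helper, not part of the sample-sheet API.
# Permitted characters per the quoted spec: ASCII alphanumerics
# (codes 48-57, 65-90, 97-122), dash (45), and underscore (95),
# with a maximum length of 100 characters.
# Assumption: an empty Sample_ID is treated as invalid (minimum length 1).
_SAMPLE_ID_RE = re.compile(r"[A-Za-z0-9_-]{1,100}")


def is_valid_sample_id(sample_id: str) -> bool:
    """Return True if sample_id satisfies the quoted Sample_ID restrictions."""
    # fullmatch requires the whole string to match, so a trailing newline
    # or any other disallowed character makes the ID invalid.
    return _SAMPLE_ID_RE.fullmatch(sample_id) is not None


if __name__ == "__main__":
    for candidate in ("Sample_1-A", "Sample+1", "a" * 101):
        print(repr(candidate[:20]), is_valid_sample_id(candidate))
```

The explicit character ranges are used instead of `\w` because `\w` also matches non-ASCII letters and digits in Python 3 string patterns, which the spec does not permit.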
