Bclconvert logfile update #560

Open. Wants to merge 10 commits into base: develop
6 changes: 3 additions & 3 deletions README.md
@@ -4,7 +4,7 @@ This repository contains the main scripts for routine analysis of clinical next

| Script | Run mode | Details |
| ------ | -------- | ------- |
|[demultiplex.py](demultiplex.py) | Command line | Demultiplex (excluding TSO runs) and calculate cluster density for Illumina NGS data using `bcl2fastq2` [(guide)](demultiplex/README.md) |
|[demultiplex.py](demultiplex.py) | Command line | Demultiplex (excluding TSO runs) and calculate cluster density for Illumina NGS data using `bclconvert2` [(guide)](demultiplex/README.md) |
| [setoff_workflows.py](setoff_workflows.py) | Command line | Upload NGS data to DNAnexus and trigger in-house workflows [(guide)](setoff_workflows/README.md) |
| [upload_runfolder](upload_runfolder) | Command line or module import | Uploads an Illumina runfolder to DNAnexus [(guide)](upload_runfolder/README.md)|
| [wscleaner](wscleaner) | Command line | Automates the deletion of runfolders that have been uploaded to the DNAnexus cloud storage service [(guide)](wscleaner/README.md)|
@@ -52,7 +52,7 @@ The below diagram is a UML class diagram showing the relationships between the c
| [config](config) | lime green | Stores the configuration classes for use by other modules |
| [ad_email](ad_email) | blue | Email sending module [(guide)](ad_email/README.md) |
| [ad_logger](ad_logger) | sea green | This module contains classes that create logging objects that write messages to the syslog, stream and log files. Used by other modules [(guide)](ad_logger/README.md) |
| [demultiplex](demultiplex) | orange | Demultiplex (excluding TSO runs) and calculate cluster density for Illumina NGS data using `bcl2fastq2` [(guide)](demultiplex/README.md) |
| [demultiplex](demultiplex) | orange | Demultiplex (excluding TSO runs) and calculate cluster density for Illumina NGS data using `bclconvert2` [(guide)](demultiplex/README.md) |
| [setoff_workflows](setoff_workflows) | pink | Upload NGS data to DNAnexus and trigger in-house workflows [(guide)](setoff_workflows/README.md) |
| [toolbox](toolbox) | grey | Contains classes and functions shared [(guide)](toolbox/README.md) |
| [upload_runfolder](upload_runfolder) | sand | Uploads an Illumina runfolder to DNAnexus [(guide)](upload_runfolder/README.md) |
@@ -98,7 +98,7 @@ The above image describes the possible associations in the Class Diagram. In the
| Demultiplex output | Catches any traceback from errors when running the cron job that are not caught by exception handling within the script | `TIMESTAMP.txt` | `/usr/local/src/mokaguys/automate_demultiplexing_logfiles/Demultiplexing_stdout` |
| demultiplex (script_logger) | Records script-level logs for the demultiplex script | `TIMESTAMP_demultiplex_script.log` | `/usr/local/src/mokaguys/automate_demultiplexing_logfiles/demultiplexing_script_logfiles/` |
| demultiplex (demux_rf_logger) | Records runfolder-level logs for the demultiplex script | `RUNFOLDERNAME_demultiplex_runfolder.log` | `/usr/local/src/mokaguys/automate_demultiplexing_logfiles/demultiplexing_script_logfiles/` |
| Bcl2fastq output | STDOUT and STDERR from bcl2fastq2 | `bcl2fastq2_output.log` | Within the runfolder |
| Bclconvert output | STDOUT and STDERR from bclconvert2 | `bclconvert2_output.log` | Within the runfolder |
| ss_validator | Records runfolder-level logs for the samplesheet_validator script | `RUNFOLDERNAME_samplesheet_validator_script.log` | `/usr/local/src/mokaguys/automate_demultiplexing_logfiles/samplesheet_validator_script_logfiles/` |
| backup | Records the logs from the upload runfolder script | `RUNFOLDERNAME_upload_runfolder.log` | `/usr/local/src/mokaguys/automate_demultiplexing_logfiles/upload_runfolder_script_logfiles/` |
| wscleaner | Records the logs from the wscleaner script | `TIMESTAMP_wscleaner.log` | `/usr/local/src/mokaguys/automate_demultiplexing_logfiles/wscleaner/` |
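
As a rough illustration of how the per-runfolder log listed in the table above might be inspected, the sketch below classifies a runfolder by whether `bclconvert2_output.log` exists and has content. Only the log filename comes from this diff; the function name `demux_log_state` and the returned labels are hypothetical and are not taken from the repository.

```python
import os
import tempfile

# Filename taken from the logfile table; everything else here is illustrative
BCLCONVERT_LOG_NAME = "bclconvert2_output.log"


def demux_log_state(runfolder_path: str) -> str:
    """Hypothetical helper: describe the state of the bclconvert log in a runfolder."""
    log_path = os.path.join(runfolder_path, BCLCONVERT_LOG_NAME)
    if not os.path.isfile(log_path):
        return "not_yet_demultiplexed"  # No log, so demultiplexing has not started
    if os.path.getsize(log_path) == 0:
        return "log_empty"  # Log was created but nothing has been written to it
    return "log_present"


if __name__ == "__main__":
    # Demonstrate against a throwaway directory standing in for a runfolder
    with tempfile.TemporaryDirectory() as runfolder:
        print(demux_log_state(runfolder))
```

A check like this only says whether the log exists and is non-empty; deciding whether demultiplexing actually succeeded would need the log contents to be examined as well.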

2 changes: 1 addition & 1 deletion ad_logger/README.md
@@ -44,7 +44,7 @@ logfiles_config = {
"sw": sw_runfolder_logfile,
"demux": demultiplex_runfolder_logfile,
"backup": upload_runfolder_logfile,
"bcl2fastq2": bcl2fastqlog_file,
"bclconvert2": bclconvertlog_file,
"ss_validator": samplesheet_validator_logfile,
}


2 changes: 1 addition & 1 deletion config/README.md
@@ -34,7 +34,7 @@ Panel number lists are created from the PANEL_DICT, assimilating pan numbers fro

- SNP does not have R numbers (test_number) as it is an identity check for the GMS SMS
- Panels for WES (analysed in Congenica) and TSO500 (analysed in QCII), and ArcherDX (analysed in Archer software), are applied at the point of analysis, so R and M numbers (test_number) for these are not listed below. These pan numbers do not necessarily refer to bed files but rather project configuration (e.g. DNAnexus instances, project layout etc.)
- Development runs have two options for pan numbers, one for runs that require standard processing with bcl2fastq and one for runs that require manual processing as they have UMIs
- Development runs have two options for pan numbers, one for runs that require standard processing with bclconvert and one for runs that require manual processing as they have UMIs

| Dictionary key | Details |
|----------------|----------|

39 changes: 24 additions & 15 deletions config/ad_config.py
@@ -61,7 +61,7 @@
# JOB_NAME_STR must be @-separated to be picked up by the gmail filter which
# determines which slack channel to send the alert to
JOB_NAME_STR = "--name TEST_MODE@"
RUNFOLDERS = "/media/data3/share/testing"
RUNFOLDERS = "/media/data3/share/testing"  # Alternative: /media/runfolder_share/test_runs_bclconvert
AD_LOGDIR = os.path.join(RUNFOLDERS, "automate_demultiplexing_logfiles")
MAIL_SETTINGS = MAIL_SETTINGS | { # Add test mail recipients
"pipeline_started_subj": f"{SCRIPT_MODE}. ALERT: Started pipeline for %s",
@@ -86,7 +86,7 @@

# DNAnexus upload agent path
UPLOAD_AGENT_EXE = f"{DOCUMENT_ROOT}/apps/dnanexus-upload-agent-1.5.17-linux/ua"
BCL2FASTQ_DOCKER = "seglh/bcl2fastq2:v2.20.0.422_60dbb5a"
BCLCONVERT_DOCKER = "seglh/bcl-convert:4.3.6"
GATK_DOCKER = (
"broadinstitute/gatk:4.1.8.1" # TODO this image should have a hash added in future
)
@@ -375,22 +375,31 @@ class DemultiplexConfig(PanelConfig):
"upload_flag_umis": "Runfolder contains UMIs. Runfolder will not be uploaded and requires manual upload: %s",
}
TESTING = TESTING
BCL2FASTQ2_CMD = (
f"docker run --rm --user %s:%s -v %s:/mnt/run -v %s:/mnt/run/%s {BCL2FASTQ_DOCKER} -R /mnt/run "
"--sample-sheet /mnt/run/%s --no-lane-splitting"
# Placeholders, in order: user ID, group ID, BCL input directory, output directory,
# bcl-convert log directory, samplesheet directory, samplesheet filename
BCLCONVERT2_CMD = (
"docker run --rm --user %s:%s -v %s:/data/input -v %s:/data/output "
"-v %s:/var/log/bcl-convert "
f"-v %s:/samplesheet_input {BCLCONVERT_DOCKER} "
"--force --bcl-input-directory /data/input "
"--output-directory /data/output "
"--sample-sheet /samplesheet_input/%s "
"--no-lane-splitting true"
)
CD_CMD = (
f"docker run --rm --user %s:%s -v %s:/input_run {GATK_DOCKER} ./gatk CollectIlluminaLaneMetrics "
"--RUN_DIRECTORY /input_run --OUTPUT_DIRECTORY /input_run --OUTPUT_PREFIX %s"
)
DEMULTIPLEX_TEST_RUNFOLDERS = [
"999999_NB552085_0496_DEMUXINTEG",
"999999_M02353_0496_000000000-DEMUX",
"999999_A01229_0182_AHM2TSO500", # Used for testing demultiplex and sw scripts
"999999_M02631_0285_000000000-DEVOO",
"999999_NB551068_0285_OODEVINTEG",
"999999_M02631_0285_000000000-DVUMI",
"999999_NB552085_0320_ONCODEEP00", # Included as behaviour is slightly different to include copying the MasterFile
"999990_A01229_0420_AHLFLHDRX5",
#"240823_A01229_0364_BHHVYKDRX5",
#"240829_NB552085_0334_AHGMJ5AFX7",
#"240902_A01229_0367_AHHNMVDRX5"
#"999999_NB552085_0496_DEMUXINTEG",
#"999999_M02353_0496_000000000-DEMUX",
#"999999_A01229_0182_AHM2TSO500", # Used for testing demultiplex and sw scripts
#"999999_M02631_0285_000000000-DEVOO",
#"999999_NB551068_0285_OODEVINTEG",
#"999999_M02631_0285_000000000-DVUMI",
#"999999_NB552085_0320_ONCODEEP00", # Included as behaviour is slightly different to include copying the MasterFile
]
SEQUENCER_IDS = {
# Requires_ic denotes sequencers requiring md5 checksums from integrity check to be assessed
@@ -567,7 +576,7 @@ class ToolboxConfig(PanelConfig):
}
FLAG_FILES = {
"upload_started": "DNANexus_upload_started.txt", # Holds upload agent output
"bcl2fastqlog": "bcl2fastq2_output.log", # Holds bcl2fastq2 logs
"bclconvertlog": "bclconvert2_output.log", # Holds bclconvert2 logs
"md5checksum": "md5checksum.txt", # File holding checksum results
"sscheck_flag": "sscheck_flagfile.txt", # Denotes SampleSheet has been checked
"seq_complete": "RTAComplete.txt", # Sequencing complete file
@@ -585,9 +594,9 @@ class ToolboxConfig(PanelConfig):
"executable": "docker",
"test_cmd": f"docker run --rm {GATK_DOCKER} ./gatk CollectIlluminaLaneMetrics --version",
},
"bcl2fastq2": {
"bclconvert2": {
"executable": "docker",
"test_cmd": f"docker run --rm {BCL2FASTQ_DOCKER} --version",
"test_cmd": f"docker run --rm {BCLCONVERT_DOCKER} --version",
},
}
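
To make the seven `%s` placeholders in `BCLCONVERT2_CMD` easier to follow, here is a minimal sketch of how the template could be filled. The template is adapted from the hunk above (the `f` prefix is kept only on the line that interpolates the image name). The helper `build_bclconvert_cmd`, its parameter names, and the example paths are assumptions for illustration; the diff does not show which host paths the demultiplex script actually passes.

```python
import shlex

BCLCONVERT_DOCKER = "seglh/bcl-convert:4.3.6"  # Image tag as set in this diff

# Template adapted from the diff; %s placeholders are filled later with the % operator
BCLCONVERT2_CMD = (
    "docker run --rm --user %s:%s -v %s:/data/input -v %s:/data/output "
    "-v %s:/var/log/bcl-convert "
    f"-v %s:/samplesheet_input {BCLCONVERT_DOCKER} "
    "--force --bcl-input-directory /data/input "
    "--output-directory /data/output "
    "--sample-sheet /samplesheet_input/%s "
    "--no-lane-splitting true"
)


def build_bclconvert_cmd(uid, gid, input_dir, output_dir, log_dir, ss_dir, ss_name):
    """Hypothetical helper: fill the seven placeholders in the order they appear."""
    return BCLCONVERT2_CMD % (uid, gid, input_dir, output_dir, log_dir, ss_dir, ss_name)


if __name__ == "__main__":
    # Example paths are made up for demonstration only
    cmd = build_bclconvert_cmd(
        1000, 1000, "/runs/RUN", "/runs/RUN/out", "/runs/RUN/logs",
        "/samplesheets", "RUN_SampleSheet.csv",
    )
    print(shlex.split(cmd))  # Argument list suitable for subprocess.run
```

Splitting the filled string with `shlex.split` gives an argument list that can be run without a shell, which avoids quoting problems if a path ever contains spaces.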


20 changes: 10 additions & 10 deletions config/log_msgs_config.py
@@ -75,10 +75,10 @@
"runfolder_names": "Runfolders identified for processing: %s",
"script_success": "Runfolder has been successfully processed by the demultiplex script: %s",
"demultiplexing_required": "Demultiplexing is required for this runfolder",
"bcl2fastq_start": "Demultiplexing started using bcl2fastq2 command: %s",
"bcl2fastq_complete": "Demultiplexing completed successfully for %s",
"bcl2fastq_failed": "Demultiplexing failed - bcl2fastq2 subprocess failed. Script exited. Stdout: %s. Stderr: %s",
"demux_already_complete": "Demultiplexing already completed. bcl2fastq2 log found @ %s",
"bclconvert_start": "Demultiplexing started using bclconvert2 command: %s",
"bclconvert_complete": "Demultiplexing completed successfully for %s",
"bclconvert_failed": "Demultiplexing failed - bclconvert2 subprocess failed. Script exited. Stdout: %s. Stderr: %s",
"demux_already_complete": "Demultiplexing already completed. bclconvert2 log found @ %s",
"skipping_runfolder": "Upload flagfile present denoting runfolder has been uploaded - skipping runfolder: %s",
"demux_not_complete": "Demultiplexing not yet completed. No demultiplex log found @ %s",
"run_finished": "Run finished - RTAComplete.txt found @ %s",
@@ -93,19 +93,19 @@
"ic_pass": "Integrity check passed. 'Checksums match' message present in md5checksum file: %s",
"ic_fail": "Integrity check failed. 'Checksums do not match' message present in md5checksum file: %s",
"unexpected_checksumfile_contents": "Contents of the md5checksum file are unexpected. See: %s",
"create_bcl2fastqlog_pass": "Created bcl2fastq2 logfile: %s",
"create_bcl2fastqlog_fail": "Failed to create bcl2fastq2 logfile. Script exited. Exception: %s",
"create_bclconvertlog_pass": "Created bclconvert2 logfile: %s",
"create_bclconvertlog_fail": "Failed to create bclconvert2 logfile. Script exited. Exception: %s",
"demux_not_required": "Runfolder is a %s",
"dev_run_umis": "Development run requires manual processing as it contains UMIs",
"tso_run": "TSO run identified",
"write_msg_to_bcl2fastqlog": "Message successfully written to bcl2fastq2_output.log",
"bcl2fastqlog_empty": "BCL2FASTQ2 logfile is empty for run %s. Please see logfile %s",
"write_msg_to_bclconvertlog": "Message successfully written to bclconvert2_output.log",
"bclconvertlog_empty": "BCLCONVERT2 logfile is empty for run %s. Please see logfile %s",
"running_cd": "Running the following command for cluster density calculation: %s",
"cd_success": "Cluster density calculation saved to %s",
"cd_fail": "Cluster density calculation failed. Error: %s. Script exited",
"file_copy_success": "File successfully copied from %s to %s",
"file_copy_fail": "Could not copy file - file does not exist: %s",
"re_demultiplex": "Invalid fastqs were identified. Bcl2fastq log has been removed to trigger re-demultiplex",
"re_demultiplex": "Invalid fastqs were identified. Bclconvert log has been removed to trigger re-demultiplex",
},
"sw": {
"runfolder_identified": "Identified runfolder: %s",
@@ -120,7 +120,7 @@
"demux_complete": "Run has been previously successfully processed by the demultiplexing script",
"success_string_absent": "Run has previously been demultiplexed but no success string is present",
"not_yet_demultiplexed": "Demultiplexing has not been performed",
"bcl2fastqlog_empty": "Bcl2fastq log file exists but is empty",
"bclconvertlog_empty": "Bclconvert log file exists but is empty",
"nonexistent_files": "Not all files exist: %s",
"view_users": "Users identified that require VIEW project permissions: %s",
"admin_users": "Users identified that require ADMINISTER project permissions: %s",
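
The renamed messages in this file are `%s` templates, so one plausible way to use them is to pass the template and its arguments separately to the standard `logging` module, which performs the substitution only if the record is emitted. The sketch below assumes that pattern; the diff does not show how `ad_logger` formats messages, and `make_stream_logger` is a hypothetical helper, not part of the repository.

```python
import io
import logging

# Two of the renamed demultiplex messages from this diff
DEMUX_LOG_MSGS = {
    "bclconvert_start": "Demultiplexing started using bclconvert2 command: %s",
    "bclconvert_complete": "Demultiplexing completed successfully for %s",
}


def make_stream_logger(name, stream):
    """Hypothetical helper: build a logger that writes 'LEVEL message' to a stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # Keep the example output out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    buffer = io.StringIO()
    logger = make_stream_logger("demux_example", buffer)
    # The runfolder name is substituted into the %s placeholder by logging itself
    logger.info(DEMUX_LOG_MSGS["bclconvert_complete"], "999990_A01229_0420_AHLFLHDRX5")
    print(buffer.getvalue(), end="")
```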

2 changes: 1 addition & 1 deletion config/panel_config.py
@@ -14,7 +14,7 @@
are applied at the point of analysis, so R and M numbers (test_number) for these are not listed below. These
pan numbers do not necessarily refer to bed files but rather project configuration (e.g. DNAnexus instances,
project layout etc.)
- Development runs have two options for pan numbers, one for runs that require standard processing with bcl2fastq
- Development runs have two options for pan numbers, one for runs that require standard processing with bclconvert
and one for runs that require manual processing as they have UMIs

Dictionary keys and values are as follows. Values are None where they are not