Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

error in pepatac.py script #299

Open
ASNbioinf opened this issue Nov 19, 2024 · 6 comments
Open

error in pepatac.py script #299

ASNbioinf opened this issue Nov 19, 2024 · 6 comments

Comments

@ASNbioinf
Copy link

ASNbioinf commented Nov 19, 2024

### Pipeline run code and environment:

*              Command:  `/my/path/pepatac/pipelines/pepatac.py -O /my/path/OUTPUT -P 10 --sample-name sample_S1_R1_001 --input /my/path/FastQ/sample_S1_R1_001.fastq.gz --genome hg38 --single-or-paired single --trimmer trimmomatic --aligner bwa --deduplicator picard --peak-caller macs3 -gs 2.7e9 --peak-type fixed --keep --prealignment-names rCRSd --prealignment-index rCRSd=/my/path/pepatac/genome_folder/default/94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4 --genome-index /my/path/pepatac/genome_folder/alias/hg38/bwa_index/default/hg38 --chrom-sizes /my/path/pepatac/genome_folder/hg38.chrom.sizes --TSS-name /my/path/pepatac/genome_folder/alias/hg38/refgene_anno/default/hg38_TSS.bed --blacklist /my/path/pepatac/genome_folder/hg38-blacklist.v2.bed --anno-name /my/path/pepatac/genome_folder/alias/hg38/feat_annotation/default/hg38_annotations.bed.gz`
*         Compute host:  10ea48504247
*          Working dir:  /data
*            Outfolder:  /my/path/OUTPUT/sample_S1_R1_001/
*  Pipeline started at:   (11-19 13:59:38) elapsed: 2.0 _TIME_

### Version log:

*       Python version:  3.8.10
*          Pypiper dir:  `/usr/local/lib/python3.8/dist-packages/pypiper`
*      Pypiper version:  0.12.1
*         Pipeline dir:  `/my/path/pepatac/pipelines`
*     Pipeline version:  0.11.3
*        Pipeline hash:  9ea1043cb2eba3194a30c0f759a7cc3316c8b139
*      Pipeline branch:  * master
*        Pipeline date:  2024-11-12 15:59:11 -0500
*        Pipeline diff:  3 files changed, 20 insertions(+), 82 deletions(-)

### Arguments passed to pipeline:

*           `TSS_name`:  `/my/path/pepatac/genome_folder/alias/hg38/refgene_anno/default/hg38_TSS.bed`
*            `aligner`:  `bwa`
*          `anno_name`:  `/my/path/pepatac/genome_folder/alias/hg38/feat_annotation/default/hg38_annotations.bed.gz`
*          `blacklist`:  `/my/path/pepatac/genome_folder/hg38-blacklist.v2.bed`
*        `chrom_sizes`:  `/my/path/pepatac/genome_folder/hg38.chrom.sizes`
*        `config_file`:  `pepatac.yaml`
*              `cores`:  `10`
*       `deduplicator`:  `picard`
*              `dirty`:  `False`
*             `extend`:  `250`
*              `fasta`:  `None`
*       `force_follow`:  `False`
*     `frip_ref_peaks`:  `None`
*    `genome_assembly`:  `hg38`
*       `genome_index`:  `/my/path/pepatac/genome_folder/alias/hg38/bwa_index/default/hg38`
*        `genome_size`:  `2.7e9`
*              `input`:  `['/my/path/FastQ/sample_S1_R1_001.fastq.gz']`
*             `input2`:  `None`
*               `keep`:  `True`
*               `lite`:  `False`
*             `logdev`:  `False`
*                `mem`:  `4000`
*              `motif`:  `False`
*          `new_start`:  `False`
*            `no_fifo`:  `False`
*           `no_scale`:  `False`
*      `output_parent`:  `/my/path/OUTPUT`
*         `paired_end`:  `False`
*        `peak_caller`:  `macs3`
*          `peak_type`:  `fixed`
* `prealignment_index`:  `['rCRSd=/my/path/pepatac/genome_folder/default/94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4']`
* `prealignment_names`:  `['rCRSd']`
*         `prioritize`:  `False`
*            `recover`:  `False`
*        `sample_name`:  `sample_S1_R1_001`
*        `search_file`:  `None`
*             `silent`:  `False`
*   `single_or_paired`:  `single`
*             `skipqc`:  `False`
*                `sob`:  `False`
*           `testmode`:  `False`
*            `trimmer`:  `trimmomatic`
*          `verbosity`:  `None`

----------------------------------------

Removed existing flag: '/my/path/OUTPUT/sample_S1_R1_001/PEPATAC_failed.flag'
Local input file: /my/path/FastQ/sample_S1_R1_001.fastq.gz

> `File_mb`	882	2	_RES_

> `Read_type`	single	PEPATAC	_RES_

> `Genome`	hg38	PEPATAC	_RES_

### Merge/link and fastq conversion:  (11-19 13:59:38) elapsed: 0.0 _TIME_

Number of input file sets: 1
Target exists: `/my/path/OUTPUT/sample_S1_R1_001/raw/sample_S1_R1_001.fastq.gz`  
Local input file: '/my/path/OUTPUT/sample_S1_R1_001/raw/sample_S1_R1_001.fastq.gz'
Found .fastq.gz file
Found .fq.gz file; no conversion necessary
Target exists: `/my/path/OUTPUT/sample_S1_R1_001/fastq/sample_S1_R1_001_R1.fastq.gz`  

### Adapter trimming:  (11-19 13:59:38) elapsed: 0.0 _TIME_

trimmomatic local_input_files: ['/my/path/OUTPUT/sample_S1_R1_001/raw/sample_S1_R1_001.fastq.gz']
Target exists: `/my/path/OUTPUT/sample_S1_R1_001/fastq/sample_S1_R1_001_R1_trim.fastq`  

### Prealignments (11-19 13:59:38) elapsed: 0.0 _TIME_


### Map to rCRSd (11-19 13:59:38) elapsed: 0.0 _TIME_

Traceback (most recent call last):
  File "/my/path/pepatac/pipelines/pepatac.py", line 2779, in <module>
    sys.exit(main())
  File "/my/path/pepatac/pipelines/pepatac.py", line 971, in main
    unmap_fq1, unmap_fq2 = _align(
  File "/my/path/pepatac/pipelines/pepatac.py", line 346, in _align
    pm.run(cmd, mapped_bam)
UnboundLocalError: local variable 'cmd' referenced before assignment
Starting cleanup: 3 files; 3 conditional files for cleanup

Cleaning up flagged intermediate files. . .

Conditional flag found: []

These conditional files were left in place:

- /my/path/OUTPUT/sample_S1_R1_001/fastq/sample_S1_R1_001*.fastq
- /my/path/OUTPUT/sample_S1_R1_001/fastq/*.fastq
- /my/path/OUTPUT/sample_S1_R1_001/fastq/*.log
- /my/path/OUTPUT/sample_S1_R1_001/prealignments/tmpk3rzkr2d
- /my/path/OUTPUT/sample_S1_R1_001/prealignments/sample_S1_R1_001_rCRSd_all.bam
- /my/path/OUTPUT/sample_S1_R1_001/prealignments/sample_S1_R1_001_rCRSd_unmapped.bam

### Pipeline failed at:  (11-19 13:59:38) elapsed: 0.0 _TIME_
Total time: 0:00:03
Failure reason: Pipeline failure. See details above.
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/pypiper/manager.py", line 1799, in _exit_handler
    self.fail_pipeline(Exception("Pipeline failure. See details above."))
  File "/usr/local/lib/python3.8/dist-packages/pypiper/manager.py", line 1660, in fail_pipeline
    raise exc
Exception: Pipeline failure. See details above.

I run PEPATAC in a container. The "UnboundLocalError: local variable 'cmd' referenced before assignment" indicates to change the assignment of the variable “cmd” in the pepatac.py script

Is there a correct version of this file pepatac.py???

To fix the second error "Error in atexit._run_exitfuncs" I tried to reinstall pypiper with the command "pip install pypiper" but the problem persists.

@donaldcampbelljr
Copy link
Member

For the 2nd item (pypiper), try to pip install --force-reinstall and ensure the tool versions match those in the requirements.txt file. Specifically, please confirm the versions match for:
looper==1.6.0
piper==0.14.0
pipestat==0.6.0

pypiper is actually labeled as piper on PyPI, so that might be why attempting a fresh install did not work.

For the 1st error, looking at pepatac.py, I see that cmd may be referenced before assignment if paired is False:

        if paired:
            pm.run([cmd1, cmd2, cmd3, cmd4, filter_pair], out_fastq_r2_gz)
        else:
            if args.keep:
                pm.run(cmd, mapped_bam)
            else:
                pm.run(cmd, out_fastq_tmp_gz)

Command from pipeline log shows paired is false (its single) and keep is set to true:

`/my/path/pepatac/pipelines/pepatac.py -O /my/path/OUTPUT -P 10 --sample-name sample_S1_R1_001 --input /my/path/FastQ/sample_S1_R1_001.fastq.gz --genome hg38 --single-or-paired single --trimmer trimmomatic --aligner bwa --deduplicator picard --peak-caller macs3 -gs 2.7e9 --peak-type fixed --keep --prealignment-names rCRSd --prealignment-index rCRSd=/my/path/pepatac/genome_folder/default/94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4 --genome-index /my/path/pepatac/genome_folder/alias/hg38/bwa_index/default/hg38 --chrom-sizes /my/path/pepatac/genome_folder/hg38.chrom.sizes --TSS-name /my/path/pepatac/genome_folder/alias/hg38/refgene_anno/default/hg38_TSS.bed --blacklist /my/path/pepatac/genome_folder/hg38-blacklist.v2.bed --anno-name /my/path/pepatac/genome_folder/alias/hg38/feat_annotation/default/hg38_annotations.bed.gz`

Questions to answer:
What should cmd for mapped_bam? Why is it not being supplied here?

@ASNbioinf
Copy link
Author

ASNbioinf commented Nov 20, 2024

For the first error, I set -Q single option because the analysis is based on a single-end sequencing protocol. According to your advise @donaldcampbelljr I removed the variable --keep because the --keep option is generally used to retain intermediate files, but in single-end mode, it's not applicable since the pipeline is optimized for that specific setup. Finally, I skip the pre-alignment and I solve the problem.

For the 2nd error, I installed the "requirements.txt" and the version match for:
looper==1.6.0
piper==0.14.0
pipestat==0.6.0

However, it returns the following new error.

### Map to genome (11-20 21:26:45) elapsed: 0.0 _TIME_

Found lock file: /my/path/OUTPUT/sample_S1_R1_001/lock.aligned_hg38__sample_S1_R1_001_sort_dedup.bam
Overwriting target...
Target to produce: `/my/path/OUTPUT/sample_S1_R1_001/aligned_hg38/sample_S1_R1_001_sort_dedup.bam`  
[E::bwa_idx_load_from_disk] fail to locate the index files

> `bwa mem -t 10  -M  /my/path/pepatac/genome_folder/alias/hg38/bwa_index/default/hg38 /my/path/OUTPUT/sample_S1_R1_001/fastq/sample_S1_R1_001_R1_trim.fastq | samtools view -bS - -@ 1 | samtools sort - -@ 1 -T /my/path/OUTPUT/sample_S1_R1_001/aligned_hg38/tmpjnfx0a06 -o /my/path/OUTPUT/sample_S1_R1_001/aligned_hg38/sample_S1_R1_001_temp.bam` (81,82,83)
<pre>
[main_samview] fail to read the header from "-".
samtools sort: failed to read header from "-"
</pre>
Command completed. Elapsed time: 0:00:00. Running peak memory: 0.001GB.  
  PID: 81;	Command: bwa;	Return code: 1;	Memory used: 0.0GB  
  PID: 82;	Command: samtools;	Return code: 1;	Memory used: 0.001GB  
  PID: 83;	Command: samtools;	Return code: 1;	Memory used: 0.001GB

Starting cleanup: 1 files; 3 conditional files for cleanup

Cleaning up flagged intermediate files. . .

Conditional flag found: []

These conditional files were left in place:

- /my/path/OUTPUT/sample_S1_R1_001/fastq/sample_S1_R1_001*.fastq
- /my/path/OUTPUT/sample_S1_R1_001/fastq/*.fastq
- /my/path/OUTPUT/sample_S1_R1_001/fastq/*.log
- /my/path/OUTPUT/sample_S1_R1_001/aligned_hg38/tmpjnfx0a06

### Pipeline failed at:  (11-20 21:26:45) elapsed: 0.0 _TIME_

Total time: 0:00:00
Failure reason: Subprocess returned nonzero result. Check above output for details
Traceback (most recent call last):
  File "/my/path/pepatac/pipelines/pepatac.py", line 2779, in <module>
    sys.exit(main())
  File "/my/path/pepatac/pipelines/pepatac.py", line 1112, in main
    pm.run([cmd, cmd2], rmdup_bam, follow=check_alignment_genome)
  File "/usr/local/lib/python3.10/dist-packages/pypiper/manager.py", line 777, in run
    self.callprint(cmd_i, shell, lock_file, nofail, container)
  File "/usr/local/lib/python3.10/dist-packages/pypiper/manager.py", line 1028, in callprint
    self._triage_error(SubprocessError(msg), nofail)
  File "/usr/local/lib/python3.10/dist-packages/pypiper/manager.py", line 2131, in _triage_error
    self.fail_pipeline(e)
  File "/usr/local/lib/python3.10/dist-packages/pypiper/manager.py", line 1660, in fail_pipeline
    raise exc
pypiper.exceptions.SubprocessError: Subprocess returned nonzero result. Check above output for details

@donaldcampbelljr
Copy link
Member

Just reading the initial error log, it looks as though the next step of the pipeline is looking for an index file (.bai?).

Target to produce: `/my/path/OUTPUT/sample_S1_R1_001/aligned_hg38/sample_S1_R1_001_sort_dedup.bam`  
[E::bwa_idx_load_from_disk] fail to locate the index files

> `bwa mem -t 10  -M  /my/path/pepatac/genome_folder/alias/hg38/bwa_index/default/hg38 /my/path/OUTPUT/sample_S1_R1_001/fastq/sample_S1_R1_001_R1_trim.fastq | samtools view -bS - -@ 1 | samtools sort - -@ 1 -T /my/path/OUTPUT/sample_S1_R1_001/aligned_hg38/tmpjnfx0a06 -o /my/path/OUTPUT/sample_S1_R1_001/aligned_hg38/sample_S1_R1_001_temp.bam` (81,82,83)
<pre>
[main_samview] fail to read the header from "-".
samtools sort: failed to read header from "-"

Did you restart the pipeline from the initial failure? Or did you attempt a fresh restart?

Also, have you successfully run the PEPATAC tutorial on the tutorial files? https://pepatac.databio.org/en/latest/tutorial/

@ASNbioinf
Copy link
Author

ASNbioinf commented Nov 22, 2024

I attempt a fresh restart. Moreover, I downloaded the file with refgenie as explained on the tutorial:https://pepatac.databio.org/en/latest/run-container/
I used this command:refgenie pull hg38/bwa_index to pull the bwa_index into the path/my/path/pepatac/genome_folder/alias/hg38/bwa_index/default/hg38 in which there are the following files:

  • hg38.fa,
  • hg38.fa.amb,
  • hg38.fa.ann,
  • hg38.fa.bwt,
  • hg38.fa.pac,
  • hg38.fa.sa
    From the latter files should I extract .bai file? It can't seem to find the files in the indicated folder despite having the permissions to do so.

@donaldcampbelljr
Copy link
Member

After some discussion, because single end processing and using bwa aligner is failing, it is recommended to use bowtie as the aligner or, if possible, use paired until this issue can be fixed.

@donaldcampbelljr
Copy link
Member

  • From the latter files should I extract .bai file? It can't seem to find the files in the indicated folder despite having the permissions to do so.

The pipeline will create the .bai files it needs during processing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants