Errors with the real run of the example provided #12

Open
sylestiel opened this issue Aug 31, 2023 · 3 comments

sylestiel commented Aug 31, 2023

Hello @JGASmits,

Although the dry run appears to have worked, the real run did not.
Here is the log file:

(anansnake) iMac-Pro:anansnake pediatrics$ less Complete log: .snakemake/log/2023-08-31T143302.657342.snakemake.log
Complete: No such file or directory
log:: No such file or directory
Press RETURN to continue

plot_type : png

Resources
mem_mb : 48000
_cores : 12
deseq2 : 1

Conditions
group2 :
RNA-seq samples: ['1k-cell-1', '1k-cell-2', 'GSM1483740']
ATAC-seq samples: ['GSM3756606', 'GSM3756607', 'GSM3756608']
group1 :
RNA-seq samples: ['128-cell-1', '128-cell-2', 'GSM1483739']
ATAC-seq samples: ['GSM3756599', 'GSM3756600']

Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 12
Rules claiming more threads will be scaled down.
Provided resources: mem_mb=48000, deseq2=1
Job stats:
job              count    min threads    max threads
-------------  -------  -------------  -------------
all                  1              1              1
binding              2              1              1
deseq2               2              1              1
influence            2              1              1
maelstrom            1             12             12
motif2factors        1             12             12
network              2              1              1
pfmscorefile         1             12             12
plot                 2              1              1
total               14              1             12

Select jobs to execute...

[Thu Aug 31 14:33:05 2023]
rule motif2factors:
input: /Users/pediatrics/anansnake/GRCz11
output: /Users/pediatrics/anansnake/example/outdir/gimme/GRCz11.gimme.vertebrate.v5.0.pfm
log: /Users/pediatrics/anansnake/example/outdir/gimme/log_GRCz11_m2f.txt
jobid: 5
reason: Missing output files: /Users/pediatrics/anansnake/example/outdir/gimme/GRCz11.gimme.vertebrate.v5.0.pfm
threads: 12
resources: tmpdir=/var/folders/2c/zzjsgs_53vqflzjl28hf1x7r0000gn/T

Activating conda environment: .snakemake/conda/3f88efe941f72bcdb4d5867b0d6db92f_
Activating conda environment: .snakemake/conda/3f88efe941f72bcdb4d5867b0d6db92f_
[Thu Aug 31 14:33:36 2023]
Error in rule motif2factors:
jobid: 5
input: /Users/pediatrics/anansnake/GRCz11
output: /Users/pediatrics/anansnake/example/outdir/gimme/GRCz11.gimme.vertebrate.v5.0.pfm
log: /Users/pediatrics/anansnake/example/outdir/gimme/log_GRCz11_m2f.txt (check log file(s) for error message)
conda-env: /Users/pediatrics/anansnake/.snakemake/conda/3f88efe941f72bcdb4d5867b0d6db92f_

RuleException:
CalledProcessError in line 24 of /Users/pediatrics/anaconda3/envs/anansnake/lib/python3.8/site-packages/anansnake/rules/gimme.smk:
Command 'source /Users/pediatrics/anaconda3/envs/anansnake/bin/activate '/Users/pediatrics/anansnake/.snakemake/conda/3f88efe941f72bcdb4d5867b0d6db92f_'; set -euo pipefail; python /Users/pediatrics/anansnake/.snakemake/scripts/tmp5kcimtt8.motif2factors.py' returned non-zero exit status 1.
File "/Users/pediatrics/anaconda3/envs/anansnake/lib/python3.8/site-packages/anansnake/rules/gimme.smk", line 24, in __rule_motif2factors
File "/Users/pediatrics/anaconda3/envs/anansnake/lib/python3.8/concurrent/futures/thread.py", line 57, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2023-08-31T143302.657342.snakemake.log

Your input would be much appreciated!

Thank you!


sylestiel commented Sep 5, 2023

@JGASmits
Kindly provide some assistance with debugging/deciphering the problem when you can spare some time.
The log file named in the error message, /Users/pediatrics/anansnake/example/outdir/gimme/log_GRCz11_m2f.txt, reads as follows:
16:37:18 | INFO | Making a new reference for: /Users/pediatrics/anansnake/GRCz11.
16:37:18 | INFO | You say the gimme.vertebrate.v5.0 database is based on: GRCh38.p13 & GRCm38.p6.
16:37:18 | INFO | For better orthology inference we are also using these assemblies: danRer11 & UCB_Xtro_10.0 & GRCg6a & BraLan2 & oryLat2 & ARS-UCD1.2 & phaCin_unsw_v4.1 & rCheMyd1.pri.
16:37:18 | INFO | Using lenient strategy for orthology/name inference.
16:37:18 | INFO | genomes_dir: /Users/pediatrics/.local/share/genomes
16:37:18 | INFO | tmpdir: /Users/pediatrics/anansnake/example/outdir/tmp/motif2factors
16:37:18 | INFO | outdir: /Users/pediatrics/anansnake/example/outdir/gimme
16:37:18 | INFO | Downloading all assemblies.
16:37:18 | INFO | BraLan2 was already downloaded, using that version.
16:37:18 | INFO | GRCg6a was already downloaded, using that version.
16:37:18 | INFO | ARS-UCD1.2 was already downloaded, using that version.
16:37:18 | INFO | GRCm38.p6 was already downloaded, using that version.
16:37:18 | INFO | UCB_Xtro_10.0 was already downloaded, using that version.
16:37:18 | INFO | oryLat2 was already downloaded, using that version.
16:37:18 | INFO | rCheMyd1.pri was already downloaded, using that version.
16:37:18 | INFO | GRCz11 was already downloaded, using that version.
16:37:18 | INFO | GRCh38.p13 was already downloaded, using that version.
16:37:18 | INFO | danRer11 was already downloaded, using that version.
16:37:18 | INFO | phaCin_unsw_v4.1 was already downloaded, using that version.
16:37:18 | INFO | Taking the longest protein per gene per assembly.
16:37:18 | INFO | Processing BraLan2.
16:37:20 | INFO | Processing GRCg6a.
16:37:22 | INFO | Processing ARS-UCD1.2.
16:37:24 | INFO | Processing GRCm38.p6.
16:37:27 | INFO | Processing UCB_Xtro_10.0.
16:37:30 | INFO | Processing oryLat2.
16:37:30 | INFO | Processing rCheMyd1.pri.
16:37:32 | INFO | Processing GRCz11.
16:37:35 | INFO | Processing GRCh38.p13.
16:37:38 | INFO | Processing danRer11.
16:37:41 | INFO | Processing phaCin_unsw_v4.1.
16:37:43 | INFO | Running orthofinder to find orthologs.

With respect to this log, I have the following questions:

  1. How can I tell which part of the code generates this log output? Searching the github/anansnake repository for the strings "Running orthofinder to find orthologs" and "Running orthofinder" came up empty.
  2. Do you need more information from my end to help troubleshoot the problem?
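
One way to approach the first question is to search the installed packages instead of the repository, on the assumption (not confirmed in this thread) that the message is emitted by a dependency installed in the rule's conda environment. Below is a minimal Python sketch. The helper name `find_string_in_tree` is hypothetical, and the root path in the usage comment is the conda environment from the error message above.

```python
from pathlib import Path


def find_string_in_tree(root, needle, suffixes=(".py", ".smk")):
    """Return sorted (path, line_number) pairs for lines containing `needle`.

    Only files whose suffix is in `suffixes` are read. A missing root
    directory yields an empty list.
    """
    root = Path(root)
    if not root.is_dir():
        return []
    hits = []
    for path in root.rglob("*"):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable file: skip it
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((str(path), number))
    return sorted(hits)


# Example usage (path copied from the "conda-env:" line of the error message):
# env = "/Users/pediatrics/anansnake/.snakemake/conda/3f88efe941f72bcdb4d5867b0d6db92f_"
# for path, number in find_string_in_tree(env, "Running orthofinder"):
#     print(f"{path}:{number}")
```

Each hit names the file and line that contains the string, which in turn shows which installed package writes the message.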

Thanks!

@JGASmits
Contributor

Hi Sylestiel,

It seems to be crashing on orthofinder, I think because the anansnake example data is zebrafish data.
At this step, anansnake links the genes of the sample to orthologous genes in the human data (on which the models were trained to predict TF binding). What is weird is that it crashes on the orthofinder step without writing an error to the log file. @siebrenf, any insights?

Luckily, this step shouldn't be needed for human or mouse data (please correct me if I'm wrong there, @siebrenf).

Greetings Jos

@siebrenf
Member

Dear Sylestiel,

The example should work with the zebrafish test data(!), so I'm wondering if Orthofinder has an issue with Mac...

Like Jos said, the issue seems to occur in Orthofinder, which you can skip if you use human/mouse data, by setting get_orthologs: false in the config.

If you do not use human/mouse data, we need to figure out if this is fixable on our end... I've run some tests, and I think the orthofinder stderr/stdout is not passed on to the anansnake log. The easiest solution here is to install orthofinder and run this command:
orthofinder -f /Users/pediatrics/anansnake/example/outdir/gimme/orthofinder/prim_genes -t $THREADS
(You should still have this folder from your previous run, and don't forget to set THREADS to your preferred number of threads.) Once this works, you can restart the anansnake run, and it should use the output you generated.
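
Since the orthofinder output does not seem to reach the anansnake log, it may help to save both output streams to a file when running the command by hand, so the actual error can be posted here. Below is a minimal Python sketch. The helper name `run_and_log` and the log file name are hypothetical, and only the `-f` and `-t` options from the command above are used.

```python
import subprocess


def run_and_log(cmd, log_path):
    """Run `cmd`, write its combined stdout/stderr to `log_path`, return the exit code."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout so nothing is lost
        text=True,
        check=False,  # do not raise on a non-zero exit; report the code instead
    )
    with open(log_path, "w", encoding="utf-8") as handle:
        handle.write(result.stdout)
    return result.returncode


# Example usage (requires orthofinder on PATH; thread count is a placeholder):
# cmd = [
#     "orthofinder",
#     "-f", "/Users/pediatrics/anansnake/example/outdir/gimme/orthofinder/prim_genes",
#     "-t", "4",
# ]
# exit_code = run_and_log(cmd, "orthofinder_manual.log")
# print("orthofinder exit code:", exit_code)
```

A non-zero exit code together with the contents of the log file should show what orthofinder itself reports.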
