cactus-hal2chains - KeyboardInterrupt error #1554

Open
KabitaBaral1 opened this issue Dec 2, 2024 · 2 comments

Comments

@KabitaBaral1

Hi, I am running cactus-hal2chains on the 447-way mammalian genome HAL file from Cactus.
I am using the following command:
TOIL_SLURM_ARGS="--partition=apophis --time=1000" cactus-hal2chains ./jobstore hg38.447way.hal chains-dir --queryGenomes Homo_sapiens --batchSystem slurm --doubleMem true --workDir /work/dk_lab/Cactus-447way/Input_chain/ --defaultCores 40 --defaultMemory 100Gi --defaultDisk 2Ti

It runs for hours and then stops with the following error:
sent = os.sendfile(outfd, infd, offset, blocksize)
KeyboardInterrupt

I am not sure exactly how to fix it. I looked it up and found some suggested fixes, such as upgrading Python, but my Python is already 3.12.
The file is very large since it's the 447-way mammalian HAL file. I was wondering if you have any solution.

Thank you. I have also attached the full error log file.
cactus_hal2chains_error.txt

Kabita

@glennhickey
Collaborator

I've seen this (very misleading!) error when my job gets evicted due to using too much time on our slurm cluster. So I suspect that's what's happening here.

And it seems to be happening after spending several hours trying to copy the input HAL file to

 Reading HAL file from job store to /work/dk_lab/Cactus-447way/Input_chain/toilwf-d862202f82e557698a5cec860c8a2e30/6945/87ee/tmp775b92wa/hg38.447way.hal

So I guess you are on a pretty slow network drive? Anyway, you should be able to resolve this by raising --time=1000 in TOIL_SLURM_ARGS to something a fair bit larger...
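
For context on the value above: Slurm reads a bare integer given to --time as a number of minutes, so --time=1000 allows roughly 16 hours and 40 minutes. The sketch below uses a hypothetical helper, slurm_minutes_to_dhm, which is not part of Slurm, Toil, or Cactus. It converts a minute count into Slurm's days-hours:minutes form, which may help when choosing a larger limit. The 8000-minute value is only an illustration, not a recommendation.

```shell
#!/bin/bash
# Hypothetical helper (an assumption for illustration, not a Slurm command):
# convert a Slurm --time value given in minutes to days-hours:minutes.
slurm_minutes_to_dhm() {
    local minutes=$1
    printf '%d-%02d:%02d\n' \
        $((minutes / 1440)) \
        $(((minutes % 1440) / 60)) \
        $((minutes % 60))
}

slurm_minutes_to_dhm 1000   # prints 0-16:40
slurm_minutes_to_dhm 8000   # prints 5-13:20
```

If the copy of the HAL file alone takes several hours, the limit passed through TOIL_SLURM_ARGS probably needs to cover that copy plus the actual conversion work.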

@KabitaBaral1
Author

Hi Glenn,

Thank you for the response. The problem is that my job is running only on the leader thread. I am not sure which parameters to change when running cactus-hal2chains on Slurm.
This is the script that I am running:
#!/bin/bash
#SBATCH --partition=apophis
#SBATCH --job-name=hal2chains
#SBATCH --output=hal2chains_%a.out
#SBATCH --error=hal2chains_%a.err
#SBATCH --cpus-per-task=40
#SBATCH --time=168:00:00
TOIL_SLURM_ARGS="--partition=apophis --time=8000" cactus-hal2chains ./jobstore hg38.447way.hal chains-dir --queryGenomes Homo_sapiens --batchSystem slurm --doubleMem true --workDir /work/dk_lab/Cactus-447way/Input_chain/ --defaultCores 40 --defaultMemory 100Gi --defaultDisk 2Ti
Could you please advise me on which Slurm-specific parameters I am missing that cause my job to run only on the leader thread? I looked through the help documentation for both cactus-hal2chains and Toil, but I am still struggling.
Thank you. Much appreciated.
