cactus-hal2chains - KeyboardInterrupt error #1554

KabitaBaral1 · 2024-12-02T21:40:36Z

Hi, I am running cactus-hal2chains on the 447 mammalian genome hal file from Cactus.
I am using the following script:
TOIL_SLURM_ARGS="--partition=apophis --time=1000" cactus-hal2chains ./jobstore hg38.447way.hal chains-dir --queryGenomes Homo_sapiens --batchSystem slurm --doubleMem true --workDir /work/dk_lab/Cactus-447way/Input_chain/ --defaultCores 40 --defaultMemory 100Gi --defaultDisk 2Ti

It runs for hours and then stops with the following error:
sent = os.sendfile(outfd, infd, offset, blocksize)
KeyboardInterrupt

I am not sure exactly how to fix it. I looked it up and found some fixes, such as upgrading Python, but my Python is already 3.12.
The file is very large since it's the 447 mammalian HAL file. I was wondering if you have any solution.

Thank you. I have also attached the full error log file.
cactus_hal2chains_error.txt

Kabita

glennhickey · 2024-12-03T17:05:07Z

I've seen this (very misleading!) error when my job gets evicted due to using too much time on our slurm cluster. So I suspect that's what's happening here.

And it seems to be happening after spending several hours trying to copy the input HAL file to

 Reading HAL file from job store to /work/dk_lab/Cactus-447way/Input_chain/toilwf-d862202f82e557698a5cec860c8a2e30/6945/87ee/tmp775b92wa/hg38.447way.hal

So I guess you are on a pretty slow network drive? Anyway, you should be able to resolve this by boosting --time=1000 to something a fair bit larger...
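As a rough sketch of that suggestion: Slurm reads a bare integer given to --time as minutes, so --time=1000 allows each job about 16 hours 40 minutes. The 7-day value below is only an illustrative assumption and may exceed what a given partition permits; the partition name is taken from the command above.

```shell
#!/bin/bash
# Slurm treats a bare integer passed to --time as minutes.
old_minutes=1000
echo "old limit: $((old_minutes / 60))h $((old_minutes % 60))m"

# Illustrative new limit (an assumption, not a recommendation): 7 days in minutes.
days=7
new_minutes=$((days * 24 * 60))

# Toil reads TOIL_SLURM_ARGS and adds these options to the Slurm jobs it submits.
export TOIL_SLURM_ARGS="--partition=apophis --time=${new_minutes}"
echo "$TOIL_SLURM_ARGS"
```

The cactus-hal2chains command itself would then be launched from the same shell, unchanged apart from the larger time limit.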

KabitaBaral1 · 2024-12-05T19:15:15Z

Hi Glen,

Thank you for the response. The problem is that my job is running only on the leader thread. I am not sure I understand which parameters to change when I run cactus-hal2chains on Slurm.
This is the script that I am running:
#!/bin/bash
#SBATCH --partition=apophis
#SBATCH --job-name=hal2chains
#SBATCH --output=hal2chains_%a.out
#SBATCH --error=hal2chains_%a.err
#SBATCH --cpus-per-task=40
#SBATCH --time=168:00:00
TOIL_SLURM_ARGS="--partition=apophis --time=8000" cactus-hal2chains ./jobstore hg38.447way.hal chains-dir --queryGenomes Homo_sapiens --batchSystem slurm --doubleMem true --workDir /work/dk_lab/Cactus-447way/Input_chain/ --defaultCores 40 --defaultMemory 100Gi --defaultDisk 2Ti
Could you please advise me on which Slurm-specific parameters I am missing that cause my job to run only on the leader thread? I looked at the help docs for both cactus-hal2chains and Toil, but I am still struggling.
Thank you. Much appreciated.

glennhickey mentioned this issue Dec 3, 2024

Error message when job killed for running out of slurm time is very misleading DataBiosphere/toil#5177

Open
