CRITICAL: job status reported by "cromshell list -u" is incorrect and never updates. #182

Open
dalessioluca opened this issue Nov 3, 2021 · 6 comments

@dalessioluca

dalessioluca commented Nov 3, 2021

cromshell list -u is supposed to check the completion status of all unfinished jobs.
However, it sometimes reports incorrect values while cromshell status reports the correct ones.
Even after running cromshell status with a specific job id, cromshell list -u keeps listing the old, incorrect status.

The implication is that the status reported by cromshell list -u is unreliable.
This could lead to jobs that keep running silently while the user believes they were terminated, which is why I consider this a critical bug.

I have not figured out how to reproduce the problem.
However, here are 8 examples of jobs that are listed as running but have in fact terminated.

[Screenshot: Screen Shot 2021-11-29 at 9 32 18 AM]

@dalessioluca dalessioluca added the bug Something isn't working label Nov 3, 2021
@lbergelson
Member

Huh.... I wonder why that's happening.

@jonn-smith
Collaborator

Probably has to do with how the TSV gets updated when you query / update it.

Somewhere in the status function the ~/.cromshell/<TSV> file is updated. That's almost certainly where the problem lies.
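
To make the suspected failure mode concrete, here is a minimal sketch of updating one workflow's status in a tab-separated database. The function name `update_status`, the column layout (run ID in column 3, status in column 5), and the file argument are assumptions for illustration, not cromshell's actual code.

```shell
#!/bin/bash
# Hypothetical helper, not part of cromshell: rewrite the status of one run
# in a tab-separated file. Assumes the run ID is in column 3 and the status
# in column 5; adjust the column numbers to match the real file.
update_status() {
    local db="$1" run_id="$2" new_status="$3" tmp
    tmp=$(mktemp) || return 1
    # Rewrite column 5 only on rows whose column 3 matches the run ID exactly.
    awk -F'\t' -v OFS='\t' -v id="$run_id" -v st="$new_status" \
        '$3 == id { $5 = st } { print }' "$db" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Replace the original only after the rewrite succeeded.
    mv "$tmp" "$db"
}
```

In a scheme like this, a row is updated only when its ID matches exactly, so a lookup that compares the wrong column or a truncated ID would silently leave the old status in place. That is one way the stale statuses reported here could arise, though it has not been confirmed.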

@SHuang-Broad
Contributor

Priority for list -u in cromshell 2.0 has been bumped. @bshifaw

@dalessioluca
Author

dalessioluca commented Nov 3, 2021

I have just noticed that the jobs with the wrong status are present in 3 tsv files. Could that be part of the problem?

[Screenshot: Screen Shot 2021-11-03 at 3 12 49 PM]
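
One way to check whether the copies disagree is to print, for a given job id, the matching row from every TSV file in the directory. This is a minimal sketch: the function name `show_id_in_all_tsv` is hypothetical, and it only assumes that the id appears verbatim somewhere in the row.

```shell
#!/bin/bash
# Hypothetical helper: print every row that mentions a job id, prefixed by
# the name of the file it came from, for all all.workflow.database.tsv* files
# in the given directory (default: current directory).
show_id_in_all_tsv() {
    local run_id="$1" dir="${2:-.}"
    # -F: treat the id as a fixed string; -H: always print the file name.
    grep -FH -- "$run_id" "$dir"/all.workflow.database.tsv*
}
```

Calling it with a job id and the ~/.cromshell directory as arguments would show whether the files carry different statuses for the same job.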

@dalessioluca
Author

You can place this script in your .cromshell directory to check the status of your jobs. It simply runs cromshell status in a loop.

    #!/bin/bash
    # Collect the unique ids (third-from-last field) from the current database only.
    awk '{print $(NF-2)}' all.workflow.database.tsv | sort -u > id_to_check.txt
    # To check the ids from every database file instead:
    # cat all.workflow.database.tsv* | awk '{print $(NF-2)}' | sort -u > id_to_check.txt
    lines=$( cat id_to_check.txt )

    rm -f status.txt
    for job_id in $lines
    do
        # Skip the header entry.
        if [ "$job_id" != 'WDL_NAME' ]; then
            status=$(cromshell status "$job_id" | grep "status")
            # Left unquoted on purpose so the output collapses onto one line per job.
            echo $job_id $status >> status.txt
        fi
    done

    echo "The following jobs are running:"
    grep "unning" status.txt

@jonn-smith
Collaborator

The multiple files shouldn't be an issue - it should only be looking in all.workflow.database.tsv.

I'll take a look at this very soon.
