
SystemError, Xenon, ssh adaptor, session down #78

Closed
marcvdijk opened this issue Sep 27, 2018 · 1 comment
@marcvdijk
Member

When running an MDStudio workflow that includes lie_md component endpoints that use Cerise, a SystemError is sometimes raised.

System specs: OSX 10.11.6, lie_md running 'standalone' under Python 3.6, Cerise client targeting the Binac cluster GPU queue.

History leading up to the error: the call is made from lie_workflow, running a solvent-ligand MD simulation. At least the first run of the workflow, starting from a clean Cerise specialization docker (created the first time by lie_md), finishes without the SystemError being raised. Only in the second or a later run, using the same Cerise specialization docker that is still running, is the following SystemError raised (from cerise_backend.log):

[2018-09-27 12:18:12.978] [DEBUG] State is now SystemError [cerise.back_end.execution_manager]
[2018-09-27 12:18:12.978] [DEBUG] Deleting job 272923c41d334e93a1efa95360583772 [cerise.back_end.execution_manager]
[2018-09-27 12:18:12.982] [CRITICAL] An internal error occurred when processing job 272923c41d334e93a1efa95360583772 [cerise.back_end.execution_manager]
[2018-09-27 12:18:12.982] [CRITICAL] Traceback (most recent call last):
  File "cerise/../cerise/back_end/execution_manager.py", line 152, in _process_jobs
    self._delete_job(job_id, job)
  File "cerise/../cerise/back_end/execution_manager.py", line 74, in _delete_job
    self._remote_files.delete_job(job_id)
  File "cerise/../cerise/back_end/xenon_remote_files.py", line 219, in delete_job
    self._rm_remote_dir(job_id, '')
  File "cerise/../cerise/back_end/xenon_remote_files.py", line 372, in _rm_remote_dir
    self._x_recursive_delete(x_remote_path)
  File "cerise/../cerise/back_end/xenon_remote_files.py", line 418, in _x_recursive_delete
    if self._x.files().exists(x_remote_path):
jpype._jexception.nl.esciencecenter.xenon.XenonExceptionPyRaisable: nl.esciencecenter.xenon.XenonException: ssh adaptor: session is down
 [cerise.back_end.execution_manager]

The lie_md output leading up to this point:

2018-09-27T14:12:18+0200 Crossbar host is: localhost
2018-09-27T14:12:18+0200 Collecting logs on session "MDWampApi"
2018-09-27T14:12:18+0200 Uploaded schemas for MDWampApi
2018-09-27T14:12:18+0200 MDWampApi: 2 procedures successfully registered
2018-09-27T14:12:37+0200 starting liemd task_id: 4602185954418892
2018-09-27T14:12:37+0200 store output in: /tmp/mdstudio/lie_md/4602185954418892
2018-09-27T14:12:37+0200 Searching for pending jobs in DB
2018-09-27T14:12:37+0200 There are no pending jobs!
2018-09-27T14:12:37+0200 Created a new Cerise-client service
2018-09-27T14:12:37+0200 Creating Cerise-client job
2018-09-27T14:12:37+0200 Only ligand_file defined, perform SOLVENT-LIGAND MD
2018-09-27T14:12:37+0200 CWL worflow is: /Users/mvdijk/Documents/WorkProjects/liestudio-master/lie_md/lie_md/data/solvent_ligand.cwl
2018-09-27T14:12:37+0200 Running the job in a remote machine using docker: mdstudio/cerise-mdstudio-binac:gpu
2018-09-27T14:12:39+0200 Added service to mongoDB
2018-09-27T14:12:39+0200 There was an error: SystemError
2018-09-27T14:12:39+0200 Cerise log stored at: /tmp/mdstudio/lie_md/4602185954418892/cerise.log
2018-09-27T14:12:39+0200 removing job: 272923c41d334e93a1efa95360583772 from Cerise-client
2018-09-27T14:12:39+0200 Extracting output from: /tmp/mdstudio/lie_md/4602185954418892
@LourensVeen
Member

Ah, looks like the SSH connection went down, and Cerise doesn't automatically reconnect. That's a known issue (see #25); it should of course try to reconnect and continue. IIRC, I actually put that functionality into Cerulean, so it should come for free with the switch from Xenon to Cerulean. I'll get to that ASAP.
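For illustration, the reconnect-and-retry behaviour described above could be sketched as a small wrapper around any remote operation. This is a hypothetical sketch, not Cerise, Xenon, or Cerulean API: `ConnectionDown` stands in for the "ssh adaptor: session is down" XenonException, and `reconnect` for whatever re-establishes the session.

```python
import time


class ConnectionDown(Exception):
    """Placeholder for an 'ssh adaptor: session is down' error."""


def with_reconnect(operation, reconnect, retries=3, delay=0.0):
    """Run operation(); if the session drops, reconnect and retry.

    operation: a zero-argument callable doing remote work (hypothetical).
    reconnect: a zero-argument callable that re-establishes the session.
    retries:   how many times to retry before giving up.
    delay:     seconds to wait before each reconnect attempt.
    """
    for attempt in range(retries + 1):
        try:
            return operation()
        except ConnectionDown:
            if attempt == retries:
                raise  # give up and propagate the original error
            time.sleep(delay)
            reconnect()


# Usage sketch: an operation that fails once, then succeeds after reconnect.
calls = {"n": 0}

def flaky_delete():
    calls["n"] += 1
    if calls["n"] == 1:
        raise ConnectionDown("session is down")
    return "deleted"

result = with_reconnect(flaky_delete, reconnect=lambda: None)
# result == "deleted"; the first failure was retried transparently.
```

The point is only that the failing call (here, the recursive delete in `_x_recursive_delete`) gets a second chance after reconnecting, instead of bubbling up as a fatal SystemError.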
