When running an MDStudio workflow that includes lie_md component endpoints that use Cerise, a SystemError is sometimes raised.
System specs: OSX 10.11.6, lie_md running 'standalone' using Python 3.6, Cerise client targeting the Binac cluster GPU queue.
History leading up to the error: the call is made from lie_workflow, running a solvent-ligand MD simulation. At least the first run of the workflow starting from a clean Cerise specialization Docker container (created the first time by lie_md) finishes without the SystemError being raised. Only in the second or later run against the same, still-running Cerise specialization container is the following SystemError raised (from cerise_backend.log):
[2018-09-27 12:18:12.978] [DEBUG] State is now SystemError [cerise.back_end.execution_manager]
[2018-09-27 12:18:12.978] [DEBUG] Deleting job 272923c41d334e93a1efa95360583772 [cerise.back_end.execution_manager]
[2018-09-27 12:18:12.982] [CRITICAL] An internal error occurred when processing job 272923c41d334e93a1efa95360583772 [cerise.back_end.execution_manager]
[2018-09-27 12:18:12.982] [CRITICAL] Traceback (most recent call last):
  File "cerise/../cerise/back_end/execution_manager.py", line 152, in _process_jobs
    self._delete_job(job_id, job)
  File "cerise/../cerise/back_end/execution_manager.py", line 74, in _delete_job
    self._remote_files.delete_job(job_id)
  File "cerise/../cerise/back_end/xenon_remote_files.py", line 219, in delete_job
    self._rm_remote_dir(job_id, '')
  File "cerise/../cerise/back_end/xenon_remote_files.py", line 372, in _rm_remote_dir
    self._x_recursive_delete(x_remote_path)
  File "cerise/../cerise/back_end/xenon_remote_files.py", line 418, in _x_recursive_delete
    if self._x.files().exists(x_remote_path):
jpype._jexception.nl.esciencecenter.xenon.XenonExceptionPyRaisable: nl.esciencecenter.xenon.XenonException: ssh adaptor: session is down
[cerise.back_end.execution_manager]
The lie_md output leading up to this point:
2018-09-27T14:12:18+0200 Crossbar host is: localhost
2018-09-27T14:12:18+0200 Collecting logs on session "MDWampApi"
2018-09-27T14:12:18+0200 Uploaded schemas for MDWampApi
2018-09-27T14:12:18+0200 MDWampApi: 2 procedures successfully registered
2018-09-27T14:12:37+0200 starting liemd task_id: 4602185954418892
2018-09-27T14:12:37+0200 store output in: /tmp/mdstudio/lie_md/4602185954418892
2018-09-27T14:12:37+0200 Searching for pending jobs in DB
2018-09-27T14:12:37+0200 There are no pending jobs!
2018-09-27T14:12:37+0200 Created a new Cerise-client service
2018-09-27T14:12:37+0200 Creating Cerise-client job
2018-09-27T14:12:37+0200 Only ligand_file defined, perform SOLVENT-LIGAND MD
2018-09-27T14:12:37+0200 CWL worflow is: /Users/mvdijk/Documents/WorkProjects/liestudio-master/lie_md/lie_md/data/solvent_ligand.cwl
2018-09-27T14:12:37+0200 Running the job in a remote machine using docker: mdstudio/cerise-mdstudio-binac:gpu
2018-09-27T14:12:39+0200 Added service to mongoDB
2018-09-27T14:12:39+0200 There was an error: SystemError
2018-09-27T14:12:39+0200 Cerise log stored at: /tmp/mdstudio/lie_md/4602185954418892/cerise.log
2018-09-27T14:12:39+0200 removing job: 272923c41d334e93a1efa95360583772 from Cerise-client
2018-09-27T14:12:39+0200 Extracting output from: /tmp/mdstudio/lie_md/4602185954418892
Ah, looks like the SSH connection went down, and Cerise doesn't automatically reconnect. That's a known issue (see #25); it should, of course, try to reconnect and keep going. IIRC, I actually put that functionality into Cerulean, so it should come for free with the switch from Xenon to Cerulean. I'll get to that ASAP.
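For reference, the reconnect-and-retry behaviour could look roughly like the sketch below. This is a minimal illustration, not Cerise's or Cerulean's actual implementation; `with_reconnect`, `operation`, and `reconnect` are hypothetical stand-ins for whatever the adaptor actually exposes.

```python
import logging
import time

logger = logging.getLogger(__name__)


def with_reconnect(operation, reconnect, retries=3, delay=5.0):
    """Run a remote operation, reconnecting and retrying when the
    SSH session has dropped.

    operation -- zero-argument callable performing the remote call
    reconnect -- zero-argument callable that re-establishes the session
    """
    for attempt in range(1, retries + 1):
        try:
            return operation()
        except Exception as exc:
            # e.g. XenonException: 'ssh adaptor: session is down'
            if attempt == retries:
                raise
            logger.warning('Remote call failed (%s); reconnecting, '
                           'attempt %d of %d', exc, attempt, retries)
            time.sleep(delay)
            reconnect()
```

Wrapped this way, a call like the `self._x.files().exists(x_remote_path)` in `_x_recursive_delete` above would be retried on a fresh session instead of bubbling the XenonException up as a SystemError.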