-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Detect and flatten nested WDL directories #268
Conversation
src/cromshell/submit/command.py
Outdated
@@ -65,6 +68,11 @@ def main(config, wdl, wdl_json, options_json, dependencies_zip, no_validation): | |||
|
|||
http_utils.assert_can_communicate_with_server(config=config) | |||
|
|||
if io_utils.has_nested_dependencies(wdl): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be a good idea to have the option to disable this feature.
Also Logger.debug message saying this is taking place within the if statement
src/cromshell/utilities/io_utils.py
Outdated
@@ -1,11 +1,13 @@ | |||
import json | |||
import logging | |||
import re | |||
import os |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If possible use avoid using os package and use pathlib.Path library for file/directly handling. The pathlib module is a newer module in Python that provides a more object-oriented way of handling file paths. This makes it more intuitive and easier to use, and it also provides a number of features that are not available in the os module.
src/cromshell/utilities/io_utils.py
Outdated
if l.startswith('import'): | ||
m = re.match(r'import "(.+)"', l) | ||
|
||
if "../" in m.group(1): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An edge case that this function would miss would be nested wdl that uses absolute paths for their imports. Highly unlikely though as I haven't seen wdls like that often or at all.
src/cromshell/utilities/io_utils.py
Outdated
return False | ||
|
||
|
||
def get_flattened_filename(tempdir: tempfile.TemporaryDirectory, wdl_path: str or Path): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add the output type being returned by the function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Added some suggestions.
- If viable unit tests for the new functions and an integration tests would be valuable. For the integration tests I think you can simply add a simple nested wdl to the workflows
/tests/workflows
dir and add the workflow to the exists tests in the paramterazied input for test_submit - Also to automatically fix linting issues you can use
tox -e lint-edit
Co-authored-by: bshifaw <[email protected]>
Co-authored-by: bshifaw <[email protected]>
src/cromshell/utilities/io_utils.py
Outdated
@@ -1,7 +1,10 @@ | |||
import json | |||
import logging | |||
import re | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/cromshell/utilities/io_utils.py
Outdated
@@ -1,7 +1,10 @@ | |||
import json | |||
import logging | |||
import re | |||
|
|||
# import os |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# import os |
src/cromshell/utilities/io_utils.py
Outdated
if line.startswith("import"): | ||
m = re.match(r'import "(.+)"', line) | ||
imported_wdl_name = m.group(1) | ||
imported_wdl_path = (Path(wdl_dir) / imported_wdl_name).absolute() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be better to use .resolve()
, which will replace ".." with actual folder names
imported_wdl_path = (Path(wdl_dir) / imported_wdl_name).absolute() | |
imported_wdl_path = (Path(wdl_dir) / imported_wdl_name).resolve() |
* minor updates to io_utils.py * added tests for flatten wdl functions * removed task from test workflow that imports its task --------- Co-authored-by: bshifaw <[email protected]>
One common problem we encounter is path resolution for WDLs within a nested directory structure. While a nested file organization is better for readability and maintainability, Cromwell cannot always resolve the file locations properly. Within Terra, dependencies are resolved via Dockstore. Outside of Terra, no convenient mechanism exists.
This PR adds such functionality to the
submit
command. First, we automatically detect when a workflow has relative imports (simply looking for the '../' indicator in the import path). Next, we create a temporary directory. We then recurse the dependency structure of the submitted WDL. We rewrite each WDL to the temporary directory, adjusting the imports to point to the temporary directory as we go. We also ensure that the aliasing of WDLs remains correct by adding it where necessary.To see this in action, consider the following directory structure. The
HelloWorkflow.wdl
imports files from directories above it, which works in Dockstore/Terra, but fails to run through Cromwell via cromshell.To fix this,
cromshell submit
flattens out the directory structure within a temporary directory, as below.The submitted workflow, HelloWorkflow.wdl, is also rewritten. Before:
After:
And finally, the
dependencies_zip
parameter is overwritten to point to the temporary directory, enabling the automatic zipping of imports.