Detect and flatten nested WDL directories #268

Merged: 7 commits, Sep 27, 2023

@kvg (Contributor) commented Jul 26, 2023

One common problem we encounter is path resolution for WDLs within a nested directory structure. While a nested file organization is better for readability and maintainability, Cromwell cannot always resolve the file locations properly. Within Terra, dependencies are resolved via Dockstore. Outside of Terra, no convenient mechanism exists.

This PR adds that functionality to the submit command. First, we detect whether a workflow has relative imports by looking for the '../' indicator in an import path. Next, we create a temporary directory and recurse through the dependency structure of the submitted WDL. We write a copy of each WDL into the temporary directory, adjusting its imports to point to the flattened copies as we go. We also keep the import namespaces correct by adding an explicit alias wherever an import lacks one.
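The detection and recursion steps described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the names `find_imports`, `has_relative_imports`, and `collect_dependencies` are hypothetical, and the regular expression assumes one import per line with a double-quoted path.

```python
import re
from pathlib import Path
from typing import List, Optional, Set

# Matches lines such as: import "path/to/file.wdl" [as Alias]
IMPORT_RE = re.compile(r'^\s*import\s+"([^"]+)"')


def find_imports(wdl_text: str) -> List[str]:
    """Return the import paths declared in a WDL document, in order."""
    paths = []
    for line in wdl_text.splitlines():
        m = IMPORT_RE.match(line)
        if m:
            paths.append(m.group(1))
    return paths


def has_relative_imports(wdl_text: str) -> bool:
    """True if any import path climbs out of the WDL's own directory."""
    return any("../" in p for p in find_imports(wdl_text))


def collect_dependencies(wdl_path: Path, seen: Optional[Set[Path]] = None) -> Set[Path]:
    """Recursively gather the resolved paths of a WDL and everything it imports."""
    seen = set() if seen is None else seen
    wdl_path = wdl_path.resolve()
    if wdl_path in seen:  # already visited; also guards against import cycles
        return seen
    seen.add(wdl_path)
    for rel in find_imports(wdl_path.read_text()):
        # Imports are interpreted relative to the importing file's directory.
        collect_dependencies(wdl_path.parent / rel, seen)
    return seen
```

Tracking visited files in `seen` means a WDL imported from several places is processed only once.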

To see this in action, consider the following directory structure. The HelloWorkflow.wdl imports files from directories above it, which works in Dockstore/Terra, but fails to run through Cromwell via cromshell.

[ 192]  /Users/kiran/repositories/aou-lr/wdl/
├── [ 352]  lib
│   ├── [ 480]  Utility
│   │   ├── [ 85K]  Utils.wdl
├── [  96]  pipelines
│   └── [ 580]  HelloWorkflow.wdl
├── [  96]  structs
│   └── [ 251]  Structs.wdl
└── [  96]  tasks
    └── [ 462]  HelloTask.wdl

To fix this, cromshell submit flattens out the directory structure within a temporary directory, as below.

[ 192]  /var/folders/jp/l0z21gnj4f531jw12fvm0bx80000gq/T/cromshell_alce64p_
├── [ 85K]  Users-kiran-repositories-aou-lr-wdl-lib-Utility-Utils.wdl
├── [ 655]  Users-kiran-repositories-aou-lr-wdl-pipelines-HelloWorkflow.wdl
├── [ 251]  Users-kiran-repositories-aou-lr-wdl-structs-Structs.wdl
└── [ 462]  Users-kiran-repositories-aou-lr-wdl-tasks-HelloTask.wdl
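Judging from the listing above, each flattened filename appears to be the absolute path of the original WDL with the leading separator dropped and the remaining separators replaced by hyphens. A minimal sketch of that mapping, assuming POSIX-style paths (the helper name `flattened_name` is hypothetical and is not the PR's `get_flattened_filename`):

```python
from pathlib import PurePosixPath


def flattened_name(wdl_path: str) -> str:
    """Join the components of a path with '-' after dropping the root '/'."""
    p = PurePosixPath(wdl_path)
    # p.parts starts with the anchor ('/') for absolute paths; skip it.
    parts = [part for part in p.parts if part != p.anchor]
    return "-".join(parts)


name = flattened_name("/Users/kiran/repositories/aou-lr/wdl/lib/Utility/Utils.wdl")
# name == "Users-kiran-repositories-aou-lr-wdl-lib-Utility-Utils.wdl"
```

Encoding the full path into the name keeps two files with the same basename in different directories from colliding in the flat directory.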

The submitted workflow, HelloWorkflow.wdl, is also rewritten. Before:

version 1.0

import "../tasks/HelloTask.wdl" as Hello
import "../lib/Utility/Utils.wdl"

workflow HelloWorkflow {
    meta {
        description: "Example workflow."
    }

    parameter_meta {
        greeting: "The message to print"
    }

    input {
        String greeting
    }

    # Run a task locally defined in this repo
    call Hello.Print { input: message = greeting }

    # Run a task remotely defined in the long-read-pipelines repo
    call Utils.Sum { input: ints = [1, 2, 3] }

    output {
        String message = Print.text
        Int sum = Sum.sum
    }
}

After:

version 1.0

import "Users-kiran-repositories-aou-lr-wdl-tasks-HelloTask.wdl" as Hello
import "Users-kiran-repositories-aou-lr-wdl-lib-Utility-Utils.wdl" as Utils

workflow HelloWorkflow {
    meta {
        description: "Example workflow."
    }

    parameter_meta {
        greeting: "The message to print"
    }

    input {
        String greeting
    }

    # Run a task locally defined in this repo
    call Hello.Print { input: message = greeting }

    # Run a task remotely defined in the long-read-pipelines repo
    call Utils.Sum { input: ints = [1, 2, 3] }

    output {
        String message = Print.text
        Int sum = Sum.sum
    }
}
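Note that the second import gains an explicit `as Utils`. Without an alias, WDL 1.0 derives the namespace from the imported filename, so renaming the file would otherwise break `call Utils.Sum`. A rough sketch of a per-line rewrite that reproduces the before/after shown here, assuming POSIX paths and simple imports with no `alias ... as ...` clauses (the function name `rewrite_import` is hypothetical):

```python
import posixpath
import re
from pathlib import PurePosixPath

# import "<target>" with an optional trailing `as <Name>`
IMPORT_LINE = re.compile(r'^(\s*)import\s+"([^"]+)"(?:\s+as\s+(\w+))?\s*$')


def rewrite_import(line: str, wdl_dir: str) -> str:
    """Point an import at its flattened filename, preserving its namespace."""
    m = IMPORT_LINE.match(line)
    if m is None:
        return line  # not a simple import line; leave untouched
    indent, target, alias = m.groups()
    # Collapse '..' lexically relative to the importing WDL's directory.
    absolute = posixpath.normpath(posixpath.join(wdl_dir, target))
    flat = "-".join(part for part in absolute.split("/") if part)
    # Without an alias the namespace is the original file stem; make it explicit.
    namespace = alias or PurePosixPath(target).stem
    return f'{indent}import "{flat}" as {namespace}'
```

For example, `rewrite_import('import "../lib/Utility/Utils.wdl"', "/Users/kiran/repositories/aou-lr/wdl/pipelines")` yields the second rewritten import line above.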

And finally, the dependencies_zip parameter is overwritten to point to the temporary directory, enabling the automatic zipping of imports.
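For illustration only, zipping a flat directory of WDLs might look like the sketch below. It uses the standard-library `zipfile` module and a hypothetical helper name, `zip_flat_directory`; cromshell's own zipping logic is not shown in this PR description and may differ.

```python
import zipfile
from pathlib import Path


def zip_flat_directory(src_dir: Path, zip_path: Path) -> Path:
    """Write every .wdl file in src_dir to the top level of a zip archive."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for wdl in sorted(src_dir.glob("*.wdl")):
            # arcname drops the directory so imports resolve by bare filename.
            zf.write(wdl, arcname=wdl.name)
    return zip_path
```

Because the directory is already flat, every archive member sits at the top level, matching the bare filenames used in the rewritten imports.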

@kvg kvg requested a review from bshifaw July 26, 2023 04:57
@@ -65,6 +68,11 @@ def main(config, wdl, wdl_json, options_json, dependencies_zip, no_validation):

http_utils.assert_can_communicate_with_server(config=config)

if io_utils.has_nested_dependencies(wdl):
Collaborator

It would be a good idea to have an option to disable this feature.
Also, add a Logger.debug message inside the if statement saying that flattening is taking place.

@@ -1,11 +1,13 @@
import json
import logging
import re
import os
Collaborator

If possible, avoid the os package and use pathlib.Path for file/directory handling. pathlib is a newer Python module that provides a more object-oriented way of handling file paths. This makes it more intuitive and easier to use, and it also provides a number of features that are not available in the os module.
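As a small illustration of the point above, here is the same path manipulation written both ways. The path is made up for the example.

```python
import os
from pathlib import Path

wdl = "/repo/wdl/pipelines/HelloWorkflow.wdl"  # illustrative path only

# os.path style: free functions operating on strings
wdl_dir_os = os.path.dirname(wdl)
task_os = os.path.join(wdl_dir_os, "..", "tasks", "HelloTask.wdl")

# pathlib style: methods and the '/' operator on Path objects
wdl_path = Path(wdl)
task_pl = wdl_path.parent / ".." / "tasks" / "HelloTask.wdl"

# Path objects also expose components directly.
stem, suffix = wdl_path.stem, wdl_path.suffix  # "HelloWorkflow", ".wdl"
```

Both versions build the same path; the pathlib version keeps it as an object, so follow-up operations such as `.resolve()` or `.read_text()` chain naturally.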

if l.startswith('import'):
m = re.match(r'import "(.+)"', l)

if "../" in m.group(1):
Collaborator

One edge case this function would miss is nested WDLs that use absolute paths for their imports. That is highly unlikely, though; I have rarely, if ever, seen WDLs written that way.
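If that edge case ever needed covering, the check could be widened along these lines. This is a sketch only: `import_needs_flattening` is a hypothetical name, and POSIX-style import paths are assumed.

```python
from pathlib import PurePosixPath


def import_needs_flattening(import_path: str) -> bool:
    """True for imports that climb directories or use an absolute path."""
    climbs = "../" in import_path
    absolute = PurePosixPath(import_path).is_absolute()
    return climbs or absolute
```

URL imports such as `https://...` do not start with `/`, so this check does not treat them as absolute paths.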

return False


def get_flattened_filename(tempdir: tempfile.TemporaryDirectory, wdl_path: str or Path):
Collaborator

Add a return type annotation to the function.

@bshifaw (Collaborator) left a comment

  • Added some suggestions.
  • If viable, unit tests for the new functions and an integration test would be valuable. For the integration test, I think you can add a simple nested WDL to the /tests/workflows dir and add the workflow to the existing tests in the parameterized input for test_submit.
  • Also, to automatically fix linting issues, you can use tox -e lint-edit

@@ -1,7 +1,10 @@
import json
import logging
import re

Collaborator

Suggested change

@@ -1,7 +1,10 @@
import json
import logging
import re

# import os
Collaborator

Suggested change
# import os

if line.startswith("import"):
m = re.match(r'import "(.+)"', line)
imported_wdl_name = m.group(1)
imported_wdl_path = (Path(wdl_dir) / imported_wdl_name).absolute()
Collaborator

It might be better to use .resolve(), which will replace ".." with actual folder names

Suggested change
- imported_wdl_path = (Path(wdl_dir) / imported_wdl_name).absolute()
+ imported_wdl_path = (Path(wdl_dir) / imported_wdl_name).resolve()
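The difference between the two calls can be checked with a throwaway directory tree. The layout below mirrors the example in the PR description but is created in a temporary directory purely for illustration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / "pipelines").mkdir()
    (root / "tasks").mkdir()
    (root / "tasks" / "HelloTask.wdl").write_text("version 1.0\n")

    wdl_dir = root / "pipelines"
    imported_wdl_name = "../tasks/HelloTask.wdl"

    # absolute() only anchors the path; the '..' component is kept.
    via_absolute = (Path(wdl_dir) / imported_wdl_name).absolute()
    # resolve() collapses '..' and also follows symlinks.
    via_resolve = (Path(wdl_dir) / imported_wdl_name).resolve()

    kept_dotdot = ".." in via_absolute.parts
    collapsed = via_resolve == root / "tasks" / "HelloTask.wdl"
```

With `.absolute()`, two imports that reach the same file through different `..` chains would produce different path strings, and therefore different flattened names; `.resolve()` gives one canonical path per file.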

* minor updates to io_utils.py

* added tests for flatten wdl functions

* removed task from test workflow that imports its task

---------

Co-authored-by: bshifaw <[email protected]>
@kvg kvg merged commit 9473930 into main Sep 27, 2023
@kvg kvg deleted the kvg_flatten_nested_wdl_dirs branch September 27, 2023 00:28