Refined Node execution/validation; flow variable substitution #57

reelmatt · 2020-04-17T18:34:53Z

Node execution/validation

The main Node class now implements a validation method that checks all Parameters, calling their option.validate() methods. The calls to validate within pyworkflow/workflow.py have also been removed. This addresses the use case mentioned where a Node could potentially be invalid when added to the workspace. Now, Nodes are validated upon saving the configuration, or the 'node update' endpoint.

Also changed is the order of operations for Node execution. The main logic in the Workflow class has been simplified to:

# Load predecessor data and FlowNode values
preceding_data = self.load_input_data(node_to_execute.node_id)
flow_nodes = self.load_flow_nodes(node_to_execute.option_replace)

# Validate input data, and replace flow variables
node_to_execute.validate_input_data(len(preceding_data))
execution_options = node_to_execute.get_execution_options(flow_nodes)

# Pass in data to current Node to use in execution
output = node_to_execute.execute(preceding_data, execution_options)

The thinking behind moving input data validation and flow variable substitution into the Workflow class, is anyone developing a custom Node would not have to duplicate this work. Even for the built-in Nodes there was a lot of duplication where each Node validated input and parameters.

The validate_input_data and newly-named get_execution_options were moved from the NodeUtils class into the main Node class. This approach seems cleaner to me, but there may be use cases I'm not considering.

Flow variable substitution

There are currently no front-end options to create or pass these flow variables around, but this PR should handle most of the back-end functionality and updates the endpoint responses for the front-end to use. Currently, flow variables work by adding the following attributes to a Node (for example ReadCsv, written here in JSON:

"option_replace": {
    "sep": {
        "node_id": "2",
        "is_global": true
    }
}

On execution, the back-end would substitute the default_value stored in the global FlowNode with ID "2", for whatever the current value is for "sep" in the ReadCsvNode. Should the user want to revert "sep" back to whatever stored value was there, they would remove the flow variable (perhaps a checkbox, or empty selection from a dropdown?).

To pull this information in to the "Node Configuration" pop-up in the front-end, the action would need to call the GET /node/{node_id} endpoint which has been updated to return the Node's information as well as any flow variable options (all global variables, and any local variables connected as predecessors).

TODO:

This PR updates all Nodes to use the Parameter classes @reddigari introduced in #53. Only Read/WriteCsv and the JoinNode have been updated to use these Parameters during execution; the rest still pass in **self.options) which may cause issues.

There is no validation to ensure a FlowNode matches the Parameter type of a given option. E.g., there are no checks to prevent a StringNode value from replacing a BooleanParameter value.

Integration with front-end. Nested-forms might be the solution to provide the JSON data the back-end currently is using, but both back/front might need some changes as we integrate.

Made all Parameters one line per argument for readability/version control. Increases number of lines for 'small' parameters, but should make any future changes clearer.

`validation` defined in parent Node class. It checks all options are valid, raising `ParameterValidationError` if not. Calls to `node.validate()` are removed from `pyworkflow` when updating/adding node or edge between Nodes. A call to `validate` occurs when the POST route is hit for updating a Node to check any changes to the configuration are valid.

Removes `__init__` from FilterNode, changes `execute` to raise `NotImplementedError()` in not present.

All Nodes we have implemented required the defined number of input nodes, so logic can be simplified to whether two values are equal.

reddigari

This is phenomenal refactoring both from an extensibility standpoint and general DRYness. Awesome work!

I am having a little trouble keeping all the flow variable- and options-wrangling methods straight in my head. We should think about a clear way to present the approach (within our class diagrams or elsewhere) for milestone 3 and the final defense.

reddigari · 2020-04-18T20:44:13Z

Am I understanding correctly that local flow vars are just regular nodes in the graph, which are caught when parsing the predecessor input? Whereas global ones aren't connected to anything, but are available from within the node config form to override a particular parameter?

If you could give me a high-level list of what needs to happen on the front end for this to be supported, that would help a ton! If I am understanding correctly, this includes:

Adding flow input ports that local flow nodes can connect to
Add a way to mark a flow node as global
Within the node config form, identify all global flow vars, and provide a way to use them as parameters for the node?

reelmatt · 2020-04-18T22:14:12Z

The list you have hits most of the high-level needs of the front-end. I made a few in-depth comments to elaborate on these (and the endpoints to hit) in issues #64 and #65.

In general, your understanding of how the flow variables work is correct. A Workflow contains:

graph: the visual workflow, consisting of "local" Nodes, stored as a DiGraph()
flow_vars: the global flow variables, stored as a separate Graph()

The differentiator between the two is whether a Node has the is_global attribute set to true or false. My thought was the only way to include this flag is where the Node is defined. A separate area for displaying/adding global flow variables could help with that.

Global variables are defined in a separate Graph() to prevent connections to specific nodes in the "main" graph. By hitting the GET /node/<node_id> endpoint, all potential flow variables (both local and global) are retrieved and returned in a JSON response.

Factors out some duplicate code. PR #57 adds a `to_json()` method that might be able to replace the `extract_node_info()` method here. TBD.

diegostruk

I think this is a great refactor! I added a minor comment and after running seems like everything is good. I will keep going through the changes but I am fine merging this.

diegostruk · 2020-04-23T23:47:02Z

pyworkflow/pyworkflow/node.py

    """
    display_name = "Flow Control"

+    def execute(self, predecessor_data, flow_vars):
+        return


should this raise a NotImplementedError instead?

FlowNodes don't really have execute behavior like the other Nodes; their main logic lies in the get_replacement_value() method instead, so I don't think a NotImplementedError is needed here.

This also allows for the children (only the StringNode) to avoid adding empty execute() methods.

hcat-pge added 9 commits April 17, 2020 08:36

style: Update Node parameters to new Parameter class

b00b766

Made all Parameters one line per argument for readability/version control. Increases number of lines for 'small' parameters, but should make any future changes clearer.

fix: Exception-handling for 'new'; include 'id' key in POST

8e6fd2e

chore: Unify different Node classes

7d57166

Removes `__init__` from FilterNode, changes `execute` to raise `NotImplementedError()` in not present.

refactor: Move validate_input_data to Node class

52c22ea

All Nodes we have implemented required the defined number of input nodes, so logic can be simplified to whether two values are equal.

feat: Global/local flow variable replacement, updated execution

8bffc79

doc: Described new flow methods, clearer variable names

0115161

feat: Make Nodes JSON-serializable

3926e68

feat: Retrieve list of all FlowNode options for a given Node config

3f9f9b1

reelmatt requested review from cesarnda, diegostruk, reddigari, matthew-t-smith and cesaragv April 17, 2020 18:34

reddigari approved these changes Apr 18, 2020

View reviewed changes

reelmatt pushed a commit that referenced this pull request Apr 21, 2020

refactor: Merge node listings to one endpoint

896d3e6

Factors out some duplicate code. PR #57 adds a `to_json()` method that might be able to replace the `extract_node_info()` method here. TBD.

diegostruk approved these changes Apr 23, 2020

View reviewed changes

reelmatt merged commit 8470b51 into master Apr 24, 2020

reelmatt deleted the dev/mthomas branch April 24, 2020 19:24

reelmatt mentioned this pull request Apr 25, 2020

Reads to stdin and writes to stdout, batch execution of workflows #69

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refined Node execution/validation; flow variable substitution #57

Refined Node execution/validation; flow variable substitution #57

reelmatt commented Apr 17, 2020

reddigari left a comment

reddigari commented Apr 18, 2020

reelmatt commented Apr 18, 2020

diegostruk left a comment

diegostruk Apr 23, 2020

reelmatt Apr 24, 2020

Refined Node execution/validation; flow variable substitution #57

Refined Node execution/validation; flow variable substitution #57

Conversation

reelmatt commented Apr 17, 2020

Node execution/validation

Flow variable substitution

TODO:

reddigari left a comment

Choose a reason for hiding this comment

reddigari commented Apr 18, 2020

reelmatt commented Apr 18, 2020

diegostruk left a comment

Choose a reason for hiding this comment

diegostruk Apr 23, 2020

Choose a reason for hiding this comment

reelmatt Apr 24, 2020

Choose a reason for hiding this comment