Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refined Node execution/validation; flow variable substitution #57

Merged
merged 9 commits into from
Apr 24, 2020

Conversation

reelmatt
Copy link
Member

Node execution/validation

The main Node class now implements a validation method that checks all Parameters, calling their option.validate() methods. The calls to validate within pyworkflow/workflow.py have also been removed. This addresses the use case mentioned where a Node could potentially be invalid when added to the workspace. Now, Nodes are validated upon saving the configuration, or the 'node update' endpoint.

Also changed is the order of operations for Node execution. The main logic in the Workflow class has been simplified to:

# Load predecessor data and FlowNode values
preceding_data = self.load_input_data(node_to_execute.node_id)
flow_nodes = self.load_flow_nodes(node_to_execute.option_replace)

# Validate input data, and replace flow variables
node_to_execute.validate_input_data(len(preceding_data))
execution_options = node_to_execute.get_execution_options(flow_nodes)

# Pass in data to current Node to use in execution
output = node_to_execute.execute(preceding_data, execution_options)

The thinking behind moving input data validation and flow variable substitution into the Workflow class, is anyone developing a custom Node would not have to duplicate this work. Even for the built-in Nodes there was a lot of duplication where each Node validated input and parameters.

The validate_input_data and newly-named get_execution_options were moved from the NodeUtils class into the main Node class. This approach seems cleaner to me, but there may be use cases I'm not considering.

Flow variable substitution

There are currently no front-end options to create or pass these flow variables around, but this PR should handle most of the back-end functionality and updates the endpoint responses for the front-end to use. Currently, flow variables work by adding the following attributes to a Node (for example ReadCsv, written here in JSON:

"option_replace": {
    "sep": {
        "node_id": "2",
        "is_global": true
    }
}

On execution, the back-end would substitute the default_value stored in the global FlowNode with ID "2", for whatever the current value is for "sep" in the ReadCsvNode. Should the user want to revert "sep" back to whatever stored value was there, they would remove the flow variable (perhaps a checkbox, or empty selection from a dropdown?).

To pull this information in to the "Node Configuration" pop-up in the front-end, the action would need to call the GET /node/{node_id} endpoint which has been updated to return the Node's information as well as any flow variable options (all global variables, and any local variables connected as predecessors).

TODO:

This PR updates all Nodes to use the Parameter classes @reddigari introduced in #53. Only Read/WriteCsv and the JoinNode have been updated to use these Parameters during execution; the rest still pass in **self.options) which may cause issues.

There is no validation to ensure a FlowNode matches the Parameter type of a given option. E.g., there are no checks to prevent a StringNode value from replacing a BooleanParameter value.

Integration with front-end. Nested-forms might be the solution to provide the JSON data the back-end currently is using, but both back/front might need some changes as we integrate.

Made all Parameters one line per argument for readability/version control. Increases number of lines for 'small' parameters, but should make any future changes clearer.
`validation` defined in parent Node class. It checks all options are valid, raising `ParameterValidationError` if not.

Calls to `node.validate()`  are removed from `pyworkflow` when updating/adding node or edge between Nodes. A call to `validate` occurs when the POST route is hit for updating a Node to check any changes to the configuration are valid.
Removes `__init__` from FilterNode, changes `execute` to raise `NotImplementedError()` in not present.
All Nodes we have implemented required the defined number of input nodes, so logic can be simplified to whether two values are equal.
Copy link
Collaborator

@reddigari reddigari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is phenomenal refactoring both from an extensibility standpoint and general DRYness. Awesome work!

I am having a little trouble keeping all the flow variable- and options-wrangling methods straight in my head. We should think about a clear way to present the approach (within our class diagrams or elsewhere) for milestone 3 and the final defense.

@reddigari
Copy link
Collaborator

Am I understanding correctly that local flow vars are just regular nodes in the graph, which are caught when parsing the predecessor input? Whereas global ones aren't connected to anything, but are available from within the node config form to override a particular parameter?

If you could give me a high-level list of what needs to happen on the front end for this to be supported, that would help a ton! If I am understanding correctly, this includes:

  • Adding flow input ports that local flow nodes can connect to
  • Add a way to mark a flow node as global
  • Within the node config form, identify all global flow vars, and provide a way to use them as parameters for the node?

@reelmatt
Copy link
Member Author

The list you have hits most of the high-level needs of the front-end. I made a few in-depth comments to elaborate on these (and the endpoints to hit) in issues #64 and #65.

In general, your understanding of how the flow variables work is correct. A Workflow contains:

  • graph: the visual workflow, consisting of "local" Nodes, stored as a DiGraph()
  • flow_vars: the global flow variables, stored as a separate Graph()

The differentiator between the two is whether a Node has the is_global attribute set to true or false. My thought was the only way to include this flag is where the Node is defined. A separate area for displaying/adding global flow variables could help with that.

Global variables are defined in a separate Graph() to prevent connections to specific nodes in the "main" graph. By hitting the GET /node/<node_id> endpoint, all potential flow variables (both local and global) are retrieved and returned in a JSON response.

reelmatt pushed a commit that referenced this pull request Apr 21, 2020
Factors out some duplicate code.

PR #57 adds a `to_json()` method that might be able to replace the `extract_node_info()` method here. TBD.
Copy link
Collaborator

@diegostruk diegostruk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a great refactor! I added a minor comment and after running seems like everything is good. I will keep going through the changes but I am fine merging this.

"""
display_name = "Flow Control"

def execute(self, predecessor_data, flow_vars):
return
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this raise a NotImplementedError instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FlowNodes don't really have execute behavior like the other Nodes; their main logic lies in the get_replacement_value() method instead, so I don't think a NotImplementedError is needed here.

This also allows for the children (only the StringNode) to avoid adding empty execute() methods.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants