
Reads to stdin and writes to stdout, batch execution of workflows #69

Merged
merged 16 commits into from
May 1, 2020

Conversation

diegostruk
Collaborator

@diegostruk diegostruk commented Apr 24, 2020

  • Adds logic to read from stdin:
    pyworkflow execute < input-file
  1. Uploads the input file to the current working directory.
  2. If the workflow contains a ReadCsv node, the input file specified in the pyworkflow JSON is overridden so the node reads from the stdin file instead.
  • Adds logic to write to stdout:

pyworkflow execute > output-file

  • Adds input validation and checks.
  • Adds batch execution logic: pyworkflow execute

TODO:
For reading from stdin I created a method in Workflow and an execute-specific one in the ReadCsvNode class. I would like to refactor this to reuse the existing methods. Some basic validations are in place, but more work is still needed.
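The stdin handling described above can be sketched roughly as follows. This is illustrative only: the actual PR writes the stdin copy into the current working directory rather than a temp location, and the function name is an invention for this sketch.

```python
import io
import tempfile

def save_stream_to_file(stream):
    """Copy a CSV stream (e.g. sys.stdin) to a temp file and return its path.

    Sketch only: the PR itself uploads the file to the current
    working directory instead of a temp location.
    """
    with tempfile.NamedTemporaryFile(
            mode="w", suffix=".csv", delete=False) as f:
        f.write(stream.read())
        return f.name
```

A ReadCsv node in the workflow could then be pointed at the returned path before execution.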

@diegostruk diegostruk changed the title Dev/cli WIP: Reads to stdin and writes to stdout Apr 24, 2020
@reddigari
Collaborator

Lookin good! I left a comment about finding a way not to duplicate all the execute logic (but no actual solution haha). I also imagine this is going to have hella conflicts with Matt's significant node refactoring in #70.

@reelmatt
Member

Agree with Samir, this is looking good so far! I would try updating this branch from master to incorporate changes from #57—there's a refined execute() method there which may/may not make the CLI execution easier (or harder). (#70 shouldn't actually have that many conflicts here I think)

One suggestion: switch the --file-directory option into an argument to the execute command. Would make the command line something like, pyworkflow execute file1 [...] (and should help "batch mode" by allowing multiple workflow files).

One question: I see the simplicity of writing stdin to a temp file to pass in to execution, but is there a downside to passing the raw input directly? pandas.read_csv() takes both string/file inputs, so passing something like stdin_text.read() should work, and that lines up more closely with how I understand redirection in the shell (but I could be totally off there).
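The alternative described here might look like the sketch below. Note that pandas.read_csv() wants a path or a file-like object, so the raw text is wrapped in io.StringIO; the function name is hypothetical.

```python
import io
import pandas as pd

def read_csv_from_stdin_text(csv_text):
    """Parse raw CSV text (e.g. sys.stdin.read()) without a temp file.

    Wrapping the string in StringIO gives read_csv the file-like
    buffer it expects.
    """
    return pd.read_csv(io.StringIO(csv_text))
```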

@diegostruk
Collaborator Author

@reelmatt thanks for the suggestion of passing the file directories as an argument; I added it to this PR. I don't see any downsides to passing the raw input directly. I mainly thought it would be useful to have the file uploaded to the directory for future use, but I can look into it and push it in the next PR.
@reddigari there was only one merge conflict; however, as you mentioned, I had to update some logic to work with the new refactoring. This PR contains those changes. I tried reusing the current execute flow but wasn't able to find a quick fix, so I will likely spend some time on it and include it in the next PR.
This PR now also contains batch execution of workflows.

@diegostruk diegostruk changed the title WIP: Reads to stdin and writes to stdout Reads to stdin and writes to stdout Apr 28, 2020
@diegostruk diegostruk changed the title Reads to stdin and writes to stdout Reads to stdin and writes to stdout, batch execution of workflows Apr 28, 2020
Member

@reelmatt reelmatt left a comment

One thing to change before merging is the comment below about saving the executed node back to the workflow. There are a few different ways we can go about it. Reading from stdin is still working, though just limited to one file, correct?

I also might be missing writing to stdout, but if I do pyworkflow execute workflow_file > output-file, the contents of output-file is just the print statements logged to the terminal (e.g. "Loading workflow file..."). The behavior I was thinking of was printing the data of a Write CSV node. Based on how you have the Read CSV node working, you could probably do a simple len(stdin_files) > 0 and if true, do print(df.to_json()) before/instead of writing the file. That would still include the other printed statements, but at least it gets the data into output-file which I think is more ideal. Thoughts?
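The suggested behavior might look roughly like this; `stdin_files`, the function name, and the frame API are assumptions based on the discussion, not the actual pyworkflow code.

```python
def write_result(df, stdin_files, output_path):
    """Emit the node's data on stdout when input was piped in,
    otherwise write the CSV file as usual.

    Sketch of the review suggestion; `stdin_files` is assumed to be
    the list of stdin-copied input files tracked by the CLI.
    """
    if len(stdin_files) > 0:
        # redirect case: print the data so `> output-file` captures it
        print(df.to_json())
    else:
        df.to_csv(output_path, index=False)
```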

# delete file at index 0
del stdin_files[0]
else:
workflow_instance.execute(node)
Member

I ran into this bug the other day where the graph didn't update after execution. The execute call returns the executed node for the front-end to then call workflow.update_or_add_node() to actually store the node.data attribute. Without this, I got complaints that predecessor data was missing when it's actually written to disk.

Changing both if/else to include executed_node = workflow_instance.<execute_method_here>... and then workflow_instance.update_or_add_node(executed_node) outside the if/else but within the for loop solves the issue. Come to think of it, we should probably update the execute method to have this behavior and update the execute endpoint as well (to avoid double-saving).
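The fix described here could be sketched as the loop below. Method names follow the comment (`execute`, `update_or_add_node`); the stdin-specific execute method and the surrounding CLI code are assumptions.

```python
def run_batch(workflow_instance, execution_order, stdin_files):
    """Execute nodes in order, saving each result back to the workflow.

    Sketch based on the review comment; pyworkflow internals are
    assumed, including the stdin-specific execute method.
    """
    for node in execution_order:
        if stdin_files:
            # hypothetical stdin path discussed in this PR
            executed_node = workflow_instance.execute_read_csv(
                node, stdin_files[0])
            del stdin_files[0]
        else:
            executed_node = workflow_instance.execute(node)
        # store node.data so successors can find predecessor output
        workflow_instance.update_or_add_node(executed_node)
```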

Collaborator Author

I see, I've modified the code to do exactly what you suggested and seems to be working fine.

Collaborator Author

I've added a modification that includes the dataframe output in stdout. I didn't need to duplicate the exact ReadCsv functionality; I was able to print output = node_to_execute.execute(preceding_data, execution_options). I'm still not happy with the repeated execute code in workflow, so I will refactor this in the next PR.

CLI/cli.py (outdated, resolved)
README.md (outdated, resolved)
@diegostruk diegostruk requested a review from reelmatt April 30, 2020 20:48
Member

@reelmatt reelmatt left a comment

Looking better now!

3. Run it as: pyworkflow --file-directory (path-to-json-workflow-file) execute
3. Run it as: pyworkflow execute workflow-file

Also accepts reading input from std (i.e < file.csv) and writing to sdt out (i.e > output.csv)
Member

nitpick: Should be stdin and stdout.

@@ -109,6 +109,16 @@ def execute(self, predecessor_data, flow_vars):
except Exception as e:
raise NodeException('read csv', str(e))

def execute_for_read(self, predecessor_data, flow_vars, file_to_read):
Collaborator

Is it possible to alter the node configuration (probably through node.option_values) to point to the stdin-copied file rather than create a new method? I think this would resolve the need for duplicated execute calls in the workflow object as you mentioned in the comments there.
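This suggestion might look something like the sketch below; `option_values` comes from the comment, but the `"file"` key and the function name are assumptions about the node configuration.

```python
def point_node_at_stdin_file(node, stdin_path):
    """Rewrite a ReadCsv node's input option so the normal execute()
    path reads the stdin-copied file, removing the need for a
    separate execute_for_read method.

    The "file" key is an assumption about node.option_values.
    """
    node.option_values["file"] = stdin_path
    return node
```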

@reelmatt reelmatt merged commit c794f7a into master May 1, 2020
@diegostruk diegostruk deleted the dev/cli branch May 5, 2020 15:04