Updated/new documentation for custom nodes, CLI, tests #83
Merged

Showing changes from 5 commits (6 commits in total):

- 9b0f0a1 chore: Rename coverage badge for pyworkflow (hcat-pge)
- d6cb3aa docs: Update README for style, reflect separate docs (hcat-pge)
- 7398bf1 docs: Add initial info for CLI (hcat-pge)
- aea5917 docs: Add info on creating custom nodes (hcat-pge)
- c4584e1 docs: Copy info on tests into separate file (hcat-pge)
- e8c70cb docs: Update Docker instructions (hcat-pge)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,96 +1,157 @@ | ||
# Introduction | ||
![Postman Tests](https://github.com/matthew-t-smith/visual-programming/workflows/Postman%20Tests/badge.svg) | ||
![Code Coverage](./docs/media/coverage.svg) | ||
# PyWorkflow | ||
| | | | ||
|------------|--------| | ||
| Docker | TBD | | ||
| Back-end | ![Postman Tests](https://github.com/matthew-t-smith/visual-programming/workflows/Postman%20Tests/badge.svg) | | ||
| Front-end | TBD | | ||
| PyWorkflow | ![Code Coverage](./docs/media/pyworkflow_coverage.svg) | | ||
| CLI | TBD | | ||
| Jest | TBD | | ||
|
||
PyWorkflow is a visual programming application for building data science | ||
pipelines and workflows. It is inspired by [KNIME](https://www.knime.com) | ||
and aims to bring the desktop-based experience to a web-based environment. | ||
PyWorkflow takes a Python-first approach and leverages the power of *pandas* | ||
DataFrames to bring data science to the masses. | ||
|
||
![Pyworkflow UI](./docs/media/pyworkflow-ui.png) | ||
|
||
So far the app comprises a Django app and a SPA React app (bootstrapped with | ||
create-react-app). For React to request data from Django, the `proxy` field is | ||
set in `front-end/package.json`, telling the dev server to fetch non-static | ||
data from `localhost:8000` **where the Django app must be running**. | ||
|
||
## Django | ||
|
||
### Install Dependencies | ||
1. Install `pipenv` from home directory | ||
|
||
- **Homebrew**: | ||
|
||
- `brew install pipenv` | ||
|
||
- **pip**: | ||
|
||
- `pip install pipenv` | ||
- or depending on your versioning setup: | ||
- `pip3 install pipenv` | ||
|
||
- You can install at the User level using **pip** via: `pip install --user pipenv` | ||
|
||
2. `cd` to top level of project (contains `Pipfile` and `Pipfile.lock`) | ||
|
||
3. Install dependencies | ||
|
||
- `pipenv install` | ||
|
||
4. Activate and exit the shell | ||
|
||
- `pipenv shell` | ||
- `exit` | ||
|
||
5. Or, run single commands | ||
|
||
- `pipenv run python [COMMAND]` | ||
|
||
### Installing new packages | ||
- Simply install via: `pipenv install [package-name]` | ||
|
||
### Create dotenv file with app secret | ||
- `echo "SECRET_KEY='TEMPORARY SECRET KEY'" > vp/.environment` | ||
|
||
### Start dev server from app root | ||
- `cd vp` | ||
- `pipenv run python manage.py runserver` | ||
|
||
--- | ||
## React | ||
|
||
### Install Prerequisites | ||
- `cd front-end` | ||
- `npm install` | ||
|
||
### Start dev server | ||
- `npm start` | ||
|
||
--- | ||
## CLI | ||
1. Run pipenv shell. | ||
2. Create a workflow using UI and save it. | ||
3. Run it as: pyworkflow execute workflow-file | ||
|
||
The CLI also accepts input from stdin (e.g., `< file.csv`) and writes output to stdout (e.g., `> output.csv`). | ||
|
||
|
||
|
||
--- | ||
## Tests | ||
PyWorkflow currently has two sets of tests: API endpoints and unit tests. | ||
The API tests are written in Postman and can be run individually, by importing | ||
the collection and environment into your Postman application, or via the command | ||
line by [installing Newman](https://www.npmjs.com/package/newman) and running: | ||
|
||
- `cd Postman` | ||
- `newman run PyWorkflow-runner.postman_collection.json --environment Local-env.postman_environment.json` | ||
|
||
Unit tests for the PyWorkflow package are run using Python's built-in `unittest` | ||
package. | ||
|
||
- `cd pyworkflow/pyworkflow` | ||
- `pipenv run python3 -m unittest tests/*.py` | ||
|
||
To see coverage, you can use the `coverage` package. This is included in the Pipfile | ||
but must be installed with `pipenv install --dev`. Then, while still in the pyworkflow | ||
directory, you can run | ||
|
||
- `coverage run -m unittest tests/*.py` | ||
- `coverage report` (to see a report via the CLI) | ||
- `coverage html && open htmlcov/index.html` (to view interactive coverage) | ||
# Introduction | ||
PyWorkflow was developed with a few key principles in mind: | ||
|
||
1) Easily deployed. PyWorkflow can be deployed locally or remotely with pre-built | ||
Docker containers. | ||
|
||
2) Highly extensible. PyWorkflow has a few key nodes built-in to perform common | ||
operations, but it is built with custom nodes in mind. Any user can write a | ||
custom node of their own to perform *pandas* operations or to use other data | ||
science packages. | ||
|
||
3) Advanced features for everyone. PyWorkflow is meant to cater to users with | ||
no programming experience, all the way to someone who writes Python code daily. | ||
An easy-to-use command line interface allows for batch workflow execution and | ||
scheduled runs with a tool like `cron`. | ||
|
||
To meet these principles, the user interface is built on | ||
[react-diagrams](https://github.com/projectstorm/react-diagrams) | ||
to enable drag-and-drop nodes and edge creation. These packaged nodes provide | ||
basic *pandas* functionality and easy customization options for users to create | ||
workflows tailored to their specific needs. For users looking to create custom | ||
nodes, please [reference the documentation on how to write your own class](docs/custom_nodes.md). | ||
|
||
On the back-end, a computational graph stores the nodes, edges, and | ||
configuration options using the [NetworkX package](https://networkx.github.io). | ||
All data operations are saved in JSON format which allows for easy readability | ||
and transfer of data to other environments. | ||
|
||
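As a rough illustration of the idea above, the following sketch stores a small workflow as plain JSON and derives an execution order from its edges. It is not PyWorkflow's implementation: it uses the standard-library `graphlib` module as a stand-in for NetworkX, and the node ids, node types, and option names are hypothetical.

```python
import json
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: node ids, types, and options are illustrative only.
workflow = {
    "nodes": [
        {"id": "read", "type": "ReadCsv", "options": {"file": "in.csv"}},
        {"id": "filter", "type": "Filter", "options": {"column": "age"}},
        {"id": "write", "type": "WriteCsv", "options": {"file": "out.csv"}},
    ],
    "edges": [
        {"source": "read", "target": "filter"},
        {"source": "filter", "target": "write"},
    ],
}


def execution_order(graph):
    """Return node ids so that every node comes after its upstream nodes."""
    sorter = TopologicalSorter()
    for node in graph["nodes"]:
        sorter.add(node["id"])
    for edge in graph["edges"]:
        # add(node, *predecessors): the target depends on the source.
        sorter.add(edge["target"], edge["source"])
    return list(sorter.static_order())


# Round-trip through JSON to show the graph survives serialization unchanged.
restored = json.loads(json.dumps(workflow))
order = execution_order(restored)
```

Because the whole graph is ordinary JSON, it can be copied to another machine and ordered there without any extra state.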
# Getting Started | ||
The back-end consists of the PyWorkflow package, to perform all graph-based | ||
operations, file storage/retrieval, and execution. These methods are triggered | ||
either via API calls from the Django web-server, or from the CLI application. | ||
|
||
The front-end is a SPA React app (bootstrapped with create-react-app). For React | ||
to request data from Django, the `proxy` field is set in `front-end/package.json`, | ||
telling the dev server to fetch non-static data from `localhost:8000` **where | ||
the Django app must be running**. | ||
|
||
## Docker | ||
|
||
The easiest way to get started is by deploying both Docker containers on your | ||
local machine. For help installing Docker, [reference the documentation for your | ||
specific system](https://docs.docker.com/get-docker/). Once Docker is installed, | ||
from the root directory of the repository, run | ||
|
||
`docker-compose up` | ||
|
||
This builds both the front- and back-end Docker images and runs them as containers | ||
with networking between them. To use the GUI, open http://localhost:3000 in your | ||
web-browser. To use the CLI... | ||
|
||
**Installing new Python packages** | ||
If you write custom nodes that require additional packages, you can install these | ||
in the running Docker container with: | ||
``` | ||
docker exec pipenv install [package-name] | ||
|
||
``` | ||
Review comment on the `docker exec` line above: Let's remove this until we figure out if it will work
|
||
|
||
## Serve locally | ||
|
||
Alternatively, the front- and back-ends can be compiled separately and run on | ||
your local machine. | ||
|
||
### Server (Django) | ||
|
||
1. Install `pipenv` | ||
|
||
- **Homebrew** | ||
|
||
``` | ||
brew install pipenv | ||
``` | ||
|
||
- **pip** | ||
|
||
``` | ||
pip install pipenv  # or: pip3 install pipenv | ||
``` | ||
2. Install dependencies | ||
Go to the `back-end` directory with `Pipfile` and `Pipfile.lock`. | ||
``` | ||
cd back-end | ||
pipenv install | ||
``` | ||
3. Set up your local environment | ||
|
||
- Create environment file with app secret | ||
``` | ||
echo "SECRET_KEY='TEMPORARY SECRET KEY'" > vp/.environment | ||
``` | ||
|
||
4. Start dev server from app root | ||
``` | ||
cd vp | ||
pipenv run python3 manage.py runserver | ||
``` | ||
|
||
If you have trouble running commands individually, you can also enter the | ||
virtual environment created by `pipenv` by running `pipenv shell`. | ||
|
||
### Client (react-diagrams) | ||
In a separate terminal window, perform the following steps to start the | ||
front-end. | ||
|
||
1. Install Prerequisites | ||
``` | ||
cd front-end | ||
npm install | ||
``` | ||
2. Start dev server | ||
``` | ||
npm start | ||
``` | ||
|
||
# CLI | ||
PyWorkflow also provides a command-line interface to execute pre-built workflows | ||
without the client or server running. The CLI is packaged in the `back-end` | ||
directory and can be accessed through a deployed Docker container, or locally | ||
through the `pipenv shell`. | ||
|
||
The CLI syntax for PyWorkflow is: | ||
``` | ||
pyworkflow execute workflow-file... | ||
``` | ||
|
||
For help reading from stdin, writing to stdout, batch processing, and more, | ||
[check out the CLI docs](docs/cli.md). | ||
|
||
# Tests | ||
PyWorkflow has several automated tests that are run on each push to the GitHub | ||
repository through GitHub Actions. The status of each can be seen in the various | ||
badges at the top of this README. | ||
|
||
PyWorkflow currently has unit tests for both the back-end (the PyWorkflow | ||
package) and the front-end (react-diagrams). There are also API tests | ||
using Postman to test the integration between the front- and back-ends. For more | ||
information on these tests and how to run them, [read the tests documentation](docs/tests.md). |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,80 @@ | ||
# Command-line Interface | ||
|
||
PyWorkflow is first-and-foremost a visual programming application, designed to | ||
help data scientists and many others build workflows to view, manipulate, and | ||
output their data into new formats. Therefore, all workflows must first be | ||
created via the user-interface and saved for later execution. | ||
|
||
However, it may not always be ideal to have the client and server deployed | ||
locally or on a remote server just to run your workflows. Power-users want the | ||
ability to run multiple workflows at once, schedule workflow runs, and | ||
dynamically pass data from workflows via stdin/stdout in traditional shell | ||
scripts. This is where the inclusion of PyWorkflow's CLI really shines. | ||
|
||
## Command-line syntax | ||
|
||
``` | ||
pyworkflow execute workflow-file... | ||
``` | ||
### Commands | ||
|
||
#### Execute | ||
Accepts one or more workflow files as arguments to execute. PyWorkflow will load | ||
the file(s) specified and output status messages to `stdout`. If a workflow | ||
fails to run because of an exception, the exception will be logged to `stderr`. | ||
|
||
**Single-file example** | ||
``` | ||
pyworkflow execute ./workflows/my_workflow.json | ||
``` | ||
|
||
**Batch processing** | ||
|
||
Many shells offer different wildcards that can be used to work with multiple | ||
files on the command line, or in scripts. A useful one is the `*` wildcard that | ||
matches anything. Used in the following example, it has the effect of | ||
passing all files located within the `workflows` directory to the `execute` | ||
command. | ||
|
||
``` | ||
pyworkflow execute ./workflows/* | ||
``` | ||
|
||
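The shell expands `*` before the program starts, so the command only ever sees a plain list of file names. The sketch below shows one way a CLI with this syntax could accept that list using `argparse`. It is an assumption-laden illustration, not PyWorkflow's actual CLI code: `build_parser`, `run_all`, and the choice to continue after a failed workflow are all hypothetical.

```python
import argparse
import sys


def build_parser():
    """Build a parser that mirrors `pyworkflow execute workflow-file...`."""
    parser = argparse.ArgumentParser(prog="pyworkflow")
    commands = parser.add_subparsers(dest="command", required=True)
    execute = commands.add_parser("execute", help="run one or more workflows")
    # nargs="+" collects every remaining argument, so a glob expanded by
    # the shell arrives as an ordinary list of file names.
    execute.add_argument("workflow_files", nargs="+", metavar="workflow-file")
    return parser


def run_all(paths, run_one):
    """Run each workflow; report status on stdout and errors on stderr."""
    failed = []
    for path in paths:
        try:
            run_one(path)
            print(f"{path}: ok")
        except Exception as exc:  # assumption: one failure does not stop the batch
            print(f"{path}: {exc}", file=sys.stderr)
            failed.append(path)
    return failed
```

Here `run_one` is whatever callable executes a single workflow file; passing it in keeps the sketch independent of the real package.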
## Using `stdin`/`stdout` to modify workflows | ||
|
||
Two powerful tools when writing shell scripts are redirection and pipes, which | ||
allow you to dynamically pass data from one command to another. Using these | ||
tools, you can pass different data into and out of a workflow at run time, | ||
overriding the input and output locations the workflow defines by default. | ||
|
||
PyWorkflow comes with a Read CSV input node and Write CSV output node. When data | ||
is provided via `stdin` on the command-line, it will modify the workflow | ||
behavior to redirect the Read CSV node to that data. Similarly, if a destination | ||
is specified for `stdout`, the Write CSV node output will be redirected there. | ||
|
||
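A program can tell whether data was redirected or piped to it by checking if its standard input is attached to a terminal. The following is a minimal sketch of that check and of reading and writing CSV rows over streams. It only illustrates the general technique; the function names are hypothetical and this is not how PyWorkflow's Read CSV and Write CSV nodes are necessarily implemented.

```python
import csv
import io
import sys


def stdin_has_data(stream=None):
    """True when input is redirected or piped rather than typed at a terminal."""
    stream = sys.stdin if stream is None else stream
    return not stream.isatty()


def read_rows(stream, default_path):
    """Read CSV rows from the stream if it carries data, else from the file."""
    if stdin_has_data(stream):
        return list(csv.reader(stream))
    with open(default_path, newline="") as handle:
        return list(csv.reader(handle))


def write_rows(rows, stream):
    """Write CSV rows to the given stream (for example sys.stdout)."""
    csv.writer(stream, lineterminator="\n").writerows(rows)
```

The same `isatty()` check on `sys.stdout` can be used to detect whether output is being redirected to a file or another command.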
Input data can be passed to PyWorkflow in a few ways. | ||
1) Redirection | ||
``` | ||
# Data from sample_file.csv is passed to a Read CSV node | ||
pyworkflow execute my_workflow.json < sample_file.csv | ||
``` | ||
2) Pipes | ||
``` | ||
# Two CSV files are combined and passed in to a Read CSV node | ||
cat sample_file.csv more_data.csv | pyworkflow execute my_workflow.json | ||
|
||
# Data from a 'csv_exporter' tool is passed to a Read CSV node | ||
csv_exporter generate | pyworkflow execute my_workflow.json | ||
``` | ||
|
||
Output data can be passed from PyWorkflow in a few ways. | ||
1) Redirection | ||
``` | ||
# Output from a Write CSV node is stored in a new file 'output.csv' | ||
pyworkflow execute my_workflow.json > output.csv | ||
``` | ||
2) Pipes | ||
``` | ||
# Output from a Write CSV node is searched for the phrase 'foobar' | ||
pyworkflow execute my_workflow.json | grep "foobar" | ||
``` |
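The same redirection can be driven from a Python script, which may be convenient for scheduled or batch jobs. This sketch feeds CSV text to a command's stdin and captures its stdout with `subprocess.run`. The helper name `run_with_csv` is hypothetical, and a small echo command stands in for `pyworkflow` so that the example runs without the package installed.

```python
import subprocess
import sys


def run_with_csv(command, csv_text):
    """Send csv_text to the command's stdin and return its stdout as text."""
    result = subprocess.run(
        command,
        input=csv_text,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout


# Stand-in command that copies stdin to stdout. With PyWorkflow available,
# the command would be ["pyworkflow", "execute", "my_workflow.json"],
# matching the shell examples above.
echo_command = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read())",
]
output = run_with_csv(echo_command, "a,b\n1,2\n")
```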
nitpick: the running process is a container, the static (downloadable) artifact is the image