Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge branch 'master' into dev/diego
Showing 6 changed files with 455 additions and 94 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,97 +1,168 @@ | ||
# Introduction | ||
![Postman Tests](https://github.com/matthew-t-smith/visual-programming/workflows/Postman%20Tests/badge.svg) | ||
![Code Coverage](./docs/media/coverage.svg) | ||
# PyWorkflow | ||
| | | | ||
|------------|--------| | ||
| Docker | TBD | | ||
| Back-end | ![Postman Tests](https://github.com/matthew-t-smith/visual-programming/workflows/Postman%20Tests/badge.svg) | | ||
| Front-end | TBD | | ||
| PyWorkflow | ![Code Coverage](./docs/media/pyworkflow_coverage.svg) | | ||
| CLI | TBD | | ||
| Jest | TBD | | ||
|
||
PyWorkflow is a visual programming application for building data science | ||
pipelines and workflows. It is inspired by [KNIME](https://www.knime.com) | ||
and aims to bring the desktop-based experience to a web-based environment. | ||
PyWorkflow takes a Python-first approach and leverages the power of *pandas* | ||
DataFrames to bring data science to the masses. | ||
|
||
![Pyworkflow UI](./docs/media/pyworkflow-ui.png) | ||
|
||
So far the app comprises a Django app and a SPA React app (bootstrapped with | ||
create-react-app). For React to request data from Django, the `proxy` field is | ||
set in `front-end/package.json`, telling the dev server to fetch non-static | ||
data from `localhost:8000` **where the Django app must be running**. | ||
|
||
## Django | ||
|
||
### Install Dependencies | ||
1. Install `pipenv` from home directory | ||
|
||
- **Homebrew**: | ||
|
||
- `brew install pipenv` | ||
|
||
- **pip**: | ||
|
||
- `pip install pipenv` | ||
- or depending on your versioning setup: | ||
- `pip3 install pipenv` | ||
|
||
- You can install at the User level using **pip** via: `pip install --user pipenv` | ||
|
||
2. `cd` to top level of project (contains `Pipfile` and `Pipfile.lock`) | ||
|
||
3. Install dependencies | ||
|
||
- `pipenv install` | ||
|
||
4. Activate and exit the shell | ||
|
||
- `pipenv shell` | ||
- `exit` | ||
|
||
5. Or, run single commands | ||
|
||
- `pipenv run python [COMMAND]` | ||
|
||
### Installing new packages | ||
- Simply install via: `pipenv install [package-name]` | ||
|
||
### Create dotenv file with app secret | ||
- `echo "SECRET_KEY='TEMPORARY SECRET KEY'" > vp/.environment` | ||
|
||
### Start dev server from app root | ||
- `cd vp` | ||
- `pipenv run python manage.py runserver` | ||
|
||
--- | ||
## React | ||
|
||
### Install Prerequisites | ||
- `cd front-end` | ||
- `npm install` | ||
|
||
### Start dev server | ||
- `npm start` | ||
|
||
--- | ||
## CLI | ||
1. Run pipenv shell. | ||
2. Create a workflow using UI and save it. | ||
3. Run it as: pyworkflow execute workflow-file | ||
|
||
- Also accepts reading input from stdin (e.g. `< file.csv`) and writing to stdout (e.g. `> output.csv`) | ||
- To see node execution details, use the verbose option (e.g. `pyworkflow execute --verbose workflow-file`) | ||
|
||
|
||
|
||
--- | ||
## Tests | ||
PyWorkflow currently has two sets of tests: API endpoints and unit tests. | ||
The API tests are written in Postman and can be run individually, by importing | ||
the collection and environment into your Postman application, or via the command | ||
line by [installing Newman](https://www.npmjs.com/package/newman) and running: | ||
|
||
- `cd Postman` | ||
- `newman run PyWorkflow-runner.postman_collection.json --environment Local-env.postman_environment.json` | ||
|
||
Unit tests for the PyWorkflow package are run using Python's built-in `unittest` | ||
package. | ||
|
||
- `cd pyworkflow/pyworkflow` | ||
- `pipenv run python3 -m unittest tests/*.py` | ||
|
||
To see coverage, you can use the `coverage` package. This is included in the Pipfile | ||
but must be installed with `pipenv install --dev`. Then, while still in the pyworkflow | ||
directory, you can run | ||
|
||
- `coverage run -m unittest tests/*.py` | ||
- `coverage report` (to see a report via the CLI) | ||
- `coverage html && open htmlcov/index.html` (to view interactive coverage) | ||
# Introduction | ||
PyWorkflow was developed with a few key principles in mind: | ||
|
||
1) Easily deployed. PyWorkflow can be deployed locally or remotely with pre-built | ||
Docker containers. | ||
|
||
2) Highly extensible. PyWorkflow has a few key nodes built-in to perform common | ||
operations, but it is built with custom nodes in mind. Any user can write a | ||
custom node of their own to perform *pandas* operations or to use other data | ||
science packages. | ||
|
||
3) Advanced features for everyone. PyWorkflow is meant to cater to users with | ||
no programming experience, all the way to someone who writes Python code daily. | ||
An easy-to-use command line interface allows for batch workflow execution and | ||
scheduled runs with a tool like `cron`. | ||
|
||
To meet these principles, the user interface is built on | ||
[react-diagrams](https://github.com/projectstorm/react-diagrams) | ||
to enable drag-and-drop nodes and edge creation. The packaged nodes provide | ||
basic *pandas* functionality and easy customization options for users to create | ||
workflows tailored to their specific needs. Users looking to create custom | ||
nodes can [reference the documentation on how to write your own class](docs/custom_nodes.md). | ||
|
||
On the back-end, a computational graph stores the nodes, edges, and | ||
configuration options using the [NetworkX package](https://networkx.github.io). | ||
All data operations are saved in JSON format which allows for easy readability | ||
and transfer of data to other environments. | ||
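
As a rough illustration of the idea above, the sketch below keeps nodes, edges, and per-node options in a JSON-serializable structure and derives an execution order from it. It is not PyWorkflow's implementation: it uses the standard library's `graphlib` as a stand-in for NetworkX, and the node ids, `node_type` names, and option keys are all hypothetical.

```python
import json
from graphlib import TopologicalSorter

# Hypothetical workflow: ids, node types, and options are illustrative only.
workflow = {
    "nodes": [
        {"id": "read", "node_type": "ReadCsv", "options": {"file": "in.csv"}},
        {"id": "filter", "node_type": "Filter", "options": {"column": "price"}},
        {"id": "write", "node_type": "WriteCsv", "options": {"file": "out.csv"}},
    ],
    "edges": [["read", "filter"], ["filter", "write"]],
}


def execution_order(graph):
    """Return node ids so that every node comes after its predecessors."""
    sorter = TopologicalSorter({node["id"]: set() for node in graph["nodes"]})
    for source, target in graph["edges"]:
        sorter.add(target, source)  # target depends on source
    return list(sorter.static_order())


# Round-trip through JSON to show the structure survives serialization.
restored = json.loads(json.dumps(workflow))
order = execution_order(restored)
print(order)
```

Because the whole graph is plain JSON, the same file could be copied to another environment and loaded there without any extra conversion step.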
|
||
# Getting Started | ||
The back-end consists of the PyWorkflow package, to perform all graph-based | ||
operations, file storage/retrieval, and execution. These methods are triggered | ||
either via API calls from the Django web-server, or from the CLI application. | ||
|
||
The front-end is a SPA React app (bootstrapped with create-react-app). For React | ||
to request data from Django, the `proxy` field is set in `front-end/package.json`, | ||
telling the dev server to fetch non-static data from `localhost:8000` **where | ||
the Django app must be running**. | ||
|
||
## Docker | ||
|
||
The easiest way to get started is by deploying both Docker containers on your | ||
local machine. For help installing Docker, [reference the documentation for your | ||
specific system](https://docs.docker.com/get-docker/). | ||
|
||
The Docker container for PyWorkflow is built from 2 images: the `front-end` and | ||
the `back-end`. The `docker-compose.yml` defines how to combine and run the two. | ||
|
||
In order to build each image individually, from the root of the application: | ||
- `docker build front-end --tag FE_IMAGE[:TAG]` | ||
- `docker build back-end --tag BE_IMAGE[:TAG]` | ||
- Example: `docker build back-end --tag backendtest:2.0` | ||
|
||
Each individual image can be run by changing to the `front-end` or `back-end` directory and running: | ||
- `docker run -p 3000:3000 --name FE_CONTAINER_NAME FE_IMAGE[:TAG]` | ||
- `docker run -p 8000:8000 --name BE_CONTAINER_NAME BE_IMAGE[:TAG]` | ||
- Example: `docker run -p 8000:8000 --name pyworkflow-be backendtest:2.0` | ||
|
||
Note: there [is a known issue with `react-scripts` v3.4.1](https://github.com/facebook/create-react-app/issues/8688) | ||
that may cause the front-end container to exit with code 0. If this happens, | ||
you can add `-e CI=true` to the `docker run` command above for the front-end. | ||
|
||
To compose and run the entire application container, from the root of the application: | ||
- `docker-compose up` | ||
|
||
You can then kill the container gracefully with: | ||
- `docker-compose down` | ||
|
||
NOTE: For development, change the `proxy` value in `./front-end/package.json` from `"http://back-end:8000"` to `"http://localhost:8000"`. | ||
|
||
|
||
## Serve locally | ||
|
||
Alternatively, the front- and back-ends can be compiled separately and run on | ||
your local machine. | ||
|
||
### Server (Django) | ||
|
||
1. Install `pipenv` | ||
|
||
- **Homebrew** | ||
|
||
``` | ||
brew install pipenv | ||
``` | ||
- **pip** | ||
|
||
``` | ||
pip install pipenv | ||
# or, depending on your Python setup: | ||
pip3 install pipenv | ||
``` | ||
2. Install dependencies | ||
Go to the `back-end` directory with `Pipfile` and `Pipfile.lock`. | ||
``` | ||
cd back-end | ||
pipenv install | ||
``` | ||
3. Set up your local environment | ||
|
||
- Create environment file with app secret | ||
``` | ||
echo "SECRET_KEY='TEMPORARY SECRET KEY'" > vp/.environment | ||
``` | ||
|
||
4. Start dev server from app root | ||
``` | ||
cd vp | ||
pipenv run python3 manage.py runserver | ||
``` | ||
|
||
If you have trouble running commands individually, you can also enter the | ||
virtual environment created by `pipenv` by running `pipenv shell`. | ||
|
||
### Client (react-diagrams) | ||
In a separate terminal window, perform the following steps to start the | ||
front-end. | ||
|
||
1. Install Prerequisites | ||
``` | ||
cd front-end | ||
npm install | ||
``` | ||
2. Start dev server | ||
``` | ||
npm start | ||
``` | ||
|
||
# CLI | ||
PyWorkflow also provides a command-line interface to execute pre-built workflows | ||
without the client or server running. The CLI is packaged in the `back-end` | ||
directory and can be accessed through a deployed Docker container, or locally | ||
through the `pipenv shell`. | ||
|
||
The CLI syntax for PyWorkflow is: | ||
``` | ||
pyworkflow execute workflow-file... | ||
``` | ||
|
||
For help reading from stdin, writing to stdout, batch-processing, and more, | ||
[check out the CLI docs](docs/cli.md). | ||
|
||
# Tests | ||
PyWorkflow has several automated tests that are run on each push to the GitHub | ||
repository through GitHub Actions. The status of each can be seen in the various | ||
badges at the top of this README. | ||
|
||
PyWorkflow currently has unit tests for both the back-end (the PyWorkflow | ||
package) and the front-end (react-diagrams). There are also API tests | ||
using Postman to test the integration between the front- and back-ends. For more | ||
information on these tests and how to run them, [read the test | ||
documentation](docs/tests.md). |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,80 @@ | ||
# Command-line Interface | ||
|
||
PyWorkflow is first and foremost a visual programming application, designed to | ||
help data scientists and many others build workflows to view, manipulate, and | ||
output their data into new formats. Therefore, all workflows must first be | ||
created via the user-interface and saved for later execution. | ||
|
||
However, it may not always be ideal to have the client and server deployed | ||
locally or on a remote server just to run your workflows. Power users want the | ||
ability to run multiple workflows at once, schedule workflow runs, and | ||
dynamically pass data from workflows via stdin/stdout in traditional shell | ||
scripts. This is where the inclusion of PyWorkflow's CLI really shines. | ||
|
||
## Command-line syntax | ||
|
||
``` | ||
pyworkflow execute workflow-file... | ||
``` | ||
### Commands | ||
|
||
#### Execute | ||
Accepts one or more workflow files as arguments to execute. PyWorkflow will load | ||
the file(s) specified and output status messages to `stdout`. If a workflow | ||
fails to run because of an exception, the error is logged to `stderr`. | ||
|
||
**Single-file example** | ||
``` | ||
pyworkflow execute ./workflows/my_workflow.json | ||
``` | ||
|
||
**Batch processing** | ||
|
||
Many shells offer different wildcards that can be used to work with multiple | ||
files on the command line, or in scripts. A useful one is the `*` wildcard that | ||
matches anything. Used in the following example, it has the effect of | ||
passing all files located within the `workflows` directory to the `execute` | ||
command. | ||
|
||
``` | ||
pyworkflow execute ./workflows/* | ||
``` | ||
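
The same batch run can be driven from a Python script, for example when it is part of a larger scheduled job. The sketch below is only an illustration: `build_execute_command` and `run_batch` are hypothetical helper names, and the only thing taken from PyWorkflow is the `pyworkflow execute workflow-file...` syntax shown above.

```python
import glob
import os
import subprocess


def build_execute_command(directory):
    """Build a `pyworkflow execute` command for every file in a directory."""
    # sorted() keeps the argument order deterministic between runs.
    files = sorted(glob.glob(os.path.join(directory, "*")))
    return ["pyworkflow", "execute", *files]


def run_batch(directory):
    """Run the batch; requires the `pyworkflow` CLI to be on PATH."""
    command = build_execute_command(directory)
    if len(command) == 2:
        raise FileNotFoundError(f"no workflow files found in {directory}")
    # Passing a list (not a string) avoids shell quoting issues in file names.
    return subprocess.run(command, check=False).returncode
```

Calling `run_batch("./workflows")` would then behave like the shell example, with the return code available for the calling script to inspect.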
|
||
## Using `stdin`/`stdout` to modify workflows | ||
|
||
Two powerful tools when writing shell scripts are redirection and pipes, which | ||
allow you to dynamically pass data from one command to another. Using these | ||
tools, you can pass different data into and out of a workflow, overriding the | ||
default input and output behavior that the workflow defines. | ||
|
||
PyWorkflow comes with a Read CSV input node and Write CSV output node. When data | ||
is provided via `stdin` on the command-line, it will modify the workflow | ||
behavior to redirect the Read CSV node to that data. Similarly, if a destination | ||
is specified for `stdout`, the Write CSV node output will be redirected there. | ||
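
One common way for a command-line tool to notice redirected input is to check whether `stdin` is attached to a terminal. The sketch below assumes that approach purely for illustration; `choose_input` and `read_rows` are hypothetical names, and PyWorkflow's actual logic may differ.

```python
import csv
import io
import sys


def choose_input(configured_path, stdin=None):
    """Return a text stream: piped stdin if present, else the node's own file."""
    stdin = sys.stdin if stdin is None else stdin
    # isatty() is False when data is redirected or piped in.
    if not stdin.isatty():
        return stdin
    return open(configured_path, newline="")


def read_rows(stream):
    """Parse CSV text from any stream into a list of dict rows."""
    return list(csv.DictReader(stream))


# Simulate `pyworkflow execute my_workflow.json < sample_file.csv`
piped = io.StringIO("name,price\nwidget,3\n")
rows = read_rows(choose_input("ignored.csv", stdin=piped))
print(rows)
```

When nothing is piped in, the function falls back to the path configured on the node, which matches the default behavior described above.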
|
||
Input data can be passed to PyWorkflow in a few ways. | ||
1) Redirection | ||
``` | ||
# Data from sample_file.csv is passed to a Read CSV node | ||
pyworkflow execute my_workflow.json < sample_file.csv | ||
``` | ||
2) Pipes | ||
``` | ||
# Two CSV files are combined and passed in to a Read CSV node | ||
cat sample_file.csv more_data.csv | pyworkflow execute my_workflow.json | ||
# Data from a 'csv_exporter' tool is passed to a Read CSV node | ||
csv_exporter generate | pyworkflow execute my_workflow.json | ||
``` | ||
|
||
Output data can be passed from PyWorkflow in a few ways. | ||
1) Redirection | ||
``` | ||
# Output from a Write CSV node is stored in a new file 'output.csv' | ||
pyworkflow execute my_workflow.json > output.csv | ||
``` | ||
2) Pipes | ||
``` | ||
# Output from a Write CSV node is searched for the phrase 'foobar' | ||
pyworkflow execute my_workflow.json | grep "foobar" | ||
``` |