PyWorkflow

Status
Docker Test Docker build
Back-end Test back-end
Front-end Test front-end
PyWorkflow Code Coverage
UI Code Coverage

PyWorkflow is a visual programming application for building data science pipelines and workflows. It is inspired by KNIME and aims to bring the desktop-based experience to a web-based environment. PyWorkflow takes a Python-first approach and leverages the power of pandas DataFrames to bring data science to the masses.

Pyworkflow UI

Introduction

PyWorkflow was developed with a few key principles in mind:

  1. Easily deployed. PyWorkflow can be deployed locally or remotely with pre-built Docker containers.

  2. Highly extensible. PyWorkflow has a few key nodes built in to perform common operations, but it is designed with custom nodes in mind. Any user can write a custom node of their own to perform pandas operations or to use other data science packages.

  3. Advanced features for everyone. PyWorkflow is meant to cater to everyone from users with no programming experience to those who write Python code daily. An easy-to-use command-line interface allows for batch workflow execution and scheduled runs with a tool like cron.

To meet these principles, the user interface is built on react-diagrams to enable drag-and-drop node and edge creation. The packaged nodes provide basic pandas functionality and easy customization options, so users can create workflows tailored to their specific needs. To create custom nodes, refer to the documentation on how to write your own class.

On the back-end, a computational graph stores the nodes, edges, and configuration options using the NetworkX package. All data operations are saved in JSON format which allows for easy readability and transfer of data to other environments.
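As a rough illustration of the idea above, the sketch below stores a small workflow as plain JSON-compatible data and derives an execution order from its edges. It is not PyWorkflow's actual storage format or code: the node ids, node types, and option names are hypothetical, and the standard-library `graphlib` module stands in for NetworkX so the example has no third-party dependencies.

```python
import json
from graphlib import TopologicalSorter

# Hypothetical workflow: node ids, types, and options are made up for illustration.
workflow = {
    "nodes": {
        "read": {"type": "read_csv", "options": {"file": "data.csv"}},
        "filter": {"type": "filter", "options": {"column": "age"}},
        "write": {"type": "write_csv", "options": {"file": "out.csv"}},
    },
    # Each edge is [source, target]: data flows from source to target.
    "edges": [["read", "filter"], ["filter", "write"]],
}


def execution_order(wf):
    """Return node ids in an order where every node follows its inputs."""
    sorter = TopologicalSorter({node_id: set() for node_id in wf["nodes"]})
    for source, target in wf["edges"]:
        sorter.add(target, source)  # target depends on source
    return list(sorter.static_order())


def round_trip(wf):
    """Serialize the workflow to JSON text and load it back."""
    return json.loads(json.dumps(wf))
```

Because the whole structure survives a JSON round trip unchanged, the same description can be saved to disk, moved to another machine, and executed there.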

Getting Started

The back-end consists of the PyWorkflow package, which performs all graph-based operations, file storage/retrieval, and execution. These methods are triggered either via API calls from the Django web server or from the CLI application.

The front-end is a SPA React app (bootstrapped with create-react-app). For React to request data from Django, the proxy field is set in front-end/package.json, telling the dev server to fetch non-static data from localhost:8000 where the Django app must be running.

Docker

The easiest way to get started is by deploying both Docker containers on your local machine. For help installing Docker, reference the documentation for your specific system.

To run the application with docker-compose, run docker-compose up from the root directory (or docker-compose up --build to rebuild the images first).

Use docker-compose down to shut down the application gracefully.

The application comprises running containers of two images: the front-end and the back-end. The docker-compose.yml defines how to combine and run the two.

In order to build each image individually, from the root of the application:

  • docker build front-end --tag FE_IMAGE[:TAG]
  • docker build back-end --tag BE_IMAGE[:TAG] (for example, docker build back-end --tag backendtest:2.0)

Each individual container can be run by changing to the front-end or back-end directory and running:

  • docker run -p 3000:3000 --name FE_CONTAINER_NAME FE_IMAGE[:TAG]
  • docker run -p 8000:8000 --name BE_CONTAINER_NAME BE_IMAGE[:TAG] (for example, docker run -p 8000:8000 --name pyworkflow-be backendtest:2.0)

NOTE: There is a known issue with react-scripts v3.4.1 that may cause the front-end container to exit with code 0. If this happens, add -e CI=true to the front-end docker run command above.

NOTE: For development outside of Docker, change the proxy setting in ./front-end/package.json from "proxy": "http://back-end:8000" to "proxy": "http://localhost:8000".
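If you switch between Docker and local development often, the proxy change can be scripted. The helper below is a minimal sketch and not part of PyWorkflow; the function name `set_proxy` is made up, and it assumes package.json is plain JSON with a top-level "proxy" field, as described above.

```python
import json
from pathlib import Path

# The two proxy targets named in the note above.
DOCKER_PROXY = "http://back-end:8000"
LOCAL_PROXY = "http://localhost:8000"


def set_proxy(package_json: Path, proxy_url: str) -> dict:
    """Rewrite the "proxy" field of a package.json file and return the new contents."""
    config = json.loads(package_json.read_text(encoding="utf-8"))
    config["proxy"] = proxy_url
    # Key order is preserved, but indentation is normalized to two spaces.
    package_json.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Example (run from the repository root):
# set_proxy(Path("front-end/package.json"), LOCAL_PROXY)
```

Rewriting the file reformats its whitespace, so expect a slightly larger diff than the single changed line if package.json is not already indented with two spaces.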

After the Docker containers are started, and both front- and back-ends are running, you can access the application by loading http://localhost:3000/ in your browser.

Serve locally

Alternatively, the front- and back-ends can be compiled separately and run on your local machine.

Server (Django)

  1. Install pipenv
     • Homebrew
       brew install pipenv
     • pip
       pip install pipenv OR pip3 install pipenv
  2. Install dependencies. Go to the back-end directory, which contains Pipfile and Pipfile.lock.
       cd back-end
       pipenv install
  3. Set up your local environment
     • Create an environment file with the app secret
       echo "SECRET_KEY='TEMPORARY SECRET KEY'" > vp/.environment
  4. Start the dev server from the app root
       cd vp
       pipenv run python3 manage.py runserver

If you have trouble running commands individually, you can also enter the virtual environment created by pipenv by running pipenv shell.
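The temporary secret written by the environment-file step above is fine for local experiments. For anything longer-lived, you may prefer a random value. The sketch below is one way to generate it, assuming the application only needs a `SECRET_KEY='...'` line in the same format as the echo command; the helper names are hypothetical and not part of PyWorkflow.

```python
import secrets
from pathlib import Path


def make_env_line(key_bytes: int = 50) -> str:
    """Build a SECRET_KEY line in the same format as the echo command above."""
    # token_urlsafe only emits letters, digits, "-" and "_", so the value
    # never contains a quote character that would break the line.
    key = secrets.token_urlsafe(key_bytes)
    return f"SECRET_KEY='{key}'\n"


def write_env_file(path: Path) -> None:
    """Write a freshly generated SECRET_KEY line to the given file."""
    path.write_text(make_env_line(), encoding="utf-8")


# Example (run from the back-end directory):
# write_env_file(Path("vp/.environment"))
```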

Client (react-diagrams)

In a separate terminal window, perform the following steps to start the front-end.

  1. Install prerequisites
       cd front-end
       npm install
  2. Start the dev server
       npm start

By default, react-scripts should open your default browser to the main application page. If it does not, go to http://localhost:3000/ in your browser.

CLI

PyWorkflow also provides a command-line interface to execute pre-built workflows without the client or server running. The CLI is packaged in the back-end directory and can be accessed through a deployed Docker container, or locally through the pipenv shell.

The CLI syntax for PyWorkflow is:

pyworkflow execute workflow-file...

For help with reading from stdin, writing to stdout, batch processing, and more, check out the CLI docs.
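For scheduled or batch runs, the CLI can be driven from a small script. The sketch below builds the `pyworkflow execute workflow-file...` command shown above and runs it with `subprocess`. It assumes the `pyworkflow` executable is on your PATH (for example, inside the pipenv shell); the helper names and workflow file names are hypothetical.

```python
import subprocess
from pathlib import Path


def build_command(workflow_files):
    """Build the argv list for `pyworkflow execute workflow-file...`."""
    files = [str(Path(f)) for f in workflow_files]
    if not files:
        raise ValueError("at least one workflow file is required")
    return ["pyworkflow", "execute", *files]


def run_workflows(workflow_files):
    """Run the CLI and return its exit code (requires pyworkflow on PATH)."""
    completed = subprocess.run(build_command(workflow_files), check=False)
    return completed.returncode


# Example with hypothetical file names:
# exit_code = run_workflows(["daily-report.json", "cleanup.json"])
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces, and returning the exit code lets a scheduler such as cron detect failed runs.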

Tests

PyWorkflow has several automated tests that are run on each push to the GitHub repository through GitHub Actions. The status of each can be seen in the various badges at the top of this README.

PyWorkflow currently has unit tests for both the back-end (the PyWorkflow package) and the front-end (react-diagrams). There are also API tests that use Postman to test the integration between the front- and back-ends. For more information on these tests and how to run them, read the documentation.