Visualization project
Distributed, scalable, and fast interactive big data visualization in the IPython/Jupyter notebook with WebGL
Today's datasets are massive, and moving them is expensive. Therefore, scientists are increasingly analyzing their data remotely. There is consequently a critical need for distributed data analysis and visualization platforms.
Deploying data analysis platforms on remote servers also enables users to access their data without installing heavy software components on their local machines.
Such platforms need to support very large datasets (gigabytes or terabytes of data are not uncommon nowadays). In particular, visualization technologies need to keep up with the amount of data available.
Scientists frequently run analyses interactively. Indeed, exploratory research requires trial and error, quick numerical experiments, and interactive visualization of intermediate results. All of these tasks need to execute fast. The computer has to keep up with the scientist's flow of thought, which operates on a timescale of about a second.
This notably implies that interactive visualization technologies need to be fast, even in the presence of massive datasets.
We propose to build the foundations of distributed, fast, and interactive data visualization platforms for massive datasets. We will rely on the IPython/Jupyter notebook, Vispy, and WebGL.
The IPython/Jupyter notebook is an increasingly popular platform for data analysis and visualization. Based on Web technologies, it is designed from the ground up for remote and distributed computing. Originally created as an interactive interface to the Python scientific ecosystem, it now supports many other languages (Julia, R, Haskell, Ruby, etc.).
Since IPython 2.0, one can create interactive widgets in the notebook. These widgets can interact with the Python objects in real-time through a set of carefully-designed protocols and APIs.
Vispy is a new data visualization library in Python based on OpenGL. It leverages the computational power of graphics processing units for visualizing massive datasets in real time.
WebGL is a modern technology bringing hardware-accelerated visualization to the browser. It is supported by most recent browsers, including those on mobile devices. It can handle large datasets efficiently.
In this project, we will bring all these technologies together. Our goal will be to provide scientists with a set of tools for creating distributed, fast, and interactive data visualization platforms in Python.
Rather than creating new, innovative technology, the challenge will be to make many different technologies work together within a coherent, robust, and flexible framework. These technologies include programming languages, libraries, and protocols:
- Python
- Javascript
- WebGL
- GLSL shaders
- Vispy
- The IPython/Jupyter notebook, which itself relies on:
  - the Tornado server
  - WebSockets
  - backbone.js
Concretely, we propose three different approaches to bring hardware-accelerated interactive visualization to the IPython notebook. Each approach suits different use cases.
1. Server-side rendering. Python does all the rendering, JavaScript just captures user events and sends them to Python. Python sends PNG images to JavaScript in real-time. Requires binary WebSocket for good performance. Also requires offscreen rendering support in Vispy, which is hard to implement properly on all operating systems and graphics drivers. Useful when huge datasets cannot be sent to the browser, or with low-end clients that do not support WebGL.
2. Client-side online rendering. Python receives user events from JavaScript, processes them, emits GLIR commands, and sends them back to JavaScript instead of interpreting them. JavaScript receives the GLIR commands and executes them with WebGL. These commands sometimes come with data buffers (VBOs, textures...). These buffers may have to be transmitted via WebSocket for high performance, or via other techniques (PNG-compressed data buffers, base64 encoding/decoding (slow but generic)).
3. Client-side offline rendering. Here, a visualization script is written in Python, and exported into a standalone HTML/JavaScript/WebGL document by a Python exporter. This document contains JavaScript functions that emit GLIR commands in response to draw events or user events. The user is given the possibility to manually customize this export through a user-friendly JavaScript API. For example, the user may want to take this bit of standalone JavaScript code and integrate it in an external JavaScript application. The idea is to let users prototype interactive visualization GUIs in Python, and then move to JavaScript, instead of writing everything in JavaScript. Having a light numpy.js library will be critical here.
Approaches (2) and (3), the two client-side ones, may be combined. Light interaction patterns (like pan & zoom) may be implemented entirely in the browser with offline rendering (3), while heavier interaction patterns (like loading a new dataset) require client-server communication with online rendering (2). A challenge will be to design a clean and flexible API to support both types of communication seamlessly.
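As a rough illustration of the client-side online approach, the sketch below shows one way the kernel could package a batch of commands for the browser: a JSON header for the command structure, with raw data buffers pulled out so that they can travel as binary WebSocket frames (or be encoded separately as a fallback). The command tuples and the `pack_commands` / `unpack_commands` helpers are hypothetical, chosen only for illustration; they are not Vispy's actual GLIR format or API.

```python
import json
import struct


def pack_commands(commands):
    """Split a batch of commands into a JSON header and a list of buffers.

    Every bytes argument is replaced by a {"buffer": index} placeholder, so
    the JSON stays small and the raw data can be sent separately.
    """
    buffers = []
    header = []
    for command in commands:
        packed = []
        for arg in command:
            if isinstance(arg, (bytes, bytearray)):
                packed.append({"buffer": len(buffers)})
                buffers.append(bytes(arg))
            else:
                packed.append(arg)
        header.append(packed)
    return json.dumps(header), buffers


def unpack_commands(header_json, buffers):
    """Inverse of pack_commands (the role the JavaScript side would play)."""
    commands = []
    for packed in json.loads(header_json):
        command = []
        for arg in packed:
            if isinstance(arg, dict) and "buffer" in arg:
                command.append(buffers[arg["buffer"]])
            else:
                command.append(arg)
        commands.append(tuple(command))
    return commands


# Hypothetical batch: create a vertex buffer, upload 3 vertices, draw them.
vertices = struct.pack("<6f", -1.0, -1.0, 1.0, -1.0, 0.0, 1.0)
batch = [
    ("CREATE", 1, "VertexBuffer"),
    ("DATA", 1, 0, vertices),
    ("DRAW", 2, "triangles", 0, 3),
]
header_json, buffers = pack_commands(batch)
```

Keeping the buffers out of the JSON means the same header can be paired with whichever transport is available for the binary part.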
This ambitious and challenging project can be broken down into a number of smaller tasks that can be tackled by interested contributors with various skill sets.
Here is the list of the major tasks that need to be undertaken to realize this project. Where relevant, we indicate the priority [HP, LP] and difficulty [hard, easy].
An old proof of concept can be found here. It shows how to take a Vispy visualization written in Python, and export it automatically to a standalone JavaScript/WebGL document. The approach we propose here is going to be radically different though. Vispy will let us export GLIR commands from a Python visualization. These commands can be interpreted by both Python and JavaScript.
- [HP] Implement a GLIR interpreter in JavaScript/WebGL. This PR implements the Python part. We want a simple JavaScript API that takes GLIR commands as input, and executes them with WebGL.
- [LP] Implement a JavaScript port of vispy.gloo that would emit GLIR commands. This would be useful for users who want to customize their WebGL exports manually.
- [HP, hard] Implementing an event loop in the notebook based on Tornado and PeriodicCallback. See also this thread. We want a reusable backend in Vispy that lets us have an event loop in the notebook. It would work like this:
  - In JavaScript, all user events are captured and stored in an event queue (mouse move, key pressed, etc.). See this protocol draft.
  - A Python callback function is called every X ms in the kernel.
  - This function pops the last events from the JavaScript queue.
  - Then, the framework accepts a Python function that takes the events as input, processes them, and returns some output (JSON object).
  - This object is finally sent back to the browser for display.

  This framework could be used for server-side or client-side rendering. It would also have to support binary WebSocket at some point (see this IPython PR). This will be implemented as a custom IPython widget based on backbone.js. The same framework would work similarly for both timer-based animations (regular draw updates) and user-triggered visualizations (draw updates triggered by mouse moves).
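The event loop steps above can be sketched as follows. This is a minimal, server-free illustration, not the proposed Vispy backend: the `EventLoopSketch` class, the event field names (`type`, `pos`, `key`), and the choice to collapse consecutive mouse moves are all assumptions made for the example. In the notebook, `tick` would be scheduled every X ms with something like Tornado's `PeriodicCallback`; here it is called by hand.

```python
import json
from collections import deque


class EventLoopSketch:
    """Polls a queue of browser events and hands them to a user callback."""

    def __init__(self, handler):
        self.handler = handler  # user function: list of events -> dict
        self.queue = deque()    # stands in for the JavaScript event queue
        self.outbox = []        # JSON strings that would go to the browser

    def receive(self, event):
        """Called when an event message arrives from the browser."""
        self.queue.append(event)

    def tick(self):
        """One periodic callback: drain the queue, process, send the result."""
        if not self.queue:
            return
        events = list(self.queue)
        self.queue.clear()
        # Assumption: in a run of consecutive mouse moves, only the latest
        # position matters, so earlier ones are dropped.
        compact = []
        for event in events:
            is_move = event["type"] == "mouse_move"
            if is_move and compact and compact[-1]["type"] == "mouse_move":
                compact[-1] = event
            else:
                compact.append(event)
        output = self.handler(compact)
        self.outbox.append(json.dumps(output))


def pan(events):
    """Example user function: report the latest mouse position."""
    moves = [e for e in events if e["type"] == "mouse_move"]
    return {"pan": moves[-1]["pos"] if moves else None,
            "n_events": len(events)}


loop = EventLoopSketch(pan)
loop.receive({"type": "mouse_move", "pos": [10, 10]})
loop.receive({"type": "mouse_move", "pos": [12, 11]})
loop.receive({"type": "key_press", "key": "r"})
loop.tick()
```

Polling at a fixed rate decouples the event rate in the browser from the processing rate in the kernel, which is why the same loop could serve both timer-based animations and user-triggered updates.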
- [HP] Exchanging data between Python and the browser: implementing multiple methods (base64, PNG, binary WebSocket).
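To make the trade-off between these transport methods concrete, here is a small sketch that encodes the same buffer three ways using only the Python standard library. It is an illustration under assumptions: the buffer contents are arbitrary, `decode_b64` is a hypothetical helper, and `zlib` is used as a stand-in for PNG compression (PNG relies on the same DEFLATE scheme, but a real PNG encoder adds filtering and headers).

```python
import base64
import struct
import zlib

# 1000 float32 values, standing in for a vertex buffer.
values = [i / 1000.0 for i in range(1000)]
raw = struct.pack("<1000f", *values)

# Binary WebSocket: the bytes would be sent as-is.
binary_payload = raw

# base64: safe to embed in JSON text, but about 4/3 the size of the raw
# data, plus the cost of encoding and decoding.
b64_payload = base64.b64encode(raw)

# Compressed: the ratio depends heavily on the data being sent.
compressed_payload = zlib.compress(raw)


def decode_b64(payload):
    """Recover the list of floats from a base64 payload."""
    data = base64.b64decode(payload)
    return list(struct.unpack("<%df" % (len(data) // 4), data))
```

Comparing `len(binary_payload)`, `len(b64_payload)`, and `len(compressed_payload)` on representative data is a cheap way to decide which method to try first for a given kind of buffer.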
- [LP] Creating an IPython popup widget. This is useful when creating a complex GUI in the notebook, because one may want to put a certain visualization widget on a different screen, make it fullscreen, etc.
- [LP] Creating dockable IPython widgets that can be organized within a responsive grid layout (example).
- [HP, hard] Creating a very light and basic numpy.js library (possible starting point). The goal is to convert simple Python/NumPy code to JavaScript automatically. Having a basic numpy.js library with the same API as vanilla NumPy will make it much easier. We want at least:
  - ndarray structure
  - indexing
  - reshaping, transpose
  - concatenation
  - element-wise operations
  - important special functions (exp, log, trigonometric...)
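The feature list above is small enough to sketch in a few dozen lines. The following is a toy model, written in Python for consistency with the rest of this page, of what the core of such a library might contain: flat row-major storage plus a shape. The `NDArray` class and its method names are hypothetical. A JavaScript version would presumably back the data with a typed array and expose the arithmetic as functions, since JavaScript has no operator overloading.

```python
import math


class NDArray:
    """Tiny n-dimensional array: flat storage plus a shape (row-major)."""

    def __init__(self, data, shape):
        size = 1
        for n in shape:
            size *= n
        self.data = list(data)
        self.shape = tuple(shape)
        if len(self.data) != size:
            raise ValueError("data length does not match shape")

    def __getitem__(self, index):
        # Only full integer indexing is supported in this sketch.
        if isinstance(index, int):
            index = (index,)
        if len(index) != len(self.shape):
            raise IndexError("one integer per dimension is required")
        offset = 0
        for i, n in zip(index, self.shape):
            if not 0 <= i < n:
                raise IndexError("index out of bounds")
            offset = offset * n + i
        return self.data[offset]

    def reshape(self, shape):
        return NDArray(self.data, shape)

    def transpose(self):
        # 2D only, to keep the sketch short.
        rows, cols = self.shape
        data = [self[r, c] for c in range(cols) for r in range(rows)]
        return NDArray(data, (cols, rows))

    def _binary(self, other, op):
        # Element-wise operation with another array or with a scalar.
        if isinstance(other, NDArray):
            if other.shape != self.shape:
                raise ValueError("shapes do not match")
            data = [op(a, b) for a, b in zip(self.data, other.data)]
        else:
            data = [op(a, other) for a in self.data]
        return NDArray(data, self.shape)

    def __add__(self, other):
        return self._binary(other, lambda a, b: a + b)

    def __mul__(self, other):
        return self._binary(other, lambda a, b: a * b)


def exp(a):
    return NDArray([math.exp(x) for x in a.data], a.shape)


def concatenate(a, b):
    """Concatenate two arrays along the first axis."""
    if a.shape[1:] != b.shape[1:]:
        raise ValueError("trailing dimensions do not match")
    return NDArray(a.data + b.data, (a.shape[0] + b.shape[0],) + a.shape[1:])


a = NDArray(range(6), (2, 3))
b = a * 2 + 1
```

Slicing, broadcasting, and dtypes are deliberately left out; they are where most of the real work in such a library would lie.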
- [LP, hard] Write a very basic Python to JavaScript translator (see this proof of concept, based on code from the Pythonium project). The idea is to parse Python code into an AST and generate JavaScript code dynamically. Supporting the whole Python syntax is out of the question: we just want to support small Python functions implementing sequences of simple mathematical operations on NumPy arrays (also if statements and for loops, but no classes, objects, dynamic programming features, etc.). Many interaction patterns can be implemented with this small subset of Python syntax. The key point is to have basic NumPy support in JavaScript, such that the heavy lifting is done by NumPy rather than pure Python.
- [HP, hard] Bringing complete offscreen rendering support to Vispy. See this issue and this example.
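For the Python to JavaScript translator task in the list above, the general idea of walking an AST and generating JavaScript can be sketched with the standard `ast` module. This is a deliberately tiny illustration and not the proof of concept mentioned in that task: it handles only function definitions, assignments, returns, arithmetic, and calls (no `if` or `for` yet), and it assumes that arithmetic operators map to numpy.js functions named after NumPy's ufuncs (`np.add`, `np.multiply`, ...), which is a hypothetical naming choice.

```python
import ast

# Python operators mapped to hypothetical numpy.js function names
# (chosen to mirror NumPy's ufunc names).
BINARY_OPS = {
    ast.Add: "np.add",
    ast.Sub: "np.subtract",
    ast.Mult: "np.multiply",
    ast.Div: "np.divide",
}


class JSGenerator(ast.NodeVisitor):
    """Turns a very small subset of Python into JavaScript source text."""

    def visit_Module(self, node):
        return "\n".join(self.visit(stmt) for stmt in node.body)

    def visit_FunctionDef(self, node):
        args = ", ".join(a.arg for a in node.args.args)
        body = "\n".join("  " + self.visit(stmt) for stmt in node.body)
        return "function %s(%s) {\n%s\n}" % (node.name, args, body)

    def visit_Assign(self, node):
        # Only single assignments to a plain name, e.g. `y = ...`.
        return "var %s = %s;" % (node.targets[0].id, self.visit(node.value))

    def visit_Return(self, node):
        return "return %s;" % self.visit(node.value)

    def visit_BinOp(self, node):
        func = BINARY_OPS[type(node.op)]
        left, right = self.visit(node.left), self.visit(node.right)
        return "%s(%s, %s)" % (func, left, right)

    def visit_Call(self, node):
        args = ", ".join(self.visit(a) for a in node.args)
        return "%s(%s)" % (self.visit(node.func), args)

    def visit_Attribute(self, node):
        return "%s.%s" % (self.visit(node.value), node.attr)

    def visit_Name(self, node):
        return node.id

    def visit_Constant(self, node):
        # Numeric constants only in this sketch.
        return repr(node.value)

    def generic_visit(self, node):
        # Anything outside the supported subset is rejected explicitly.
        raise NotImplementedError(
            "unsupported syntax: %s" % type(node).__name__)


def translate(source):
    return JSGenerator().visit(ast.parse(source))


js = translate("def f(x, a):\n    y = a * x + 1\n    return np.exp(y)\n")
```

Rejecting unsupported nodes explicitly, rather than silently skipping them, keeps the supported subset visible to users as it grows.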