HoloViews: Philosophy

It is quite difficult to explain to newcomers what HoloViews is.

HoloViews isn't a plotting library and it isn't an analysis library. It could be called a 'visualization library', but that label, although partially correct, undersells the core idea.

The difficulty arises because HoloViews defies easy categorization: it is a different and fundamentally more powerful way of working with data. This isn't because HoloViews is some radically new idea but because it extends the way we think about and work with simple data types (e.g. scalar and string literals) to the sorts of data relevant to research.

The classic model

The process of either plotting or visualizing data has historically been seen as a separate process from the actual generation or acquisition of the data itself. Here is a simplified view of the classic pipeline:

+----------------+                                              
|  Initial data  |    +-------------------+     +--------------+
|  generation or +----+Processing/Analysis+-----+Visualization |
|  acquisition   |    +-------------------+     +--------------+
+----------------+

Although more interactive workflows as exemplified by the IPython Notebook are becoming more common, these artificial distinctions persist. Fundamentally, the philosophy behind HoloViews is that data without some corresponding visualization has no meaningful semantics, and that a visualization that does not also include the corresponding raw data is both impoverished and unsuitable for research work.

Representation is everything

In themselves, bytes on disk or in memory have no intrinsic value. In order to work with any data, you need a handle on it. Without some way of viewing your data you won't know what those bytes represent or even that they exist. Invariably this means rendering something to screen, whether a simple string (e.g. a filename) or a complex visualization.

To illustrate, the only reason you may be aware of a file on disk is because the filename may be rendered to screen. Or as an alternative example, the easiest way to learn about a Python variable (the variable a, for instance) is to examine its printed representation, rendered to screen by the Python interpreter:

>>> import numpy as np
>>> a = np.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

What you are looking at isn't the array itself but an impoverished, textual representation of it. This can be illustrated as follows:

>>> np.set_printoptions(threshold=3)
>>> a
array([0, 1, 2, ..., 7, 8, 9])

The purpose of this textual representation is not necessarily to allow us to recreate array a but to give us an idea of how it is structured and some idea regarding what it contains. What we need is enough information to be able to work with this data productively.

The key point here is that for this simple data structure, we don't think of the printed textual representation as a 'visualization' of a, even though rendering the chosen string to screen is an involved process. For simple literals, we don't normally consider the visual representation of the data as a 'visualization' that exists separately from the data itself. It is natural to think of the textual representation as the data, not as a visualization of it.

This is what HoloViews is trying to achieve for more complex data structures.

Note that even in our trivial example of printing the array a, the pipeline model (acquisition -> processing -> visualization) is in fact applicable. You need a pointer to the relevant array in memory ('acquisition'), you need to iterate over it and build a string representation (the 'processing' stage occurs in the __repr__ method) and then you need to render your string to screen ('visualization'). The reason we don't think of this simple example as either a 'visualization' or a 'rendering' task is that the process happens so transparently and quickly that we don't notice it as such. The speed and ease with which the link is made between the representation and the data itself is why we can think of the string array([0, 1, 2, ..., 7, 8, 9]) as in some ways being a.
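The three stages can be made explicit with a small, self-contained sketch. The Samples class below is hypothetical and exists only for illustration; it is not part of NumPy or HoloViews, and its abbreviation rule merely imitates the style of the array output shown earlier.

```python
class Samples:
    """Hypothetical container used only to illustrate the three stages."""

    def __init__(self, values):
        # 'Acquisition': hold on to the underlying data.
        self.values = list(values)

    def __repr__(self):
        # 'Processing': iterate over the data and build a summary string,
        # abbreviating long sequences to their first and last three items.
        if len(self.values) > 6:
            head = ", ".join(map(str, self.values[:3]))
            tail = ", ".join(map(str, self.values[-3:]))
            body = f"{head}, ..., {tail}"
        else:
            body = ", ".join(map(str, self.values))
        return f"Samples([{body}])"


a = Samples(range(10))
text = repr(a)  # the 'processing' stage runs here
print(text)     # 'visualization': the string is rendered to screen
```

At an interactive prompt, typing a on its own triggers the last two steps automatically, which is why they normally go unnoticed.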

Visualization is representation

Unfortunately, for more complex data, the connection between the data and its represented form is typically broken:

>>> import numpy as np
>>> from matplotlib import pyplot as plt
>>> plt.imshow(np.random.rand(10,10))
<matplotlib.image.AxesImage at 0x7f357beade10>

We now have an AxesImage object (or more typically a Figure object) that is clearly not the data and is no substitute for it. Before matplotlib version 1.2, it wasn't even possible to pickle figure objects. Even if you were supplied with a figure object restored from a pickle, it would be of little use for further processing and analysis: you want the original data. This isn't a fault particular to matplotlib; rather, it is a consequence of how visualization is considered distinct from the data itself.

Fundamentally, there is no real difference between a textual representation and a rich animation or plot: although the latter have a richer set of display options and probably take longer to generate, both types of representation are summaries of the data and not the data itself. We are generally happy with how Python automatically uses the textual output of __repr__ to denote the object itself, so wouldn't a richer display (a Curve or a Raster, say) be even more useful?

HoloViews aims to re-establish the connection between the data and its visual representation in this way. By supplying metadata about the semantics of your data, the visualization comes for free, transparently and in the background, just as when we printed the array a. To illustrate:

>>> from holoviews import Image
>>> data = np.random.rand(10,10)
>>> im = Image(data)
>>> im.data is data
True

When you look at the Image im in an IPython notebook, you are looking at an (impoverished) rendering of your data, although one that is far less impoverished than the text representation and far more useful. With HoloViews, the visualization is used to denote the data instead of being divorced from it. You are looking at your actual data.
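The general mechanism can be sketched without HoloViews at all. The following is a minimal illustration, not HoloViews' implementation: it assumes only IPython's _repr_html_ display hook, which the notebook calls (when an object defines it) to obtain a rich representation. The Grid class and its table rendering are hypothetical.

```python
from html import escape


class Grid:
    """Hypothetical wrapper: keeps the raw data and knows how to display it."""

    def __init__(self, data):
        # Store the original object by reference; nothing is copied.
        self.data = data

    def __repr__(self):
        # Plain-text fallback, used outside a notebook.
        rows = len(self.data)
        cols = len(self.data[0]) if rows else 0
        return f"Grid({rows}x{cols})"

    def _repr_html_(self):
        # Rich representation: IPython calls this hook when it is present.
        rows = "".join(
            "<tr>" + "".join(f"<td>{escape(str(v))}</td>" for v in row) + "</tr>"
            for row in self.data
        )
        return f"<table>{rows}</table>"


data = [[1, 2], [3, 4]]
g = Grid(data)
print(g.data is data)   # the wrapper still holds the original data
print(g._repr_html_())  # the markup a notebook would render
```

The object displayed in the notebook and the object holding the data are one and the same, so whatever you see on screen can be handed straight to further analysis.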

This is why working with HoloViews is different from working with a plotting library that treats visualization as some superficial addition to the core process. There are always difficult decisions to make regarding the representation of your data that must be addressed in some way, whether or not you treat visualization as a separate concern. The philosophy behind HoloViews is to enrich your data, to enable your data to present and express itself with a more useful, more powerful visual language. As a consequence, HoloViews lets you work with your data in a more natural way, where representation and data remain intertwined.

HoloViews is a flexible, adjustable mirror that allows your data to present itself, and your data has uncountably many different faces for you to explore.
