HoloViews: Philosophy
It is quite difficult to express what HoloViews is to newcomers.
HoloViews isn't a plotting library and it isn't an analysis library. It could be called a 'visualization library', which is partially correct but undersells the core idea.
The difficulty arises because HoloViews defies easy categorization: it is a different and fundamentally more powerful way of working with data. This isn't because HoloViews is some radically new idea but because it extends the way we think about and work with simple data types (e.g. scalar and string literals) to the sorts of data relevant to research.
The process of either plotting or visualizing data has historically been seen as a separate process from the actual generation or acquisition of the data itself. Here is a simplified view of the classic pipeline:
+----------------+
|  Initial data  |    +-------------------+     +--------------+
| generation or  +----+Processing/Analysis+-----+Visualization |
|  acquisition   |    +-------------------+     +--------------+
+----------------+
Although more interactive workflows as exemplified by the IPython Notebook are becoming more common, these artificial distinctions persist. Fundamentally, the philosophy behind HoloViews is that data without some corresponding visualization has no meaningful semantics, and that a visualization that does not also include the corresponding raw data is both impoverished and unsuitable for research work.
In themselves, bytes on disk or in memory have no intrinsic value. In order to work with any data, you need a handle on it. Without some way of viewing your data you won't know what those bytes represent or even that they exist. Invariably this means rendering something to screen, whether a simple string (e.g. a filename) or a complex visualization.
To illustrate, the only reason you may be aware of a file on disk is that
its filename may be rendered to screen. Or as an alternative example, the easiest way
to learn about a Python variable (the variable a, for instance) is to examine
its printed representation, rendered to screen by the Python interpreter:
>>> import numpy as np
>>> a = np.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
What you are looking at isn't the array itself but an impoverished, textual representation of it. This can be illustrated as follows:
>>> np.set_printoptions(threshold=3)
>>> a
array([0, 1, 2, ..., 7, 8, 9])
The purpose of this textual representation is not necessarily to allow us to recreate
the array a but to give us an idea of how it is structured and some idea regarding
what it contains. What we need is enough information to be able to work with
this data productively.
The key point here is that for this simple data structure, we don't think of the
printed textual representation as a 'visualization' of a, even though there is
an involved process rendering the chosen string to screen. For simple literals,
we don't normally consider the visual representation of the data as a 'visualization'
that exists separately from the data itself. It is natural to think of the
textual representation as the data, not as a visualization of it.
This is what HoloViews is trying to achieve for more complex data structures.
Note that even in our trivial example of printing the array a, the pipeline model
(acquisition -> processing -> visualization) is in fact applicable. You need a pointer
to the relevant array in memory ('acquisition'), you need to iterate over it and build a
string representation (the 'processing' stage occurs in the __repr__ method) and then
you need to render your string to screen ('visualization'). The reason we don't think of
this simple example as either a 'visualization' or 'rendering' task is that this process
happens so transparently and quickly that we don't notice it as such. The speed and ease
with which the link is made between the representation and the data itself is why we can
think of the string array([0, 1, 2, ..., 7, 8, 9]) as being, in some ways, the array a itself.
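The three stages can be made explicit with a small sketch. The Samples class below is hypothetical and exists only for illustration (it is not part of NumPy or HoloViews); it assumes a summary that keeps the first and last three items, mimicking the truncated array output shown earlier.

```python
class Samples:
    """Hypothetical container, used only to label the three pipeline stages."""

    def __init__(self, values):
        self.values = list(values)

    def __repr__(self):
        # 'Processing': iterate over the data and build a short summary string.
        if len(self.values) > 6:
            shown = self.values[:3] + ["..."] + self.values[-3:]
        else:
            shown = self.values
        return "Samples([" + ", ".join(str(v) for v in shown) + "])"


a = Samples(range(10))   # 'acquisition': obtain a handle on the data
text = repr(a)           # 'processing': __repr__ builds the string
print(text)              # 'visualization': the string is rendered to screen
```

The printed string omits four of the ten values, yet a.values still holds all of them: the representation summarizes the data without replacing it.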
Unfortunately, for more complex data, the connection between the data and its represented form is typically broken:
import numpy as np
from matplotlib import pyplot as plt
plt.imshow(np.random.rand(10,10))
<matplotlib.image.AxesImage at 0x7f357beade10>
We now have an AxesImage object (or more typically a Figure object) that is clearly not
the data and is no substitute for it. Before matplotlib version 1.2, it wasn't even possible to
pickle figure objects. Even if you were supplied an unpickled figure object, this would
be no good for further processing and analysis: you want the original data. This isn't a
fault particular to matplotlib; rather, it is an issue with how visualization
is considered distinct from the data itself.
Fundamentally, there is no real difference between a textual representation and a rich animation or
plot: although the latter have a richer set of display options and probably take longer to generate,
both types of representation are summaries of the data and not the data itself. We are
generally happy with how Python automatically uses the textual output of __repr__ to denote the
object itself, so wouldn't a richer display (a Curve or a Raster, say) be even more useful?
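One way to picture a richer display is IPython's rich display convention, in which an object may define a _repr_html_ method alongside __repr__. The sketch below is only an illustration of that convention and makes no claim about how HoloViews renders its elements; the Table class and its rows attribute are hypothetical names.

```python
from html import escape


class Table:
    """Hypothetical data holder with both a text and a rich representation."""

    def __init__(self, rows):
        self.rows = rows  # keep a reference to the raw data, not a copy

    def __repr__(self):
        # Plain-text summary, used by the standard Python interpreter.
        return f"Table({len(self.rows)} rows)"

    def _repr_html_(self):
        # IPython/Jupyter frontends look for this method and, when present,
        # display the returned HTML instead of the plain-text repr.
        body = "".join(
            "<tr>" + "".join(f"<td>{escape(str(v))}</td>" for v in row) + "</tr>"
            for row in self.rows
        )
        return f"<table>{body}</table>"


rows = [(1, "a"), (2, "b")]
t = Table(rows)
html = t._repr_html_()
```

Both representations are derived on demand from the same underlying object, and t.rows remains available for further processing whichever one is shown.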
HoloViews aims to re-establish the connection between the data and its visual representation in this way. By supplying metadata about the semantics of your data, the visualization comes for free, transparently and in the background, just as when we printed the array a. To illustrate:
>>> import numpy as np
>>> from holoviews import Image
>>> data = np.random.rand(10,10)
>>> im = Image(data)
>>> im.data is data
True
When you look at the Image im in an IPython notebook, you are looking
at an (impoverished) rendering of your data, although this is far less impoverished
than the text representation and far more useful. With HoloViews, the visualization is used
to denote the data instead of being divorced from it. You are looking at your actual data.
This is why working with HoloViews is different from working with a plotting library that treats visualization as some superficial addition to the core process. There are always difficult decisions to make regarding the representation of your data that must be addressed in some way, whether or not you treat visualization as a separate concern. The philosophy behind HoloViews is to enrich your data, to enable your data to present and express itself with a more useful, more powerful visual language. As a consequence, HoloViews lets you work with your data in a more natural way, where representation and data remain intertwined.
HoloViews is a flexible, adjustable mirror that allows your data to present itself, and your data has countless different faces for you to explore.