[WIP] Better Data Sources #262

Open

wants to merge 31 commits into base: 5.0.0-dev

Commits (31)
3714bd5
Fixes for array data source tests.
corranwebster Dec 17, 2014
d9ba227
Use unittest2.
corranwebster Dec 17, 2014
8bfa702
More robust checking of get_bounds()
corranwebster Dec 17, 2014
6edd34e
Fix bare except statement.
corranwebster Dec 17, 2014
262c2bc
Add image data tests, minor fixes for array data tests.
corranwebster Dec 17, 2014
ffa0e9e
Replace unittest with unittest2
corranwebster Dec 17, 2014
5e16075
Add image reading tests for ImageData.
corranwebster Dec 17, 2014
bee330b
Ensure libpng is installed on Travis.
corranwebster Dec 17, 2014
e7c0551
...and zlib for travis.
corranwebster Dec 17, 2014
1f182e2
try again with Travis
corranwebster Dec 17, 2014
ddd8c40
Install PIL via apt get
corranwebster Dec 17, 2014
6eeb229
Yet one more attempt to get PIL to include PNG support.
corranwebster Dec 17, 2014
2d90823
Add tests for function data source.
corranwebster Dec 17, 2014
530f2bb
Test range changes.
corranwebster Dec 17, 2014
0479ba4
Add tests for metadata.
corranwebster Dec 17, 2014
b7f87fb
Modernise/improve grid data source tests.
corranwebster Dec 17, 2014
60aed7c
Add multi array data source test case.
corranwebster Dec 17, 2014
93772da
Tests around serialization methods and reverse maps.
corranwebster Dec 18, 2014
9897383
Update CHANGES.txt.
corranwebster Dec 18, 2014
f25cb9f
Flake8.
corranwebster Dec 18, 2014
bfb0883
Remove set literal from test for 2.6 compatibility.
corranwebster Dec 18, 2014
2b25fda
Add setup to array data source.
corranwebster Dec 19, 2014
d743dd9
Add setUp method to multiarray data source test case.
corranwebster Dec 23, 2014
bc02978
Improvements based on suggestions in PR.
corranwebster Dec 23, 2014
6ba47ec
Fix typo.
corranwebster Dec 23, 2014
717aea8
Add a BaseArrayDataSource and tests, plus changes to ABC.
corranwebster Dec 29, 2014
1423fb1
Remove references to DimensionTrait.
corranwebster Dec 29, 2014
23a0df3
More careful use of format.
corranwebster Dec 29, 2014
71291c6
Merge branch 'enh/data-source-tests' into feature/better-data-sources
corranwebster Dec 29, 2014
ffc4953
Refactor BaseArrayDataSource into BaseDataSource.
corranwebster Jan 5, 2015
b879647
Add PointTrait back for backwards compatibility.
corranwebster Jan 5, 2015
5 changes: 5 additions & 0 deletions .travis.yml
@@ -8,6 +8,11 @@ python:
before_install:
- sudo apt-get update
- sudo apt-get install python-numpy swig
# Symlinks for PIL compilation
- sudo ln -s /usr/lib/`uname -i`-linux-gnu/libfreetype.so /usr/lib/
- sudo ln -s /usr/lib/`uname -i`-linux-gnu/libjpeg.so /usr/lib/
- sudo ln -s /usr/lib/`uname -i`-linux-gnu/libpng.so /usr/lib/
- sudo ln -s /usr/lib/`uname -i`-linux-gnu/libz.so /usr/lib/
- source .travis_before_install
install:
- pip install --install-option="--no-cython-compile" cython
4 changes: 4 additions & 0 deletions CHANGES.txt
@@ -20,6 +20,10 @@ What's new in Chaco 4.6.0

Change summary since 4.5.0

Enhancements

* More comprehensive testing for AbstractDataSource subclasses (PR #244).

Fixes

* Workaround RuntimeWarnings from nanmin and nanmax in ImageData.get_bounds
2 changes: 2 additions & 0 deletions MANIFEST.in
@@ -1 +1,3 @@
include chaco/*.h
include chaco/tests/data/PngSuite/*.png
include chaco/tests/data/PngSuite/LICENSE.txt
184 changes: 109 additions & 75 deletions chaco/abstract_data_source.py
@@ -1,136 +1,170 @@
"""
Defines the AbstractDataSource class.

This is the abstract base class for all sources which provide data to Chaco
plots and renderers.


"""

from __future__ import absolute_import, division, print_function, unicode_literals

from traits.api import Bool, Dict, Event, HasTraits
from traits.api import ABCHasTraits, Dict, Event, Int, Str

# Local relative imports
from .base import DimensionTrait
from .base import ValueType

class AbstractDataSource(HasTraits):
""" This abstract interface must be implemented by any class supplying data
to Chaco.

Chaco does not have a notion of a "data format". For the most part, a data
source looks like an array of values with an optional mask and metadata.
If you implement this interface, you are responsible for adapting your
domain-specific or application-specific data to meet this interface.
class AbstractDataSource(ABCHasTraits):
""" Abstract interface for data sources used by Chaco renderers

This abstract interface must be implemented by any class supplying data
to Chaco renderers. Chaco does not have a notion of a "data format".
For the most part, a data source looks like an array of values with an
optional mask and metadata. If you implement this interface, you are
responsible for adapting your domain-specific or application-specific data
to meet this interface.

Chaco provides some basic data source implementations. In most cases, the
easiest strategy is to create one of these basic data source with the
numeric data from a domain model. In cases when this strategy is not
possible, domain classes (or an adapter) must implement AbstractDataSource.
"""

# The dimensionality of the value at each index point.
# Subclasses re-declare this trait as a read-only trait with
# the right default value.
value_dimension = DimensionTrait
Notes
-----

# The dimensionality of the indices into this data source.
# Subclasses re-declare this trait as a read-only trait with
# the right default value.
index_dimension = DimensionTrait
The contract implied by the AbstractDataSource interface is that data
arrays provided by the get methods of the class should not be treated as
read-only arrays, and that any change to the data or mask (such as by
subclasses which provide a `set_data` method) will be accompanied by the
`data_changed` event being fired.

# A dictionary keyed on strings. In general, it maps to indices (or tuples
# of indices, depending on **value_dimension**), as in the case of
# selections and annotations. Applications and renderers can add their own
# custom metadata, but must avoid using keys that might result in name
# collision.
metadata = Dict
"""

# Event that fires when the data values change.
#: The dimension of the values provided by the data source.
#: Implementations of the interface will typically redefine this as a
#: read-only trait with a particular value.
value_type = ValueType('scalar')

#: The dimension of the indices into the data source.
#: Implementations of the interface will typically redefine this as a
#: read-only trait with a particular value.
dimension = Int(1)

#: The metadata for the data source.
#: Metadata values are typically used for annotations and selections
#: on the data source, and so each keyword corresponds to a collection of
#: indices into the data source. Applications and renderers can add their
#: own custom metadata, but must avoid using keys that might result in name
#: collision.
metadata = Dict(Str)

#: Event that fires when the data values change.
data_changed = Event

# Event that fires when just the bounds change.
#: Event that fires when the bounds (i.e. the extent of the values) change.
bounds_changed = Event

# Event that fires when metadata structure is changed.
#: Event that fires when metadata structure is changed.
metadata_changed = Event

# Should the data that this datasource refers to be serialized when
# the datasource is serialized?
persist_data = Bool(True)

#------------------------------------------------------------------------
# Abstract methods
# AbstractDataSource interface
#------------------------------------------------------------------------

def get_data(self):
"""get_data() -> data_array
"""Get an array representing the data stored in the data source.

Returns
-------

Returns a data array of the dimensions of the data source. This data
array must not be altered in-place, and the caller must assume it is
read-only. This data is contiguous and not masked.
data_array : array
An array of the dimensions specified by the index and value
dimension traits. This data array must not be altered in-place,
and the caller must assume it is read-only. This data is
contiguous and not masked.

In the case of structured (gridded) 2-D data, this method may return
two 1-D ArrayDataSources as an optimization.
"""
raise NotImplementedError

def get_data_mask(self):
"""get_data_mask() -> (data_array, mask_array)
"""Get arrays representing the data and the mask of the data source.

Returns
-------

Returns the full, raw, source data array and a corresponding binary
mask array. Treat both arrays as read-only.
data_array, mask: array of values, array of bool
Returns the full, raw, source data array and a corresponding binary
mask array. Treat both arrays as read-only.

The mask is a superposition of the masks of all upstream data sources.
The length of the returned array may be much larger than what
get_size() returns; the unmasked portion, however, matches what
get_size() returns.

The mask is a superposition of the masks of all upstream data sources.
The length of the returned array may be much larger than what
get_size() returns; the unmasked portion, however, matches what
get_size() returns.
"""
raise NotImplementedError

def is_masked(self):
"""is_masked() -> bool
"""Whether or not the data is masked.

Returns
-------

is_masked : bool
True if this data source's data uses a mask. In this case,
to retrieve the data, call get_data_mask() instead of get_data().

Returns True if this data source's data uses a mask. In this case,
to retrieve the data, call get_data_mask() instead of get_data().
If you call get_data() for this data source, it returns data, but that
data might not be the expected data.
"""
raise NotImplementedError

def get_size(self):
"""get_size() -> int
"""The size of the data.

Returns an integer estimate or the exact size of the dataset that
get_data() returns for this object. This method is useful for
down-sampling.
"""
raise NotImplementedError
This method is useful for down-sampling.

def get_bounds(self):
"""get_bounds() -> tuple(min, max)
Returns
-------

Returns a tuple (min, max) of the bounding values for the data source.
In the case of 2-D data, min and max are 2-D points that represent the
bounding corners of a rectangle enclosing the data set. Note that
these values are not view-dependent, but represent intrinsic properties
of the data source.
size : int or tuple of ints
An estimate (or the exact size) of the dataset that get_data()
returns for this object. For data sets with n-dimensional index
values, this can return an n-tuple indicating the size in each
dimension.

If data is the empty set, then the min and max vals are 0.0.
"""
raise NotImplementedError

def get_bounds(self):
"""Get the minimum and maximum finite values of the data.

### Persistence ###########################################################
Returns
-------

def _metadata_default(self):
return {"selections":[], "annotations":[]}
bounds : tuple of min, max
A tuple (min, max) of the bounding values for the data source.
In the case of n-dimensional data values, min and max are
n-dimensional points that represent the bounding corners of a
rectangle enclosing the data set. Note that these values are not
view-dependent, but represent intrinsic properties of the data
source.

def __getstate__(self):
state = super(AbstractDataSource,self).__getstate__()
Raises
------

# everything but 'metadata'
for key in ['value_dimension', 'index_dimension', 'persist_data']:
if state.has_key(key):
del state[key]
TypeError:
If data's value type is not amenable to sorting, a TypeError can
be raised.

return state
ValueError:
If data is empty, all NaN, or otherwise has no sensible ordering,
then this should raise a ValueError.

"""
raise NotImplementedError


### Trait defaults #######################################################

# EOF
def _metadata_default(self):
return {"selections":[], "annotations":[]}
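The revised interface above can be exercised with a minimal concrete data source. The sketch below is illustrative only: it mirrors the documented contract (read-only arrays, a `ValueError` from `get_bounds` on empty or all-NaN data) as plain Python, without importing chaco or Traits, and the `ConstantDataSource` name is hypothetical.

```python
import numpy as np


class ConstantDataSource:
    """Minimal sketch of a data source satisfying the revised contract."""

    def __init__(self, data):
        self._data = np.asarray(data)
        self.metadata = {"selections": [], "annotations": []}

    def get_data(self):
        # Callers must treat the returned array as read-only.
        return self._data

    def get_data_mask(self):
        # No masking in this sketch: return an all-True mask.
        return self._data, np.ones(self._data.shape, dtype=bool)

    def is_masked(self):
        return False

    def get_size(self):
        # For 1-D data return an int; n-dimensional sources may
        # return a tuple of sizes, per the updated docstring.
        return self._data.shape[0] if self._data.ndim == 1 else self._data.shape

    def get_bounds(self):
        # Bound only the finite values, raising on degenerate data
        # as the new Raises section prescribes.
        finite = self._data[np.isfinite(self._data)]
        if finite.size == 0:
            raise ValueError("no finite values to bound")
        return finite.min(), finite.max()
```

Note how NaNs are excluded from the bounds rather than propagated, matching the spirit of the nanmin/nanmax workaround mentioned in CHANGES.txt.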
2 changes: 1 addition & 1 deletion chaco/api.py
@@ -5,7 +5,7 @@

from __future__ import absolute_import, division, print_function, unicode_literals

from .base import NumericalSequenceTrait, PointTrait, ImageTrait, DimensionTrait, \
from .base import NumericalSequenceTrait, PointTrait, ImageTrait, \
SortOrderTrait, bin_search, reverse_map_1d, right_shift, \
left_shift, sort_points, find_runs, arg_find_runs, \
point_line_distance
65 changes: 35 additions & 30 deletions chaco/base.py
@@ -8,34 +8,50 @@
from math import radians, sqrt

# Major library imports
from numpy import (array, argsort, concatenate, cos, dot, empty, nonzero,
pi, sin, take, ndarray)
from numpy import (array, argsort, concatenate, cos, dot, empty, nonzero, pi,
sin, take, ndarray)

# Enthought library imports
from traits.api import CArray, Enum, Trait
from traits.api import ArrayOrNone, Either, Enum


# Exceptions

class DataUpdateError(RuntimeError):
pass

class DataInvalidError(ValueError):
pass

class DataBoundsError(ValueError):
pass

# Dimensions

# A single array of numbers.
NumericalSequenceTrait = Trait(None, None, CArray(value=empty(0)))
NumericalSequenceTrait = ArrayOrNone(shape=(None,), value=empty(0))

# A single array of arbitrary length vectors, or a collection of sequences.
SequenceVectorTrait = ArrayOrNone(shape=(None, None), value=empty(shape=(0, 0)))

# A sequence of pairs of numbers, i.e., an Nx2 array.
PointTrait = Trait(None, None, CArray(value=empty(0)))
PointSequenceTrait = ArrayOrNone(shape=(None, 2), value=empty(shape=(0, 2)))
PointTrait = PointSequenceTrait

# An NxM array of numbers.
ImageTrait = Trait(None, None, CArray(value=empty(0)))
ScalarImageTrait = ArrayOrNone(shape=(None, None), value=empty(shape=(0, 0)))
ColorImageTrait = ArrayOrNone(shape=(None, None, (3, 4)), value=empty(shape=(0, 0, 3)))
ImageTrait = Either(ScalarImageTrait, ColorImageTrait)

# A 3D array of numbers of shape (Nx, Ny, Nz)
CubeTrait = Trait(None, None, CArray(value=empty(0)))
CubeTrait = ArrayOrNone(shape=(None, None, None), value=empty(shape=(0, 0, 0)))

#: The fundamental value types that data sources can take. These can be
#: augmented by adding to `ValueType.values`.
ValueType = Enum("scalar", "point", "color", "index", "mask", "text",
"datetime")

# This enumeration lists the fundamental mathematical coordinate types that
# Chaco supports.
DimensionTrait = Enum("scalar", "point", "image", "cube")

# Linear sort order.
#: Linear sort order.
SortOrderTrait = Enum("ascending", "descending", "none")
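The new `ArrayOrNone` trait declarations above encode shape contracts: `None` in a shape spec matches any size along that axis, while a tuple such as `(3, 4)` in `ColorImageTrait` admits either size. The helper below is a hypothetical re-implementation of that matching rule in plain numpy, for illustration only; it is not the Traits validator itself.

```python
import numpy as np


def matches_shape(arr, spec):
    """Check an array against a shape spec like those used by ArrayOrNone.

    `None` matches any size along that axis; a tuple of ints matches
    any of those sizes (as in ColorImageTrait's trailing (3, 4) axis).
    """
    if arr is None:
        return True  # ArrayOrNone also accepts None
    if arr.ndim != len(spec):
        return False
    for actual, expected in zip(arr.shape, spec):
        if expected is None:
            continue  # wildcard axis
        allowed = expected if isinstance(expected, tuple) else (expected,)
        if actual not in allowed:
            return False
    return True
```

For example, a point sequence must be Nx2, so a (5, 3) array is rejected while a (5, 2) array or `None` is accepted.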


@@ -62,35 +78,25 @@ def n_gon(center, r, nsides, rot_degrees=0):
return [poly_point(center, r, i*theta+rotation) for i in range(nsides)]


# Ripped from Chaco 1.0's plot_base.py
def bin_search(values, value, ascending):
"""
Performs a binary search of a sorted array looking for a specified value.

Returns the lowest position where the value can be found or where the
array value is the last value less (greater) than the desired value.
Returns -1 if *value* is beyond the minimum or maximum of *values*.
Returns -1 if `value` is beyond the minimum or maximum of `values`.
"""
ascending = ascending > 0
if ascending:
if ascending > 0:
if (value < values[0]) or (value > values[-1]):
return -1
index = values.searchsorted(value, 'right') - 1
else:
if (value < values[-1]) or (value > values[0]):
return -1
lo = 0
hi = len( values )
while True:
mid = (hi + lo) // 2
midval = values[ mid ]
if midval == value:
return mid
elif (ascending and midval > value) or (not ascending and midval < value):
hi = mid
else:
lo = mid
if lo >= (hi - 1):
return lo
ascending_values = values[::-1]
index = len(values) - ascending_values.searchsorted(value, 'left') - 1
return index
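The refactored `bin_search` above replaces the hand-rolled midpoint loop with numpy's `searchsorted`. A standalone copy of the new implementation, runnable outside chaco, shows the semantics: it returns the lowest index whose value is the last one less (ascending) or greater (descending) than the sought value, and -1 when the value falls outside the array's range.

```python
import numpy as np


def bin_search(values, value, ascending):
    """Standalone copy of the searchsorted-based bin_search from the diff."""
    if ascending > 0:
        if (value < values[0]) or (value > values[-1]):
            return -1
        # Index of the last element <= value.
        return values.searchsorted(value, 'right') - 1
    if (value < values[-1]) or (value > values[0]):
        return -1
    # Reverse to ascending order, search, then map the index back.
    ascending_values = values[::-1]
    return len(values) - ascending_values.searchsorted(value, 'left') - 1
```

So for an ascending array `[1, 3, 5, 7]`, searching for 4 yields index 1 (the element 3), and for the descending array `[7, 5, 3, 1]` it yields index 1 (the element 5).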


def reverse_map_1d(data, pt, sort_order, floor_only=False):
"""Returns the index of *pt* in the array *data*.
@@ -122,7 +128,6 @@ def reverse_map_1d(data, pt, sort_order, floor_only=False):
if ndx == -1:
raise IndexError("value outside array data range")


# Now round the index to the closest matching index. Do this
# by determining the width (in value space) of each cell and
# figuring out which side of the midpoint pt falls into. Since