Added framework to easily translate between Data and other classes #2049

Merged
merged 22 commits into from
Jan 8, 2020

@astrofrog astrofrog commented Jul 25, 2019

As discussed in glue-viz/glue-jupyter#30, it would be nice to have an easy way to get the more 'intuitive' data objects that users are used to, rather than glue Data objects. This PR is a work in progress that adds a new data translator registry, plus methods on the data collection to set or get data as non-glue data objects and to get subsets as these kinds of objects.

With this infrastructure, one can define a data translator as follows:

from glue.config import data_translator
from glue.core import Data, Subset
from glue.core.coordinates import WCSCoordinates

from astropy import units as u
from astropy.nddata import CCDData


@data_translator(CCDData)
class CCDDataHandler:

    def to_data(self, obj):
        data = Data()
        if obj.wcs is not None:
            data.coords = WCSCoordinates(wcs=obj.wcs)
        data['array'] = obj.data
        data.get_component('array').units = str(obj.unit)
        data.meta.update(obj.meta)
        return data

    def to_object(self, data):
        if isinstance(data, Subset):
            subset = data
            data = subset.data
        else:
            subset = None
        if isinstance(data.coords, WCSCoordinates):
            wcs = data.coords.wcs
        else:
            wcs = None
        cid = data.main_components[0]
        comp = data.get_component(cid)
        unit = u.Unit(comp.units)
        ccddata = CCDData(data[cid], wcs=wcs, unit=unit)
        if subset is not None:
            ccddata.mask = subset.to_mask()
        return ccddata

With this translator registered (thanks to the @data_translator decorator), we can then do:

In [6]: import numpy as np                                                                                                                             

In [7]: from glue.core import DataCollection                                                                                                           

In [8]: dc = DataCollection()                                                                                                                          

In [9]: from astropy.nddata import CCDData                                                                                                             

In [10]: ccddata = CCDData(np.random.random((3, 3)), unit='mJy')                                                                                       

In [12]: dc['image'] = ccddata                                                                                                                         

In [13]: dc['image'].get_object()                                                                                                                        
Out[13]: 
CCDData([[0.92975129, 0.43873554, 0.93865833],
         [0.62192035, 0.04336501, 0.21733942],
         [0.85222676, 0.55321588, 0.20401203]])

In [14]: dc['image']                                                                                                                                         
Out[14]: Data (label: image)

In [15]: dc.new_subset_group(subset_state=dc['image'].id['array'] > 0.5, label='values > 0.5')                                                         
Out[15]: <glue.core.subset_group.SubsetGroup at 0x11852d240>

In [16]: subset = dc['image'].get_subset_object(subset_id=0) 
    ...: subset                                                                                                                                        
Out[16]: 
CCDData([[0.92975129, 0.43873554, 0.93865833],
         [0.62192035, 0.04336501, 0.21733942],
         [0.85222676, 0.55321588, 0.20401203]])

In [18]: subset.mask                                                                                                                                   
Out[18]: 
array([[ True, False,  True],
       [ True, False, False],
       [ True,  True, False]])
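
For readers unfamiliar with decorator-based registries, the registration step performed by @data_translator can be pictured with a small stand-alone sketch. This is not glue's implementation: TranslatorRegistry, toy_translator, get_handler_for, Temperature and TemperatureHandler are all hypothetical names, and a plain dict stands in for a glue Data object.

```python
# Toy registry: NOT glue's implementation, only an illustration of the pattern.
class TranslatorRegistry:

    def __init__(self):
        self._handlers = {}  # maps a native class to a handler instance

    def __call__(self, target_cls):
        # Allows the registry to be used as @registry(SomeClass) on a handler class.
        def decorator(handler_cls):
            self._handlers[target_cls] = handler_cls()
            return handler_cls
        return decorator

    def get_handler_for(self, obj):
        # Walk the MRO so that subclasses can reuse a parent's handler.
        for base in type(obj).__mro__:
            if base in self._handlers:
                return self._handlers[base]
        raise TypeError("No translator registered for {0}".format(type(obj).__name__))


toy_translator = TranslatorRegistry()


class Temperature:
    """Hypothetical stand-in for a 'native' object such as CCDData."""

    def __init__(self, values, unit):
        self.values = list(values)
        self.unit = unit


@toy_translator(Temperature)
class TemperatureHandler:

    def to_data(self, obj):
        # A plain dict stands in for a glue Data object here.
        return {'array': list(obj.values), 'units': obj.unit}

    def to_object(self, data):
        return Temperature(data['array'], data['units'])


native = Temperature([1.0, 2.5], 'K')
handler = toy_translator.get_handler_for(native)
round_trip = handler.to_object(handler.to_data(native))
```

The point of the pattern is that the collection only needs the type of the incoming object to find the right to_data/to_object pair, so new native types can be supported without touching the collection itself.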

codecov bot commented Jul 26, 2019

Codecov Report

Merging #2049 into master will increase coverage by 0.09%.
The diff coverage is 98.3%.

@@            Coverage Diff             @@
##           master    #2049      +/-   ##
==========================================
+ Coverage   89.63%   89.72%   +0.09%     
==========================================
  Files         399      401       +2     
  Lines       38694    39095     +401     
==========================================
+ Hits        34684    35079     +395     
- Misses       4010     4016       +6

| Impacted Files | Coverage Δ | |
|---|---|---|
| glue/viewers/common/qt/base_widget.py | 73.38% <ø> (ø) | ⬆️ |
| glue/viewers/common/qt/data_viewer.py | 96.66% <ø> (ø) | ⬆️ |
| glue/qglue.py | 93.63% <100%> (+0.56%) | ⬆️ |
| glue/core/data_factories/pandas.py | 85% <100%> (+1.98%) | ⬆️ |
| glue/core/tests/test_data_collection.py | 100% <100%> (ø) | ⬆️ |
| glue/core/subset.py | 87.77% <100%> (+0.02%) | ⬆️ |
| glue/core/data_factories/tests/test_pandas.py | 100% <100%> (ø) | |
| glue/utils/misc.py | 86.17% <90.9%> (+0.46%) | ⬆️ |
| glue/core/data_collection.py | 89.52% <95.45%> (+0.57%) | ⬆️ |
| glue/core/data.py | 89.76% <96.07%> (+0.48%) | ⬆️ |
... and 5 more

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update ff949ad...3d1a874.

astrofrog commented Jul 26, 2019

I've now also added the ability to translate selection definitions - for example, if I define the following translation class for astropy regions:

from glue.config import subset_definition_translator
from glue.core.subset import RoiSubsetState
from glue.core.roi import RectangularROI

from regions import RectanglePixelRegion, PixCoord


@subset_definition_translator('astropy-regions')
class AstropyRegionsHandler:

    def to_object(self, subset):

        data = subset.data

        if data.ndim != 2:
            raise NotImplementedError("Can only handle 2-d datasets at this time")

        subset_state = subset.subset_state

        if isinstance(subset_state, RoiSubsetState):

            if subset_state.xatt != data.pixel_component_ids[1]:
                raise ValueError('subset state xatt should be x pixel coordinate')

            if subset_state.yatt != data.pixel_component_ids[0]:
                raise ValueError('subset state yatt should be y pixel coordinate')

            roi = subset_state.roi
            if isinstance(roi, RectangularROI):
                xcen = 0.5 * (roi.xmin + roi.xmax)
                ycen = 0.5 * (roi.ymin + roi.ymax)
                width = roi.xmax - roi.xmin
                height = roi.ymax - roi.ymin
                return RectanglePixelRegion(PixCoord(xcen, ycen), width, height)
            else:
                raise NotImplementedError("ROIs of type {0} are not yet supported".format(roi.__class__.__name__))

        else:
            raise NotImplementedError("Subset states of type {0} are not yet supported".format(subset_state.__class__.__name__))

I can then do the following:

In [8]: from glue.core.roi import RectangularROI                                                                                                       

In [9]: from glue.core.subset import RoiSubsetState                                                                                                    

In [10]: subset_state = RoiSubsetState(dc['image'].pixel_component_ids[1], 
    ...:                               dc['image'].pixel_component_ids[0], 
    ...:                               RectangularROI(1, 3.5, -0.2, 3.3))                                                                              

In [12]: dc.new_subset_group(subset_state=subset_state, label='spatial selection')                                                                     
Out[12]: <glue.core.subset_group.SubsetGroup at 0x10bd9e8d0>

In [13]: dc['image'].get_selection_definition(format='astropy-regions')                                                                                         
Out[13]: <RectanglePixelRegion(PixCoord(x=2.25, y=1.5499999999999998), width=2.5, height=3.5, angle=0.0 deg)>

Note that by default if there is just one dataset and one subset, as above, you don't need to specify the data label or subset label.
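
As a quick check of the numbers in Out[13], the corner-to-centre arithmetic used in to_object can be reproduced without glue or regions installed. Rect and CenteredBox below are hypothetical stand-ins for RectangularROI and RectanglePixelRegion, and rect_to_centered_box is an illustrative helper, not part of either library.

```python
from collections import namedtuple

# Hypothetical stand-ins for RectangularROI and RectanglePixelRegion.
Rect = namedtuple('Rect', ['xmin', 'xmax', 'ymin', 'ymax'])
CenteredBox = namedtuple('CenteredBox', ['xcen', 'ycen', 'width', 'height'])


def rect_to_centered_box(roi):
    """Convert corner-based bounds into a centre plus width and height."""
    return CenteredBox(xcen=0.5 * (roi.xmin + roi.xmax),
                       ycen=0.5 * (roi.ymin + roi.ymax),
                       width=roi.xmax - roi.xmin,
                       height=roi.ymax - roi.ymin)


# Same bounds as RectangularROI(1, 3.5, -0.2, 3.3) in the session above.
box = rect_to_centered_box(Rect(1, 3.5, -0.2, 3.3))
```

The y centre comes out as 1.55 up to floating-point rounding, which is why the repr above shows 1.5499999999999998.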

In terms of where translators should live for e.g. CCDData, Spectrum1D, and astropy regions, I think it's finally time for me to start a glue-astronomy package to contain astronomy-specific things. If we write translation functions for e.g. pandas data frames and so on, those could live in the core package.

Anyway, this is ready to play around with - we should discuss the API once you've had a chance to take a look!

@astrofrog astrofrog force-pushed the high-level-object-translation branch 2 times, most recently from a5d3a7b to 2de4e60 Compare August 15, 2019 15:28
@astrofrog
Member Author

Just FYI I've started adding translators for CCDData, Spectrum1D, and astropy regions to a new glue-astronomy plugin: https://glue-astronomy.readthedocs.io/en/latest/

eteq commented Aug 16, 2019

I played around with this for a bit today w/ glue-jupyter, @astrofrog (not the regions part yet, but just the data part) - you can see my results here:
https://gist.github.com/eteq/67c5991b9e4736312e7e9c4442025cf7 (of course the widgets don't render, but hopefully you can get the gist anyway).

So the first-level thing is: in the case of your example from above, it worked great, so that's promising!

However, second-level, the current state is: I struggled a lot with understanding how to make my own translators, which I was intentionally trying to do on my own as sort of a "canary in a coal mine" experiment on the theory that users or downstream devs might have lots of their own native objects they want to carry around in glue. I think my initial take-aways from that are:

  1. I got myself quite caught up in whether I was wrapping a component vs an actual data object. In fact in the first example what I really needed was a component-level to_object rather than a data-level translator. I'm not sure whether that means these need to be treated separately, or if it means I needed to be cleverer about how I wrote the translator, but hopefully you can get the idea from the gist?

  2. The "to and then from" approach the translators use leads to issues. As you can see from the spectrum example, I ended up sort of "throwing up my hands" and literally stuffing the whole object into the meta and then returning it again, because it wasn't clear to me how I should map some of the spectrum attributes onto their glue equivalents.

Point 2 in particular is worrying me a bit, but it also suggests a possible adaptation/alteration of this approach: is it possible for the "internal" representation to actually be the native data object, and essentially have to_data in the translator be a sort of "lazy" operation? That is, it only extracts data into a glue-appropriate representation when glue asks for it, and then to_object is not necessary because you just return the true native data object which was there all along? Or is that too radical of a change?

astrofrog commented Aug 20, 2019

@eteq - thanks for testing this out! Here are some thoughts/comments:

  1. I agree that when used in a glue-jupyter context some additional work is needed to avoid having to worry about the data collection. I've done some work in glue-viz/glue-jupyter#127 ("Allow data as strings and change to kwarg-only arguments for viewers") to start to address this.

  2. This is trickier - the issue is that we need to be able to construct data objects (e.g. Spectrum1D or a pandas DataFrame) for subsets as well as datasets, so we need a recipe for how to construct these objects from Data or Subset, not just a way to return the original object. I think this is clearer for the case of DataFrame or Astropy Table where you need a way to say how to construct a DataFrame/Table of the subset. I think a better approach will likely be to just write better documentation and make fixes to Data to make sure that it is easy to construct. For the specific example of:

         if obj.wcs is not None:
             data.coords = WCSCoordinates(wcs=obj.wcs)

the issue is actually with specutils, because obj.wcs is neither a FITS-WCS object nor an APE-14 compliant WCS (at least it does not inherit from the APE-14 base class for now) so glue doesn't know what to do with it. For simplicity, my implementation uses SpectralCoordinate (see here).

I don't think we can easily make glue understand arbitrary third-party coordinate objects, and some effort is required when writing the translator to convert things to what glue understands. But glue could do better type checking and give more explicit errors when doing the things you tried.
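
To illustrate point 2 with a toy example: if the original object were simply stored and handed back, there would be nothing to return for a subset, whereas a construction recipe covers both cases. In this sketch plain dicts and lists stand in for Data and for a DataFrame/Table, and to_table is a hypothetical helper - none of this is glue API.

```python
def to_table(columns, mask=None):
    """Build a 'native' table (a dict of lists) from columns and an optional mask.

    With mask=None the full dataset is returned; with a boolean mask only the
    selected rows are kept, which is what a subset translation needs.
    """
    if mask is None:
        return {name: list(values) for name, values in columns.items()}
    return {name: [value for value, keep in zip(values, mask) if keep]
            for name, values in columns.items()}


columns = {'distance': [1.0, 2.0, 3.0], 'velocity': [70.0, 140.0, 210.0]}

# A subset defined by a condition on one column, analogous to a subset state.
mask = [v > 100.0 for v in columns['velocity']]

full = to_table(columns)
subset = to_table(columns, mask)
```

The same function serves the dataset and the subset, which is the property a to_object implementation needs to have.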

@astrofrog
Member Author

@eteq - with glue-viz/glue-jupyter#127 you should be able to do:

app.add_data(galaxies=datadct)
app.scatter2d(x='distance', y='velocity', data='galaxies')

instead of:

data = app.add_data(galaxies=datadct)[0]
app.scatter2d('distance', 'velocity', data=data)

and

ccddata = CCDData(np.random.random((3, 3)), unit='mJy')  
app.add_data(image=ccddata)
app.imshow(data='image')

instead of

ccddata = CCDData(np.random.random((3, 3)), unit='mJy')  
app.data_collection['image'] = ccddata 
app.imshow(data=app.data_collection['image'])

Does that help a little in terms of the confusion with the data collection stuff? I could then add convenience methods on app to get the native object back for a dataset or subset, and you should then never have to use the data collection directly.

@astrofrog
Member Author

Note to self: I should also add DataCollection.keys()

ibusko commented Sep 19, 2019

I looked at the code examples, but didn't try to run anything. It seems to me a very good idea. It allows users to stay in their comfort zone by letting them access their objects with their familiar APIs, even though they are also glue Data instances. This ultimately contributes to a higher rate of buy-in by the user community.

The only concern that came to mind is that it has the potential to generate help calls from users who are not familiar with the innards and capabilities of glue Data objects, but venture into developing their own translators anyway. We can imagine that there will be cases where mapping between the two worlds will be difficult, obscure, or even impossible. This can of course be mitigated by good documentation, including some worked examples and plenty of pointers to the already good glue docs.

ibusko commented Sep 24, 2019

I am getting import errors like this
[image]
after I pip-installed Tom's fork:branch astrofrog:high-level-object-translation. I am missing something, but can't figure out what.

The environment has these versions:
[image]

eteq commented Sep 25, 2019

@ibusko - the notebook I referenced above was made with these versions:

glue-core                 0.14.0.dev0
glue-jupyter              0.0.0                     dev_0    <develop> <- SHA 8d6fcfdf9b5f5f0f4f7703261fd426202a9eb591
glue-vispy-viewers        0.11.dev0                pypi_0    pypi

Maybe you can try those first just to see if that works for you? I think to work with the suggestions from #2049 (comment) you'll need to get a more recent glue-jupyter, though, so it's possible something in that might break part of my notebook? Easiest thing is probably to just try it and see...

ibusko commented Sep 26, 2019

I tried
pip install 'glue-core==0.14.0' 'glue-vispy-viewers==0.11' 'glue-jupyter==0.0.0' --force-reinstall
and some partial variants of it. None worked, partly because of cross-requirements, but ultimately because glue-jupyter==0.0.0 can't be found.

@brechmos-stsci

I started (re-) implementing @eteq's gist into a notebook and found some things...

  1. app.data_collection['spec'] = something can be run multiple times and the DataCollection increases in size, which is kinda fine until you try to do app.data_collection['spec'].get_object(), as it throws an exception ValueError: Several datasets were found with the label 'spec'. I kinda think that if one does app.data_collection['spec'] = spec1d, it should replace the existing item rather than create a new item in the data collection with the same label.

  2. ND-type spectrum1D. I tried doing:

spec_vec = Spectrum1D(spectral_axis=np.arange(10)*u.um, flux=np.random.random((6, 10))*u.Jy)

app.data_collection['spec_vec'] = spec_vec
spec_vec_out = app.data_collection['spec_vec'].get_object()

but it threw an exception ValueError: The dimensions of component flux are incompatible with the dimensions of this data: (6, 10) vs (10,)

  3. Spectral-Cube: I believe there is a similar issue with trying to interpret more than one flux axis.

import numpy as np
from astropy.utils.data import get_pkg_data_filename
from glue.config import data_translator
from glue.core import Data
from spectral_cube import SpectralCube


@data_translator(SpectralCube)
class SpectralCubeHandler:

    def to_data(self, obj):
        data = Data()
        # 1-d spectral axis and 3-d flux are added as components of the same Data
        data['spectral_axis'] = obj.spectral_axis
        data['flux'] = np.array(obj)
        data.get_component('flux').units = str(obj.unit)
        # Stash the original cube so that to_object can hand it back unchanged
        data.meta['s3d'] = obj
        return data

    def to_object(self, data):
        return data.meta['s3d']


# app is the glue-jupyter application created earlier in the notebook
fn = get_pkg_data_filename('tests/data/adv.fits', 'spectral_cube')
cube = SpectralCube.read(fn)
app.data_collection['cube'] = cube
spec_vec_out = app.data_collection['cube'].get_object()

gives me:

ValueError: The dimensions of component flux are incompatible with the dimensions of this data: (4, 3, 2) vs (4,)

For all of these I could be misinterpreting things.
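
One plausible reading of both errors, assuming (this is an assumption, not something confirmed in this thread) that the first component added fixes the shape of the dataset and later components must match it, can be sketched with a toy class. ToyData and its add_component are hypothetical and only mimic the error message; they are not glue's Data.

```python
class ToyData:
    """Toy container in which every component must share a single shape."""

    def __init__(self):
        self.shape = None      # set by the first component that is added
        self.components = {}   # maps a label to the shape of that component

    def add_component(self, label, shape):
        shape = tuple(shape)
        if self.shape is None:
            self.shape = shape
        elif shape != self.shape:
            raise ValueError(
                "The dimensions of component {0} are incompatible with the "
                "dimensions of this data: {1} vs {2}".format(label, shape, self.shape))
        self.components[label] = shape


data = ToyData()
data.add_component('spectral_axis', (4,))      # 1-d axis is added first
try:
    data.add_component('flux', (4, 3, 2))      # 3-d flux no longer fits
except ValueError as exc:
    message = str(exc)
else:
    message = None
```

Under that assumption, the translator above fails because the 1-d spectral axis and the 3-d flux are stored as sibling components of one dataset.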

@eteq @astrofrog

@astrofrog astrofrog force-pushed the high-level-object-translation branch from 8d40d2e to 5d34e5b Compare October 28, 2019 11:03
@astrofrog
Member Author

I'm working on this again!

@brechmos-stsci - thanks for the detailed testing. The first issue should now be fixed here (setting e.g. dc['label'] = ... will overwrite previous existing data)
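
The overwrite semantics described here can be pictured with a toy container. ToyCollection is a hypothetical class written for illustration only, not the DataCollection code from this PR; the keys() method is likewise just a sketch of what such a convenience could look like.

```python
class ToyCollection:
    """Toy label-based collection in which assignment replaces existing entries."""

    def __init__(self):
        self._items = []  # list of (label, value) pairs, in insertion order

    def __setitem__(self, label, value):
        # Drop any existing entries with this label before appending the new one,
        # so that repeated assignment never leaves duplicates behind.
        self._items = [(l, v) for l, v in self._items if l != label]
        self._items.append((label, value))

    def __getitem__(self, label):
        for existing_label, value in self._items:
            if existing_label == label:
                return value
        raise KeyError(label)

    def __len__(self):
        return len(self._items)

    def keys(self):
        return [label for label, _ in self._items]


dc = ToyCollection()
dc['spec'] = 'first'
dc['spec'] = 'second'   # replaces the previous entry rather than adding another
dc['image'] = 'other'
```

With this behaviour a lookup by label is never ambiguous, which avoids the "Several datasets were found" error reported above.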

@astrofrog
Member Author

Regarding the issue with the multi-dimensional flux columns, this will require some more thought, because glue data objects haven't previously been used to store data where the data has more dimensions than the number of coordinates. We can definitely get this to work somehow, but let's discuss that in glue-viz/glue-astronomy#2 instead. For now, let's focus on cases where there are no 'vector' attributes for the sake of getting this PR merged.

@astrofrog astrofrog force-pushed the high-level-object-translation branch from 5d34e5b to 80257aa Compare November 1, 2019 15:37
@astrofrog
Member Author

In the interest of moving things forward, I am going to merge this then will try and improve the implementations of various translations in glue-astronomy. Once this is ready, we can review whether the current API is suitable or not, but perfect is the enemy of good so I'd rather get something in :)

@astrofrog astrofrog merged commit 61cb152 into glue-viz:master Jan 8, 2020