Nesting columns prior to crossmatching creates delayed error #551

Open
2 of 3 tasks
gitosaurus opened this issue Jan 27, 2025 · 1 comment
Labels
bug Something isn't working

Comments

@gitosaurus
Contributor

gitosaurus commented Jan 27, 2025

Bug report

# LSDB version 0.4.4
import lsdb
ztf_db = lsdb.read_hats(
    '/data3/epyc/data3/hats/catalogs/ztf_dr22/ztf_lc',
    margin_cache='/data3/epyc/data3/hats/catalogs/ztf_dr22/ztf_lc_10arcs')
nest_lc = ['hmjd', 'mag', 'magerr']
ztf_db = ztf_db.nest_lists(
    base_columns=[c for c in ztf_db.columns if c not in nest_lc],
    list_columns=nest_lc,
    name='lc')

gaia3 = lsdb.read_hats(
    'https://data.lsdb.io/hats/gaia_dr3/gaia',
    margin_cache='https://data.lsdb.io/hats/gaia_dr3/gaia_10arcs')

cm = gaia3.crossmatch(ztf_db)
cmc = cm.compute()

The above yields the following error:

File ~/.conda/envs/dtj1s-py3.12/lib/python3.12/site-packages/pyarrow/array.pxi:4079, in pyarrow.lib.StructArray.from_arrays()

TypeError: Expected Array, got <class 'pyarrow.lib.ChunkedArray'>

If ztf_db is not nested prior to the crossmatch, the crossmatch succeeds.

Before submitting
Please check the following:

  • I have described the situation in which the bug arose, including what code was executed, information about my environment, and any applicable data others will need to reproduce the problem.
  • I have included available evidence of the unexpected behavior (including error messages, screenshots, and/or plots) as well as a description of what I expected instead.
  • If I have a solution in mind, I have provided an explanation and/or pseudocode and/or task list.

gitosaurus added the bug label Jan 27, 2025

@hombit
Contributor

hombit commented Jan 28, 2025

I think it is fixed in nested-pandas v0.3.4, see lincc-frameworks/nested-pandas#190.
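
To check whether an environment already has that release, something like the following should work. It is a rough sketch using only the standard library; `release_tuple` and `has_fix` are hypothetical helper names, and the comparison ignores pre-release suffixes, so a version such as 0.3.4rc1 would be treated as 0.3.4.

```python
from importlib.metadata import PackageNotFoundError, version


def release_tuple(version_string):
    """Turn '0.3.4' or '0.3.4.dev1' into (0, 3, 4), dropping non-numeric parts."""
    parts = []
    for piece in version_string.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def has_fix(installed, minimum="0.3.4"):
    """True if the installed release is at least the minimum release."""
    return release_tuple(installed) >= release_tuple(minimum)


try:
    installed = version("nested-pandas")
    print(f"nested-pandas {installed}: has fix = {has_fix(installed)}")
except PackageNotFoundError:
    print("nested-pandas is not installed in this environment")
```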

But now I have a different issue:

import lsdb
ztf_db = lsdb.read_hats(
    '/data3/epyc/data3/hats/catalogs/ztf_dr22/ztf_lc',
    columns=['objectid', 'objra', 'objdec', 'hmjd', 'mag', 'magerr'],
    margin_cache='/data3/epyc/data3/hats/catalogs/ztf_dr22/ztf_lc_10arcs')
nest_lc = ['hmjd', 'mag', 'magerr']
ztf_db = ztf_db.nest_lists(
    base_columns=[c for c in ztf_db.columns if c not in nest_lc],
    list_columns=nest_lc,
    name='lc')

gaia3 = lsdb.read_hats(
    'https://data.lsdb.io/hats/gaia_dr3/gaia',
    margin_cache='https://data.lsdb.io/hats/gaia_dr3/gaia_10arcs')

cm = gaia3.partitions[0].crossmatch(ztf_db)
cmc = cm.compute()

File ~/.virtualenvs/default/lib/python3.10/site-packages/dask/dataframe/utils.py:363, in check_meta(x, meta, funcname, numeric_equal)
    357         return x
    358     errmsg = "Partition type: `{}`\n{}".format(
    359         typename(type(meta)),
    360         asciitable(["", "dtype"], [("Found", x.dtype), ("Expected", meta.dtype)]),
    361     )
--> 363 raise ValueError(
    364     "Metadata mismatch found%s.\n\n"
    365     "%s" % ((" in `%s`" % funcname if funcname else ""), errmsg)
    366 )

ValueError: Metadata mismatch found in `from_delayed`.

Partition type: `nested_pandas.nestedframe.core.NestedFrame`
+-------------+-------------------------------------------------------+----------------------------------------------------------------------------------------------------+
| Column      | Found                                                 | Expected                                                                                           |
+-------------+-------------------------------------------------------+----------------------------------------------------------------------------------------------------+
| 'lc_ztf_lc' | nested<hmjd: [double], mag: [float], magerr: [float]> | nested<hmjd: [list<element: double>], mag: [list<element: float>], magerr: [list<element: float>]> |
+-------------+-------------------------------------------------------+----------------------------------------------------------------------------------------------------+

This expected dtype is wrong: each nested field has an extra list level ([list<element: double>] where [double] was found).
