Some skysim5000 hdf5 files have an inconsistent schema, which prevents their conversion to parquet #426

Open
boutigny opened this issue Jul 2, 2021 · 2 comments

boutigny commented Jul 2, 2021

A small fraction of the skysim5000 healpixels in hdf5 format (52 out of 1568) have an inconsistent schema for some native quantities, which prevents their conversion to parquet format.
The following fields have been identified as possibly problematic:
lightcone_replication: int64
lightcone_rotation: int64
baseDC2/source_halo_mvir
The inconsistency is between the files corresponding to the 3 redshift intervals.
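
One way to pin down which quantities disagree is to dump the dtype of every dataset in each of the three redshift files for a healpixel and compare them. The sketch below is not taken from the conversion script: `read_schema` and `find_dtype_mismatches` are hypothetical helper names, the file labels and dtypes in the example are made up for illustration, and it assumes h5py is available for reading the files.

```python
from collections import defaultdict


def read_schema(path):
    """Return {dataset_path: dtype string} for every dataset in one HDF5 file."""
    import h5py  # imported lazily; only needed when reading real files

    schema = {}

    def record(name, obj):
        # visititems walks groups and datasets; keep datasets only
        if isinstance(obj, h5py.Dataset):
            schema[name] = str(obj.dtype)

    with h5py.File(path, "r") as f:
        f.visititems(record)
    return schema


def find_dtype_mismatches(schemas):
    """Compare per-file schemas.

    schemas: {file label: {column: dtype string}}
    Returns {column: {dtype string: [file labels]}} for every column whose
    dtype is not the same in all files. A column absent from a file is
    reported under the pseudo-dtype "<missing>".
    """
    all_columns = set()
    for schema in schemas.values():
        all_columns.update(schema)

    seen = defaultdict(lambda: defaultdict(list))
    for label, schema in schemas.items():
        for column in all_columns:
            seen[column][schema.get(column, "<missing>")].append(label)

    return {col: dict(by_dtype)
            for col, by_dtype in sorted(seen.items())
            if len(by_dtype) > 1}


# Hypothetical labels and dtypes, for illustration only.
example = {
    "z_0_1": {"lightcone_rotation": "int64", "ra": "float64"},
    "z_1_2": {"lightcone_rotation": "int32", "ra": "float64"},
    "z_2_3": {"lightcone_rotation": "int64", "ra": "float64"},
}
print(find_dtype_mismatches(example))
```

With real files, the `schemas` argument would be built as `{path: read_schema(path) for path in paths}` for the three files of one healpixel.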

While it would be better to fix this problem upstream, it is also possible to hack the conversion script as in https://github.com/LSSTDESC/DC2-production/tree/u/boutigny/fix_schema_parquet_skysim
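
For readers who do not want to open the branch, the general shape of such a workaround can be sketched as follows. This is not the code from the linked branch. It is a minimal illustration that assumes each redshift file is read into a dict of numpy-like arrays and that the problematic columns are forced to one fixed dtype before the parquet file is written. `cast_columns` is a hypothetical helper, and the target dtypes are supplied by the caller.

```python
def cast_columns(columns, forced_dtypes):
    """Return a new dict in which selected columns are cast to a fixed dtype.

    columns:       {name: array with .dtype and .astype(), e.g. numpy.ndarray}
    forced_dtypes: {name: dtype string}, e.g. {"lightcone_rotation": "int64"}

    Columns not listed in forced_dtypes, or already of the requested dtype,
    are passed through unchanged. The input dict is not modified.
    """
    fixed = {}
    for name, values in columns.items():
        target = forced_dtypes.get(name)
        if target is not None and str(values.dtype) != target:
            # astype returns a converted copy; narrowing casts can lose
            # information, so the target should be the wider dtype
            values = values.astype(target)
        fixed[name] = values
    return fixed
```

Applied to the chunk read from each of the three redshift files with the same `forced_dtypes`, this gives every chunk an identical dtype for the listed columns. Whether int64 is the right target for `lightcone_replication` and `lightcone_rotation`, and which dtype to force for `baseDC2/source_halo_mvir`, would need to be checked against the files.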

@evevkovacs
Contributor

@patricialarsen @yymao The fact that this inconsistency occurs in just a few healpixels is mysterious. Can you provide a list of the affected healpixels so that we can investigate further? The first 2 variables are copied from the input files used for the production pipeline, so it is possible that those input files have an issue. The last variable is copied from the UniverseMachine inputs. What exactly is the problem with baseDC2/source_halo_mvir? None of these variables is actually produced by the production code; they have been included in the catalog for completeness and provenance. Once we have tracked down the cause, it would be possible to regenerate the subset of affected healpixels.

boutigny commented Jul 2, 2021

@evevkovacs Here is the list of problematic healpixels:
skysim5000_v1.1.1_healpix6087.parquet
skysim5000_v1.1.1_healpix6093.parquet
skysim5000_v1.1.1_healpix6107.parquet
skysim5000_v1.1.1_healpix6220.parquet
skysim5000_v1.1.1_healpix6465.parquet
skysim5000_v1.1.1_healpix6483.parquet
skysim5000_v1.1.1_healpix6747.parquet
skysim5000_v1.1.1_healpix6848.parquet
skysim5000_v1.1.1_healpix6870.parquet
skysim5000_v1.1.1_healpix7491.parquet
skysim5000_v1.1.1_healpix7504.parquet
skysim5000_v1.1.1_healpix7641.parquet
skysim5000_v1.1.1_healpix7755.parquet
skysim5000_v1.1.1_healpix7756.parquet
skysim5000_v1.1.1_healpix7895.parquet
skysim5000_v1.1.1_healpix7897.parquet
skysim5000_v1.1.1_healpix8287.parquet
skysim5000_v1.1.1_healpix9813.parquet
skysim5000_v1.1.1_healpix9036.parquet
skysim5000_v1.1.1_healpix9284.parquet
skysim5000_v1.1.1_healpix9809.parquet
skysim5000_v1.1.1_healpix10176.parquet
skysim5000_v1.1.1_healpix10675.parquet
skysim5000_v1.1.1_healpix11296.parquet
skysim5000_v1.1.1_healpix11297.parquet
skysim5000_v1.1.1_healpix11377.parquet
skysim5000_v1.1.1_healpix11456.parquet
skysim5000_v1.1.1_healpix11457.parquet
skysim5000_v1.1.1_healpix11458.parquet
skysim5000_v1.1.1_healpix11459.parquet
skysim5000_v1.1.1_healpix8395.parquet
skysim5000_v1.1.1_healpix9161.parquet
skysim5000_v1.1.1_healpix9164.parquet
skysim5000_v1.1.1_healpix9291.parquet
skysim5000_v1.1.1_healpix9553.parquet
skysim5000_v1.1.1_healpix9679.parquet
skysim5000_v1.1.1_healpix9937.parquet
skysim5000_v1.1.1_healpix10665.parquet
skysim5000_v1.1.1_healpix8032.parquet
skysim5000_v1.1.1_healpix8288.parquet
skysim5000_v1.1.1_healpix9551.parquet
skysim5000_v1.1.1_healpix9935.parquet
skysim5000_v1.1.1_healpix10076.parquet
skysim5000_v1.1.1_healpix10674.parquet
skysim5000_v1.1.1_healpix10884.parquet
skysim5000_v1.1.1_healpix10904.parquet
skysim5000_v1.1.1_healpix11374.parquet
skysim5000_v1.1.1_healpix11376.parquet
skysim5000_v1.1.1_healpix8667.parquet
skysim5000_v1.1.1_healpix9792.parquet
skysim5000_v1.1.1_healpix11382.parquet
skysim5000_v1.1.1_healpix12179.parquet
skysim5000_v1.1.1_healpix8543.parquet
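
To feed a list like this into a regeneration or re-conversion job, it can help to reduce it to a sorted set of unique healpixel IDs. The snippet below is a small sketch for that. `unique_healpix_ids` is a hypothetical helper, and the regular expression assumes file names of the form shown above (`..._healpix<ID>.parquet`).

```python
import re

# Matches the trailing "healpix<digits>.parquet" part of a file name.
HEALPIX_RE = re.compile(r"healpix(\d+)\.parquet$")


def unique_healpix_ids(lines):
    """Extract healpixel IDs from file names, dropping repeats and sorting."""
    ids = set()
    for line in lines:
        match = HEALPIX_RE.search(line.strip())
        if match:  # silently skip blank or unrelated lines
            ids.add(int(match.group(1)))
    return sorted(ids)


# A few names taken from the list above, including one repeat.
sample = [
    "skysim5000_v1.1.1_healpix6093.parquet",
    "skysim5000_v1.1.1_healpix6087.parquet",
    "skysim5000_v1.1.1_healpix6093.parquet",
]
print(unique_healpix_ids(sample))
```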

Regarding baseDC2/source_halo_mvir, this is also a dtype mismatch between the files corresponding to the 3 redshift intervals.
