Parquet reader error: schemaElement.repetition_type == thrift::FieldRepetitionType::REPEATED #7777

Closed
qqibrow opened this issue Nov 28, 2023 · 3 comments
Labels
bug Something isn't working parquet triage Newly created issue that needs attention.

Comments

@qqibrow
Collaborator

qqibrow commented Nov 28, 2023

Bug description

E1128 23:20:04.616051 1266726 Exceptions.h:69] Line: ../../velox/dwio/parquet/reader/ParquetReader.cpp:283, Function:getParquetColumnInfo, Expression: schemaElement.repetition_type == thrift::FieldRepetitionType::REPEATED (OPTIONAL vs. REPEATED), Source: RUNTIME, ErrorCode: INVALID_STATE
terminate called after throwing an instance of 'facebook::velox::VeloxRuntimeError'
  what():  Exception: VeloxRuntimeError
Error Source: RUNTIME
Error Code: INVALID_STATE
Reason: (OPTIONAL vs. REPEATED)
Retriable: False
Expression: schemaElement.repetition_type == thrift::FieldRepetitionType::REPEATED
Function: getParquetColumnInfo
File: ../../velox/dwio/parquet/reader/ParquetReader.cpp
Line: 283
Stack trace:
# 0  std::shared_ptr<facebook::velox::VeloxException::State const> facebook::velox::VeloxException::State::make<facebook::velox::VeloxException::make(char const*, unsigned long, char const*, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, bool, facebook::velox::VeloxException::Type, std::basic_string_view<char, std::char_traits<char> >)::{lambda(auto:1&)#1}>(facebook::velox::VeloxException::Type, facebook::velox::VeloxException::make(char const*, unsigned long, char const*, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, bool, facebook::velox::VeloxException::Type, std::basic_string_view<char, std::char_traits<char> >)::{lambda(auto:1&)#1})
# 1  facebook::velox::VeloxException::VeloxException(char const*, unsigned long, char const*, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, bool, facebook::velox::VeloxException::Type, std::basic_string_view<char, std::char_traits<char> >)
# 2  facebook::velox::VeloxRuntimeError::VeloxRuntimeError(char const*, unsigned long, char const*, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, bool, std::basic_string_view<char, std::char_traits<char> >)
# 3  void facebook::velox::detail::veloxCheckFail<facebook::velox::VeloxRuntimeError, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>(facebook::velox::detail::VeloxCheckFailArgs const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
# 4  facebook::velox::parquet::ReaderBase::getParquetColumnInfo(unsigned int, unsigned int, unsigned int, unsigned int&, unsigned int&) const
# 5  facebook::velox::parquet::ReaderBase::getParquetColumnInfo(unsigned int, unsigned int, unsigned int, unsigned int&, unsigned int&) const
# 6  facebook::velox::parquet::ReaderBase::initializeSchema()
# 7  facebook::velox::parquet::ReaderBase::ReaderBase(std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >, facebook::velox::dwio::common::ReaderOptions const&)
# 8  void __gnu_cxx::new_allocator<facebook::velox::parquet::ReaderBase>::construct<facebook::velox::parquet::ReaderBase, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >, facebook::velox::dwio::common::ReaderOptions const&>(facebook::velox::parquet::ReaderBase*, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >&&, facebook::velox::dwio::common::ReaderOptions const&)
# 9  void std::allocator_traits<std::allocator<facebook::velox::parquet::ReaderBase> >::construct<facebook::velox::parquet::ReaderBase, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >, facebook::velox::dwio::common::ReaderOptions const&>(std::allocator<facebook::velox::parquet::ReaderBase>&, facebook::velox::parquet::ReaderBase*, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >&&, facebook::velox::dwio::common::ReaderOptions const&)
# 10 std::_Sp_counted_ptr_inplace<facebook::velox::parquet::ReaderBase, std::allocator<facebook::velox::parquet::ReaderBase>, (__gnu_cxx::_Lock_policy)2>::_Sp_counted_ptr_inplace<std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >, facebook::velox::dwio::common::ReaderOptions const&>(std::allocator<facebook::velox::parquet::ReaderBase>, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >&&, facebook::velox::dwio::common::ReaderOptions const&)
# 11 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<facebook::velox::parquet::ReaderBase, std::allocator<facebook::velox::parquet::ReaderBase>, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >, facebook::velox::dwio::common::ReaderOptions const&>(facebook::velox::parquet::ReaderBase*&, std::_Sp_alloc_shared_tag<std::allocator<facebook::velox::parquet::ReaderBase> >, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >&&, facebook::velox::dwio::common::ReaderOptions const&)
# 12 std::__shared_ptr<facebook::velox::parquet::ReaderBase, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator<facebook::velox::parquet::ReaderBase>, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >, facebook::velox::dwio::common::ReaderOptions const&>(std::_Sp_alloc_shared_tag<std::allocator<facebook::velox::parquet::ReaderBase> >, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >&&, facebook::velox::dwio::common::ReaderOptions const&)
# 13 std::shared_ptr<facebook::velox::parquet::ReaderBase>::shared_ptr<std::allocator<facebook::velox::parquet::ReaderBase>, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >, facebook::velox::dwio::common::ReaderOptions const&>(std::_Sp_alloc_shared_tag<std::allocator<facebook::velox::parquet::ReaderBase> >, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >&&, facebook::velox::dwio::common::ReaderOptions const&)
# 14 std::shared_ptr<facebook::velox::parquet::ReaderBase> std::allocate_shared<facebook::velox::parquet::ReaderBase, std::allocator<facebook::velox::parquet::ReaderBase>, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >, facebook::velox::dwio::common::ReaderOptions const&>(std::allocator<facebook::velox::parquet::ReaderBase> const&, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >&&, facebook::velox::dwio::common::ReaderOptions const&)
# 15 std::shared_ptr<facebook::velox::parquet::ReaderBase> std::make_shared<facebook::velox::parquet::ReaderBase, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >, facebook::velox::dwio::common::ReaderOptions const&>(std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >&&, facebook::velox::dwio::common::ReaderOptions const&)
# 16 facebook::velox::parquet::ParquetReader::ParquetReader(std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >, facebook::velox::dwio::common::ReaderOptions const&)
# 17 main
# 18 __libc_start_main
# 19 _start

*** Aborted at 1701213604 (Unix time, try 'date -d @1701213604') ***
*** Signal 6 (SIGABRT) (0x3e4700135426) received by PID 1266726 (pthread TID 0x7f5ae515cac0) (linux TID 1266726) (maybe from PID 1266726, UID 15943) (code: -6), stack trace: ***
    @ 0000000001e310ff folly::symbolizer::(anonymous namespace)::innerSignalHandler(int, siginfo_t*, void*)
                       /home/lniu/code/velox_new/velox/folly/_build/../folly/experimental/symbolizer/SignalHandler.cpp:449
    @ 0000000001e311e0 folly::symbolizer::(anonymous namespace)::signalHandler(int, siginfo_t*, void*)
                       /home/lniu/code/velox_new/velox/folly/_build/../folly/experimental/symbolizer/SignalHandler.cpp:470
    @ 0000000000000000 (unknown)
    @ 000000000004300b gsignal
    @ 0000000000022858 abort
    @ 0000000000000000 (unknown)
    @ 0000000000000000 (unknown)
    @ 00000000000aa3f6 std::terminate()
    @ 00000000000aa6a8 __cxa_throw
    @ 0000000001ccf6e0 __cxa_throw
                       /home/lniu/code/velox_new/velox/folly/_build/../folly/experimental/exception_tracer/ExceptionTracerLib.cpp:108
    @ 0000000001c40d5a void facebook::velox::detail::veloxCheckFail<facebook::velox::VeloxRuntimeError, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>(facebook::velox::detail::VeloxCheckFailArgs const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
                       /home/lniu/code/velox_new/velox/_build/debug/../.././velox/common/base/Exceptions.h:85
                       -> /home/lniu/code/velox_new/velox/_build/debug/../../velox/common/base/Exceptions.cpp
    @ 0000000000fd85d4 facebook::velox::parquet::ReaderBase::getParquetColumnInfo(unsigned int, unsigned int, unsigned int, unsigned int&, unsigned int&) const
                       /home/lniu/code/velox_new/velox/_build/debug/../../velox/dwio/parquet/reader/ParquetReader.cpp:283
    @ 0000000000fd8270 facebook::velox::parquet::ReaderBase::getParquetColumnInfo(unsigned int, unsigned int, unsigned int, unsigned int&, unsigned int&) const
                       /home/lniu/code/velox_new/velox/_build/debug/../../velox/dwio/parquet/reader/ParquetReader.cpp:258
    @ 0000000000fd7d80 facebook::velox::parquet::ReaderBase::initializeSchema()
                       /home/lniu/code/velox_new/velox/_build/debug/../../velox/dwio/parquet/reader/ParquetReader.cpp:215
    @ 0000000000fd7246 facebook::velox::parquet::ReaderBase::ReaderBase(std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >, facebook::velox::dwio::common::ReaderOptions const&)
                       /home/lniu/code/velox_new/velox/_build/debug/../../velox/dwio/parquet/reader/ParquetReader.cpp:138
    @ 0000000000ffc61a void __gnu_cxx::new_allocator<facebook::velox::parquet::ReaderBase>::construct<facebook::velox::parquet::ReaderBase, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >, facebook::velox::dwio::common::ReaderOptions const&>(facebook::velox::parquet::ReaderBase*, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >&&, facebook::velox::dwio::common::ReaderOptions const&)
                       /usr/include/c++/9/ext/new_allocator.h:146
                       -> /home/lniu/code/velox_new/velox/_build/debug/../../velox/dwio/parquet/reader/ParquetReader.cpp
    @ 0000000000ff9f43 void std::allocator_traits<std::allocator<facebook::velox::parquet::ReaderBase> >::construct<facebook::velox::parquet::ReaderBase, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >, facebook::velox::dwio::common::ReaderOptions const&>(std::allocator<facebook::velox::parquet::ReaderBase>&, facebook::velox::parquet::ReaderBase*, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >&&, facebook::velox::dwio::common::ReaderOptions const&)
                       /usr/include/c++/9/bits/alloc_traits.h:483
                       -> /home/lniu/code/velox_new/velox/_build/debug/../../velox/dwio/parquet/reader/ParquetReader.cpp
    @ 0000000000ff7dca std::_Sp_counted_ptr_inplace<facebook::velox::parquet::ReaderBase, std::allocator<facebook::velox::parquet::ReaderBase>, (__gnu_cxx::_Lock_policy)2>::_Sp_counted_ptr_inplace<std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >, facebook::velox::dwio::common::ReaderOptions const&>(std::allocator<facebook::velox::parquet::ReaderBase>, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >&&, facebook::velox::dwio::common::ReaderOptions const&)
                       /usr/include/c++/9/bits/shared_ptr_base.h:548
                       -> /home/lniu/code/velox_new/velox/_build/debug/../../velox/dwio/parquet/reader/ParquetReader.cpp
    @ 0000000000ff4a4e std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<facebook::velox::parquet::ReaderBase, std::allocator<facebook::velox::parquet::ReaderBase>, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >, facebook::velox::dwio::common::ReaderOptions const&>(facebook::velox::parquet::ReaderBase*&, std::_Sp_alloc_shared_tag<std::allocator<facebook::velox::parquet::ReaderBase> >, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >&&, facebook::velox::dwio::common::ReaderOptions const&)
                       /usr/include/c++/9/bits/shared_ptr_base.h:679
                       -> /home/lniu/code/velox_new/velox/_build/debug/../../velox/dwio/parquet/reader/ParquetReader.cpp
    @ 0000000000ff191b std::__shared_ptr<facebook::velox::parquet::ReaderBase, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator<facebook::velox::parquet::ReaderBase>, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >, facebook::velox::dwio::common::ReaderOptions const&>(std::_Sp_alloc_shared_tag<std::allocator<facebook::velox::parquet::ReaderBase> >, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >&&, facebook::velox::dwio::common::ReaderOptions const&)
                       /usr/include/c++/9/bits/shared_ptr_base.h:1344
                       -> /home/lniu/code/velox_new/velox/_build/debug/../../velox/dwio/parquet/reader/ParquetReader.cpp
    @ 0000000000fee84e std::shared_ptr<facebook::velox::parquet::ReaderBase>::shared_ptr<std::allocator<facebook::velox::parquet::ReaderBase>, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >, facebook::velox::dwio::common::ReaderOptions const&>(std::_Sp_alloc_shared_tag<std::allocator<facebook::velox::parquet::ReaderBase> >, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >&&, facebook::velox::dwio::common::ReaderOptions const&)
                       /usr/include/c++/9/bits/shared_ptr.h:359
                       -> /home/lniu/code/velox_new/velox/_build/debug/../../velox/dwio/parquet/reader/ParquetReader.cpp
    @ 0000000000feab56 std::shared_ptr<facebook::velox::parquet::ReaderBase> std::allocate_shared<facebook::velox::parquet::ReaderBase, std::allocator<facebook::velox::parquet::ReaderBase>, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >, facebook::velox::dwio::common::ReaderOptions const&>(std::allocator<facebook::velox::parquet::ReaderBase> const&, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >&&, facebook::velox::dwio::common::ReaderOptions const&)
                       /usr/include/c++/9/bits/shared_ptr.h:702
                       -> /home/lniu/code/velox_new/velox/_build/debug/../../velox/dwio/parquet/reader/ParquetReader.cpp
    @ 0000000000fe5cc9 std::shared_ptr<facebook::velox::parquet::ReaderBase> std::make_shared<facebook::velox::parquet::ReaderBase, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >, facebook::velox::dwio::common::ReaderOptions const&>(std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >&&, facebook::velox::dwio::common::ReaderOptions const&)
                       /usr/include/c++/9/bits/shared_ptr.h:718
                       -> /home/lniu/code/velox_new/velox/_build/debug/../../velox/dwio/parquet/reader/ParquetReader.cpp
    @ 0000000000fdbe07 facebook::velox::parquet::ParquetReader::ParquetReader(std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >, facebook::velox::dwio::common::ReaderOptions const&)
                       /home/lniu/code/velox_new/velox/_build/debug/../../velox/dwio/parquet/reader/ParquetReader.cpp:762
    @ 0000000000fc7182 main
                       /home/lniu/code/velox_new/velox/_build/debug/../../velox/dwio/parquet/tests/reader/ParquetReaderExample.cpp:51
    @ 0000000000024082 __libc_start_main
    @ 0000000000fc4b7d _start
Aborted

System information

Velox System Info v0.0.2
Commit: 1e186e548833750cdee4b95d829711ddad78aba1
CMake Version: 3.16.3
System: Linux-5.4.0-1063-aws
Arch: x86_64
C++ Compiler: /usr/bin/c++
C++ Compiler Version: 9.4.0
C Compiler: /usr/bin/cc
C Compiler Version: 9.4.0
CMake Prefix Path: /usr/local;/usr;/;/usr;/usr/local;/usr/X11R6;/usr/pkg;/opt

Relevant logs

A file that reproduces the issue is linked in the next comment.
@qqibrow qqibrow added bug Something isn't working triage Newly created issue that needs attention. labels Nov 28, 2023
@qqibrow
Collaborator Author

qqibrow commented Nov 28, 2023

https://www.dropbox.com/scl/fi/qb2y77xzb95b9mecr77p7/native_parquet_reader_test3531171283231893335parquet?rlkey=hq9bz5o8jb72uy1q0wsavynan&dl=0

lniu@devrestricted-lniu:~/velox_parquet_test_triage/fail_parquet_files/testMap$ parquet-tools show native_parquet_reader_test3531171283231893335parquet
+----------------------------------------------------------------------------------+
| test                                                                             |
|----------------------------------------------------------------------------------|
| [('0', 0), ('1', 1), ('2', 2), ('3', 3), ('4', 4), ('5', 5), ('6', 6), ('7', 7)] |
+----------------------------------------------------------------------------------+

@hitarth
Collaborator

hitarth commented Nov 29, 2023

Looking into this

facebook-github-bot pushed a commit that referenced this issue Dec 22, 2023
Summary:
This PR fixes issue #7777.

In Parquet, a map is normally a group annotated with the MAP converted type.
It should contain a repeated group annotated with MAP_KEY_VALUE,
which in turn contains two children, key and value:

    <map-repetition> group <name> (MAP) {
      repeated group key_value (MAP_KEY_VALUE) {
        required <key-type> key;
        <value-repetition> <value-type> value;
      }
    }

However, some files incorrectly annotate the outer group with
MAP_KEY_VALUE in place of MAP:

    <map-repetition> group my_map (MAP_KEY_VALUE) {
      repeated group map {
        required binary key (UTF8);
        optional int32 value;
      }
    }

For backward compatibility, a group annotated with MAP_KEY_VALUE that is not
contained in a MAP group should be treated as MAP. This commit makes the following changes:

1. Adds a parentSchemaIdx parameter to the Parquet reader's
    getParquetColumnInfo() function so that the parent schema element
    is available.

2. Differentiates the cases where a MAP_KEY_VALUE's parent is or
    is not a MAP. If it is, the MAP_KEY_VALUE group is the repeated group
    that contains the key and value. If it is not, the group is treated
    the same as MAP.
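
The rule in the two changes above can be sketched in isolation. This is a minimal illustration, not the code from the PR: `SchemaElement`, `ConvertedType`, `GroupRole`, `kNoParent` and `classifyGroup` are hypothetical, simplified names, and the only idea taken from the commit is that the parent's schema index is passed down so that a MAP_KEY_VALUE group can be interpreted according to its parent.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the Thrift schema types; the real
// Velox and parquet-format definitions carry many more fields.
enum class ConvertedType { kNone, kMap, kMapKeyValue };

struct SchemaElement {
  std::string name;
  ConvertedType convertedType;
};

// How a group node should be interpreted while building the column tree.
enum class GroupRole { kMap, kKeyValueGroup, kOther };

// Sentinel meaning "no parent" (the node is the schema root).
constexpr std::size_t kNoParent = static_cast<std::size_t>(-1);

// Classifies the group at schemaIdx using only its own annotation and the
// annotation of its parent (assumption: this mirrors the intent of the fix).
GroupRole classifyGroup(
    const std::vector<SchemaElement>& schema,
    std::size_t schemaIdx,
    std::size_t parentSchemaIdx) {
  assert(schemaIdx < schema.size());
  assert(parentSchemaIdx == kNoParent || parentSchemaIdx < schema.size());

  switch (schema[schemaIdx].convertedType) {
    case ConvertedType::kMap:
      return GroupRole::kMap;
    case ConvertedType::kMapKeyValue: {
      // Inside a MAP, MAP_KEY_VALUE marks the repeated key/value group.
      // Anywhere else it is the legacy spelling of MAP itself.
      const bool parentIsMap = parentSchemaIdx != kNoParent &&
          schema[parentSchemaIdx].convertedType == ConvertedType::kMap;
      return parentIsMap ? GroupRole::kKeyValueGroup : GroupRole::kMap;
    }
    default:
      return GroupRole::kOther;
  }
}
```

With the standard layout, the outer group classifies as a map and its key_value child as the key/value group. With the legacy layout, the outer MAP_KEY_VALUE group has a non-MAP parent and therefore classifies as a map, so a repeated-group requirement would apply to its child rather than to the outer group itself. That is presumably why the reported file hit the `OPTIONAL vs. REPEATED` check before the fix.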

For more information please check https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#maps

Pull Request resolved: #7966

Reviewed By: pedroerp

Differential Revision: D52263936

Pulled By: Yuhta

fbshipit-source-id: 486b6167c76e613c604b309c02b785834ab050ac
@yingsu00
Collaborator

Fixed by #7966


3 participants