[CH] Exception with aggerate function first/last #3706

Closed
exmy opened this issue Nov 14, 2023 · 2 comments
Labels
bug Something isn't working triage

Comments

exmy commented Nov 14, 2023

Backend

CH (ClickHouse)

Bug description

create table test_data(key int, value string) stored as orc

With replaceSortAggWithHashAgg enabled, the query select first(value) from test_data group by key throws an exception:

Caused by: io.glutenproject.exception.GlutenException: io.glutenproject.exception.GlutenException: Cannot read all data. Bytes read: 3. Bytes expected: 4.
0. Poco::Exception::Exception(String const&, int) @ 0x00000000150bfd99 
1. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x0000000006d9b339 
2. DB::Exception::Exception<unsigned long&, String>(int, FormatStringHelperImpl<std::type_identity<unsigned long&>::type, std::type_identity<String>::type>, unsigned long&, String&&) @ 0x0000000006e0830a 
3. DB::ReadBuffer::readStrict(char*, unsigned long) @ 0x0000000006e0823a
4. DB::SingleValueDataString::read(DB::ReadBuffer&, DB::ISerialization const&, DB::Arena*) @ 0x0000000007fb6d43
5. DB::AggregateFunctionNullBase<true, true, DB::AggregateFunctionNullUnary<true, true>>::deserialize(char*, DB::ReadBuffer&, std::optional<unsigned long>, DB::Arena*) const @ 0x0000000009bc5481
6. DB::SerializationAggregateFunction::deserializeBinaryBulk(DB::IColumn&, DB::ReadBuffer&, unsigned long, double) const @ 0x000000001012aae5
7. DB::ISerialization::deserializeBinaryBulkWithMultipleStreams(COW<DB::IColumn>::immutable_ptr<DB::IColumn>&, unsigned long, DB::ISerialization::DeserializeBinaryBulkSettings&, std::shared_ptr<DB::ISerialization::DeserializeBinaryBulkState>&, std::unordered_map<String, COW<DB::IColumn>::immutable_ptr<DB::IColumn>, std::hash<String>, std::equal_to<String>, std::allocator<std::pair<String const, COW<DB::IColumn>::immutable_ptr<DB::IColumn>>>>*) const @ 0x0000000010126810
8. DB::NativeReader::readData(DB::ISerialization const&, COW<DB::IColumn>::immutable_ptr<DB::IColumn>&, DB::ReadBuffer&, unsigned long, double) @ 0x00000000119d25b1
9. DB::NativeReader::read() @ 0x00000000119d32ad
10. local_engine::ShuffleReader::read() @ 0x00000000071c5562
11. Java_io_glutenproject_vectorized_CHStreamReader_nativeNext @ 0x0000000006c88d97
: While executing SourceFromJavaIter
0. Poco::Exception::Exception(String const&, int) @ 0x00000000150bfd99
1. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x0000000006d9b339
2. DB::Exception::createRuntime(int, String&) @ 0x0000000006c9f32c
3. unsigned char local_engine::safeCallBooleanMethod<>(JNIEnv_*, _jobject*, _jmethodID*) @ 0x0000000006ca056d
4. local_engine::SourceFromJavaIter::generate() @ 0x00000000070ea757
5. DB::ISource::tryGenerate() @ 0x00000000119f18d5
6. DB::ISource::work() @ 0x00000000119f1522
7. DB::ExecutionThreadContext::executeTask() @ 0x0000000011a07313
8. DB::PipelineExecutor::executeStepImpl(unsigned long, std::atomic<bool>*) @ 0x00000000119ff0b0
9. DB::PipelineExecutor::executeStep(std::atomic<bool>*) @ 0x00000000119fea09
10. DB::PullingPipelineExecutor::pull(DB::Chunk&) @ 0x0000000011a0ba48
11. DB::PullingPipelineExecutor::pull(DB::Block&) @ 0x0000000011a0bc30
12. local_engine::LocalExecutor::hasNext() @ 0x00000000070a46b0
13. Java_io_glutenproject_vectorized_BatchIterator_nativeHasNext @ 0x0000000006c84537

Spark version

None

Spark configurations

No response

System information

No response

Relevant logs

No response

lgbo-ustc commented Nov 15, 2023

In the aggregate operator, we use the function name first_value_respect_nulls to construct an aggregate function that is based on any, and its intermediate result type is AggregateFunction(any, nullable(string)). This is a problem because several aggregate functions share the same intermediate result type AggregateFunction(any, nullable(string)) while their serialized data structures may differ. For first_value_respect_nulls, the intermediate result does not contain null flags.

ShuffleReader builds a deserializer from the type info AggregateFunction(any, nullable(string)). From that type, CH constructs the aggregate function any wrapped in AggregateFunctionNullBase to deserialize the data. That wrapper consumes one byte as the null flag, which is why only 3 bytes remain for any when it expects 4.
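
The mismatch described above can be illustrated with a small, self-contained sketch. It is not ClickHouse code: the byte layout (a little-endian int32 size, with -1 meaning "no value", followed by the string bytes) and every function name below are assumptions made for illustration only. The writer emits a state without a null flag, while the reader assumes a leading one-byte flag, so the nested read comes up one byte short.

```python
import io
import struct


class ShortRead(Exception):
    """Raised when fewer bytes are available than the reader requires."""


def read_strict(buf: io.BytesIO, n: int) -> bytes:
    # Mirrors the idea of a strict read: fail unless exactly n bytes arrive.
    data = buf.read(n)
    if len(data) != n:
        raise ShortRead(
            f"Cannot read all data. Bytes read: {len(data)}. Bytes expected: {n}."
        )
    return data


def write_state_without_flag(value):
    # Hypothetical layout: int32 size (-1 means "no value"), then the bytes.
    if value is None:
        return struct.pack("<i", -1)
    raw = value.encode()
    return struct.pack("<i", len(raw)) + raw


def read_state_without_flag(buf: io.BytesIO):
    # Matching reader for the layout above.
    (size,) = struct.unpack("<i", read_strict(buf, 4))
    return None if size < 0 else read_strict(buf, size).decode()


def read_state_with_flag(buf: io.BytesIO):
    # Mismatched reader: assumes a one-byte null flag precedes the state.
    flag = read_strict(buf, 1)[0]
    if flag == 0:
        return None
    return read_state_without_flag(buf)
```

Feeding the 4-byte payload from write_state_without_flag(None) to read_state_with_flag consumes the first byte as the flag and then fails on the size field with "Bytes read: 3. Bytes expected: 4.", while read_state_without_flag decodes the same payload correctly. The empty state was chosen only because it reproduces those byte counts in this simplified model.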

exmy changed the title from "[CH] Exception with aggerate function first query" to "[CH] Exception with aggerate function first/last" on Nov 15, 2023
lgbo-ustc commented

I think this is solved by ClickHouse/ClickHouse#57189

exmy closed this as completed on Dec 18, 2023