[VL] stop == StopReason::kBlock || stop == StopReason::kAtEnd || stop == StopReason::kAlreadyTerminated || stop == StopReason::kTerminate #6908

Closed
FelixYBW opened this issue Aug 18, 2024 · 5 comments · Fixed by #6934
Labels
bug (Something isn't working), triage

Comments

@FelixYBW
Contributor

Backend

VL (Velox)

Bug description

The error was raised again. It is usually caused by a background thread exiting, but there is no core dump this time.

File: /home/binweiyang/gluten/ep/build-velox/build/velox_ep/velox/exec/Driver.cpp
Line: 370
Stack trace:
# 0  _ZN8facebook5velox7process10StackTraceC1Ei
# 1  _ZN8facebook5velox14VeloxExceptionC1EPKcmS3_St17basic_string_viewIcSt11char_traitsIcEES7_S7_S7_bNS1_4TypeES7_
# 2  _ZN8facebook5velox6detail14veloxCheckFailINS0_17VeloxRuntimeErrorENS1_22CompileTimeEmptyStringEEEvRKNS1_18VeloxCheckFailArgsET0_
# 3  _ZN8facebook5velox4exec6Driver4nextERSt10shared_ptrINS1_13BlockingStateEE
# 4  _ZN8facebook5velox4exec4Task4nextEPN5folly10SemiFutureINS3_4UnitEEE
# 5  _ZN6gluten24WholeStageResultIterator4nextEv
# 6  Java_org_apache_gluten_vectorized_ColumnarBatchOutIterator_nativeHasNext
# 7  0x00007f3f2c3bf427

	at org.apache.gluten.vectorized.ColumnarBatchOutIterator.nativeHasNext(Native Method)
	at org.apache.gluten.vectorized.ColumnarBatchOutIterator.hasNextInternal(ColumnarBatchOutIterator.java:61)
	at org.apache.gluten.vectorized.GeneralOutIterator.hasNext(GeneralOutIterator.java:37)
	... 31 more

	at org.apache.gluten.vectorized.ColumnarBatchOutIterator.nativeHasNext(Native Method)
	at org.apache.gluten.vectorized.ColumnarBatchOutIterator.hasNextInternal(ColumnarBatchOutIterator.java:61)
	at org.apache.gluten.vectorized.GeneralOutIterator.hasNext(GeneralOutIterator.java:37)
	... 18 more
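
For readers unfamiliar with the message in the title: it is the text of a condition that a runtime check reached from `Driver::next` (frames 2 and 3 above) expected to hold. The following is only a minimal sketch of that kind of check, not the Velox source. The `StopReason` enum here is a hypothetical stand-in (only the four values named in the title come from the report), and `isExpectedStop` and `checkStop` are illustrative helper names.

```cpp
#include <stdexcept>

// Hypothetical stand-in for the real enum. Only kBlock, kAtEnd,
// kAlreadyTerminated and kTerminate are taken from the issue title;
// the other values are assumed for illustration.
enum class StopReason {
  kNone,
  kPause,
  kYield,
  kBlock,
  kAtEnd,
  kAlreadyTerminated,
  kTerminate
};

// Returns true when `stop` is one of the four outcomes listed in the title.
bool isExpectedStop(StopReason stop) {
  return stop == StopReason::kBlock || stop == StopReason::kAtEnd ||
      stop == StopReason::kAlreadyTerminated ||
      stop == StopReason::kTerminate;
}

// Mimics a runtime check: throws when the driver stopped for any other
// reason, which is the situation the stack trace above reports.
void checkStop(StopReason stop) {
  if (!isExpectedStop(stop)) {
    throw std::runtime_error("unexpected stop reason");
  }
}
```

In this sketch, `checkStop(StopReason::kBlock)` returns normally, while `checkStop(StopReason::kPause)` throws. That corresponds to the failure mode reported here: the driver returned with a stop reason outside the expected set.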

@zhztheplayer

Spark version

None

Spark configurations

No response

System information

No response

Relevant logs

No response

@FelixYBW FelixYBW added bug Something isn't working triage labels Aug 18, 2024
@FelixYBW
Contributor Author

FelixYBW commented Aug 18, 2024

OK, bad news: it's caused by the IO threads in scan. The issue goes away once we set iothread=0, or set the read-ahead row group to 0.

@FelixYBW
Contributor Author

Interesting, setting iothreads hurts performance.

@zhztheplayer
Member

Is the newest Gluten being used? I saw a similar issue in GHA CI weeks ago, which then disappeared for a while. I was not able to reproduce it locally.

@FelixYBW
Contributor Author

Interesting, setting iothreads hurts performance.

It's a cluster issue.

@FelixYBW
Contributor Author

Confirmed, #6934 fixed the issue.

2 participants