
Exception More than Int.MaxValue elements #147

Open
YuelongCai opened this issue Oct 25, 2023 · 3 comments

Comments

@YuelongCai

Caused by: java.lang.IllegalArgumentException: More than Int.MaxValue elements.
at scala.collection.immutable.NumericRange$.check$1(NumericRange.scala:318)
at scala.collection.immutable.NumericRange$.count(NumericRange.scala:328)
at scala.collection.immutable.NumericRange.numRangeElements$lzycompute(NumericRange.scala:53)
at scala.collection.immutable.NumericRange.numRangeElements(NumericRange.scala:52)
at scala.collection.immutable.NumericRange.length(NumericRange.scala:55)
at org.apache.spark.rdd.ParallelCollectionRDD$.slice(ParallelCollectionRDD.scala:143)

@YuelongCai
Author

val countQuery = s"SELECT count(*) FROM $tableNameOrSubquery $whereClause"

If countQuery returns a count greater than Int.MaxValue, using that number to construct an RDD[Row] causes the exception above.
Refer to the Spark source code: https://github.com/apache/spark/blob/a073bf38c7d8802e2ab12c54299e1541a48a394e/core/src/main/scala/org/apache/spark/rdd/ParallelCollectionRDD.scala#L143
The call to nr.length is expected to return an Int, but the range has more than Int.MaxValue elements, so NumericRange throws.
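The failure can be reproduced without Spark at all, since the check lives in Scala's own NumericRange (a minimal sketch; the 3,000,000,000-element range is just an illustrative value):

```scala
object ReproNumericRangeLimit {
  def main(args: Array[String]): Unit = {
    // A small range works fine: length fits in an Int.
    val small = 1L to 10L
    assert(small.length == 10)

    // ~3 billion elements: length cannot be represented as an Int,
    // so NumericRange.count throws IllegalArgumentException
    // ("More than Int.MaxValue elements.") when length is forced.
    val huge = 1L to 3000000000L
    val threw =
      try { huge.length; false }
      catch { case _: IllegalArgumentException => true }
    assert(threw)
    println("NumericRange.length threw as expected")
  }
}
```

Constructing the range itself is lazy and cheap; only calling `length` (which Spark's `ParallelCollectionRDD.slice` does) triggers the exception.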

@YuelongCai
Author

A possible solution:

If the total count N is too big, split it into multiple RDDs, each with fewer than Int.MaxValue elements, and union them all to produce an RDD of size N.

For example,
sc.parallelize(1L to 1000000000L, 200)
  .union(sc.parallelize(1L to 1000000000L, 200))
  .union(sc.parallelize(1L to 1000000000L, 200))
  .union(sc.parallelize(1L to 1000000000L, 200))
will avoid this issue.
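The split-and-union idea above can be generalized. Here is a hypothetical helper (not part of this project) that computes inclusive sub-ranges whose lengths always fit in an Int; each chunk could then be turned into an RDD (e.g. via sc.range or sc.parallelize) and combined with union:

```scala
object RangeChunking {
  /** Split the inclusive range [start, end] into consecutive sub-ranges,
    * each containing at most maxSize elements, so that every chunk's
    * length is representable as an Int. */
  def chunkRanges(start: Long, end: Long, maxSize: Long): Seq[(Long, Long)] = {
    require(maxSize > 0 && maxSize <= Int.MaxValue, "maxSize must fit in an Int")
    Iterator
      .iterate(start)(_ + maxSize)          // chunk lower bounds
      .takeWhile(_ <= end)
      .map(lo => (lo, math.min(lo + maxSize - 1, end)))
      .toSeq
  }

  def main(args: Array[String]): Unit = {
    assert(chunkRanges(1L, 10L, 4L) == Seq((1L, 4L), (5L, 8L), (9L, 10L)))
    // Every chunk of a 5-billion-element range stays under Int.MaxValue.
    val chunks = chunkRanges(0L, 5000000000L, Int.MaxValue.toLong)
    assert(chunks.forall { case (lo, hi) => hi - lo + 1 <= Int.MaxValue })
    println(s"${chunks.size} chunks")
  }
}
```

With a SparkContext sc in scope, the chunks could then be mapped and unioned along the lines of `chunkRanges(1L, n, Int.MaxValue.toLong).map { case (lo, hi) => sc.parallelize(lo to hi, numSlices) }.reduce(_ union _)` — a sketch, not code from this repository.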

@bsharifi
Collaborator

Thank you @YuelongCai for reporting this limitation and for suggesting a workaround. While we will consider addressing this in a future release, please feel free to submit a PR and we will be happy to review and merge it.
