[VL] Add helper function ColumnarBatches.toString and InternalRow toString #6458
Thanks for opening a pull request! Could you open an issue for this pull request on GitHub Issues? https://github.com/apache/incubator-gluten/issues Then could you also rename the commit message and pull request title in the following format?
See also:
What's the purpose of the API? Is it for testing? Also, please fill in the PR description. Thanks.
Hi @jinchengchenghh, if it's for testing purposes on the Java side, my suggestion is not to propagate the call down to the C++ code. We can add a Java API
ArrowWritableVector does not have a print function on the Java side.
If we can get the rows, there must be a way to stringify them, since Spark requires this to implement
Yes, it uses ToPrettyString to show the results in a DataFrame. We can only reuse part of that code, so I have implemented our own version.
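As a rough, Spark-free illustration of what a hand-written pretty-printer involves, the sketch below renders rows as a bordered grid, similar in spirit to a DataFrame display. The class name `PrettyTable`, the `Object[]` row model, right alignment, and rendering nulls as `null` are all assumptions for illustration; this is neither Spark's `ToPrettyString` nor the PR's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not Spark's ToPrettyString): renders rows as a text grid.
final class PrettyTable {
  private PrettyTable() {}

  static String render(String[] header, List<Object[]> rows) {
    int n = header.length;
    int[] widths = new int[n];
    for (int i = 0; i < n; i++) {
      widths[i] = header[i].length();
    }
    // First pass: stringify every cell and track the widest value per column.
    List<String[]> cells = new ArrayList<>();
    for (Object[] row : rows) {
      String[] text = new String[n];
      for (int i = 0; i < n; i++) {
        text[i] = String.valueOf(row[i]); // a null field becomes "null"
        widths[i] = Math.max(widths[i], text[i].length());
      }
      cells.add(text);
    }
    // Second pass: emit borders, the header, and right-aligned cells.
    String border = border(widths);
    StringBuilder sb = new StringBuilder(border);
    appendLine(sb, header, widths);
    sb.append(border);
    for (String[] text : cells) {
      appendLine(sb, text, widths);
    }
    return sb.append(border).toString();
  }

  // Builds a line such as "+-----+---+".
  private static String border(int[] widths) {
    StringBuilder sb = new StringBuilder("+");
    for (int w : widths) {
      sb.append("-".repeat(w)).append('+');
    }
    return sb.append('\n').toString();
  }

  // Appends one row, padding each cell on the left to its column width.
  private static void appendLine(StringBuilder sb, String[] text, int[] widths) {
    sb.append('|');
    for (int i = 0; i < text.length; i++) {
      sb.append(" ".repeat(widths[i] - text[i].length())).append(text[i]).append('|');
    }
    sb.append('\n');
  }
}
```

Computing column widths in a separate first pass is what lets the grid align; the cost is holding all stringified cells in memory, which is acceptable for test-sized output.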
Run Gluten Clickhouse CI
Can you help review this PR again? Thanks! @zhztheplayer
Can you help review? Thanks! @zhztheplayer
@jinchengchenghh Will take a look as soon as possible. Thanks.
Could I take some of your time? @zhztheplayer
Reviewing now. Thank you for the reminder.
Hi @jinchengchenghh, I have been wondering whether the code can be simplified to ease maintenance. Would you please check the following example code, which pretty-prints a row iterator with much less code?

import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.encoders.RowEncoder
import org.apache.spark.sql.catalyst.expressions.UnsafeProjection
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.types.{DataType, IntegerType, StringType, StructType}
import org.apache.spark.unsafe.types.UTF8String

test("UnsafeRow to string 2") {
  // Not used below.
  val util = ToStringUtil(Option.apply(SQLConf.get.sessionLocalTimeZone))
  val row1 =
    InternalRow.apply(UTF8String.fromString("hello"), UTF8String.fromString("world"), 123)
  val rowWithNull = InternalRow.apply(null, null, 4)
  val row2 = UnsafeProjection
    .create(Array[DataType](StringType, StringType, IntegerType))
    .apply(rowWithNull)
  val it = List(row1, row2, row1, row1, row2).iterator
  val struct = new StructType().add("a", StringType).add("b", StringType).add("c", IntegerType)
  val encoder = RowEncoder(struct).resolveAndBind() // `ExpressionEncoder(struct).resolveAndBind()` for newer versions of Spark
  val deserializer = encoder.createDeserializer()
  it.map(deserializer).foreach(r => println(r.mkString("|")))
}
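For comparison, here is a plain Java sketch of the same `mkString("|")` formatting with no Spark dependency. It assumes rows are plain `Object[]` arrays standing in for the `Row` objects produced by the encoder's deserializer; `PipeRows` is a hypothetical name. Null fields are rendered as `null`, which matches how Scala's `mkString` renders null elements.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical Java analog of `it.map(deserializer).foreach(r => println(r.mkString("|")))`.
// Rows are modeled as Object[] instead of Spark Row objects.
final class PipeRows {
  private PipeRows() {}

  // Joins the fields of one row with the separator; null fields print as "null".
  static String mkString(Object[] row, String sep) {
    StringJoiner joiner = new StringJoiner(sep);
    for (Object value : row) {
      joiner.add(String.valueOf(value));
    }
    return joiner.toString();
  }

  // Formats every row of the iterator as one '|'-separated line.
  static List<String> format(Iterator<Object[]> rows) {
    List<String> lines = new ArrayList<>();
    while (rows.hasNext()) {
      lines.add(mkString(rows.next(), "|"));
    }
    return lines;
  }
}
```

With the same values as `row1` and `rowWithNull`, this sketch yields the lines `hello|world|123` and `null|null|4`.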
For Iterator[UnsafeRow], it is OK.
Then we should fix this first, since it sounds like a bug. I'll take another look at the issue.
Run Gluten Clickhouse CI
@zhztheplayer Thanks for your suggestion; it really makes the code clean and easy to maintain. Could you help review again?
private static ColumnarBatch newArrowBatch(int numRows) {
  String schema = "a boolean, b int";
  final ArrowWritableColumnVector[] columns =
      ArrowWritableColumnVector.allocateColumns(numRows, StructType.fromDDL(schema));
  ArrowWritableColumnVector col1 = columns[0];
  ArrowWritableColumnVector col2 = columns[1];
  for (int j = 0; j < numRows; j++) {
    col1.putBoolean(j, j % 2 == 0);
    col2.putInt(j, 15 - j);
  }
  col2.putNull(numRows - 1);
  for (ArrowWritableColumnVector col : columns) {
    col.setValueCount(numRows);
  }
  final ColumnarBatch batch = new ColumnarBatch(columns);
  batch.setNumRows(numRows);
  return batch;
}
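To make the generated data concrete: column `a` alternates `true`/`false` starting at `true`, column `b` counts down from 15, and the last `b` value is null, so `numRows = 3` gives the rows `(true, 15)`, `(false, 14)`, `(true, null)`. The Spark-free sketch below models that layout with primitive arrays plus a null-flag array, loosely like a vector's validity buffer. `TinyBatch` and its one-row-per-line `toString` format are hypothetical and not necessarily the format used by `ColumnarBatches.toString`.

```java
// Hypothetical, Spark-free model of the batch built by newArrowBatch:
// column "a" holds j % 2 == 0, column "b" holds 15 - j, and the last "b" slot is null.
final class TinyBatch {
  final boolean[] a;
  final int[] b;
  final boolean[] bIsNull; // per-row null flags, standing in for a validity buffer

  // Assumes numRows >= 1, as the original helper does (it nulls index numRows - 1).
  TinyBatch(int numRows) {
    a = new boolean[numRows];
    b = new int[numRows];
    bIsNull = new boolean[numRows];
    for (int j = 0; j < numRows; j++) {
      a[j] = j % 2 == 0;
      b[j] = 15 - j;
    }
    bIsNull[numRows - 1] = true; // mirrors col2.putNull(numRows - 1)
  }

  int numRows() {
    return a.length;
  }

  // One row per line, fields joined by '|'; a null "b" prints as "null".
  @Override
  public String toString() {
    StringBuilder sb = new StringBuilder();
    for (int j = 0; j < numRows(); j++) {
      sb.append(a[j]).append('|').append(bIsNull[j] ? "null" : String.valueOf(b[j])).append('\n');
    }
    return sb.toString();
  }
}
```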
Can we remove this method? We could fill in the vectors in the test case's code instead.
E.g.,
final int numRows = 100;
final ColumnarBatch batch = newArrowBatch("a boolean, b int", numRows);
final ArrowWritableColumnVector col0 = (ArrowWritableColumnVector) batch.column(0);
final ArrowWritableColumnVector col1 = (ArrowWritableColumnVector) batch.column(1);
for (int j = 0; j < numRows; j++) {
  col0.putBoolean(j, j % 2 == 0);
  col1.putInt(j, 15 - j);
}
col1.putNull(numRows - 1);
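The suggestion separates allocating a batch from a schema string from filling in its values. A Spark-free sketch of that split follows, assuming a hypothetical `ColumnAlloc.allocate` that understands only `boolean` and `int` in a `name type, ...` string and returns boxed arrays so that any slot can hold null. The real helper would presumably allocate `ArrowWritableColumnVector`s from `StructType.fromDDL` instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical allocator: parses a tiny "name type, name type" string and returns
// one empty column per field. Only boolean and int are supported in this sketch.
final class ColumnAlloc {
  private ColumnAlloc() {}

  static List<Object[]> allocate(String ddl, int numRows) {
    List<Object[]> columns = new ArrayList<>();
    for (String field : ddl.split(",")) {
      String[] parts = field.trim().split("\\s+"); // e.g. ["a", "boolean"]
      switch (parts[1].toLowerCase()) {
        case "boolean":
          columns.add(new Boolean[numRows]); // boxed: every slot starts as null
          break;
        case "int":
          columns.add(new Integer[numRows]);
          break;
        default:
          throw new IllegalArgumentException("unsupported type: " + parts[1]);
      }
    }
    return columns;
  }
}
```

The test body then owns the data, so each test can choose its own values and null positions without a new helper per schema.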
Thanks!
I added this helper function for testing purposes.
I also refactored the columnarToRow and rowToColumnar functions so that they can be used elsewhere.
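At their core, columnar-to-row and row-to-columnar conversions are a transpose between per-column arrays and per-row records. The sketch below shows only that data movement, assuming plain `Object[]` columns and rows. The method names mirror `columnarToRow`/`rowToColumnar`, but the signatures are hypothetical; the PR's versions presumably work on `ColumnarBatch` and Spark row types.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, Spark-free sketch: columns are equal-length Object[] arrays,
// rows are Object[] with one entry per column.
final class Transpose {
  private Transpose() {}

  // Reads value r of every column to build row r.
  static List<Object[]> columnarToRow(Object[][] columns, int numRows) {
    List<Object[]> rows = new ArrayList<>(numRows);
    for (int r = 0; r < numRows; r++) {
      Object[] row = new Object[columns.length];
      for (int c = 0; c < columns.length; c++) {
        row[c] = columns[c][r];
      }
      rows.add(row);
    }
    return rows;
  }

  // Scatters field c of every row into column c.
  static Object[][] rowToColumnar(List<Object[]> rows, int numColumns) {
    Object[][] columns = new Object[numColumns][rows.size()];
    for (int r = 0; r < rows.size(); r++) {
      Object[] row = rows.get(r);
      for (int c = 0; c < numColumns; c++) {
        columns[c][r] = row[c];
      }
    }
    return columns;
  }
}
```

A round trip through both functions returns columns equal to the input, which makes a convenient invariant to assert in tests.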