Add Spark CAST(timestamp as integral) #11468
base: main
Conversation
Thanks.
```cpp
      (int64_t)(nanos_ / 1'000));
} catch (const std::exception& e) {
  // We use int128_t to make sure the computation does not overflows since
  // there are cases such that seconds*1000000 does not fit in int64_t,
```
Spark uses an int64 to represent microseconds, so the max allowed seconds should be `INT64_MAX / 1000000`. For a valid timestamp from Spark, why would `seconds * 1000000` overflow?
The min seconds value could overflow; the min seconds value is -9223372036855.
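To make the boundary concrete, here is a minimal sketch of the overflow case. The sub-second value 224192 is chosen for illustration (it lands the total exactly on INT64_MIN) and is not taken from the actual `Timestamp::minMillis()` value:

```cpp
#include <cstdint>
#include <iostream>
#include <limits>

int main() {
  // INT64_MIN microseconds is -9223372036854775808, so the smallest whole
  // seconds value whose total microseconds can still fit is -9223372036855.
  const int64_t seconds = -9223372036855;
  const int64_t microsOfSecond = 224192; // illustrative sub-second part

  // seconds * 1'000'000 alone is -9223372036855000000, which is below
  // INT64_MIN; doing the multiply in __int128_t avoids the overflow.
  const __int128_t total =
      static_cast<__int128_t>(seconds) * 1'000'000 + microsOfSecond;

  // With this sub-second part the total lands exactly on INT64_MIN, so the
  // final result fits in int64_t even though the product alone did not.
  std::cout << (total >= std::numeric_limits<int64_t>::min()) << "\n"; // 1
  return 0;
}
```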
Would you like to extract this fix to a separate PR, like 671e126? We could add a test in `velox/type/tests/TimestampTest.cpp`.
Sure, let me extract this to a new PR.
Added here: #11774
Please also update the documentation, thanks!
```diff
@@ -41,6 +40,12 @@ Expected<Timestamp> SparkCastHooks::castIntToTimestamp(int64_t seconds) const {
   return Timestamp(seconds, 0);
 }

 Expected<int64_t> SparkCastHooks::castTimestampToInt(
     Timestamp timestamp) const {
   return std::floor(
```
Does timestamp.toMicros() / Timestamp::kMicrosecondsInSecond work?
If `timestamp.toMicros()` is negative, we need to round towards negative infinity instead of towards zero, so we need `std::floor` here, like the implementation in Spark, which uses `Math.floorDiv(ts, MICROS_PER_SECOND)`.
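For illustration, a minimal sketch of the difference between C++'s truncating integer division and floor division; the `floorDiv` helper below is hypothetical, mirroring Java's `Math.floorDiv` rather than any existing Velox utility:

```cpp
#include <cstdint>
#include <iostream>

// Hypothetical helper mirroring Java's Math.floorDiv: rounds the quotient
// towards negative infinity instead of towards zero.
int64_t floorDiv(int64_t x, int64_t y) {
  int64_t q = x / y; // C++ integer division truncates towards zero
  if ((x % y != 0) && ((x < 0) != (y < 0))) {
    --q; // adjust when signs differ and there is a remainder
  }
  return q;
}

int main() {
  const int64_t micros = -1'500'000; // 1.5 seconds before the epoch
  std::cout << micros / 1'000'000 << "\n";          // -1: truncation towards zero
  std::cout << floorDiv(micros, 1'000'000) << "\n"; // -2: what Spark's cast expects
  return 0;
}
```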
```cpp
} catch (const std::exception& e) {
  // We use int128_t to make sure the computation does not overflows since
  // there are cases such that seconds*1000000 does not fit in int64_t,
  // but seconds*1000000 + nanos does, an example is TimeStamp::minMillis().
```
Typo: `TimeStamp::minMillis()` -> `Timestamp::minMillis()`.
```cpp
  // there are cases such that seconds*1000000 does not fit in int64_t,
  // but seconds*1000000 + nanos does, an example is TimeStamp::minMillis().

  // If the final result does not fit in int64_tw we throw.
```
Typo: `int64_tw`.
```cpp
  // If the final result does not fit in int64_tw we throw.
  __int128_t result =
      (__int128_t)seconds_ * 1'000'000 + (int64_t)(nanos_ / 1'000);
  if (result < std::numeric_limits<int64_t>::min() ||
```
Nit: consider `INT64_MAX` and `INT64_MIN`.
```cpp
if (castResult.hasError()) {
  setError(castResult.error().message());
} else {
  result->set(row, static_cast<To>(castResult.value()));
```
Regarding `static_cast<To>(castResult.value())`: is overflow well handled when casting int64_t to a narrower type?
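For context, a minimal sketch of what a narrowing `static_cast` does with out-of-range values; since C++20 this wrap-around is guaranteed by the standard, while before that it was implementation-defined (though mainstream compilers wrap the same way):

```cpp
#include <cstdint>
#include <iostream>

int main() {
  // 300 does not fit in int8_t (range -128..127). The narrowing conversion
  // keeps only the low 8 bits: 300 = 0x12C, low byte 0x2C = 44.
  int64_t wide = 300;
  int8_t narrow = static_cast<int8_t>(wide);
  std::cout << static_cast<int>(narrow) << "\n"; // prints 44

  // Spark's non-ANSI cast to tinyint also truncates to the low bits, so the
  // behaviors should line up, but that is worth covering with a test.
  return 0;
}
```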
Add Spark CAST(timestamp as integral). Supported types are tinyint, smallint, integer, and bigint.
Spark's implementation: https://github.com/apache/spark/blob/fd86f85e181fc2dc0f50a096855acf83a6cc5d9c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L682
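Putting the pieces of this discussion together, here is a self-contained sketch of the cast semantics the PR targets: widen to `__int128_t` so the microsecond computation cannot overflow, floor-divide to get seconds, then narrow to the target integral type. The function and names are illustrative, not the PR's actual code:

```cpp
#include <cstdint>
#include <iostream>

// Illustrative only: compute a Spark-style CAST(timestamp AS <integral>)
// from a (seconds, nanos) pair.
template <typename To>
To castTimestampToIntegral(int64_t seconds, uint64_t nanos) {
  // Widen so that seconds * 1'000'000 cannot overflow the intermediate math.
  __int128_t micros =
      static_cast<__int128_t>(seconds) * 1'000'000 + nanos / 1'000;

  // Floor division by 1'000'000: round towards negative infinity, like
  // Math.floorDiv(ts, MICROS_PER_SECOND) in Spark.
  __int128_t q = micros / 1'000'000;
  if (micros % 1'000'000 != 0 && micros < 0) {
    --q;
  }
  // Narrowing keeps the low bits, matching Spark's non-ANSI behavior.
  return static_cast<To>(q);
}

int main() {
  // 0.5 s before the epoch floors to -1, not 0.
  std::cout << castTimestampToIntegral<int64_t>(-1, 500'000'000) << "\n"; // -1
  // 200 s wraps when narrowed to tinyint: low byte of 200 is -56.
  std::cout << static_cast<int>(castTimestampToIntegral<int8_t>(200, 0))
            << "\n"; // -56
  return 0;
}
```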