Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return int32 for integer type date part #13466

Merged
merged 8 commits into from
Nov 22, 2024

Conversation

jayzhan211
Copy link
Contributor

@jayzhan211 jayzhan211 commented Nov 18, 2024

Which issue does this PR close?

Close #13482
Part of #13449

Rationale for this change

Date part such as year, month, day are integer type, we can return i32 instead of casting it to f64

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added sqllogictest SQL Logic Tests (.slt) functions labels Nov 18, 2024
@jayzhan211 jayzhan211 changed the title Return int for integer type date part Return int32 for integer type date part Nov 18, 2024
Signed-off-by: jayzhan211 <[email protected]>
| "qtr"
| "quarter"
| "doy"
| "dow"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

second, millisecond, micro, nanos too?

Copy link
Contributor Author

@jayzhan211 jayzhan211 Nov 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let r: Float64Array = secs
            .iter()
            .zip(subsecs)
            .map(|(secs, subsecs)| {
                secs.map(|secs| {
                    let subsecs = subsecs.unwrap_or(0);
                    (secs as f64 + ((subsecs % 1_000_000_000) as f64 / 1_000_000_000_f64))
                        * sf
                })
            })
            .collect();

Probably we still need f64 for seconds

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i didn't know extract(second .. returns seconds with fraction. In PostgreSQL it does so, in e.g. Trino and Snowflake it does not

postgres=# select extract(second from now());
  extract
-----------
 29.009019
trino> select extract(second from now());
 _col0
-------
    46

so yes, keep seconds with float for now

Copy link
Contributor

@comphead comphead Nov 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extract for seconds is float correct, returning the fractions. however its not the case for DuckDB.

Copy link
Member

@findepi findepi Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

filed #13482 for seconds

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jayzhan211 what about millisecond, micro, nano?

// Try to remove quote if exist, if the quote is invalid, return original string and let the downstream function handle the error
fn part_normalization(part: &str) -> &str {
part.strip_prefix(|c| c == '\'' || c == '\"')
.and_then(|s| s.strip_suffix(|c| c == '\'' || c == '\"'))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why would we want to support unmatched quotes like '..."?

the date_part argument shouldn't contain any quotes, why would we want to support quotes at all?

i see this is pre-existing, so okay to keep, but still wonder why we're doing this, doesn't feel right

SELECT extract(year from arrow_cast('10 years', 'Interval(YearMonth)'))
----
10

query R
query I
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have explicit tests inspecting arrow_typeof(extract(....))?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I is for integer and we don't need to make sure it is exactly i32 or i64 so I think that is fine. If we want to differentiate type like utf8 or utf8View then we need arrow_typeof

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I is for integer

yes i know

need to make sure it is exactly i32 or i64

do we? do we not?
i think it may make difference, since it's 2x fewer bytes to keep in memory

up to you

@jayzhan211
Copy link
Contributor Author

--------------------
Benchmark clickbench_1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃      main ┃ date-part-int ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │    0.40ms │        0.53ms │  1.33x slower │
│ QQuery 1     │   45.15ms │       47.96ms │  1.06x slower │
│ QQuery 2     │   81.29ms │       79.44ms │     no change │
│ QQuery 3     │   68.24ms │       62.99ms │ +1.08x faster │
│ QQuery 4     │  404.77ms │      408.36ms │     no change │
│ QQuery 5     │  730.24ms │      722.14ms │     no change │
│ QQuery 6     │   38.72ms │       39.52ms │     no change │
│ QQuery 7     │   44.31ms │       43.05ms │     no change │
│ QQuery 8     │  476.74ms │      467.79ms │     no change │
│ QQuery 9     │  672.87ms │      669.63ms │     no change │
│ QQuery 10    │  175.53ms │      179.61ms │     no change │
│ QQuery 11    │  199.23ms │      199.55ms │     no change │
│ QQuery 12    │  758.33ms │      775.82ms │     no change │
│ QQuery 13    │  895.53ms │      895.39ms │     no change │
│ QQuery 14    │  725.44ms │      724.15ms │     no change │
│ QQuery 15    │  485.85ms │      500.24ms │     no change │
│ QQuery 16    │ 1132.63ms │     1145.34ms │     no change │
│ QQuery 17    │ 1058.71ms │     1047.24ms │     no change │
│ QQuery 18    │ 3434.22ms │     2979.31ms │ +1.15x faster │
│ QQuery 19    │   55.59ms │       57.97ms │     no change │
│ QQuery 20    │  850.86ms │      853.53ms │     no change │
│ QQuery 21    │ 1116.08ms │     1105.99ms │     no change │
│ QQuery 22    │ 3147.33ms │     3218.51ms │     no change │
│ QQuery 23    │ 7837.74ms │     7632.39ms │     no change │
│ QQuery 24    │  502.62ms │      483.36ms │     no change │
│ QQuery 25    │  497.66ms │      505.86ms │     no change │
│ QQuery 26    │  559.75ms │      529.33ms │ +1.06x faster │
│ QQuery 27    │ 1317.98ms │     1291.91ms │     no change │
│ QQuery 28    │ 9508.45ms │     9513.32ms │     no change │
│ QQuery 29    │  432.42ms │      406.76ms │ +1.06x faster │
│ QQuery 30    │  668.52ms │      670.90ms │     no change │
│ QQuery 31    │  679.62ms │      696.56ms │     no change │
│ QQuery 32    │ 3171.05ms │     3326.19ms │     no change │
│ QQuery 33    │ 3197.64ms │     3022.01ms │ +1.06x faster │
│ QQuery 34    │ 2907.09ms │     3069.23ms │  1.06x slower │
│ QQuery 35    │  714.22ms │      702.46ms │     no change │
│ QQuery 36    │  137.17ms │      132.89ms │     no change │
│ QQuery 37    │  101.83ms │       99.71ms │     no change │
│ QQuery 38    │  101.74ms │      102.64ms │     no change │
│ QQuery 39    │  254.02ms │      251.36ms │     no change │
│ QQuery 40    │   40.08ms │       37.21ms │ +1.08x faster │
│ QQuery 41    │   38.25ms │       31.57ms │ +1.21x faster │
│ QQuery 42    │   41.37ms │       39.40ms │     no change │
└──────────────┴───────────┴───────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary            ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (main)            │ 49307.28ms │
│ Total Time (date-part-int)   │ 48769.11ms │
│ Average Time (main)          │  1146.68ms │
│ Average Time (date-part-int) │  1134.17ms │
│ Queries Faster               │          7 │
│ Queries Slower               │          3 │
│ Queries with No Change       │         33 │
└──────────────────────────────┴────────────┘
--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃     main ┃ date-part-int ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │ 106.95ms │       98.68ms │ +1.08x faster │
│ QQuery 2     │  27.55ms │       26.60ms │     no change │
│ QQuery 3     │  41.45ms │       40.75ms │     no change │
│ QQuery 4     │  29.59ms │       25.12ms │ +1.18x faster │
│ QQuery 5     │  61.08ms │       56.22ms │ +1.09x faster │
│ QQuery 6     │  20.54ms │       18.17ms │ +1.13x faster │
│ QQuery 7     │  76.95ms │       70.63ms │ +1.09x faster │
│ QQuery 8     │  58.13ms │       56.18ms │     no change │
│ QQuery 9     │  70.39ms │       69.05ms │     no change │
│ QQuery 10    │  64.87ms │       62.71ms │     no change │
│ QQuery 11    │  19.23ms │       17.79ms │ +1.08x faster │
│ QQuery 12    │  39.59ms │       38.58ms │     no change │
│ QQuery 13    │  33.14ms │       35.80ms │  1.08x slower │
│ QQuery 14    │  31.71ms │       28.79ms │ +1.10x faster │
│ QQuery 15    │  48.16ms │       45.50ms │ +1.06x faster │
│ QQuery 16    │  19.27ms │       19.69ms │     no change │
│ QQuery 17    │  81.12ms │       78.97ms │     no change │
│ QQuery 18    │ 116.53ms │      115.46ms │     no change │
│ QQuery 19    │  51.45ms │       50.14ms │     no change │
│ QQuery 20    │  43.01ms │       41.61ms │     no change │
│ QQuery 21    │  94.00ms │       90.85ms │     no change │
│ QQuery 22    │  21.32ms │       20.42ms │     no change │
└──────────────┴──────────┴───────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary            ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (main)            │ 1156.04ms │
│ Total Time (date-part-int)   │ 1107.72ms │
│ Average Time (main)          │   52.55ms │
│ Average Time (date-part-int) │   50.35ms │
│ Queries Faster               │         8 │
│ Queries Slower               │         1 │
│ Queries with No Change       │        13 │
└──────────────────────────────┴───────────┘

@jayzhan211 jayzhan211 marked this pull request as ready for review November 18, 2024 14:46
Signed-off-by: jayzhan211 <[email protected]>
jayzhan211 and others added 2 commits November 19, 2024 20:51
@jayzhan211 jayzhan211 marked this pull request as draft November 20, 2024 07:00
@jayzhan211 jayzhan211 marked this pull request as ready for review November 20, 2024 07:18
Signed-off-by: jayzhan211 <[email protected]>
return internal_err!("unit {unit:?} not supported");
}

let conversion_factor = match unit {
Copy link
Contributor

@comphead comphead Nov 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should probably factor out the common part for this conversion in future. its too many hardcoded conversions between second fraction and corresponding long value in the project.

Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm thanks @jayzhan211

@jayzhan211 jayzhan211 merged commit 58761ac into apache:main Nov 22, 2024
25 checks passed
@jayzhan211 jayzhan211 deleted the date-part-int branch November 22, 2024 00:47
@jayzhan211
Copy link
Contributor Author

Thanks @comphead @findepi @Dandandan

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api change Changes the API exposed to users of the crate functions sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

extract(second from dt) should return seconds without fraction
5 participants