Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

min/max aggregate functions should support non-numeric types #5584

Open
philrz opened this issue Jan 17, 2025 · 1 comment
Open

min/max aggregate functions should support non-numeric types #5584

philrz opened this issue Jan 17, 2025 · 1 comment

Comments

@philrz
Copy link
Contributor

philrz commented Jan 17, 2025

tl;dr

The SQL spec indicates the MIN and MAX aggregate functions should work with many data types, but these functions in SuperDB currently only support numeric types.

Details

Repro is with super commit ec4a26f. This issue was spotted because ClickBench queries 21 & 22 happen to invoke MIN(URL) where URL is a string type.

These aggregate functions currently work fine with numbers.

$ super -version
Version: v1.18.0-230-gec4a26f8

$ echo '{"a":1} {"a":2} {"a":3}' | super -c "select max(a) from '/dev/stdin'"
{max:3}

However, with a non-numeric type such as strings, we return null.

$ echo '{"a":"foo"} {"a":"bar"} {"a":"baz"}' | super -c "select max(a) from '/dev/stdin'"
{max:null}

Meanwhile a SQL solution like DuckDB does return a result.

$ duckdb --version
v1.1.3 19864453f7

$ echo '{"a":"foo"} {"a":"bar"} {"a":"baz"}' | duckdb -c "select max(a) from read_json('/dev/stdin')"
┌─────────┐
│ max(a)  │
│ varchar │
├─────────┤
│ foo     │
└─────────┘

A reading of the 1992 SQL spec shows in section 6.5 that the results of MIN and MAX should be determined using the logic from section 8.2 on "comparison predicates", which in turn lists out the treatments for the many different data types including strings and many others.

@philrz
Copy link
Contributor Author

philrz commented Jan 21, 2025

In a group discussion it was pointed out that the challenge for us vs. the incumbent SQL solutions is that we don't have the luxury of always assuming values of a single type are being aggregated. Therefore solving this correctly requires defining the "total order" among types. See #4545 for another use case affected by this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant