[CH] Mismatched results from stddev #5904

Closed
lgbo-ustc opened this issue May 29, 2024 · 2 comments · Fixed by #5913
Labels
bug, triage

Comments

@lgbo-ustc
Contributor

Backend

CH (ClickHouse)

Bug description

The following queries return different results in Gluten and vanilla Spark:

select a, stddev(b/c) from (select * from values (1,2, 1), (1,3,0) as data(a,b,c)) group by a
select a, stddev(b) from (select * from values (1,2, 1) as data(a,b,c)) group by a

Vanilla Spark returns null, but Gluten returns NaN.
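
A minimal Python sketch of the mismatch follows, for illustration only. It assumes the NaN comes from the sample standard deviation being undefined when a group holds a single value (the divisor `n - 1` is zero). The function names `stddev_samp_nan` and `nan_to_null` are hypothetical and do not come from Gluten, Spark, or ClickHouse.

```python
import math


def stddev_samp_nan(values):
    """Sample standard deviation that reports NaN when it is undefined.

    Hypothetical stand-in for the NaN behaviour described in this issue.
    """
    n = len(values)
    if n < 2:
        # The divisor (n - 1) would be zero, so the result is undefined.
        return math.nan
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values)
    return math.sqrt(m2 / (n - 1))


def nan_to_null(x):
    """Map NaN to None, mirroring the null that vanilla Spark returns."""
    if x is not None and math.isnan(x):
        return None
    return x


# Second query from the report: group a = 1 contains the single value b = 2.
backend_result = stddev_samp_nan([2])
spark_like_result = nan_to_null(backend_result)
print(backend_result, spark_like_result)
```

With two or more values both paths agree, so only the undefined case needs the extra mapping.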

Spark version

None

Spark configurations

No response

System information

No response

Relevant logs

No response

lgbo-ustc added the bug and triage labels on May 29, 2024
@lgbo-ustc
Contributor Author

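Adding a post-projection that maps NaN to null doesn't seem to add much overhead. The two runs below compare the query with and without the projection (1.240 sec vs. 1.235 sec).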

f2386dc7dd0d :) select a, if(isNaN(dev), null, dev) from(select a, stddev_samp(b) as dev   from (select number as a , number % 111 as b from numbers(10000000)) group by a) Format Null settings max_threads=1

SELECT
    a,
    if(isNaN(dev), NULL, dev)
FROM
(
    SELECT
        a,
        stddev_samp(b) AS dev
    FROM
    (
        SELECT
            number AS a,
            number % 111 AS b
        FROM numbers(10000000)
    )
    GROUP BY a
)
FORMAT `Null`
SETTINGS max_threads = 1

Query id: 49a4b5ec-1be6-4f15-9332-f3e4f637aeb1

Ok.

0 rows in set. Elapsed: 1.240 sec. Processed 10.00 million rows, 80.00 MB (8.06 million rows/s., 64.49 MB/s.)
Peak memory usage: 769.10 MiB.

f2386dc7dd0d :) select a, stddev_samp(b) as dev   from (select number as a , number % 111 as b from numbers(10000000)) group by a Format Null settings max_threads=1;

SELECT
    a,
    stddev_samp(b) AS dev
FROM
(
    SELECT
        number AS a,
        number % 111 AS b
    FROM numbers(10000000)
)
GROUP BY a
FORMAT `Null`
SETTINGS max_threads = 1

Query id: cfcca2c4-a437-4f5b-99ac-a7cbc1c24a20

Ok.

0 rows in set. Elapsed: 1.235 sec. Processed 10.00 million rows, 80.00 MB (8.09 million rows/s., 64.76 MB/s.)
Peak memory usage: 769.10 MiB.

@lgbo-ustc
Contributor Author

lgbo-ustc commented May 29, 2024

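In general, CH aggregate functions return a non-nullable result for non-nullable input. Nullable compatibility is implemented by wrapping the function in an outer null combinator. Normally, the aggregate result is null only when all inputs are null.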
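
The wrapping described above can be sketched in Python as follows. This is a rough illustration under stated assumptions, not ClickHouse's actual implementation: `with_null_combinator` and `stddev_samp` are hypothetical names, and `None` stands in for SQL null.

```python
import math


def stddev_samp(values):
    """Hypothetical non-nullable aggregate: NaN when fewer than two values."""
    n = len(values)
    if n < 2:
        return math.nan
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))


def with_null_combinator(agg):
    """Wrap a non-nullable aggregate so that it accepts nullable input.

    Null inputs are skipped, and the result is null only when no
    non-null input was seen.
    """
    def wrapped(values):
        non_null = [v for v in values if v is not None]
        if not non_null:
            return None
        return agg(non_null)
    return wrapped


nullable_stddev = with_null_combinator(stddev_samp)
print(nullable_stddev([None, None]))  # every input is null
print(nullable_stddev([2, None]))     # one non-null input reaches the inner aggregate
```

Under this model a single non-null value still reaches the inner aggregate, so the wrapper alone does not turn the NaN into null.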
