You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
SELECTCOUNT(*) FROM hits WHERE"URL"LIKE'%google%';
SELECT"SearchPhrase", MIN("URL"), COUNT(*) AS c FROM hits WHERE"URL"LIKE'%google%'AND"SearchPhrase"<>''GROUP BY"SearchPhrase"ORDER BY c DESCLIMIT10;
SELECT"SearchPhrase", MIN("URL"), MIN("Title"), COUNT(*) AS c, COUNT(DISTINCT "UserID") FROM hits WHERE"Title"LIKE'%Google%'AND"URL" NOT LIKE'%.google.%'AND"SearchPhrase"<>''GROUP BY"SearchPhrase"ORDER BY c DESCLIMIT10;
SELECT*FROM hits WHERE"URL"LIKE'%google%'ORDER BY to_timestamp_seconds("EventTime") LIMIT10;
Describe the solution you'd like
Given a table with StringView data such as:
> CREATE OR REPLACE TABLE string_views ASVALUES (arrow_cast('Andrew', 'Utf8View'), 'A Much Longer String Than 12 Characters', 'Nonsense', 'A Much', NULL);
0 row(s) fetched.
Elapsed 0.006 seconds.
>select arrow_typeof(column1) from string_views limit1;
+------------------------------------+
| arrow_typeof(string_views.column1) |
+------------------------------------+
| Utf8View |
+------------------------------------+1 row(s) fetched.
Elapsed 0.012 seconds.
I would like queries using LIKE and NOT LIKE to work:
>select column1 from string_views WHERE column1 LIKE'A%';
Arrow error: Invalid argument error: Invalid string operation: Utf8View LIKE Utf8View
>select column1 from string_views WHERE column1 NOT LIKE'A%';
Arrow error: Invalid argument error: Invalid string operation: Utf8View NLIKE Utf8View
>select column1 from string_views WHERE column1 LIKE'%e';
Arrow error: Invalid argument error: Invalid string operation: Utf8View LIKE Utf8View
>select column1 from string_views WHERE column1 LIKE'%A%';
Arrow error: Invalid argument error: Invalid string operation: Utf8View LIKE Utf8View
>
Describe alternatives you've considered
We could add a coercion rule to LIKE to automatically coerce Utf8View to Utf8, however that is inefficient as it will involve copying all the strings. It would be much better to actually implement LIKE for Utf8View arrays directly
Is your feature request related to a problem or challenge?
Part of #10918 where we are integrating
StringView
into DataFusion, initially targeting making ClickBench queries fasterIn the ClickBench queries there are several
LIKE
predicates on String columns such asdatafusion/benchmarks/queries/clickbench/queries.sql
Lines 21 to 24 in 5bfc11b
Describe the solution you'd like
Given a table with StringView data such as:
I would like queries using LIKE and NOT LIKE to work:
Describe alternatives you've considered
We could add a coercion rule to LIKE to automatically coerce Utf8View to Utf8, however that is inefficient as it will involve copying all the strings. It would be much better to actually implement
LIKE
forUtf8View
arrays directlyCurrently LIKE and ILIKE are implemented in arrow-rs kernels, such as https://docs.rs/arrow/latest/arrow/compute/kernels/comparison/fn.like.html
The DataFusion implementation is here:
https://github.com/apache/datafusion/blob/main/datafusion/physical-expr/src/expressions/like.rs
I think there are two potential implementations:
Additional context
Please remember to target the
string-view
branch in DataFusion, rather thanmain
with your PRThe text was updated successfully, but these errors were encountered: