bug: Errors with NULL columns #9669
Comments
Based on
it sounds like you know the type of the column: string. The best thing to do then is to pass that information to
|
Thanks! Yeah, once I knew the cause solving the immediate problem was not difficult, but I still thought I'd raise this issue as the current behavior might not be ideal. |
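For readers hitting the same problem, the workaround of supplying the known column type up front could be sketched roughly as below. This is plain Python, not ibis code: `resolve_column_types`, `PY_TYPE_NAMES`, and the type-name strings are hypothetical names chosen for illustration. It assumes the rows are a list of dicts, as in the examples in this issue.

```python
# Hypothetical mapping from Python value types to illustrative type names.
PY_TYPE_NAMES = {bool: "boolean", int: "int64", float: "float64", str: "string"}


def resolve_column_types(rows, known_types):
    """Infer a type name per column, using known_types where one is given.

    Columns that contain only None cannot be inferred, so they must be
    listed in known_types; otherwise a ValueError names the column.
    """
    inferred = {}
    for row in rows:
        for name, value in row.items():
            inferred.setdefault(name, None)
            if inferred[name] is None and value is not None:
                inferred[name] = PY_TYPE_NAMES.get(type(value), "unknown")

    resolved = {}
    for name, type_name in inferred.items():
        if name in known_types:
            # An explicitly declared type always wins over inference.
            resolved[name] = known_types[name]
        elif type_name is not None:
            resolved[name] = type_name
        else:
            raise ValueError(
                f"column {name!r} contains only nulls; declare its type explicitly"
            )
    return resolved


data = [{"col1": 1, "col2": None}, {"col1": 4, "col2": None}]
schema = resolve_column_types(data, known_types={"col2": "string"})
print(schema)
```

A mapping like this could then be handed to whatever accepts an explicit schema, for example the `schema` argument of `ibis.memtable`, so that the all-null column never ends up typed as null.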
@Riezebos Any interest in submitting a PR? Adding this to the |
I'd be interested, but then I'd rather like to fix it in a way that doesn't give the errors I had. Both of the errors don't make it clear that the cause of the problem is a column full of nulls. The current way of handling null columns with duckdb/sqlglot seems a little broken, I'll explain why I think so below.
memtable to pyarrow (via duckdb)
When using
The schema of the expression still thinks the datatype of the column is null in
This can be checked using:
import ibis
data = [{"col1": 1, "col2": None}, {"col1": 4, "col2": None}]
t = ibis.memtable(data)
con = ibis.duckdb.connect()
print(con._to_duckdb_relation(t).arrow())
Output:
Relevant lines:
Potential solutions for this would be:
create_table in duckdb from memtable
I'm pretty sure this just fails because sqlglot writes a query that tries to create a table with a datatype of NULL which duckdb doesn't support. The method
This can be tested by running this after the example above:
con.compiler.type_mapper.from_ibis(t.schema().types[1])
Relevant lines:
Potential solutions for this would be:
I don't know if null columns are something that ibis wants to handle fully, but the null datatype does exist, it works in pyarrow and somewhat works in duckdb. Without solving the above problems, the combination of ibis+duckdb+null column is a bit broken in my opinion. I personally think that in both cases solution 1 is the best solution. If you agree with the problems and the solutions, I'd be happy to start working on a PR. |
What happened?
When a column has only nulls, I can't create a DuckDB table or pyarrow table from it.
Here are some reproducible examples:
Result:
ParserException: Parser Error: syntax error at or near "NULL"
t.execute() does work in the above example.
Result:
ArrowNotImplementedError: Unsupported cast from int32 to null using function cast_null
Result:
ParserException: Parser Error: syntax error at or near "NULL"
I am guessing the problem is that PyArrow supports columns having a datatype of null, which most databases probably don't?
DuckDB apparently converts the null datatype into int32:
I have no idea what the best way to handle this is, maybe raising an exception asking the user to specify a schema when a NULL column exists?
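The idea of raising an exception that asks for a schema could look something like the sketch below. It is written in plain Python and is not how ibis behaves today; `all_null_columns` and `check_null_columns` are hypothetical helper names, and the input is assumed to be a list of dicts.

```python
def all_null_columns(rows):
    """Return the names of columns whose value is None (or missing) in every row."""
    names = []
    for row in rows:
        for key in row:
            if key not in names:
                names.append(key)  # keep first-seen column order
    return [name for name in names if all(row.get(name) is None for row in rows)]


def check_null_columns(rows, schema=None):
    """Raise a descriptive error if an all-null column has no declared type."""
    schema = schema or {}
    untyped = [name for name in all_null_columns(rows) if name not in schema]
    if untyped:
        raise ValueError(
            f"Columns {untyped} contain only nulls, so their type cannot be "
            "inferred; please specify a schema for them."
        )


data = [{"col1": 1, "col2": None}, {"col1": 4, "col2": None}]
print(all_null_columns(data))

# Passes silently because col2 has a declared type.
check_null_columns(data, schema={"col2": "string"})
```

An error like this points directly at the column full of nulls, which is the information missing from the ParserException and ArrowNotImplementedError shown above.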
A job I run daily suddenly started giving the first error. With the error I got, it took some experimenting to figure out that it was actually caused by this issue. A column in the source data (from some API) that usually has strings and nulls now had only nulls.
What version of ibis are you using?
9.2.0
What backend(s) are you using, if any?
DuckDB
Relevant log output
No response