Add related source code locations to errors #13664

eliaperantoni · 2024-12-05T14:52:01Z

After the PoC that I did in https://github.com/eliaperantoni/datafusion/tree/eper/inline-diagnostics, in this PR I'm attempting to build a more robust solution, with the goal of eventually merging it.

Still, so far I've only added diagnostics to the Cannot coerce arithmetic expression .. to valid types error to get feedback and approval on the general implementation. That way I can make tweaks quickly, before I go extend this to the rest of the repo.

Which issue does this PR close?

Closes #13662.

What changes are included in this PR?

I introduce the Diagnostic, DiagnosticEntry, and DiagnosticEntryKind types to enrich errors with better user-facing information, mainly by referring to the portions of the SQL code that caused it.

A DiagnosticEntry is used to refer to one particular portion of SQL code and describe its involvement in the error with a string message and a kind. A Diagnostic is simply a collection of DiagnosticEntry.

For example, in SELECT a_string + an_integer we'd have:

Entry kind=Error message=Incompatible types in binary expression Wrapping the whole a_string + an_integer expression.
Entry kind=Note message=Is of type Utf8 Wrapping just the left side.
Entry kind=Note message=Is of type Int64 Wrapping just the right side.
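The three entries above can be sketched with simplified stand-in types. This is only an illustration: the `Span` here is a hypothetical byte range (not sqlparser's `Span`), `span_of` and `incompatible_types` are made-up helpers, and the type names in the notes are hard-coded.

```rust
/// Hypothetical stand-in for a source span: a half-open byte range into the SQL text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DiagnosticEntryKind {
    Error,
    Note,
}

/// One highlighted portion of the query, with a message and a kind.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DiagnosticEntry {
    pub span: Span,
    pub kind: DiagnosticEntryKind,
    pub message: String,
}

/// A collection of entries that together describe one problem.
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct Diagnostic {
    pub entries: Vec<DiagnosticEntry>,
}

/// Locates `needle` in `sql`. A real planner would take spans from the
/// parsed AST rather than searching the query text.
fn span_of(sql: &str, needle: &str) -> Option<Span> {
    let start = sql.find(needle)?;
    Some(Span { start, end: start + needle.len() })
}

/// Builds one error entry for the whole expression and one note per side.
pub fn incompatible_types(sql: &str, lhs: &str, rhs: &str) -> Option<Diagnostic> {
    let (lhs_span, rhs_span) = (span_of(sql, lhs)?, span_of(sql, rhs)?);
    let entry = |span: Span, kind: DiagnosticEntryKind, message: &str| DiagnosticEntry {
        span,
        kind,
        message: message.to_string(),
    };
    // The error wraps everything from the start of the left side to the end of the right side.
    let whole = Span { start: lhs_span.start, end: rhs_span.end };
    Some(Diagnostic {
        entries: vec![
            entry(whole, DiagnosticEntryKind::Error, "Incompatible types in binary expression"),
            entry(lhs_span, DiagnosticEntryKind::Note, "Is of type Utf8"),
            entry(rhs_span, DiagnosticEntryKind::Note, "Is of type Int64"),
        ],
    })
}
```

Slicing the query text with each entry's span recovers the highlighted portion, which is what a renderer would underline.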

A new DataFusionError::Diagnostic variant is used to attach Diagnostic to errors. You can call with_diagnostic on any DataFusionError to do so. They can later be retrieved by calling DataFusionError::get_diagnostics.

It is possible to attach multiple Diagnostic while an error is being returned along the function call stack. The rationale here is that different functions might have different levels of detail, but they might all have useful diagnostics to add. e.g. the function that computes the output type of an expression has no idea about why it's being called, but calling functions have no idea about the intricacies of the error. They all know different things.
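A minimal sketch of that layering, assuming a cut-down error enum with only two variants and a `Diagnostic` reduced to a single message. The function names `output_type` and `plan_projection`, and the outer message, are hypothetical. This sketch stops walking at the first non-`Diagnostic` variant, which may differ from how the PR traverses other wrapping variants.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Diagnostic {
    pub message: String,
}

#[derive(Debug)]
pub enum DataFusionError {
    Plan(String),
    /// A diagnostic wrapped around the error it describes.
    Diagnostic(Diagnostic, Box<DataFusionError>),
}

impl DataFusionError {
    /// Wraps `self` in a new `Diagnostic` layer.
    pub fn with_diagnostic(self, diagnostic: Diagnostic) -> Self {
        DataFusionError::Diagnostic(diagnostic, Box::new(self))
    }

    /// Walks the chain of `Diagnostic` layers, outermost first.
    pub fn get_diagnostics(&self) -> impl Iterator<Item = &Diagnostic> + '_ {
        let mut current = Some(self);
        std::iter::from_fn(move || match current? {
            DataFusionError::Diagnostic(diagnostic, inner) => {
                current = Some(&**inner);
                Some(diagnostic)
            }
            _ => {
                current = None;
                None
            }
        })
    }
}

/// Deep in the call stack: knows the types, not why it was called.
fn output_type() -> Result<(), DataFusionError> {
    let err = DataFusionError::Plan("Cannot coerce arithmetic expression".to_string());
    Err(err.with_diagnostic(Diagnostic {
        message: "Incompatible types in binary expression".to_string(),
    }))
}

/// Higher up: knows the context, not the details of the failure.
pub fn plan_projection() -> Result<(), DataFusionError> {
    output_type().map_err(|e| {
        e.with_diagnostic(Diagnostic {
            message: "While planning the SELECT list".to_string(),
        })
    })
}
```

Each caller adds what it knows, and the consumer sees every layer through a single iterator.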

I also added a span: Span to Column. This is the first step of a (probably) long process of enriching the types used during logical planning with the Spans from the parsed AST nodes.

Added WithSpan<T>. The idea is that we want to get Span information down to functions called deep in the call stack, but without breaking existing code too much so that we can ensure a smooth and incremental transition to Diagnostic-ed errors. WithSpan<T> was my attempt to do so because any T implements Into<WithSpan<T>> by simply attaching Span::empty (but of course, if a Span is available it should be used instead). This means that any time a function wants to start spawning Diagnostic-ed errors, it can change some of its arguments by wrapping their types in WithSpan<T> and existing calls will keep on working.

WithSpan<T> should be used when the argument type is something that can loosely be traced back to a location in the source code, but is not really part of any AST node, or anything that is isomorphic to it. A good example is DataType which is taken by signature(DataType, Operator, DataType) -> DataType to get the output type of a binary expression. That is the function that spawns the error, and the caller doesn't have enough information to provide a good Diagnostic. But at the same time, it would be (in my opinion) cumbersome to add Span arguments to the function. So instead, the function can wrap the DataTypes for the two sides of the expression in WithSpan<T>. Some call points can be updated to pass a Span-enriched WithSpan<DataType>, and all the others will keep on working without changes required.
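To illustrate why existing call sites keep compiling, here is a sketch under simplified assumptions: `Span` is a hypothetical byte range, `DataType` and `Operator` are tiny stand-in enums, `CoercionError` is invented for the example, and the coercion rule (types must match) is a placeholder rather than the real logic.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

impl Span {
    /// Placeholder span used when no location is known.
    pub fn empty() -> Self {
        Span { start: 0, end: 0 }
    }
}

/// Sidecar that carries a span next to a value that has no span field of its own.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct WithSpan<T> {
    pub span: Span,
    pub inner: T,
}

impl<T> WithSpan<T> {
    pub fn new(inner: T, span: Span) -> Self {
        WithSpan { span, inner }
    }
}

/// Any `T` converts into `WithSpan<T>` by attaching an empty span.
impl<T> From<T> for WithSpan<T> {
    fn from(inner: T) -> Self {
        WithSpan { span: Span::empty(), inner }
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DataType {
    Utf8,
    Int64,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Operator {
    Plus,
}

/// Hypothetical error type that remembers where each operand came from.
#[derive(Debug, PartialEq, Eq)]
pub struct CoercionError {
    pub lhs_span: Span,
    pub rhs_span: Span,
}

/// Accepts either a bare `DataType` or a span-enriched `WithSpan<DataType>`.
pub fn signature(
    lhs: impl Into<WithSpan<DataType>>,
    _op: Operator,
    rhs: impl Into<WithSpan<DataType>>,
) -> Result<DataType, CoercionError> {
    let (lhs, rhs) = (lhs.into(), rhs.into());
    if lhs.inner == rhs.inner {
        Ok(lhs.inner)
    } else {
        Err(CoercionError { lhs_span: lhs.span, rhs_span: rhs.span })
    }
}
```

A call such as `signature(DataType::Int64, Operator::Plus, DataType::Int64)` needs no changes, while an updated call site can pass `WithSpan::new(DataType::Utf8, span)` for one or both sides.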

Are these changes tested?

Not yet, while this is a draft to gather feedback.

Are there any user-facing changes?

Yes. The user can call DataFusionError::get_diagnostics to get an iterator over all the Diagnostic that have been attached to the error chain. Those contain source code locations that relate to the error, and each location comes with a message.

alamb

Thank you @eliaperantoni -- this is very cool. I left some comments, let me know what you think

alamb · 2024-12-05T18:27:57Z

datafusion/common/src/error.rs

@@ -131,6 +132,7 @@ pub enum DataFusionError {
    /// Errors from either mapping LogicalPlans to/from Substrait plans
    /// or serializing/deserializing protobytes to Substrait plans
    Substrait(String),
+    Diagnostic(Diagnostic, Box<DataFusionError>),


This is an interesting idea -- one way to think about this is that it adds additional structured information to DataFusionError::Context

If we went with the DataFusionError::Diagnostic approach, do you think we would be able to deprecate / remove DataFusionError::Context in a future release?

Thank you! Yes, I think DataFusionError::Diagnostic can convey a superset of the information that DataFusionError::Context can. Any wrapping such as:

let error = DataFusionError::Context(message, Box::new(error));

can be converted to:

let error = DataFusionError::Diagnostic(
    Diagnostic {
        entries: vec![DiagnosticEntry {
            span: Span::empty(),
            kind: DiagnosticEntryKind::Error,
            message,
        }],
    },
    Box::new(error),
);

And of course, we can provide a DataFusionError::with_simple_diagnostic function to avoid the boilerplate. At that point, DataFusionError::Context could be removed.

This also enables progressively adding Span information to what was previously simply a string message.

alamb · 2024-12-05T18:31:11Z

datafusion/common/src/column.rs

 pub struct Column {
    /// relation/table reference.
    pub relation: Option<TableReference>,
    /// field/column name.
    pub name: String,
+    #[derivative(


this is an interesting pattern -- it is different than the way sqlparser did it (which is to effectively wrap the span with a struct that is ignored when comparing eq, hash, etc)

Did you consider a similar approach?

pub struct Column {
    ...
    // Wrapped span that does not contribute to PartialEq, Eq, Hash, etc
    pub span: DiagnosticOnly<Span>,
}

I think this approach is less verbose and might be easier to understand for the casual reader

I also see WithSpan has a similar approach (but that is for wrapping things with Spans 🤔 )

It is indeed interesting. 😅 I was going to use derivative for this problem in sqlparser initially actually, but I wasn't sure if you would want to add a new dependency. If that is not a concern, I think using derivative for cases like this is pretty nice as it is also clear at the usage site that it is being ignored, unlike AttachedToken which kind of hides the behavior. If a wrapper type is preferred, I now think actually maybe the struct in sqlparser should have been called DiagnosticOnly as well 😂

I think the AttachedToken approach that we took in sqlparser could be a functionally equivalent alternative. And I understand that not everybody might be familiar with derivative.

What I don't 100% love about it is that the PartialEq implementation on AttachedToken is a bit "impure". It serves the one specific purpose of making it ignored when used in a struct field, but prevents you from actually comparing two instances of it, which is something you might want to do at some point, because they would always be equal (the implementation always returns true). This matters especially because the name AttachedToken doesn't clearly convey that intent, and the type does more than just being a passthrough for PartialEq since its main purpose is tying together a Span and a Token.

When looking at a struct that uses it:

#[derive(Debug, Clone, PartialEq, PartialOrd, Eq, Ord, Hash)]
pub struct Select {
    /// Token for the `SELECT` keyword
    pub select_token: AttachedToken,
    // ...
}

it's not clear to me that AttachedToken is being ignored. I only realise that when I go look at the implementation of PartialEq.

Overall, the derivative approach to me more clearly lets me keep the sensible implementation of PartialEq for Span, while also conveying that "when used as a field in this particular struct, I want it to not influence the struct's comparison". But in other structs, I might want it to influence the comparison.

I personally find this approach more readable and flexible, but I'm open to converting to a wrapper type :)
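For comparison, the wrapper-type alternative can be sketched as follows. This is an illustration only: `DiagnosticOnly` borrows the name suggested earlier in this thread, `Span` is a hypothetical byte range, and `Column` is reduced to two fields.

```rust
use std::hash::{Hash, Hasher};

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

/// Wrapper whose contents never influence equality or hashing.
#[derive(Debug, Clone, Copy)]
pub struct DiagnosticOnly<T>(pub T);

impl<T> PartialEq for DiagnosticOnly<T> {
    // Always equal: this is the "impure" part discussed above.
    fn eq(&self, _other: &Self) -> bool {
        true
    }
}

impl<T> Eq for DiagnosticOnly<T> {}

impl<T> Hash for DiagnosticOnly<T> {
    // Contributes nothing to the hash, consistent with `eq` above.
    fn hash<H: Hasher>(&self, _state: &mut H) {}
}

/// The derives work unchanged because the wrapper neutralises the span field.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Column {
    pub name: String,
    pub span: DiagnosticOnly<Span>,
}
```

Two columns with the same name but different spans compare equal, as intended. The trade-off is visible too: two wrapped spans also compare equal even when the spans inside differ, so a real comparison has to unwrap with `.0` first.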

I also see WithSpan has a similar approach (but that is for wrapping things with Spans 🤔 )

Ah yes good catch! I could definitely have used derivative for WithSpan too. And about WithSpan: do you like the general idea of it?

It's meant to be a "sidecar" to get a Span all the way to where it's needed to create a meaningful diagnostic, by attaching it to values that are sort-of-related, but wouldn't make a lot of sense to add as a struct field.

For example, the two DataTypes in a binary expression come from the two expressions in the SQL query, so they (in some sense) relate to a portion of the code. But putting span as a field of DataType sounds wrong.

Maybe a better name could be SpanRelated or something like that?

From my perspective some design goals should be:

Make it easy for someone reading the DataFusion source code to understand what is going on

If people are not interested in the span / diagnostic information they can ignore it

From my perspective, using new crates does reduce the amount of code in the DataFusion crate, but can often increase cognitive load (as now to understand the DataFusion code you need to understand the crates)

I think the AttachedToken approach that we took in sqlparser could be a functionally equivalent alternative. And I understand that not everybody might be familiar with derivative.

Yeah, that is why I think AttachedToken might be better in this case

alamb · 2024-12-05T18:36:23Z

datafusion/common/src/with_span.rs

+
+use sqlparser::tokenizer::Span;
+
+/// A wrapper type for entitites that somehow refer to a location in the source


Did you consider a trait like was done in sqlparser that retrieves spans? For example

pub trait Spanned {
    /// return the span for this object, if known
    fn span(&self) -> Option<&Span>;
}

And then we would implement that trait for various DataFusion / Arrow types (like DataType, etc)?

That would likely require implementing specific wrapper types like DataTypeWithSpan or something to attach Span information to DataTypes

I also see the need to somehow plumb through the span information into things like coerce-types 🤔

Though maybe now would be a good time to make a proper structure for coercing types, (struct TypeCoercer or something) where we could attach the spans 🤔

Did you consider a trait like was done in sqlparser that retrieves spans?

Yes, but to implement one such trait I'd have to put the data in DataType. And I think that's a bit pushing the limits of what DataType is meant to do. I quite like the idea of the WithSpan (can rename) functioning like a sidecar, to limit big changes in the code.

Maybe we could have a Spanned trait in datafusion as well, that is automatically implemented for all WithSpan<T>, and also by types that natively carry a Span?
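A rough sketch of how the two ideas could combine, assuming a hypothetical byte-range `Span`, a simplified `WithSpan<T>`, and a `Column` that carries an optional span natively. The `describe` helper is made up for the example.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

pub trait Spanned {
    /// Returns the span for this object, if known.
    fn span(&self) -> Option<&Span>;
}

/// Sidecar wrapper: always has a span to report.
pub struct WithSpan<T> {
    pub span: Span,
    pub inner: T,
}

impl<T> Spanned for WithSpan<T> {
    fn span(&self) -> Option<&Span> {
        Some(&self.span)
    }
}

/// A type that natively carries a span, which may be absent.
pub struct Column {
    pub name: String,
    pub span: Option<Span>,
}

impl Spanned for Column {
    fn span(&self) -> Option<&Span> {
        self.span.as_ref()
    }
}

/// Code that builds diagnostics can then treat both kinds of value uniformly.
pub fn describe(item: &dyn Spanned) -> String {
    match item.span() {
        Some(span) => format!("bytes {}..{}", span.start, span.end),
        None => "unknown location".to_string(),
    }
}
```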

Though maybe now would be a good time to make a proper structure for coercing types, (struct TypeCoercer or something) where we could attach the spans 🤔

Not too familiar with the history of that component. But I think that's just one example, and there are probably many more cases where changes like that would be needed to accommodate the presence of Spans.

eliaperantoni · 2024-12-09T14:26:02Z

@alamb here's a few examples of what the new diagnostics that I implemented so far can do:

What are your thoughts so far?

Note that the rendering here is not part of my PR, but it consumes the Diagnostic data that is now produced by datafusion.

We think that this could be quite nicely integrated with datafusion-dft and datafusion-cli to provide a richer experience for end users.

Perhaps this kind of quality of life features can also be related to #13525, as it would make it easier for consumers of Datafusion to provide a nicer experience to their end users.

alamb · 2024-12-12T16:07:09Z

Sorry for my radio silence here -- I am reviewing this now. Thank you for your patience

alamb

Sorry again for the delay

I spent some more time reviewing this PR, but did not get through it entirely

I think one of the largest design decisions is "where to store the spans"?

There is a tension between trying to keep full source provenance information and overburdening the existing code. I am musing here, but I am hoping to find some balance between

Getting something useful in to DataFusion that we can incrementally update
Ensuring we have a good long term roadmap.

The basic idea of attaching the spans to existing LogicalPlan and Exprs (which is what this PR seems to do) makes sense to me in theory, but in practice it feels like it may be hugely disruptive (many changes to existing structs) and the information propagation will likely be too hard to maintain and test -- all code that transforms/rewrites would have to propagate the spans

I also added a span: Span to Column. This is the first step of a (probably) long process of enriching the types used during logical planning with the Spans from the parsed AST nodes.

What do you think about initially focusing on locations from the errors out of the sql planner (SqlToRel) and postpone adding Spans to the plan nodes?

This would allow us to set up a good pattern for attaching Diagnostics to errors and reporting them. Once that infrastructure is solidified we can start figuring out a pattern to plumb the source information into other areas (like type coercion errors)?

alamb · 2024-12-12T16:14:44Z

datafusion/common/src/column.rs

 pub struct Column {
    /// relation/table reference.
    pub relation: Option<TableReference>,
    /// field/column name.
    pub name: String,
+    #[derivative(


From my perspective some design goals should be:

Make it easy for someone reading the DataFusion source code to understand what is going on

If people are not interested in the span / diagnostic information they can ignore it

From my perspective, using new crates does reduce the amount of code in the DataFusion crate, but can often increase cognitive load (as now to understand the DataFusion code you need to understand the crates)

I think the AttachedToken approach that we took in sqlparser could be a functionally equivalent alternative. And I understand that not everybody might be familiar with derivative.

Yeah, that is why I think AttachedToken might be better in this case

alamb · 2024-12-12T16:15:47Z

datafusion/common/src/column.rs

+    /// Attaches a [`Span`] to the [`Column`], i.e. its location in the source
+    /// SQL query.
+    pub fn with_span(mut self, span: Span) -> Self {
+        self.spans = vec![span];


should this perhaps append a span to the Column rather than overwrite it?

alamb · 2024-12-12T16:19:46Z

datafusion/common/src/dfschema/mod.rs

@@ -113,6 +117,10 @@ pub struct DFSchema {
    field_qualifiers: Vec<Option<TableReference>>,
    /// Stores functional dependencies in the schema.
    functional_dependencies: FunctionalDependencies,
+    /// The location in the source code where the fields are defined (e.g. in


This seems somewhat strange to me as the schema may not be defined in the query

Maybe we can avoid adding this as part of V1

eliaperantoni · 2025-01-17T14:19:27Z

@alamb I'm very sorry for the delay. I think your points about not wanting to change the logical types are valid, and I confess it was a bit of a pain to make all the changes to make my previous iteration compile and route the spans through. I tried again, this time using your suggestion of:

What do you think about initially focusing on locations from the errors out of the sql planner (SqlToRel) and postpone adding Spans to the plan nodes?

I think this was a lot easier and pollutes the source code a lot less. So far I've implemented:

Table not found
Unqualified column not found
Qualified column not found
Mismatch in number of columns in set operation

I think the "column not found" is the most interesting, since when I get an unresolved datafusion::Column, I now have to walk down a tree of sqlparser::Expr and find one that produces a matching datafusion::Column.

i.e. since I can't store data in the logical node, when I get a troublesome one I have to transform the AST tree again to figure out which AST node produces the problematic logical node.
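The lookup can be sketched on a toy AST. Everything here is a simplified assumption: `SqlExpr` has only two node kinds, `Span` is a hypothetical byte range, and `to_column` stands in for the real identifier normalisation by lowercasing the name.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

/// Toy AST: identifiers carry spans, binary operations only nest.
pub enum SqlExpr {
    Identifier { name: String, span: Span },
    BinaryOp { left: Box<SqlExpr>, right: Box<SqlExpr> },
}

/// Toy logical column: no span stored on it.
#[derive(Debug, PartialEq, Eq)]
pub struct Column {
    pub name: String,
}

/// Hypothetical stand-in for converting an AST identifier to a logical column.
fn to_column(name: &str) -> Column {
    Column { name: name.to_lowercase() }
}

/// Re-derives columns from the AST until one matches the unresolved column,
/// then returns the span of the AST node that produced it.
pub fn find_span(expr: &SqlExpr, unresolved: &Column) -> Option<Span> {
    match expr {
        SqlExpr::Identifier { name, span } => {
            (to_column(name) == *unresolved).then_some(*span)
        }
        SqlExpr::BinaryOp { left, right } => {
            find_span(left, unresolved).or_else(|| find_span(right, unresolved))
        }
    }
}
```

The walk only runs on the error path, so the extra conversion work is not paid during successful planning.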

That works okay. Performance might not be great but it only happens in case of errors, so I think it might be okay.

I'd like to hear your opinion.

But then I started implementing the "column missing from GROUP BY clause" error and that was a bit more difficult (or unreadable) because there are many more layers of functions between the one that has access to the AST tree and the one that checks a single logical expr in the SELECT part of the query and throws the error. I could of course pass down the AST tree, but I need a DFSchema and a PlannerContext to normalise the identifiers and correctly go from AST to logical nodes. I find that a bit cumbersome and error prone. I feel like it would be so much easier if we could agree on a way to store Span in the logical Column.

alamb · 2025-01-20T13:08:48Z

datafusion/common/src/error.rs

+        self.with_diagnostic(diagnostic)
+    }
+
+    pub fn get_diagnostics(&self) -> impl Iterator<Item = &Diagnostic> + '_ {


Minor -- I think a more consistent API would be to call this

Suggested change

pub fn get_diagnostics(&self) -> impl Iterator<Item = &Diagnostic> + '_ {

pub fn diagnostics(&self) -> impl Iterator<Item = &Diagnostic> + '_ {

alamb · 2025-01-20T13:10:59Z

I am starting to check this PR out

alamb · 2025-01-20T13:11:45Z

I am looking at this PR in detail now

alamb

Thank you @eliaperantoni and I am really sorry for the delay in reviewing. I will try and keep more focus on this PR.

I think the changes to DataFusion error look good.

I think I may be missing something with the planner changes, but it seems to me like it would be possible to plumb the "current span" information down using the PlannerContext (I left some more detailed suggestions)

In terms of next steps, what I imagine will happen is once we get the patterns set in this PR, we (aka I will do / help a lot) will write up a high level overview / next steps (e.g. plumb span information more, etc)

alamb · 2025-01-20T13:38:50Z

datafusion/sql/src/expr/mod.rs

-        let mut expr = self.sql_expr_to_logical_expr(sql, schema, planner_context)?;
+        // The location of the original SQL expression in the source code
+        let mut expr =
+            self.sql_expr_to_logical_expr(sql.clone(), schema, planner_context)?;


Having to copy the entire AST just to get the span information on error is non ideal (we are trying to keep planning reasonably fast)

I understand that Expr doesn't have the Span (and adding it is quite potentially intrusive). However, I wonder if you have considered using PlannerContext?

Specifically, since the PlannerContext is already threaded through most/all planning methods, if you could add the "current span" on the PlannerContext, you would have the necessary span information when you generated an error.

#[derive(Debug, Clone)]
pub struct PlannerContext {
    ...
    /// the current span of the expression or statement being planned
    /// Note not all statements have span information yet
    /// see <https://github.com/apache/datafusion-sqlparser-rs/issues/1548>
    current_span: Option<Span>,
    ...
}

Then rather than calling sql_expr_to_logical_expr twice, you could have the error generated in sql_expr_to_logical_expr include the span information.

The key would be to manage setting/restoring the spans during the planning process. Maybe it could be something like

...
// set the `current_span` field in the planner context
// if sql has a span, otherwise use the existing span
let planner_context = planner_context.with_span(&sql);
self.sql_expr_to_logical_expr(sql.clone(), schema, planner_context)?;
...

That's a very interesting idea. I see a few issues with it though:

What if sql is a compound expression like a + b but only b is unresolved? Ideally, I would want sql_expr_to_logical_expr to be able to return diagnostics that highlight b only. But how does it do that if it only knows the Span of the entire sql expression?

What if a diagnostic needs to highlight various parts of a query. e.g. if a non-aggregated column is missing from the GROUP BY clause, a good diagnostic would highlight both the column and the GROUP BY clause. But planner context has only one Span, and which is the "current span" in this case?

I thought that I could solve the above issues by putting a HashMap<String, Vec<Span>> in PlannerContext. The key would be to label different parts of the query, e.g. group_by_clause, set_operator, left_side_of_expr, etc., so that I could do something like:

error.with_diagnostic(
    Diagnostic::new()
        .with_error(
            "column missing from GROUP BY",
            planner_context.span("expr").last(), // Highlight the column
        )
        .with_note(
            "GROUP BY is here",
            planner_context.span("group_by").last(), // Highlight the GROUP BY
        ),
)

The Vec<_> is just to have stacks, so that in something like SELECT ... UNION (SELECT ... UNION SELECT ...) the set_operator key could point to the outer UNION, then the inner one, then restore the outer one when we're done planning the query in parentheses.
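A sketch of those labelled stacks, assuming a hypothetical byte-range `Span` and a `PlannerContext` reduced to the span map alone. The method names `push_span`, `pop_span`, and `span` are invented for the example, and `span` returns the top of the stack directly instead of the whole `Vec`.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

/// Planner context reduced to the labelled span stacks.
#[derive(Debug, Default)]
pub struct PlannerContext {
    spans: HashMap<String, Vec<Span>>,
}

impl PlannerContext {
    /// Enters a nested region of the query for `label`.
    pub fn push_span(&mut self, label: &str, span: Span) {
        self.spans.entry(label.to_string()).or_default().push(span);
    }

    /// Leaves the innermost region for `label`, restoring the enclosing one.
    pub fn pop_span(&mut self, label: &str) -> Option<Span> {
        self.spans.get_mut(label)?.pop()
    }

    /// The innermost span currently recorded for `label`, if any.
    pub fn span(&self, label: &str) -> Option<Span> {
        self.spans.get(label)?.last().copied()
    }
}
```

With nested set operations, pushing the outer UNION and then the inner one makes `span("set_operator")` return the inner span until it is popped, after which the outer span is visible again.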

But then I encountered another issue:

Some functions where PlannerContext is threaded through (e.g. sql_expr_to_logical_expr and sql_expr_to_logical_expr_internal) take &mut PlannerContext. What if I need to add new spans?

For example, in the body of sql_expr_to_logical_expr I would like to do:

But, what if sql_expr_to_logical_expr_internal needs to mutate the given &mut PlannerContext? I can't simply clone PlannerContext and add a new span because I then break the ability to mutate downstream.

Then it seems like a pattern like:

planner_context.with_span("expr", sql_expr.span(), move |planner_context| {
    let expr = self.sql_expr_to_logical_expr_internal(
        *sql_expr,
        schema,
        planner_context,
    )?;
});

might be necessary, to add a Span to the given PlannerContext but also remove it. Idk if that's less invasive than adding Span to Expr though.
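One way such a scoped helper might look, as a sketch only: the `Span` is a hypothetical byte range, `PlannerContext` keeps just the span map plus a made-up `planned_exprs` counter that stands in for state mutated downstream, and `with_span` is an assumed name and signature.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

#[derive(Debug, Default)]
pub struct PlannerContext {
    spans: HashMap<String, Vec<Span>>,
    /// Hypothetical piece of state that downstream planning mutates.
    pub planned_exprs: usize,
}

impl PlannerContext {
    /// The innermost span currently recorded for `label`, if any.
    pub fn span(&self, label: &str) -> Option<Span> {
        self.spans.get(label)?.last().copied()
    }

    /// Pushes `span` under `label`, runs `f` with the same mutable context,
    /// then pops the span again so the caller's view is restored.
    pub fn with_span<R>(
        &mut self,
        label: &str,
        span: Span,
        f: impl FnOnce(&mut PlannerContext) -> R,
    ) -> R {
        self.spans.entry(label.to_string()).or_default().push(span);
        let result = f(&mut *self);
        if let Some(stack) = self.spans.get_mut(label) {
            stack.pop();
        }
        result
    }
}
```

Because the closure receives the same `&mut PlannerContext` rather than a clone, mutations made while planning the subexpression remain visible afterwards, and only the span is undone. When `R` is a `Result`, the closure can use `?` internally and the pop still runs before the result is returned.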

Also it's validate_schema_satisfies_exprs that throws the error, not sql_expr_to_logical_expr. And in that place I have access to the SQL expression, but the problem still is that I don't know which subexpression was the problem

eliaperantoni · 2025-01-24T12:48:48Z

Thanks! I tried fixing the failing CI tests

eliaperantoni · 2025-01-27T14:23:27Z

Ah sorry @alamb those tests didn't fail before. Is there any way I can trigger CI myself?

alamb

Thank you @eliaperantoni -- I really like how this PR is looking now.

In my opinion this PR could almost be merged as is because:

The features are disabled by default
The tests are wonderful

The things blocking in my mind are:

A few comments on the new pub struct
Removing the new clones in SqlToRel (I left comments)

I also think before/part of merging this PR would be to document what is needed before we would want to turn on collect_spans by default

I will also run the sql planning benchmarks just to make sure there is no planning performance regression

cc @jonahgao and @comphead for your thoughts

datafusion/sql/tests/cases/diagnostic.rs

alamb · 2025-01-27T20:26:57Z

datafusion/sqllogictest/test_files/create_external_table.slt

@@ -33,23 +33,23 @@ statement error DataFusion error: SQL error: ParserError\("Missing LOCATION clau
 CREATE EXTERNAL TABLE t STORED AS CSV

 # Option value is missing
-statement error DataFusion error: SQL error: ParserError\("Expected: string or numeric value, found: \)"\)
+statement error DataFusion error: SQL error: ParserError\("Expected: string or numeric value, found: \) at Line: 1, Column: 66"\)


That's really cool, does that work for multiline statements?

I'm thinking that should be the parser's responsibility rather than the runtime engine's 🤔

Yes it does work for multiline statements, and it's the parser that outputs that error.

WITH cte1 AS (
  SELECT 1 AS id
), cte2 AS (
  SELECT 1 AS id
),
SELECT id FROM cte1, cte2, users
-- SQL error: ParserError("Expected: AS, found: id at Line: 6, Column: 8")

datafusion/common/src/column.rs

alamb · 2025-01-27T20:40:16Z

datafusion/expr-common/Cargo.toml

@@ -41,3 +41,4 @@ arrow = { workspace = true }
 datafusion-common = { workspace = true }
 itertools = { workspace = true }
 paste = "^1.0"
+sqlparser = { workspace = true }


this new dependency is also unfortunate, but I see it is required as the BinaryTypeCoercer needs to know about spans

It's not necessary anymore since, after your feedback, I made a semi-copy of Span in datafusion-common 🥳

datafusion/expr/src/expr.rs

alamb · 2025-01-27T21:17:10Z

datafusion/expr-common/src/type_coercion/binary.rs

+    op: &'a Operator,
+    rhs: &'a DataType,
+
+    lhs_spans: Spans,


What about simply having a single spans: Span that is computed on construction? Is there any reason to keep around the different set of spans?

We need the spans of both sides individually though, to produce diagnostics like the following:

Maybe it's not super clear from the screenshot but each note: has type X is highlighting just one side of the expression.

They are also merged for the overarching error though, that's true

datafusion/expr/src/expr_rewriter/mod.rs

datafusion/sql/src/expr/mod.rs

datafusion/sql/src/set_expr.rs

datafusion/common/src/column.rs

datafusion/common/src/diagnostic.rs

comphead · 2025-01-27T22:13:42Z

datafusion/sqllogictest/test_files/unnest.slt

@@ -899,7 +899,7 @@ logical_plan
 07)--------Unnest: lists[__unnest_placeholder(outer_ref(u.column1))|depth=1] structs[]
 08)----------Projection: outer_ref(u.column1) AS __unnest_placeholder(outer_ref(u.column1))
 09)------------EmptyRelation
-physical_plan_error This feature is not implemented: Physical plan does not support logical expression OuterReferenceColumn(List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), Column { relation: Some(Bare { table: "u" }), name: "column1" })
+physical_plan_error This feature is not implemented: Physical plan does not support logical expression OuterReferenceColumn(List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), Column { relation: Some(Bare { table: "u" }), name: "column1", spans: Spans([]) })


I'm wondering if Spans serves a purpose here? Should we exclude it from the error message? Otherwise it might be confusing

Hmm yeah in general I think I agree that we don't need Column::Debug to print the spans. I'll change it.
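A hand-written `Debug` that skips the spans could look roughly like this. The types are simplified stand-ins: `relation` is a plain `Option<String>` rather than a `TableReference`, and `Spans` wraps hypothetical `(start, end)` pairs.

```rust
use std::fmt;

/// Stand-in for the span collection stored on a column.
#[derive(Clone, Default, PartialEq, Eq)]
pub struct Spans(pub Vec<(usize, usize)>);

#[derive(Clone)]
pub struct Column {
    pub relation: Option<String>,
    pub name: String,
    pub spans: Spans,
}

impl fmt::Debug for Column {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Deliberately omit `spans` so that existing error messages and
        // plan snapshots keep their previous shape.
        f.debug_struct("Column")
            .field("relation", &self.relation)
            .field("name", &self.name)
            .finish()
    }
}
```

With this, formatting a column with `{:?}` prints only `relation` and `name`, whether or not any spans are attached.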

matthewmturner · 2025-01-28T14:26:05Z

Thanks for the ping @xudong963. Indeed, this is really cool - I haven't had the chance to read the whole thread yet but from what I see I do think this could be a good addition to dft. I've created an issue for it

eliaperantoni · 2025-01-28T20:23:31Z

@alamb fixed ci issues

eliaperantoni · 2025-01-28T22:33:36Z

I also think before/part of merging this PR would be to document what is needed before we would want to turn on collect_spans by default

In my opinion:

Putting Span on more logical types in order to..
Attach Diagnostic to more errors, until most of the typical errors that we can expect the end user to make are covered

I know it's a little vague but, as a starting point, I would consider:

Reference to a function that doesn't exist
Wrong number of parameters in a function call
Wrong types of parameters in a function call
Syntax errors
Subquery that returns more than one column
Type coercion in unary expressions
Using = NULL instead of IS NULL, maybe that could be the first use case of Diagnostic::new_warning
Alias two columns/tables with the same name

alamb

Thank you @eliaperantoni and @comphead (and @mkarbo)!

I think this PR looks really nice and is a great addition to DataFusion

I suggest we wait for the DataFusion 45 release candidate (likely later this week or early next week) and merge this PR in once that has been done.

Release DataFusion 45.0.0 #14008

Once we merge someone (I will do it if no one else beats me to it) should then file a ticket like [EPIC]: Improved error source tracking / messages listing the various subtasks you highlighted on #13664 (comment)

FYI @sadboy and @findepi -- this feature might be of interest to SDF as well.

I am very excited to see this happen. Thank you for pushing it through

alamb · 2025-01-29T11:39:14Z

datafusion/expr/src/expr_rewriter/mod.rs

-            Some::<TableReference>(subqry_alias.into()),
-            name,
-        )),
+        Expr::Column(col) => Ok(col.with_relation(subqry_alias.into())),


alamb · 2025-01-29T11:40:41Z

datafusion/sql/src/utils.rs

@@ -90,12 +90,35 @@ pub(crate) fn rebase_expr(
        .data()
 }

+#[derive(Debug, Clone, Copy, PartialEq, Eq)]


this is quite nice

alamb · 2025-01-29T11:47:17Z

I merged up from main for the PR to get a clean CI test run

alamb · 2025-02-01T12:02:40Z

Update here is we are very close to cutting the 45 release branch. See

More details on Release DataFusion 45.0.0 #14008 (comment)

Once we do that I'll plan to merge this PR

github-actions bot added sql SQL Planner logical-expr Logical plan and expressions optimizer Optimizer rules core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) common Related to common crate labels Dec 5, 2024

alamb reviewed Dec 5, 2024

This was referenced Dec 6, 2024

Add related source code locations to errors #13662

Open

Report multiple errors, not just the first one #13676

Open

github-actions bot added the proto Related to proto crate label Dec 11, 2024

alamb reviewed Dec 12, 2024

eliaperantoni added 5 commits January 16, 2025 14:58

Reset branch to no changes other than SqlRel

13ad9db

feat: improve Diagnostic ergonomics

4603b35

feat: 'table not found' diagnostic in SqlToRel

86e1722

feat: unresolved fields

c3532c1

feat: union with different number of columns

91f4e3c

eliaperantoni force-pushed the eper/diagnostics branch from 0506a80 to 91f4e3c Compare January 17, 2025 14:09

github-actions bot removed logical-expr Logical plan and expressions optimizer Optimizer rules core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) proto Related to proto crate labels Jan 17, 2025

eliaperantoni requested a review from alamb January 20, 2025 11:14

alamb reviewed Jan 20, 2025

eliaperantoni added 3 commits January 24, 2025 13:19

fix: clippy, by removing smallvec

1b35207

fix: update slt

243b788

Merge branch 'main' into eper/diagnostics

e98694b

github-actions bot added documentation Improvements or additions to documentation sqllogictest SQL Logic Tests (.slt) labels Jan 24, 2025

eliaperantoni added 5 commits January 24, 2025 15:45

feat: support all binary expressions

e847477

refactor: move diagnostic tests to datafusion-sql

60a9f8b

chore: format

f178db0

Merge branch 'main' into eper/diagnostics

1308b85

fix: tests

51dd141

alamb reviewed Jan 27, 2025

alamb mentioned this pull request Jan 27, 2025

Jan 18, 2025: This week(s) in DataFusion #14179

Open

comphead reviewed Jan 27, 2025

datafusion/common/src/column.rs

comphead reviewed Jan 27, 2025

datafusion/common/src/diagnostic.rs

comphead reviewed Jan 27, 2025

datafusion/common/src/diagnostic.rs

comphead reviewed Jan 27, 2025

feat: pr feedback

b5e326c

matthewmturner mentioned this pull request Jan 28, 2025

Better error reporting / diagnostics datafusion-contrib/datafusion-dft#266

Open

fix: ci checks

1cd0f3e

eliaperantoni requested a review from alamb January 29, 2025 08:36

Merge remote-tracking branch 'apache/main' into eper/diagnostics

20e9f61

alamb approved these changes Jan 29, 2025

alamb mentioned this pull request Jan 31, 2025

[WIP] Introduce the "parser" feature to gate the SQL text processing and leaving only AST and other support types apache/datafusion-sqlparser-rs#1691

Draft

