Add the doc about error recovering
makenowjust committed Dec 20, 2024
1 parent 2c0d969 commit 8dbc8c8
Showing 4 changed files with 148 additions and 6 deletions.
141 changes: 141 additions & 0 deletions docs/content/error-recover/_index.md
@@ -0,0 +1,141 @@
---
date: 2024-12-20 00:00:00 +0900
title: "Error recovering"
weight: 2
---

Since v0.1.0, the `memefish.ParseXXX` functions return AST node(s) even if an error is reported.
That is, if we try to parse incomplete SQL such as:

```sql
SELECT (1 +) + (* 2)
```

Then, the following two errors are reported:

```sql
syntax error: :1:12: unexpected token: )

1: SELECT (1 +) + (* 2)
              ^


syntax error: :1:17: unexpected token: *

1: SELECT (1 +) + (* 2)
                   ^
```

However, the AST is also returned:

```go {hl_lines=["10-31","36-57"]}
&ast.QueryStatement{
  Query: &ast.Select{
    Results: []ast.SelectItem{
      &ast.ExprSelectItem{
        Expr: &ast.BinaryExpr{
          Op: "+",
          Left: &ast.ParenExpr{
            Lparen: 7,
            Rparen: 11,
            Expr: &ast.BadExpr{
              BadNode: &ast.BadNode{
                NodePos: 8,
                NodeEnd: 11,
                Tokens: []*token.Token{
                  &token.Token{
                    Kind: "<int>",
                    Raw: "1",
                    Base: 10,
                    Pos: 8,
                    End: 9,
                  },
                  &token.Token{
                    Kind: "+",
                    Space: " ",
                    Raw: "+",
                    Pos: 10,
                    End: 11,
                  },
                },
              },
            },
          },
          Right: &ast.ParenExpr{
            Lparen: 15,
            Rparen: 19,
            Expr: &ast.BadExpr{
              BadNode: &ast.BadNode{
                NodePos: 16,
                NodeEnd: 19,
                Tokens: []*token.Token{
                  &token.Token{
                    Kind: "*",
                    Raw: "*",
                    Pos: 16,
                    End: 17,
                  },
                  &token.Token{
                    Kind: "<int>",
                    Space: " ",
                    Raw: "2",
                    Base: 10,
                    Pos: 18,
                    End: 19,
                  },
                },
              },
            },
          },
        },
      },
    },
  },
}
```

In this way, each place where an error occurred is filled with an `ast.BadXXX` node (`ast.BadExpr` in this example).
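The `NodePos`/`NodeEnd` and `Pos`/`End` values in the dump look like zero-based byte offsets into the input with an exclusive end (for instance, the token `1` has `Pos: 8` and `End: 9`). Under that assumption, the skipped source text can be recovered by slicing the input. The sketch below uses a hypothetical `span` type in place of the real `ast.BadNode`, so it does not depend on the library:

```go
package main

import "fmt"

// span is a hypothetical stand-in for the NodePos/NodeEnd pair of an ast.BadNode.
// It assumes zero-based byte offsets with an exclusive end, as the dump suggests.
type span struct {
	Pos, End int
}

// text returns the part of src covered by s.
func (s span) text(src string) string {
	return src[s.Pos:s.End]
}

func main() {
	src := "SELECT (1 +) + (* 2)"
	// Offsets copied from the two BadNode values in the AST dump.
	bad := []span{{Pos: 8, End: 11}, {Pos: 16, End: 19}}
	for _, b := range bad {
		fmt.Printf("%d-%d: %q\n", b.Pos, b.End, b.text(src))
	}
}
```

The two slices are the text between each pair of parentheses, which matches the `Tokens` recorded in the two `BadNode` values.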

## How méméfish performs error recovery

This section explains how méméfish performs error recovery.

In méméfish, a *recovery point* is set whenever the parser starts a construct that can produce one of several AST node types.
For example, when parsing a parenthesized expression, the recovery point is set just after the open parenthesis `(`.
If an error occurs inside the parenthesized expression, the parser backtracks to the recovery point and skips tokens until the parenthesized expression ends.
The skipped tokens are then collected into an `ast.BadNode`, and this node is wrapped in a specific `ast.BadXXX` node (e.g., `ast.BadExpr`).

```sql
SELECT (1 + 2 *)
               ^--- error point
        ^---------- recovery point
        |~~~~~| --- skipped tokens
```
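The backtrack-and-skip step can be illustrated with a toy parser. This is only a sketch under simplifying assumptions, not méméfish's implementation: tokens are plain strings, operands are integer literals, operators have no precedence, and the names `parser`, `parseParen`, and `bad` are hypothetical (`bad` plays the role of `ast.BadExpr`).

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// node is a toy AST node; the real parser works with ast.Expr values.
type node interface{ String() string }

type lit struct{ v string }
type bad struct{ tokens []string } // holds the skipped tokens
type binary struct {
	op          string
	left, right node
}

func (l lit) String() string { return l.v }
func (b bad) String() string { return "<bad: " + strings.Join(b.tokens, " ") + ">" }
func (b binary) String() string {
	return "(" + b.left.String() + " " + b.op + " " + b.right.String() + ")"
}

type parser struct {
	toks []string
	pos  int
}

// peek returns the current token, or "" at the end of input.
func (p *parser) peek() string {
	if p.pos < len(p.toks) {
		return p.toks[p.pos]
	}
	return ""
}

// parseOperand accepts only integer literals in this toy grammar.
func (p *parser) parseOperand() (node, error) {
	t := p.peek()
	if t == "" || strings.Trim(t, "0123456789") != "" {
		return nil, errors.New("unexpected token: " + t)
	}
	p.pos++
	return lit{t}, nil
}

// parseExpr parses operand (op operand)* without precedence, for brevity.
func (p *parser) parseExpr() (node, error) {
	left, err := p.parseOperand()
	if err != nil {
		return nil, err
	}
	for p.peek() == "+" || p.peek() == "*" {
		op := p.peek()
		p.pos++
		right, err := p.parseOperand()
		if err != nil {
			return nil, err
		}
		left = binary{op, left, right}
	}
	return left, nil
}

// parseParen is called after "(" has been consumed.
// On failure it returns a bad node together with the error.
func (p *parser) parseParen() (node, error) {
	recovery := p.pos // recovery point: just after "("
	expr, err := p.parseExpr()
	if err == nil && p.peek() == ")" {
		p.pos++
		return expr, nil
	}
	if err == nil {
		err = errors.New("unexpected token: " + p.peek())
	}
	// Backtrack, then skip to the ")" that closes this expression.
	p.pos = recovery
	depth := 0
	for p.pos < len(p.toks) {
		t := p.toks[p.pos]
		if t == ")" && depth == 0 {
			break
		}
		if t == "(" {
			depth++
		} else if t == ")" {
			depth--
		}
		p.pos++
	}
	skipped := p.toks[recovery:p.pos]
	if p.peek() == ")" {
		p.pos++ // consume the closing parenthesis
	}
	return bad{skipped}, err
}

func main() {
	// Tokens of "(1 + 2 *)" with the opening "(" already consumed.
	p := &parser{toks: []string{"1", "+", "2", "*", ")"}}
	n, err := p.parseParen()
	fmt.Println(n, "/", err)
}
```

The caller receives both a node and an error, mirroring the behavior described at the top of this page: the node is usable, and the error explains which part of it is a placeholder.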

Recovery points are set at:

- the beginning of statements, queries, DDLs, and DMLs,
- the beginning of expressions (e.g., after an open parenthesis `(`, `SELECT`, `WHERE`, etc.), and
- the beginning of types.

Token skipping is performed as follows.

- For `ast.Statement`, `ast.DDL`, and `ast.DML`,
  * skip tokens until a semicolon `;` appears.
- For `ast.QueryExpr`,
  * skip tokens until a semicolon `;` appears, or
  * skip tokens while counting the nesting of parentheses `(`
    + until the closing symbol (`)`) appears outside any nesting, or
    + until a symbol that is expected to end the expression (`UNION`, `INTERSECT`, `EXCEPT`) appears outside any nesting.
- For `ast.Expr`,
  * skip tokens until a semicolon `;` appears, or
  * skip tokens while counting the nesting of parentheses `(`, brackets `[`, `CASE`, and `WHEN`
    + until a closing symbol (`)`, `]`, `END`, `THEN`) appears outside any nesting, or
    + until a symbol that is expected to end the expression (`,`, `AS`, `FROM`, `GROUP`, `HAVING`, `ORDER`, `LIMIT`, `OFFSET`, `AT`, `UNION`, `INTERSECT`, `EXCEPT`) appears outside any nesting.
- For `ast.Type`,
  * skip tokens until a semicolon `;` or a closing parenthesis `)` appears, or
  * skip tokens while counting the nesting of angle brackets `<`
    + until the closing symbol (`>`) appears outside any nesting.

Note that these skipping rules are just heuristics and may not be perfect.
In some cases, the parser may skip too many tokens.
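As an illustration, the `ast.Expr` rule can be sketched as a single loop over tokens. This is one reading of the rule, not the library's code: tokens are plain strings, a single depth counter is shared by all bracket kinds, and a semicolon stops the skip at any depth. The function name `skipExpr` and the three token sets are hypothetical; their contents are copied from the list.

```go
package main

import "fmt"

// Token sets copied from the ast.Expr rule; the variable names are hypothetical.
var (
	openers = map[string]bool{"(": true, "[": true, "CASE": true, "WHEN": true}
	closers = map[string]bool{")": true, "]": true, "END": true, "THEN": true}
	enders  = map[string]bool{
		",": true, "AS": true, "FROM": true, "GROUP": true, "HAVING": true,
		"ORDER": true, "LIMIT": true, "OFFSET": true, "AT": true,
		"UNION": true, "INTERSECT": true, "EXCEPT": true,
	}
)

// skipExpr returns the index of the first token at or after start that is
// not skipped, or len(toks) if every remaining token is skipped.
func skipExpr(toks []string, start int) int {
	depth := 0
	for i := start; i < len(toks); i++ {
		t := toks[i]
		switch {
		case t == ";":
			return i
		case openers[t]:
			depth++
		case closers[t]:
			if depth == 0 {
				return i // closing symbol outside any nesting
			}
			depth--
		case enders[t] && depth == 0:
			return i // end-of-expression symbol outside any nesting
		}
	}
	return len(toks)
}

func main() {
	toks := []string{"1", "+", "(", "2", ",", "3", ")", "*", "FROM", "t"}
	end := skipExpr(toks, 0)
	fmt.Println("skipped:", toks[:end])
}
```

In this input the comma inside the parentheses does not stop the skip because it is nested, while `FROM` does because it appears outside any nesting. The sketch also makes the "too many tokens" caveat concrete: if the closing parenthesis were missing, the loop would run past `FROM` to the end of the input.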
4 changes: 0 additions & 4 deletions docs/content/example-parse/_index.md
@@ -6,10 +6,6 @@ weight: 1

This example shows how to parse a Spanner SQL and unparse it.

-<!--more-->
-
-## Code
-
```go
package main

2 changes: 2 additions & 0 deletions docs/hugo.toml
@@ -21,6 +21,8 @@ summaryLength = 30
startLevel = 2
endLevel = 6
ordered = false
+[markup.highlight]
+style = 'catppuccin-frappe'

[params]
description = "Spanner SQL parser for Go"
7 changes: 5 additions & 2 deletions tools/parse/main.go
@@ -15,7 +15,7 @@ import (
"github.com/cloudspannerecosystem/memefish/ast"
"github.com/cloudspannerecosystem/memefish/token"
"github.com/cloudspannerecosystem/memefish/tools/util/poslang"
-"github.com/k0kubun/pp"
+"github.com/k0kubun/pp/v3"
)

var usage = heredoc.Doc(`
@@ -121,8 +121,11 @@ func main() {
}

fmt.Println("--- AST")
-_, _ = pp.Println(node)
+pprinter := pp.New()
+pprinter.SetOmitEmpty(true)
+_, _ = pprinter.Println(node)
fmt.Println()

fmt.Println("--- SQL")
fmt.Println(node.SQL())
