Add the doc about error recovering

cloudspannerecosystem · Dec 20, 2024 · 8dbc8c8 · 8dbc8c8
1 parent 2c0d969
commit 8dbc8c8
Show file tree

Hide file tree

Showing 4 changed files with 148 additions and 6 deletions.
diff --git a/docs/content/error-recover/_index.md b/docs/content/error-recover/_index.md
@@ -0,0 +1,141 @@
+---
+date: 2024-12-20 00:00:00 +0900
+title: "Error recovering"
+weight: 2
+---
+
+Since v0.1.0, `memefish.ParseXXX` methods returns AST node(s) even if an error is reproted.
+That is, if we try to parse incomplete SQL such as:
+
+```sql
+SELECT (1 +) + (* 2)
+```
+
+Then, the following two errors are reported:
+
+```sql
+syntax error: :1:12: unexpected token: )
+
+  1:  SELECT (1 +) + (* 2)
+                 ^
+
+
+syntax error: :1:17: unexpected token: *
+
+  1:  SELECT (1 +) + (* 2)
+                      ^
+```
+
+Hoever, the AST is also returned:
+
+```go {hl_lines=["10-31","36-57"]}
+&ast.QueryStatement{
+  Query: &ast.Select{
+    Results: []ast.SelectItem{
+      &ast.ExprSelectItem{
+        Expr: &ast.BinaryExpr{
+          Op:   "+",
+          Left: &ast.ParenExpr{
+            Lparen: 7,
+            Rparen: 11,
+            Expr:   &ast.BadExpr{
+              BadNode: &ast.BadNode{
+                NodePos: 8,
+                NodeEnd: 11,
+                Tokens:  []*token.Token{
+                  &token.Token{
+                    Kind: "<int>",
+                    Raw:  "1",
+                    Base: 10,
+                    Pos:  8,
+                    End:  9,
+                  },
+                  &token.Token{
+                    Kind:  "+",
+                    Space: " ",
+                    Raw:   "+",
+                    Pos:   10,
+                    End:   11,
+                  },
+                },
+              },
+            },
+          },
+          Right: &ast.ParenExpr{
+            Lparen: 15,
+            Rparen: 19,
+            Expr:   &ast.BadExpr{
+              BadNode: &ast.BadNode{
+                NodePos: 16,
+                NodeEnd: 19,
+                Tokens:  []*token.Token{
+                  &token.Token{
+                    Kind: "*",
+                    Raw:  "*",
+                    Pos:  16,
+                    End:  17,
+                  },
+                  &token.Token{
+                    Kind:  "<int>",
+                    Space: " ",
+                    Raw:   "2",
+                    Base:  10,
+                    Pos:   18,
+                    End:   19,
+                  },
+                },
+              },
+            },
+          },
+        },
+      },
+    },
+  },
+}
+```
+
+Thus, the places where the error occurred are filled with the `ast.BadXXX` nodes (`ast.BadExpr` in this example).
+
+## How méméfish performs error recovery
+
+This section explains how méméfish performs error recovery.
+
+In méméfish, a *recovery point* is set when parsing a syntax where some multiple types of AST nodes are the result.
+For example, when parsing an parenthesized expression, the recovery point is set after the open parenthesis `(`.
+If an error occurs in the parenthesized expression, the parser backtracks to the recovery point and skips the tokens until the parenthesized expression ends.
+The skipped tokens are then collectively `ast.BadNode` and this node is wrapped up a specific `ast.BadXXX` node (e.g., `ast.BadExpr`).
+
+```sql
+SELECT (1 + 2 *)
+               ^--- error point
+        ^---------- recovery point
+        |~~~~~| --- skipped tokens
+```
+
+Recovery points are set where:
+
+- the beginning of statements, queries, DDLs, DMLs,
+- the beginning of expressions (e.g., after an open parenthesis `(`, `SELECT`, `WHERE` etc.), and
+- the beginning of types.
+
+Token skipping is performed as follows.
+
+- For `ast.Statement`, `ast.DDL`, and `ast.DML`,
+  * skip tokens until a semicolon `;` appears.
+- For `ast.QueryExpr`,
+  * skip tokens until a semicolon `;` appears, or
+  * skip tokens with counting the nest of parentheses `(`
+    + until the closing symbol (`)`) appears at no nestings, or
+    + until the symbol that is supposed to be the end of the expression (`UNION`, `INTERSECT`, `EXCEPT`) appears at no nestings.
+- For `ast.Expr`,
+  * skip tokens until a semicolon `;` appears, or
+  * skip tokens with counting the nest of parentheses `(`, brackets `[`, `CASE` and `WHEN`
+    + until the closing symbol (`)`, `]`, `END`, `THEN`) appears at no nestings or
+    + until the symbol that is supposed to be the end of the expression (`,`, `AS`, `FROM`, `GROUP`, `HAVING`, `ORDER`, `LIMIT`, `OFFSET`, `AT`, `UNION`, `INTERSECT`, `EXCEPT`) appears at no nestings.
+- For `ast.Type`,
+  * skip tokens until the semicolon `;` or the closing parenthesis `)` appears, or
+  * skip tokens with counting the nest of triangle brackets `<`
+  * until the closing symbol (`>`) appears at no nestings.
+
+Note that this skipping rules are just heuristics and may not be perfect.
+In some cases, there is a possibility of skipping too many tokens.
diff --git a/docs/content/example-parse/_index.md b/docs/content/example-parse/_index.md
@@ -6,10 +6,6 @@ weight: 1
 
 This example shows how to parse a Spanner SQL and unparse it.
 
- <!--more--> 
-
- ## Code
-
 ```go
 package main
 

diff --git a/docs/hugo.toml b/docs/hugo.toml
@@ -21,6 +21,8 @@ summaryLength = 30
     startLevel = 2
     endLevel = 6
     ordered = false
+  [markup.highlight]
+    style = 'catppuccin-frappe'
 
 [params]
   description = "Spanner SQL parser for Go"

diff --git a/tools/parse/main.go b/tools/parse/main.go
@@ -15,7 +15,7 @@ import (
 	"github.com/cloudspannerecosystem/memefish/ast"
 	"github.com/cloudspannerecosystem/memefish/token"
 	"github.com/cloudspannerecosystem/memefish/tools/util/poslang"
-	"github.com/k0kubun/pp"
+	"github.com/k0kubun/pp/v3"
 )
 
 var usage = heredoc.Doc(`
@@ -121,8 +121,11 @@ func main() {
 	}
 
 	fmt.Println("--- AST")
-	_, _ = pp.Println(node)
+	pprinter := pp.New()
+	pprinter.SetOmitEmpty(true)
+	_, _ = pprinter.Println(node)
 	fmt.Println()
+
 	fmt.Println("--- SQL")
 	fmt.Println(node.SQL())
-Original file line number
+Diff line change
@@ Expand Up / @@ -6,10 +6,6 @@ weight: 1 @@
     This example shows how to parse a Spanner SQL and unparse it.
-     <!--more-->
-     ## Code
     ```go
     package main
@@ Expand Down @@