feat: MySQL data source supports splitting SQL text by recognizing custom delimiters and BEGIN...END statement blocks #2636

Merged 23 commits on Oct 30, 2024
Changes from 10 commits
Commits
100e845
feat: delimiter can identifying delimiter definitions syntax and deli…
winfredLIN Jul 4, 2024
63d9001
feat: block can match begin and end of some sql block
winfredLIN Jul 4, 2024
a02896e
feat: splitter can split sql text into sqls
winfredLIN Jul 4, 2024
ffad400
test: unit test for splitter
winfredLIN Jul 4, 2024
4988a79
modify: use splitter instead of origin parser to parse sql text
winfredLIN Jul 4, 2024
b977faa
ci: no lint this error check
winfredLIN Jul 4, 2024
f2aca99
ci: no error check of this method use
winfredLIN Jul 4, 2024
65907ff
chore: update dependency of parser
winfredLIN Sep 27, 2024
d93809e
rename: matcheDelimiterCommand to matchedDelimiterCommand
winfredLIN Sep 27, 2024
737ded1
modify: handle error while using reset method
winfredLIN Sep 27, 2024
0e0ea93
modify: handle error return by Parse()
winfredLIN Sep 29, 2024
3543efc
modify: the logic of judging delimiter
winfredLIN Sep 29, 2024
2a7f4c1
modify: optimize get and set delimiter value
winfredLIN Sep 29, 2024
5ad36b1
test: modify some test case that can not execute in fact
winfredLIN Oct 8, 2024
1ab78e0
test: exchange expected and actual results
winfredLIN Oct 8, 2024
ebb4809
modify: end pos is after the end of delimiter
winfredLIN Oct 8, 2024
19a9ed6
modify: jumping out of the beginning End statement block
winfredLIN Oct 8, 2024
5c85cef
modify: use parse one statement instead
winfredLIN Oct 8, 2024
eb80ec2
rename: matche -> match
winfredLIN Oct 8, 2024
6e5d77d
modify: correct words and resolve error
winfredLIN Oct 10, 2024
18133dc
modify: process sql totally in splitSqlText
winfredLIN Oct 10, 2024
5ef7b15
modify: splitSqlText will not return delimiter command so modify unit…
winfredLIN Oct 10, 2024
e14f526
modify: cancel use of formatted sql and format sql in origin sql
winfredLIN Oct 29, 2024
2 changes: 1 addition & 1 deletion go.mod
@@ -185,7 +185,7 @@ replace (
cloud.google.com/go/compute/metadata => cloud.google.com/go/compute/metadata v0.1.0
github.com/labstack/echo/v4 => github.com/labstack/echo/v4 v4.6.1
github.com/pingcap/log => github.com/pingcap/log v0.0.0-20191012051959-b742a5d432e9
- github.com/pingcap/parser => github.com/sjjian/parser v0.0.0-20240305095250-688ad439ef31
+ github.com/pingcap/parser => github.com/sjjian/parser v0.0.0-20240704052347-b6199b7bccae
github.com/swaggo/swag => github.com/swaggo/swag v1.6.7
google.golang.org/grpc => google.golang.org/grpc v1.29.0
)
4 changes: 2 additions & 2 deletions go.sum
@@ -795,8 +795,8 @@ github.com/sirupsen/logrus v1.6.0/go.mod h1:7uNnSEd1DgxDLC74fIahvMZmmYsHGZGEOFrf
github.com/sirupsen/logrus v1.7.0/go.mod h1:yWOB1SBYBC5VeMP7gHvWumXLIWorT60ONWic61uBYv0=
github.com/sirupsen/logrus v1.9.0 h1:trlNQbNUG3OdDrDil03MCb1H2o9nJ1x4/5LYw7byDE0=
github.com/sirupsen/logrus v1.9.0/go.mod h1:naHLuLoDiP4jHNo9R0sCBMtWGeIprob74mVsIT4qYEQ=
- github.com/sjjian/parser v0.0.0-20240305095250-688ad439ef31 h1:E3JSX1FjUlg8ep8XQlLkdTFbZhM4tmecpdfZhUuubLs=
- github.com/sjjian/parser v0.0.0-20240305095250-688ad439ef31/go.mod h1:Qq2tnreUXwVo7NAKAHmbWFsgqpDUkxwhJCClY+ZCudA=
+ github.com/sjjian/parser v0.0.0-20240704052347-b6199b7bccae h1:rTwogc7Uq0/7zMHNhpQTqjtGKdccVh8Z8vHReIlsXG0=
+ github.com/sjjian/parser v0.0.0-20240704052347-b6199b7bccae/go.mod h1:Qq2tnreUXwVo7NAKAHmbWFsgqpDUkxwhJCClY+ZCudA=
github.com/skeema/knownhosts v1.2.0 h1:h9r9cf0+u7wSE+M183ZtMGgOJKiL96brpaz5ekfJCpM=
github.com/skeema/knownhosts v1.2.0/go.mod h1:g4fPeYpque7P0xefxtGzV81ihjC8sX2IqpAoNkjxbMo=
github.com/smartystreets/assertions v0.0.0-20180927180507-b2de0cb4f26d/go.mod h1:OnSkiWE9lh6wB0YB77sQom3nweQdgAjqCqsofrRNTgc=
80 changes: 80 additions & 0 deletions sqle/driver/mysql/splitter/block.go
@@ -0,0 +1,80 @@
package splitter

import (
"github.com/pingcap/parser"
"strings"
)

type Block interface {
MatchBegin(token *parser.Token) bool
MatchEnd(token *parser.Token) bool
}

var allBlocks []Block = []Block{
BeginEndBlock{},
IfEndIfBlock{},
CaseEndCaseBlock{},
RepeatEndRepeatBlock{},
WhileEndWhileBlock{},
LoopEndLoopBlock{},
}

type BeginEndBlock struct{}

func (b BeginEndBlock) MatchBegin(token *parser.Token) bool {
return token.TokenType() == parser.Begin
}

func (b BeginEndBlock) MatchEnd(token *parser.Token) bool {
return true
}

type IfEndIfBlock struct{}

func (b IfEndIfBlock) MatchBegin(token *parser.Token) bool {
return token.TokenType() == parser.IfKwd
}

func (b IfEndIfBlock) MatchEnd(token *parser.Token) bool {
return token.TokenType() == parser.IfKwd
}

type CaseEndCaseBlock struct{}

func (b CaseEndCaseBlock) MatchBegin(token *parser.Token) bool {
return token.TokenType() == parser.CaseKwd
}

func (b CaseEndCaseBlock) MatchEnd(token *parser.Token) bool {
return token.TokenType() == parser.CaseKwd
}

type RepeatEndRepeatBlock struct{}

func (b RepeatEndRepeatBlock) MatchBegin(token *parser.Token) bool {
return token.TokenType() == parser.Repeat
}

func (b RepeatEndRepeatBlock) MatchEnd(token *parser.Token) bool {
return token.TokenType() == parser.Repeat
}

type WhileEndWhileBlock struct{}

func (b WhileEndWhileBlock) MatchBegin(token *parser.Token) bool {
return token.TokenType() == parser.Identifier && strings.ToUpper(token.Ident()) == "WHILE"
}

func (b WhileEndWhileBlock) MatchEnd(token *parser.Token) bool {
return token.TokenType() == parser.Identifier && strings.ToUpper(token.Ident()) == "WHILE"
}

type LoopEndLoopBlock struct{}

func (b LoopEndLoopBlock) MatchBegin(token *parser.Token) bool {
return token.TokenType() == parser.Identifier && strings.ToUpper(token.Ident()) == "LOOP"
}

func (b LoopEndLoopBlock) MatchEnd(token *parser.Token) bool {
return token.TokenType() == parser.Identifier && strings.ToUpper(token.Ident()) == "LOOP"
}
138 changes: 138 additions & 0 deletions sqle/driver/mysql/splitter/delimiter.go
@@ -0,0 +1,138 @@
package splitter

import (
"errors"
"github.com/pingcap/parser"
"strings"
)

const (
BackSlash int = '\\'
BackSlashString string = "\\"
BlankSpace string = " "
DefaultDelimiterString string = ";"
DelimiterCommand string = "DELIMITER"
DelimiterCommandSort string = `\d`
)

type Delimiter struct {
FirstTokenTypeOfDelimiter int
FirstTokenValueOfDelimiter string
DelimiterStr string
line int
startPos int
}

func NewDelimiter() *Delimiter {
return &Delimiter{}
}

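// `\\d` is scanned as three tokens (`\`, `\`, `d`). Lex cannot be used here because it may skip spaces and comments, so plain string matching is used instead.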
func (d *Delimiter) isSortDelimiterCommand(sql string, index int) bool {
return index+2 < len(sql) && sql[index+1] == 'd'
}

// DELIMITER is scanned as an identifier, so it is enough to compare its value.
func (d *Delimiter) isDelimiterCommand(token string) bool {
return strings.ToUpper(token) == DelimiterCommand
}

// This function is translated from the MySQL client code that reads the delimiter value; see: https://github.com/mysql/mysql-server/blob/824e2b4064053f7daf17d7f3f84b7a3ed92e5fb4/client/mysql.cc#L4866
func getDelimiter(line string) string {
ptr := 0
start := 0
quoted := false
qtype := byte(0)

// Skip leading whitespace.
for ptr < len(line) && isSpace(line[ptr]) {
ptr++
}

if ptr == len(line) {
return ""
}

// Check whether the delimiter is a quoted string.
if line[ptr] == '\'' || line[ptr] == '"' || line[ptr] == '`' {
qtype = line[ptr]
quoted = true
ptr++
}

start = ptr

// Find the end of the delimiter string.
for ptr < len(line) {
if !quoted && line[ptr] == '\\' && ptr+1 < len(line) { // skip escaped characters
ptr += 2
} else if (!quoted && isSpace(line[ptr])) || (quoted && line[ptr] == qtype) {
break
} else {
ptr++
}
}

return line[start:ptr]
}

// isSpace is a helper that reports whether the character is whitespace.
func isSpace(c byte) bool {
return c == ' ' || c == '\t' || c == '\n' || c == '\r'
}

var ErrDelimiterCanNotExtractToken = errors.New("sorry, we cannot extract any token from the delimiter you provided, please choose a different delimiter")
var ErrDelimiterContainsBackslash = errors.New("DELIMITER cannot contain a backslash character")
var ErrDelimiterContainsBlankSpace = errors.New("DELIMITER should not contain blank space")
var ErrDelimiterMissing = errors.New("DELIMITER must be followed by a 'delimiter' character or string")
var ErrDelimiterReservedKeyword = errors.New("delimiter should not use a reserved keyword")

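/*
setDelimiter sets the delimiter and places some restrictions on its content:

1. The delimiter must not contain a backslash.
2. The delimiter must not be an empty string.
3. The delimiter must not be a MySQL reserved word, because the scanner would then scan it as a different token type and bypass the delimiter check.

Note: restrictions 1 and 2 match the MySQL client's handling of delimiter content; the error messages follow the com_delimiter function in the MySQL client source:
https://github.com/mysql/mysql-server/blob/824e2b4064053f7daf17d7f3f84b7a3ed92e5fb4/client/mysql.cc#L4621
*/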
func (d *Delimiter) setDelimiter(delimiter string) (err error) {
if delimiter == "" {
return ErrDelimiterMissing
}
if strings.Contains(delimiter, BackSlashString) {
return ErrDelimiterContainsBackslash
}
if strings.Contains(delimiter, BlankSpace) {
return ErrDelimiterContainsBlankSpace
}
if isReservedKeyWord(delimiter) {
return ErrDelimiterReservedKeyword
}
token := parser.NewScanner(delimiter).NextToken()
d.FirstTokenTypeOfDelimiter = token.TokenType()
if d.FirstTokenTypeOfDelimiter == 0 {
return ErrDelimiterCanNotExtractToken
}
d.FirstTokenValueOfDelimiter = token.Ident()
d.DelimiterStr = delimiter
return nil
}

func isReservedKeyWord(input string) bool {
token := parser.NewScanner(input).NextToken()
tokenType := token.TokenType()
if len(token.Ident()) < len(input) {
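// If the delimiter cannot be scanned as a single token, it cannot be a keyword.
return false
}
// If the delimiter is scanned as a keyword but it is not known which one, it is an identifier and therefore not a reserved word.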
return tokenType != parser.Identifier && tokenType > parser.YyEOFCode && tokenType < parser.YyDefault
}

func (d *Delimiter) reset() error {
d.line = 0
d.startPos = 0
return d.setDelimiter(DefaultDelimiterString)
}