Add Typst parser #718

Open
wants to merge 231 commits into
base: master
6c615d1
base commit
uben0 Jul 31, 2023
ac629ea
add content block
uben0 Aug 1, 2023
be234b2
code factorisation
uben0 Aug 1, 2023
662558c
add named tuple, list and test for calls
uben0 Aug 1, 2023
57f7b0b
if else
uben0 Aug 1, 2023
7227761
strong mode
uben0 Aug 1, 2023
8f4822a
emph mode
uben0 Aug 1, 2023
a94f059
init to any
uben0 Aug 1, 2023
3b4ed45
code factorisation
uben0 Aug 1, 2023
a398ae6
code factorisation
uben0 Aug 1, 2023
f2a1262
statements
uben0 Aug 2, 2023
dd0d06f
blocks
uben0 Aug 2, 2023
a84ba8c
add numbers
uben0 Aug 2, 2023
d034c3d
add dot notation
uben0 Aug 2, 2023
4ad9895
add dot notation
uben0 Aug 2, 2023
daa3340
fixed strong/emph modes
uben0 Aug 2, 2023
8bb7e3e
code factorisation, better par_break, escape char
uben0 Aug 3, 2023
ae056e0
comment, string, import
uben0 Aug 3, 2023
8d164a1
fix list
uben0 Aug 3, 2023
b6d50c6
fix white space in list
uben0 Aug 3, 2023
4847969
add sub plus unary
uben0 Aug 3, 2023
265edb5
add operators
uben0 Aug 3, 2023
41cfb98
multiline vs singleline
uben0 Aug 5, 2023
d6e505a
better white space management
uben0 Aug 5, 2023
bdd3ba2
fix termination item vs instr
uben0 Aug 6, 2023
d23bb46
fix instr space termination
uben0 Aug 6, 2023
0772000
fix comment as extra
uben0 Aug 6, 2023
38e887f
big jump, comments postponed, raw block
uben0 Aug 10, 2023
40100dc
math fence, builtin
uben0 Aug 10, 2023
5b607db
save before _element refactor
uben0 Aug 11, 2023
36fc3d5
preparing for publication
uben0 Aug 11, 2023
9b873e9
comment in text mode
uben0 Aug 11, 2023
b397129
show, unicode letters, builtins
uben0 Aug 11, 2023
aa198f3
starting removing of recursive rules
uben0 Aug 16, 2023
3618165
gitignore
uben0 Aug 16, 2023
1b8fe23
remove old corpus
uben0 Aug 16, 2023
cb3c715
makefile
uben0 Aug 16, 2023
dcd35ba
refactor base
uben0 Aug 16, 2023
ccc1b8e
refactor step 2
uben0 Aug 19, 2023
c24bd57
refactor step 3
uben0 Aug 19, 2023
5535372
stuck on strange token char
uben0 Aug 19, 2023
2f2370f
refactor step 4
uben0 Aug 19, 2023
a92f6c6
managed to integrate heading
uben0 Aug 19, 2023
b4e4c21
refactor done, except single line expr
uben0 Aug 19, 2023
519ecf4
need solution about expr with single line and comments
uben0 Aug 20, 2023
2fca8ae
readme update
uben0 Aug 20, 2023
60320cb
single line vs multi line
uben0 Aug 21, 2023
caf7393
comment before operators
uben0 Aug 21, 2023
e865389
assign operator added
uben0 Aug 21, 2023
a44f8b5
url and some more tests
uben0 Aug 21, 2023
7820ffd
clean up
uben0 Aug 21, 2023
fe07baa
add continue and break, plus return inlined, plus highlight for helix…
uben0 Aug 21, 2023
6646528
indentation with lists
uben0 Aug 24, 2023
3ff9c54
readme update
uben0 Aug 24, 2023
2d2fbc1
fix indent dedent issues
uben0 Aug 24, 2023
57454fd
fix indent dedent test 160 and 161 with redent
uben0 Aug 24, 2023
137184f
full proof indentation and container transition thanks to external sc…
uben0 Aug 25, 2023
1495f7b
readme update
uben0 Aug 25, 2023
ecf5523
term
uben0 Aug 25, 2023
217253e
fix false flag heading
uben0 Aug 25, 2023
af5e45d
label and ref
uben0 Aug 25, 2023
a3049b2
symbol
uben0 Aug 25, 2023
41a945d
start math
uben0 Aug 25, 2023
9eaff12
math progress
uben0 Aug 25, 2023
5fdb5cf
math progress
uben0 Aug 25, 2023
6023258
math vs item call
uben0 Aug 25, 2023
ef278c6
math update, more builtin
uben0 Aug 26, 2023
1bf47a2
identifying remaining problems
uben0 Aug 27, 2023
6c91ccc
add license MIT
uben0 Aug 27, 2023
6f6b307
math, presentation for public visibility
uben0 Aug 27, 2023
4c2b319
readme update
uben0 Aug 27, 2023
a116e28
white space operator strategy
uben0 Aug 27, 2023
3990cbd
white space operator strategy update
uben0 Aug 27, 2023
5b55c28
white space operator fixes not_in op
uben0 Aug 27, 2023
34fa51f
math spaces, plus import precedence
uben0 Aug 28, 2023
29ea450
math ident without _
uben0 Aug 28, 2023
e6e982e
fix url with proper tokenization
uben0 Aug 29, 2023
b74d1e2
discover bug with heading and list
uben0 Aug 29, 2023
80d8d79
fixing line starting markup inside markup
uben0 Aug 30, 2023
4f986cf
strictier content rule
uben0 Aug 30, 2023
dfd2c4a
fix indentation at markup termination
uben0 Aug 31, 2023
35117a7
better Makefile
uben0 Aug 31, 2023
4c175e9
unicode database
uben0 Aug 31, 2023
6c201a5
all unicode white spaces
uben0 Aug 31, 2023
dec7ccf
fix anti markup and math symbols
uben0 Aug 31, 2023
d4052c6
fix unrecognized leading space inside markup
uben0 Sep 1, 2023
b4e4794
math lacks comas and semi colon syntax
uben0 Sep 1, 2023
59ceda6
math comas and semi colons plus tagged parameter
uben0 Sep 2, 2023
3b14de2
rename nodes, math prime, more builtin, more tests
uben0 Sep 3, 2023
7a18794
doc for helix editor installation
uben0 Sep 3, 2023
697123d
doc for helix editor installation
uben0 Sep 3, 2023
f504100
bug with field operator, should accept spaces
uben0 Sep 3, 2023
48a59bc
start better space management
uben0 Sep 4, 2023
dc9cc83
start better space management
uben0 Sep 4, 2023
11da32e
dot notation with WS in expr context
uben0 Sep 4, 2023
dd3aa5a
atom
uben0 Sep 4, 2023
bc0c9e1
document bug with spaces in method
uben0 Sep 4, 2023
94487d3
readme update
uben0 Sep 5, 2023
a1b1166
measuring parser size
uben0 Sep 5, 2023
5c73f9b
20% parser size reduction
uben0 Sep 6, 2023
1df5247
scanner cleaner code
uben0 Sep 6, 2023
12f3d8b
tiny parser size
uben0 Sep 6, 2023
7d32ae3
-60% parser size, mandatory code end
uben0 Sep 6, 2023
d391a04
readme update
uben0 Sep 6, 2023
25f9922
math ident in external scanner
uben0 Sep 7, 2023
bfe4387
fix spaces and comment in dot notation
uben0 Sep 7, 2023
926a400
fix bug with dot token plus optimization of blocks
uben0 Sep 7, 2023
a56583e
fix segfault, found bug with token else
uben0 Sep 7, 2023
38c810c
added raw spans and blocks as literals
uben0 Sep 7, 2023
bcfed94
add builtins
uben0 Sep 7, 2023
a6ff802
add builtins
uben0 Sep 7, 2023
1eaac4f
add builtins and parser size
uben0 Sep 7, 2023
1e85801
fix builtin precedence
uben0 Sep 7, 2023
09c644f
fix builtin full word
uben0 Sep 7, 2023
e59042c
start better else detection
uben0 Sep 8, 2023
86072dd
step in better else detection
uben0 Sep 9, 2023
dfedcee
fix ws prec problem with branch
uben0 Sep 9, 2023
bec1c20
fix else detection
uben0 Sep 9, 2023
21ae68f
readme update and found missing feature
uben0 Sep 9, 2023
49e0da7
fix set statement condition
uben0 Sep 9, 2023
bb9186f
test
uben0 Sep 9, 2023
dbeacf2
fix import precedence
uben0 Sep 9, 2023
e1b620c
more test, plus number unit
uben0 Sep 9, 2023
33291f7
more tests, rename, cleaner code
uben0 Sep 9, 2023
c4e23f7
fix math shorthand precedence
uben0 Sep 9, 2023
a4ba8d4
math rename and removal of inconsistant operators
uben0 Sep 9, 2023
e572633
cleaner code
uben0 Sep 9, 2023
4a656b2
cleaner code
uben0 Sep 9, 2023
9c38731
readme update
uben0 Sep 9, 2023
e020627
cleaner code
uben0 Sep 10, 2023
c6a23cf
fix math group termination
uben0 Sep 10, 2023
e1746a2
fix single line comment absorbing line break
uben0 Sep 10, 2023
9a365cd
more tests, found indentation bug
uben0 Sep 10, 2023
9e8aa7e
preparing indentation machanism change
uben0 Sep 10, 2023
7686406
preparing indentation machanism change
uben0 Sep 10, 2023
ef3b80e
fix indentation
uben0 Sep 10, 2023
cd7731f
readme update
uben0 Sep 10, 2023
6a97e5f
more tests, small edits
uben0 Sep 10, 2023
43ad964
readme update
uben0 Sep 10, 2023
9c52e75
highlight update
uben0 Sep 10, 2023
bb86a09
fix bad else detection when follow by comment
uben0 Sep 11, 2023
71ef3e7
readme update
uben0 Sep 12, 2023
576e8b5
readme update
uben0 Sep 12, 2023
c07fd17
fix math item, tweak highlight, readme update
uben0 Sep 12, 2023
615572e
comment parsed in external scanner
uben0 Sep 13, 2023
302f124
space parsed in external scanner
uben0 Sep 13, 2023
3076c9d
working extras
uben0 Sep 14, 2023
c19be8e
starting to remove explicit space and comment
uben0 Sep 14, 2023
b46a0a0
starting to remove line break as well
uben0 Sep 14, 2023
8aa1e6f
extras's upgrade finished
uben0 Sep 14, 2023
0f0c533
optimization and cleaner code
uben0 Sep 15, 2023
91f17c4
huge optimization
uben0 Sep 15, 2023
0d892a9
cleaner code
uben0 Sep 15, 2023
142b2d6
math node formula, highlight
uben0 Sep 16, 2023
1b1b2cf
cleaner code
uben0 Sep 16, 2023
367e8d2
cleaner code, fix if-else end bug
uben0 Sep 17, 2023
0a61618
add wildcard in import
uben0 Sep 17, 2023
da204a9
fix math fraction, fix math as value
uben0 Sep 17, 2023
b8f3ac3
highlight update
uben0 Sep 17, 2023
82a95c2
fix url accepting space
uben0 Sep 18, 2023
d3e7991
added generated source for external importations
uben0 Sep 18, 2023
ea9a2aa
fix bad named grammar
uben0 Sep 18, 2023
b9c9ce5
remove dead code
uben0 Sep 18, 2023
24933b0
readme update and doc
uben0 Sep 18, 2023
95161e7
readme update and doc
uben0 Sep 18, 2023
7a2bb24
found segfault
uben0 Sep 18, 2023
a942b91
fix missing builtin plus better highlight of item and term
uben0 Sep 29, 2023
8e00691
added heading marker level distinction
uben0 Sep 29, 2023
de668c7
added builtin colbreak
uben0 Oct 1, 2023
791cac4
added builtin place
uben0 Oct 1, 2023
e6245d1
fixed inlined field with underscore
uben0 Oct 18, 2023
90a99de
fixed inexistant `angle` builtin
uben0 Oct 18, 2023
17cf2a8
fixed bad assert macro
uben0 Oct 20, 2023
80166f0
added dict litteral
uben0 Oct 22, 2023
e35aa22
fixed float literal
uben0 Oct 22, 2023
aca445a
fix: ponctuation -> punctuation
AlexanderBrevig Oct 24, 2023
3896773
fix installation instructions
uben0 Oct 31, 2023
35221b6
fix installation instructions
uben0 Oct 31, 2023
9cb47db
fix installation instructions
uben0 Oct 31, 2023
b7a745a
Merge pull request #4 from AlexanderBrevig/patch-1
uben0 Oct 31, 2023
8be2b20
fix installation instructions
uben0 Oct 31, 2023
da13e4f
fix missing cvs builtin and typst 0.9 with oklab builtin
uben0 Nov 3, 2023
a96fa35
readme update
uben0 Nov 4, 2023
60843a0
fix comment in raw
uben0 Nov 5, 2023
e1f00fb
more tests, brackets in math
uben0 Nov 6, 2023
020e687
fix label colon
uben0 Nov 7, 2023
5d698b6
doc(README): add Emacs support
Ziqi-Yang Nov 7, 2023
264fefb
heading distinction
uben0 Nov 11, 2023
2ece57a
readme update
uben0 Nov 12, 2023
e33ad5e
helix config pointing at latest revision on master
uben0 Nov 12, 2023
6099c23
add builtin pad
uben0 Nov 12, 2023
d372eb2
add builtin pad
uben0 Nov 12, 2023
86763f3
readme update
uben0 Nov 12, 2023
0f0abe6
fix: lable + reference syntax (TOKEN_LABEL)
Ziqi-Yang Nov 26, 2023
126f7b8
Merge pull request #13 from Ziqi-Yang/fix/label_reference
uben0 Nov 26, 2023
1651d6e
label test
uben0 Nov 26, 2023
39c5976
found invalid ref
uben0 Nov 28, 2023
c0765e3
update 0.10
uben0 Dec 23, 2023
802c9c6
fix square brackets, fix labels
uben0 Dec 23, 2023
a94d63b
readme update
uben0 Dec 27, 2023
d7e552f
foldable sections added!
uben0 Dec 30, 2023
77e2318
readme update
uben0 Dec 30, 2023
d5aca73
test name more consistant
uben0 Dec 30, 2023
e0f9ab5
fix helix config missing `roots`
uben0 Jan 3, 2024
f3b362d
section hierachy changed
uben0 Jan 5, 2024
ecf8596
fix plain text square bracket indentation
uben0 Jan 13, 2024
9ca6db5
disabled debug mode by default
uben0 Jan 23, 2024
d6e06e1
Correct cargo install command
nuke-web3 Jan 25, 2024
57068e6
fix typst-lsp installation command in readme
uben0 Jan 26, 2024
e8ff960
updated project structure for better neovim support
uben0 Jan 26, 2024
b330529
Update README.md
uben0 Feb 3, 2024
e3fd020
fix new hard termination on linebreak
uben0 Feb 3, 2024
2d68228
Merge remote-tracking branch 'refs/remotes/origin/master'
uben0 Feb 3, 2024
f70f88b
added lib.rs
uben0 Feb 5, 2024
6241b94
fix if/else on iterative parsing
uben0 Feb 6, 2024
3a6c81b
fix if/else on iterative parsing with comments
uben0 Feb 6, 2024
c757be0
Update README.md
uben0 Feb 21, 2024
244c53f
dictionary key not classified as builtin
uben0 Feb 27, 2024
baddc32
revert last fix, added tests
uben0 Feb 27, 2024
3c3e5f8
removed builtin detection
uben0 Feb 28, 2024
11e9d28
tree-sitter update, README update
uben0 Mar 19, 2024
f457c77
added context
uben0 Mar 19, 2024
67db220
remove all syscall when not in debug mode
uben0 Apr 5, 2024
1509e0f
no line comment in block comment
uben0 Apr 5, 2024
1ad9724
readme update
uben0 Apr 5, 2024
13863dd
context precedence, more tests
uben0 Apr 9, 2024
4610172
fix math shorthand
uben0 Apr 22, 2024
b19e640
fix raw block with many backticks
uben0 May 5, 2024
3924cb9
embeded code now have its own node
uben0 May 9, 2024
b442c53
Add 'vendored_parsers/tree-sitter-typst/' from commit '3924cb9ed9e0e6…
arbrauns May 17, 2024
19230a4
Add typst parser
arbrauns May 17, 2024
5 changes: 5 additions & 0 deletions build.rs
@@ -358,6 +358,11 @@ fn main() {
src_dir: "vendored_parsers/tree-sitter-typescript-src/typescript/src",
extra_files: vec!["scanner.c"],
},
TreeSitterParser {
name: "tree-sitter-typst",
src_dir: "vendored_parsers/tree-sitter-typst-src",
extra_files: vec!["scanner.c"],
},
TreeSitterParser {
name: "tree-sitter-vhdl",
src_dir: "vendored_parsers/tree-sitter-vhdl-src",
3 changes: 3 additions & 0 deletions sample_files/compare.expected
@@ -277,6 +277,9 @@ fee7ee33d2037ad1941ba6bb5532a1db -
sample_files/typing_1.ml sample_files/typing_2.ml
36161bd77a8c86643bc90656ec41c92c -

sample_files/typst_1.typ sample_files/typst_2.typ
1eb9abd1d35daaaaf12f4cc964b831ff -

sample_files/utf16_1.py sample_files/utf16_2.py
39014a682ed2318f980c7ea4177cf659 -

25 changes: 25 additions & 0 deletions sample_files/typst_1.typ
@@ -0,0 +1,25 @@
#set page(width: 10cm, height: auto)
#set heading(numbering: "1.")

= Fibonacci sequence
The Fibonacci sequence is defined through the
recurrence relation $F_n = F_(n-1) + F_(n-2)$.
It can also be expressed in _closed form:_

$ F_n = round(1 / sqrt(5) phi.alt^n), quad
phi.alt = (1 + sqrt(5)) / 2 $

#let count = 8
#let nums = range(1, count + 1)
#let fib(n) = (
if n <= 2 { 1 }
else { fib(n - 1) + fib(n - 2) }
)

The first #count numbers of the sequence are:

#align(center, table(
columns: count,
..nums.map(n => $F_#n$),
..nums.map(n => str(fib(n))),
))
25 changes: 25 additions & 0 deletions sample_files/typst_2.typ
@@ -0,0 +1,25 @@
#set page(width: 10cm, height: auto)
#set heading(numbering: "1.")

= Fibonacci sequence
The Fibonacci sequence is defined through the
recurrence relation $F_n = F_(n-1) + F_(n-2)$.
It can also be expressed in _closed form:_

$ F_n = round(1 / sqrt(5) phi.alt^n), quad
phi.alt = (1 + sqrt(5)) / 2 $

#let count = 8
#let fib(n) = (
if n <= 2 { 1 }
else { fib(n - 1) + fib(n - 2) }
)

The first #count numbers of the sequence are:

#let function_table(f, nums) = align(center, table(
columns: nums.len(),
..nums.map(n => $F_#n$),
..nums.map(n => str(f(n))),
))
#function_table(fib, range(1, count+1))
4 changes: 4 additions & 0 deletions src/parse/guess_language.rs
@@ -77,6 +77,7 @@ pub(crate) enum Language {
Toml,
TypeScript,
TypeScriptTsx,
Typst,
Vhdl,
Xml,
Yaml,
@@ -170,6 +171,7 @@ pub(crate) fn language_name(language: Language) -> &'static str {
Toml => "TOML",
TypeScript => "TypeScript",
TypeScriptTsx => "TypeScript TSX",
Typst => "Typst",
Vhdl => "VHDL",
Xml => "XML",
Yaml => "YAML",
@@ -368,6 +370,7 @@ pub(crate) fn language_globs(language: Language) -> Vec<glob::Pattern> {
],
TypeScript => &["*.ts"],
TypeScriptTsx => &["*.tsx"],
Typst => &["*.typ"],
Vhdl => &["*.vhdl", "*.vhd"],
Xml => &[
"*.ant",
@@ -540,6 +543,7 @@ fn from_emacs_mode_header(src: &str) -> Option<Language> {
"toml" => Some(Toml),
"tuareg" => Some(OCaml),
"typescript" => Some(TypeScript),
"typst" => Some(Typst),
"vhdl" => Some(Vhdl),
"yaml" => Some(Yaml),
"zig" => Some(Zig),
15 changes: 15 additions & 0 deletions src/parse/tree_sitter_parser.rs
@@ -118,6 +118,7 @@ extern "C" {
fn tree_sitter_toml() -> ts::Language;
fn tree_sitter_tsx() -> ts::Language;
fn tree_sitter_typescript() -> ts::Language;
fn tree_sitter_typst() -> ts::Language;
fn tree_sitter_vhdl() -> ts::Language;
fn tree_sitter_xml() -> ts::Language;
fn tree_sitter_yaml() -> ts::Language;
@@ -1133,6 +1134,20 @@ pub(crate) fn from_language(language: guess::Language) -> TreeSitterConfig {
sub_languages: vec![],
}
}
Typst => {
let language = unsafe { tree_sitter_typst() };
TreeSitterConfig {
language,
atom_nodes: vec!["string"].into_iter().collect(),
delimiter_tokens: vec![("{", "}"), ("(", ")"), ("[", "]")],
highlight_query: ts::Query::new(
language,
include_str!("../../vendored_parsers/highlights/typst.scm"),
)
.unwrap(),
sub_languages: vec![],
}
}
Xml => {
let language = unsafe { tree_sitter_xml() };
TreeSitterConfig {
1 change: 1 addition & 0 deletions vendored_parsers/highlights/typst.scm
1 change: 1 addition & 0 deletions vendored_parsers/tree-sitter-typst-src
8 changes: 8 additions & 0 deletions vendored_parsers/tree-sitter-typst/.gitignore
@@ -0,0 +1,8 @@
log.html
node_modules
package-lock.json
target
# src/grammar.json
# src/node-types.json
# src/tree_sitter/parser.h
# src/parser.c
26 changes: 26 additions & 0 deletions vendored_parsers/tree-sitter-typst/Cargo.toml
@@ -0,0 +1,26 @@
[package]
name = "tree-sitter-typst"
description = "typst grammar for the tree-sitter parsing library"
version = "0.0.1"
keywords = ["incremental", "parsing", "typst"]
categories = ["parsing", "text-editors"]
repository = "https://github.com/uben0/tree-sitter-typst"
edition = "2018"
license = "MIT"

build = "bindings/rust/build.rs"
include = [
"bindings/rust/*",
"grammar.js",
"queries/*",
"src/*",
]

[lib]
path = "bindings/rust/lib.rs"

[dependencies]
tree-sitter = "~0.20.10"

[build-dependencies]
cc = "1.0"
93 changes: 93 additions & 0 deletions vendored_parsers/tree-sitter-typst/DOC.md
@@ -0,0 +1,93 @@
# Implementation documentation

## Tricky Typst

List of tests exposing particular behaviors of Typst:

- Test `positive/320`: indentation takes comments into account, but a redent is done at an item, term or heading token.
- Tests `positive/328` and `positive/329`: math functions don't work with symbols.
- Test `positive/330`: a block comment doesn't need a closing delimiter.

## FIXME

- [ ] Test `fixme/011`: Embedded code causes a segfault
  - With newer versions of Tree-sitter or Helix, this bug seems to be fixed
- [X] ~Test `fixme/014`: Right square bracket exits from all containers~
- [X] ~Test `fixme/012`: Matching square brackets in text are paired~
- [X] ~Test `fixme/013`: Point ending a ref is not part of the ref~
- [X] ~Test `fixme/010`: Math shorthand and letter can applied~
- [X] ~Test `fixme/009`: Indentation and comments~
- [X] ~Test `fixme/001`: Group termination in math~
- [X] ~Test `fixme/002`: Import precedence over list~
- [X] ~Test `fixme/008`: Condition if set statement~
- [X] ~Test `fixme/007`: Trailing comments before `else`~
- [X] ~Test `fixme/003`: Spaces in method notation~
- [X] ~Test `fixme/004`: Leading space not recognized~
- [X] ~Test `fixme/005`: Inlined code absorbs new line~

Failing tests are found in [`corpus/fixme.scm`](https://github.com/uben0/tree-sitter-typst/blob/master/corpus/fixme.scm).

## Optimization with extras

While looking for ways to optimize the parser and simplify the grammar, I considered using the *extras* feature for spaces and comments (and line breaks as well). In the end, it significantly reduced the parser size. The only problem arises with function calls and, in inline code, field access: they must be directly joined (no space or comment in between). The *immediate* feature does not solve the problem, as it only takes inline regexes into account (which would be fine for spaces but not for comments, as they have to appear in the output tree).

The solution is to rely on the external scanner when parsing spaces or comments. Let's call a "pre-immediate" token a token that may be followed by an immediate token. When a pre-immediate token is parsed, it sets a flag to `true`, and when a space or comment is parsed, it resets the flag to `false` (the flag is stored in the scanner's state as a boolean).

This way, when a token has to be immediate, an external token can be required that only matches if the flag is `true`. This means any immediate token has to be preceded by a token that sets the flag to `true`.
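
The flag mechanism can be modelled in a few lines. The following is only a sketch written in Rust for illustration (the actual scanner lives in `scanner.c`); the names `Token`, `ScannerState`, `after_token` and `accepts_immediate` are hypothetical and do not come from the grammar.

```rust
/// Token kinds relevant to the flag (hypothetical names, for illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Token {
    /// A token that may be directly followed by an immediate token,
    /// for example the callee of a function call.
    PreImmediate,
    /// A space or a comment, both handled as extras.
    SpaceOrComment,
}

/// Model of the scanner state: a single boolean, as described above.
#[derive(Debug, Default)]
struct ScannerState {
    immediate: bool,
}

impl ScannerState {
    /// Update the flag once a token has been scanned.
    fn after_token(&mut self, token: Token) {
        self.immediate = match token {
            Token::PreImmediate => true,
            Token::SpaceOrComment => false,
        };
    }

    /// The external "immediate" token may only match while the flag is set.
    fn accepts_immediate(&self) -> bool {
        self.immediate
    }
}
```

With this model, `f(x)` would accept the immediate token right after `f`, while `f (x)` or `f/**/(x)` would not, because the space or comment clears the flag first.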

# Scanner

## Containers

The scanner uses a stack of "containers" as internal state in order to simplify the parsing of some nodes like `emph`, `strong` and `content`. At any moment, the scanner knows which nodes contain the current position. For instance:

```typst
* _ #[Hello *World*] _ *
```

Here, when the scanner is called at the beginning of "World", the container stack is `strong/emph/content/strong`. When a `*` is encountered while the top of the stack is `strong`, it is interpreted as the end of this container. If the top of the stack were `emph`, it would be the start of a `strong` container, which would be pushed onto the stack.

It is mandatory to use the external scanner for the containers because of indentation.
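
As a rough illustration of the rule for `*`, here is a minimal Rust model of the container stack. It is a sketch under assumptions, not the code from `scanner.c`: the names `Container`, `StarAction` and `on_star` are invented for this example, and only the `strong` case is modelled.

```rust
/// Container kinds named in this section (the real scanner has more).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Container {
    Strong,
    Emph,
    Content,
}

/// What a `*` means at the current position (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
enum StarAction {
    Open,
    Close,
}

/// Decide how to interpret a `*` from the top of the container stack.
fn on_star(stack: &mut Vec<Container>) -> StarAction {
    if stack.last() == Some(&Container::Strong) {
        // The innermost container is `strong`: this `*` terminates it.
        stack.pop();
        StarAction::Close
    } else {
        // Any other top (or an empty stack): this `*` starts a new `strong`.
        stack.push(Container::Strong);
        StarAction::Open
    }
}
```

Starting from the stack `strong/emph/content`, a first `*` opens a nested `strong` and a second one closes it again, leaving the outer containers untouched.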

## Indentation

Three distinct tokens are dedicated to indentation:

- `indent`: when the following line starts further than the current one
- `dedent`: when the following line starts at a column previously set as an indentation column
- `redent`: when the following line starts at the same column as the current one, or before it, but further than the previous indentation.

The concept of redent can be seen here:
```typst
- 1
    - 2
  - 3
- 4
```
which has the same hierarchy as:
```typst
- 1
  - 2
  - 3
- 4
```
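
The three tokens can be sketched as a classification over a stack of indentation columns. This is a simplified Rust model built only from the definitions in this section; `Indentation` and `next_line` are hypothetical names, and collapsing several closed levels into a single `Dedent` is an assumption of the sketch, not necessarily what the real scanner does.

```rust
/// The three indentation tokens described in this section.
#[derive(Debug, PartialEq, Eq)]
enum Indentation {
    Indent,
    Dedent,
    Redent,
}

/// Classify the start column of the following line.
/// `stack` holds the active indentation columns, innermost last,
/// and must contain at least the base column.
fn next_line(stack: &mut Vec<usize>, col: usize) -> Indentation {
    let current = *stack.last().expect("stack holds at least the base column");
    let previous = if stack.len() >= 2 {
        Some(stack[stack.len() - 2])
    } else {
        None
    };
    if col > current {
        // Further than the current line: open a new level.
        stack.push(col);
        Indentation::Indent
    } else if previous.map_or(true, |p| col > p) {
        // Same column or before, but still deeper than the enclosing level:
        // stay at the same depth and move its column.
        *stack.last_mut().unwrap() = col;
        Indentation::Redent
    } else {
        // Back to a previously set column: close every level that starts
        // after it (simplification: reported as one `Dedent`).
        while stack.len() > 1 && stack[stack.len() - 2] >= col {
            stack.pop();
        }
        Indentation::Dedent
    }
}
```

Feeding the columns 4, 2, 0 after a first line at column 0 yields `Indent`, `Redent`, `Dedent`: the line at column 2 stays at the same depth as the line at column 4 instead of opening or closing a level.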

## Character class

Five character classes are defined in the external scanner:

- space (`is_sp`)
- line break (`is_lb`)
- xid start (`is_id_start`)
- xid continue (`is_id_continue`)
- word part (`is_word_part`)

The three functions `is_id_start`, `is_id_continue` and `is_word_part` are implemented as binary searches.

The character list is based on the Unicode database which can be found here: https://www.unicode.org/Public/UCD/latest/ucd/

A utility is used to produce those tables: https://github.com/uben0/unicode-table
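
To illustrate the lookup, here is a sketch of a binary search over a sorted table of inclusive code point ranges, written in Rust. Whether the real tables store ranges or individual code points is an assumption here; `ID_START_SAMPLE` is a tiny hand-picked excerpt for the example (not the generated table), and `in_table` is a hypothetical name.

```rust
use std::cmp::Ordering;

/// Hand-picked sample of ranges, sorted and non-overlapping, bounds inclusive.
/// The real tables are generated from the Unicode database and are far larger.
const ID_START_SAMPLE: &[(u32, u32)] = &[
    (0x41, 0x5A), // A-Z
    (0x61, 0x7A), // a-z
    (0xC0, 0xD6), // U+00C0 to U+00D6 (Latin-1 letters)
    (0xD8, 0xF6), // U+00D8 to U+00F6 (skips U+00D7, the multiplication sign)
];

/// Return true if `c` falls inside one of the ranges of `table`.
fn in_table(table: &[(u32, u32)], c: char) -> bool {
    let cp = c as u32;
    table
        .binary_search_by(|&(lo, hi)| {
            if hi < cp {
                // The whole range lies before the code point.
                Ordering::Less
            } else if lo > cp {
                // The whole range lies after the code point.
                Ordering::Greater
            } else {
                // The code point is inside this range.
                Ordering::Equal
            }
        })
        .is_ok()
}
```

A range table keeps the lookup logarithmic in the number of ranges rather than in the number of code points, which matters for classes as large as XID_Continue.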

## Barrier

The `heading`, `item` and `term` nodes are technically delimited contexts, but because they behave similarly, they share the same external token as container, which is `barrier`.
21 changes: 21 additions & 0 deletions vendored_parsers/tree-sitter-typst/LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2023 Gerbais-Nief Eddie

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
19 changes: 19 additions & 0 deletions vendored_parsers/tree-sitter-typst/Makefile
@@ -0,0 +1,19 @@
test: src/parser.c
	tree-sitter test -f Test

all: src/parser.c
	tree-sitter test

fixme: src/parser.c
	tree-sitter test -f Fixme

src/parser.c: grammar.js
	tree-sitter generate

build: src/parser.c size

size: src/parser.c
	du -b src/parser.c
	du -b src/parser.c > size.txt

.PHONY: fixme test build all size
48 changes: 48 additions & 0 deletions vendored_parsers/tree-sitter-typst/Package.swift
@@ -0,0 +1,48 @@
// swift-tools-version:5.3
import PackageDescription

let package = Package(
name: "TreeSitterTypst",
platforms: [.macOS(.v10_13), .iOS(.v11)],
products: [
.library(name: "TreeSitterTypst", targets: ["TreeSitterTypst"]),
],
dependencies: [],
targets: [
.target(name: "TreeSitterTypst",
path: ".",
exclude: [
"Cargo.toml",
"Makefile",
"binding.gyp",
"bindings/c",
"bindings/go",
"bindings/node",
"bindings/python",
"bindings/rust",
"prebuilds",
"grammar.js",
"package.json",
"package-lock.json",
"pyproject.toml",
"setup.py",
"test",
"examples",
".editorconfig",
".github",
".gitignore",
".gitattributes",
".gitmodules",
],
sources: [
"src/parser.c",
// NOTE: if your language has an external scanner, add it here.
],
resources: [
.copy("queries")
],
publicHeadersPath: "bindings/swift",
cSettings: [.headerSearchPath("src")])
],
cLanguageStandard: .c11
)