[C Family] Full rewrite of C family syntaxes #4147

camila314 · 2025-02-07T20:23:44Z

I am currently working on a full rewrite of the C family using modern Sublime Text syntax highlighting features. I plan to rewrite C, Objective-C, C++, and Objective-C++ in that order. Note that as of right now, only the C syntax has been edited and the syntax test file does not yet have any annotations.

Changes made to C Highlighting

Removed #if 0 and #if 1 #else creating a commented out block
Added i and j to numerical suffixes (GNU extension)
Add _Alignof, alignof, and offsetof, _Static_assert, static_assert, _Pragma as word operators
Every double-underscored identifier without another scope is treated as support constant
__cplusplus is a support.constant
Add nullptr_t (C23) and max_align_t (C11) to list of default types
Add _Thread_local, thread_local, _Atomic, _Alignas and _Noreturn as storage modifiers
Fixed bug where a function parameter followed by a newline wouldn't be highlighted properly
Add more __declspec properties
Add common __attribute__ properties
Add digit separator ' (C23)
Add scopes for __VA_ARGS__ and __VA_OPT__ (C23)
Give defined macro a suport.macro.c scope
Add #embed, #elifdef, and #elifndef directives (C23)
Add multi-line double-slashed comments via \
Add typedef support for function ptrs
Function pointer arguments in a typedef now share the same scoping as regular function defs

Issues

camila314 · 2025-02-07T20:30:34Z

My motivations are simple

C++/C.sublime-syntax

C++/syntax_test_c.c

C++/C.sublime-syntax

- Add digit separator `'` (C23) - Add scopes for __VA_ARGS__ and __VA_OPT__ - Give defined macro a `suport.macro.c` scope - Add #embed, #elifdef, and #elifndef directives - Add multi-line double-slashed comments via `\` - Fix some bugs

C++/C.sublime-syntax

…ht for fn pointer types in a typedef.

deathaxe · 2025-02-10T17:34:34Z

Some general advices:

main and prototype are primary entry points and thus shouldn't be located somewhere in the middle of anywhere. They should be the first contexts.

I'd recommend to organize contexts in a logical tree-like top-down structure, grouping related things like preprocessors, declarations, statements, expressions, keywords, identifiers, constants, operators, variables etc. together.

Various syntaxes even use ###[ SECTION NAME]### comments to visually separate bigger blocks of related contexts. This also enables easy folding of irrelevant blocks.

All of that will certainly help to keep track of structure as syntax definition grows. Keeping same context order and structure especially is important when extending a syntax, which will be the casse for C++.

The more syntaxes in this repo follow a common schema the easier it is to maintain them.

Premature ordering can be kill of creativity, so order is not that important for prototyping. Maybe however don't wait too long to add some structure.

Please don't just add some code snippets to syntax test files, but also design assertions on the way going. It sometimes takes more time to write tests than syntax, they are however the only way to ensure not to break wanted/designed behavior by automated syntax tests.

When writing rules for certain constructs you have in mind what you find important. Tests are just an expression of those thoughts or tools to verify your ideas/expectations against reality and vice versa.

Tests can (or even should) also contain incomplete statements/expressions, to ensure code doesn't break too badly. At least it helps to ensure to limit breaking highlighting to small regions or makes you think about how to achieve it.

Assertions should also contain some negative tests to ensure meta scope boundaries are maintained correctly or to avoid unwanted overlapping meta scopes. These are the less obvious issues, which may however impact various functionalities such as syntax based folding.

deathaxe · 2025-02-10T17:36:18Z

FWIW, context order and structure of existing C syntaxes may not be ideal nor a good template to rely on. Better look at something more recent, such as Java, PHP or Python.

C++/C++.sublime-syntax

michaelblyons · 2025-03-04T20:42:18Z

C++/C.sublime-syntax

+      scope: punctuation.accessor.c
+  operator.pointer:
+    - match: \&|\*
+      scope: keyword.operator.c


I think I've seen these scoped as keyword.operator.reference .

jrappen · 2025-03-05T09:07:49Z

C++/C.sublime-syntax

+    - match: '{{dir_start}}(?!\s*{{identifier}})'
+      scope: keyword.control.c
+  preprocessor.line-escape:
+    - match: \\$\n


If there is a line continuation is at EOF, this wont match. How do you want to handle that?

C++/C.sublime-syntax

jrappen · 2025-03-05T09:29:52Z

Objective-C/Objective-C.sublime-syntax

-          scope: punctuation.definition.string.end.objc
-          pop: true
+variables:
+  fn_pointer: \(\s*(\*|\^)({{identifier}})\s*\)\s*(\()


syntax variables should never include capturing groups

C++/syntax_test_c.c

jrappen · 2025-03-05T09:40:09Z

C++/C++.sublime-syntax

rwols · 2025-03-08T17:14:11Z

First of all let me say I'm happy someone is looking at these syntaxes. I'm impressed by the sheer speed you are displaying.

I haven't read the 50+ posts on this pull request but I'm trying it out locally now, specifically the C++ syntax.

Problem 1: Eager scoping of entity.name.function

.

The first thing I'm noticing is that for plain function calls, it assigns entity.name.function. Regrettably, I would want this behavior as it was before. Function calls should be scoped variable.function. Examples where it too eagerly assigns entity.name.function:

C& C::f(std::string_view z) noexcept
{
    if (const auto e = a().b(z)) {
        return f(*e); // f *should* be scoped as variable.function
    }
    return *this;
}

C& C::f(const X& x) noexcept
{
    const std::vector<X*> xs = f().g<X>(); // the call to g<X> *should* be variable.function
    m = std::distance(xs.begin(),
        std::find_if(xs.begin(), xs.end(), [&](const X* c) -> bool {
            return c == std::addressof(x);
        })
    );
    return *this;
}

void Frame::OnOpenProject(wxCommandEvent& /*event*/)
{
    if (/*...current content has not been saved...*/ 0) {
        const wxString body("Current content has not been saved! Proceed?");
        //             ^^^^ variable.function
        const wxString title("Please confirm");
        //             ^^^^^ variable.function
        if (wxMessageBox(body, title, wxICON_QUESTION | wxYES_NO, this) == wxNO)
            return;
    }
    wxFileDialog openFileDialog(this, wxT("Open a project file"), "", "", "JSON files (*.json)|*.json", wxFD_OPEN);
    //           ^^^^^^^^^^^^^^ variable.function
    if (openFileDialog.ShowModal() == wxID_CANCEL)
        return;
    try {
        Project project(Project::LoadFromFile(openFileDialog.GetPath()));
        //      ^^^^^^^ variable.function
        SetCurrentWorkingDirectory(project.GetBaseDir());
        // ^^^^^^^^^^^^^^^^^^^^^^^ variable.function
    } catch (const std::exception& ex) {
        wxLogError("Failed to load project: %s", ex.what());
        // ^^^^^^^ variable.function
        SetStatusText("project NOT loaded");
        // ^^^^^^^^^^ variable.function
        return;
    }
    SetStatusText("project loaded!");
}

This eager scoping of entity.name.function also happens in class initializer lists:

SomeClass::SomeClass(const S& title, const P& pos, const Z& size)
    : m_member(std::make_unique<G>(m_other_member))
    //              ^^^^^^^^^^^ variable.function
{
}

It also too eagerly assigns entity.name.function to macro invocations:

void C::Update()
{
    LOG_INFO(FACILITY) << "hello there general kenobi";
    // ^^^^^ - entity.name.function
    //                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ string.quoted.double
    const auto& state = getState();
    if (m.Update(state)) {
        LOG_INFO(FACILITY) << "frobnicating";
        // ^^^^^ - entity.name.function
        //                    ^^^^^^^^^^^^^^ string.quoted.double
        getFoo().f(getBlorb());
        LOG_INFO(FACILITY) << "did the frobnicating";
        // ^^^^^ - entity.name.function
        //                    ^^^^^^^^^^^^^^^^^^^^^^ string.quoted.double
    }
}

deathaxe

OK, seems low hanging fruits are pruned and we can start to the hard part of writing syntax defintions - the details. ;-)

A formal note ahead:

Not separating contexts with emptly lines in combination with separating patterns is odd and undesirable.

For readability reasons all contexts should be separated by an emptly line.

Use ###[ ... ]### to group related contexts instead.

Highlighting seems to work, but tests are not sufficent to prove it being ready for the wild, to be honest. Especially meta scopes seem some general refinements to ensure correct order, boundaries or not overlapping them unintentionally. That's actually the most tricky part of writing syntax definitions.

So here are some examples, of what may need some attention:

C++/syntax_test_c.c

deathaxe · 2025-03-08T17:40:17Z

C++/C.sublime-syntax

+  statement:
+    - match: ;
+      scope: punctuation.terminator.c
+      pop: 1
+    - include: expression
+
+  expression:
+    # priority issues with matching ^\s*{{identifier}}
+    - match: ^\s+
+      branch_point: try-function
+      branch: [try-function, pop-immediately]
+    - include: keyword.control-flow
+    - include: keyword.control
+    - match: (?=^{{identifier}}\s+{{identifier}}\s*\()
+      push:
+        - include: function
+        - include: else-pop
+    - include: number
+    - include: string
+    - include: type
+    - include: keyword
+    - include: constant
+    - include: punctuation
+    - include: operator
+    - include: generic
+    - include: call
+    - include: enclosure


Not an ideal distinction of statements vs. expressions, IMHO.

Statements are e.g. control flow statements, function definitions, type definitions, assignment statements etc.

They consist of expressions.

Everything which can occur within parentheses for instance, belongs to expressions. These are numbers, constants, operators, function calls, strings, etc.

Anything else belongs to statements.

Not respecting this distinction likely causes bugs and messy architecture.

I'd even always group all contexts after that schema:

comments

preprocessor

definitions (functions, types, ...)

statements (if, while, ...)

expressions (groups, brackets, ...)

To some degree this is already true for C current syntax, but not to full degree.

Function call and definitions are somewhat mixed (calls belong to expressions).

case statement related contexts are somehow part of organized under "enclosures".

to mention some examples.

deathaxe · 2025-03-08T17:47:51Z

C++/C.sublime-syntax

+  try-function:
+    - include: keyword.control
+    - include: keyword.control-flow
+    - match: (?={{identifier}}\s+{{identifier}}\s*\()


This looks suspicious. Do we relly want to scope if void foo() a function definition?

deathaxe · 2025-03-08T17:49:56Z

C++/C.sublime-syntax

+    - match: ^\s+
+      branch_point: try-function
+      branch: [try-function, pop-immediately]


Note it being syntactically very correct to start function definitions after semi-colons. Especially in C family we can't rely on statements beginning on a line, only.

deathaxe · 2025-03-08T18:00:57Z

C++/C.sublime-syntax

+    - match: \btypedef\b
+      scope: keyword.declaration.type.c
+      push: type.typedef
+    - include: enum-def
+    - match: \bstruct\b
+      scope: keyword.declaration.struct.c
+      branch_point: struct
+      branch: [type.struct, pop-immediately]
+    - match: \bunion\b
+      scope: keyword.declaration.union.c
+      branch_point: union
+      branch: [type.union, pop-immediately]


Maybe organize all sorts of definitions in separate contexts, grouped logically? This way we don't need to introduce exceptions like enum-def in case an extending syntax needs to override it.

rwols · 2025-03-08T18:25:28Z

Problem 2: Does not handle templates well

The following are all valid C++ template classes:

template <class T> struct X {
//               ^ punctuation.definition.generic.end
};

template <class T, class S> class Y {
//                        ^ punctuation.definition.generic.end
};

template <typename T, typename S> class Z {
//                              ^ punctuation.definition.generic.end    
};

template <template <class> T> struct W {
//                       ^ punctuation.definition.generic.end
//                          ^ punctuation.definition.generic.end
};

Compiler Explorer link: https://godbolt.org/z/o5P3MrjGM

Template functions are also not scoped correctly:

#include <type_traits>
#include <utility>

template <typename T> struct is_pyobject : std::false_type {};
//                  ^ punctuation.definition.generic.end

struct object {};

template <typename T, std::enable_if_t<is_pyobject<T>::value, int> = 0>
//                                                                    ^ punctuation.definition.generic.end
auto object_or_cast(T &&o) -> decltype(std::forward<T>(o)) {
    return std::forward<T>(o);
}

template <typename T, std::enable_if_t<!is_pyobject<T>::value, int> = 0>
//                                                                     ^ punctuation.definition.generic.end

object object_or_cast(T &&o);
//                          ^ - invalid.illegal

int main() {
    return 0;
}

Compiler Explorer link: https://godbolt.org/z/xb7zKb85j

For working with templates, I invite you to clone this project locally and see if you can make the template-heavy code in the header files work: https://github.com/pybind/pybind11/tree/master/include/pybind11.

I'm just reporting global themed problems as I go :) thanks once again for tackling this!

Co-authored-by: deathaxe <[email protected]>

deathaxe · 2025-03-09T08:56:02Z

C++/syntax_test_c.c

+//                  ^ keyword.operator.arithmetic.c 
+//                   ^ keyword.operator.assignment.c 


Operator patterns is a sensible part. You need to carefully organize them in correct order, starting with those of 3 chars, followd by 2 and then 1 to ensure they are matched correctly.

Suggested change

// ^ keyword.operator.arithmetic.c

// ^ keyword.operator.assignment.c

// ^^ keyword.operator.comparison.c

camila314 added 3 commits February 7, 2025 13:53

Initial Commit for rewrite

d46975e

syntax test file wip

8546556

\9 shouldn't be a valid escape

c7b239a

jrappen reviewed Feb 7, 2025

View reviewed changes

C++/C.sublime-syntax Outdated Show resolved Hide resolved

jrappen reviewed Feb 7, 2025

View reviewed changes

C++/C.sublime-syntax Outdated Show resolved Hide resolved

jrappen reviewed Feb 7, 2025

View reviewed changes

C++/C.sublime-syntax Outdated Show resolved Hide resolved

jrappen reviewed Feb 7, 2025

View reviewed changes

C++/C.sublime-syntax Outdated Show resolved Hide resolved

jrappen reviewed Feb 7, 2025

View reviewed changes

C++/C.sublime-syntax Outdated Show resolved Hide resolved

jrappen reviewed Feb 7, 2025

View reviewed changes

C++/C.sublime-syntax Outdated Show resolved Hide resolved

jrappen reviewed Feb 7, 2025

View reviewed changes

C++/C.sublime-syntax Outdated Show resolved Hide resolved

jrappen reviewed Feb 7, 2025

View reviewed changes

C++/C.sublime-syntax Outdated Show resolved Hide resolved

jrappen reviewed Feb 7, 2025

View reviewed changes

C++/C.sublime-syntax Outdated Show resolved Hide resolved

jrappen reviewed Feb 7, 2025

View reviewed changes

C++/syntax_test_c.c Outdated Show resolved Hide resolved

jrappen reviewed Feb 7, 2025

View reviewed changes

C++/C.sublime-syntax Outdated Show resolved Hide resolved

camila314 added 4 commits February 7, 2025 16:33

Fix some issues pointed out

ab7b691

Fix line escape in string

e09a040

Update Syntax Test

85a129e

michaelblyons reviewed Feb 8, 2025

View reviewed changes

C++/C.sublime-syntax Outdated Show resolved Hide resolved

camila314 added 4 commits February 10, 2025 11:04

Add typedef entity (with fn pointer) and make function params highlig…

0f0ac07

…ht for fn pointer types in a typedef.

Remove consume-pop

3e2d92a

Add std funcs

f1abee3

Syntax Test

5497d83

Add actual tests to the syntax test file

a8887ab

This comment was marked as outdated.

Sign in to view

camila314 added 2 commits February 10, 2025 12:32

Fix name in syntax test

a60ca70

Reformat file to be top-down instead of bottom-up

332929e

camila314 added 3 commits March 4, 2025 11:19

C++ Syntax Test

8336715

Objective-C++ syntax tests

84c04bc

Small accessor fixes

1055d64

camila314 marked this pull request as ready for review March 4, 2025 17:45

deathaxe reviewed Mar 4, 2025

View reviewed changes

C++/C++.sublime-syntax Outdated Show resolved Hide resolved

deathaxe reviewed Mar 4, 2025

View reviewed changes

C++/C++.sublime-syntax Outdated Show resolved Hide resolved

deathaxe reviewed Mar 4, 2025

View reviewed changes

C++/C++.sublime-syntax Outdated Show resolved Hide resolved

deathaxe reviewed Mar 4, 2025

View reviewed changes

C++/C++.sublime-syntax Outdated Show resolved Hide resolved

deathaxe reviewed Mar 4, 2025

View reviewed changes

C++/C++.sublime-syntax Outdated Show resolved Hide resolved

michaelblyons reviewed Mar 4, 2025

View reviewed changes

Case and arrow pointer fix

89f92e9

jrappen reviewed Mar 5, 2025

View reviewed changes

C++/C.sublime-syntax Show resolved Hide resolved

jrappen reviewed Mar 5, 2025

View reviewed changes

C++/C.sublime-syntax Outdated Show resolved Hide resolved

jrappen reviewed Mar 5, 2025

View reviewed changes

C++/C.sublime-syntax Outdated Show resolved Hide resolved

jrappen reviewed Mar 5, 2025

View reviewed changes

C++/syntax_test_c.c Show resolved Hide resolved

camila314 added 2 commits March 5, 2025 12:01

Minor Fixes

29a3f2b

Minor Fixes

6b14a5f

michaelblyons reviewed Mar 7, 2025

View reviewed changes

C++/C++.sublime-syntax Show resolved Hide resolved

camila314 added 4 commits March 6, 2025 23:45

More fixes

57cdf66

Forgot to commit this file

dcbfd87

Typo fix

ef1450c

Fix struct, union, enum and more

6b2206e

deathaxe reviewed Mar 8, 2025

View reviewed changes

Apply suggestions from code review

db4b3b7

Co-authored-by: deathaxe <[email protected]>

deathaxe reviewed Mar 9, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[C Family] Full rewrite of C family syntaxes #4147

[C Family] Full rewrite of C family syntaxes #4147

camila314 commented Feb 7, 2025 •

edited

Loading

camila314 commented Feb 7, 2025

deathaxe commented Feb 10, 2025

deathaxe commented Feb 10, 2025

This comment was marked as outdated.

michaelblyons Mar 4, 2025

jrappen Mar 5, 2025

jrappen Mar 5, 2025

jrappen commented Mar 5, 2025

rwols commented Mar 8, 2025 •

edited

Loading

deathaxe left a comment

deathaxe Mar 8, 2025

deathaxe Mar 8, 2025

deathaxe Mar 8, 2025

deathaxe Mar 8, 2025

rwols commented Mar 8, 2025

deathaxe Mar 9, 2025

		// ^ keyword.operator.arithmetic.c
		// ^ keyword.operator.assignment.c

	// ^ keyword.operator.arithmetic.c
	// ^ keyword.operator.assignment.c
	// ^^ keyword.operator.comparison.c

[C Family] Full rewrite of C family syntaxes #4147

Are you sure you want to change the base?

[C Family] Full rewrite of C family syntaxes #4147

Conversation

camila314 commented Feb 7, 2025 • edited Loading

Changes made to C Highlighting

Issues

camila314 commented Feb 7, 2025

deathaxe commented Feb 10, 2025

deathaxe commented Feb 10, 2025

This comment was marked as outdated.

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jrappen commented Mar 5, 2025

rwols commented Mar 8, 2025 • edited Loading

Problem 1: Eager scoping of entity.name.function

deathaxe left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rwols commented Mar 8, 2025

Problem 2: Does not handle templates well

Choose a reason for hiding this comment

camila314 commented Feb 7, 2025 •

edited

Loading

rwols commented Mar 8, 2025 •

edited

Loading