Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C Family] Full rewrite of C family syntaxes #4147

Open
wants to merge 66 commits into
base: master
Choose a base branch
from

Conversation

camila314
Copy link
Contributor

@camila314 camila314 commented Feb 7, 2025

I am currently working on a full rewrite of the C family using modern Sublime Text syntax highlighting features. I plan to rewrite C, Objective-C, C++, and Objective-C++ in that order. Note that as of right now, only the C syntax has been edited and the syntax test file does not yet have any annotations.

Changes made to C Highlighting

  • Removed #if 0 and #if 1 #else creating a commented out block
  • Added i and j to numerical suffixes (GNU extension)
  • Add _Alignof, alignof, and offsetof, _Static_assert, static_assert, _Pragma as word operators
  • Every double-underscored identifier without another scope is treated as support constant
  • __cplusplus is a support.constant
  • Add nullptr_t (C23) and max_align_t (C11) to list of default types
  • Add _Thread_local, thread_local, _Atomic, _Alignas and _Noreturn as storage modifiers
  • Fixed bug where a function parameter followed by a newline wouldn't be highlighted properly
  • Add more __declspec properties
  • Add common __attribute__ properties
  • Add digit separator ' (C23)
  • Add scopes for __VA_ARGS__ and __VA_OPT__ (C23)
  • Give defined macro a suport.macro.c scope
  • Add #embed, #elifdef, and #elifndef directives (C23)
  • Add multi-line double-slashed comments via \
  • Add typedef support for function ptrs
  • Function pointer arguments in a typedef now share the same scoping as regular function defs

Issues

@camila314
Copy link
Contributor Author

My motivations are simple
image

- Add digit separator `'` (C23)
- Add scopes for __VA_ARGS__ and __VA_OPT__
- Give defined macro a `suport.macro.c` scope
- Add #embed, #elifdef, and #elifndef directives
- Add multi-line double-slashed comments via `\`
- Fix some bugs
@deathaxe
Copy link
Collaborator

Some general advices:

main and prototype are primary entry points and thus shouldn't be located somewhere in the middle of anywhere. They should be the first contexts.

I'd recommend to organize contexts in a logical tree-like top-down structure, grouping related things like preprocessors, declarations, statements, expressions, keywords, identifiers, constants, operators, variables etc. together.

Various syntaxes even use ###[ SECTION NAME]### comments to visually separate bigger blocks of related contexts. This also enables easy folding of irrelevant blocks.

All of that will certainly help to keep track of structure as syntax definition grows. Keeping same context order and structure especially is important when extending a syntax, which will be the casse for C++.

The more syntaxes in this repo follow a common schema the easier it is to maintain them.

Premature ordering can be kill of creativity, so order is not that important for prototyping. Maybe however don't wait too long to add some structure.


Please don't just add some code snippets to syntax test files, but also design assertions on the way going. It sometimes takes more time to write tests than syntax, they are however the only way to ensure not to break wanted/designed behavior by automated syntax tests.

When writing rules for certain constructs you have in mind what you find important. Tests are just an expression of those thoughts or tools to verify your ideas/expectations against reality and vice versa.

Tests can (or even should) also contain incomplete statements/expressions, to ensure code doesn't break too badly. At least it helps to ensure to limit breaking highlighting to small regions or makes you think about how to achieve it.

Assertions should also contain some negative tests to ensure meta scope boundaries are maintained correctly or to avoid unwanted overlapping meta scopes. These are the less obvious issues, which may however impact various functionalities such as syntax based folding.

@deathaxe
Copy link
Collaborator

FWIW, context order and structure of existing C syntaxes may not be ideal nor a good template to rely on. Better look at something more recent, such as Java, PHP or Python.

@michaelblyons

This comment was marked as outdated.

@camila314 camila314 marked this pull request as ready for review March 4, 2025 17:45
scope: punctuation.accessor.c
operator.pointer:
- match: \&|\*
scope: keyword.operator.c
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I've seen these scoped as keyword.operator.reference .

- match: '{{dir_start}}(?!\s*{{identifier}})'
scope: keyword.control.c
preprocessor.line-escape:
- match: \\$\n
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is a line continuation is at EOF, this wont match. How do you want to handle that?

scope: punctuation.definition.string.end.objc
pop: true
variables:
fn_pointer: \(\s*(\*|\^)({{identifier}})\s*\)\s*(\()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

syntax variables should never include capturing groups

@jrappen
Copy link
Contributor

jrappen commented Mar 5, 2025

  • update commit message for merge (i.e. opening comment)
  • tests for comments should include - comment comment tests if they cannot be nested
  • add tests for indentation and (transformed) symbols (you need to switch to "dev" in the top right when viewing the docs page)
  • maybe you could add ###[ SECTION HEADERS ]##################### to C++, too
  • you sometimes use invalid.illegal.xxx.yyy.zzz, sometimes invalid.illegal.xxx-yyy-zzz
  • you've got a lot of capturing groups where you do not use them, that decreases performance
  • you could add a syntax var to C, to use in the first_line_matches
  • move word lists from matches to variables
  • syntax variables should never include capturing groups
  • looking at the syntax test files, it seems you copied & pasted a lot.
    • Tests for the base syntax should cover (1) its base scope and (2) whatever you have defined there.
    • Tests for the extending syntaxes should cover (3) their base scopes and (4) whatever is different in their syntax compared to whatever syntax(es !) they are extending. There is no need for and you should not include duplicate tests.

@rwols
Copy link
Contributor

rwols commented Mar 8, 2025

First of all let me say I'm happy someone is looking at these syntaxes. I'm impressed by the sheer speed you are displaying.

I haven't read the 50+ posts on this pull request but I'm trying it out locally now, specifically the C++ syntax.

Problem 1: Eager scoping of entity.name.function

.

The first thing I'm noticing is that for plain function calls, it assigns entity.name.function. Regrettably, I would want this behavior as it was before. Function calls should be scoped variable.function. Examples where it too eagerly assigns entity.name.function:

C& C::f(std::string_view z) noexcept
{
    if (const auto e = a().b(z)) {
        return f(*e); // f *should* be scoped as variable.function
    }
    return *this;
}
C& C::f(const X& x) noexcept
{
    const std::vector<X*> xs = f().g<X>(); // the call to g<X> *should* be variable.function
    m = std::distance(xs.begin(),
        std::find_if(xs.begin(), xs.end(), [&](const X* c) -> bool {
            return c == std::addressof(x);
        })
    );
    return *this;
}
void Frame::OnOpenProject(wxCommandEvent& /*event*/)
{
    if (/*...current content has not been saved...*/ 0) {
        const wxString body("Current content has not been saved! Proceed?");
        //             ^^^^ variable.function
        const wxString title("Please confirm");
        //             ^^^^^ variable.function
        if (wxMessageBox(body, title, wxICON_QUESTION | wxYES_NO, this) == wxNO)
            return;
    }
    wxFileDialog openFileDialog(this, wxT("Open a project file"), "", "", "JSON files (*.json)|*.json", wxFD_OPEN);
    //           ^^^^^^^^^^^^^^ variable.function
    if (openFileDialog.ShowModal() == wxID_CANCEL)
        return;
    try {
        Project project(Project::LoadFromFile(openFileDialog.GetPath()));
        //      ^^^^^^^ variable.function
        SetCurrentWorkingDirectory(project.GetBaseDir());
        // ^^^^^^^^^^^^^^^^^^^^^^^ variable.function
    } catch (const std::exception& ex) {
        wxLogError("Failed to load project: %s", ex.what());
        // ^^^^^^^ variable.function
        SetStatusText("project NOT loaded");
        // ^^^^^^^^^^ variable.function
        return;
    }
    SetStatusText("project loaded!");
}

This eager scoping of entity.name.function also happens in class initializer lists:

SomeClass::SomeClass(const S& title, const P& pos, const Z& size)
    : m_member(std::make_unique<G>(m_other_member))
    //              ^^^^^^^^^^^ variable.function
{
}

It also too eagerly assigns entity.name.function to macro invocations:

void C::Update()
{
    LOG_INFO(FACILITY) << "hello there general kenobi";
    // ^^^^^ - entity.name.function
    //                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ string.quoted.double
    const auto& state = getState();
    if (m.Update(state)) {
        LOG_INFO(FACILITY) << "frobnicating";
        // ^^^^^ - entity.name.function
        //                    ^^^^^^^^^^^^^^ string.quoted.double
        getFoo().f(getBlorb());
        LOG_INFO(FACILITY) << "did the frobnicating";
        // ^^^^^ - entity.name.function
        //                    ^^^^^^^^^^^^^^^^^^^^^^ string.quoted.double
    }
}

Copy link
Collaborator

@deathaxe deathaxe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, seems low hanging fruits are pruned and we can start to the hard part of writing syntax defintions - the details. ;-)

A formal note ahead:

Not separating contexts with emptly lines in combination with separating patterns is odd and undesirable.

For readability reasons all contexts should be separated by an emptly line.

Use ###[ ... ]### to group related contexts instead.


Highlighting seems to work, but tests are not sufficent to prove it being ready for the wild, to be honest. Especially meta scopes seem some general refinements to ensure correct order, boundaries or not overlapping them unintentionally. That's actually the most tricky part of writing syntax definitions.

So here are some examples, of what may need some attention:

Comment on lines +172 to +198
statement:
- match: ;
scope: punctuation.terminator.c
pop: 1
- include: expression

expression:
# priority issues with matching ^\s*{{identifier}}
- match: ^\s+
branch_point: try-function
branch: [try-function, pop-immediately]
- include: keyword.control-flow
- include: keyword.control
- match: (?=^{{identifier}}\s+{{identifier}}\s*\()
push:
- include: function
- include: else-pop
- include: number
- include: string
- include: type
- include: keyword
- include: constant
- include: punctuation
- include: operator
- include: generic
- include: call
- include: enclosure
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not an ideal distinction of statements vs. expressions, IMHO.

Statements are e.g. control flow statements, function definitions, type definitions, assignment statements etc.

They consist of expressions.

Everything which can occur within parentheses for instance, belongs to expressions. These are numbers, constants, operators, function calls, strings, etc.

Anything else belongs to statements.

Not respecting this distinction likely causes bugs and messy architecture.

I'd even always group all contexts after that schema:

  1. comments
  2. preprocessor
  3. definitions (functions, types, ...)
  4. statements (if, while, ...)
  5. expressions (groups, brackets, ...)

To some degree this is already true for C current syntax, but not to full degree.

  • Function call and definitions are somewhat mixed (calls belong to expressions).
  • case statement related contexts are somehow part of organized under "enclosures".

to mention some examples.

Comment on lines +200 to +203
try-function:
- include: keyword.control
- include: keyword.control-flow
- match: (?={{identifier}}\s+{{identifier}}\s*\()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks suspicious. Do we relly want to scope if void foo() a function definition?

Comment on lines +180 to +182
- match: ^\s+
branch_point: try-function
branch: [try-function, pop-immediately]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note it being syntactically very correct to start function definitions after semi-colons. Especially in C family we can't rely on statements beginning on a line, only.

Comment on lines +327 to +338
- match: \btypedef\b
scope: keyword.declaration.type.c
push: type.typedef
- include: enum-def
- match: \bstruct\b
scope: keyword.declaration.struct.c
branch_point: struct
branch: [type.struct, pop-immediately]
- match: \bunion\b
scope: keyword.declaration.union.c
branch_point: union
branch: [type.union, pop-immediately]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe organize all sorts of definitions in separate contexts, grouped logically? This way we don't need to introduce exceptions like enum-def in case an extending syntax needs to override it.

@rwols
Copy link
Contributor

rwols commented Mar 8, 2025

Problem 2: Does not handle templates well

The following are all valid C++ template classes:

template <class T> struct X {
//               ^ punctuation.definition.generic.end
};

template <class T, class S> class Y {
//                        ^ punctuation.definition.generic.end
};

template <typename T, typename S> class Z {
//                              ^ punctuation.definition.generic.end    
};

template <template <class> T> struct W {
//                       ^ punctuation.definition.generic.end
//                          ^ punctuation.definition.generic.end
};

Compiler Explorer link: https://godbolt.org/z/o5P3MrjGM

Template functions are also not scoped correctly:

#include <type_traits>
#include <utility>

template <typename T> struct is_pyobject : std::false_type {};
//                  ^ punctuation.definition.generic.end

struct object {};

template <typename T, std::enable_if_t<is_pyobject<T>::value, int> = 0>
//                                                                    ^ punctuation.definition.generic.end
auto object_or_cast(T &&o) -> decltype(std::forward<T>(o)) {
    return std::forward<T>(o);
}

template <typename T, std::enable_if_t<!is_pyobject<T>::value, int> = 0>
//                                                                     ^ punctuation.definition.generic.end

object object_or_cast(T &&o);
//                          ^ - invalid.illegal

int main() {
    return 0;
}

Compiler Explorer link: https://godbolt.org/z/xb7zKb85j

For working with templates, I invite you to clone this project locally and see if you can make the template-heavy code in the header files work: https://github.com/pybind/pybind11/tree/master/include/pybind11.

I'm just reporting global themed problems as I go :) thanks once again for tackling this!

Comment on lines +200 to +201
// ^ keyword.operator.arithmetic.c
// ^ keyword.operator.assignment.c
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Operator patterns is a sensible part. You need to carefully organize them in correct order, starting with those of 3 chars, followd by 2 and then 1 to ensure they are matched correctly.

Suggested change
// ^ keyword.operator.arithmetic.c
// ^ keyword.operator.assignment.c
// ^^ keyword.operator.comparison.c

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment