-
Notifications
You must be signed in to change notification settings - Fork 593
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[C Family] Full rewrite of C family syntaxes #4147
base: master
Are you sure you want to change the base?
Conversation
- Add digit separator `'` (C23) - Add scopes for __VA_ARGS__ and __VA_OPT__ - Give defined macro a `suport.macro.c` scope - Add #embed, #elifdef, and #elifndef directives - Add multi-line double-slashed comments via `\` - Fix some bugs
…ht for fn pointer types in a typedef.
Some general advices:
I'd recommend to organize contexts in a logical tree-like top-down structure, grouping related things like preprocessors, declarations, statements, expressions, keywords, identifiers, constants, operators, variables etc. together. Various syntaxes even use All of that will certainly help to keep track of structure as syntax definition grows. Keeping same context order and structure especially is important when extending a syntax, which will be the casse for C++. The more syntaxes in this repo follow a common schema the easier it is to maintain them. Premature ordering can be kill of creativity, so order is not that important for prototyping. Maybe however don't wait too long to add some structure. Please don't just add some code snippets to syntax test files, but also design assertions on the way going. It sometimes takes more time to write tests than syntax, they are however the only way to ensure not to break wanted/designed behavior by automated syntax tests. When writing rules for certain constructs you have in mind what you find important. Tests are just an expression of those thoughts or tools to verify your ideas/expectations against reality and vice versa. Tests can (or even should) also contain incomplete statements/expressions, to ensure code doesn't break too badly. At least it helps to ensure to limit breaking highlighting to small regions or makes you think about how to achieve it. Assertions should also contain some negative tests to ensure meta scope boundaries are maintained correctly or to avoid unwanted overlapping meta scopes. These are the less obvious issues, which may however impact various functionalities such as syntax based folding. |
FWIW, context order and structure of existing C syntaxes may not be ideal nor a good template to rely on. Better look at something more recent, such as Java, PHP or Python. |
This comment was marked as outdated.
This comment was marked as outdated.
scope: punctuation.accessor.c | ||
operator.pointer: | ||
- match: \&|\* | ||
scope: keyword.operator.c |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I've seen these scoped as keyword.operator.reference
.
- match: '{{dir_start}}(?!\s*{{identifier}})' | ||
scope: keyword.control.c | ||
preprocessor.line-escape: | ||
- match: \\$\n |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there is a line continuation is at EOF, this wont match. How do you want to handle that?
scope: punctuation.definition.string.end.objc | ||
pop: true | ||
variables: | ||
fn_pointer: \(\s*(\*|\^)({{identifier}})\s*\)\s*(\() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
syntax variables should never include capturing groups
|
First of all let me say I'm happy someone is looking at these syntaxes. I'm impressed by the sheer speed you are displaying. I haven't read the 50+ posts on this pull request but I'm trying it out locally now, specifically the C++ syntax. Problem 1: Eager scoping of entity.name.function. The first thing I'm noticing is that for plain function calls, it assigns entity.name.function. Regrettably, I would want this behavior as it was before. Function calls should be scoped variable.function. Examples where it too eagerly assigns entity.name.function: C& C::f(std::string_view z) noexcept
{
if (const auto e = a().b(z)) {
return f(*e); // f *should* be scoped as variable.function
}
return *this;
} C& C::f(const X& x) noexcept
{
const std::vector<X*> xs = f().g<X>(); // the call to g<X> *should* be variable.function
m = std::distance(xs.begin(),
std::find_if(xs.begin(), xs.end(), [&](const X* c) -> bool {
return c == std::addressof(x);
})
);
return *this;
} void Frame::OnOpenProject(wxCommandEvent& /*event*/)
{
if (/*...current content has not been saved...*/ 0) {
const wxString body("Current content has not been saved! Proceed?");
// ^^^^ variable.function
const wxString title("Please confirm");
// ^^^^^ variable.function
if (wxMessageBox(body, title, wxICON_QUESTION | wxYES_NO, this) == wxNO)
return;
}
wxFileDialog openFileDialog(this, wxT("Open a project file"), "", "", "JSON files (*.json)|*.json", wxFD_OPEN);
// ^^^^^^^^^^^^^^ variable.function
if (openFileDialog.ShowModal() == wxID_CANCEL)
return;
try {
Project project(Project::LoadFromFile(openFileDialog.GetPath()));
// ^^^^^^^ variable.function
SetCurrentWorkingDirectory(project.GetBaseDir());
// ^^^^^^^^^^^^^^^^^^^^^^^ variable.function
} catch (const std::exception& ex) {
wxLogError("Failed to load project: %s", ex.what());
// ^^^^^^^ variable.function
SetStatusText("project NOT loaded");
// ^^^^^^^^^^ variable.function
return;
}
SetStatusText("project loaded!");
} This eager scoping of SomeClass::SomeClass(const S& title, const P& pos, const Z& size)
: m_member(std::make_unique<G>(m_other_member))
// ^^^^^^^^^^^ variable.function
{
} It also too eagerly assigns entity.name.function to macro invocations: void C::Update()
{
LOG_INFO(FACILITY) << "hello there general kenobi";
// ^^^^^ - entity.name.function
// ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ string.quoted.double
const auto& state = getState();
if (m.Update(state)) {
LOG_INFO(FACILITY) << "frobnicating";
// ^^^^^ - entity.name.function
// ^^^^^^^^^^^^^^ string.quoted.double
getFoo().f(getBlorb());
LOG_INFO(FACILITY) << "did the frobnicating";
// ^^^^^ - entity.name.function
// ^^^^^^^^^^^^^^^^^^^^^^ string.quoted.double
}
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, seems low hanging fruits are pruned and we can start to the hard part of writing syntax defintions - the details. ;-)
A formal note ahead:
Not separating contexts with emptly lines in combination with separating patterns is odd and undesirable.
For readability reasons all contexts should be separated by an emptly line.
Use ###[ ... ]###
to group related contexts instead.
Highlighting seems to work, but tests are not sufficent to prove it being ready for the wild, to be honest. Especially meta scopes seem some general refinements to ensure correct order, boundaries or not overlapping them unintentionally. That's actually the most tricky part of writing syntax definitions.
So here are some examples, of what may need some attention:
statement: | ||
- match: ; | ||
scope: punctuation.terminator.c | ||
pop: 1 | ||
- include: expression | ||
|
||
expression: | ||
# priority issues with matching ^\s*{{identifier}} | ||
- match: ^\s+ | ||
branch_point: try-function | ||
branch: [try-function, pop-immediately] | ||
- include: keyword.control-flow | ||
- include: keyword.control | ||
- match: (?=^{{identifier}}\s+{{identifier}}\s*\() | ||
push: | ||
- include: function | ||
- include: else-pop | ||
- include: number | ||
- include: string | ||
- include: type | ||
- include: keyword | ||
- include: constant | ||
- include: punctuation | ||
- include: operator | ||
- include: generic | ||
- include: call | ||
- include: enclosure |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not an ideal distinction of statements vs. expressions, IMHO.
Statements are e.g. control flow statements, function definitions, type definitions, assignment statements etc.
They consist of expressions.
Everything which can occur within parentheses for instance, belongs to expressions
. These are numbers, constants, operators, function calls, strings, etc.
Anything else belongs to statements.
Not respecting this distinction likely causes bugs and messy architecture.
I'd even always group all contexts after that schema:
- comments
- preprocessor
- definitions (functions, types, ...)
- statements (if, while, ...)
- expressions (groups, brackets, ...)
To some degree this is already true for C current syntax, but not to full degree.
- Function call and definitions are somewhat mixed (calls belong to expressions).
- case statement related contexts are somehow part of organized under "enclosures".
to mention some examples.
try-function: | ||
- include: keyword.control | ||
- include: keyword.control-flow | ||
- match: (?={{identifier}}\s+{{identifier}}\s*\() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks suspicious. Do we relly want to scope if void foo()
a function definition?
- match: ^\s+ | ||
branch_point: try-function | ||
branch: [try-function, pop-immediately] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note it being syntactically very correct to start function definitions after semi-colons. Especially in C family we can't rely on statements beginning on a line, only.
- match: \btypedef\b | ||
scope: keyword.declaration.type.c | ||
push: type.typedef | ||
- include: enum-def | ||
- match: \bstruct\b | ||
scope: keyword.declaration.struct.c | ||
branch_point: struct | ||
branch: [type.struct, pop-immediately] | ||
- match: \bunion\b | ||
scope: keyword.declaration.union.c | ||
branch_point: union | ||
branch: [type.union, pop-immediately] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe organize all sorts of definitions in separate contexts, grouped logically? This way we don't need to introduce exceptions like enum-def
in case an extending syntax needs to override it.
Problem 2: Does not handle templates wellThe following are all valid C++ template classes: template <class T> struct X {
// ^ punctuation.definition.generic.end
};
template <class T, class S> class Y {
// ^ punctuation.definition.generic.end
};
template <typename T, typename S> class Z {
// ^ punctuation.definition.generic.end
};
template <template <class> T> struct W {
// ^ punctuation.definition.generic.end
// ^ punctuation.definition.generic.end
};
Compiler Explorer link: https://godbolt.org/z/o5P3MrjGM Template functions are also not scoped correctly: #include <type_traits>
#include <utility>
template <typename T> struct is_pyobject : std::false_type {};
// ^ punctuation.definition.generic.end
struct object {};
template <typename T, std::enable_if_t<is_pyobject<T>::value, int> = 0>
// ^ punctuation.definition.generic.end
auto object_or_cast(T &&o) -> decltype(std::forward<T>(o)) {
return std::forward<T>(o);
}
template <typename T, std::enable_if_t<!is_pyobject<T>::value, int> = 0>
// ^ punctuation.definition.generic.end
object object_or_cast(T &&o);
// ^ - invalid.illegal
int main() {
return 0;
} Compiler Explorer link: https://godbolt.org/z/xb7zKb85j For working with templates, I invite you to clone this project locally and see if you can make the template-heavy code in the header files work: https://github.com/pybind/pybind11/tree/master/include/pybind11. I'm just reporting global themed problems as I go :) thanks once again for tackling this! |
Co-authored-by: deathaxe <[email protected]>
// ^ keyword.operator.arithmetic.c | ||
// ^ keyword.operator.assignment.c |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Operator patterns is a sensible part. You need to carefully organize them in correct order, starting with those of 3 chars, followd by 2 and then 1 to ensure they are matched correctly.
// ^ keyword.operator.arithmetic.c | |
// ^ keyword.operator.assignment.c | |
// ^^ keyword.operator.comparison.c |
I am currently working on a full rewrite of the C family using modern Sublime Text syntax highlighting features. I plan to rewrite C, Objective-C, C++, and Objective-C++ in that order. Note that as of right now, only the C syntax has been edited and the syntax test file does not yet have any annotations.
Changes made to C Highlighting
#if 0
and#if 1 #else
creating a commented out blocki
andj
to numerical suffixes (GNU extension)_Alignof
,alignof
, andoffsetof
,_Static_assert
,static_assert
,_Pragma
as word operators__cplusplus
is asupport.constant
nullptr_t
(C23) andmax_align_t
(C11) to list of default types_Thread_local
,thread_local
,_Atomic
,_Alignas
and_Noreturn
as storage modifiers__declspec
properties__attribute__
properties'
(C23)__VA_ARGS__
and__VA_OPT__
(C23)suport.macro.c
scope#embed
,#elifdef
, and#elifndef
directives (C23)\
typedef
support for function ptrstypedef
now share the same scoping as regular function defsIssues
const
as an indexed entity.name.type #3504