Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#247] Add .clangd configuration file syntax checker #249

Merged
merged 5 commits into from
Feb 9, 2024

Conversation

ghentschke
Copy link
Contributor

@ghentschke ghentschke commented Feb 6, 2024

  • Inform the user via markers in the .clangd file when the syntax cannot be parsed, because this leads to problems in the
    ClangdConfigurationFileManager. The path to the compile_commands.json cannot be set in the .clangd file when the .clangd file is corrupted. A marker informs the user about the problem:

image

fixes #247

- Inform the user via markers in the .clangd file when the syntax cannot
be parsed, because this leads to problems in the
ClangdConfigurationFileManager.
@ghentschke
Copy link
Contributor Author

Please notice: I've to add some unit tests. I'am working on that.

//throws ScannerException and ParserException:
yaml.load(inputStream);
} catch (Exception yamlException) {
addMarkerToClangdConfig(configFile, yamlException);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many Yaml exceptions - like the one raised in #245 (comment) - are of type MarkedYAMLException which have some utility methods like getProblemMark and getProblem which will allow you to deconstruct the exception without having to parse the string.

I haven't tried to use them in this context, but it looks like it should do the job

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for hint! It's much easier!

if (event.getDelta() != null && event.getType() == IResourceChangeEvent.POST_CHANGE) {
try {
event.getDelta().accept(delta -> {
if ((delta.getKind() == IResourceDelta.ADDED || delta.getKind() == IResourceDelta.REMOVED
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the .clangd file is removed we probably don't want to run the checker on it. My reading of this code seems to suggest that a CoreException will be thrown as a result of configFile.getContents()

return true;
});
} catch (CoreException e) {
StatusManager.getManager().handle(e, ClangdPlugin.PLUGIN_ID);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why use a StatusManager handle here, but not in the other places that add exceptions to the log?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, copy and paste error ;-)

- Fetch info from MarkedYAMLException to create a marker for the .clangd
file.

fixes eclipse-cdt#247
Comment on lines 44 to 52
} catch (Exception exception) {
if (exception instanceof MarkedYAMLException yamlException) {
addMarkerToClangdConfig(configFile, yamlException);
} else {
//log unexpected exception:
Platform.getLog(getClass())
.error("Expected MarkedYAMLException, but was: " + exception.getMessage(), exception); //$NON-NLS-1$
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
} catch (Exception exception) {
if (exception instanceof MarkedYAMLException yamlException) {
addMarkerToClangdConfig(configFile, yamlException);
} else {
//log unexpected exception:
Platform.getLog(getClass())
.error("Expected MarkedYAMLException, but was: " + exception.getMessage(), exception); //$NON-NLS-1$
}
}
} catch (MarkedYAMLException yamlException) {
addMarkerToClangdConfig(configFile, yamlException);
} catch (Exception exception) {
//log unexpected exception:
Platform.getLog(getClass())
.error("Expected MarkedYAMLException, but was: " + exception.getMessage(), exception); //$NON-NLS-1$
}

private ClangdConfigMarker parseYamlException(MarkedYAMLException exception) {
var marker = new ClangdConfigMarker();
marker.message = exception.getProblem();
marker.line = exception.getProblemMark().getLine() + 1; //getLine() is zero based, IMarker wants 1-based
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possible NPE here? A quick look says that MarkedYAMLException probably always has non-null problemMark, but an NPE here would be unfortunate.

marker.line = exception.getProblemMark().getLine() + 1; //getLine() is zero based, IMarker wants 1-based
int index = exception.getProblemMark().getIndex();
var buffer = exception.getProblemMark().getBuffer();
if (index == buffer.length) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is an incorrect assumption in this if - that is that buffer is the whole file contents. This is probably true for small files, but not so for bigger files. getPointer() is the index in the buffer, and getIndex() is the index in the file (but in code points, not bytes or characters)

Copy link
Contributor Author

@ghentschke ghentschke Feb 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is an incorrect assumption in this if - that is that buffer is the whole file contents. This is probably true for small files, but not so for bigger files

This implies that we should/could not rely on the buffer delivered with the exception? If so, I've to re-read the file in the MarkedYAMLException catch clause to determine the document/file length:

			} catch (MarkedYAMLException yamlException) {
                                try (var reReadStream = configFile.getContents()) {
				     var bufferLength = reReadStream.readAllBytes().length;
				      addMarkerToClangdConfig(configFile, yamlException, bufferLength);
                               }
			}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That code snippet is the number of bytes in the file :-( number of bytes, chars and code points can all be different.

Turning the code point index back into char pr byte index is a non-trivial operation. I would say it probably isn't worth it.

What is the problem that using the length of the file is trying to solve? Is it an out of bound exception on the charEnd by just adding 1?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the problem that using the length of the file is trying to solve? Is it an out of bound exception on the charEnd by just adding 1?

When problemMark.getIndex() == buffer.length() then the marker index points to the non visible \r or \n character and the marker cannot be displayed:
image
The maker is only visible on the file in the project explorer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or, as above, when there is no \r\n, index + 1 > buffer.length

if (index == buffer.length) {
index = getIndexOfLastPrintableChar(buffer);
}
marker.charStart = index;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a small error here, index is in codePoints, but charStart is measured in characters. So you can get some slightly offset files. Consider this file that uses tab incorrectly:

bob:	bob

renders the error as expected:

image

But if you use a character requiring surrogate pairs such as 😂 you get an off by one error. If there are lots of such characters before the error in the file it can be quite far off.

b😂b:	bob

image

alice: 😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂
john: 😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂
bob:	bob

image

In this case since the error is a single character wide, you can end up having no red underline at all because the range is less than a displayed character:

alice: 😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂
john: 😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂
bob:	bob

image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems hard to solve. As you mentioned above, all we have is the number of bytes:

That code snippet is the number of bytes in the file :-( number of bytes, chars and code points can all be different.

Turning the code point index back into char pr byte index is a non-trivial operation. I would say it probably isn't worth it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - it is a small (but interesting) problem. I am not convinced you need to solve it, but just be aware of it.

You can choose to not solve it, and leave it as a known problem that it will get the charStart/End correct only if bytes == chars == code points. I don't know how often this will be a real world problem.

Same thing for the above end of file.

I think it is ok to have this deficiency because the error marker will be present on the file and error message will be correct.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PS If you do want to solve the problem, you can read the number of code points in the file and then you will have exact indexes. To do this you need to do BOM handling (see org.yaml.snakeyaml.reader.UnicodeReader) and convert char to code point (see org.yaml.snakeyaml.reader.StreamReader.update() or org.yaml.snakeyaml.error.Mark.toCodePoints(char[]) for code you may be able to call)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - it is a small (but interesting) problem. I am not convinced you need to solve it, but just be aware of it.
You can choose to not solve it, and leave it as a known problem that it will get the charStart/End correct only if bytes == chars == code points. I don't know how often this will be a real world problem.

I'll see if I can solve this later. I can open a separate issue for that.

Same thing for the above end of file.

It is indeed a problem on larger files. I've to re-read the file into a buffer stream to get the end of the buffer. The buffer in the MarkedYAMLException was limited to ~800 bytes:
image

I think it is ok to have this deficiency because the error marker will be present on the file and error message will be correct.

IMO the missing error marker at the end of the file is more relevant, because I think it is more a real world problem to forget the final } than using non UTF-8 characters in a txt file. So I would like to add the re-read of the file to the catch clause at mentioned here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.

// GIVEN an existing .clangd configuration file with invalid yaml syntax (missing closing brace):
// WHEN a new .clang file gets added to the project (should trigger ClangdConfigFileMonitor.checkJob).
var configFile = createConfigFile(INVALID_YAML_SYNTAX_MISSING_BRACE);
// AND the UI thread sleeps for 1s:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this is to test that the marker gets created? Instead can you test in a loop for the end condition so that the test doesn't wait longer than needed.

Comment on lines 141 to 145
try {
Thread.sleep(50);
} catch (InterruptedException e) {
e.printStackTrace();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In test you can just let the InterruptedException be thrown by the test method. It makes the test shorter, and if a thread is interrupted here that would be unexpected, so having the stacktrace saved in the test results (instead of just stderr) is good.

Suggested change
try {
Thread.sleep(50);
} catch (InterruptedException e) {
e.printStackTrace();
}
Thread.sleep(50);

if (index == buffer.length) {
index = getIndexOfLastPrintableChar(buffer);
}
marker.charStart = index;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.

- fix review comments
- use IResource.DEPTH_ZERO in deleteMarkers, since the config file has
no members.
@ghentschke ghentschke merged commit d4bca9e into eclipse-cdt:master Feb 9, 2024
3 checks passed
@ghentschke
Copy link
Contributor Author

@jonahgraham Thank your the many good hints!

@ghentschke ghentschke deleted the yaml-syntax-checker branch February 9, 2024 07:26
@ghentschke ghentschke added this to the 1.1.0 milestone Feb 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Check .clangd configuration file syntax
2 participants