Avoid trying to interpret escape sequences inside comments. #95

Merged
merged 2 commits into from
Nov 25, 2024
18 changes: 14 additions & 4 deletions mathics_scanner/tokeniser.py
@@ -472,10 +472,20 @@ def _skip_blank(self):
  try:
      self.incomplete()
  except ValueError:
-     # Funny symbols like | in comments can cause a ValueError.
-     # Until we have a better fix -- like noting we are inside a
-     # comment and should not try to substitute symbols -- ignore.
-     pass
+     # `incomplete` tries to parse substrings like `\|AAAAA`
+     # that can be interpreted as a character reference.
+     # To do that, it tries to get the
+     # new line using the method
+     # `Prescanner.replace_escape_sequences()`
+     # Inside a comment, the special meaning of escape sequences
+     # like `\|` should not be taken into account.
+     #
+     # In case of error, just let's pick the code
+     # from the `input_line` attribute of
+     # prescanner:
+     self.code = self.prescanner.input_line
Contributor Author

Actually, we can replace the whole try/except block with lines 476 and 477.

Member

If that works, it would be simpler.

Contributor Author

I tested against the comment that produces the syntax error in GS1.m, and it works. I then tested each GS?.m file: we still have one failing test on GS2.m and GS3.m, but it does not seem to be related to this issue.

Contributor Author

Looking again, I found a problem with the "simpler" approach: after the comment ends, there may be code whose escape sequences do need to be interpreted. For example,

(*
Hi, this is a test \|<-this would produce an error if we pass by `replace_escape_sequences` 
*)
a="\|AAAAAA"
(*The previous line would require the use of replace_escape_sequences*)

So I was wrong about the extra simplification.

Contributor Author

Still, this fix does not cover a corner case of the form

(* corner case *) A="\|AAAAAA"

So maybe we need to track this, and reload the last line. I will change the comment to mention this.
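
One way to picture the "track whether we are inside a comment" idea is a character-level pass that interprets escapes only at comment depth zero. The sketch below is illustrative only: `replace_escapes_outside_comments` is a hypothetical helper, not part of `mathics_scanner`, and it assumes the only escape of interest is `\|` followed by exactly six hexadecimal digits. It also ignores the case of `(*` appearing inside a string literal.

```python
import re

# Hypothetical helper, not part of mathics_scanner.
# Assumption: an escape is "\|" followed by exactly six hex digits.
_HEX6 = re.compile(r"\\\|([0-9a-fA-F]{6})")


def replace_escapes_outside_comments(text: str) -> str:
    """Decode six-hex-digit escapes, copying comment text through verbatim."""
    out = []
    depth = 0  # nesting depth of (* ... *) comments
    i = 0
    while i < len(text):
        two = text[i:i + 2]
        if two == "(*":
            depth += 1
            out.append(two)
            i += 2
        elif two == "*)" and depth > 0:
            depth -= 1
            out.append(two)
            i += 2
        elif depth == 0 and two == "\\|":
            match = _HEX6.match(text, i)
            if match is None:
                raise ValueError(f"malformed escape at position {i}")
            # chr() itself raises ValueError for values above 0x10FFFF.
            out.append(chr(int(match.group(1), 16)))
            i = match.end()
        else:
            # Inside a comment (or ordinary text): copy one character as-is.
            out.append(text[i])
            i += 1
    return "".join(out)


# The comment keeps its stray "\|"; the string after it is decoded.
print(replace_escapes_outside_comments('(* corner \\|<-case *) A="\\|01F600"'))
```

Because the pass works on characters rather than whole lines, the same function treats a comment followed by code on one line no differently from the multi-line example. The demo uses `\|01F600` rather than `\|AAAAAA` because Python's `chr` rejects code points above 0x10FFFF.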

Member

All of these approaches feel hacky and not the way scanners usually work. Handling escape sequences is not something that is typically done separately from initial scanning. But right now this is not something I want to spend time on.

So right now I would prefer to defer addressing this until we can do it properly. The code base has too many "quick hacks" and could use better-thought-out and more standard solutions.
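
For comparison, a scanner that decodes escapes during the initial scan, rather than in a separate pass, might look like the following sketch. It is a toy, not the Mathics tokeniser: `scan` and its token kinds are hypothetical, only the six-hex-digit `\|` escape is decoded, and other string escapes such as `\"` are not handled.

```python
from typing import Iterator, Tuple

HEX_DIGITS = set("0123456789abcdefABCDEF")


def scan(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (kind, value) pairs; kind is 'comment', 'string', or 'code'."""
    i, n = 0, len(text)
    while i < n:
        if text.startswith("(*", i):
            # Comment state: track nesting, never look at escapes.
            start, depth = i, 0
            while i < n:
                if text.startswith("(*", i):
                    depth += 1
                    i += 2
                elif text.startswith("*)", i):
                    depth -= 1
                    i += 2
                    if depth == 0:
                        break
                else:
                    i += 1
            yield "comment", text[start:i]
        elif text[i] == '"':
            # String state: this is the only place escapes are decoded.
            i += 1
            chars = []
            while i < n and text[i] != '"':
                if text.startswith("\\|", i):
                    digits = text[i + 2:i + 8]
                    if len(digits) != 6 or not set(digits) <= HEX_DIGITS:
                        raise ValueError(f"malformed escape at position {i}")
                    chars.append(chr(int(digits, 16)))
                    i += 8
                else:
                    chars.append(text[i])
                    i += 1
            if i >= n:
                raise ValueError("unterminated string")
            i += 1  # skip the closing quote
            yield "string", "".join(chars)
        else:
            # Code state: consume up to the next string or comment opener.
            start = i
            while i < n and text[i] != '"' and not text.startswith("(*", i):
                i += 1
            yield "code", text[start:i]


for kind, value in scan('(* corner \\|<-case *) A="\\|01F600"'):
    print(kind, repr(value))
```

Since the scanner always knows which state it is in, the question of whether a `\|` should be interpreted never needs a fallback or a retry.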

Member

@rocky, Nov 25, 2024

> So maybe we need to track this, and reload the last line. I will change the comment to mention this.

Yes, please open an issue for this. Long comments on specific bugs are better put in the issue tracker. There, one can be extremely verbose, more so than in comments.

+     # TODO: handle the corner case where the rest of the line
+     # include escaped sequences, out of the comment.
else:
break
if comment: