Skip to content
This repository has been archived by the owner on Aug 17, 2023. It is now read-only.

Newline Fix #75

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open

Newline Fix #75

wants to merge 4 commits into from

Conversation

AndrewHawes
Copy link

The parser was only looking for the line feed character ('\n'), so when it encountered an empty line with a Windows-style newline beginning with a carriage return ('\r\n'), it would miss it and instead return a PlaintextNode. This would mess up the structure of the generated HTML if the indentation prior to the newline characters was incorrect. For example, a blank line with no indentation would collapse all previous elements and put the following content after the html closing tag.

nodes.py was returning a PlaintextNode upon encountering a carriage return and skipping the following newline character. Added check for '\r'.
Parser was missing newlines on Windows files as it was only checking for the line feed character ('\n'). This was causing it to return a PlaintextNode when encountering a carriage return ('\r'), causing all sorts of problems if the white space before the newline character didn't provide the correct level of indentation (such as collapsing all elements on a completely blank line and placing following content after the html element's end tag).
@rowanseymour
Copy link
Member

The way new line support usually works in Python is that \n, \r\n, and \r are all translated to just \n at the file level.. isn't that happening in this case? See https://docs.python.org/3/library/functions.html#open

@AndrewHawes
Copy link
Author

It isn't in this case, because the file is being opened with codecs.open rather than open, which doesn't perform conversion on \n. So the parser is reading the line ending exactly as it's written, coming across the '\r', not recognizing it, and returning a PlaintextNode. https://docs.python.org/3/library/codecs.html?highlight=codecs#codecs.open

So with how it's working now, say you have a blank line between the head and body tags in your haml, and that blank line has no white space. The parser will return a PlaintextNode with an indentation of 0, closing all previously opened elements, and it will place the body tag after the closing html tag. It will only do this for a file with Windows line endings. It will render the same file normally if it's created with Unix line endings.

(I actually like the behavior better like this, as long as all of your blank lines are indented to where they need to be. This way, blank lines after a tag will be placed after the closing tag in the rendered document rather than before it, and there's no longer the problem with duplicate newlines. The problem is that if you're not paying attention, one missing space on a blank line can completely screw up the output.)

I guess I'd accidentally added white space to a couple empty lines when I last edited this, which is slightly ironic given the nature of the modification.
@@ -48,6 +48,16 @@ def read_node(stream, prev, compiler):
if indent:
indent = indent[0] * len(indent)

# empty lines with carriage returns are recorded as newlines on previous node
# if followed by a newline, it is skipped
if stream.text[stream.ptr] == '\r':
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think it would be easier to understand this if it was happening at the stream level.. what if we added def is_newline() and def read_newline() to the Stream class?

@rowanseymour
Copy link
Member

@AndrewHawes ah ok that makes sense - have added a comment

Base automatically changed from master to main January 21, 2021 14:38
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants