Questions on whitespace handling #50

Open
mbakeranalecta opened this issue Sep 9, 2014 · 13 comments

@mbakeranalecta
Contributor

Information on whitespace handling is located in several different places in the interface and in several different places in the documentation. This makes it hard to get a sense of the whole and of how the various parts and settings affect each other. I have been attempting to write a topic that stitches the whole picture together, but I have a number of questions:

  1. There are two different settings that deal with schema aware whitespace handling: Editor>Format>XML>Schema aware format and indent and Editor>Edit modes>Author>Schema aware>Schema aware normalization, format and indent. The second is obviously specific to Author mode, but the first is general and should presumably apply to any mode. What is the actual difference between them and what do they affect?

  2. For elements listed in the default space list, is the content normalized and left as is, or is it normalized and then formatted and indented?

  3. If there are elements listed in the Preserve space, Default space, and Mixed content lists, and Schema aware format and indent is enabled, which takes precedence: the schema definitions or the content of these lists?

  4. There are three menu items that do indenting:

    • Document>Source>Format and Indent,
    • Document>Source>Indent Selection
    • Document>Source>Format and Indent Element

    Why does the second one not include the word "format"? Is the functionality different? More limited?

  5. http://www.oxygenxml.com/doc/ug-editor/#topics/author-whitespace-handling.html lists two sets of formatting and indenting rules, one for when a document is opened in Author mode, and one for when it is saved.

    • Why would the rules be different for opening and for saving?
    • Which of these rules would be applied when switching from author mode to text mode (or grid mode)?
    • Which would be applied when switching from text mode or grid mode to author mode?
    • Is the difference one that the reader actually needs to know about? In other words, is there some decision that the reader needs to make that depends on knowing the exact details of these rules and how they differ?
    • The section on the rules on opening ends with "Otherwise the white-spaces are ignored." In this context it is not clear what "ignored" means. Does it mean that the white spaces are suppressed or that they are included as is?

  6. What does zero-size indent mean? Does it simply mean that there is no indenting of the content -- that every line starts at the left margin?

  7. http://www.oxygenxml.com/doc/ug-editor/#tasks/how-to-use-zero-size-indent.html says that to use zero indent, you should disable Detect indent on open and set the indent to zero, but there is an option that says Use zero-indent if detected. Shouldn't the topic say: if Detect indent on open is selected, select "Use zero-indent if detected" and set the indent to zero?

  8. What is the difference between Preferences>Editor>Format>Indent with tabs and Preferences>Editor>Indent with tabs?

@georgebina
Contributor

Hi Mark,

(1) The setting from the Author specific page replaces the one from the more generic options. So, the Author mode will look only at the setting from the Author specific page to determine if schema information will be used or not for loading and serialization of the document.
For example, if you have a document like:

<test xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:noNamespaceSchemaLocation="test.xsd">
  <p>test test test test test test test test test test test test test test test test test test test test </p>
</test>

and a schema like

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
  <xs:element name="test">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="p"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="p" type="xs:string"/>
</xs:schema>

then the schema information will tell oXygen that the whitespace of the p element needs to be preserved, because the built-in string type has the whiteSpace facet with the value preserve. If this information is used, oXygen will not split the content of p across multiple lines; otherwise the content will be formatted by inserting a new line and indent, resulting in something like

<test xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:noNamespaceSchemaLocation="test.xsd">
  <p>test test test test test test test test test test test test test
    test test test test test test test </p>
</test>

Other whitespace information comes from default attributes like xml:space and from the distinction between mixed and element-only content. All of this is needed to get the correct whitespace behavior.
Again, Author mode will look at the setting from the Author-specific page, while Text mode will look at the more generic setting.
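
To make the difference concrete, here is a minimal Python sketch of the decision described above. It is not oXygen's implementation: the function name `serialize_text`, the `preserve` flag, the 72-column width and the four-space continuation indent are all assumptions chosen for illustration.

```python
import textwrap

def serialize_text(text: str, preserve: bool, width: int = 72,
                   indent: str = "    ") -> str:
    """Return element text as a formatter might write it out (illustrative only).

    preserve=True models the case where schema information is used and the
    element type (for example xs:string) says whitespace must be preserved.
    """
    if preserve:
        # Whitespace is significant: leave the content untouched.
        return text
    # Whitespace is insignificant: collapse runs of whitespace and re-wrap.
    collapsed = " ".join(text.split())
    return textwrap.fill(collapsed, width=width, subsequent_indent=indent)

content = "test " * 20
kept = serialize_text(content, preserve=True)      # stays on one line
wrapped = serialize_text(content, preserve=False)  # split and indented
```

With `preserve=True` the content of `p` comes back unchanged; with `preserve=False` it is split across lines, much like the second listing above.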

Regards,
George

@georgebina
Contributor

Hi Mark,

(2) If an element is added to the default space list then it is processed as if it had xml:space="default" on it. This means that if, for example, that element is inside a pre element, default whitespace processing will be applied to it, clearing the preserve flag for its content. The content of that element is then freely formatted, indented and normalized, ignoring any context information from its parent that may say we are in a whitespace preserve state. In the above example, if we enter inside the p element content something like

    <x> <y/> </x>

then this will remain as is unless we add x to the default space list, in which case the y element will go on a separate line.
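
The inheritance described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not oXygen's code: `preserve_states` and its `default_space` parameter are hypothetical names, and `xml:space="preserve"` on `p` stands in for the preserve information that, in the example above, would come from the schema.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:space under the predefined XML namespace.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"

def preserve_states(elem, default_space=frozenset(), inherited=False, out=None):
    """Map each element tag to True if its content is in a preserve state."""
    out = {} if out is None else out
    declared = elem.get(XML_SPACE)
    if declared == "preserve":
        state = True
    elif declared == "default" or elem.tag in default_space:
        # Explicit default, or listed in the (hypothetical) default space
        # list: clear the preserve flag inherited from the parent.
        state = False
    else:
        state = inherited
    out[elem.tag] = state
    for child in elem:
        preserve_states(child, default_space, state, out)
    return out

doc = ET.fromstring('<p xml:space="preserve"><x> <y/> </x></p>')
without_list = preserve_states(doc)
with_list = preserve_states(doc, default_space={"x"})
```

Without the list, `x` inherits the preserve state from `p`, so its content would be left as is. With `x` in the list, the flag is cleared for `x` and everything below it, which is why `y` may be moved to its own line.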

Regards,
George

@georgebina
Contributor

Hi Mark,

(3) We can obtain whitespace information from multiple sources, including the document itself, the options, the schema information and the CSS. Each of these provides information that tells us whether the whitespace should be preserved or not and, in the latter case, whether we are in mixed content or in element-only content (where we can ignore it completely). So we basically end up with 3 levels: ignore, normalize and preserve. When an element is processed, the parent also provides a whitespace processing context, expressed as one of the same levels. The precedence is not between the different sources of information but between the whitespace processing levels: the source that sets the higher level wins. For example, if a source says that the whitespace should be preserved then we will preserve it, entering a whitespace preserving state. This state is exited when we get a specific indication of default whitespace processing, from the options or from the document itself (xml:space="default"). Mixed also wins over element-only: for example, even if the schema says that some content is element-only, when we detect text inside that element we switch to mixed mode.
One exception is the option from the Author-specific page that discards schema information saying an element is mixed if that element contains only elements and all of them are displayed as blocks by the associated CSS. This is the case for a section or a table entry: they are defined in the DTD as having mixed content, but in many cases they contain only blocks and the whitespace between those blocks can be ignored.
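
A rough way to picture the "highest level wins" rule is the Python sketch below. It is a simplified reading of the description above, not oXygen's algorithm: the names `Level` and `effective_level` are hypothetical, the way an explicit default resets the inherited context is an assumption, and the Author-specific exception for block-only content is left out.

```python
from enum import IntEnum
from typing import Iterable, Optional

class Level(IntEnum):
    IGNORE = 0      # element-only content: whitespace can be dropped
    NORMALIZE = 1   # mixed content: whitespace is collapsed
    PRESERVE = 2    # whitespace is kept exactly as written

def effective_level(parent: Level, sources: Iterable[Optional[Level]],
                    explicit_default: bool = False,
                    has_text: bool = False) -> Level:
    """Combine the parent context with what each source says (None = silent)."""
    # Assumption: an explicit default (xml:space="default" or an option)
    # exits whatever state was inherited from the parent.
    context = Level.IGNORE if explicit_default else parent
    votes = [s for s in sources if s is not None]
    if has_text:
        # Text detected inside the element forces at least mixed handling.
        votes.append(Level.NORMALIZE)
    # No source has priority as such; the highest level wins.
    return max([context, *votes])

schema_says, options_say = Level.IGNORE, Level.PRESERVE
winner = effective_level(Level.IGNORE, [schema_says, options_say])
inherited = effective_level(Level.PRESERVE, [Level.IGNORE])
reset = effective_level(Level.PRESERVE, [Level.IGNORE], explicit_default=True)
mixed = effective_level(Level.IGNORE, [Level.IGNORE], has_text=True)
```

Here `winner` and `inherited` come out as `PRESERVE`, `reset` as `IGNORE`, and `mixed` as `NORMALIZE`, matching the four situations described in the paragraph.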

Best Regards,
George

@georgebina
Contributor

Hi Mark,

(4) Indent Selection does not introduce any new lines or other whitespace inside the line; it changes only the indentation at the start of each line. The other actions include "Format" in their names because they may change all the whitespace, not only the indentation.
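
As an illustration of "indent only", the Python sketch below replaces leading whitespace and nothing else. It is not how oXygen computes indentation: the function `indent_selection` is hypothetical, and the nesting level of each line is passed in by hand rather than derived from the XML.

```python
def indent_selection(lines, levels, unit="  "):
    """Re-indent each line to its given level without touching anything else."""
    # Only the leading spaces/tabs are replaced; no line breaks are added
    # and whitespace inside a line is left exactly as it was.
    return [unit * level + line.lstrip(" \t")
            for line, level in zip(lines, levels)]

selection = ["<a>", "      <b>one   two</b>", "\t</a>"]
result = indent_selection(selection, levels=[0, 1, 0])
```

The number of lines is unchanged and the three spaces between `one` and `two` survive, whereas a format action would be free to rewrite both.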

Best Regards,
George

@georgebina
Contributor

Hi Mark,

(6) Zero indent will perform formatting, but the lines will start immediately at the left margin; no indenting whitespace will be added.
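
The effect can be reproduced outside oXygen with Python's standard library (this uses `xml.etree.ElementTree.indent`, available from Python 3.9, purely as an analogy; the sample document is made up):

```python
import xml.etree.ElementTree as ET

SAMPLE = "<a><b><c/></b><d>text</d></a>"

# Zero-size indent: line breaks are added, but no leading whitespace.
root = ET.fromstring(SAMPLE)
ET.indent(root, space="")
zero = ET.tostring(root, encoding="unicode")

# Same formatting with a two-space indent, for comparison.
root2 = ET.fromstring(SAMPLE)
ET.indent(root2, space="  ")
two = ET.tostring(root2, encoding="unicode")
```

Both results have the same line breaks; they differ only in that every line of `zero` starts at the left margin.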

Best Regards,
George

@georgebina
Contributor

Hi Mark,

(7) If detect indent on open is enabled then oXygen will try to detect the indent by analyzing the input document. If the detection determines that there is not enough information to decide on the indent size, the default indent size is used. The "use zero indent, if detected" option allows oXygen to propose zero as a detected indent; otherwise zero is considered an invalid value, and in that case oXygen will report that a specific indent value was not detected and will use the default.
The instructions can say that if you want to use detect indent on open and work with zero-indented documents then you also need to enable the "use zero indent, if detected" option, otherwise zero will not be considered a valid indent size.
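
The interaction between the two options can be sketched as follows. The heuristic here (greatest common divisor of the leading-space counts, with a unit of 1 treated as inconclusive) is an assumption made for the example and is almost certainly simpler than what oXygen does; `detect_indent`, `default` and `allow_zero` are hypothetical names.

```python
from functools import reduce
from math import gcd

def detect_indent(lines, default=4, allow_zero=False):
    """Guess the indent size from leading spaces; fall back to `default`."""
    widths = [len(l) - len(l.lstrip(" ")) for l in lines if l.strip()]
    if len(widths) < 2:
        return default                    # not enough information
    unit = reduce(gcd, widths)
    if unit == 0:
        # Every line starts at the left margin. Zero is only accepted as a
        # detected value when the "use zero indent" option is on.
        return 0 if allow_zero else default
    # Assumed rule: a unit of 1 means the indents are inconsistent.
    return unit if unit > 1 else default

indented = ["<a>", "  <b>", "    <c/>", "  </b>", "</a>"]
flat = ["<a>", "<b/>", "</a>"]
mixed = ["<a>", "  <b>", "   <c/>"]
```

With these inputs, `detect_indent(indented)` gives 2, `detect_indent(flat, allow_zero=True)` gives 0, and both `detect_indent(flat)` and `detect_indent(mixed)` fall back to the default of 4.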

Best Regards,
George

@georgebina
Contributor

Hi Mark,

(8) There is currently no such option as "Preferences>Editor>Indent with tabs".

Best Regards,
George

@raducoravu
Contributor

Back to question (5):

The Open and Save behaviors are two important parts of the editing session.
When the XML content is loaded from disk, most of the text nodes are normalized, because in the Author visual editing mode you do not want to see the line breaks and indents that reflect how the XML is saved on disk.
On Save, the normalized text, which is now the content edited in the Author editing mode, needs to be serialized as XML. As I explained in another issue, by default only the modified nodes are serialized; the XML content of the other nodes remains identical to the initial content. When this serialization is done, Oxygen will usually indent and add line breaks in the text content, except in the cases described in the user's manual.

More details below:

   Why would the rules be different for opening and for saving?

Open and save are opposite operations, one needs to normalize or strip white spaces and the other needs to re-add spaces and line breaks in the XML so that it is easier to read.
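
A small Python sketch of the two directions, for text whose whitespace is not significant. The function names `on_open` and `on_save`, the 40-column width and the two-space indent are assumptions made for the example; the actual rules are the ones listed in the user's manual.

```python
import textwrap

def on_open(text: str) -> str:
    """Normalize: collapse line breaks and indentation into single spaces."""
    return " ".join(text.split())

def on_save(text: str, width: int = 40, indent: str = "  ") -> str:
    """Serialize: re-add line breaks and indentation for readability."""
    return textwrap.fill(text, width=width,
                         initial_indent=indent, subsequent_indent=indent)

on_disk = ("  The quick brown fox\n"
           "  jumps over the lazy dog and keeps\n"
           "  running through the field.")
in_author = on_open(on_disk)   # what the visual editor works with
saved = on_save(in_author)     # what goes back to disk
```

The saved layout may differ from the original layout on disk, but normalizing it again yields the same text, which is the sense in which the two operations are opposites.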

   Which of these rules would be applied when switching from author mode to text mode (or grid mode)?
   Which would be applied when switching from text mode or grid mode to author mode?

The switch to the Author mode does the same thing as opening the XML directly in the Author mode.
The switch from Author to Text or Grid also serializes the XML identically as saving it to disk.

   Is the difference one that the reader actually needs to know about? In other words, is there some decision that the reader needs to make that depends on knowing the exact details of these rules and how they differ?

The reader might at some point notice that Oxygen adds line breaks inside an XML element which the reader considers to be space preserve. If so, the reader would need to know where to make modifications (the CSS, the associated schema, the list of preserve space elements in the "Editor / Format / XML" preferences page, or the xml:space attribute) in order to prevent Oxygen from adding line breaks and indents inside the element.
Actually, the topic you referred to does not seem to mention that the space-preserve information can also be obtained from the associated schema or from the preferences page specified above.

The section on the rules on opening ends with "Otherwise the white-spaces are ignored." In this context it is not clear what "ignored" means. Does it mean that the white spaces are suppressed or that they are included as is?

I would say "stripped".

@mbakeranalecta
Contributor Author

Open and save are opposite operations, one needs to normalize or strip white
spaces and the other needs to re-add spaces and line breaks in the XML so
that it is easier to read.

Ah, okay, that makes sense, and explains why the specific actions described are different.

The reader might at some point notice that Oxygen adds line breaks inside an XML
element which the reader considers to be space preserve. If so, the reader would
need to know where to make modifications (the CSS, the associated schema,
the list of preserve space elements in the "Editor / Format / XML" preferences page, or
the xml:space attribute) in order to prevent Oxygen from adding line breaks
and indents inside the element.

Right, so I think the real issue the reader needs to understand is how Oxygen determines what is significant whitespace (which it should not mess with) and what is insignificant whitespace (which it is free to add or remove in order to format and indent the content).

What I am trying to do in the new topic I am writing is to describe the rules that Oxygen uses to determine whether whitespace is significant or not. The use case for the reader is to identify when Oxygen is treating whitespace the reader deems significant as insignificant, and to figure out how to tell Oxygen that this whitespace is significant.

The other use case that matters to the reader is to maintain consistency in how a document is formatted and to avoid unnecessary format changes, so as to avoid trivial differences between versions of a file in a VCS. The rules that govern this are in the Format preferences.

So, the reader needs to know two things:

  • What are the rules for identifying significant whitespace, and how do I affect the recognition of significant whitespace?
  • What are the rules for formatting and indenting content, and how do I affect them to minimize trivial differences between versions?

Knowing the exact steps that Oxygen takes when saving, loading, switching, etc. does not help the user with either of these use cases. What they need to know is that Oxygen formats and indents on these events, and that it follows the rules above when it does so.

Agreed?

@mbakeranalecta
Contributor Author

George. Thanks so much for the very detailed responses!

Number 3 seems to be the key to understanding how the whole whitespace system works. I'm going to write the topic in those terms.

@mbakeranalecta
Contributor Author

George,

Re 7: I was able to get Oxygen to recognize zero indent, and also to create a file with enough variation that it did not recognize it and defaulted to the specified indent size. However, Oxygen did not report the inability to determine the indent size (at least in any form I could see). It just switched to the default.

Was it supposed to warn me? And if so, how?

@georgebina
Contributor

Hi Mark,

No, it will not warn the user in any way, it will just use the default indent.
A basic functionality of a text editor is to allow the user to set an indent size. This extends that and tries to detect an indent. In the cases when the users open files that are properly indented with a specific indent, oXygen will automatically detect and use that. If the file has mixed indenting that will not allow oXygen to determine the indent then the default indent will be used, as if this advanced feature was not available. I think a warning would be annoying - this undecided situation happens generally when the file contains only a few lines, and that is generally a new file that the user just starts working on, so using the default indent should be just fine and a warning will just add confusion.

Best Regards,
George

@georgebina
Contributor

I agree that the user does not need all the internal details of whitespace handling.
