Whitespace handling in Text and Author modes #49

Open
mbakeranalecta opened this issue Sep 4, 2014 · 3 comments

Comments

@mbakeranalecta
Contributor

Is there any overlap between the settings that determine whitespace handling in Author mode (as described in author-whitespace-handling.dita) and the settings that handle Format and Indent in Text mode? It seems like the two sets of rules are entirely independent of each other. Is this true?

Incidentally, it will be confusing to readers that the term "whitespace handling" is used in regard to Author mode and "Format and Indent" in regard to Text mode.

I'm not certain which is the better term to use. "Whitespace" could mean any one of three things in Author -- whitespace in the XML, whitespace in the Author view, whitespace in the output. "Format" tends to imply font selection, etc.

I'd actually be inclined to call it "XML pretty printing" everywhere. That will help define its scope for the reader and avoid confusion with the whitespace/formatting of the editor display and/or output.

@raducoravu
Contributor

> I'd actually be inclined to call it "XML pretty printing" everywhere.

Actually, we used to call the "Format and Indent" action "Pretty Print" a couple of years ago, and we decided to use "Format and Indent" instead. In my opinion "Format and Indent" is a pretty good name for what the action does; if you google it, you will see it is used for such actions. "Pretty Print" is also a good name for what the action does, but it does not sound very professional in my opinion.

Now going back to whitespace handling in the Author visual editing mode, the most important setting in my opinion is this one:

The Preferences -> "Editor / Edit modes / Author" page has a section called "Format and indent", which by default is set to format and indent only the modified content.

This means that when you open an XML document in the Author editing mode, modify two paragraphs in it, and then save or switch to the Text editing mode, all the other paragraphs/elements are saved as text exactly as they were loaded, without any extra whitespace. This is good when using version control because it produces minimal changes.
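As a rough illustration of the "modified content only" idea, here is a minimal Python sketch. It is not Oxygen's implementation: the `Block` class and the `reindent` and `save` helpers are hypothetical names, and the model is reduced to a flat list of paragraphs that each remember their source text as loaded.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical editor-side record for one paragraph."""
    source: str             # text exactly as loaded from the file
    text: str               # current content in the editor model
    modified: bool = False  # set when the user edits the paragraph


def reindent(text, indent="  "):
    # Stand-in for "format and indent": collapse whitespace runs and indent.
    return f"{indent}<p>{' '.join(text.split())}</p>"


def save(blocks):
    # Untouched blocks are written back byte for byte; only modified
    # blocks are re-serialized with the formatting options.
    return "\n".join(
        reindent(b.text) if b.modified else b.source for b in blocks
    )


blocks = [
    Block(source="<p>First   paragraph,\n oddly spaced</p>",
          text="First paragraph, oddly spaced"),
    Block(source="<p>Second</p>", text="Second"),
]
blocks[1].text = "Second,   now edited"
blocks[1].modified = True
print(save(blocks))
```

The first paragraph keeps its odd spacing, so a version control diff would show a change only for the second one.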

The modified paragraphs will be re-serialized to XML. When this is done, if they are not in a space-preserve context (specified via CSS, the schema, the xml:space attribute, or the list of space-preserve elements on the "Editor / Format / XML" Preferences page), they will be formatted and indented using the settings on the "Editor / Format / XML" page. So there is an overlap.
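A minimal Python sketch of a space-preserve check may help make this concrete. It covers only two of the sources listed above (the xml:space attribute and a list of element names); the CSS and schema sources are left out. The function name and the `preserve_names` parameter are assumptions made for illustration, not Oxygen's API.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:space under the predefined XML namespace.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def in_space_preserve_context(path, preserve_names):
    """Return True if the last element in `path` must keep its whitespace.

    `path` lists the elements from the root down to the element of interest.
    `preserve_names` is a hypothetical stand-in for the space-preserve
    element list from the preferences page.
    """
    # Walk from the innermost element outward: the nearest xml:space wins.
    for element in reversed(path):
        value = element.get(XML_SPACE)
        if value == "preserve":
            return True
        if value == "default":
            break
    # Otherwise fall back to the configured element names.
    return any(element.tag in preserve_names for element in path)


root = ET.fromstring(
    '<topic><pre xml:space="preserve"><b>x</b></pre>'
    "<codeblock>y</codeblock><p>z</p></topic>"
)
pre, codeblock, p = root[0], root[1], root[2]
print(in_space_preserve_context([root, pre, pre[0]], set()))
print(in_space_preserve_context([root, codeblock], {"codeblock"}))
print(in_space_preserve_context([root, p], {"codeblock"}))
```

Only content for which this kind of check fails would be a candidate for formatting and indenting.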

The difference between the formatting and indenting that the Author editing mode does when serializing XML and the "Format and Indent" action invoked in the Text editing mode is that the Author mode in some cases looks at the CSS for hints about the element. For example, it looks in the CSS to see if the element is marked as whitespace-preserve.

It also looks in the CSS to decide whether an element that the schema marks as having mixed content (both text and elements) should still be treated as an element in which whitespace can be added between child elements. This is controlled by the "Indent block-only content" checkbox on the "Editor / Edit modes / Author / Schema aware" Preferences page.

@mbakeranalecta
Contributor Author

Thanks for the additional information on how whitespace handling works. I'll look at all the places it is mentioned and make sure we are telling a complete story.

On the terminology, my first concern is that we use the same term everywhere. "Format and Indent", "Whitespace handling" and "Pretty Print" are all good descriptions of what this function does, but we should be consistent and only use one of them. Currently we use both "Format and Indent" and "Whitespace handling". (Yes, technically "format and indent" and "pretty print" are both ways to do whitespace handling -- but that distinction is going to be lost on many readers.)

The question is, which one should we pick? There are pluses and minuses to each choice.

I don't think there is anything unprofessional sounding about "pretty print". It is a common industry term for this function and is used in many other editors. It is, however, a term from the programming community, not the writing community.

Whitespace handling is even more local -- it is a term from the XML community.

Format and indent is a term from the writing and publishing community. The problem is, it means something different in that community. To that community, format means font choices and page layout. In fact, the one thing many writers know about XML is that it separates formatting from content. So why then would you "format" XML? Others will think formatting XML means applying a stylesheet to it.

The challenge for the Oxygen Docs is that we have two distinct audiences -- writers and XML developers -- who have different vocabularies.

In these cases, my first principle is: don't let the reader guess wrong. Better that the reader not understand, and know that they do not understand, than to have them think they understand when they don't.

By that standard, "format and indent" has the highest chance of the reader guessing wrong. "Whitespace handling" is next. "Pretty print" has the lowest chance of the reader guessing wrong. All programmers will understand it. Some writers will understand it too -- a lot of them document software. Those who don't understand it will probably not guess, but Google it.

It is an important bonus that when you Google "pretty print" the results are unambiguous. They all refer to the same meaning of the word.

That said, I can make any of these work. My main concern is that we pick one term and stick to it.

@georgebina
Contributor

In the context of Author the settings are serialization options - they determine how the Author document model is converted to XML.
For the XML source, the format and indent action parses the document again and then serializes it back as XML source using those options.
The result may not always be pretty - it depends on these serialization options :).
Now, I think the term "XML Serialization" is too technical to be understood by normal users. It is, however, used in XSLT and XQuery, and it was extracted as a separate spec called "XSLT and XQuery Serialization 3.0":
http://www.w3.org/TR/xslt-xquery-serialization-30/

The main difference between what we have at the beginning and what we have after a serialization (from Author, Grid, or Design mode) or after a format and indent action (Text mode) consists of whitespace changes. That means the document is indented differently, and whitespace, including new line characters, may also be removed, modified, or added at other positions. This is what we want to cover with the "format" part of format and indent.
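The parse-then-serialize round trip can be sketched in a few lines of Python. This is only an illustration using the standard library's `xml.etree.ElementTree` (with `ET.indent`, which needs Python 3.9 or later), not the serializer Oxygen uses; the `format_and_indent` function name and its `indent` parameter are assumptions.

```python
import xml.etree.ElementTree as ET


def format_and_indent(source, indent="  "):
    """Parse the XML source, then serialize it back with indentation."""
    root = ET.fromstring(source)
    # Adds indentation only where the existing text is empty or whitespace.
    ET.indent(root, space=indent)
    return ET.tostring(root, encoding="unicode")


source = "<section><p>One</p><p>Two</p></section>"
result = format_and_indent(source)
print(result)

# Stripping all whitespace shows that nothing else changed.
print("".join(result.split()) == "".join(source.split()))
```

Changing the `indent` argument changes the result, which is the sense in which the outcome depends on the serialization options rather than on the action itself.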

Personally, I got used to "Format and Indent" for this action.
The Author-specific options on the "Editor / Edit modes / Author / Schema aware" page let the user control whether schema information is used when serializing the Author document as XML (the schema-aware normalization flag). The "Indent block-only content" option handles situations where an element such as the DITA section element accepts mixed content according to the schema/DTD, but its actual content consists only of blocks, for instance only "p" elements. In this case, we can use the CSS information to determine that all children are elements and that all are rendered with display:block, and thus treat the content as element-only and freely format and indent it (the whitespace will not be significant).
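The block-only check described here could look roughly like the following Python sketch. It is an assumption-laden illustration, not Oxygen's code: the `CSS_DISPLAY` table is a hypothetical stand-in for display values resolved from the CSS, and `is_block_only` is an invented name.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for display values resolved from the CSS.
CSS_DISPLAY = {"p": "block", "ul": "block", "b": "inline", "i": "inline"}


def is_block_only(element, css_display=CSS_DISPLAY):
    """True when the element holds only block children and no real text."""
    if len(element) == 0:
        return False  # no child elements, nothing to indent between
    if element.text and element.text.strip():
        return False  # text before the first child is significant
    for child in element:
        # Unknown elements are treated as inline, the cautious choice.
        if css_display.get(child.tag, "inline") != "block":
            return False
        if child.tail and child.tail.strip():
            return False  # text between or after children is significant
    return True


blocks_only = ET.fromstring("<section><p>a</p><p>b</p></section>")
mixed = ET.fromstring("<section>Some text <b>bold</b></section>")
print(is_block_only(blocks_only))
print(is_block_only(mixed))
```

When the check succeeds, whitespace between the children can be added or changed freely, even though the schema declares the element as mixed content.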

For 16.1 I will keep the same terminology, format and indent - this is used also in the UI and we cannot make any changes to that right now.

After 16.1 I will log an issue to look into re-structuring the formatting options and as part of that we can discuss also the terminology.

Best Regards,
George
