[Informational] PowerShells Default encoding is UTF-8 without a BOM for textfiles #59

Kriegel · 2020-08-16T12:28:09Z

Hi Dan!

In response to your great effort on file encoding, I would like to point you to the following information about Microsoft's decision on text file encoding (in case you have not already seen it):

Default encoding is UTF-8 without a BOM except for New-ModuleManifest

Please allow me a few words about it and your module.
I am also a BIG fan of the BOM, because files without a BOM force us to guess the encoding ... so I am sad about Microsoft's decision too ...

I have the impression that your module tries to serve two purposes: encoding and formatting.
I think it would be better to separate these completely. That would make the module easier to maintain, and contributors would find it easier to contribute to one of the two topics.

Anyway...
I really, really appreciate the work you have done here.

Have a good time, stay well
Peter

DTW-DanWard · 2020-08-16T15:39:09Z

Thank you! I'm glad to see folks are still using this Beautifier tool. I've had no time to update it the past few years. Three or four years ago I was looking into rewriting it entirely with the AST and had some crazy awesome ideas about functionality to add: obfuscation, auto-wrapping of functions in regions with the text Function: [function name], letting users write their own rules, etc. Unfortunately I ran into a show stopper and had to abandon those ideas.

Yes, originally I had separated the encoding and formatting functionality into 2 separate modules and it was more friction for users to get it running so I ended up combining them. Of course, this was before the code was up in PowerShell Gallery, so maybe now I could separate them.

The automatic adding of the UTF8 BOM was something I had to do. I can't remember the exact API (probably the tokenize API in the .NET framework, not a particular cmdlet) but it would fail with an exception if there was non-ASCII code in a file without the UTF8 BOM. It wasn't smart enough to look ahead and determine the proper encoding. Of course this was with Windows PowerShell / earlier .NET code so it's possible & likely they fixed that with PS 6 / .NET Core. However if I remove that auto-adding of UTF8, it breaks for Windows PowerShell users. I guess I could check to see which version of PowerShell they are using....
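To make the "look ahead and determine the proper encoding" idea concrete, here is a minimal sketch of BOM sniffing with a strict UTF-8 fallback. It is written in Python purely for illustration and is not the module's code; the function names `sniff_encoding` and `needs_utf8_bom` are hypothetical, and the rule "add a BOM only when a BOM-less UTF-8 file contains non-ASCII bytes" is an assumption based on the failure described above.

```python
import codecs

# BOM signatures, longest first, so that the UTF-32 LE BOM (FF FE 00 00)
# is not mistaken for the UTF-16 LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_encoding(data: bytes):
    """Return (encoding, bom_length); encoding is None if it cannot be determined."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    # No BOM: accept the bytes as UTF-8 only if they decode strictly.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return None, 0
    return "utf-8", 0


def needs_utf8_bom(data: bytes) -> bool:
    """True for BOM-less, valid UTF-8 content that contains non-ASCII bytes.

    Assumption: this is the case that would trip a tokenizer which
    falls back to a legacy code page when no BOM is present.
    """
    encoding, bom_len = sniff_encoding(data)
    return encoding == "utf-8" and bom_len == 0 and not data.isascii()
```

A caller could read a script with `open(path, "rb").read()` and prepend `codecs.BOM_UTF8` only when `needs_utf8_bom` returns True, leaving pure-ASCII files and files that already carry a BOM untouched.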

Thank you again for the kind words!

Kriegel · 2020-08-17T17:46:40Z

Hi Dan,

If I understand correctly, your module relies on the "old" .NET tokenizer!?

This .NET tokenizer knows only 20 different token types.

PowerShell 3 introduced a new and more powerful .NET parser that breaks PowerShell code up into a much more detailed range of tokens:

[System.Management.Automation.Language.Parser]

The new parser knows 150 different token kinds, each of which can be decorated with 26 token flags.
This provides a very detailed picture, especially when it comes to nested tokens.
Additionally, it returns the Abstract Syntax Tree (AST).

In my experience, the AST is not the holy grail. I prefer working with tokens and use the AST only as a helper.

> users could write their own rules

Yes! PSScriptAnalyzer and its cmdlet Invoke-Formatter offer this, but in a very complicated way... (and the rules are mostly written in C# :-( )

> obfuscation

That is the opposite of pretty-printing / beautifying PowerShell code lol
Convert the code to a Base64 string and you're done lol
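As a rough illustration of the Base64 remark, the sketch below encodes a script the way PowerShell's `-EncodedCommand` parameter expects it: Base64 over the UTF-16LE bytes of the command text. It is in Python for illustration only, the helper names `encode_command` and `decode_command` are hypothetical, and since Base64 is trivially reversible this hides code from a casual glance at best.

```python
import base64


def encode_command(script: str) -> str:
    """Base64-encode a script as UTF-16LE, the form -EncodedCommand accepts."""
    return base64.b64encode(script.encode("utf-16-le")).decode("ascii")


def decode_command(encoded: str) -> str:
    """Reverse encode_command; shows how easily the 'obfuscation' is undone."""
    return base64.b64decode(encoded).decode("utf-16-le")
```

The result of `encode_command("Get-Date")` could then be passed as `powershell.exe -EncodedCommand <string>`, and `decode_command` recovers the original text in one call.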

Do you plan further development of this module?
If so, I think it will end in a complete rewrite.

I am currently developing my own PowerShell beautifier here on GitHub; it is at a very early stage:
BeautyOfPower

In my tests I use only the new parser, which reads the PowerShell source files without problems.
So far I have tested only with Windows PowerShell 5.1 and PowerShell 7.

That includes even your "bad" test scripts such as UTF8_NoBOM.ps1 (and the others without a BOM).
So currently I do not care about encoding.
(And if I did, I think this is a topic that must be solved by the end user, not in my module, apart from warning the user and displaying encoding errors.)

I do not want to present another man's (hard) work as my own!
If you do not plan to develop your PowerShell Beautifier further, please give me explicit permission to copy over (adopt) code AND
to copy over large parts of your documentation from your module here into my module "BeautyOfPower".

(I also like your documentation style and how deeply it dives into the internals.)

DTW-DanWard · 2020-10-06T22:03:20Z

Hi - sorry for the very late reply, this got lost in my Inbox. Yes, I've abandoned this for now and have no intention of picking it back up. Yes, I built this on the older tokenizer and the code does need a rewrite but I don't have time for that. Feel free to borrow any code or ideas that you'd like.
