[Informational] PowerShell's default encoding is UTF-8 without a BOM for text files #59
Comments
Thank you! I'm glad to see folks are still using this Beautifier tool. I've had no time to update it the past few years. Three or four years ago I was looking into rewriting it entirely with the AST and had some crazy awesome ideas about functionality to add: obfuscation, auto-wrapping of functions in regions with the text Function: [function name], letting users write their own rules, etc. Unfortunately I ran into a show stopper and had to abandon those ideas.
Yes, originally I had separated the encoding and formatting functionality into 2 separate modules, but that made it more friction for users to get it running, so I ended up combining them. Of course, this was before the code was up in the PowerShell Gallery, so maybe now I could separate them.
The automatic adding of the UTF8 BOM was something I had to do. I can't remember the exact API (probably the tokenize API in the .NET Framework, not a particular cmdlet), but it would fail with an exception if there were non-ASCII characters in a file without the UTF8 BOM. It wasn't smart enough to look ahead and determine the proper encoding. Of course, this was with Windows PowerShell / earlier .NET code, so it's possible and likely that they fixed it with PS 6 / .NET Core. However, if I remove the auto-adding of the UTF8 BOM, it breaks for Windows PowerShell users. I guess I could check which version of PowerShell they are using... Thank you again for the kind words!
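The version check mentioned above could look roughly like the following. This is a minimal, language-neutral sketch in Python, not the module's actual code: the function names are hypothetical, the threshold of major version 6 is taken only from the guess in this thread that PS 6 / .NET Core no longer needs the BOM, and a file without any BOM is assumed to be UTF-8.

```python
import codecs

# Byte order marks the sketch recognises. A file that already starts with any
# of these is left alone.
KNOWN_BOMS = (
    codecs.BOM_UTF8,
    codecs.BOM_UTF32_LE,
    codecs.BOM_UTF32_BE,
    codecs.BOM_UTF16_LE,
    codecs.BOM_UTF16_BE,
)


def has_bom(data: bytes) -> bool:
    """Return True if the file content starts with a known BOM."""
    return data.startswith(KNOWN_BOMS)


def has_non_ascii(data: bytes) -> bool:
    """Return True if any byte is outside the 7-bit ASCII range."""
    return any(byte > 0x7F for byte in data)


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def needs_bom(data: bytes, ps_major_version: int) -> bool:
    """Hypothetical policy: add a UTF-8 BOM only for Windows PowerShell
    (major version below 6, an assumption from this thread), and only when
    the file has non-ASCII bytes, no BOM yet, and is valid UTF-8."""
    if ps_major_version >= 6:
        return False
    return has_non_ascii(data) and not has_bom(data) and is_valid_utf8(data)


def ensure_bom(data: bytes, ps_major_version: int) -> bytes:
    """Return the content with a UTF-8 BOM prepended when the policy asks for it."""
    if needs_bom(data, ps_major_version):
        return codecs.BOM_UTF8 + data
    return data
```

Pure-ASCII files are never touched under this policy, because ASCII bytes read the same with or without the BOM.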
Hi Dan, if I understand correctly, your module relies on the "old" .NET tokenizer? That tokenizer knows only 20 different token types. Since PowerShell 3 there is a new and more powerful .NET parser that breaks PowerShell code up into a much more detailed range of tokens: [System.Management.Automation.Language.Parser]. The new parser knows 150 different token kinds, each of which can be decorated with 26 token flags. In my experience, the AST is not the holy grail; I prefer tokens and use the AST only as a helper.
Yes! PSScriptAnalyzer and its cmdlet Invoke-Formatter offer this in a very complicated way... (and the rules are mostly in C# :-( )
It is the opposite of pretty-printing / beautifying PowerShell code, lol. Do you plan further development of this module? I am currently developing my own PowerShell beautifier here on GitHub; it is at a very early stage. In my tests I only use the new parser, which reads the PowerShell source files without problems, even your "bad" test scripts such as UTF8_NoBOM.ps1 (and the others without a BOM). I do not like to present another person's (hard) work as mine! (I also like your documentation style and the depth with which you dive into the internals.)
Hi - sorry for the very late reply, this got lost in my Inbox. Yes, I've abandoned this for now and have no intention of picking it back up. Yes, I built this on the older tokenizer and the code does need a rewrite but I don't have time for that. Feel free to borrow any code or ideas that you'd like.
Hi Dan!
In response to your great effort on file encoding, I would like to point you to the following information about Microsoft's decision on text file encoding (in case you have not already seen it):
Default encoding is UTF-8 without a BOM except for New-ModuleManifest
Please allow me some words about it and your module.
I am also a BIG fan of the BOM, because files without a BOM force us to guess the encoding ... so I am sad about MS's decision too ...
I have the impression that your module tries to serve 2 purposes: encoding and formatting.
I think it is better to separate these completely. That makes the code easier to maintain, and contributors have an easier time contributing to one of the two topics.
Anyway...
I really, really appreciate the work you have done here.
Have a good time, stay well
Peter