Handle large history file properly by reading lines in the streaming way #3810
// After seeking, the current position may point at the middle of a history record, or even at a
// byte within a UTF-8 character (history file is saved with UTF-8 encoding). So, let's ignore the
// first line read from that position.
sr.ReadLine();
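For readers following the hunk above, here is a minimal, self-contained sketch of the seek-then-discard idea. It is not the PR's actual implementation: the class name `HistoryTailSketch`, the helper `ReadTailLines`, and the `tailBytes` parameter are hypothetical names chosen for illustration, and the sketch assumes the file is UTF-8 encoded, as the comment in the hunk states.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

static class HistoryTailSketch
{
    // Hypothetical helper: returns the complete lines found in roughly the
    // last `tailBytes` bytes of a UTF-8 text file.
    public static List<string> ReadTailLines(string path, long tailBytes)
    {
        var lines = new List<string>();

        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        // BOM detection is disabled because we may start reading mid-file.
        using (var sr = new StreamReader(fs, Encoding.UTF8, false))
        {
            if (fs.Length > tailBytes)
            {
                // Seek before the reader has buffered anything.
                fs.Seek(-tailBytes, SeekOrigin.End);

                // The position may be in the middle of a line, or even inside
                // a multi-byte UTF-8 character, so the first line is discarded.
                // (If the seek happens to land exactly on a line start, one
                // complete line is dropped as well.)
                sr.ReadLine();
            }

            string line;
            while ((line = sr.ReadLine()) != null)
            {
                lines.Add(line);
            }
        }

        return lines;
    }

    static void Main()
    {
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllLines(path, new[] { "Get-Process", "Get-ChildItem", "Write-Host 'héllo'" });

            // Only lines that start after the seek position are returned.
            foreach (string line in ReadTailLines(path, 25))
            {
                Console.WriteLine(line);
            }
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

Discarding the first line trades at most one history record for never having to decode a truncated record or a split UTF-8 sequence.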
@iSazonov Regarding your comment "I could save the file in Utf16": PSReadLine always saves the history file in UTF-8 encoding. Both File.CreateText and File.AppendText write UTF-8 encoded text to the file, so if a user saves the file in Unicode (UTF-16), it may fail to read correctly once PSReadLine writes something to it. So, I will assume the file is UTF-8 encoded.
Can you please review again? Thanks!
If you are fine with the history file possibly being corrupted in very rare cases, this code looks good.
For reflection only, I have the following questions and doubts.
- The user can actually save the file in a different encoding.
- Only .NET Core at some point started using UTF-8 by default. Windows PowerShell can write in UTF-16 by default.
- Although StreamReader detects the encoding automatically, the current code seems to ignore this as it skips the beginning of the file.
- Why do we store the beginning of a large file when now we don't use it anyway and the user doesn't even know about it? Probably we can find a way to trim it, although it's not easy to think of.
The user can actually save the file in a different encoding.
If the user manually changes the encoding to, say, UTF-16 LE, the history file will be corrupted once PSReadLine writes updates to it, because PSReadLine always writes text in UTF-8. I verified this using both Windows PowerShell and PowerShell 7+.
See the code below (both File.CreateText and File.AppendText write UTF-8 encoded text to the file, on both .NET and .NET Framework):
PSReadLine/PSReadLine/History.cs
Lines 366 to 379 in d045b50
using (var file = overwritten ? File.CreateText(Options.HistorySavePath) : File.AppendText(Options.HistorySavePath))
{
    for (var i = start; i <= end; i++)
    {
        HistoryItem item = _history[i];
        item._saved = true;
        // Actually, skip writing sensitive items to file.
        if (item._sensitive) { continue; }
        var line = item.CommandLine.Replace("\n", "`\n");
        file.WriteLine(line);
    }
}
Only .NET Core at some point started using UTF-8 by default. Windows PowerShell can write in UTF-16 by default.
Again, File.CreateText and File.AppendText write UTF-8 encoded text to the file, on both .NET and .NET Framework.
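The corruption scenario described above can be reproduced in isolation with a small sketch. It is not taken from PSReadLine; the class name `MixedEncodingSketch` and the sample commands are made up for illustration. It assumes a user has re-saved the file as UTF-16 LE and that a later write goes through File.AppendText.

```csharp
using System;
using System.IO;
using System.Text;

static class MixedEncodingSketch
{
    static void Main()
    {
        string path = Path.GetTempFileName();
        try
        {
            // Simulate a user re-saving the history file as UTF-16 LE (with BOM).
            File.WriteAllText(path, "Get-Date\r\n", Encoding.Unicode);

            // File.AppendText appends UTF-8 encoded text, regardless of the
            // encoding of the existing content.
            using (StreamWriter writer = File.AppendText(path))
            {
                writer.WriteLine("Get-Process");
            }

            // The raw bytes now mix two encodings: two bytes per character for
            // the UTF-16 part, followed by one byte per character for the
            // appended ASCII-range UTF-8 part.
            Console.WriteLine(BitConverter.ToString(File.ReadAllBytes(path)));

            // File.ReadAllText honors the UTF-16 BOM, so the appended UTF-8
            // bytes are decoded as UTF-16 code units and the appended command
            // is not expected to round-trip.
            string text = File.ReadAllText(path);
            Console.WriteLine(text.Contains("Get-Process"));
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

This is why assuming UTF-8 on the read side is consistent with the write path: any other encoding would stop being readable after the next append anyway.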
Although StreamReader detects the encoding automatically, the current code seems to ignore this as it skips the beginning of the file.
I tried it out, and it turns out that StreamReader detects encoding when creating the instance.
Why do we store the beginning of a large file when now we don't use it anyway and the user doesn't even know about it? Probably we can find a way to trim it, although it's not easy to think of.
I think you missed something in the code. If the user sets the max history count to more than 20,000, we will read all content from the file. So, the full history is still accessible to the user if they want it.
@iSazonov Can you please approve the PR if all your concerns are resolved? I need an approval to merge the PR.
Approved with one note, but I don't have full permissions :-)
PR Summary
Fix #3771, Fix #1360, Fix #537
Handle large history file properly by reading lines in the streaming way.