-
Notifications
You must be signed in to change notification settings - Fork 76
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
progress indicator #632
Comments
No real plans yet, sadly. It would take effort to make it go nicely with multithreading and our Y/N user prompts.
|
I was looking into the original pr cause it seems interesting and I had a few questions
|
I think this is the most important use case from the usability perspective, because when users are waiting for 10+ seconds, they want to perform other tasks while waiting, so they should be given feedback indicating when the task is going to be completed. |
@AntoniosBarotsis, we already have a problem with multithreading not going well with our STDIN user prompts, so, I'll explain briefly my "rewrite idea". If we want to do multithreading properly, we need a Otherwise, we need another way for threads to synchronize STDIN usage, and to prevent them to write to STDOUT while the bar is being drawn. STDIN is used when asking users to overwrite a file, for example. The problem with each thread locking STDOUT is that this is all blocking IO, so threads stop compressing and decompressing when waiting for a lock to be freed. With that in mind, I think we need to use channels (which leads me to the orchestrator idea). This enables a progress bar and multithreaded compression (reminder for the future: remove that :v Any thoughts on this? |
Some thoughts after going through your reply and the code again. DecompressingSTDOUTI think getting the logs to work across threads properly wouldn't be too much of an issue in principle but I don't think it would be just a few changes to the info macro either. Looking at this par_iter usage, we could use an What I am a bit hesitant with how easy it would be for someone to accidentally introduce print statements in the code that would negate this. A way around this would be to deny the usage of STDINStdin seems like it will be more annoying. I think a "simple" way to work around it would be the following:
Now from what I understand, the asking for user input here will be essentially after the heavy work has been done (this being the actual decompression) and we only ask the user if they want to overwrite/create files. This means that there is no performance gain from doing this part in parallel so sequential is fine. If that is not the case, we could put stuff at Rayon's job queue. This I think is simple enough but it does have a problem: the user would have to wait for all decompressions to finish before being asked for input on the files that need it. I think that that is not a big deal and fixing this would be a lot more complicated which is funny of me to say considering my suggested solution is almost describing a budget async runtime 😅. Note Small edit, technically the Progress BarsI think much of my confusion is that I'm not sure whether we want a progress bar for compressed single files or for stuff like zips and tarballs. I'm guessing that there is no real way of making this work (at least for decompression) without somehow keeping of how much of the file we've already read? Should decompression work the same way where we try and figure out how much of the compressed file we've read already? I'm not sure if these algorithms (de)compress as they read the file or if they need to read all of it first. Side note, the ClosingI think the stdin/stdout stuff is pretty doable but they also seem like they should be implemented regardless of whether the progress bar returns or not. Maybe "should" is a strong word, something like "a separate problem that could potentially be an issue" is a little more accurate. |
One problem regarding the STDIN part, a thread can't return Those threads need STDIN confirmation to then continue the work they're doing. So no returning early is allowed, the work needs to resume 🤔. |
This makes it decently more convoluted cause it looks like we would need bidirectional communication between the main and the worker threads then. Maybe we should move this to a separate issue at this point 😅. I guess a simple way to test for this now is to do something like this ouch c README.md tmp.tar
ouch d tmp.tar tmp.tar The logs get a bit messed up and it is not immediately obvious that u got asked about overwriting the file twice. I don't like the idea of having a bunch of channels everywhere as I feel like that would make the code potentially more annoying to work with but at the same time, I'm not sure how else to go about this. We could have the main thread receive messages from the worker threads that would be something like "print this string" and "show this message and ask for user input". Prints could be buffered both to not lock stdout all the time but also to allow reads to take precedence over them, this would prevent logs from looking like this for the example I showed earlier [INFO] Created temporary directory [...]\ouch\.\.tmpfmajP3 to hold decompressed elements.
[INFO] Created temporary directory [...]\ouch\.\.tmpRYc9qN to hold decompressed elements.
[INFO] ".tmpfmajP3\\README.md" extracted. (5.99 kiB)
[INFO] ".Do you want to overwrite '.\README.md'? [Y/n] tmpRYc9qN\\README.md" extracted. (5.99 kiB)
Do you want to overwrite '.\README.md'? [Y/n] # Prints stop here but my input is actually for the line above An advantage of sending the logs to the main thread instead of printing them directly would theoretically also be that the threads would not stop their work and wait until they can lock stdout anymore since only main would be doing that. This would essentially be like implementing a shared message queue (or two, one for worker-> main and one for main->worker) where messages from the main to the worker threads would need to have some identifier for the correct thread to pick them up (thread-id would likely not work for this alone since threads could be reused). |
Exactly what I was thinking! Even though it adds complexity, I don't see a better way to move with it. Thanks for bringing up those points, that's probably the way forward. |
I was tinkering with this for a bit and ended up with something: $ cargo r -- d tmp.tar tmp.tar
Created temporary directory C:\Users\anton\Documents\Github\ouch\.\.tmpbfFY84 to hold decompressed elements.
Created temporary directory C:\Users\anton\Documents\Github\ouch\.\.tmpoShACS to hold decompressed elements.
".tmpbfFY84\\README.md" extracted. (5.99 kiB)
".tmpoShACS\\README.md" extracted. (5.99 kiB)
Do you want to overwrite '.\README.md'? [Y/n] # <-- waits here for input
Do you want to overwrite '.\README.md'? [Y/n] # <-- waits here for input
Successfully moved C:\Users\anton\Documents\Github\ouch\.\.tmpbfFY84\README.md to .\README.md.
Successfully decompressed archive in current directory.
Files unpacked: 1
Successfully moved C:\Users\anton\Documents\Github\ouch\.\.tmpoShACS\README.md to .\README.md.
Successfully decompressed archive in current directory.
Files unpacked: 1 I ended up doing this slightly differently than I was initially considering; I made a log sender/receiver channel for sending messages from worker threads to be logged but I didn't do anything for the stdin stuff we were discussing. I'm not sure if there's a point making another set of channels for requesting user input and then sending said input back since each will have to lock anyway, in retrospect it feels like its just a lot of complexity for nothing although I may be missing something here 😅. What I did instead is lock stdout from The logging itself is a bit different as well. Initially, I was trying to just log everything from the main thread, the problem with that was that that This is incomplete as I haven't made these changes for all logs and prints, just the ones that my test would hit and I'm also not sure if I've made a large oversight somewhere that I failed to notice, hence why I didn't make this into a pr yet, I'm still at a proof of concept stage in my head. The code itself could also likely be more neat, I'm not immensely fond of the API as is right now but I didn't give it enough thought yet. How does this help with progress bars?Well so far, this doesn't really do anything. I'm not sure how we should deal with asking for user input while showing the progress bar but all these channels being set up already would make something like sending progress data to the logging thread trivial which could be the way to go about that. Let me know your thoughts! Curious if this is going in a different direction than you were expecting initially or if you feel like (some of) this is a bad idea. |
Cool! I'd like to see that commit (even if it's half-baked and needs polishing).
Well, that's a good point. Taking a step back to the progress bar, what would STDIN prompts look like? Imagine the following:
What happens when a question arrives?
Maybe the progress bar should freeze, perhaps change color, and say "(waiting)", when question is answered, all accumulated logs would be printed and the progress bar would be updated again.
I think locking STDOUT will block (de)compression progress in other threads. We also know that logging from a worker thread causes #77. I'm starting to consider having just the progress bar and removing the verbose logs, but putting them behind |
Performance
This probably needs some more investigating to be sure but I think this may not be the case anymore since the worker threads are not blocked by the stdout lock since they send messages over the channels I added in my commit. I didn't formulate my sentence too well, I was talking about having channels that receive "read user input" requests and then send the user input back to the worker thread that asked for it because I don't see how that would be an improvement over what is currently implemented in my fork which has the worker threads block stdout themselves when they want user input. The reason I don't think this should slow them down is because, from what I understand, they used to slow down because each worker thread would try and lock stdout for their verbose logs, this is no longer the case though since they send their logs to the logging thread without the need to lock anything. By having that middleman logging thread I think the issue should go away. Relevant code for that logging thread is here.
Ok so I went ahead and implemented buffering and set the buffer size to 10 for now. This is significantly faster than before. I made very rough benchmark where I zipped one of the git repos I had locally, copied the resulting zip so I have 2 archives to decompress in parallel and timed the following: `cargo r -r -- d test1.zip test2.zip -y. Before any of my changes, this particular repo took about 50 seconds, with the changes + buffering it only took 12. Something I did not fully expect was how increasing the buffer size by a lot (I tried 500) did not increase performance that much, it was only faster by around a second. I was expecting this to be a bit more slower because of how many files the repo had but I guess its great that it is not since we don't have to mess much with the buffer sizes haha. Commit is here. I have not yet tried asking for user input though but considering that with my example, I only get asked once for the entire folder, this should cause no issues. Progress bar
I'm not sure how well progress bars work with for example the movement that would be caused in the terminal by printing the question text; in my experience things tend to break somewhat when that happens (might end up with 1 progress bar that does not get updated and a new one that does or something similar to that). I haven't used indicatif specifically so I don't know if they have this issue.
Honestly this seems like a better idea. |
Does the implementation of this make sense to you? I could finish up changing the rest of the logs to use this system of using a channel to send the logs to the logging thread and make a PR for it |
Nice graph and explanation, thanks. I misread what you said but now I understand. It does make sense and I'm surprised by the 50s -> 12s improvement, looks promising 😃. Before giving a green light for a PR, I wonder if the "Logger" couldn't be replaced by "Main". Imagine you construct a (Instead of Then What do you think? Would that also work? If so, it might be a simpler solution. |
The issue with that is that technically the code is blocking:
That is what I tried initially. The code itself would not really be much different; there wouldn't be any need for that mutex+condvar stuff I did to not keep the main thread busy waiting but everything else would be the same. |
Ooh that's right, I tried a better solution but had no success, we need to early-return any errors so your solution seems optimal. Then it all sounds good to me ✔️ 🙂 . |
Wouldn't it be better if the progress bar appeared only after answering a possible question, and not before it? |
If I understood this correctly then a decent amount of work needs to be done before we know whether we need to ask a question or not, it is not something we can know ahead of time, that's the issue here. We put the progress bar before the question because we know we will be making progress before we know whether a question needs to be asked. Later on we figured that limiting logs would be best (hide all info behind a |
Thank you for explanation. Yes this actually makes sense and is logical. I have no idea about programming, so I can't provide proper support, but I read the entire thread again and I understand the problem you wrote about. I know and associate the progress bar with TUI applications and especially GUI applications, but I have never used a CLI application that would have a progress bar. What would it even look like? If new text appeared and break terminal flow, couldn't the progress bar simply "break" and reappear a column below? |
ANSI escape codes are the way we can talk to terminals, it's how indicatif re-draws progress bars in the same line.
We can clear the line where the bar is, then output a log line, and redraw the bar in the next line (I believe it is the only way). Besides adding a "WAITING" message, we should also change the progress bar color when waiting for a question, to make it more distinguishable (should be easy to do). |
Is there a plan to add a progress indicator back? (It is a must have for me at least.) Thanks!
Was removed in #300.
The text was updated successfully, but these errors were encountered: