Replies: 8 comments
-
Yes, I'm observing this too, and it's currently the most expensive flaw of this solution. I don't know if we can do something about it, though, as this is how FlashBoot works.
Source: https://blog.runpod.io/introducing-flashboot-1-second-serverless-cold-start/
-
For me, it is stuck further down at "Clearing outputs...":
2024-01-23T06:34:38.100634873Z Preload pipeline
-
Hello @CyrusVorwald. Just to be sure: do you mean the logs are "stuck" there for a while, or is your endpoint not generating any outputs at all? If it's the first case, how long does it take? And does it get better when FlashBoot kicks in during frequent use?
-
Without FlashBoot, it takes 40+ seconds to generate an output, upwards of 60-100, every time. With FlashBoot, this happens on the first run, but subsequent runs take about 10 seconds. For me, the bulk of the time on the first cold start is spent further down from the gap you mentioned. It only takes about 5 seconds to generate up to the … I have not added logs to the steps between when the …
-
The "gap" in the logs has visually changed now with the new version because I moved outputs clearing from start.sh to handler.py. It was cached with Flashboot there before and didn't actually clean the outputs in frequent use. In reality, the "gap" is the pipeline loading. You can easily see this when adding
nonstop for ~40s, then it connects and generates the output in ~8s. Anyway, to eliminate this we would have to find a way how to speed up loading of the Fooocus itself. I have one thing in mind. But it requires me to rework most of this repo to even test it. I'll let you know here once I make some progress. |
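For illustration, here's a minimal sketch of the kind of timestamped logging that makes this gap visible in handler.py. The wrapped function names are placeholders, not the actual repo code:

```python
# Sketch only: wrap the expensive preload steps with timing logs so the "gap"
# shows up explicitly in the worker logs. Function names below are placeholders.
import logging
import time

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
logger = logging.getLogger("fooocus-worker")


def timed(label, fn, *args, **kwargs):
    """Run fn and log how long it took."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    logger.info("%s finished in %.1fs", label, time.perf_counter() - start)
    return result


# e.g. in handler.py, before the first job is processed:
# timed("Clearing outputs", clear_outputs)
# pipeline = timed("Preload pipeline", preload_pipeline)
```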
-
Good news: I made a new branch that makes the cold starts quicker and the cached ones even a bit faster. It's a Standalone version with the models and all files already baked into the container image (like the A1111 image mentioned above), which eliminates such long pipeline loadings. You can now choose between the Network version, where you can change things on the fly, and the faster, more efficient Standalone version. Give it a try and let me know if it's fast enough now, or if you have any ideas on how to make it even faster. Thanks for your contributions to this repo!
-
I was caching my model with the network volume before this update. I haven't tried this method yet, but I still run into this issue. What's different?
-
@CyrusVorwald You mean what's the difference between the network and standalone versions?
Standalone version:
- The models and all files are baked directly into the container image, so nothing has to be loaded from the network volume and the pipeline loads much faster.
Network version:
- The files live on the network volume, so you can change things on the fly, but the loading is slower.
The v0.3.30 update also included some minor changes like setting the environment to …
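To make the difference concrete, here's a rough, hypothetical sketch of how a worker could decide where to load models from. The baked-in path is made up for illustration; only /runpod-volume is the standard RunPod network-volume mount:

```python
# Hypothetical illustration only; the real paths in this repo may differ.
import os

NETWORK_MODELS = "/runpod-volume/models"   # network version: editable on the fly, slower first load
BAKED_MODELS = "/app/models"               # standalone version: shipped inside the image, loads fast

# Prefer models baked into the image; fall back to the network volume.
MODELS_DIR = BAKED_MODELS if os.path.isdir(BAKED_MODELS) else NETWORK_MODELS
print(f"Loading models from {MODELS_DIR}")
```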
-
Hi @davefojtik,
I was trying to further optimize the speed, and what I noticed is that on a 4090 the docker image takes 40+ seconds to start up on the first run.
Usually, most of your requests are passed to workers that already have the app up and running, so you don't experience the 40+ second wait (which is actually counted as part of the Execution Time, so you are paying for it). But at times when you don't have frequent requests, or when RunPod throttles your workers frequently (which has been happening a lot for me recently), you get that 40+ second wait: the container has started and even uvicorn is up, but the app is literally doing nothing for 40+ seconds until it eventually starts generating.
At first I thought it might be natural, but I did some testing on https://github.com/runpod-workers/worker-a1111, and the first startup on that endpoint is around 3 seconds.
Do you have any idea what is causing this?
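For reference, a minimal sketch (assuming the standard runpod serverless entrypoint) of the kind of timing logs that could show whether those 40+ seconds are spent loading the app at import time or inside the handler itself:

```python
# Sketch only: the placeholder comments stand in for the real Fooocus/uvicorn startup code.
import time

_boot = time.perf_counter()

import runpod

# ... heavy imports and pipeline preloading would normally happen here,
# which is the part suspected of taking 40+ seconds ...

print(f"Worker ready {time.perf_counter() - _boot:.1f}s after the module started loading")


def handler(job):
    """Per-request work once the app is already loaded."""
    start = time.perf_counter()
    # ... actual generation would run here ...
    return {"execution_seconds": round(time.perf_counter() - start, 2)}


runpod.serverless.start({"handler": handler})
```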