-
Hi. I'm using vHIve to study cold start latency in snapshot-enabled serverless framework. To verify that vHive works correctly in my setup, I tried to measure cold start latency when snapshot is created . However, I found out that the cold start latency is measured much greater than that reported in ASPLOS paper Fig. 2 consistently(Paper: 232 ms, My Case: 1500-2000 ms). I wonder if I'm the only one having this issue. For cold start latency measurement, I went through the following steps. I'm testing on a bare-metal server with Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz and 128GB memory, and the root filesystem is mounted on a high speed SSD whose peak read TP is up to 2GB/s. First I deployed helloworld.yaml using the example deployer to vHive with snapshot enabled. Then I invoked helloworld for snapshot creation and waited for helloworld's active pod to be destroyed. After cleaning up page cache, I invoked helloworld again, and I considered latency the invoker reports in that invocation as the cold start latency. From logs, I discovered function invocation takes unexpectedly long time in vHive CRI and other SW stack. For debugging, I first added prints right before and after SayHello invocation in the example invoker(example/invoker/client.go).
The vHive log during that period is:
This shows that creating and starting a fresh container takes more than 1 s after the user's invoke request, which is unexpectedly long. I would be really grateful if you could share your cold-start latency measurements for comparison!
-
Hi @sosson97, thank you for bringing this up. The logs indicate that there are delays in the control plane that require further investigation. Could you please attach the complete logs of containerd and firecracker-containerd (as files)? Note that a VM loads from its snapshot in just 24 ms:

For the paper evaluation, we didn't run Kubernetes/Knative. To reproduce the paper results, please follow the instructions in the artifact. Meanwhile, we will investigate these control-plane delays; please send us the logs.