benchmarking v0.14.4 release - discussion about performance issues / high latency #1238
Comments
I can't say for sure without knowing the details of the benchmark, but since we cannot pass the input data on GPU directly through the Task API for now, there should be some performance degradation as a result (see #1076).
I will describe the benchmark in more detail; I hope that helps in explaining these numbers. Here are the big "culprits" I can see for now:
May I ask whether you have a method of quantifying the performance / latency of the plugin, or do you just make a qualitative assessment of the plugin's speed? I hope we can contribute to improving the plugin's performance in the near future, as the latest release has really interesting features but is sadly too slow for production.
Could you show me in the code exactly from which point to which point the time is being measured?
I'm using a high-speed camera (1000 FPS) to measure changes on a test device (e.g. Samsung S7) versus changes in a mirror (image of the setup and explanation here). I'm sorry if the explanation is hard to grasp. Yes, I'm measuring the performance of the sample application; there is no other way to measure the speed of the plugin on Android without writing custom code. So in the end I'm not sure whether the problem lies with the plugin or the sample, as they are coupled. That's what I'm trying to find out.
I mean, please indicate the start and end points of the measurement in the code (i.e. line numbers). For example, from Line 79 in 9251ba5 to
There is no "measurement in code". Here is a 12-second extract of the slow-motion video (shot at 1000 FPS) that shows the measurement:
In slow motion you can clearly see the end-to-end latency and you can measure it by counting frames. The only change to the code is the color and thickness of the skeleton lines, since it is needed by our automatic evaluation code. In the above video, both devices have the same apk installed:
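The frame-counting arithmetic described above can be sketched as follows. This is a minimal illustration of the measurement method, not code from the plugin or the benchmark; the function name and defaults are invented for the example:

```python
# At 1000 FPS, each frame of the slow-motion recording corresponds to
# exactly 1 ms of wall-clock time, so the frame count between the stimulus
# (movement in the mirror) and the rendered response (skeleton update on
# the device screen) gives the end-to-end latency directly.
CAMERA_FPS = 1000

def latency_ms(frames_between_events: int, fps: int = CAMERA_FPS) -> float:
    """End-to-end latency between stimulus and on-screen response, in ms."""
    return frames_between_events * 1000.0 / fps
```

At 1000 FPS the frame count and the latency in milliseconds coincide, which is what makes this camera speed convenient for the benchmark.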
It might be correct as a benchmark of the sample app, but the sample app itself is not implemented with the best latency in mind. Therefore, I don't think it is appropriate to conclude that the plugin's performance has degraded based on a specific metric of the sample app. If the performance of your application has noticeably deteriorated due to the plugin update, there may be an issue with the plugin itself. If that is the case, please open a new issue.
It’s a bit cumbersome to explain everything, so I'll just comment on one point:
There is no well-defined / robust way to benchmark the plugin's latency / performance other than benchmarking the pure sample app. Having a minimal working example that uses the plugin in the most performant way possible is the holy grail here. It's not easy to rewrite the samples from scratch. We tried to port the old samples (from the 0.6.2 release) to a newer release, but the performance was still bad; since this was the work of a colleague, I cannot confidently exclude errors during the port. In my effort to create more performant samples, that will be my first step. I will follow up with some more latency measurements. Any guidance / tips from your side would be highly appreciated. For example, what was the "intentional change to avoid spikes"? Is there a specific point in the code you are referring to, or is it more of an architectural change?
Here are some "in code" latency measurements: 0.6.2 vs 0.14.4.
For 0.14.4: from Line 79 in 9251ba5 to Line 112 in 9251ba5.
For 0.6.2: from … to just before this line: MediaPipeUnityPlugin/Assets/Mediapipe/Samples/Graphs/PoseTracking/Scripts/PoseTrackingGraph.cs, Line 61 in a754416
Here are the results (average ± standard deviation, all numbers in milliseconds): on the Samsung A8, 0.6.2 is 4× faster than 0.14.4.
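The "average ± standard deviation" reporting used above can be reproduced with the standard library. The sample values below are purely hypothetical placeholders; the actual measurements are not reproduced in this thread:

```python
import statistics

def summarize_ms(samples_ms):
    """Summarize latency samples as (mean, sample standard deviation) in ms."""
    return statistics.mean(samples_ms), statistics.stdev(samples_ms)

# Hypothetical samples for illustration only.
mean, stdev = summarize_ms([198.0, 210.0, 186.0])  # -> (198.0, 12.0)
```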
To reiterate, the method of generating input images is different.
I don’t think there’s a need to intentionally slow it down, but I believe it’s important for the sample app to be easy to understand. Moreover, I don’t think the definition of ‘the most performant’ is self-evident in the first place.
Will you share the link?
In my opinion, the performance of the plugin should be measured by the time from when the input texture is acquired to when the output result is received.
That's nice to hear. I'm about to benchmark the "GPU copy feature" in ab78d5c as soon as possible and will share the results here. I'm surprised that copying the image within a single frame (frame budget probably ~33 ms) would cause problems on some devices. We also test a weak device (Samsung A8) and a strong device (Samsung S7) here, and both seem to work better with the "copy in a single frame" approach (assuming that is what's happening in 0.6.2). From my experience: copying an image should be in the range of 1-3 ms, whereas running an image through a TFLite model on Android takes 10 ms (S7) to 20 ms (A8). These numbers were gathered with the official adb TFLite benchmark tool. Anyway, I'll get back with more numbers soon.
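The frame-budget reasoning above can be made explicit with a quick check. The timings are the rough figures quoted in this thread (copy ~1-3 ms, inference ~10 ms on the S7 and ~20 ms on the A8), not new measurements, and the 30 FPS target is an assumption:

```python
# Budget for one render frame at an assumed 30 FPS target.
FRAME_BUDGET_MS = 1000.0 / 30.0  # ~33.3 ms

def fits_in_one_frame(copy_ms: float, inference_ms: float) -> bool:
    """Does copy + inference fit inside a single render frame?"""
    return copy_ms + inference_ms <= FRAME_BUDGET_MS

# With the quoted figures, even the weaker A8 (3 ms copy + 20 ms
# inference) stays comfortably inside the ~33 ms budget.
```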
I just tested the 0.15.0 release. It's definitely much faster, probably twice as fast, but I don't trust the numbers yet, since the evaluation is automatic and the pose is flickering a lot, as can be seen in the following video: https://drive.google.com/file/d/1vl0tUBc3qGyuEDOQO5cyx07EiBdaAMKh/view?usp=sharing So it's a bit hard for me to benchmark the new feature. My hunch is that
Also: the sync mode is not working anymore; I could only test the async mode. I guess sync mode is just dead code / a leftover. That's a pity, since sync mode can be very beneficial on strong devices and is also good for benchmarking.
When the running mode is IMAGE or VIDEO, inference is executed synchronously.
Since there was no delay when copying the input image on the CPU, it seems to be an issue with my implementation. Maybe this is a situation we've encountered before.
Have you thought about potential issues with the timestamps (as mentioned earlier)? I don't know why else the detection of the arms would flicker up and down in the slow-motion video; this has nothing to do with latency. I currently don't have much time to look into it, since our startup lost funding and I have to look for a new job.
If so, I would like you to create a separate issue.
Plugin Version or Commit ID
v0.14.4
Unity Version
2022.3.34f1
Your Host OS
Windows 10 Pro
Target Platform
Android
Description
I just downloaded the precompiled 0.14.4 release and ran it through our standardized latency benchmark (explained here). It looks like the performance of the plugin keeps on deteriorating on our benchmarking devices (Samsung Tab S7 and A8).
As can be seen from the table below, the latency is almost double compared to a release from 2021 (0.6.2). We will now start to investigate the problem. My gut feeling is that the plugin deteriorated over time because there was no robust benchmark to test performance, so nobody could "prove" that it actually became slower. There are probably many things that could have impacted the performance.
My next step will be to compile and benchmark the pure mediapipe v0.10.14 and post the results here.
I already benchmarked an older version of pure MediaPipe (probably v0.8.6) and the latency was OK (A8 = 198 ± 14 ms, S7 = 175 ± 10 ms), so I don't think the problem lies with MediaPipe.
I hope this GitHub issue can become a lively discussion about possible reasons.
Code to Reproduce the issue
download the v0.14.4 release (precompiled)
Additional Context
I hope people weigh in on the discussion. The latest plugin has some really interesting features (multi person pose estimation) but unacceptable performance.