Enable orc code generation at build-time for iOS #26
Just a comment on current performance: I just tested the latest master of everything on an iPad Mini 1, and Bowser grinds to a near halt. Using Time Analyzer, the main bottlenecks seem to be the orc_*-related functions involved with video (notably orc_executor_emulate). Not really surprising. P.S. Another bottleneck seems to be Opus and all the audio analysis it performs, but I believe that was mentioned in another issue already, as was aes_encrypt taking some time.
I ran Bowser on an iPod Touch, and the top threads for CPU consumption were:
There are a number of issues to tackle here:
Do you know if we're sending H.264 in that case? I expect so... in which case maybe that isn't weird at all. The memory copy in intervideosrc was surprising. I think zero-copy and queue-like behaviour are really needed for those elements to support low-latency, high-performance, low-power-drain applications. videoconvert is more prominent in Bowser because of the rendering mechanism, which needs to convert both the self and remote views to BGRA to send as bitmaps to the WebView for rendering; in a real app using glimagesink or similar, those conversions would not show up. As such, and in general, we should perhaps optimize the colour space of the video for the encoders and rely on fast colour-space conversion in the renderers. As for audio, we should absolutely avoid unnecessary conversions. No-ops for the win.
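For a rough illustration of the difference (standard GStreamer 1.x elements, with a test source standing in for the real decoder output; this is not Bowser's actual pipeline):

```sh
# Bowser-style path: convert every frame to BGRA so it can be handed to the
# WebView as a bitmap -- videoconvert does this work on the CPU.
gst-launch-1.0 videotestsrc ! videoconvert ! video/x-raw,format=BGRA ! fakesink

# Native-app path: hand the decoded frames to a GL sink and let the renderer
# do the colour-space conversion.
gst-launch-1.0 videotestsrc ! glimagesink
```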
Correction: we're already on Opus 1.1, but it doesn't seem like we're actually using the appropriate optimisations. Will take a look at this. Update: the recipe is already supposed to set things up to use intrinsics, so I'm bumping this investigation down in priority.
We're using vp8enc, which has 1.2% CPU usage. It's possible the frames just aren't reaching the encoder.
Or that they're perfectly black frames?
intervideosrc uses gst_buffer_make_writable(), which will only copy the underlying memory if it isn't shareable, and that's probably the problem here. We can't push non-writable buffers out of it because we need to update the timestamps.
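A minimal sketch of that pattern, assuming standard GStreamer 1.x API (the helper name here is made up for illustration):

```c
#include <gst/gst.h>

/* Timestamps can only be set on a writable buffer. gst_buffer_make_writable()
 * returns the buffer itself when we hold the only reference, but falls back to
 * copying the underlying memory when it cannot be shared -- that copy is what
 * shows up in the profile. */
static GstBuffer *
retimestamp_buffer (GstBuffer *buf, GstClockTime pts)
{
  buf = gst_buffer_make_writable (buf);
  GST_BUFFER_PTS (buf) = pts;
  return buf;
}
```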
So here's the full analysis. Scenario: testing with demo.openwebrtc.io:38080, A-side: Bowser (master), B-side: Chrome (OS X 10.9.5), local LAN. Remarks:
Time Analysis (running ~10 minutes, calling ~9 minutes):
(This excludes minor unavoidable functions involved with sockets and iOS libraries such as QuartzCore.) There are quite a few interesting bottlenecks apart from orc. What's also interesting is that the video encoder itself doesn't really show up here, while audio and video conversion does seem to be a significant hit. Seems a little backwards.
I don't think there's much left after that. 😄 Crypto pops up, but maybe addressing EricssonResearch/openwebrtc#199 will fix that.
@ijsf, you mentioned earlier that you were interested in helping out. How about looking at this issue (ORC)?
Sure, I'll try and have a look at this once my rebuild is done. I've forked the orc repository at https://github.com/ijsf/OpenWebRTC-orc
Currently working on adding a --static-implementation option in the above fork.
This fork is now capable of compiling static functions with inline asm code (plus backup C functions for those that don't have an asm implementation). It's almost compilable. The drawback is that I also had to clone gstreamer-common, gstreamer and all the gst* repositories, since common/orc.mak (which contains the arguments for orcc) lives in the gstreamer-common submodule used by the gstreamer and gst* git repositories. Furthermore, I found out that a "--target" command-line argument (neon for iOS) is also missing there (it's just hardcoded to neon for now and needs to be fixed). So in short, the command line needs to change from "... --implementation ..." to "... --target <target> --static-implementation ...". Could somebody give me some pointers on how to do this as easily as possible?
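For illustration, the change to the orcc invocation in orc.mak would look roughly like this (the .orc/.c file names are placeholders, and --static-implementation is the new option from the fork above, so take the exact spelling as an assumption):

```sh
# Current: the generated C builds the orc program at runtime (JIT or emulation).
orcc --implementation -o gstvideoorc-dist.c gstvideoorc.orc

# Proposed: emit static NEON inline asm plus backup C functions at build time.
orcc --target neon --static-implementation -o gstvideoorc-dist.c gstvideoorc.orc
```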
I think there should be a configure parameter to select this, and then orc.mak would do its thing depending on that.
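Something along these lines, perhaps; the option names here are purely hypothetical, not what was actually proposed:

```sh
# Hypothetical configure switch; when enabled, orc.mak would pass
# "--target neon --static-implementation" to orcc instead of the default.
./configure --enable-static-orc --with-orc-target=neon
```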
This is done as far as I'm concerned.
@ijsf I'll keep it open until it lands and the above bugzilla ticket is closed. Should we create a separate issue about 64-bit ARM support? As far as I recall, that's a requirement for new App Store submissions. Well, we can support 64-bit ARM with fallback C code, but the performance is horrible.
I think 64-bit ARM support is also important, but I won't be able to do this as I do not have any arm64 devices available. The move to 64-bit ARM probably won't require a huge amount of changes. The most important changes are described here: http://community.arm.com/docs/DOC-8453
We do and we can test and report back: |
One of the most important changes is the reordering of the s/d/q registers between 32-bit and 64-bit. It's not really an option for me, as I believe this will require a lot of back-and-forth testing and verification on arm64.
I was under the impression that the ARM 64-bit instruction set is backwards compatible with 32-bit. Maybe that is not relevant here?
Correct, but as far as I know only in 32-bit mode. Check the above docs for more information.
This would be needed for using videoconvert and videoscale properly on iOS, as it's impossible to allocate executable memory on iOS nowadays for Apple reasons.
https://bugzilla.gnome.org/show_bug.cgi?id=742843 is about this issue too. Basically orcc can output NEON assembly at build time already, but there are some problems with getting that to run and also with outputting something for orc programs that contain unsupported opcodes for the specific target.
Ideally we would change orcc to output backup C code plus inline assembly (putting it in a separate file is OK too), and then select which one to use at runtime, as that is still possible on iOS.
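A rough sketch of that shape, purely for illustration (this is not actual orcc output, and all names are made up):

```c
#include <stdint.h>

/* Backup C implementation: always available, correct on any target. */
static void
add_s16_backup (int16_t *d, const int16_t *a, const int16_t *b, int n)
{
  for (int i = 0; i < n; i++)
    d[i] = (int16_t) (a[i] + b[i]);
}

/* Stand-in for the NEON implementation orcc would emit at build time as
 * inline assembly (or in a separate file); here it just reuses the C path
 * so the sketch stays self-contained. */
static void
add_s16_neon (int16_t *d, const int16_t *a, const int16_t *b, int n)
{
  add_s16_backup (d, a, b, n);
}

/* Stand-in for whatever runtime feature check orc would provide. */
static int
have_neon (void)
{
  return 1;
}

/* Public entry point: pick the fast path at runtime, fall back to C
 * otherwise. No executable memory is allocated, so this stays within
 * the iOS restrictions mentioned above. */
void
add_s16 (int16_t *d, const int16_t *a, const int16_t *b, int n)
{
  if (have_neon ())
    add_s16_neon (d, a, b, n);
  else
    add_s16_backup (d, a, b, n);
}
```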