
High cpu usage even when no source is playing when HRTF is enabled #1040

Open
fbirot opened this issue Sep 6, 2024 · 5 comments

fbirot commented Sep 6, 2024

Hello,

I've noticed a pretty high CPU usage in my iOS app due to OpenAL Soft when no audio sources are playing (6% on iPhone 15 pro).

After investigating, I found that the issue stems from the DeviceBase::postProcess function called in DeviceBase::renderSamples. This function seems to perform a lot of processing, even when no active sources are playing. Specifically, it seems to be processing HRTF, even though no sound is being rendered.

I don't fully understand everything that postProcess is doing, but I was wondering if it would be possible to skip certain processing steps when there are no active voices or sources playing. This could significantly reduce CPU usage.

As a temporary test, I added a condition to check if any voices are currently playing before calling postProcess:

if(hasPlayingVoices())
{
    /* Apply any needed post-process for finalizing the Dry mix to the RealOut
     * (Ambisonic decode, UHJ encode, etc).
     */
    postProcess(samplesToDo);
}

Here's how I implemented the hasPlayingVoices function:

bool DeviceBase::hasPlayingVoices()
{
    for(ContextBase *ctx : *this->mContexts.load(std::memory_order_acquire))
    {
        const al::span<Voice*> voices{ctx->getVoicesSpanAcquired()};
        
        /* Search for voices that have a playing source. */
        for(Voice *voice : voices)
        {
            const Voice::State vstate{voice->mPlayState.load(std::memory_order_acquire)};
            if(vstate != Voice::Stopped && vstate != Voice::Pending)
            {
                return true;
            }
        }
    }
    
    return false;
}

I'm not sure if this is the most appropriate or efficient solution, but it works for my use case. I was hoping to get feedback on whether there is a better or more standard way to reduce CPU usage when no audio sources are playing, particularly in relation to HRTF processing.

Thank you!

kcat (Owner) commented Sep 7, 2024

With HRTF, the post-process function applies HRTF to a 4-channel mix that effects and some/all sources mix into. I'm a bit curious why it's taking 6% CPU just to apply HRTF to those 4 channels, which is roughly equivalent to applying HRTF to 4 mono sources. The phone's general performance isn't that bad, is it? Maybe it has to do with some kind of power-saving behavior: since it's only applying that HRTF and little else, the CPU may be kept in a lower power state, which makes the cost appear larger; doing more work would increase the CPU speed and lower the relative percentage.

You can use the ALC_SOFT_pause_device extension to pause processing on the device if you know you're not playing anything and have nothing to play for the time being. It wouldn't be very efficient to pause the device whenever nothing plays, since it takes a bit of time to start and stop the device playback and the post-process cost would be there every moment sound is playing anyway, but it can be useful if you know nothing will be playing for some time.
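
For reference, a minimal sketch of loading and using the extension (the typedef and function names come from the extension spec and alext.h; the pause/resume call sites are hypothetical app hooks):

#include "AL/alc.h"
#include "AL/alext.h"

/* Function pointers for the ALC_SOFT_pause_device extension. */
static LPALCDEVICEPAUSESOFT palcDevicePauseSOFT;
static LPALCDEVICERESUMESOFT palcDeviceResumeSOFT;

bool initPauseExtension(ALCdevice *device)
{
    if(!alcIsExtensionPresent(device, "ALC_SOFT_pause_device"))
        return false;
    palcDevicePauseSOFT = reinterpret_cast<LPALCDEVICEPAUSESOFT>(
        alcGetProcAddress(device, "alcDevicePauseSOFT"));
    palcDeviceResumeSOFT = reinterpret_cast<LPALCDEVICERESUMESOFT>(
        alcGetProcAddress(device, "alcDeviceResumeSOFT"));
    return palcDevicePauseSOFT && palcDeviceResumeSOFT;
}

/* Hypothetical app hooks: pause once nothing will play for a while, and
 * resume before starting playback again. */
void onGoingQuiet(ALCdevice *device) { palcDevicePauseSOFT(device); }
void onAboutToPlay(ALCdevice *device) { palcDeviceResumeSOFT(device); }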

As for the code, a small issue there is that you shouldn't get the context and voice lists a second time in the mixer, since they can be changed asynchronously. I don't think anything too bad should happen, but it may end up checking some things that weren't mixed, or miss checking things that were. That's unlikely to actually happen often, though, and is unlikely to cause incorrect return values even when it does.

A slightly bigger issue is that the voice state should be checked before it's mixed, rather than right before post-processing is applied. When voice->mix is called, the state is updated after the mix, so a source that just mixed its final output will be detected as Stopped. The voice's output may already be silent by that final mix, but there are times it's not, especially for sources stopped with alSourcePause, alSourceStop, or alSourceRewind.

But a more fundamental problem is that there can still be audio playing even if no sources are. Effects like reverb can be audible for several seconds after sources stop playing, and the post-process itself can have a bit of residual audio to play after its input is silenced. For this to work, in addition to checking whether any sources/voices mixed anything, there would also need to be a way to check whether any effects output anything, and the post-process function should get at least one final run. I don't see this as being particularly easy to implement; it would increase the CPU cost while audio is playing, and wouldn't save much with normal usage.

Audio servers/services usually have a timeout delay, waiting a bit of time after the last stream stops before suspending the device. That helps avoid the potential delay when a stream starts immediately after one stops (where it wouldn't have saved much CPU time anyway), while still keeping power use down during the long periods where no audio is playing. The pause extension enables apps to implement similar behavior with OpenAL output.
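
A rough illustration of that timeout pattern at the application level (a sketch only; the helper name, tick cadence, and idle threshold are arbitrary, and it assumes the extension pointers from the sketch above):

#include <chrono>

/* Call periodically from the app. "anythingPlaying" is however the app
 * tracks its own sources. */
void updateDevicePause(ALCdevice *device, bool anythingPlaying)
{
    using Clock = std::chrono::steady_clock;
    static Clock::time_point lastActive{Clock::now()};
    static bool paused{false};
    constexpr std::chrono::seconds idleTimeout{10};

    if(anythingPlaying)
    {
        if(paused)
        {
            palcDeviceResumeSOFT(device); /* restart output before audio is needed */
            paused = false;
        }
        lastActive = Clock::now();
    }
    else if(!paused && Clock::now() - lastActive >= idleTimeout)
    {
        palcDevicePauseSOFT(device); /* suspend mixing after a quiet period */
        paused = true;
    }
}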

fbirot (Author) commented Sep 9, 2024

Thanks a lot for your detailed response. Here is a bit more context and some additional observations based on your feedback.

CPU Usage Details:

As mentioned, on an iPhone 15 Pro, I'm seeing:

With compilation optimizations:

  • 6% CPU usage when no sources are playing.
  • 15% CPU usage when a sound is playing.

Without optimizations:

  • 21% CPU usage when no sources are playing.
  • 50% CPU usage when a sound is playing.

The iPhone 15 Pro is highly performant (scoring 2896 in single-core and 7193 in multi-core on Geekbench), which is why I was surprised to see 6% CPU usage even when no sources are playing. However, I thought that HRTF calculation was quite intensive, so the 6% CPU usage doesn't surprise me if 4 channels are being processed all the time.

Question about HRTF:

You mentioned that HRTF is applied to a 4-channel mix. Could you explain how it works? My understanding was that HRTF would apply to two channels (left and right) per source, with mixing happening at the end.

ALC_SOFT_pause_device:

I did try using the ALC_SOFT_pause_device extension, but I noticed some issues:

  • There seems to be a bit of latency when resuming from the paused state (I haven't measured it; it's just a feeling).
  • Additionally, I noticed audio artifacts when pausing and then reactivating the device to play a sound.

Code Adjustments:

I made some changes to my code based on your remarks.

I initially tried calling hasPlayingVoices before ProcessContexts to handle the fact that some voice states might change in ProcessContexts (if I understood correctly). However, I encountered artifacts (clicks) when the sound started playing.

I thus modified the ProcessContexts function itself by adding an output parameter: bool *isPostProcessNecessary. I'm not very happy with it, but it is the easiest and safest way I found. Within ProcessContexts, I set isPostProcessNecessary to true if hasPlayingVoices or hasActiveEffects is true (to account for any active effects).

Here is the code of the function after my modification:

void ProcessContexts(DeviceBase *device, const uint SamplesToDo, bool *isPostProcessNecessary)
{
    ASSUME(SamplesToDo > 0);

    const nanoseconds curtime{device->ClockBase +
        nanoseconds{seconds{device->SamplesDone}}/device->Frequency};
    
    bool hasPlayingVoices = false;
    bool hasActiveEffects = false;

    for(ContextBase *ctx : *device->mContexts.load(std::memory_order_acquire))
    {
        const EffectSlotArray &auxslots = *ctx->mActiveAuxSlots.load(std::memory_order_acquire);
        const al::span<Voice*> voices{ctx->getVoicesSpanAcquired()};

        /* (...) Removed code to facilitate reading */

        /* Process voices that have a playing source. */
        for(Voice *voice : voices)
        {
            const Voice::State vstate{voice->mPlayState.load(std::memory_order_acquire)};
            if(vstate != Voice::Stopped && vstate != Voice::Pending)
            {
                voice->mix(vstate, ctx, curtime, SamplesToDo);
                hasPlayingVoices = true;
            }
        }

        /* Process effects. */
        if(const size_t num_slots{auxslots.size()})
        {
            /* (...) Removed code to facilitate reading */

            for(const EffectSlot *slot : sorted_slots)
            {
                EffectState *state{slot->mEffectState.get()};
                state->process(SamplesToDo, slot->Wet.Buffer, state->mOutTarget);
                hasActiveEffects = true;
            }
        }

        /* (...) Removed code to facilitate reading */
    }
    
    if(isPostProcessNecessary != nullptr)
    {
        *isPostProcessNecessary = (hasPlayingVoices || hasActiveEffects);
    }
}

Alternatively, maybe ProcessContexts could update two variables in the device that indicate whether any voices are playing and whether any effects are active.

Thanks

kcat (Owner) commented Sep 10, 2024

> CPU Usage Details:
>
> As mentioned, on an iPhone 15 Pro, I'm seeing:
>
> With compilation optimizations:
>
> * 6% CPU usage when no sources are playing.
> * 15% CPU usage when a sound is playing.
>
> Without optimizations:
>
> * 21% CPU usage when no sources are playing.
> * 50% CPU usage when a sound is playing.
>
> The iPhone 15 Pro is highly performant (scoring 2896 in single-core and 7193 in multi-core on Geekbench), which is why I was surprised to see 6% CPU usage even when no sources are playing. However, I thought that HRTF calculation was quite intensive, so the 6% CPU usage doesn't surprise me if 4 channels are being processed all the time.

For reference, HRTF could work with multiple sources on a single-core 1 ~ 1.5GHz desktop CPU (or at least it could; there haven't been significant changes to the core mixer that should add significant CPU drain as far as HRTF is concerned). The 9% increase for a single source, on a 6% baseline with just the post-processor, is far higher than I expected. It might be a little more sensible if it's a stereo source, since then it's effectively doing HRTF on two channels (the left and right source buffer channels are mixed independently), so one source channel would be adding about 4.5%, but that still seems a bit high.

Was NEON detected at build and runtime? What version/commit of OpenAL Soft are you using?

> Question about HRTF:
>
> You mentioned that HRTF is applied to a 4-channel mix. Could you explain how it works? My understanding was that HRTF would apply to two channels (left and right) per source, with mixing happening at the end.

For each source buffer channel (1 for a mono source buffer, 2 for a stereo source buffer, etc), HRTF is applied to one input channel and mixed to two output channels (one for the left ear/output, one for the right ear/output). The specific filter applied is dependent on the direction the sound is intended to come from.
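
Conceptually, per source channel it's something like this toy direct-convolution sketch (illustrative only, not the actual mixer code, which also handles inter-ear delays, interpolation, SIMD, and sample history):

#include <cstddef>
#include <vector>

/* One input channel is filtered through an HRIR pair chosen for the
 * sound's direction and accumulated into both ear outputs. */
void mixHrtfChannel(const std::vector<float> &input,
                    const std::vector<float> &hrirLeft,
                    const std::vector<float> &hrirRight,
                    std::vector<float> &outLeft, std::vector<float> &outRight)
{
    for(size_t i{0};i < input.size();++i)
    {
        for(size_t j{0};j < hrirLeft.size() && i+j < outLeft.size();++j)
        {
            outLeft[i+j] += input[i] * hrirLeft[j];
            outRight[i+j] += input[i] * hrirRight[j];
        }
    }
}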

However, not all sound comes directly from sources, such as effects, and not all source buffer formats have channels that can be defined in terms of a direction. The device also includes a B-Format mixing buffer for this extra audio; B-Format being a speaker-agnostic representation of 3D audio, using as few as 4 channels for full 3D audio. Effects and B-Format source buffer formats mix into that buffer using a plain summation mix rather than having HRTF filters applied to each directly. Instead, special HRTF filters are applied to these 4 channels after everything's mixed into them to get a binaural mix from it. This is what the post process does with HRTF enabled, applying HRTF to that B-Format mixing buffer in case anything's been mixed into it.

Incidentally, OpenAL Soft also has an option to mix sources into that B-Format mixing buffer instead of directly applying HRTF to them. This can greatly increase performance since the costly HRTF is only a fixed post-process; everything else just does a plain mix into a 4-channel buffer, while still resulting in full 3D binaural audio. Though this comes at the cost of sound directionality being a bit more diffuse, the perceived direction of a sound not being as pin-point accurate. This can be improved with other options that increase the number of channels in the B-Format mixing buffer to 9 or 16, increasing the cost of the HRTF post-process accordingly, while sources and everything else do a plain mix into the 9- or 16-channel buffer.
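
If you want to try it, that behavior is selected with the hrtf-mode option in alsoft.conf (value names as I recall them from alsoftrc.sample; double-check against your version):

[general]
hrtf = true
# full:  per-source HRTF filters (the default behavior described above)
# ambi1: plain mix into a 4-channel (first-order ambisonic) buffer, with
#        HRTF applied only as a post-process
# ambi2/ambi3: 9- or 16-channel (second-/third-order) variants
hrtf-mode = ambi1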

> ALC_SOFT_pause_device:
>
> I did try using the ALC_SOFT_pause_device extension, but I noticed some issues:
>
> * There seems to be a bit of latency when resuming from the paused state (I haven't measured it; it's just a feeling).
> * Additionally, I noticed audio artifacts when pausing and then reactivating the device to play a sound.

What kind of artifacts? If audio is still playing when you pause the device, it will cut off immediately, and if there's anything that didn't finish processing before pausing, it will resume as if nothing happened.

> Code Adjustments:
>
> I made some changes to my code based on your remarks.
>
> I initially tried calling hasPlayingVoices before ProcessContexts to handle the fact that some voice states might change in ProcessContexts (if I understood correctly). However, I encountered artifacts (clicks) when the sound started playing.
>
> I thus modified the ProcessContexts function itself by adding an output parameter: bool *isPostProcessNecessary. I'm not very happy with it, but it is the easiest and safest way I found. Within ProcessContexts, I set isPostProcessNecessary to true if hasPlayingVoices or hasActiveEffects is true (to account for any active effects).

You could probably have it return whether any sources or effects were processed.

bool ProcessContexts(DeviceBase *device, const uint SamplesToDo)
{
    ASSUME(SamplesToDo > 0);

    const nanoseconds curtime{device->ClockBase +
        nanoseconds{seconds{device->SamplesDone}}/device->Frequency};

    /* These could be combined into one bool. */
    bool hasPlayingVoices = false;
    bool hasActiveEffects = false;

    /* ... */

    return hasPlayingVoices | hasActiveEffects;
}

Then the return value gets OR'd with a local bool variable in the caller. If the resulting bool is false, none of the contexts processed anything, otherwise something did.
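
Something like this on the caller side, as a sketch of the chunked mixing loop in DeviceBase::renderSamples (the loop variable names here are illustrative):

bool anythingMixed{false};
while(samplesRemaining > 0)
{
    const uint samplesToDo{std::min<uint>(samplesRemaining, BufferLineSize)};

    /* OR each chunk's result into the running flag. */
    anythingMixed |= ProcessContexts(this, samplesToDo);

    /* Skip the post-process only while nothing has been mixed. Note it
     * should still get at least one final run after things go quiet, for
     * the residual-audio reason mentioned earlier. */
    if(anythingMixed)
        postProcess(samplesToDo);

    samplesRemaining -= samplesToDo;
}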

fbirot (Author) commented Sep 10, 2024

Thanks a lot for all those details.

> Was NEON detected at build and runtime? What version/commit of OpenAL Soft are you using?

Yes, I used the debugger and confirmed that the Neon mixer (mixer_neon.cpp) is being used. I am using the latest release, v1.23.1 (commit d3875f3).

> What kind of artifacts? If audio is still playing when you pause the device, it will cut off immediately, and if there's anything that didn't finish processing before pausing, it will resume as if nothing happened.

I conducted further tests and ensured that I only call alcDevicePauseSOFT after the source has fully finished playing. Now, I hear a kind of fade-in effect when starting the playback, along with occasional small clicking sounds.
However, I’ve realized that the issue most likely stems from the fact that I’m using Bluetooth earphones. I believe that when I call alcDevicePauseSOFT, no audio is sent via Bluetooth anymore. When I resume playback, the audio is sent over Bluetooth again. It's possible that the earphones exhibit specific behavior (like fading or ramping up CPU usage) when they start receiving sound. I tested without the earphones, and I didn’t notice the issue anymore.

> You could probably have it return whether any sources or effects were processed.

That’s true, but I actually opted to pass the bool as a parameter because I find it could be confusing for ProcessContexts to return a boolean that isn't related to success or failure.

Would it be possible to completely disable the B-Format mixing buffer when there are no active effects and no B-Format sources present? I tested modifying my code to only call PostProcess when effects are active, but it didn't work. I believe a better solution would be to properly disable the use of the B-Format buffer when it’s not needed, as this would significantly reduce CPU usage in many cases (not just when not playing sources).

kcat (Owner) commented Sep 10, 2024

> I conducted further tests and ensured that I only call alcDevicePauseSOFT after the source has fully finished playing. Now, I hear a kind of fade-in effect when starting the playback, along with occasional small clicking sounds.
> However, I’ve realized that the issue most likely stems from the fact that I’m using Bluetooth earphones. I believe that when I call alcDevicePauseSOFT, no audio is sent via Bluetooth anymore. When I resume playback, the audio is sent over Bluetooth again. It's possible that the earphones exhibit specific behavior (like fading or ramping up CPU usage) when they start receiving sound. I tested without the earphones, and I didn’t notice the issue anymore.

alcDevicePauseSOFT effectively calls AudioOutputUnitStop on macOS/iOS/etc, so it does stop the output (OpenAL Soft doesn't mix any more, and stops the device from continuing to play audio). I don't know why there would be small clicking sounds though, if it's otherwise fine during playback. The mixer doesn't do anything special when resuming; it mixes and sends audio as requested by the system. Maybe the Bluetooth device doesn't like getting non-silent samples too soon after starting, before it "stabilizes", but that's just a wild guess.

> Would it be possible to completely disable the B-Format mixing buffer when there are no active effects and no B-Format sources present?

The B-Format mixing buffer is actually the main mixing buffer that sources and everything use normally. It's the buffer that is expected to be used. After everything's mixed into it, the post-process then takes that main mixing buffer and generates the discrete speaker feeds for the device output (the stereo channels, or 5.1, etc). HRTF adds a bit of a hack for normal sources to bypass the main buffer, to mix some sources directly to the device output buffer with an appropriate HRTF filter and skip the main buffer for those sources, while also applying HRTF to the main buffer for anything that still got mixed into it (like effects and other sources). It's not really expected for the B-Format mixing buffer to go unused, which is what makes it difficult to track when it is or isn't being used.
