Lots of fun #1

Open
wants to merge 14 commits into
base: rimbon

Conversation

@ghost ghost commented May 25, 2021

No description provided.

kdrag0n and others added 14 commits May 14, 2021 21:08
Unlinking death recipients (from linkToDeath) is a normal part of Binder
operation, and doing it when a BpBinder's refcount has reached 0 is not
much different.

This log message is printed constantly while swiping as part of a back
navigation gesture:

04-05 22:29:41.402   655  3888 I BpBinder: onLastStrongRef automatically unlinking death recipients: <uncached descriptor>
04-05 22:29:41.413   655  3888 I BpBinder: onLastStrongRef automatically unlinking death recipients: <uncached descriptor>
04-05 22:29:41.424   655  3888 I BpBinder: onLastStrongRef automatically unlinking death recipients: <uncached descriptor>
04-05 22:29:41.435   655   681 I BpBinder: onLastStrongRef automatically unlinking death recipients: <uncached descriptor>
04-05 22:29:41.447   655  3888 I BpBinder: onLastStrongRef automatically unlinking death recipients: <uncached descriptor>

Logging is relatively expensive on Android and the log spam isn't
useful, so suppress this log message by setting it to the verbose log
level.

Change-Id: I774d0c59ca6f70a4e2ed33b9fac3fb5b86d8ff0a

Rendering 3 or more layers of blur on top of each other makes little to
no visual difference in the final result, but it comes at a big
performance penalty. Only blurring the two frontmost layers saves a lot
of GPU time with minimal difference in output quality.
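
As a rough illustration of the selection described above, the sketch
below keeps only the two frontmost blur layers from a back-to-front
list. The layer dictionaries, the `blur_radius` key, and
`MAX_BLUR_LAYERS` are hypothetical names for this example, not the
actual data structures touched by this change.

```python
# Hypothetical limit taken from the description: blur only the two
# frontmost layers.
MAX_BLUR_LAYERS = 2

def layers_to_blur(layers_back_to_front):
    """Return the names of the blur layers that would still be blurred."""
    blur_layers = [layer for layer in layers_back_to_front
                   if layer["blur_radius"] > 0]
    # The list is ordered back to front, so the frontmost layers are last.
    return [layer["name"] for layer in blur_layers[-MAX_BLUR_LAYERS:]]

layers = [
    {"name": "wallpaper", "blur_radius": 0},
    {"name": "app_background", "blur_radius": 40},
    {"name": "dialog_scrim", "blur_radius": 60},
    {"name": "notification_shade", "blur_radius": 80},
]
selected = layers_to_blur(layers)
```

Here `selected` contains `dialog_scrim` and `notification_shade`; the
rearmost blur layer is skipped.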

Change-Id: I9ec8129751a183db00ad200080207434f086a63e

This improved terminology helps clarify what each uniform refers to,
which will become more important once we add dithering.

Change-Id: I607a08b63c6370d091412905d32bd7d89ae71769

This reduces ALU utilization by avoiding unnecessary calculations that
are discarded in the final fragment color. This saves ~100 µs in another
6-pass blur implementation.

Change-Id: I4ec24518eefce369dbb592ac03a14caeea86df9a

Blurring images usually creates a lot of gradual color transitions,
especially at higher radii. When the output is quantized to 8-bit RGB
(8 bpc / 24 bpp) for display, the lost information (quantization error)
results in visible banding. This is most apparent in scenes that are
predominantly grayscale because of the reduced color fidelity, but it
can still be an issue in scenes with many colors.

To fix the banding, this commit dithers the blur output during
quantization. This is done in the final output mixing step because
dithering works best when it operates on each individual pixel;
upscaling a dithered image will not help much.

Error diffusion dithering is ideal, but it is not practical for GPU
fragment shaders because it requires processing each pixel sequentially.
Instead, we use ordered dithering, which tiles a pattern across the
entire image to influence rounding during quantization.
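
To make the idea of a tiled pattern influencing rounding concrete, here
is a minimal CPU-side sketch of ordered dithering. The 2x2 `TILE` is a
hypothetical Bayer-style placeholder chosen for brevity; it is not the
pattern used by this commit, and the function names are illustrative
only.

```python
import math

# Hypothetical 2x2 threshold tile with offsets in (-0.5, 0.5).
# A real implementation would sample a much larger noise texture.
TILE = [[-0.375, 0.125],
        [0.375, -0.125]]

def quantize(value, levels=256):
    """Round a value in [0, 1] to the nearest integer output code."""
    return math.floor(value * (levels - 1) + 0.5)

def quantize_dithered(value, x, y, levels=256):
    """Same as quantize(), but the tiled pattern nudges the rounding."""
    offset = TILE[y % 2][x % 2]
    return math.floor(value * (levels - 1) + 0.5 + offset)

# A value a quarter of the way between output codes 100 and 101.
value = 100.25 / 255
plain = [quantize(value) for _ in range(4)]
dithered = [quantize_dithered(value, x, y)
            for y in range(2) for x in range(2)]
```

Plain rounding maps every pixel to code 100, while the dithered version
rounds one pixel in four up to 101, so the average over the tile keeps
the fractional part that plain rounding throws away.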

The most visually appealing pattern is used for ordered dithering: blue
noise [1]. Other patterns considered:
  - Bayer matrix: visible pattern in output
  - White (random) noise: high-frequency components are distracting and
    make tiling repetitions apparent
  - Interleaved gradient noise, generated in shader [2]: aliasing
    artifacts, somewhat visible pattern, and expensive ALU operations
  - Oculus Dither17 pattern, generated in shader [3]: same issues as
    interleaved gradient noise

Blue noise contains very few low-frequency components, making it ideal
for dithering because it tiles seamlessly and does not distract from the
actual image.

When dithering, the blue noise is reshaped from a uniform distribution
to a triangular one (triangular PDF). This makes the noise appear more
uniform
when used for dithering: instead of some areas having less visible noise
than others, the noise appears to be evenly spread across the entire
image [4] [5]. A naïve implementation of triangular reshaping is
relatively expensive (+60 µs), but using an optimized implementation of
sign() reduces the cost to ~30 µs [6].
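
For reference, one commonly used way to remap uniform noise to a
triangular distribution is sketched below. Whether this commit uses
exactly this formula is an assumption, and the sketch does not try to
reproduce the optimized sign() from [6]; `triangular_from_uniform` is a
name made up for this example.

```python
import math

def triangular_from_uniform(u):
    """Map uniform noise u in [0, 1] to triangular noise in [-1, 1]."""
    centered = u * 2.0 - 1.0                        # uniform in [-1, 1]
    magnitude = 1.0 - math.sqrt(1.0 - abs(centered))
    # copysign plays the role of multiplying by sign(centered).
    return math.copysign(magnitude, centered)

samples = [triangular_from_uniform(i / 8) for i in range(9)]
```

The mapping is monotonic and keeps the [-1, 1] range, but it compresses
values toward zero, so small offsets become more likely than large ones.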

Finally, to avoid adding noise to pure black (#000000) images or
affecting saturation, the dithering implementation needs to perform
gamma correction [5]. The real sRGB transfer function is relatively
expensive because it's a piecewise function with linear and power-law
(gamma 2.4) parts that roughly match gamma 2.2 overall, so we approximate
it with gamma 2 instead. This makes the performance cost of gamma
correction negligible while still producing acceptable results.
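
The trade-off can be checked numerically. The sketch below compares the
standard piecewise sRGB decoding with a gamma 2 approximation on the
CPU; the function names are illustrative, and the shader code in this
commit may be structured differently.

```python
import math

def srgb_to_linear(c):
    """Standard piecewise sRGB decoding for c in [0, 1]."""
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4

def gamma2_to_linear(c):
    """Gamma 2 approximation: decoding is a single multiply."""
    return c * c

def linear_to_gamma2(c):
    """Gamma 2 approximation: encoding is a single square root."""
    return math.sqrt(c)

# Largest absolute difference between the two decoders over [0, 1].
max_diff = max(abs(srgb_to_linear(i / 1000) - gamma2_to_linear(i / 1000))
               for i in range(1001))
```

On this grid `max_diff` stays below 0.05, and the gamma 2 version needs
only a multiply and a square root instead of a branch and pow().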

Most applications of dithering add up to 1x LSB (least significant bit -
i.e. 1/256 for 8-bpc output) of noise, but this implementation adds up
to 4x LSB (i.e. 1/64 for 8-bpc) of noise. While this adds more noise to
the output, it was empirically determined to be more effective for
reducing banding in color gradients than 1x LSB. There are still nearly
no visible noise artifacts when using blue noise and gamma correction.

Note that dithering requires the blurred image to be rendered at 10-bit
HDR (10 bpc / 30 bpp) internally; otherwise, it would just add noise
without fixing banding.
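
A small numeric sketch of why the intermediate precision matters. It
stores a slow gradient at either 8 or 10 bits, dithers it down to 8-bit
output codes, and measures how far the local average lands from the
ideal value. The four evenly spaced offsets are a simplified stand-in
for the real noise (they stay below 1 LSB, unlike the 4x LSB used
here), and all names are hypothetical.

```python
import math

# Simplified stand-in for one tile of dither offsets, in output LSBs.
OFFSETS = [-0.375, -0.125, 0.125, 0.375]

def to_bits(value, bits):
    """Quantize a value in [0, 1] to `bits` bits, returned as a float."""
    top = (1 << bits) - 1
    return round(value * top) / top

def dithered_mean_8bit(value):
    """Average 8-bit output code over one tile of dither offsets."""
    codes = [math.floor(value * 255 + 0.5 + o) for o in OFFSETS]
    return sum(codes) / len(codes)

def worst_error(intermediate_bits, samples=4001):
    """Largest gap (in 8-bit LSBs) between ideal and averaged output."""
    worst = 0.0
    for i in range(samples):
        ideal = 100.0 + 4.0 * i / (samples - 1)   # in 8-bit code units
        stored = to_bits(ideal / 255, intermediate_bits)
        worst = max(worst, abs(dithered_mean_8bit(stored) - ideal))
    return worst

error_8bit = worst_error(8)
error_10bit = worst_error(10)
```

With an 8-bit intermediate the fractional part is already gone before
dithering, so the averaged output is still off by up to about half an
LSB (the banding steps remain). With a 10-bit intermediate the worst
error in this sketch is roughly half of that.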

This increases rendering time by ~300 µs on an Adreno 640 GPU at
1440x3040 resolution, which is a worthwhile tradeoff for the
significant improvement in quality.

[1] http://momentsingraphics.de/BlueNoise.html
[2] http://www.iryoku.com/downloads/Next-Generation-Post-Processing-in-Call-of-Duty-Advanced-Warfare-v18.pptx
[3] https://developer.oculus.com/blog/tech-note-shader-snippets-for-efficient-2d-dithering/
[4] https://loopit.dk/banding_in_games.pdf
[5] https://loopit.dk/rendering_inside.pdf
[6] https://twitter.com/SebAaltonen/status/878250919879639040

Change-Id: I80559654a19c6cc6f2f53c94b64963d0bb888af5

The values of the 2D texture sampler uniforms never change, so set them
when initializing the BlurFilter instance instead of updating them every
frame. This reduces clutter in the rendering path and results in a
negligible performance gain from reducing API overhead.

Change-Id: I335ea0dbdd2d4409ea71b58b7c7f8b87b04a18d7

Counterintuitively, combining the initial blur pass with downscaling makes
the rendering process slower because sampling from the high-resolution
image with bilinear sampling uses more memory bandwidth. It also
increases the total amount of ALU work because it effectively introduces
an unnecessary blur pass.

By downscaling the image with glBlitFramebuffer before running blur
passes, we can save a blur pass and render a more correct result. When
tested with 2 layers of another 6-pass blur implementation, this saves
~800 µs of rendering time on an Adreno 640 GPU at 1440x3040 resolution.

Change-Id: Ie897a52f1628e40d34c3c31f5f779020594bb091

While blur rendering is mostly memory-bound, reducing ALU utilization is
still helpful. This doesn't result in any noticeable artifacts in the
blurred image.

When tested with 2 layers of another 6-pass blur implementation, this
saves ~50 µs of rendering time on an Adreno 640 GPU at 1440x3040
resolution.

Change-Id: I09798e0d98665b92771e601704653609a240c065

Instead of calculating the UV coordinates to sample for each blur tap in
the fragment shader, calculate them as varyings in the vertex shader and
let the GPU interpolate them for each fragment. This allows
the GPU to prefetch pixel data because we no longer rely on dependent
texture reads.
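
The reason this is safe is that a constant tap offset commutes with
linear interpolation: offsetting the UVs at each vertex and
interpolating gives the same coordinate as interpolating first and
offsetting per fragment. A minimal CPU-side check, with hypothetical
vertex UVs, offset, and barycentric weights:

```python
def interpolate(v0, v1, v2, weights):
    """Barycentric interpolation of three per-vertex 2D values."""
    return tuple(weights[0] * a + weights[1] * b + weights[2] * c
                 for a, b, c in zip(v0, v1, v2))

def add(uv, offset):
    return (uv[0] + offset[0], uv[1] + offset[1])

# Hypothetical inputs: UVs of a full-screen triangle, one blur tap
# offset, and the barycentric weights of some fragment.
vertex_uvs = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
tap_offset = (0.125, -0.25)
weights = (0.5, 0.25, 0.25)

# Old approach: interpolate the base UV, then offset in the fragment shader.
per_fragment = add(interpolate(*vertex_uvs, weights), tap_offset)

# New approach: offset at each vertex, then let the GPU interpolate.
per_vertex = interpolate(*(add(uv, tap_offset) for uv in vertex_uvs),
                         weights)
```

Both paths produce the same sampling coordinate, but in the second one
the fragment shader receives it ready-made instead of computing it
before the texture read.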

When tested with 2 layers of another 6-pass blur implementation, this
saves ~100 µs of rendering time on an Adreno 640 GPU at 1440x3040
resolution.

Change-Id: Id2a700decef060ea23d1d5e2cccaea864581bf16

Using a VBO for a simple full-screen triangle is unnecessarily
complicated, and because blur rendering is memory-bound, every little
bit of memory bandwidth counts. This removes the VBO in favor of
calculating the UV and clip space coordinates for a full-screen triangle
in the vertex shader, which only takes a few ALU instructions in a
shader that is only run 3 times per rendering step.
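
A common way to derive such a triangle from the vertex index alone
(`gl_VertexID` in GLSL ES 3.0) is sketched below on the CPU. Whether
this commit uses exactly these bit operations is an assumption;
`fullscreen_triangle_vertex` is a name made up for this example.

```python
def fullscreen_triangle_vertex(vertex_id):
    """Return (clip_position, uv) for vertex 0, 1, or 2."""
    # Bit tricks yield UVs (0, 0), (2, 0), and (0, 2).
    u = float((vertex_id << 1) & 2)
    v = float(vertex_id & 2)
    # Map UV range [0, 2] to clip space range [-1, 3].
    position = (u * 2.0 - 1.0, v * 2.0 - 1.0)
    return position, (u, v)

vertices = [fullscreen_triangle_vertex(i) for i in range(3)]
```

The resulting triangle has clip space corners (-1, -1), (3, -1), and
(-1, 3), which covers the whole [-1, 1] viewport; the parts outside it
are clipped, and UVs in [0, 1] land exactly on the visible area.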

When tested with 2 layers of another 6-pass blur implementation, this
saves ~50 µs of rendering time on an Adreno 640 GPU at 1440x3040
resolution.

Change-Id: I6a3c6d691baa54412f4e04891129f101be6cdb60

This signals to the GPU driver that the FBO contents do not need to be
preserved. According to ARM, invalidating framebuffers after rendering
and unbinding them won't do anything on Mali GPUs [1], but it improves
performance with Qualcomm's Adreno GPU drivers.

When tested with 2 layers of another 6-pass blur implementation, this
saves ~100 µs of rendering time on an Adreno 640 GPU at 1440x3040
resolution.

[1] https://community.arm.com/developer/tools-software/graphics/b/blog/posts/mali-performance-2-how-to-correctly-handle-framebuffers

Change-Id: Ib26e904e66fbc95924d14bbe132cf8a0505d4f19

We need to set the scheduling policy before dropping capabilities.

Bug: 175178520
Test: trace and check sched_policy
Change-Id: I24114325f00f19113b360f71a427ebf6bf45dc9b

Signed-off-by: Akash <[email protected]>
Change-Id: I59021fd41dcea891de0dc7c9e1183be3985ed9fa

This is as low as we can go (in steps of 5%) without significant
artifacts on 1440p.

Change-Id: I029f0de3e94dcd8fae3ea548ab1b86e2093674b3
3 participants