Lots of fun #1
Open: ghost wants to merge 14 commits into Bootleggers-BrokenLab:rimbon from unknown repository
Unlinking death recipients (from linkToDeath) is a normal part of Binder operation, and doing it when a BpBinder's refcount has reached 0 is not much different. This log message is constantly spammed when swiping as part of a back navigation gesture:

04-05 22:29:41.402 655 3888 I BpBinder: onLastStrongRef automatically unlinking death recipients: <uncached descriptor>
04-05 22:29:41.413 655 3888 I BpBinder: onLastStrongRef automatically unlinking death recipients: <uncached descriptor>
04-05 22:29:41.424 655 3888 I BpBinder: onLastStrongRef automatically unlinking death recipients: <uncached descriptor>
04-05 22:29:41.435 655 681 I BpBinder: onLastStrongRef automatically unlinking death recipients: <uncached descriptor>
04-05 22:29:41.447 655 3888 I BpBinder: onLastStrongRef automatically unlinking death recipients: <uncached descriptor>

Logging is relatively expensive on Android and the log spam isn't useful, so suppress this log message by setting it to the verbose log level.

Change-Id: I774d0c59ca6f70a4e2ed33b9fac3fb5b86d8ff0a
Rendering 3 or more layers of blur on top of each other makes little to no visual difference in the final result, but it comes at a big performance penalty. Only blurring the two frontmost layers saves a lot of GPU time with minimal difference in output quality.

Change-Id: I9ec8129751a183db00ad200080207434f086a63e
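The layer limit described in this commit can be illustrated with a small sketch. This is not the commit's code: the function name, the `max_blur_layers` parameter, and the back-to-front ordering of the input are all assumptions made for illustration.

```python
def layers_to_blur(blur_requested, max_blur_layers=2):
    """Pick which layers actually get a blur pass.

    blur_requested is a hypothetical list of booleans ordered back to front,
    so the last entry is the frontmost layer. Only the frontmost
    max_blur_layers layers that request blur are kept.
    """
    if max_blur_layers <= 0:
        return []
    # Indices of every layer that asked for a background blur.
    requested = [i for i, wants_blur in enumerate(blur_requested) if wants_blur]
    # Keep only the frontmost ones; deeper blurs are skipped entirely.
    return requested[-max_blur_layers:]
```

With four stacked blur layers, for example, only the last two indices are returned and the two rearmost blurs are never rendered.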
This improved terminology helps clarify what each uniform refers to, which will become more important once we add dithering.

Change-Id: I607a08b63c6370d091412905d32bd7d89ae71769
This reduces ALU utilization by avoiding unnecessary calculations that are discarded in the final fragment color. This saves ~100 µs in another 6-pass blur implementation.

Change-Id: I4ec24518eefce369dbb592ac03a14caeea86df9a
Blurring images usually creates a lot of gradual color transitions, especially at higher radii. When the output is quantized to 8-bit RGB (8 bpc / 24 bpp) for display, the lost information (quantization error) results in visible banding. This is most apparent in scenes that are predominantly grayscale because of the reduced color fidelity, but it can still be an issue in scenes with many colors.

To fix the banding, this commit dithers the blur output during quantization. This is done in the final output mixing step because dithering works best when it operates on each individual pixel; upscaling a dithered image will not help much.

Error diffusion dithering is ideal, but it is not practical for GPU fragment shaders because it requires processing each pixel sequentially. Instead, we use ordered dithering, which tiles a pattern across the entire image to influence rounding during quantization.

The most visually appealing pattern is used for ordered dithering: blue noise [1]. Other patterns considered:
- Bayer matrix: visible pattern in output
- White (random) noise: high-frequency components are distracting and make tiling repetitions apparent
- Interleaved gradient noise, generated in shader [2]: aliasing artifacts, somewhat visible pattern, and expensive ALU operations
- Oculus Dither17 pattern, generated in shader [3]: same issues as interleaved gradient noise

Blue noise contains very few low-frequency components, making it ideal for dithering because it tiles seamlessly and does not distract from the actual image.

When dithering, the blue noise is reshaped from a uniform distribution to a triangular PDF distribution. This makes the noise appear more uniform when used for dithering: instead of some areas having less visible noise than others, the noise appears to be evenly spread across the entire image [4] [5].

A naïve implementation of triangular reshaping is relatively expensive (+60 µs), but using an optimized implementation of sign() reduces the cost to ~30 µs [6].

Finally, to avoid adding noise to pure black (#000000) images or affecting saturation, the dithering implementation needs to perform gamma correction [5]. The real sRGB transfer function is relatively expensive because it's a piecewise function with linear and power-law (gamma 2.4) segments that together approximate gamma 2.2, so we approximate it instead with gamma 2. This makes the performance cost of gamma correction negligible while still producing acceptable results.

Most applications of dithering add up to 1x LSB (least significant bit, i.e. 1/256 for 8-bpc output) of noise, but this implementation adds up to 4x LSB (i.e. 1/64 for 8-bpc) of noise. While this adds more noise to the output, it was empirically determined to be more effective for reducing banding in color gradients than 1x LSB. There are still nearly no visible noise artifacts when using blue noise and gamma correction.

Note that dithering requires the blurred image to be rendered at 10-bit HDR (10 bpc / 30 bpp) internally; otherwise, it would just add noise without fixing banding. This increases rendering time by ~300 µs on an Adreno 640 GPU at 1440x3040 resolution, which is a worthwhile tradeoff for the significant improvement in quality.

[1] http://momentsingraphics.de/BlueNoise.html
[2] http://www.iryoku.com/downloads/Next-Generation-Post-Processing-in-Call-of-Duty-Advanced-Warfare-v18.pptx
[3] https://developer.oculus.com/blog/tech-note-shader-snippets-for-efficient-2d-dithering/
[4] https://loopit.dk/banding_in_games.pdf
[5] https://loopit.dk/rendering_inside.pdf
[6] https://twitter.com/SebAaltonen/status/878250919879639040

Change-Id: I80559654a19c6cc6f2f53c94b64963d0bb888af5
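The dithering steps described in this commit (triangular reshaping, gamma 2 approximation, 4x LSB noise, 8-bit quantization) can be sketched on the CPU for a single color channel. This is only an illustration under stated assumptions, not the shader from the commit: the function names are hypothetical, the remap formula is one common way to turn uniform noise into a triangular distribution, and `noise_sample` stands in for a value that a real implementation would read from a tiled blue noise texture.

```python
import math

LSB_8BIT = 1.0 / 256.0             # one 8-bpc step, as stated in the commit message
NOISE_AMPLITUDE = 4.0 * LSB_8BIT   # the commit adds up to 4x LSB of noise


def triangular_remap(u):
    """Map uniform noise u in [0, 1] to a triangular distribution on [-1, 1]."""
    v = 2.0 * u - 1.0
    # max() guards against tiny negative values from floating-point rounding.
    magnitude = 1.0 - math.sqrt(max(0.0, 1.0 - abs(v)))
    return math.copysign(magnitude, v)


def dither_to_8bit(value, noise_sample):
    """Dither one channel value in [0, 1] and quantize it to 0..255.

    Assumption: noise is added in a gamma 2 encoded space (sqrt to encode,
    square to decode) as a cheap stand-in for the sRGB transfer function.
    """
    encoded = math.sqrt(value)
    encoded += triangular_remap(noise_sample) * NOISE_AMPLITUDE
    encoded = min(max(encoded, 0.0), 1.0)
    decoded = encoded * encoded
    return int(math.floor(decoded * 255.0 + 0.5))
```

In this sketch pure black stays black for any noise value: the largest possible noise is 1/64 in encoded space, which decodes to 1/4096 and rounds to 0 at 8 bits.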
The values of the 2D texture sampler uniforms never change, so set them when initializing the BlurFilter instance instead of updating them every frame. This reduces clutter in the rendering path and results in a negligible performance gain from reducing API overhead.

Change-Id: I335ea0dbdd2d4409ea71b58b7c7f8b87b04a18d7
Counterintuitively, combining the initial blur pass with downscaling makes the rendering process slower because sampling from the high-resolution image with bilinear sampling uses more memory bandwidth. It also increases the total amount of ALU work because it effectively introduces an unnecessary blur pass.

By downscaling the image with glBlitFramebuffer before running blur passes, we can save a blur pass and render a more correct result.

When tested with 2 layers of another 6-pass blur implementation, this saves ~800 µs of rendering time on an Adreno 640 GPU at 1440x3040 resolution.

Change-Id: Ie897a52f1628e40d34c3c31f5f779020594bb091
While blur rendering is mostly memory-bound, reducing ALU utilization is still helpful. This doesn't result in any noticeable artifacts in the blurred image.

When tested with 2 layers of another 6-pass blur implementation, this saves ~50 µs of rendering time on an Adreno 640 GPU at 1440x3040 resolution.

Change-Id: I09798e0d98665b92771e601704653609a240c065
Instead of calculating the UV coordinates to sample for each blur tap in the fragment shader, calculate them as varyings in the vertex shader and let the GPU interpolate them for each fragment. This allows the GPU to prefetch pixel data because we no longer rely on dependent texture reads.

When tested with 2 layers of another 6-pass blur implementation, this saves ~100 µs of rendering time on an Adreno 640 GPU at 1440x3040 resolution.

Change-Id: Id2a700decef060ea23d1d5e2cccaea864581bf16
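Moving the tap coordinates into varyings is valid because each tap is just the fragment's UV plus a constant offset, and linear interpolation preserves that relationship. The sketch below checks this on the CPU. It is a simplification, not the commit's shader: the four-tap diagonal pattern and all names are assumptions, and interpolation is reduced to a 1D lerp between two vertices.

```python
def tap_uvs(center_uv, offset):
    """Return four hypothetical diagonal tap coordinates around center_uv."""
    u, v = center_uv
    ox, oy = offset
    return [(u - ox, v - oy), (u + ox, v - oy), (u - ox, v + oy), (u + ox, v + oy)]


def lerp(a, b, t):
    """Linear interpolation, standing in for the GPU's varying interpolation."""
    return a + (b - a) * t


def interpolate_taps(taps_a, taps_b, t):
    """Interpolate per-vertex tap coordinates, as the rasterizer would for varyings."""
    return [
        (lerp(ua, ub, t), lerp(va, vb, t))
        for (ua, va), (ub, vb) in zip(taps_a, taps_b)
    ]


def per_fragment_taps(uv_a, uv_b, t, offset):
    """Compute taps from the interpolated UV, as a fragment shader would."""
    center = (lerp(uv_a[0], uv_b[0], t), lerp(uv_a[1], uv_b[1], t))
    return tap_uvs(center, offset)
```

Both paths produce the same coordinates up to floating-point rounding, so the fragment shader can sample directly from interpolated values without doing arithmetic first.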
Using a VBO for a simple full-screen triangle is unnecessarily complicated, and because blur rendering is memory-bound, every little bit of memory bandwidth counts. This removes the VBO in favor of calculating the UV and clip space coordinates for a full-screen triangle in the vertex shader, which only takes a few ALU instructions in a shader that is only run 3 times per rendering step.

When tested with 2 layers of another 6-pass blur implementation, this saves ~50 µs of rendering time on an Adreno 640 GPU at 1440x3040 resolution.

Change-Id: I6a3c6d691baa54412f4e04891129f101be6cdb60
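One widely used way to generate a full-screen triangle without a vertex buffer is to derive the coordinates from the vertex index with a couple of bit operations. The sketch below reproduces that arithmetic on the CPU. Whether the commit uses exactly this formula is an assumption, and the function name is hypothetical; in a shader the index would typically come from a built-in such as `gl_VertexID`.

```python
def fullscreen_triangle_vertex(vertex_id):
    """Return ((x, y), (u, v)) for vertex 0, 1, or 2 of a full-screen triangle.

    The triangle is larger than the viewport, so the region with
    UVs in [0, 1] covers exactly the clip-space square [-1, 1] x [-1, 1].
    """
    u = float((vertex_id << 1) & 2)   # 0, 2, 0
    v = float(vertex_id & 2)          # 0, 0, 2
    x = u * 2.0 - 1.0                 # -1, 3, -1
    y = v * 2.0 - 1.0                 # -1, -1, 3
    return (x, y), (u, v)
```

The three vertices land at (-1, -1), (3, -1), and (-1, 3) in clip space. The parts of the triangle outside the viewport are clipped, and a single triangle avoids the shared diagonal edge of a two-triangle quad.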
This signals to the GPU driver that the FBO contents do not need to be preserved. According to ARM, invalidating framebuffers after rendering and unbinding them won't do anything on Mali GPUs [1], but it improves performance with Qualcomm's Adreno GPU drivers.

When tested with 2 layers of another 6-pass blur implementation, this saves ~100 µs of rendering time on an Adreno 640 GPU at 1440x3040 resolution.

[1] https://community.arm.com/developer/tools-software/graphics/b/blog/posts/mali-performance-2-how-to-correctly-handle-framebuffers

Change-Id: Ib26e904e66fbc95924d14bbe132cf8a0505d4f19
We need to set the policy before dropping capabilities.

Bug: 175178520
Test: trace and check sched_policy
Change-Id: I24114325f00f19113b360f71a427ebf6bf45dc9b
Signed-off-by: Akash <[email protected]>
Change-Id: I59021fd41dcea891de0dc7c9e1183be3985ed9fa
This is as low as we can go (in steps of 5%) without significant artifacts on 1440p.

Change-Id: I029f0de3e94dcd8fae3ea548ab1b86e2093674b3
No description provided.