-
Notifications
You must be signed in to change notification settings - Fork 278
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
EGL_KHR_partial_update for FBOs #547
Comments
To be clear, you want to specify a rectangle(s) for renderbuffers or textures that are render targets, like a scissor or similar to QCOM's: https://registry.khronos.org/OpenGL/extensions/QCOM/QCOM_tiled_rendering.txt ? And a tiler would limit its fetch and write back to those rectangles? saving some bandwidth - being the benefit here? |
Yeah, I think that's an accurate description, cc @fooishbar. |
@pdaniell-nv , Piers, lets put this on the GLEs/EGL agenda for discussion. thx |
We discussed this briefly and we're not sure there is a strong use case for FBOs. For EGL there can be more than one context accessing the surface, so the window-level extension makes sense. But for FBOs, which are within the scope of a GL context, it seems all the information needed is already available, and there are no external factors to keep track of. |
I'm using FBOs in a Wayland compositor. The FBOs are bound to an EGLImageKHR which comes from a DMA-BUF, shared with Wayland client or the kernel. |
Not my expertise, but I would expect that the binning process would exclude any area that doesn't have a draw, so no tiles would be fetched or written for that area. So it should be automatic- no? |
EGL_KHR_partial_update was for communicating a damage region, or scissor effectively, between an EGL surface and the thing where that surface was presented. E.g., between a wayland client and a wayland compositor, or a wayland compositor and a DRM-KMS display that supported self-refresh and/or partial shadow framebuffer updates of some sort. If it helped some tiler logic, I'm pretty sure that was by accident, and I'm curious how it helped. Regardless, that doesn't sound anything like a tiler spilling optimization thing described above, which as noted, should be local and shouldn't care whether an EGLImage wrapping a dma-buf or a simple texture is being used, and should be accomplishable with existing GL mechanisms. Are you instead trying to avoid a tiler flush/writeback between the client rendering and the compositor texturing or something? That seems like it would be way more complicated than a partial update spec unless the hardware does something like that automatically internally at the HW or firmware level, in which case it seems like we'd need more info on exactly what the tiler needs to accomplish that beyond what it has now. Now I'm just guessing at what's going on though. Can you be more specific about what it is that goes wrong when you switch from using an EGLSurface to an FBO in the wayland compositor code? |
You're thinking of
That was absolutely the intent behind and the cause for |
I was concerned from beginning that details for partial_update, swap_with_damage, buffer age, and window system particulars would confuse the specific use case here. Hence the need to clarify. |
You're right. Now I remember constantly confusing the two even when we were working on them as well. I'm still not clear what additional API is being requested though. |
One can port code from What I'm requesting is an equivalent of |
Your render target has the contents of the previous draw, so you know it preserved, and you don't need to account for 'n' previous frames. |
No, my render target has the contents of the n-th previous draw. I have my own swapchain implementation with triple buffering. I'm unsure how to check whether the tiler is doing some useless fetches or not. |
I see. You are cycling through n-textures like a swapchain. OK. |
Okay, that's helpful! I'd definitely be happy to drop this issue if Mesa drivers already Do The Right Thing™. I'm not too knowledgeable about tilers, only happened to notice the discrepancy between |
Right, but what do you propose as an API for that? I'm unclear what information conveyance is missing. As Jeff says, there are other APIs (I'm not clear why glScissor() is insufficient, but if it is, I would assume the QCOM ext is sufficient?) to convey the concept of a partial framebuffer update. The EGL partial update extension seems mostly useful to plumb some data in before OpenGL commands trigger a readback, but when you're by definition using only GL as a client API, it doesn't seem like there needs to be any winsys-level API involved, since it doesn't have any special surface-level knowledge to add. I could still be wrong. I've always had trouble with the partial update concept, so I'm just trying to understand what is missing at the API level Vs. implementation level. Hopefully Jeff is right and all is already well. Are you seeing perf or power degradation after switching up the code in wlroots, or just assuming things are suboptimal? |
Hm, my understanding was that indeed drivers need the information ahead-of-time, and that glBindFramebuffer(GL_FRAMEBUFFER, fbo);
glScissor(0, 0, 42, 42);
// draw…
glScissor(100, 100, 42, 42);
// draw again… The second scissor here comes "too late" AFAIU. The API could look like this: void glSetFramebufferDamageRegion(GLenum target, EGLint *rects, EGLint n_rects); called right after
It's tricky to tell, because wlroots has switched to FBOs a long time ago. I will try to compare wlroots and Weston, but not sure the results will be meaningful. |
OK, I can see how a multi-region command might allow larger batches, but I would have assumed with two scissors, you just get two batches of rendering corresponding to each scissored set of draws. I'm curious whether the tilers actually do more with multiple regions, or whether they just union them into a rect spanning both. Regardless, it does seem worth validating that there is something to gain here before adding API. Either by inspection of driver code/GPU specs by the relevant IHVs, or via perf measurements. |
" I'm curious whether the tilers actually do more with multiple regions, or whether they just union them into a rect spanning both." |
Imagination Tech and Qualcomm also think that bins that have no draw will not be fetched or drawn, but that can be invalidated with a clear. If RenderDoc or something can indicate the bins - unknown, its a very internal optimization feature. |
What can one use to benchmark this kind of thing? In kwin, I believe that we merged partial update support while taking "it improves performance" for granted. |
Same for Arm (except for some older GPUs). We could still get some benefit from partial updates, but probably a micro-optimization at this point.
Easiest if you could get vendor specific performance counters, I expect. In extreme cases, say updating a single pixel in a large resolution frame, you may see performance impact if an implementation fetched the full frame vs only the modified bin/area. |
@jadahl has pointed me to a Mutter merge request from Erico Nunes:
|
EGL_KHR_partial_update
allows tilers to more efficiently re-use a surface buffer as a render target. It assumesEGLSurface
is used, and doesn't support FBOs.It would be nice to have an equivalent extension for FBOs.
The text was updated successfully, but these errors were encountered: