Scale gs sort bucket size with scene size #7155
Perfect
Damn - this is lovely. 😍
I am concerned about the added memory needs, especially for large scenes on mobile devices with very limited memory. What about storing the distances in exponential space instead of linear space? That would need changes in the min/max calculation, as well as inverting the results.
Hi @fimbox, Are you suggesting a 4MB overhead is too much for a scene comprising, say, 3M splats? Such a scene would require over 50MB of compressed data, never mind the other work buffers we create. Percentage-wise I actually think this is a fine trade-off to lower flickering on large scenes. (Note that smaller scenes allocate smaller buffers.) Nice suggestion about non-linear depth. I actually did try this (briefly), but it didn't result in a smaller worst-case bucket size. It might be worth investigating further though, or trying different mappings. As it stands, what I see with this update is that the linear buckets are fast and work surprisingly well with these added bits. Thanks so much for your input!
Hi @slimbuck, here is a demo of non-linear depth with my (very) old gsplat port. You can switch between Linear and Non-linear by pressing the button. I reduced the bit size to 11 so that the effect is visible on the currently loaded scene: https://playcanvas.com/editor/scene/2124910 The idea is two-fold:
Hey this is so cool! I'm glad you couldn't resist! :D The non-linear buckets actually seem to work better here. I also implemented skipping the sort for splats behind the camera, with updated buckets, but it was actually slower. Having sorted splats behind the camera is also useful when the camera moves and hasn't yet got the latest sorted indices. Don't you want to just implement this in the engine and submit a PR?
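For readers following the linear versus non-linear discussion, here is a minimal sketch of the two kinds of distance-to-bucket mapping. The function names and the `log(1 + d)` curve are assumptions made for illustration only; the thread does not show the exact mapping used in the demo.

```typescript
// Hypothetical helpers, not the engine's API.
// Both expect minD <= d <= maxD and return a bucket index in [0, numBuckets).

function linearBucket(d: number, minD: number, maxD: number, numBuckets: number): number {
    const range = maxD - minD;
    if (range <= 0) return 0;
    const t = (d - minD) / range;                 // 0..1, uniform in distance
    return Math.min(numBuckets - 1, Math.floor(t * numBuckets));
}

function logBucket(d: number, minD: number, maxD: number, numBuckets: number): number {
    const range = maxD - minD;
    if (range <= 0) return 0;
    // Assumed curve: log(1 + x), so the nearest splat maps to 0 and the
    // farthest to 1, with more resolution spent close to the camera.
    const t = Math.log(1 + (d - minD)) / Math.log(1 + range);
    return Math.min(numBuckets - 1, Math.floor(t * numBuckets));
}
```

With 11 bits (2048 buckets) over a distance range of 0 to 1000, splats at distances 1 and 2 fall into linear buckets 2 and 4, while the log mapping places them more than a hundred buckets apart. The price is that distant splats share fewer buckets.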
Need to report that after this change, one of our AVP gsplat demos became unusable. The demo features a large splat and was already near the performance threshold. Post-change, the AVP intermittently shows a black screen, suggesting CPU/GPU overload. The screen recovers briefly before going black again.
Hi @fimbox, Thanks for reporting this issue. I'm really sorry for the performance regression. I was thinking of making the number of sorting bins configurable. This will help with performance, but as you say, ideally we wouldn't be forced to choose between flicker and rendering speed.

I have learned a few things on this topic since merging this PR:

Firstly, the reason for the slowdown on large scenes might not be what you think. The sorting code running on the CPU does run a bit slower now since there are more buckets, but that doesn't account for the rendering drop-off. The reason for the rendering performance hit is that placing splats into more buckets results in more render re-ordering of splats. This results in more memory-access cache misses, slowing rendering considerably. (For some context on the importance of memory ordering to performance see #6357.) This implies that reordering splat data into render order at runtime might be the only solution here...

Secondly, I realised that it's precisely when gaussians move from one bucket to the next that flickering occurs. (The counting sort we use is stable, so it keeps the splat order otherwise.) This means that the more accurate log scales mentioned above actually exacerbate the flickering issue.

I am busy investigating approaches again to speed up rendering in the face of these seemingly incompatible requirements. If you make any headway on this, please do share! Thanks
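The point about stability can be illustrated with a small, generic counting sort. This is a sketch only, not the engine's sort worker; the function name `countingSortByBucket` is made up for the example.

```typescript
// Stable counting sort: returns splat indices ordered by bucket.
// Splats in the same bucket keep their original relative order.
function countingSortByBucket(buckets: Uint32Array, numBuckets: number): Uint32Array {
    // counts[b + 1] first holds the number of splats in bucket b
    const counts = new Uint32Array(numBuckets + 1);
    for (let i = 0; i < buckets.length; i++) {
        counts[buckets[i] + 1]++;
    }
    // prefix sum: counts[b] becomes the first output slot of bucket b
    for (let b = 0; b < numBuckets; b++) {
        counts[b + 1] += counts[b];
    }
    // forward scatter keeps the input order within each bucket
    const order = new Uint32Array(buckets.length);
    for (let i = 0; i < buckets.length; i++) {
        order[counts[buckets[i]]++] = i;
    }
    return order;
}
```

For buckets `[2, 0, 2, 1, 0]` the resulting order is `[1, 4, 3, 0, 2]`. If splat 0 drifts from bucket 2 into bucket 1, the order becomes `[1, 4, 0, 3, 2]`: only the splat that crossed a bucket boundary changes place relative to its neighbours, which matches the observation that flicker happens exactly at those crossings.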
Hi @slimbuck, I made the following observations while playing around with the SortWorker.update function:
However, the number of buckets needed is not just dependent on the number of splats per se (as done in the current implementation). It is also dependent on the splat distribution. Worst case: a few distant splats combined with a very detailed splat assembly in the middle can produce flicker, even with a small number of splats. To improve accuracy with limited memory, I tried:
I think a WebGPU sort is the next thing to try. What are your next steps?
@fimbox I'm wondering whether leveraging SIMD could improve your WASM sort. |
Just quickly asked ChatGPT:
I don't have much insight into this stuff personally, but thought I'd kick off a discussion about WASM optimization...
WASM SIMD can work if there is very little communication between JS and WASM, a high computation workload in between, a fixed memory footprint (no re-allocations), and simple raw-buffer data exchange. It very much depends on what gets compiled into WASM, though: it should be a lightweight, dependency-free solution that just solves one specific computation. It is also worth remembering that WASM SIMD is unlikely to perform as well as a compute-shader solution with WebGPU, which is the way to go in the long term.
I just optimized the hell out of the WASM radix sort (check here): only one vertex loop per pass and so on. It's faster, but it's still at half the speed of the JS single bucket sort. I also played around with WASM SIMD, but the hardest part (scatter) is not easily vectorizable, so the speed-up might be negligible.
I initially wrote a radix sort as well, using 4x8 bits. To speed it up, I switched it to 2x16 bits. The logical conclusion was a single pass, but with the key dropped to 16 bits. So perhaps try 2 passes instead of 4 as well?
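A two-pass variant along those lines might look like the following sketch. It is a generic least-significant-digit radix sort over 32-bit keys with two 16-bit digits, written for illustration; it is not the WASM sort or the engine code discussed here, and `radixSort2x16` is a made-up name.

```typescript
// LSD radix sort of 32-bit unsigned keys using two 16-bit passes.
// Returns the indices of `keys` in ascending key order (stable).
function radixSort2x16(keys: Uint32Array): Uint32Array {
    const n = keys.length;
    let src = new Uint32Array(n);
    let dst = new Uint32Array(n);
    for (let i = 0; i < n; i++) src[i] = i;

    const counts = new Uint32Array(65536 + 1);
    for (let shift = 0; shift < 32; shift += 16) {
        counts.fill(0);
        // histogram of the current 16-bit digit
        for (let i = 0; i < n; i++) {
            counts[((keys[src[i]] >>> shift) & 0xffff) + 1]++;
        }
        // prefix sum gives each digit value its first output slot
        for (let b = 0; b < 65536; b++) {
            counts[b + 1] += counts[b];
        }
        // stable scatter, so the previous pass's order is preserved
        for (let i = 0; i < n; i++) {
            const idx = src[i];
            dst[counts[(keys[idx] >>> shift) & 0xffff]++] = idx;
        }
        const tmp = src; src = dst; dst = tmp;
    }
    return src;
}
```

Each pass walks a 65537-entry histogram, so for small inputs the fixed cost of the prefix sum dominates; the trade-off against four 8-bit passes depends on the splat count and on how well the larger histogram fits in cache.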
I created a project that compares the 3 approaches (linear bucket, non-linear and radix). It also outputs the sorting times to the logs. For us it seems we are good with the non-linear mapping, and we can easily patch it in on demand. It solves flicker on close objects and is as fast as linear sorting. It has some flickering in distant areas though. With typical Scaniverse scans you can also observe flickering with the linear sorting approach, even with higher bucket counts. https://playcanvas.com/project/1285780/overview/gaussian-splatting-sorting
This PR scales the number of sorting buckets to the size of the scene. This is to address flickering found in larger scenes.
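As a rough illustration of what scaling the bucket count with scene size can look like, here is a sketch. The formula, the clamp limits and the function names are assumptions chosen for the example, not taken from this PR's diff.

```typescript
// Hypothetical: choose the number of sort-key bits from the splat count.
// minBits, maxBits and the divisor of 4 are illustrative assumptions.
function bucketBitsForScene(numSplats: number, minBits = 10, maxBits = 20): number {
    if (numSplats < 1) return minBits;
    const bits = Math.round(Math.log2(numSplats / 4));
    return Math.max(minBits, Math.min(maxBits, bits));
}

// Size in bytes of a bucket-count buffer, assuming one 32-bit counter per bucket.
function bucketBufferBytes(bits: number): number {
    return (1 << bits) * Uint32Array.BYTES_PER_ELEMENT;
}
```

Under these assumptions a 3M-splat scene gets 20 bits, i.e. 2^20 buckets and a 4MB counter buffer, while a 1000-splat scene stays at the 10-bit minimum with a 4KB buffer.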
Notes:
This is the before and after comparison on an urban scene:
BEFORE:
Screen.Recording.2024-11-29.at.11.34.30.mov
AFTER:
Screen.Recording.2024-11-29.at.11.35.21.mov