Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replacing Shader Code for FMA #506

Merged
merged 1 commit into from
Oct 21, 2024
Merged

Replacing Shader Code for FMA #506

merged 1 commit into from
Oct 21, 2024

Conversation

Malidos
Copy link
Contributor

@Malidos Malidos commented Oct 2, 2024

I added fma into all shader snippets I could identify, fulfilling #26.2, but since fma only accepts variables of the same type I had to replace some float constants with vectors.

fma combines multiplication and addition into a single operator which should double the speed of that specific operation, but since the Shader contains many other more complex operations, this resulted in an insignificant performance increase of about 1% which makes me doubt whether this should be merged at all, but that's up to you.

It should be compatible with the changes in #500 because it introduces a replacement macro for fma.

@TokisanGames
Copy link
Owner

Thank you. When making PRs you need to make a branch in your repository, then make a (Pull) Request to merge your branch into our main. Never your main to my main, and never add merge commits into your PRs. Never use git merge, nor "sync" on gh desktop. You can probably fix it by dropping the merge commit; I can probably as well.

Read and follow the Godot PR Workflow (and all our contributor docs).

At the least, I won't merge until 500 is merged, then we can rebase this and look at it then. I'll want to try it on my system and get more data points.

Why the changes from vec4s to mat4s?

@Xtarsia Thoughts?

@Xtarsia
Copy link
Contributor

Xtarsia commented Oct 2, 2024

any case where there are extra steps needed just to make fma() work, are usually not worth it. the same applies to nesting fma() instructions, which can prevent the compiler from being able to parallelize something it might have before.

the swap to mat4 and the following nested fma() are likley not gaining anything - and may even be slowing things down. But other areas like the rotation functions could well be faster.

it is handy to have this laid out with all currently possible places where fma() could be put.

@TokisanGames
Copy link
Owner

@Xtarsia Ok, I'll go through and take the simpler ones, especially in utility functions, and skip the nested fmas and conversions to mat4s and vec3s before release.

@TokisanGames TokisanGames self-assigned this Oct 9, 2024
@TokisanGames TokisanGames added this to the 0.9 milestone Oct 9, 2024
@TokisanGames TokisanGames added the shaders Shader development label Oct 9, 2024
Inserted as many fma functions as I could identify, which required changing the weights for the final texture adjustment to a matrix since fma only accepts variables of a the same type.
@TokisanGames TokisanGames merged commit 74533cc into TokisanGames:main Oct 21, 2024
12 checks passed
@TokisanGames
Copy link
Owner

After more research it seems that fma is one operation while a*b+c is two, so it is technically faster and may protect against precision loss. However, compilers are able to convert between the two so may already be optimizing our code. Also nesting should be no issue.

On my card, it also makes no difference; again perhaps the compiler is already using fma where it makes sense. We find much better gains with new algorithm improvements.

It was a worthy experiment, thanks for testing it out. I reverted the more complicated ones where clarity of code is preferred and kept a few in the utility calls.

Reminder: on future PRs make a branch not named main, and rebase when needed; never merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
shaders Shader development
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants