Add `FromLinear` and `IntoLinear` lookup table creation to build script #416

Kirk-Fox · 2024-08-19T06:24:05Z

This PR adds the creation of float ↔ uint conversion lookup tables for FromLinear and IntoLinear to the crate's build script. As a result, this crate no longer has a dependency on fast-srgb8 (I have confirmed that the lookup table used by fast-srgb8 is identical to the one generated by the build script).

Along with this, I have added the features: float_lut (for u8 to float conversions), float_lut16 (for u8to float and u16 to float conversions), fast_uint_lut (for fast f32 to u8 conversions), and fast_uint_lut16 (for fast f32 to u8 and f32 to u16 conversions). Of these, I have added float_lut and fast_uint_lut16 as the default features. float_lut must be default to not cause a breaking change for Srgb and RecOetf, whose lookup tables were replaced by the build script. I included fast_uint_lut16 as a default feature since its largest generated table contains only 1152 u64s, although I would understand wanting to replace it with fast_uint_lut as the default since that only generates tables of less than 200 u32s.

Building the crate seems to take considerably longer now due to the new code. I added some statements to the main.rs file in the build script that might prevent the crate being built unnecessarily often (although I doubt it since every time I run a test it seems to rerun the build script even though I haven't changed anything). If there are improvements to be made, please let me know and I'll implement them.

Also, does removing a dependency qualify as a breaking change? If so, I can add back fast-srgb8 so that it can be removed at the next major update.

codspeed-hq · 2024-08-19T06:36:26Z

CodSpeed Performance Report

Merging #416 will degrade performances by 24.6%

_{Comparing Kirk-Fox:lookup (544e40f) with master (234309c)}

Summary

❌ 2 regressions
✅ 45 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

	Benchmark	`master`	`Kirk-Fox:lookup`	Change
❌	`matrix_inverse`	352.5 ns	467.5 ns	-24.6%
❌	`linsrgb_f64 to rgb_u8`	4.7 µs	5.3 µs	-11.49%

Kirk-Fox · 2024-08-19T08:13:38Z

I'm not sure I understand what's causing the tests to fail at this point.

Ogeon · 2024-08-19T09:31:07Z

Interesting. The longer build times are worrying, but there may be optimization opportunities. I'm still having that cold I mentioned, so I'll have a proper look when I'm feeling better. A few quick things for now:

I suppose it doesn't make sense to depend on fast-srgb8 anymore and that's not breaking, but its replacement has to be available unconditionally. It wasn't optional before.
The interpolation code doesn't need to be completely generated if it's largely the same every time.
The tests fail because IntoLinear is included unconditionally, but unused when some features are disabled. Expand the last command here.
For the cargo features in general, I would probably slice it the other way and have one (or two) per gamma function. That reduces the code size impact for each feature.

Kirk-Fox · 2024-08-19T15:17:20Z

Alright, thanks for the direction. I'll fix the issue with IntoLinear and FromLinear being included unconditionally when I get the chance. How should I handle the interpolation code instead? I think it is useful to have this general purpose algorithm somewhere in the codebase, so how might I reconcile that with not fully running the interpolation code each time?

Kirk-Fox · 2024-08-19T22:22:02Z

I changed around the features as you mentioned and I think I was also able to fix the issue with the build script running more often than it needs to. It should now, hopefully, only rerun if any of the files in the build folder are changed or if any features are changed. I also made it so that the code for Srgb isn't conditional so that it doesn't constitute a breaking change by removing fast-srgb8.

Currently, the conditional compilation for the 16-bit lookup table methods/structs are controlled by repeated uses of #[cfg(feature = "prophoto_lut")]. Ideally, I would have written it as #[cfg(any(feature = "prophoto_lut"))], so that it's clear this should be compiled if any 16-bit lookup table feature is enabled (if more are added later), but this leads to a warning for redundancy. Is there a more idiomatic way of doing that?

Additionally, even though the build script runs less often now, it is still quite slow and seems to drastically increase the time the PR tests take, so, as mentioned before, do you have a suggestion on how I can reconcile including the float to uint lookup table generation code somewhere in the codebase while not fully rerunning it each time?

Kirk-Fox · 2024-08-20T00:23:02Z

I may be starting to realize a way to optimize the build script, but it's going to take some further research. I'll hopefully be pushing a new commit within the week that will at least somewhat improve the build time.

palette/build/lut.rs

Kirk-Fox · 2024-08-22T23:13:34Z

I went ahead and optimized the table building to use integration instead of summation since these are all exponential or linear functions. I also removed the tests for errors and just put in the values for the number of mantissa bits used in the index since they do not seem to change (3 for u8s and 7 for u16s).

Kirk-Fox · 2024-08-23T00:34:39Z

Would it also be a good idea to generate the u8 to f32 tables separately? I'm not sure how much of a performance cost it is to convert from f64 to f32

Ogeon

Alright, I'm feeling much better now and the workweek is over. Thanks for your patience in the meantime. I have also had some time to think about this and the slower build times from the code generation makes me think we should move away from build.rs for this. Considering none of this really depends on anything other than cargo features, I think it would be possible to generate the code once and making sure the larger tables are opt-in.

I'll try some things with a separate codegen crate...

It's also going to take me a moment to digest what everything does here. Some of the generated code becomes a bit hard to follow. Bear with me.

Would it also be a good idea to generate the u8 to f32 tables separately? I'm not sure how much of a performance cost it is to convert from f64 to f32

You can try and see if it affects the benchmarks. Something seems to have made it a bit slower.

palette/build/lut.rs

Ogeon · 2024-08-24T16:19:50Z

I'm porting over the named colors in #417. I have also modernized it and made use of quote for nicer syntax.

Kirk-Fox · 2024-08-24T22:31:17Z

I'll start working on consolidating those into singular library functions and moving things to the codegen crate. I'll also try to document the code a bit more so it's clearer what's going on

Ogeon · 2024-08-25T09:04:28Z

Thank you and sorry for the extra work. I do think this direction will be better. Now when the cost of generating the code is paid for in advance, we could change the cargo feature setup to something simpler (sorry again). I would suggest something like this:

u8 to/from float - always included.
u16 to/from float - behind an opt-in gamma_lut_u16 feature.

Simple as that. This avoids inflating the test set too much and relies on the compiler's ability to disregard constants it doesn't use. The feature is more there so it's possible to keep the binary size slim. A u16 to f32 table is 256 kilobytes, while a u8 to f32 table is "only" one kilobyte, so it matters more if a u16 table is included by accident. I hope that makes sense...

As for documentation, anything that explains the reasoning (especially where it diverges from the original) will help future us or other people who haven't been part of the process. Like with other things, I don't want this to turn into a black box later. I would also like to have references to the original implementation(s). That's key for tracing them back to the original reasoning and having something to compare to if there refactoring is required or if there are any issues. I know it's perhaps not the most fun thing to do, but it's even less fun to not have them later. So thanks in advance for taking some time to explain it. 🙏

A side note with the new codegen crate; let me know if you run into any rough edges. One thing I have noticed with quote is that it's usually best to keep each invocation relatively small. Rust-analyzer and/or clippy seemed to struggle a bit with very large blocks. It's still generally better than strings IMO.

Ogeon

Nice, thank you! This is already easier to follow. The split with generated tables and handwritten functions seems to work well too. I will go into more details later.

palette/src/encoding/lut.rs

palette/src/encoding/srgb.rs

Ogeon

I hope all is well and sorry for the absence. I think I lost more momentum than expected when I got sick, but it's better now after some time away from programming related things during my free time.

I have added a few comments where I would like to have some documentation. I think this will be good to go once them and the other comments have been resolved. How does that sound?

Ogeon · 2024-09-14T11:58:57Z

codegen/src/lut.rs

+        let (linear_scale, alpha, beta) =
+            if let Some((linear_scale, linear_end)) = is_linear_as_until {
+                (
+                    Some(linear_scale),
+                    (linear_scale * linear_end - 1.0) / (linear_end.powf(gamma.recip()) - 1.0),
+                    linear_end,
+                )
+            } else {
+                (None, 1.0, 0.0)
+            };


A comment that describes what this calculates and how it relates to the transfer function would be nice. The u16 version can refer to this.

codegen/src/lut.rs

palette/src/encoding/lut.rs

Kirk-Fox · 2024-09-23T01:43:28Z

Sorry that I've been less active. The school semester started and is kicking my butt. Thank you for the extra comments and I'll try to address them soon

Ogeon · 2024-09-23T16:28:43Z

No worries, I know how it can be. 😬 Thanks for the update, and let me know if it turns out to be too stressful to find time. Good luck with the studies and don't forget to sleep!

Ogeon · 2024-10-27T11:04:33Z

Hi, how's it going? I just wanted to check in and also let you know that there are some fixes in master that you will need to (preferrably) rebase in if you get back to this. Clippy will complain about unrelated lifetime annotations otherwise.

How does it look regarding getting a moment for this PR? Would it help if we scope it down a bit? The most important change from my perspective is to have the rationale for the algorithm changes in your words. The rest are things I can take care of.

Kirk-Fox · 2024-10-28T18:26:39Z

Yeah, scoping it down would help. I also am trying to type up a short document describing how I arrived at the formulae that I did, since it's a bit too in-depth to be written in comments in the code. I'll still include a basic explanation of what's going on in the documentation, though. Does that sound alright?

Ogeon · 2024-10-28T19:19:33Z

That sounds great, thank you! I don't want you to feel like you are stuck with this, so the basic explanation could be good enough on its own if you want to keep it short. The point is to answer why this change was made (performance?) and what makes it equivalent to the original (referring to or showing an equation, algorithm, etc.). That usually works as reference for refactoring and bug fixing. Details for extra clarity are still appreciated, but don't sweat it.

When you feel like this is done, I would be grateful if you could also squash the commits into a single commit, like before. That should be it unless you want to address any of the other comments.

Ogeon · 2024-12-22T08:56:26Z

Hello again! I hope things are going well. I just want to check in and see if there's a chance that you will have time to at least squash and rebase this change anytime soon. Skip the documentation if it's too much. That can always be improved later if you happen to have time.

I would love to have it in the next release, which I'll put together sometime in the beginning of the next year. I may look into options to merge this anyway, if you didn't have time for it before, while of course preserving its connection to you for proper credits. Only to move it forwards. No hard feelings involved. That's not happening during the coming few weeks, though.

Kirk-Fox · 2025-01-07T02:24:06Z

Okay, I did a lot to hopefully better organize the code and make it more readable/understandable. I still hope to make a full write-up at some point in the future, but it's pretty involved. I didn't quite understand what you meant regarding hand-writing certain functions, so I didn't change anything on that front.

I also added the u8 to f32 tables as you suggested. I think something might be wrong with the benchmarking tool, so I hope to see how that's affected things after that's back up and running. If you decide we should also include u16 to f32 tables, let me know and I will add them.

Let me know if you have any other comments. Once it looks good I can squash the commits into one again.

Ogeon · 2025-01-07T12:53:03Z

Thank you, this is great! I should upgrade the benchmark action, like you noticed... I'll let you know when it should work again.

I still hope to make a full write-up at some point in the future, but it's pretty involved.

I'm absolutely satisfied with the current comments, but feel free to add to them later if you want to and have the time. This is voluntary work, after all, and things like your studies are more important.

I didn't quite understand what you meant regarding hand-writing certain functions, so I didn't change anything on that front.

It's fine, I just think they would be better to have in the "regular" source files, since they don't contain any generated values (other than the constant names), and only have the table constants in the generated code. Just to keep the types and their implementations together when possible. It's the kind of nit picks I can take care of later, so feel free to skip it.

If you decide we should also include u16 to f32 tables, let me know and I will add them.

We can hold off on them for now, since it's not a regression if they are a bit slower. I would like to check how they affect the package size before deciding. I realized that large tables may eventually become an issue with the max package size.

Ogeon · 2025-01-07T14:08:51Z

The benchmark should work now, after rebasing to bring in the latest test config. 🙂

codecov · 2025-01-07T18:07:46Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 83.11%. Comparing base (234309c) to head (b0aec2f).
Report is 2 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #416      +/-   ##
==========================================
+ Coverage   82.63%   83.11%   +0.47%     
==========================================
  Files         131      133       +2     
  Lines       20485    20622     +137     
  Branches    20485    20622     +137     
==========================================
+ Hits        16927    17139     +212     
+ Misses       3378     3343      -35     
+ Partials      180      140      -40

Components	Coverage Δ
palette	`83.18% <100.00%> (+0.50%)`	⬆️
palette_derive	`81.98% <ø> (ø)`

Ogeon · 2025-01-12T13:50:40Z

Thank you and sorry for not noticing that you had already squashed it before. Github isn't always super clear in that regard.

I had a look at the benchmark and it seems like the regression for f64 to u8 is pretty consistent. I'm not sure what's causing it, but I'll accept it as a future improvement opportunity. There's also a regression for f32 that's just below the threshold for appearing in the table: https://codspeed.io/Ogeon/palette/branches/Kirk-Fox%3Alookup. It could be that some slight difference prevents an optimization from kicking in. It's sometimes hard to know with tiny benchmarks like these.

But let's wrap this up. Thank you so much for your time and help! ⭐

okaneco reviewed Aug 20, 2024

View reviewed changes

palette/build/lut.rs Outdated Show resolved Hide resolved

Kirk-Fox marked this pull request as draft August 22, 2024 20:18

Kirk-Fox marked this pull request as ready for review August 22, 2024 23:13

Ogeon reviewed Aug 24, 2024

View reviewed changes

palette/build/lut.rs Outdated Show resolved Hide resolved

palette/build/lut.rs Outdated Show resolved Hide resolved

Ogeon mentioned this pull request Aug 24, 2024

Replace the build script with a codegent crate #417

Merged

Kirk-Fox force-pushed the lookup branch from e7a6c95 to a2e8ae7 Compare August 26, 2024 03:02

Kirk-Fox marked this pull request as draft August 26, 2024 03:23

Ogeon reviewed Aug 26, 2024

View reviewed changes

palette/src/encoding/lut.rs Outdated Show resolved Hide resolved

palette/src/encoding/srgb.rs Outdated Show resolved Hide resolved

Ogeon reviewed Sep 14, 2024

View reviewed changes

Kirk-Fox force-pushed the lookup branch from 548aad4 to b0aec2f Compare January 7, 2025 17:59

Add FromLinear and IntoLinear lookup table creation to build script

544e40f

Kirk-Fox force-pushed the lookup branch from 2279c6e to 544e40f Compare January 7, 2025 20:43

Kirk-Fox marked this pull request as ready for review January 7, 2025 20:45

Ogeon merged commit 8c19ae9 into Ogeon:master Jan 12, 2025
18 of 19 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add `FromLinear` and `IntoLinear` lookup table creation to build script #416

Add `FromLinear` and `IntoLinear` lookup table creation to build script #416

Kirk-Fox commented Aug 19, 2024

codspeed-hq bot commented Aug 19, 2024 •

edited

Loading

Kirk-Fox commented Aug 19, 2024

Ogeon commented Aug 19, 2024 •

edited

Loading

Kirk-Fox commented Aug 19, 2024

Kirk-Fox commented Aug 19, 2024

Kirk-Fox commented Aug 20, 2024

Kirk-Fox commented Aug 22, 2024

Kirk-Fox commented Aug 23, 2024

Ogeon left a comment

Ogeon commented Aug 24, 2024

Kirk-Fox commented Aug 24, 2024

Ogeon commented Aug 25, 2024

Ogeon left a comment

Ogeon left a comment

Ogeon Sep 14, 2024

Kirk-Fox commented Sep 23, 2024

Ogeon commented Sep 23, 2024

Ogeon commented Oct 27, 2024

Kirk-Fox commented Oct 28, 2024

Ogeon commented Oct 28, 2024

Ogeon commented Dec 22, 2024 •

edited

Loading

Kirk-Fox commented Jan 7, 2025

Ogeon commented Jan 7, 2025 •

edited

Loading

Ogeon commented Jan 7, 2025

codecov bot commented Jan 7, 2025

Ogeon commented Jan 12, 2025

Add FromLinear and IntoLinear lookup table creation to build script #416

Add FromLinear and IntoLinear lookup table creation to build script #416

Conversation

Kirk-Fox commented Aug 19, 2024

codspeed-hq bot commented Aug 19, 2024 • edited Loading

CodSpeed Performance Report

Merging #416 will degrade performances by 24.6%

Summary

Benchmarks breakdown

Kirk-Fox commented Aug 19, 2024

Ogeon commented Aug 19, 2024 • edited Loading

Kirk-Fox commented Aug 19, 2024

Kirk-Fox commented Aug 19, 2024

Kirk-Fox commented Aug 20, 2024

Kirk-Fox commented Aug 22, 2024

Kirk-Fox commented Aug 23, 2024

Ogeon left a comment

Choose a reason for hiding this comment

Ogeon commented Aug 24, 2024

Kirk-Fox commented Aug 24, 2024

Ogeon commented Aug 25, 2024

Ogeon left a comment

Choose a reason for hiding this comment

Ogeon left a comment

Choose a reason for hiding this comment

Ogeon Sep 14, 2024

Choose a reason for hiding this comment

Kirk-Fox commented Sep 23, 2024

Ogeon commented Sep 23, 2024

Ogeon commented Oct 27, 2024

Kirk-Fox commented Oct 28, 2024

Ogeon commented Oct 28, 2024

Ogeon commented Dec 22, 2024 • edited Loading

Kirk-Fox commented Jan 7, 2025

Ogeon commented Jan 7, 2025 • edited Loading

Ogeon commented Jan 7, 2025

codecov bot commented Jan 7, 2025

Codecov Report

Ogeon commented Jan 12, 2025

Add `FromLinear` and `IntoLinear` lookup table creation to build script #416

Add `FromLinear` and `IntoLinear` lookup table creation to build script #416

codspeed-hq bot commented Aug 19, 2024 •

edited

Loading

Ogeon commented Aug 19, 2024 •

edited

Loading

Ogeon commented Dec 22, 2024 •

edited

Loading

Ogeon commented Jan 7, 2025 •

edited

Loading