Approx tests #130
base: main
Conversation
@DanaCase Hi, glad you enjoy the project! You raise a valid concern. I do not remember anyone having reported different test results on different hardware, but if this is the case then we should fix it, maybe with your suggested approach. Out of curiosity, which MLX version are you using (should be

For reference, I have tested the latest main on my M1 Pro and M2 Max running two versions of macOS (Sonoma and Sequoia) with identical test results.

One thing I have noted earlier, but not bothered to fix yet, is that since the image generation tests save a temporary file to disk, it can sometimes fail to be removed properly between tests, causing the test code to read the wrong image (this is because the existing

Regarding pixel-level specific tests, I can think of some pros/cons with this approach:

Pros:

Cons:
The most important thing we want to capture with the tests is the perceptual similarity to the reference images, and if you get test failures on a clean install of the project (i.e. with
I have fixed this unintended behaviour for the tests in the latest
Thanks for the response and the update to main. I am running MLX 0.23.1 and tried running the tests on a clean install (including clearing my Hugging Face cache), but am still seeing slight differences in the output images. It makes sense to aim for pixel-perfect comparisons when possible. A conditional test might be useful: allowing a perfect match for testing changes during local refactoring, or other cases where pixel precision is expected, but not requiring it for a pre-commit hook.
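To sketch what I mean by a conditional test (the helper and environment-variable name here are hypothetical, not from the project's actual test suite):

```python
import os

import numpy as np
from PIL import Image

# Hypothetical switch: STRICT_IMAGE_TESTS=1 demands pixel-perfect output
# (useful when refactoring locally and no numeric change is expected);
# otherwise a small per-pixel tolerance absorbs hardware-dependent
# floating-point differences.
STRICT = os.environ.get("STRICT_IMAGE_TESTS", "0") == "1"


def assert_images_match(generated_path, reference_path, atol=2):
    # int16 so the subtraction below cannot wrap around like uint8 would.
    a = np.asarray(Image.open(generated_path), dtype=np.int16)
    b = np.asarray(Image.open(reference_path), dtype=np.int16)
    assert a.shape == b.shape, "image dimensions differ"
    if STRICT:
        # Pixel-perfect: every channel of every pixel must be identical.
        assert np.array_equal(a, b), "images differ at the pixel level"
    else:
        # Tolerant: bound the worst-case per-pixel deviation.
        max_diff = int(np.abs(a - b).max())
        assert max_diff <= atol, f"max per-pixel difference {max_diff} > {atol}"
```

A pre-commit hook would run with the default (tolerant) mode, while a local refactoring session could export `STRICT_IMAGE_TESTS=1` to catch any numeric change at all.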
Hi there,
I love your project! It has been a great tool for me to train LoRAs locally on my Mac. I was playing around and tried running the tests locally.
I noticed that the image generation tests failed on a pixel-by-pixel comparison, but the images looked perceptually identical.
For example:


Is there a reason to expect outputs to be pixel-identical on different setups? It makes sense to me that there would be some small differences from floating-point arithmetic, etc.
For reference, I'm using an M4 Max.
Anyway, I made an MSE-based comparison that is still pretty strict but allows for some small differences in output.
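Roughly the idea (a sketch, not the exact diff in this PR; the function names and the threshold value are placeholders):

```python
import numpy as np
from PIL import Image


def image_mse(path_a, path_b):
    """Mean squared error between two images of the same size."""
    a = np.asarray(Image.open(path_a), dtype=np.float64)
    b = np.asarray(Image.open(path_b), dtype=np.float64)
    assert a.shape == b.shape, "images must have identical dimensions"
    return float(np.mean((a - b) ** 2))


def assert_perceptually_close(generated_path, reference_path, threshold=1.0):
    # Identical images give an MSE of exactly 0.0; a handful of pixels
    # that are off by one or two barely moves the metric, while visible
    # artifacts push it far above any reasonable threshold.
    mse = image_mse(generated_path, reference_path)
    assert mse < threshold, f"MSE {mse:.4f} >= {threshold}"
```

This stays strict in practice (a visibly different image fails by a wide margin) while tolerating the tiny per-pixel drift I'm seeing across hardware.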
What do you think?