[Feature Request] Visual Regression Testing #1222
Comments
I would pay $50 a month for this feature.
Hey all! We're thinking about implementing this feature and we need to know what you want :) All feedback, opinions, and ideas are very much appreciated.

New command:
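A sketch of what such a command could add on top of what exists today: Maestro already has takeScreenshot, so a new visual-assertion step might pair a saved screenshot with a comparison against a stored baseline. The compareScreenshot name and the baseline path below are hypothetical, not an actual Maestro command.

```yaml
appId: com.example.app        # placeholder app id
---
- launchApp
# Existing command: saves the current screen as login_screen.png
- takeScreenshot: login_screen
# Hypothetical new command: fails the flow if the screen differs from a stored baseline
- compareScreenshot: baselines/login_screen.png
```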
Love these ideas. I would love to add some snapshot testing to our workflows - where to store/upload the baselines and what to compare against is an interesting problem though!

My other initial thoughts:
I wonder if converting the image to WebP or another format would help with the size issue.
Firstly: WOOO! This looks awesome!

Little stuff:

Bigger stuff:

On your questions:
@bartekpacia That's great news, thanks a lot for considering this! 🤩 We have an app with dynamic content, so partial screenshot comparison would be much appreciated. Maybe this could at some point even lead to screenshot selectors, e.g. "click on the book cover that looks like the image in this file". One question about the screenshot taken when the comparison with the golden fails: will it also include diff markers (e.g. pixels that differ are tinted light red)? That would be super useful for quickly spotting issues.
Hey, thanks a lot for all the feedback!

Optional failures

Platform in path

@Fishbowler could you explain what you want in more detail?

Partial screenshots

I'm hesitant toward that, as it'll quickly increase the complexity of your tests and make them more flaky and much less portable across different devices.

Diff markers

This is a very good idea (actually a necessary one to allow for a pleasant workflow).
Regarding visual regression tests: this is what the Chromatic user interface looks like: https://www.chromatic.com/videos/visual-test-hero.mp4

As you can see, there is also a toggle to display the visual differences more specifically, and there is a button to accept or reject the change. I think there will probably be no getting around a graphical user interface similar to Chromatic's. Or rather, I have no idea how it could be solved differently.

By the way: if this feature gets implemented and works as well as Chromatic, it would be a complete game changer for many people. It would bring Maestro to a whole new level.
Partial screenshots would be useful for isolating specific components.
Platform in path

I'd like to be able to check that the login screen on Android looks like it did before. I'd like to be able to check that the login screen on iOS looks like it did before. They're close, but not close enough - native components and whatnot. I don't want lots of conditional logic in my tests.

Partial screenshots

I'd like to be able to care about what I care about and not need to always set consistent data to make the screenshots match (like data that would come from an API). Hierarchy for a selector would give good coordinates that would likely remain mostly consistent for the same app running on the same device.
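For illustration, a minimal sketch of the "platform in path" idea, assuming the baseline location can be parameterized so the flow itself stays free of conditional logic. The assertVisual command and the baselines directory layout are hypothetical; only the env mechanism exists in Maestro today.

```yaml
appId: com.example.app            # placeholder app id
env:
  PLATFORM: android               # supplied per run via the CLI's env arguments
---
- launchApp
# Each platform compares against its own golden, with no branching in the flow itself.
- assertVisual: baselines/${PLATFORM}/login_screen   # hypothetical command and path layout
```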
Hey all,

Thanks a lot for all the valuable feedback. It means a lot to us and helps us build what you need. It's clear that this feature holds a lot of value. That said, there's an inherent problem with it: having to manually maintain the baseline. The larger the app, the more time consuming it'll get.

Proposal

TL;DR: We want to give you the advantages of visual assertions without the burden of manually maintaining baseline screenshots. Based on our experience building App Quality Copilot, we're quite confident that this can actually work and be useful. Here's how we envision it:

- tapOn: Get started
- assertVisualAI:
    assertion: "Assert that login button is visible with multi factor authentication of OTP"
    optional: <bool> # if `true`, it'll be a warning, not a failure
If you don't want the burden of maintaining baseline screenshots, but still want some assurance that your screens "look right", we want to make it possible (and easy). In particular:

We will also provide a way to improve AI responses by flagging false positives.

Maestro Studio integration

We'd like to surface the responses you get from AI in Maestro Studio, to make the experience smoother. Of course, at the same time we'd make sure it works equally well in CLI-only mode.

Model selection

We don't want to force any specific AI model. There'd be configuration for choosing one.

The future

We have many ideas around this. One of them is taking some existing model and fine-tuning it to perform even better for exactly this kind of task – quality assurance of mobile app UI.

Overall – what do you think? We'd love to get your thoughts on this.
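To make the proposal above concrete, here is a sketch of how the new command might sit in an ordinary flow, based on the snippet in that comment; the final name and fields may of course change.

```yaml
appId: com.example.app        # placeholder app id
---
- launchApp
- tapOn: Get started
# Proposed command from the comment above; exact syntax still subject to change.
- assertVisualAI:
    assertion: "Assert that the login button is visible"
    optional: true            # per the proposal: report a warning instead of failing the flow
```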
I think it's a cool option to have available, but right now I'd rather maintain my baseline and retain my deterministic tests. Caveat: I've not played with the App Quality Copilot at all.

Slightly O/T: my experience in testing AI systems (as opposed to using them to help me test) has given me a strong scepticism that generative models can be relied upon for consistency of output, which in my current context (healthcare) means I can't rely on it for testing evidence.
Something like this would be a great start from my perspective, and perhaps it could use deep links or universal links so you can just specify a bunch of paths and get screenshots: https://docs.fastlane.tools/img/getting-started/ios/htmlPagePreviewFade.jpg

More specific component testing can be added after. I would just like to look at my overall layout on a bunch of different simulators first.
For comparison and inspiration: we use Playwright for our web testing, and its screenshot and visual comparison features are pretty comprehensive (https://playwright.dev/docs/test-snapshots). When a diff is detected, it attaches the expected, actual, and diff images to its test report to make browsing the changes easy.

1. It allows taking screenshots of a component/element only, using any of its supported selectors (CSS, test ID, etc.), so you can capture an image of just the header, footer, navigation bar, center area, or a popup. This is useful for large organizations where each team typically owns and maintains only one section of the page.

2. It allows applying masks to certain areas/elements of the screen, like a date-time field, or anything that cannot be made static/consistent across test runs. The masked area is covered by a solid color block in the screenshot, so the image diff effectively ignores those masked areas.

One quirk we did have to deal with is how it treats new screenshots. Out of the box it always considers new screenshots as test failures, since there's no expected image to compare with, but it saves these new screenshots as expected images, so when you re-run the same tests they pass. This is probably okay for projects that are already stable and where visual changes rarely happen, but for fast-evolving projects, or projects with frequent new features, this isn't very friendly. Luckily it is written in TypeScript and, whether intended or not, it exposes quite a lot of its internals for us to tweak its behavior. We were able to tweak/hack it so that when a new screenshot has no expected screenshot in place, we catch the error thrown, mark the test status as "skipped" instead of "failed", and attach the new screenshot to the test report as well. In the report, all tests that generate new screenshots are then grouped as skipped, allowing us to browse them separately from the truly failed or passed tests. And we have automation following a PR merge that simply refreshes all screenshots, so these new screenshots become the expected ones in the next test run.

Hope this gives some perspective and inspiration, and that we can have something similar or even better for mobile/React Native.
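Translated into Maestro terms, the element-only and masking ideas described above might look something like the sketch below. The syntax is purely hypothetical and only meant to illustrate the Playwright-inspired behaviour; none of these fields exist in Maestro today.

```yaml
# Hypothetical syntax, for illustration only.
- assertVisual:
    baseline: home_screen
    element: "Navigation bar"     # compare just one component instead of the full screen
    mask:
      - id: "timestamp_label"     # dynamic areas get covered by a solid block before diffing
```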
Regarding the use of AI without needing expected images: I think it's valuable, but for very different use cases than pixel-by-pixel comparisons, and the two aren't mutually exclusive; there are cases where both are valuable. I can imagine that for small or in-development projects, such AI can be a convenient way to detect things that are obviously off and broken; but as the project grows and becomes more serious, I think eventually we will need something more definitive. I'm not sure AI will be smart enough to tell something like "this otherwise perfectly fine looking button should be green instead of red" - this is where we need image diffs with human-approved baseline images. On the other hand, AI can still supplement the image diff tests, since there's always a chance of human mistakes messing up the baseline images, and AI can be a second layer of checks to catch those.
We're currently working on setting up Maestro for our E2E testing.

Looking at the comment above about possible problems, a thought about the "accepting the change" piece: I think having the CI version be informational and not worry about being able to accept changes would be fine, especially for an early version of the command.

I see a draft PR exists to add it.
@bartekpacia - any updates on this? Would love to help, as I've got some time I can spend on this feature request.
Hi all, |
Is your feature request related to a problem? Please describe.
It always takes a lot of time to check whether certain components look different/wrong. Tools like Chromatic's visual regression testing help a lot with that: they take a screenshot of a component, and with every build a new screenshot is taken and compared against the old one.
Describe the solution you'd like
- assertVisual: xyz
- which creates a screenshot the first time it is used. The xyz is a keyword, so you can compare multiple screenshots and flows.
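A short sketch of how the proposed command might read in practice; the keyword names and baseline handling below are illustrative only, following the proposal above.

```yaml
appId: com.example.app          # placeholder app id
---
- launchApp
- assertVisual: login_screen    # first run: stores a screenshot under the "login_screen" keyword
- tapOn: Get started
- assertVisual: onboarding      # later runs: compares against the previously stored screenshot
```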