Goal: Create a tool that uses a multimodal LLM to describe testing instructions for any digital product's features, based on screenshots.

What You Need to Build
Front-end: A simple web page with the following inputs: a text box for optional context, a multi-image uploader for screenshots (required), and a button labeled 'Describe Testing Instructions'.
Back-end: Integrate a multimodal LLM to process the screenshots and the optional text context. The output should be a detailed, step-by-step guide on how to test each functionality.
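
One possible shape for the back-end step is sketched below, as an illustration rather than a required design. It enforces the input rules above (at least one screenshot, context optional), then packs the prompt and the images into a provider-neutral request. The names PROMPT, build_request, describe_testing_instructions, and call_model are all hypothetical, and so is the prompt wording. The brief does not name an LLM provider, so the actual model call is passed in as a function, and encoding images as base64 data URLs is an assumption about what that provider accepts.

```python
import base64
from typing import Callable, List, Optional

# Hypothetical instruction text; tune it for the model you integrate.
PROMPT = (
    "You are a QA engineer. For each feature visible in the attached "
    "screenshots, write a detailed, step-by-step guide on how to test it."
)


def build_request(
    screenshots: List[bytes],
    context: Optional[str] = None,
    mime_type: str = "image/png",
) -> dict:
    """Assemble a provider-neutral request from the form inputs."""
    # Screenshots are required; the text context is optional.
    if not screenshots:
        raise ValueError("at least one screenshot is required")

    text = PROMPT
    if context and context.strip():
        text += "\n\nAdditional context from the user:\n" + context.strip()

    # Assumption: the chosen model accepts images as base64 data URLs.
    images = [
        "data:" + mime_type + ";base64," + base64.b64encode(img).decode("ascii")
        for img in screenshots
    ]
    return {"text": text, "images": images}


def describe_testing_instructions(
    screenshots: List[bytes],
    context: Optional[str],
    call_model: Callable[[dict], str],
) -> str:
    """Build the request and hand it to the injected model client."""
    return call_model(build_request(screenshots, context))
```

Keeping the model call behind call_model means the request-building logic can be exercised with a stub before any provider is wired in, and swapping providers only touches that one function.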