Recent advances in computer vision and optical character recognition (OCR), combined with a convolutional neural network that crops the product out of a picture, make it possible to reliably extract ingredient lists from the back of a product using the Anthropic API. Open-weight or on-device-only OCR does not yet reach the quality required for a production environment, although its development is progressing promisingly. The Anthropic API, in turn, is currently too expensive for production use, at 1 Swiss Franc per 100 pictures.
An inference example can be found on HuggingFace.
This is an entry for the 2024 Swiss AI competition.
Read the full report here.
GNU/Linux is required. Debian GNU/Linux 12 or later is recommended.
sudo apt install git
git clone https://github.com/lenamerkli/ingredient-scanner
cd ingredient-scanner
sudo apt install gcc
sudo apt install ffmpeg
Install the proprietary driver from NVIDIA.
Make sure that all PATH variables are set correctly.
Install the CUDA Toolkit, version 12.1.0 or later. Tested with version 12.5.0. Additional information can be found on the NVIDIA website.
Make sure that all PATH variables are set correctly.
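One quick way to confirm that the PATH is set up is to check whether the command-line tools shipped with the driver and the toolkit can be resolved. The following is a minimal sketch and not part of this repository: it assumes that `nvidia-smi` (installed with the driver) and `nvcc` (installed with the CUDA Toolkit) are the relevant executables, and the helper name `find_tools` is hypothetical. It runs with any Python 3 interpreter.

```python
import shutil

# Assumption: these are the executables to look for.
# nvidia-smi ships with the NVIDIA driver, nvcc with the CUDA Toolkit.
REQUIRED_TOOLS = ("nvidia-smi", "nvcc")


def find_tools(tools=REQUIRED_TOOLS):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, path in find_tools().items():
        print(f"{tool}: {path or 'NOT FOUND on PATH'}")
```

If a tool is reported as not found, the directory that contains it is probably missing from PATH.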
Install Python 3.11.9 or later, but below version 3.12:
sudo apt update
sudo apt install python3.11
sudo apt install python3.11-venv
python3.11 -m venv .venv
source .venv/bin/activate
pip3 install nvidia-pyindex
pip3 install -r requirements.txt
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip3 install "unsloth[cu121-ampere-torch230] @ git+https://github.com/unslothai/unsloth.git"
sudo apt install python3-tk
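After these steps, it can help to verify the environment from inside the activated virtual environment. The sketch below is an illustration and not part of this repository: it checks the interpreter against the version range stated above and asks PyTorch whether a CUDA device is visible. The helper names `is_supported_python` and `cuda_status` are hypothetical.

```python
import sys


def is_supported_python(version=None):
    """Return True for Python 3.11.9 or later, but below 3.12."""
    major, minor, micro = (version or sys.version_info)[:3]
    return (3, 11, 9) <= (major, minor, micro) < (3, 12, 0)


def cuda_status():
    """Describe whether PyTorch is installed and can see a CUDA device."""
    try:
        import torch  # installed by the pip3 commands above
    except ImportError:
        return "PyTorch is not installed in this environment"
    if torch.cuda.is_available():
        return f"CUDA available: {torch.cuda.get_device_name(0)}"
    return "PyTorch is installed, but no CUDA device is visible"


if __name__ == "__main__":
    print("Python version supported:", is_supported_python())
    print(cuda_status())
```

If no CUDA device is visible, the driver, the CUDA Toolkit, and the PATH settings from the earlier steps are the first things to re-check.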
cd data/full_images
python3 video_to_frames.py
Create a file named .env with the following content:
OPENAI_API_KEY=YOUR_API_KEY
Replace YOUR_API_KEY with your API key, which you can find in your Anthropic Console.
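A common mistake is to leave the placeholder in the file. The following minimal sketch, which is not part of this repository, parses simple KEY=VALUE lines from a .env file in the current directory and reports any key that is empty or still set to the placeholder. How the project itself loads the file is not shown here, and the helper names `read_env_file` and `placeholder_keys` are hypothetical.

```python
from pathlib import Path


def read_env_file(path=".env"):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def placeholder_keys(values, placeholder="YOUR_API_KEY"):
    """Return the keys whose value is empty or still the placeholder."""
    return [key for key, value in values.items() if value in ("", placeholder)]


if __name__ == "__main__":
    env_path = Path(".env")
    if not env_path.exists():
        print(".env not found in the current directory")
    else:
        unset = placeholder_keys(read_env_file(env_path))
        print("Keys still unset:", unset if unset else "none")
```

The script only reads the file and never prints the key itself, so its output is safe to share when asking for help.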
cd data/full_images
python3 generate_synthetic.py
python3 train.py
cd data/ingredients
python3 generate_synthetic.py
python3 train_llm.py
python3 build.py
A working example with trained models can be found on HuggingFace.
Here is how to cite this paper in BibTeX format:
@misc{merkli2024ingriedient-scanner,
title={Ingredient Scanner: Automating Reading of Ingredient Labels with Computer Vision},
author={Lena Merkli and Sonja Merkli},
date={2024-07-16},
url={https://huggingface.co/lenamerkli/ingredient-scanner},
}