Ingredient Scanner

Abstract

With recent advances in computer vision and optical character recognition (OCR), it has become possible to reliably extract ingredient lists from the back of a product. A convolutional neural network cuts the product out of the picture, and the Anthropic API then reads the ingredient list. Open-weight OCR, and on-device OCR in particular, does not yet reach the quality required for a production environment, although progress is promising. The Anthropic API is currently not feasible either, because of its high cost of 1 Swiss franc per 100 pictures.
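To make the quoted price concrete, here is a small cost estimate. It assumes the figure of 1 Swiss franc (CHF) per 100 pictures stays constant, which works out to 0.01 CHF per picture. The helper name `estimate_api_cost_chf` is hypothetical and not part of the repository.

```python
# Hypothetical helper illustrating the quoted price: 1 CHF per 100 pictures.
CHF_PER_100_PICTURES = 1.0  # figure from the abstract; actual API pricing may change


def estimate_api_cost_chf(n_pictures: int, chf_per_100: float = CHF_PER_100_PICTURES) -> float:
    """Return the estimated API cost in CHF for n_pictures scanned labels."""
    if n_pictures < 0:
        raise ValueError("n_pictures must be non-negative")
    return n_pictures * chf_per_100 / 100


if __name__ == "__main__":
    # Illustrative scenario: 20 products scanned per day for one year
    print(f"{estimate_api_cost_chf(20 * 365):.2f} CHF")  # prints "73.00 CHF"
```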

An inference example can be found on HuggingFace.

This is an entry for the 2024 Swiss AI competition.

Report

Read the full report here.

Installation

GNU/Linux is required. Debian GNU/Linux 12 or later is recommended.

Git

Installation

sudo apt install git

Cloning

git clone https://github.com/lenamerkli/ingredient-scanner
cd ingredient-scanner

GNU C Compiler

sudo apt install gcc

FFmpeg

sudo apt install ffmpeg

NVIDIA driver

Install the proprietary driver from NVIDIA.

Make sure that all PATH variables are set correctly.
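One way to check that the driver's command-line tools are reachable is to look them up on PATH with `shutil.which`. This sketch assumes that `nvidia-smi` is installed alongside the proprietary driver, as it usually is. The function name `missing_tools` is hypothetical.

```python
import shutil
from typing import Iterable, List, Optional


def missing_tools(tools: Iterable[str], path: Optional[str] = None) -> List[str]:
    """Return the tools that shutil.which cannot find.

    path=None searches the current PATH environment variable.
    """
    return [tool for tool in tools if shutil.which(tool, path=path) is None]


if __name__ == "__main__":
    missing = missing_tools(["nvidia-smi"])
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("NVIDIA driver tools found on PATH")
```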

CUDA

Install the CUDA Toolkit, version 12.1.0 or later. Tested with version 12.5.0. Additional information can be found on the NVIDIA website.

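Make sure that the CUDA Toolkit's bin directory is on PATH and its lib64 directory is on LD_LIBRARY_PATH, so that nvcc and the CUDA libraries can be found.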
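The sketch below checks the "12.1.0 or later" requirement. It runs `nvcc --version` from PATH and parses the `release X.Y` part of its output (for example `Cuda compilation tools, release 12.5, V12.5.40`). It assumes that output format and only compares major and minor versions. The function names are hypothetical.

```python
import re
import subprocess
from typing import Optional, Tuple

MIN_CUDA = (12, 1)  # CUDA Toolkit 12.1.0 or later is required


def parse_nvcc_release(output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from `nvcc --version` output, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def cuda_toolkit_ok() -> bool:
    """Run nvcc from PATH and check that the toolkit is at least MIN_CUDA."""
    try:
        result = subprocess.run(["nvcc", "--version"], capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return False  # nvcc is not on PATH or could not run
    version = parse_nvcc_release(result.stdout)
    return version is not None and version >= MIN_CUDA


if __name__ == "__main__":
    print("CUDA Toolkit OK" if cuda_toolkit_ok() else "CUDA Toolkit missing or older than 12.1")
```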

Python

Install Python 3.11.9 or later, below version 3.12:

sudo apt update
sudo apt install python3.11
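The supported range is the half-open interval from 3.11.9 up to, but not including, 3.12. Python compares tuples element by element, so the range can be checked directly against `sys.version_info`. The helper below is a hypothetical sketch, not a script shipped with the repository.

```python
import sys
from typing import Tuple

MIN_VERSION = (3, 11, 9)  # inclusive lower bound
MAX_VERSION = (3, 12)     # exclusive upper bound: 3.12.x is not supported


def is_supported_python(version: Tuple) -> bool:
    """Return True if the first three components of version lie in [3.11.9, 3.12)."""
    return MIN_VERSION <= tuple(version[:3]) < MAX_VERSION


if __name__ == "__main__":
    current = sys.version.split()[0]
    print(current, "supported" if is_supported_python(sys.version_info) else "unsupported")
```

Run it with the interpreter you intend to use, for example `python3.11 check_python.py`, where the file name is only an example.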

Virtual environment

sudo apt install python3.11-venv
python3.11 -m venv .venv
source .venv/bin/activate
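After activation, you can confirm that Python is running inside the virtual environment. A venv sets `sys.prefix` to the environment directory, while `sys.base_prefix` still points at the base installation. The function name below is hypothetical.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when running inside a venv created with `python -m venv`."""
    # venv changes sys.prefix but leaves sys.base_prefix at the base interpreter
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("inside virtual environment" if in_virtualenv() else "NOT inside a virtual environment")
```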

Libraries

pip3 install nvidia-pyindex
pip3 install -r requirements.txt
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip3 install "unsloth[cu121-ampere-torch230] @ git+https://github.com/unslothai/unsloth.git"
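The `cu121-ampere-torch230` extra in the unsloth command appears to target CUDA 12.1, PyTorch 2.3.0, and Ampere-or-newer GPUs (CUDA compute capability 8.0 and above). That reading of the extra's name is an assumption. The sketch below uses `torch.cuda.get_device_capability` to report the GPU's compute capability so you can compare it against that assumption. The helper `is_ampere_or_newer` is hypothetical.

```python
from typing import Tuple


def is_ampere_or_newer(capability: Tuple[int, int]) -> bool:
    """Ampere GPUs report compute capability 8.x; newer architectures report higher values."""
    return tuple(capability) >= (8, 0)


if __name__ == "__main__":
    try:
        import torch  # installed by the pip3 commands above
    except ImportError:
        raise SystemExit("PyTorch is not installed")
    if not torch.cuda.is_available():
        raise SystemExit("CUDA is not available to PyTorch")
    cap = torch.cuda.get_device_capability(0)  # (major, minor) of GPU 0
    verdict = "Ampere or newer" if is_ampere_or_newer(cap) else "older than Ampere"
    print(torch.cuda.get_device_name(0), cap, verdict)
```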

tkinter

sudo apt install python3-tk

Convert videos to frames

cd data/full_images
python3 video_to_frames.py
cd ../..
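The repository's `video_to_frames.py` is not reproduced here. The following is only a hypothetical illustration of extracting frames with FFmpeg, which is installed above. It builds an `ffmpeg` command that uses the `fps` video filter to write a fixed number of frames per second as numbered PNG files. The function names, the output pattern, and the default rate are assumptions.

```python
import os
import subprocess
from typing import List


def frame_extraction_command(video_path: str, out_dir: str, fps: float = 1.0) -> List[str]:
    """Build an ffmpeg command that saves `fps` frames per second as numbered PNGs."""
    stem = os.path.splitext(os.path.basename(video_path))[0]
    pattern = os.path.join(out_dir, f"{stem}_%04d.png")  # ffmpeg expands %04d to 0001, 0002, ...
    return ["ffmpeg", "-i", video_path, "-vf", f"fps={fps:g}", pattern]


def extract_frames(video_path: str, out_dir: str, fps: float = 1.0) -> None:
    """Create out_dir if needed and run ffmpeg; raises CalledProcessError on failure."""
    os.makedirs(out_dir, exist_ok=True)
    subprocess.run(frame_extraction_command(video_path, out_dir, fps), check=True)
```

For example, `extract_frames("example.mp4", "frames", fps=2)` would write `frames/example_0001.png` and so on, where both paths are placeholders.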

Optional: Anthropic API

Create a file named .env with the following content:

OPENAI_API_KEY=YOUR_API_KEY

Replace YOUR_API_KEY with your API key, which you can find in your Anthropic Console.
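How the repository itself loads `.env` is not shown here. As a minimal, dependency-free sketch of the `KEY=VALUE` format, the parser below skips blank lines and `#` comments and strips optional surrounding quotes. The names `parse_env` and `load_env` are hypothetical.

```python
from pathlib import Path
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines; blank lines, '#' comments and lines without '=' are skipped."""
    env: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)  # split only on the first '='
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def load_env(path: str = ".env") -> Dict[str, str]:
    """Read and parse a .env file (UTF-8)."""
    return parse_env(Path(path).read_text(encoding="utf-8"))
```

Usage: `api_key = load_env().get("OPENAI_API_KEY")`, using the variable name given above.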

Usage

Generate synthetic images

cd data/full_images
python3 generate_synthetic.py

Train the model

cd ../..
python3 train.py

Generate synthetic text

cd data/ingredients
python3 generate_synthetic.py

Train the large language model

cd ../..
python3 train_llm.py

Build the inference project

python3 build.py
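The usage steps can also be chained from a single script, with each script run from its own working directory. This is a hypothetical driver, not part of the repository. It assumes that `train.py`, `train_llm.py` and `build.py` live in the repository root. It also uses `sys.executable` so that every step runs with the interpreter of the active virtual environment. With `dry_run=True` (the default), it only returns the planned commands.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Tuple

# (working directory relative to the repository root, script), in the order listed above
STEPS: List[Tuple[str, str]] = [
    ("data/full_images", "generate_synthetic.py"),
    (".", "train.py"),
    ("data/ingredients", "generate_synthetic.py"),
    (".", "train_llm.py"),
    (".", "build.py"),
]


def run_pipeline(repo_root: str = ".", dry_run: bool = True) -> List[Tuple[Path, List[str]]]:
    """Run each step in order, or with dry_run=True only return the planned commands."""
    planned: List[Tuple[Path, List[str]]] = []
    for rel_dir, script in STEPS:
        cwd = Path(repo_root) / rel_dir  # pathlib collapses the "." entries
        cmd = [sys.executable, script]
        planned.append((cwd, cmd))
        if not dry_run:
            subprocess.run(cmd, cwd=cwd, check=True)  # stop at the first failing step
    return planned


if __name__ == "__main__":
    for cwd, cmd in run_pipeline():
        print(cwd, " ".join(cmd))
```

Keeping the step list as data makes it easy to re-run a single stage, such as only the language model training, by slicing `STEPS`.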

Inference

A working example with trained models can be found on HuggingFace.

Citation

Here is how to cite this paper in BibTeX format:

@misc{merkli2024ingredient-scanner,
    title={Ingredient Scanner: Automating Reading of Ingredient Labels with Computer Vision},
    author={Lena Merkli and Sonja Merkli},
    date={2024-07-16},
    url={https://huggingface.co/lenamerkli/ingredient-scanner},
}
