feat: ✨ Official docker images for docTR #1322

Merged
merged 26 commits into from
Sep 28, 2023
2 changes: 1 addition & 1 deletion .github/workflows/docker.yml
@@ -14,7 +14,7 @@ jobs:
- name: Build docker image
run: docker build . -t doctr-tf-py3.8-slim
- name: Run docker container
run: docker run doctr-tf-py3.8-slim python -c 'import doctr'
run: docker run doctr-tf-py3.8-slim python3 -c 'import doctr'

pytest-api:
runs-on: ${{ matrix.os }}
89 changes: 89 additions & 0 deletions .github/workflows/public_docker_images.yml
@@ -0,0 +1,89 @@
# https://docs.github.com/en/actions/publishing-packages/publishing-docker-images#publishing-images-to-github-packages
#
name: Docker image on ghcr.io

on:
  push:
    tags:
      - 'v*'
  pull_request:
    branches: main
  schedule:
    - cron: '0 2 29 * *' # At 02:00 on day-of-month 29
Collaborator Author

@felixT2K I changed the cron so it should be triggered tomorrow morning. We'll see if it works or not

Contributor


alright 👍🏼


env:
  REGISTRY: ghcr.io

jobs:
  build-and-push-image:
    runs-on: ubuntu-latest

    strategy:
      fail-fast: false
      matrix:
        # Must match version at https://www.python.org/ftp/python/
        python: ["3.8.18", "3.9.18", "3.10.13"]
        framework: ["tf", "torch"]
        system: ["cpu", "gpu"]

    # Sets the permissions granted to the `GITHUB_TOKEN` for the actions in this job.
    permissions:
      contents: read
      packages: write

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Log in to the Container registry
        uses: docker/login-action@65b78e6e13532edd9afa3aa52ac7964289d1a9c1
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Extract metadata (tags, labels) for Docker
        id: meta
        uses: docker/metadata-action@9ec57ed1fcdbf14dcef7dfbe97b2010124a938b7
        with:
          images: ${{ env.REGISTRY }}/${{ github.repository }}
          tags: |
            # used only on schedule event
            type=schedule,pattern={{date 'YYYY-MM'}},prefix=${{ matrix.framework }}-py${{ matrix.python }}-${{ matrix.system }}-
            # set latest tag only if `enable` is True
            # see https://github.com/docker/metadata-action#latest-tag
            type=raw,value=latest,enable=${{ matrix.framework == 'tf' && matrix.python == '3.8.18' && matrix.system == 'gpu' && github.ref == format('refs/heads/{0}', 'main') }}
            # used only if a tag following semver is published
            type=semver,pattern={{raw}},prefix=${{ matrix.framework }}-py${{ matrix.python }}-${{ matrix.system }}-

      - name: Build Docker image
        id: build
        uses: docker/build-push-action@f2a1d5e99d037542a71f64918e516c093c6f3fc4
        with:
          context: .
          build-args: |
            FRAMEWORK=${{ matrix.framework }}
            PYTHON_VERSION=${{ matrix.python }}
            SYSTEM=${{ matrix.system }}
            DOCTR_REPO=${{ github.repository }}
            DOCTR_VERSION=${{ github.sha }}
          push: false # push only if `import doctr` works
          tags: ${{ steps.meta.outputs.tags }}

      - name: Check if `import doctr` works
        run: docker run ${{ steps.build.outputs.imageid }} python3 -c 'import doctr'

      - name: Push Docker image
        # Push on scheduled builds of main and on pushed version tags, never for PRs
        if: (github.ref == 'refs/heads/main' && github.event_name != 'pull_request') || (startsWith(github.ref, 'refs/tags/') && github.event_name == 'push')
Contributor


I am not 100% sure that this works on publish (release)

Collaborator Author


@felixT2K Mmmh, actually it works for tags but it's not done on publish: [released] as I'm afraid the CI job metadata from docker does not handle this event. I'll double check

Collaborator Author


@felixT2K I did a test on my fork. When a draft release is created, a new tag is not created. When the release is published, the tag is created, so it triggers the on: push: tags event 👍

Contributor


nice :)

        uses: docker/build-push-action@f2a1d5e99d037542a71f64918e516c093c6f3fc4
        with:
          context: .
          build-args: |
            FRAMEWORK=${{ matrix.framework }}
            PYTHON_VERSION=${{ matrix.python }}
            SYSTEM=${{ matrix.system }}
            DOCTR_REPO=${{ github.repository }}
            DOCTR_VERSION=${{ github.sha }}
          push: true
          tags: ${{ steps.meta.outputs.tags }}
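
The metadata step above gives each of the 12 matrix builds (3 Python versions × 2 frameworks × 2 systems) its tags. As a rough illustration of what the three configured rules produce, here is a hypothetical shell re-implementation; `doctr_tags` and its argument order are assumptions, and it only mimics the configured patterns rather than calling `docker/metadata-action`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the three tag rules configured in the
# "Extract metadata (tags, labels) for Docker" step. It does NOT call
# docker/metadata-action; it only mimics the configured patterns.
#
# Usage: doctr_tags <framework> <python> <system> <event> <ref> [YYYY-MM]
doctr_tags() {
  local framework="$1" python="$2" system="$3" event="$4" ref="$5" month="${6:-}"
  local prefix="${framework}-py${python}-${system}-"

  # type=schedule,pattern={{date 'YYYY-MM'}}: only on the monthly cron run
  if [ "$event" = "schedule" ]; then
    echo "${prefix}${month}"
  fi

  # type=raw,value=latest: only tf / Python 3.8.18 / gpu built from main
  if [ "$framework" = "tf" ] && [ "$python" = "3.8.18" ] \
     && [ "$system" = "gpu" ] && [ "$ref" = "refs/heads/main" ]; then
    echo "latest"
  fi

  # type=semver,pattern={{raw}}: only when a version tag is pushed
  # (assumes the pushed tag is valid semver, e.g. v0.7.1)
  if [ "$event" = "push" ]; then
    case "$ref" in
      refs/tags/v*) echo "${prefix}${ref#refs/tags/}" ;;
    esac
  fi
}

# Print the tag every matrix entry would get for a pushed v0.7.1 tag.
for python in 3.8.18 3.9.18 3.10.13; do
  for framework in tf torch; do
    for system in cpu gpu; do
      doctr_tags "$framework" "$python" "$system" push refs/tags/v0.7.1
    done
  done
done
```

With the rules as written, `latest` can only be attached when `github.ref` is `refs/heads/main`, which among the configured triggers happens on scheduled runs; a release tag push has `github.ref` set to `refs/tags/...`. Also, because the cron fires on day 29, February gets no monthly build in non-leap years.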
83 changes: 67 additions & 16 deletions Dockerfile
@@ -1,21 +1,72 @@
# Use the TensorFlow GPU image as the base image. This image also works with CPU-only setups
FROM tensorflow/tensorflow@sha256:b4676741c491bff3d0f29c38c369281792c7d5c5bfa2b1aa93e5231a8d236323
FROM ubuntu:22.04

ENV PYTHONUNBUFFERED=1
ENV PYTHONDONTWRITEBYTECODE=1
ENV DOCTR_CACHE_DIR=/app/.cache
ENV DEBIAN_FRONTEND=noninteractive
ENV LANG=C.UTF-8
ENV PYTHONUNBUFFERED=1
ENV PYTHONDONTWRITEBYTECODE=1

WORKDIR /app
ARG SYSTEM=gpu

COPY . .
# Enroll NVIDIA GPG public key and install CUDA
RUN if [ "$SYSTEM" = "gpu" ]; then \
apt-get update && \
apt-get install -y gnupg ca-certificates wget && \
# - Install Nvidia repo keys
# - See: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#network-repo-installation-for-ubuntu
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb && \
dpkg -i cuda-keyring_1.1-1_all.deb && \
apt-get update && apt-get install -y --no-install-recommends \
cuda-command-line-tools-11-8 \
cuda-cudart-dev-11-8 \
cuda-nvcc-11-8 \
cuda-cupti-11-8 \
cuda-nvprune-11-8 \
cuda-libraries-11-8 \
cuda-nvrtc-11-8 \
libcufft-11-8 \
libcurand-11-8 \
libcusolver-11-8 \
libcusparse-11-8 \
libcublas-11-8 \
# - CuDNN: https://docs.nvidia.com/deeplearning/sdk/cudnn-install/index.html#ubuntu-network-installation
libcudnn8=8.6.0.163-1+cuda11.8 \
libnvinfer-plugin8=8.6.1.6-1+cuda11.8 \
libnvinfer8=8.6.1.6-1+cuda11.8; \
fi

# Install necessary dependencies for video processing and GUI operations
RUN apt-get update \
&& apt-get install --no-install-recommends ffmpeg libsm6 libxext6 -y \
&& apt-get autoremove -y \
&& rm -rf /var/lib/apt/lists/*
RUN apt-get update && apt-get install -y --no-install-recommends \
# - Other packages
build-essential \
pkg-config \
curl \
wget \
software-properties-common \
unzip \
git \
# - Packages to build Python
tar make gcc zlib1g-dev libffi-dev libssl-dev liblzma-dev libbz2-dev \
# - Packages for docTR
libgl1-mesa-dev libsm6 libxext6 libxrender-dev libpangocairo-1.0-0 \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*

# Install the current application with TensorFlow extras and modify permissions
RUN pip install --upgrade pip setuptools wheel \
&& pip install -e .[tf] \
&& chmod -R a+w /app
# Install Python
ARG PYTHON_VERSION=3.10.13

RUN wget http://www.python.org/ftp/python/$PYTHON_VERSION/Python-$PYTHON_VERSION.tgz && \
tar -zxf Python-$PYTHON_VERSION.tgz && \
cd Python-$PYTHON_VERSION && \
mkdir /opt/python/ && \
./configure --prefix=/opt/python && \
make && \
make install

ENV PATH=/opt/python/bin:$PATH

# Install docTR
ARG FRAMEWORK=tf
ARG DOCTR_REPO='mindee/doctr'
ARG DOCTR_VERSION=main
RUN pip3 install -U pip setuptools wheel && \
pip3 install "python-doctr[$FRAMEWORK]@git+https://github.com/$DOCTR_REPO.git@$DOCTR_VERSION"
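
The new Dockerfile is driven entirely by build arguments, and the workflow builds it once per matrix entry before checking that `import doctr` works. A hypothetical helper that reproduces that build-then-check sequence for a single combination on a local machine is sketched below; `build_and_check_doctr`, the `doctr-local` image name and the `DRY_RUN` switch are illustrative assumptions, not part of this PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the CI "Build Docker image" and
# "Check if `import doctr` works" steps for one matrix entry.
# Defaults match the Dockerfile's ARG defaults.
# Set DRY_RUN=1 to print the docker commands instead of running them.
build_and_check_doctr() {
  local framework="${1:-tf}" python="${2:-3.10.13}" system="${3:-gpu}" version="${4:-main}"
  local image="doctr-local:${framework}-py${python}-${system}"
  local run=""
  [ "${DRY_RUN:-0}" = "1" ] && run="echo"

  # Same build arguments the workflow passes to the Dockerfile
  $run docker build \
    --build-arg FRAMEWORK="$framework" \
    --build-arg PYTHON_VERSION="$python" \
    --build-arg SYSTEM="$system" \
    --build-arg DOCTR_REPO=mindee/doctr \
    --build-arg DOCTR_VERSION="$version" \
    -t "$image" . || return 1

  # Smoke test: the image is only considered usable if docTR imports
  $run docker run --rm "$image" python3 -c 'import doctr'
}
```

For example, `DRY_RUN=1 build_and_check_doctr torch 3.9.18 cpu v0.7.0` prints the two commands without needing Docker, which makes it easy to review the arguments before a long build.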
45 changes: 42 additions & 3 deletions README.md
@@ -2,7 +2,7 @@
<img src="docs/images/Logo_doctr.gif" width="40%">
</p>

[![Slack Icon](https://img.shields.io/badge/Slack-Community-4A154B?style=flat-square&logo=slack&logoColor=white)](https://slack.mindee.com) [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE) ![Build Status](https://github.com/mindee/doctr/workflows/builds/badge.svg) [![codecov](https://codecov.io/gh/mindee/doctr/branch/main/graph/badge.svg?token=577MO567NM)](https://codecov.io/gh/mindee/doctr) [![CodeFactor](https://www.codefactor.io/repository/github/mindee/doctr/badge?s=bae07db86bb079ce9d6542315b8c6e70fa708a7e)](https://www.codefactor.io/repository/github/mindee/doctr) [![Codacy Badge](https://api.codacy.com/project/badge/Grade/340a76749b634586a498e1c0ab998f08)](https://app.codacy.com/gh/mindee/doctr?utm_source=github.com&utm_medium=referral&utm_content=mindee/doctr&utm_campaign=Badge_Grade) [![Doc Status](https://github.com/mindee/doctr/workflows/doc-status/badge.svg)](https://mindee.github.io/doctr) [![Pypi](https://img.shields.io/badge/pypi-v0.7.0-blue.svg)](https://pypi.org/project/python-doctr/) [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/mindee/doctr) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mindee/notebooks/blob/main/doctr/quicktour.ipynb)
[![Slack Icon](https://img.shields.io/badge/Slack-Community-4A154B?style=flat-square&logo=slack&logoColor=white)](https://slack.mindee.com) [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE) ![Build Status](https://github.com/mindee/doctr/workflows/builds/badge.svg) [![Docker Images](https://img.shields.io/badge/Docker-4287f5?style=flat&logo=docker&logoColor=white)](https://github.com/mindee/doctr/packages) [![codecov](https://codecov.io/gh/mindee/doctr/branch/main/graph/badge.svg?token=577MO567NM)](https://codecov.io/gh/mindee/doctr) [![CodeFactor](https://www.codefactor.io/repository/github/mindee/doctr/badge?s=bae07db86bb079ce9d6542315b8c6e70fa708a7e)](https://www.codefactor.io/repository/github/mindee/doctr) [![Codacy Badge](https://api.codacy.com/project/badge/Grade/340a76749b634586a498e1c0ab998f08)](https://app.codacy.com/gh/mindee/doctr?utm_source=github.com&utm_medium=referral&utm_content=mindee/doctr&utm_campaign=Badge_Grade) [![Doc Status](https://github.com/mindee/doctr/workflows/doc-status/badge.svg)](https://mindee.github.io/doctr) [![Pypi](https://img.shields.io/badge/pypi-v0.7.0-blue.svg)](https://pypi.org/project/python-doctr/) [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/mindee/doctr) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mindee/notebooks/blob/main/doctr/quicktour.ipynb)


**Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch**
@@ -260,10 +260,49 @@ Check out our [TensorFlow.js demo](https://github.com/mindee/doctr-tfjs-demo) to

### Docker container

If you wish to deploy containerized environments, you can use the provided Dockerfile to build a docker image:
[We offer Docker container support for easy testing and deployment](https://github.com/mindee/doctr/packages).

#### Using GPU with docTR Docker Images

The `gpu` variants of the docTR Docker images are based on CUDA `11.8`; the `cpu` variants do not include CUDA.
To use the GPU from inside a container, Docker itself must also be configured to access your GPU.

To verify and configure GPU support for Docker, please follow the instructions provided in the [NVIDIA Container Toolkit Installation Guide](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html).

Once Docker is configured to use GPUs, you can run docTR Docker containers with GPU support:

```shell
docker run -it --gpus all ghcr.io/mindee/doctr:tf-py3.8.18-gpu-v0.7.1 bash
```
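
To confirm that the framework inside a container actually sees the GPU, a one-off check can be run through `docker run`. The helper below is a hypothetical sketch: its name is illustrative, it infers the framework from the tag prefix, and it assumes a `gpu` image plus a working NVIDIA Container Toolkit setup.

```shell
#!/usr/bin/env bash
# Hypothetical helper: ask the deep learning framework inside a docTR image
# whether it can see a GPU. Assumes Docker is set up with the NVIDIA
# Container Toolkit and that the image is a `gpu` variant.
doctr_gpu_check() {
  local image="$1"
  case "$image" in
    *:tf-*|*:latest)
      # The workflow attaches `latest` to the TensorFlow / Python 3.8.18 / gpu build
      docker run --rm --gpus all "$image" \
        python3 -c 'import tensorflow as tf; print(tf.config.list_physical_devices("GPU"))'
      ;;
    *:torch-*)
      docker run --rm --gpus all "$image" \
        python3 -c 'import torch; print(torch.cuda.is_available())'
      ;;
    *)
      echo "cannot infer the framework from tag: $image" >&2
      return 1
      ;;
  esac
}
```

For example, `doctr_gpu_check ghcr.io/mindee/doctr:tf-py3.8.18-gpu-v0.7.1` should print a non-empty device list (or `True` for a PyTorch image) when GPU access works; an empty list or `False` usually points at the Docker/NVIDIA configuration rather than at docTR.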

#### Available Tags

The Docker images for docTR follow a specific tag nomenclature: `<framework>-py<python_version>-<system>-<doctr_version|YYYY-MM>`. Here's a breakdown of the tag structure:

- `<framework>`: `tf` (TensorFlow) or `torch` (PyTorch).
- `<python_version>`: `3.8.18`, `3.9.18`, or `3.10.13`.
- `<system>`: `cpu` or `gpu`.
- `<doctr_version>`: a docTR release tag, `v0.7.1` or later.
- `<YYYY-MM>`: the month of a scheduled monthly build, e.g. `2023-10`.

Here are examples of different image tags:

| Tag                          | Description                                                          |
|------------------------------|----------------------------------------------------------------------|
| `tf-py3.8.18-cpu-v0.7.1`     | TensorFlow, Python `3.8.18`, CPU only, docTR `v0.7.1`.               |
| `torch-py3.9.18-gpu-2023-10` | PyTorch, Python `3.9.18`, GPU support, monthly build from `2023-10`. |
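
The publishing workflow builds each tag from the prefix `<framework>-py<python>-<system>-` followed by either the release tag or the `YYYY-MM` month. When scripting against the registry, a tag can be split back into its parts with plain parameter expansion; `parse_doctr_tag` is a hypothetical helper, not something shipped with docTR.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a docTR image tag of the form
# <framework>-py<python_version>-<system>-<doctr_version|YYYY-MM>
# into its parts using only parameter expansion.
parse_doctr_tag() {
  local tag="$1" rest framework python system version
  framework="${tag%%-py*}"          # everything before the first "-py"
  rest="${tag#"$framework"-py}"     # <python>-<system>-<version>
  python="${rest%%-*}"              # up to the first "-"
  rest="${rest#"$python"-}"         # <system>-<version>
  system="${rest%%-*}"
  version="${rest#"$system"-}"      # may itself contain "-" (e.g. 2023-10)
  printf 'framework=%s python=%s system=%s version=%s\n' \
    "$framework" "$python" "$system" "$version"
}

parse_doctr_tag torch-py3.9.18-gpu-2023-10
```

The version is taken last on purpose: monthly tags such as `2023-10` contain a dash, so only the first three fields can be split on `-`.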

#### Building Docker Images Locally

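You can also build docTR Docker images locally on your computer. Note that the Dockerfile does not copy your local sources into the image: it installs docTR with pip from `https://github.com/$DOCTR_REPO.git` at `$DOCTR_VERSION` (`mindee/doctr` and `main` by default).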

```shell
docker build -t doctr .
```

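You can customize the image with the `FRAMEWORK` (`tf` or `torch`), `PYTHON_VERSION`, `SYSTEM` (`cpu` or `gpu`) and `DOCTR_VERSION` build arguments. Since Python is compiled from source, `PYTHON_VERSION` must match a release available at https://www.python.org/ftp/python/. For example, to build a docTR image with TensorFlow, Python `3.9.10` and docTR `v0.7.0`, run the following command: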

```shell
docker build . -t <YOUR_IMAGE_TAG>
docker build -t doctr --build-arg FRAMEWORK=tf --build-arg PYTHON_VERSION=3.9.10 --build-arg DOCTR_VERSION=v0.7.0 .
```
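
Because the Dockerfile downloads `Python-$PYTHON_VERSION.tgz` from python.org and compiles it, a mistyped `PYTHON_VERSION` only fails partway through the build. A hypothetical pre-check is sketched below; the function names are illustrative, and the optional network check assumes `curl` is installed.

```shell
#!/usr/bin/env bash
# Hypothetical pre-build checks for the PYTHON_VERSION build argument.
# The Dockerfile downloads Python-$PYTHON_VERSION.tgz from python.org and
# compiles it, so the value must be a full X.Y.Z release such as 3.9.10.
is_full_python_version() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+$'
}

# Optional network check (needs curl): does the source tarball exist?
python_tarball_exists() {
  local v="$1"
  curl -fsIL "https://www.python.org/ftp/python/${v}/Python-${v}.tgz" >/dev/null
}

PYTHON_VERSION="${PYTHON_VERSION:-3.9.10}"
if is_full_python_version "$PYTHON_VERSION"; then
  echo "PYTHON_VERSION=$PYTHON_VERSION looks valid"
else
  echo "PYTHON_VERSION must look like 3.9.10, got '$PYTHON_VERSION'" >&2
fi
```

For example, `is_full_python_version 3.9.10 && python_tarball_exists 3.9.10` should succeed before running the `docker build` command above, while a value such as `3.9` is rejected immediately.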

### Example script