Ensure Docker is installed on your system. For installation instructions, please visit the official guide.
DevBench offers two versions of Docker images: the base version and the complete version. Select the appropriate version based on your requirements:
Tag | Dependencies for DevBench utilities and baseline system | Dependencies for Python repos | Dependencies for C++/Java/JS repos |
---|---|---|---|
base | ✅ | ❌ | ❌ |
complete | ✅ | ✅ | ❌ |
- Base Version: Ideal for evaluating environment setup tasks. It provides a clean, minimal environment with only the essential tools, and includes:
  - Core dependencies for DevBench utilities and the baseline system.
  - Key utilities such as compilers (gcc, g++ 11), GoogleTest, Java 14, Python 3.11, Node.js, CMake, and Miniconda.
  - Specific third-party tools for C++, JS, and Java repositories, such as Redis for the Java redis-cache repository and SQLite for the C++ database management repository.
  - Essential packages to operate the baseline agent system efficiently.
- Complete Version: Recommended for implementation and testing tasks. It includes everything in the base version plus:
  - Dependencies for all Python repositories, so no manual dependency management is needed.
Note: Our evaluation of environment setup primarily targets Python and JavaScript projects, because these languages have well-established package management tools. C++ lacks a universally accepted package manager, which makes standardized environment setup challenging. Java projects, in contrast, usually carry their own build and package management system, commonly a Gradle build script, so a separate environment setup step is less relevant to them. For Java, we instead integrate the environment setup evaluation with the implementation phase: the model or agent is required to execute a complete Gradle script, which includes all necessary dependency listings and build instructions.
This structured approach ensures that developers can choose the most suitable environment, enhancing their efficiency and the overall development workflow.
- Base image:
cd docker/base
docker build -t $IMAGE_NAME:$VERSION .
- Complete image:
cd docker/complete
docker build -t $IMAGE_NAME:$VERSION .
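The build commands above rely on two shell variables that the text does not define. The following is a minimal sketch of one way to set them, assuming a local image name of `devbench` (a hypothetical choice) and assuming that `VERSION` matches the directory name under `docker/`. The helper names `image_ref` and `build_image` are illustrative and not part of DevBench.

```shell
# Assumed values: the README does not define these variables.
IMAGE_NAME="devbench"   # hypothetical local image name, pick your own
VERSION="base"          # or "complete"

# Compose the image reference passed to `docker build -t`.
image_ref() {
  printf '%s:%s\n' "$IMAGE_NAME" "$VERSION"
}

# Build the selected variant; call this from the repository root.
# Assumption: VERSION doubles as the directory name (docker/base, docker/complete).
build_image() {
  # The subshell keeps the caller's working directory unchanged.
  (cd "docker/$VERSION" && docker build -t "$(image_ref)" .)
}
```

Calling `build_image` then runs the same two commands as the matching bullet above.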
Alternatively, you can pull the pre-built images from Docker Hub.
- Base image:
docker pull elyndendu/devbench:base
- Complete image:
docker pull elyndendu/devbench:complete
To use NVIDIA GPUs within Docker, install the NVIDIA Container Toolkit by following these steps:
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
To create a container with NVIDIA GPU support:
docker run -it --gpus all --name $CONTAINER_NAME -e NVIDIA_DRIVER_CAPABILITIES=compute,utility -e NVIDIA_VISIBLE_DEVICES=all $IMAGE_NAME:$VERSION bash
For a standard setup without GPU support:
docker run -it --name $CONTAINER_NAME $IMAGE_NAME:$VERSION bash
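The two `docker run` variants differ only in the GPU flags. As a rough sketch, the helper below prints the command for either mode so it can be reviewed before running. The variable values and the function name `run_command` are assumptions made for illustration; the container name in particular is hypothetical.

```shell
# Assumed values; the text leaves these variables undefined.
IMAGE_NAME="elyndendu/devbench"   # name of the pre-built images
VERSION="complete"                # or "base"
CONTAINER_NAME="devbench-dev"     # hypothetical container name

# Print (do not execute) the `docker run` command for "gpu" or "cpu" mode.
run_command() {
  gpu_flags=""
  if [ "$1" = "gpu" ]; then
    # Same flags as the GPU example in the text.
    gpu_flags="--gpus all -e NVIDIA_DRIVER_CAPABILITIES=compute,utility -e NVIDIA_VISIBLE_DEVICES=all "
  fi
  printf 'docker run -it %s--name %s %s:%s bash\n' \
    "$gpu_flags" "$CONTAINER_NAME" "$IMAGE_NAME" "$VERSION"
}
```

`run_command cpu` prints the standard command and `run_command gpu` prints the GPU-enabled one; paste the output into a terminal once it looks right.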
git clone git@github.com:open-compass/DevBench.git
Run unit and acceptance tests for each repo as per the instructions in the repo_config.json found in benchmark_data.
For example, for the repo ./benchmark_data/python/readtime, a fragment of repo_config.json is shown as follows:
{
  "unit_test_script": "pytest --cov=. --cov-report=json:unit_test_cov.json --json-report --json-report-file=unit_test_report.json unit_tests",
  "acceptance_test_script": "pytest --cov=. --cov-report=json:acceptance_test_cov.json --json-report --json-report-file=acceptance_test_report.json acceptance_tests"
}
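One possible way to run these scripts without copying them by hand is sketched below. It assumes that repo_config.json sits at the root of the repo directory and that the scripts are meant to be run from there; neither is stated in the text. The function names `get_script` and `run_repo_test` are hypothetical helpers, not DevBench utilities, and `python3` is used only to parse the JSON.

```shell
# Print the value stored under KEY in a JSON config file.
# Usage: get_script CONFIG_PATH KEY
get_script() {
  python3 -c 'import json, sys; print(json.load(open(sys.argv[1]))[sys.argv[2]])' "$1" "$2"
}

# Run one test script from REPO_DIR/repo_config.json inside REPO_DIR.
# Assumption: the config lives at the repo root and the script expects
# that directory as its working directory.
# Usage: run_repo_test REPO_DIR KEY
run_repo_test() {
  repo_dir=$1
  key=$2
  script=$(get_script "$repo_dir/repo_config.json" "$key") || return 1
  # The subshell keeps the caller's working directory unchanged.
  (cd "$repo_dir" && sh -c "$script")
}
```

For the example repo, this would be invoked as `run_repo_test ./benchmark_data/python/readtime unit_test_script`, and likewise with `acceptance_test_script`.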
If tests fail due to network issues, simply retry.
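Retrying can be automated with a small wrapper. The sketch below is one generic way to do it; the function name `retry`, the attempt limit, and the `RETRY_DELAY` variable are illustrative choices rather than anything DevBench provides.

```shell
# Re-run a command until it succeeds or the attempt limit is reached.
# Usage: retry MAX_ATTEMPTS COMMAND [ARGS...]
# RETRY_DELAY (seconds between attempts) defaults to 5.
retry() {
  _retry_max=$1
  shift
  _retry_attempt=1
  until "$@"; do
    if [ "$_retry_attempt" -ge "$_retry_max" ]; then
      echo "giving up after $_retry_attempt attempts: $*" >&2
      return 1
    fi
    _retry_attempt=$((_retry_attempt + 1))
    sleep "${RETRY_DELAY:-5}"
  done
}
```

For instance, `retry 3 pytest unit_tests` would run the command up to three times. Note that the wrapper retries on any failure, so a test that fails for a real reason is simply run again until the limit is hit.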