If you are interested in contributing to cuDF, your contributions will fall into three categories:
- You want to report a bug, feature request, or documentation issue
  - File an issue describing what you encountered or what you want to see changed.
  - The RAPIDS team will evaluate and triage the issues, scheduling them for a release. If you believe the issue needs priority attention, comment on the issue to notify the team.
- You want to propose a new feature and implement it
  - Post about your intended feature, and we will discuss the design and implementation.
  - Once we agree that the plan looks good, go ahead and implement it, using the code contributions guide below.
- You want to implement a feature or bug fix for an outstanding issue
  - Follow the code contributions guide below.
  - If you need more context on a particular issue, please ask and we will provide it.
1. Follow the guide at the bottom of this page for Setting Up Your Build Environment.
2. Find an issue to work on. The best way is to look for the "good first issue" or "help wanted" labels.
3. Comment on the issue saying you are going to work on it.
4. Code! Make sure to update unit tests!
5. When done, create your pull request.
6. Verify that CI passes all status checks. Fix if needed.
7. Wait for other developers to review your code and update code as needed.
8. Once reviewed and approved, a RAPIDS developer will merge your pull request.
Remember, if you are unsure about anything, don't hesitate to comment on issues and ask for clarifications!
Once you have gotten your feet wet and are more comfortable with the code, you can look at the prioritized issues of our next release in our project boards.
Pro Tip: Always look at the release board with the highest number for issues to work on. This is where RAPIDS developers also focus their efforts.
Look at the unassigned issues, and find an issue you are comfortable with contributing to. Start with Step 3 from above, commenting on the issue to let others know you are working on it. If you have any questions related to the implementation of the issue, ask them in the issue instead of the PR.
The following instructions are for developers and contributors to cuDF OSS development. They are tested on Ubuntu Linux 16.04 and 18.04. Use them to build cuDF from source and contribute to its development. Other operating systems may be compatible, but are not currently tested.
Compiler requirements:
- `gcc` version 5.4+
- `nvcc` version 9.2+
- `cmake` version 3.12.4+
CUDA/GPU requirements:
- CUDA 9.2+
- NVIDIA driver 396.44+
- Pascal architecture or better
You can obtain CUDA from https://developer.nvidia.com/cuda-downloads
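One way to check an installed toolchain against the minimum versions listed above is a short script like the sketch below. The helper names `version_ge` and `report` are hypothetical and not part of cuDF. The sketch assumes GNU `sort -V` (available on Ubuntu) and the usual output format of `gcc -dumpversion`, `nvcc --version`, and `cmake --version`. It does not check the NVIDIA driver version or the GPU architecture.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of cuDF.

# version_ge A B: succeed if version A >= version B (relies on GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# report NAME FOUND REQUIRED: print one status line for a tool.
report() {
    if [ -z "$2" ]; then
        echo "$1: not found (need $3+)"
    elif version_ge "$2" "$3"; then
        echo "$1: $2 (ok, need $3+)"
    else
        echo "$1: $2 (too old, need $3+)"
    fi
}

# Detect versions; a missing tool leaves the variable empty.
gcc_ver=$(gcc -dumpversion 2>/dev/null || true)
nvcc_ver=$(nvcc --version 2>/dev/null | sed -n 's/.*release \([0-9.]*\).*/\1/p' || true)
cmake_ver=$(cmake --version 2>/dev/null | head -n1 | awk '{print $3}' || true)

report gcc "$gcc_ver" 5.4
report nvcc "$nvcc_ver" 9.2
report cmake "$cmake_ver" 3.12.4
```

The comparison uses version ordering rather than string ordering, so `3.5` is correctly treated as older than `3.12.4`.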
Since `cmake` will download and build Apache Arrow, you may need to install Boost C++ (version 1.58+) before running `cmake`:
# Install Boost C++ for Ubuntu 16.04/18.04
$ sudo apt-get install libboost-all-dev
or
# Install Boost C++ for Conda
$ conda install -c conda-forge boost
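To confirm that an installed Boost meets the 1.58+ requirement, you can read the `BOOST_LIB_VERSION` macro from Boost's `version.hpp` header. The sketch below is a minimal example: the helper name `boost_version` is hypothetical, and the two header locations it probes (the system include directory and `$CONDA_PREFIX/include`) are assumptions about where the packages above place Boost.

```shell
#!/usr/bin/env bash
# Sketch only: boost_version is a hypothetical helper, not part of cuDF.

# boost_version HEADER: print the Boost version recorded in a version.hpp file.
# The header defines the macro as e.g.  #define BOOST_LIB_VERSION "1_58"
boost_version() {
    sed -n 's/^#define BOOST_LIB_VERSION "\([0-9_]*\)".*/\1/p' "$1" | tr '_' '.'
}

# Assumed header locations; adjust for your system.
for hdr in /usr/include/boost/version.hpp "${CONDA_PREFIX:-}/include/boost/version.hpp"; do
    if [ -f "$hdr" ]; then
        echo "$hdr: Boost $(boost_version "$hdr")"
    fi
done
```

If neither path exists, the loop prints nothing, which suggests Boost is either not installed or lives somewhere else.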
To install cuDF from source, ensure the dependencies are met and follow the steps below:
- Clone the repository and submodules
CUDF_HOME=$(pwd)/cudf
git clone https://github.com/rapidsai/cudf.git $CUDF_HOME
cd $CUDF_HOME
git submodule update --init --remote --recursive
- Create the conda development environment `cudf_dev`:
# create the conda environment (assuming in base `cudf` directory)
conda env create --name cudf_dev --file conda/environments/cudf_dev_cuda10.0.yml
# activate the environment
source activate cudf_dev
- If you're using CUDA 9.2, you will need to create the environment with `conda env create --name cudf_dev --file conda/environments/cudf_dev_cuda9.2.yml` instead.
- Build and install `libcudf`. CMake depends on the `nvcc` executable being on your path or defined in `$CUDACXX`.
$ cd $CUDF_HOME/cpp # navigate to C/C++ CUDA source root directory
$ mkdir build # make a build directory
$ cd build # enter the build directory
# CMake options:
# -DCMAKE_INSTALL_PREFIX set to the install path for your libraries or $CONDA_PREFIX if you're using Anaconda, i.e. -DCMAKE_INSTALL_PREFIX=/install/path or -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX
# -DCMAKE_CXX11_ABI set to ON or OFF depending on the ABI version you want, defaults to OFF. When turned ON, ABI compatibility for C++11 is used. When OFF, pre-C++11 ABI compatibility is used.
$ cmake .. -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX -DCMAKE_CXX11_ABI=OFF # configure cmake ...
$ make -j # compile the libraries librmm.so, libcudf.so ... a bare '-j' sets no limit on the number of parallel jobs; pass a count (e.g. '-j$(nproc)') to match the number of available processing units
$ make install # install the libraries librmm.so, libcudf.so to the CMAKE_INSTALL_PREFIX
- To run tests (Optional):
$ make test
- Build, install, and test cffi bindings:
$ make python_cffi # build CFFI bindings for librmm.so, libcudf.so
$ make install_python # build & install CFFI python bindings. Depends on cffi package from PyPi or Conda
$ cd python && py.test -v # optional, run python tests on low-level python bindings
- Build the `cudf` python package, in the `python` folder:
$ cd $CUDF_HOME/python
$ python setup.py build_ext --inplace
- You will also need the following environment variables; `$CUDA_HOME` must be set as well.
export NUMBAPRO_NVVM=$CUDA_HOME/nvvm/lib64/libnvvm.so
export NUMBAPRO_LIBDEVICE=$CUDA_HOME/nvvm/libdevice
- To run Python tests (Optional):
$ py.test -v # run python tests on cudf python bindings
- Finally, install the Python package to your Python path:
$ python setup.py install # install cudf python bindings
Done! You are ready to develop for the cuDF OSS project.
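The CUDA-dependent parts of the steps above (which environment file to use, and where the Numba variables point) can be collected in two small helpers. This is a sketch under stated assumptions, not part of the cuDF build: the function names `cudf_env_file` and `numba_env_exports` are hypothetical, only the two environment files named in this guide are recognized, and `/usr/local/cuda` is an assumed fallback used when `$CUDA_HOME` is unset.

```shell
#!/usr/bin/env bash
# Sketch only: both helpers are hypothetical, not part of cuDF.

# cudf_env_file CUDA_VERSION: print the conda environment file for that version.
# Only the files mentioned in this guide (9.2 and 10.0) are recognized.
cudf_env_file() {
    case "$1" in
        9.2|10.0) echo "conda/environments/cudf_dev_cuda$1.yml" ;;
        *) echo "no environment file known for CUDA $1" >&2; return 1 ;;
    esac
}

# numba_env_exports CUDA_HOME: print the export lines for the Numba variables.
numba_env_exports() {
    echo "export NUMBAPRO_NVVM=$1/nvvm/lib64/libnvvm.so"
    echo "export NUMBAPRO_LIBDEVICE=$1/nvvm/libdevice"
}

# Show what would be used; /usr/local/cuda is an assumed default location.
cudf_env_file 10.0
numba_env_exports "${CUDA_HOME:-/usr/local/cuda}"
```

For example, `conda env create --name cudf_dev --file "$(cudf_env_file 9.2)"` creates the CUDA 9.2 environment, and `eval "$(numba_env_exports "$CUDA_HOME")"` sets both variables in the current shell.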
Follow the above instructions to build from source and add `-DCMAKE_BUILD_TYPE=Debug` to the `cmake` step.
For example:
$ cmake .. -DCMAKE_INSTALL_PREFIX=/install/path -DCMAKE_BUILD_TYPE=Debug # configure cmake ... use -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX if you're using Anaconda
This builds `libcudf` in Debug mode, which enables some `assert` safety checks and includes symbols in the library for debugging.
All other steps for installing `libcudf` into your environment are the same.
When you have a debug build of `libcudf` installed, debugging with `cuda-gdb` and `cuda-memcheck` is easy.
If you are debugging a Python script, simply run the following:
cuda-gdb -ex r --args python <program_name>.py <program_arguments>
cuda-memcheck python <program_name>.py <program_arguments>
A Dockerfile is provided with a preconfigured conda environment for building and installing cuDF from source based off of the master branch.
- Install nvidia-docker2 for Docker + GPU support
- Verify NVIDIA driver is `396.44` or higher
- Ensure CUDA 9.2+ is installed
From the cudf project root, run the following to build with defaults:
$ docker build --tag cudf .
After the image is built, run the container:
$ docker run --runtime=nvidia -it cudf bash
Activate the conda environment `cudf` to use the newly built cuDF and libcudf libraries:
root@3f689ba9c842:/# source activate cudf
(cudf) root@3f689ba9c842:/# python -c "import cudf"
(cudf) root@3f689ba9c842:/#
Several build arguments are available to customize the build process of the container. These are specified by using the Docker build-arg flag. Below is a list of the available arguments and their purpose:
Build Argument | Default Value | Other Value(s) | Purpose |
---|---|---|---|
`CUDA_VERSION` | 9.2 | 10.0 | set CUDA version |
`LINUX_VERSION` | ubuntu16.04 | ubuntu18.04 | set Ubuntu version |
`CC` & `CXX` | 5 | 7 | set gcc/g++ version; NOTE: gcc7 requires Ubuntu 18.04 |
`CUDF_REPO` | This repo | Forks of cuDF | set git URL to use for `git clone` |
`CUDF_BRANCH` | master | Any branch name | set git branch to checkout of `CUDF_REPO` |
`NUMBA_VERSION` | newest | >=0.40.0 | set numba version |
`NUMPY_VERSION` | newest | >=1.14.3 | set numpy version |
`PANDAS_VERSION` | newest | >=0.23.4 | set pandas version |
`PYARROW_VERSION` | 0.12.1 | Not supported | set pyarrow version |
`CMAKE_VERSION` | newest | >=3.12 | set cmake version |
`CYTHON_VERSION` | 0.29 | Not supported | set Cython version |
`PYTHON_VERSION` | 3.6 | 3.7 | set python version |
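As an illustration of the build arguments above, the sketch below assembles a `docker build` command with one `--build-arg` flag per setting. The helper `docker_build_cmd` is hypothetical; it only prints the command so you can review it before running it. The example values are taken from the table and pair gcc 7 with Ubuntu 18.04, as the note in the table requires.

```shell
#!/usr/bin/env bash
# Sketch only: docker_build_cmd is a hypothetical helper, not part of cuDF.

# docker_build_cmd [KEY=VALUE ...]: print a docker build command that passes
# each KEY=VALUE pair as a --build-arg and tags the image "cudf".
docker_build_cmd() {
    local cmd="docker build"
    local pair
    for pair in "$@"; do
        cmd="$cmd --build-arg $pair"
    done
    echo "$cmd --tag cudf ."
}

# Example: CUDA 10.0 on Ubuntu 18.04 with gcc/g++ 7.
docker_build_cmd CUDA_VERSION=10.0 LINUX_VERSION=ubuntu18.04 CC=7 CXX=7
```

Called with no arguments, the helper prints the default build command shown earlier, `docker build --tag cudf .`.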
Portions adopted from https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md