Merge branch 'branch-23.12' into delta_encode_python
vuule authored Nov 8, 2023
2 parents 1e0cc58 + d3dcc75 commit 6dabbcb
Showing 10 changed files with 55 additions and 50 deletions.
73 changes: 39 additions & 34 deletions README.md
@@ -1,57 +1,62 @@
# <div align="left"><img src="img/rapids_logo.png" width="90px"/>&nbsp;cuDF - GPU DataFrames</div>

**NOTE:** For the latest stable [README.md](https://github.com/rapidsai/cudf/blob/main/README.md) ensure you are on the `main` branch.
## 📢 cuDF can now be used as a no-code-change accelerator for pandas! To learn more, see [here](https://rapids.ai/cudf-pandas/)!

## Resources

- [cuDF Reference Documentation](https://docs.rapids.ai/api/cudf/stable/): Python API reference, tutorials, and topic guides.
- [libcudf Reference Documentation](https://docs.rapids.ai/api/libcudf/stable/): C/C++ CUDA library API reference.
- [Getting Started](https://rapids.ai/start.html): Instructions for installing cuDF.
- [RAPIDS Community](https://rapids.ai/community.html): Get help, contribute, and collaborate.
- [GitHub repository](https://github.com/rapidsai/cudf): Download the cuDF source code.
- [Issue tracker](https://github.com/rapidsai/cudf/issues): Report issues or request features.

## Overview

Built based on the [Apache Arrow](http://arrow.apache.org/) columnar memory format, cuDF is a GPU DataFrame library for loading, joining, aggregating, filtering, and otherwise manipulating data.
cuDF is a GPU DataFrame library for loading, joining, aggregating,
filtering, and otherwise manipulating data. cuDF leverages
[libcudf](https://docs.rapids.ai/api/libcudf/stable/), a
blazing-fast C++/CUDA dataframe library and the [Apache
Arrow](https://arrow.apache.org/) columnar format to provide a
GPU-accelerated pandas API.

cuDF provides a pandas-like API that will be familiar to data engineers & data scientists, so they can use it to easily accelerate their workflows without going into the details of CUDA programming.
You can import `cudf` directly and use it like `pandas`:

For example, the following snippet downloads a CSV, then uses the GPU to parse it into rows and columns and run calculations:
```python
import cudf, requests
import cudf
import requests
from io import StringIO

url = "https://github.com/plotly/datasets/raw/master/tips.csv"
content = requests.get(url).content.decode('utf-8')
content = requests.get(url).content.decode("utf-8")

tips_df = cudf.read_csv(StringIO(content))
tips_df['tip_percentage'] = tips_df['tip'] / tips_df['total_bill'] * 100
tips_df["tip_percentage"] = tips_df["tip"] / tips_df["total_bill"] * 100

# display average tip by dining party size
print(tips_df.groupby('size').tip_percentage.mean())
print(tips_df.groupby("size").tip_percentage.mean())
```

Output:
```
size
1 21.729201548727808
2 16.571919173482897
3 15.215685473711837
4 14.594900639351332
5 14.149548965142023
6 15.622920072028379
Name: tip_percentage, dtype: float64
```
Or, you can use cuDF as a no-code-change accelerator for pandas, using
[`cudf.pandas`](https://docs.rapids.ai/api/cudf/stable/cudf_pandas).
`cudf.pandas` supports 100% of the pandas API, utilizing cuDF for
supported operations and falling back to pandas when needed:

For additional examples, browse our complete [API documentation](https://docs.rapids.ai/api/cudf/stable/), or check out our more detailed [notebooks](https://github.com/rapidsai/notebooks-contrib).
```python
%load_ext cudf.pandas # pandas operations now use the GPU!

## Quick Start
import pandas as pd
import requests
from io import StringIO

Please see the [Demo Docker Repository](https://hub.docker.com/r/rapidsai/rapidsai/), choosing a tag based on the NVIDIA CUDA version you're running. This provides a ready to run Docker container with example notebooks and data, showcasing how you can utilize cuDF.
url = "https://github.com/plotly/datasets/raw/master/tips.csv"
content = requests.get(url).content.decode("utf-8")

## Installation
tips_df = pd.read_csv(StringIO(content))
tips_df["tip_percentage"] = tips_df["tip"] / tips_df["total_bill"] * 100

# display average tip by dining party size
print(tips_df.groupby("size").tip_percentage.mean())
```

## Resources

- [Try cudf.pandas now](https://nvda.ws/rapids-cudf): Explore `cudf.pandas` on a free GPU enabled instance on Google Colab!
- [Install](https://rapids.ai/start.html): Instructions for installing cuDF and other [RAPIDS](https://rapids.ai) libraries.
- [cudf (Python) documentation](https://docs.rapids.ai/api/cudf/stable/)
- [libcudf (C++/CUDA) documentation](https://docs.rapids.ai/api/libcudf/stable/)
- [RAPIDS Community](https://rapids.ai/community.html): Get help, contribute, and collaborate.

## Installation

### CUDA/GPU requirements

4 changes: 2 additions & 2 deletions conda/environments/all_cuda-118_arch-x86_64.yaml
@@ -40,7 +40,7 @@ dependencies:
- hypothesis
- identify>=2.5.20
- ipython
- libarrow==13.0.0.*
- libarrow-all==14.0.0.*
- libcufile-dev=1.4.0.31
- libcufile=1.4.0.31
- libcurand-dev=10.3.0.86
@@ -69,7 +69,7 @@ dependencies:
- pre-commit
- protobuf>=4.21,<5
- ptxcompiler
- pyarrow==13.0.0.*
- pyarrow==14.0.0.*
- pydata-sphinx-theme!=0.14.2
- pytest
- pytest-benchmark
4 changes: 2 additions & 2 deletions conda/environments/all_cuda-120_arch-x86_64.yaml
@@ -42,7 +42,7 @@ dependencies:
- hypothesis
- identify>=2.5.20
- ipython
- libarrow==13.0.0.*
- libarrow-all==14.0.0.*
- libcufile-dev
- libcurand-dev
- libkvikio==23.12.*
@@ -67,7 +67,7 @@ dependencies:
- pip
- pre-commit
- protobuf>=4.21,<5
- pyarrow==13.0.0.*
- pyarrow==14.0.0.*
- pydata-sphinx-theme!=0.14.2
- pytest
- pytest-benchmark
4 changes: 2 additions & 2 deletions conda/recipes/cudf/meta.yaml
@@ -55,13 +55,13 @@ requirements:
- cuda-version ={{ cuda_version }}
- sysroot_{{ target_platform }} {{ sysroot_version }}
host:
- protobuf ==4.23.*
- protobuf ==4.24.*
- python
- cython >=3.0.0
- scikit-build >=0.13.1
- setuptools
- dlpack >=0.5,<0.6.0a0
- pyarrow ==13.0.0.*
- pyarrow ==14.0.0.*
- libcudf ={{ version }}
- rmm ={{ minor_version }}
{% if cuda_major == "11" %}
2 changes: 1 addition & 1 deletion conda/recipes/libcudf/conda_build_config.yaml
@@ -23,7 +23,7 @@ gtest_version:
- ">=1.13.0"

libarrow_version:
- "==13.0.0"
- "==14.0.0"

dlpack_version:
- ">=0.5,<0.6.0a0"
2 changes: 1 addition & 1 deletion cpp/cmake/thirdparty/get_arrow.cmake
@@ -427,7 +427,7 @@ if(NOT DEFINED CUDF_VERSION_Arrow)
set(CUDF_VERSION_Arrow
# This version must be kept in sync with the libarrow version pinned for builds in
# dependencies.yaml.
13.0.0
14.0.0
CACHE STRING "The version of Arrow to find (or build)"
)
endif()
8 changes: 4 additions & 4 deletions dependencies.yaml
@@ -224,7 +224,7 @@ dependencies:
- &gmock gmock>=1.13.0
# Hard pin the patch version used during the build. This must be kept
# in sync with the version pinned in get_arrow.cmake.
- libarrow==13.0.0.*
- libarrow-all==14.0.0.*
- librdkafka>=1.9.0,<1.10.0a0
# Align nvcomp version with rapids-cmake
- nvcomp==2.6.1
@@ -246,7 +246,7 @@ dependencies:
packages:
# Hard pin the patch version used during the build. This must be kept
# in sync with the version pinned in get_arrow.cmake.
- pyarrow==13.0.0.*
- pyarrow==14.0.0.*
build_python:
common:
- output_types: [conda, requirements, pyproject]
@@ -264,13 +264,13 @@ dependencies:
- output_types: conda
packages:
# Allow runtime version to float up to minor version
- libarrow==13.*
- libarrow-all==14.*
pyarrow_run:
common:
- output_types: [conda, requirements, pyproject]
packages:
# Allow runtime version to float up to minor version
- pyarrow==13.*
- pyarrow==14.*
cudatoolkit:
specific:
- output_types: conda
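
The pins in this file come in two strengths: build-time pins such as `pyarrow==14.0.0.*` fix the patch version, while run-time pins such as `pyarrow==14.*` accept any release in the major series. As a rough illustration of that prefix matching, here is a minimal sketch. The helper `matches_prefix` is hypothetical and handles only `==<prefix>.*` specifiers in the PEP 440 style; conda's own matching may differ in details.

```python
def matches_prefix(version: str, spec: str) -> bool:
    """Simplified check of a version against a '==<prefix>.*' specifier."""
    if not (spec.startswith("==") and spec.endswith(".*")):
        raise ValueError("this sketch only handles '==<prefix>.*' specifiers")
    prefix = spec[2:-2].split(".")
    parts = version.split(".")
    # Pad short versions with zeros so that "14" is compared like "14.0.0".
    parts += ["0"] * (len(prefix) - len(parts))
    return parts[: len(prefix)] == prefix


# Build-time pin: only 14.0.0 builds are accepted.
build_pin = "==14.0.0.*"
# Run-time pin: any 14.x.y release is accepted.
run_pin = "==14.*"

for candidate in ["13.0.0", "14.0.0", "14.0.1", "14.1.0"]:
    print(candidate, matches_prefix(candidate, build_pin), matches_prefix(candidate, run_pin))
```

Under these assumptions, an environment built against Arrow 14.0.0 could later run with 14.0.1 or 14.1.0, but not with 13.x.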
2 changes: 1 addition & 1 deletion python/cudf/cudf/tests/test_stats.py
@@ -272,7 +272,7 @@ def test_kurt_skew_error(op):
gs = cudf.Series(["ab", "cd"])
ps = gs.to_pandas()

with pytest.raises(FutureWarning):
with pytest.warns(FutureWarning):
assert_exceptions_equal(
getattr(gs, op),
getattr(ps, op),
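
The switch from `pytest.raises` to `pytest.warns` matters because a warning is not an exception unless a warning filter promotes it to one. The sketch below illustrates that distinction with the standard `warnings` module rather than pytest; `deprecated_op` is a hypothetical stand-in for the operation under test, not a cuDF function.

```python
import warnings


def deprecated_op():
    # Hypothetical operation that emits a FutureWarning but still returns.
    warnings.warn("this operation is deprecated", FutureWarning)
    return 42


# With an "always" filter the warning is recorded and execution continues.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = deprecated_op()
recorded_categories = [w.category for w in caught]

# With an "error" filter the same warning is raised as an exception.
with warnings.catch_warnings():
    warnings.simplefilter("error")
    try:
        deprecated_op()
        raised = False
    except FutureWarning:
        raised = True

print(result, recorded_categories, raised)
```

Which of the two behaviours a test sees depends on the warning filters active in the test run, so the assertion has to match that configuration.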
4 changes: 2 additions & 2 deletions python/cudf/pyproject.toml
@@ -8,7 +8,7 @@ requires = [
"ninja",
"numpy>=1.21,<1.25",
"protoc-wheel",
"pyarrow==13.0.0.*",
"pyarrow==14.0.0.*",
"rmm==23.12.*",
"scikit-build>=0.13.1",
"setuptools",
@@ -38,7 +38,7 @@ dependencies = [
"pandas>=1.3,<1.6.0dev0",
"protobuf>=4.21,<5",
"ptxcompiler",
"pyarrow==13.*",
"pyarrow==14.*",
"rmm==23.12.*",
"typing_extensions>=4.0.0",
] # This list was generated by `rapids-dependency-file-generator`. To make changes, edit ../../dependencies.yaml and run `rapids-dependency-file-generator`.
2 changes: 1 addition & 1 deletion python/cudf_kafka/pyproject.toml
@@ -5,7 +5,7 @@
requires = [
"cython>=3.0.0",
"numpy>=1.21,<1.25",
"pyarrow==13.0.0.*",
"pyarrow==14.0.0.*",
"setuptools",
"wheel",
] # This list was generated by `rapids-dependency-file-generator`. To make changes, edit ../../dependencies.yaml and run `rapids-dependency-file-generator`.