Merge branch 'branch-23.12' into delta_encode_python

rapidsai · Nov 8, 2023 · 6dabbcb
2 parents 1e0cc58 + d3dcc75

Showing 10 changed files with 55 additions and 50 deletions.
diff --git a/README.md b/README.md
@@ -1,57 +1,62 @@
 # <div align="left"><img src="img/rapids_logo.png" width="90px"/>&nbsp;cuDF - GPU DataFrames</div>
 
-**NOTE:** For the latest stable [README.md](https://github.com/rapidsai/cudf/blob/main/README.md) ensure you are on the `main` branch.
+## 📢 cuDF can now be used as a no-code-change accelerator for pandas! To learn more, see [here](https://rapids.ai/cudf-pandas/)!
 
-## Resources
-
-- [cuDF Reference Documentation](https://docs.rapids.ai/api/cudf/stable/): Python API reference, tutorials, and topic guides.
-- [libcudf Reference Documentation](https://docs.rapids.ai/api/libcudf/stable/): C/C++ CUDA library API reference.
-- [Getting Started](https://rapids.ai/start.html): Instructions for installing cuDF.
-- [RAPIDS Community](https://rapids.ai/community.html): Get help, contribute, and collaborate.
-- [GitHub repository](https://github.com/rapidsai/cudf): Download the cuDF source code.
-- [Issue tracker](https://github.com/rapidsai/cudf/issues): Report issues or request features.
-
-## Overview
-
-Built based on the [Apache Arrow](http://arrow.apache.org/) columnar memory format, cuDF is a GPU DataFrame library for loading, joining, aggregating, filtering, and otherwise manipulating data.
+cuDF is a GPU DataFrame library for loading, joining, aggregating,
+filtering, and otherwise manipulating data. cuDF leverages
+[libcudf](https://docs.rapids.ai/api/libcudf/stable/), a
+blazing-fast C++/CUDA dataframe library and the [Apache
+Arrow](https://arrow.apache.org/) columnar format to provide a
+GPU-accelerated pandas API.
 
-cuDF provides a pandas-like API that will be familiar to data engineers & data scientists, so they can use it to easily accelerate their workflows without going into the details of CUDA programming.
+You can import `cudf` directly and use it like `pandas`:
 
-For example, the following snippet downloads a CSV, then uses the GPU to parse it into rows and columns and run calculations:
 ```python
-import cudf, requests
+import cudf
+import requests
 from io import StringIO
 
 url = "https://github.com/plotly/datasets/raw/master/tips.csv"
-content = requests.get(url).content.decode('utf-8')
+content = requests.get(url).content.decode("utf-8")
 
 tips_df = cudf.read_csv(StringIO(content))
-tips_df['tip_percentage'] = tips_df['tip'] / tips_df['total_bill'] * 100
+tips_df["tip_percentage"] = tips_df["tip"] / tips_df["total_bill"] * 100
 
 # display average tip by dining party size
-print(tips_df.groupby('size').tip_percentage.mean())
+print(tips_df.groupby("size").tip_percentage.mean())
 ```
 
-Output:
-```
-size
-1    21.729201548727808
-2    16.571919173482897
-3    15.215685473711837
-4    14.594900639351332
-5    14.149548965142023
-6    15.622920072028379
-Name: tip_percentage, dtype: float64
-```
+Or, you can use cuDF as a no-code-change accelerator for pandas, using
+[`cudf.pandas`](https://docs.rapids.ai/api/cudf/stable/cudf_pandas).
+`cudf.pandas` supports 100% of the pandas API, utilizing cuDF for
+supported operations and falling back to pandas when needed:
 
-For additional examples, browse our complete [API documentation](https://docs.rapids.ai/api/cudf/stable/), or check out our more detailed [notebooks](https://github.com/rapidsai/notebooks-contrib).
+```python
+%load_ext cudf.pandas  # pandas operations now use the GPU!
 
-## Quick Start
+import pandas as pd
+import requests
+from io import StringIO
 
-Please see the [Demo Docker Repository](https://hub.docker.com/r/rapidsai/rapidsai/), choosing a tag based on the NVIDIA CUDA version you're running. This provides a ready to run Docker container with example notebooks and data, showcasing how you can utilize cuDF.
+url = "https://github.com/plotly/datasets/raw/master/tips.csv"
+content = requests.get(url).content.decode("utf-8")
 
-## Installation
+tips_df = pd.read_csv(StringIO(content))
+tips_df["tip_percentage"] = tips_df["tip"] / tips_df["total_bill"] * 100
 
+# display average tip by dining party size
+print(tips_df.groupby("size").tip_percentage.mean())
+```
+
+## Resources
+
+- [Try cudf.pandas now](https://nvda.ws/rapids-cudf): Explore `cudf.pandas` on a free GPU-enabled instance on Google Colab!
+- [Install](https://rapids.ai/start.html): Instructions for installing cuDF and other [RAPIDS](https://rapids.ai) libraries.
+- [cudf (Python) documentation](https://docs.rapids.ai/api/cudf/stable/)
+- [libcudf (C++/CUDA) documentation](https://docs.rapids.ai/api/libcudf/stable/)
+- [RAPIDS Community](https://rapids.ai/community.html): Get help, contribute, and collaborate.
+
+## Installation
 
 ### CUDA/GPU requirements
 

diff --git a/conda/environments/all_cuda-118_arch-x86_64.yaml b/conda/environments/all_cuda-118_arch-x86_64.yaml
@@ -40,7 +40,7 @@ dependencies:
 - hypothesis
 - identify>=2.5.20
 - ipython
-- libarrow==13.0.0.*
+- libarrow-all==14.0.0.*
 - libcufile-dev=1.4.0.31
 - libcufile=1.4.0.31
 - libcurand-dev=10.3.0.86
@@ -69,7 +69,7 @@ dependencies:
 - pre-commit
 - protobuf>=4.21,<5
 - ptxcompiler
-- pyarrow==13.0.0.*
+- pyarrow==14.0.0.*
 - pydata-sphinx-theme!=0.14.2
 - pytest
 - pytest-benchmark

diff --git a/conda/environments/all_cuda-120_arch-x86_64.yaml b/conda/environments/all_cuda-120_arch-x86_64.yaml
@@ -42,7 +42,7 @@ dependencies:
 - hypothesis
 - identify>=2.5.20
 - ipython
-- libarrow==13.0.0.*
+- libarrow-all==14.0.0.*
 - libcufile-dev
 - libcurand-dev
 - libkvikio==23.12.*
@@ -67,7 +67,7 @@ dependencies:
 - pip
 - pre-commit
 - protobuf>=4.21,<5
-- pyarrow==13.0.0.*
+- pyarrow==14.0.0.*
 - pydata-sphinx-theme!=0.14.2
 - pytest
 - pytest-benchmark

diff --git a/conda/recipes/cudf/meta.yaml b/conda/recipes/cudf/meta.yaml
@@ -55,13 +55,13 @@ requirements:
     - cuda-version ={{ cuda_version }}
     - sysroot_{{ target_platform }} {{ sysroot_version }}
   host:
-    - protobuf ==4.23.*
+    - protobuf ==4.24.*
     - python
     - cython >=3.0.0
     - scikit-build >=0.13.1
     - setuptools
     - dlpack >=0.5,<0.6.0a0
-    - pyarrow ==13.0.0.*
+    - pyarrow ==14.0.0.*
     - libcudf ={{ version }}
     - rmm ={{ minor_version }}
     {% if cuda_major == "11" %}

diff --git a/conda/recipes/libcudf/conda_build_config.yaml b/conda/recipes/libcudf/conda_build_config.yaml
@@ -23,7 +23,7 @@ gtest_version:
   - ">=1.13.0"
 
 libarrow_version:
-  - "==13.0.0"
+  - "==14.0.0"
 
 dlpack_version:
   - ">=0.5,<0.6.0a0"

diff --git a/cpp/cmake/thirdparty/get_arrow.cmake b/cpp/cmake/thirdparty/get_arrow.cmake
@@ -427,7 +427,7 @@ if(NOT DEFINED CUDF_VERSION_Arrow)
   set(CUDF_VERSION_Arrow
       # This version must be kept in sync with the libarrow version pinned for builds in
       # dependencies.yaml.
-      13.0.0
+      14.0.0
       CACHE STRING "The version of Arrow to find (or build)"
   )
 endif()
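
Note on the hunk above: the CMake comment says this version must stay in sync with the libarrow pin in `dependencies.yaml`. A minimal sketch of how that invariant could be checked follows. The helper names `cmake_arrow_version` and `yaml_arrow_pins` are hypothetical and not part of the repository, and the two snippets are copied from this diff rather than read from disk.

```python
import re

# Excerpt of cpp/cmake/thirdparty/get_arrow.cmake as it appears after this commit.
CMAKE_SNIPPET = """
  set(CUDF_VERSION_Arrow
      # This version must be kept in sync with the libarrow version pinned for builds in
      # dependencies.yaml.
      14.0.0
      CACHE STRING "The version of Arrow to find (or build)"
  )
"""

# Build-time pins from dependencies.yaml as they appear after this commit.
YAML_SNIPPET = """
          - libarrow-all==14.0.0.*
          - pyarrow==14.0.0.*
"""


def cmake_arrow_version(text):
    """Hypothetical helper: pull the version out of set(CUDF_VERSION_Arrow ...)."""
    match = re.search(
        r"set\(\s*CUDF_VERSION_Arrow\s+(?:#[^\n]*\n\s*)*([\d.]+)", text
    )
    return match.group(1) if match else None


def yaml_arrow_pins(text):
    """Hypothetical helper: map each Arrow package to its hard-pinned version."""
    pattern = r"-\s*(libarrow(?:-all)?|pyarrow)==([\d.]+?)\.\*"
    return dict(re.findall(pattern, text))


cmake_version = cmake_arrow_version(CMAKE_SNIPPET)
pins = yaml_arrow_pins(YAML_SNIPPET)

# The pins are in sync when every hard-pinned package matches the CMake version.
in_sync = all(version == cmake_version for version in pins.values())
print(cmake_version, pins, in_sync)
```

A check along these lines would only cover the two files quoted here; the other files touched by this commit are generated from `dependencies.yaml` or pinned separately.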

diff --git a/dependencies.yaml b/dependencies.yaml
@@ -224,7 +224,7 @@ dependencies:
           - &gmock gmock>=1.13.0
           # Hard pin the patch version used during the build. This must be kept
           # in sync with the version pinned in get_arrow.cmake.
-          - libarrow==13.0.0.*
+          - libarrow-all==14.0.0.*
           - librdkafka>=1.9.0,<1.10.0a0
           # Align nvcomp version with rapids-cmake
           - nvcomp==2.6.1
@@ -246,7 +246,7 @@ dependencies:
         packages:
           # Hard pin the patch version used during the build. This must be kept
           # in sync with the version pinned in get_arrow.cmake.
-          - pyarrow==13.0.0.*
+          - pyarrow==14.0.0.*
   build_python:
     common:
       - output_types: [conda, requirements, pyproject]
@@ -264,13 +264,13 @@ dependencies:
       - output_types: conda
         packages:
           # Allow runtime version to float up to minor version
-          - libarrow==13.*
+          - libarrow-all==14.*
   pyarrow_run:
     common:
       - output_types: [conda, requirements, pyproject]
         packages:
           # Allow runtime version to float up to minor version
-          - pyarrow==13.*
+          - pyarrow==14.*
   cudatoolkit:
     specific:
       - output_types: conda
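
Note on the pins above: the build pin `==14.0.0.*` fixes the patch version, while the runtime pin `==14.*` lets the version float within the 14 major series, as the YAML comments describe. The sketch below illustrates the difference with a deliberately simplified prefix match. `matches_prefix_pin` is a hypothetical helper, not a conda or pip API, and it ignores pre-releases, build strings, and other details of the real resolvers.

```python
def matches_prefix_pin(version: str, pin: str) -> bool:
    """Simplified check of a version against an `==X.Y.*`-style pin.

    Assumption: only plain dotted numeric versions are handled.
    """
    if not (pin.startswith("==") and pin.endswith(".*")):
        raise ValueError(f"unsupported pin format: {pin!r}")
    prefix = pin[2:-2].split(".")   # "==14.0.0.*" -> ["14", "0", "0"]
    parts = version.split(".")
    # The version matches when its leading components equal the pinned prefix.
    return parts[: len(prefix)] == prefix


build_pin = "==14.0.0.*"   # hard pin used during the build
runtime_pin = "==14.*"     # looser pin used at runtime

results = {
    ("14.0.0", build_pin): matches_prefix_pin("14.0.0", build_pin),
    ("14.0.1", build_pin): matches_prefix_pin("14.0.1", build_pin),
    ("14.0.1", runtime_pin): matches_prefix_pin("14.0.1", runtime_pin),
    ("15.0.0", runtime_pin): matches_prefix_pin("15.0.0", runtime_pin),
}
for (version, pin), ok in results.items():
    print(f"{version} satisfies {pin}: {ok}")
```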

diff --git a/python/cudf/cudf/tests/test_stats.py b/python/cudf/cudf/tests/test_stats.py
@@ -272,7 +272,7 @@ def test_kurt_skew_error(op):
     gs = cudf.Series(["ab", "cd"])
     ps = gs.to_pandas()
 
-    with pytest.raises(FutureWarning):
+    with pytest.warns(FutureWarning):
         assert_exceptions_equal(
             getattr(gs, op),
             getattr(ps, op),
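
Note on the hunk above: `pytest.raises(FutureWarning)` only passes when the warning propagates as an exception, which presumably depended on a warnings filter that escalates warnings to errors, whereas `pytest.warns(FutureWarning)` checks that the warning was emitted. The sketch below shows the underlying distinction using only the standard `warnings` module. `deprecated_op` is a hypothetical stand-in, not a cuDF function.

```python
import warnings


def deprecated_op():
    # Hypothetical stand-in for an operation that emits a FutureWarning.
    warnings.warn("this behavior will change", FutureWarning)
    return "result"


# By default a warning is reported, not raised: the call returns normally.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = deprecated_op()

warning_categories = [w.category for w in caught]

# Only when a filter escalates the warning to an error does it propagate
# as an exception that a raises-style check could catch.
raised = False
with warnings.catch_warnings():
    warnings.simplefilter("error", FutureWarning)
    try:
        deprecated_op()
    except FutureWarning:
        raised = True

print(value, warning_categories, raised)
```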

diff --git a/python/cudf/pyproject.toml b/python/cudf/pyproject.toml
@@ -8,7 +8,7 @@ requires = [
     "ninja",
     "numpy>=1.21,<1.25",
     "protoc-wheel",
-    "pyarrow==13.0.0.*",
+    "pyarrow==14.0.0.*",
     "rmm==23.12.*",
     "scikit-build>=0.13.1",
     "setuptools",
@@ -38,7 +38,7 @@ dependencies = [
     "pandas>=1.3,<1.6.0dev0",
     "protobuf>=4.21,<5",
     "ptxcompiler",
-    "pyarrow==13.*",
+    "pyarrow==14.*",
     "rmm==23.12.*",
     "typing_extensions>=4.0.0",
 ] # This list was generated by `rapids-dependency-file-generator`. To make changes, edit ../../dependencies.yaml and run `rapids-dependency-file-generator`.

diff --git a/python/cudf_kafka/pyproject.toml b/python/cudf_kafka/pyproject.toml
@@ -5,7 +5,7 @@
 requires = [
     "cython>=3.0.0",
     "numpy>=1.21,<1.25",
-    "pyarrow==13.0.0.*",
+    "pyarrow==14.0.0.*",
     "setuptools",
     "wheel",
 ] # This list was generated by `rapids-dependency-file-generator`. To make changes, edit ../../dependencies.yaml and run `rapids-dependency-file-generator`.