From d3dcc75c8f82d3ba2bcf1efed493ebbf02b2e6a1 Mon Sep 17 00:00:00 2001
From: Ashwin Srinath <3190405+shwina@users.noreply.github.com>
Date: Wed, 8 Nov 2023 12:15:57 -0500
Subject: [PATCH] Update README (#14374)

Authors:
  - Ashwin Srinath (https://github.com/shwina)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - GALI PREM SAGAR (https://github.com/galipremsagar)

URL: https://github.com/rapidsai/cudf/pull/14374
---
 README.md | 73 +++++++++++++++++++++++++++++--------------------------
 1 file changed, 39 insertions(+), 34 deletions(-)

diff --git a/README.md b/README.md
index 5f2ce014dba..677cfc89d52 100644
--- a/README.md
+++ b/README.md
@@ -1,57 +1,62 @@
 # cuDF - GPU DataFrames
 
-**NOTE:** For the latest stable [README.md](https://github.com/rapidsai/cudf/blob/main/README.md) ensure you are on the `main` branch.
+## 📢 cuDF can now be used as a no-code-change accelerator for pandas! To learn more, see [here](https://rapids.ai/cudf-pandas/)!
 
-## Resources
-
-- [cuDF Reference Documentation](https://docs.rapids.ai/api/cudf/stable/): Python API reference, tutorials, and topic guides.
-- [libcudf Reference Documentation](https://docs.rapids.ai/api/libcudf/stable/): C/C++ CUDA library API reference.
-- [Getting Started](https://rapids.ai/start.html): Instructions for installing cuDF.
-- [RAPIDS Community](https://rapids.ai/community.html): Get help, contribute, and collaborate.
-- [GitHub repository](https://github.com/rapidsai/cudf): Download the cuDF source code.
-- [Issue tracker](https://github.com/rapidsai/cudf/issues): Report issues or request features.
-
-## Overview
-
-Built based on the [Apache Arrow](http://arrow.apache.org/) columnar memory format, cuDF is a GPU DataFrame library for loading, joining, aggregating, filtering, and otherwise manipulating data.
+cuDF is a GPU DataFrame library for loading joining, aggregating,
+filtering, and otherwise manipulating data. cuDF leverages
+[libcudf](https://docs.rapids.ai/api/libcudf/stable/), a
+blazing-fast C++/CUDA dataframe library and the [Apache
+Arrow](https://arrow.apache.org/) columnar format to provide a
+GPU-accelerated pandas API.
 
-cuDF provides a pandas-like API that will be familiar to data engineers & data scientists, so they can use it to easily accelerate their workflows without going into the details of CUDA programming.
+You can import `cudf` directly and use it like `pandas`:
 
-For example, the following snippet downloads a CSV, then uses the GPU to parse it into rows and columns and run calculations:
 ```python
-import cudf, requests
+import cudf
+import requests
 from io import StringIO
 
 url = "https://github.com/plotly/datasets/raw/master/tips.csv"
-content = requests.get(url).content.decode('utf-8')
+content = requests.get(url).content.decode("utf-8")
 
 tips_df = cudf.read_csv(StringIO(content))
-tips_df['tip_percentage'] = tips_df['tip'] / tips_df['total_bill'] * 100
+tips_df["tip_percentage"] = tips_df["tip"] / tips_df["total_bill"] * 100
 
 # display average tip by dining party size
-print(tips_df.groupby('size').tip_percentage.mean())
+print(tips_df.groupby("size").tip_percentage.mean())
 ```
 
-Output:
-```
-size
-1    21.729201548727808
-2    16.571919173482897
-3    15.215685473711837
-4    14.594900639351332
-5    14.149548965142023
-6    15.622920072028379
-Name: tip_percentage, dtype: float64
-```
+Or, you can use cuDF as a no-code-change accelerator for pandas, using
+[`cudf.pandas`](https://docs.rapids.ai/api/cudf/stable/cudf_pandas).
+`cudf.pandas` supports 100% of the pandas API, utilizing cuDF for
+supported operations and falling back to pandas when needed:
 
-For additional examples, browse our complete [API documentation](https://docs.rapids.ai/api/cudf/stable/), or check out our more detailed [notebooks](https://github.com/rapidsai/notebooks-contrib).
+```python
+%load_ext cudf.pandas  # pandas operations now use the GPU!
 
-## Quick Start
+import pandas as pd
+import requests
+from io import StringIO
 
-Please see the [Demo Docker Repository](https://hub.docker.com/r/rapidsai/rapidsai/), choosing a tag based on the NVIDIA CUDA version you're running. This provides a ready to run Docker container with example notebooks and data, showcasing how you can utilize cuDF.
+url = "https://github.com/plotly/datasets/raw/master/tips.csv"
+content = requests.get(url).content.decode("utf-8")
 
-## Installation
+tips_df = pd.read_csv(StringIO(content))
+tips_df["tip_percentage"] = tips_df["tip"] / tips_df["total_bill"] * 100
 
+# display average tip by dining party size
+print(tips_df.groupby("size").tip_percentage.mean())
+```
+
+## Resources
+
+- [Try cudf.pandas now](https://nvda.ws/rapids-cudf): Explore `cudf.pandas` on a free GPU enabled instance on Google Colab!
+- [Install](https://rapids.ai/start.html): Instructions for installing cuDF and other [RAPIDS](https://rapids.ai) libraries.
+- [cudf (Python) documentation](https://docs.rapids.ai/api/cudf/stable/)
+- [libcudf (C++/CUDA) documentation](https://docs.rapids.ai/api/libcudf/stable/)
+- [RAPIDS Community](https://rapids.ai/community.html): Get help, contribute, and collaborate.
+
+## Installation
 
 ### CUDA/GPU requirements
 