diff --git a/CHANGELOG.md b/CHANGELOG.md index 3cdd0359b9..5206cd95cd 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,6 +7,10 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm ## [Development] +### Docs + +* New tutorial on GPU memory management and capacity planning in the GPU section + ## [0.34.17 - 2024-10-20] ### Added diff --git a/README.md b/README.md index d5dd94b921..bdb5422b4c 100644 --- a/README.md +++ b/README.md @@ -110,7 +110,7 @@ df = pd.DataFrame({ g1 = graphistry.edges(df, 'src', 'dst') # Override styling defaults -g1_styled = g1.encode_edge_color('friendship', as_continuous=True, ['blue', 'red']) +g1_styled = g1.encode_edge_color('friendship', ['blue', 'red'], as_continuous=True) # Connect: Free GPU accounts and self-hosting @ graphistry.com/get-started graphistry.register(api=3, username='your_username', password='your_password') diff --git a/demos/gfql/GPU_memory_consumption_tutorial.ipynb b/demos/gfql/GPU_memory_consumption_tutorial.ipynb new file mode 100644 index 0000000000..c79fe460f6 --- /dev/null +++ b/demos/gfql/GPU_memory_consumption_tutorial.ipynb @@ -0,0 +1,908 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "AnX7LqWMBnpu" + }, + "source": [ + "# How much GPU RAM do you need and how much data fits into a GPU task?\n", + "\n", + "## GPU memory size planning & data ratios for Parquet, Arrow, RAPIDS/cuDF, and Graphistry/GFQL\n", + "\n", + "Put too much data into a GPU or use a GPU without enough memory and things fall apart. 
Whatever GPU you pick, you may then want to partition your data to make sure it fits, but make partitions too small and now you risk only getting a fraction of the available GPU speedups.\n", + "\n", + "Achieving high performance with your GPUs often starts with navigating these questions.\n", + "\n", + "It is surprisingly simple in practice to stay within your GPU memory budget once you understand some common data ratios that occur at basic data pipeline phases.\n", + "\n", + "Using a representative activity logs dataset, we will work through a typical GPU ETL & analytics pipeline that starts all the way from disk:\n", + "\n", + "* Parquet (disk, compressed): 0.1-0.5X\n", + "* Arrow (CPU, in-memory): 0.2-1X\n", + "* **Pandas (CPU, in-memory): 1X <-- baseline**\n", + "* cuDF (GPU, in-memory): 0.2-1X\n", + "* **GPU compute operations (GPU): 0.2-1X <-- includes cuDF tabular queries and GFQL graph queries**\n", + "* Overall Peak Usage: 1-2X\n", + "* Variants: **Multi-GPU**, **multi-node**, and **AI+ML**\n", + "\n", + "Even before we begin, note that the above ratios already show GPU libraries typically consume a small fraction of the memory required by popular CPU-based libraries like Pandas: They're built with better performance in mind in general, not just because of GPU processing." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "LUbV0yN1Bs8C" + }, + "source": [ + "# Phase 1: Setup and Data Creation\n", + "\n", + "(Skip ahead to **The data** if you're just skimming)\n", + "\n", + "## Installs & imports\n", + "\n", + "Pandas (CPU), RAPIDS cuDF (GPU), PyGraphistry" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": { + "id": "Y7BonUbuGg0i" + }, + "outputs": [], + "source": [ + "! 
pip install -q graphistry" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "J9MZnkKPayUa" + }, + "outputs": [], + "source": [ + "# For freely testing on colab.research.google.com:\n", + "\n", + "# RAPIDS for Google Colab\n", + "# This gets the RAPIDS-Colab install files and checks your GPU. Run this and the next cell only.\n", + "# Please read the output of this cell. If your Colab Instance is not RAPIDS compatible, it will warn you and give you remediation steps.\n", + "! git clone -q https://github.com/rapidsai/rapidsai-csp-utils.git\n", + "! python rapidsai-csp-utils/colab/pip-install.py > /dev/null 2>&1\n" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 35 + }, + "id": "4nLrk46BllED", + "outputId": "9fbb67c3-e95d-43ab-9bfd-369d4cfd07e0" + }, + "outputs": [ + { + "data": { + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + }, + "text/plain": [ + "'24.10.01'" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import cudf\n", + "cudf.__version__" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": { + "id": "8JruAVXM9Z8N" + }, + "outputs": [], + "source": [ + "# Initialize RMM with a managed memory pool; this will automatically apply to cuDF allocations.\n", + "import cudf\n", + "import rmm\n", + "import rmm.statistics\n", + "rmm.reinitialize(pool_allocator=True, managed_memory=True)\n", + "rmm.statistics.enable_statistics()\n", + "\n", + "# Initialize NVML for direct GPU memory measurement\n", + "import pynvml\n", + "pynvml.nvmlInit()\n", + "handle = pynvml.nvmlDeviceGetHandleByIndex(0)" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": { + "id": "ayDVAset9aRP" + }, + "outputs": [], + "source": [ + "import pandas as pd\n", + "import numpy as np\n", + "import pyarrow as pa\n", + "import 
pyarrow.parquet as pq\n", + "import cudf\n", + "import graphistry\n", + "import matplotlib.pyplot as plt\n", + "import os\n", + "from graphistry import e, n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "uZBB6MPtB8nE" + }, + "source": [ + "### The data\n", + "One million simulated, timestamped network connection events, where each (src_ip, dst_ip) pair represents a graph edge\n" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 241 + }, + "id": "AjIecASF9fuh", + "outputId": "f2b2d1e6-4c68-4ec1-f944-306038cfa49a" + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + ":3: FutureWarning: 'S' is deprecated and will be removed in a future version, please use 's' instead.\n", + " \"timestamp\": pd.date_range(start=\"2023-01-01\", periods=rows, freq=\"S\"),\n" + ] + }, + { + "data": { + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "dataframe", + "variable_name": "df" + }, + "text/html": [ + "\n", + "
\n" + ], + "text/plain": [ + " timestamp src_ip dst_ip event_type \\\n", + "0 2023-01-01 00:00:00 192.168.1.6 10.0.0.216 disconnect \n", + "1 2023-01-01 00:00:01 192.168.1.247 10.0.0.73 connect \n", + "2 2023-01-01 00:00:02 192.168.1.244 10.0.0.32 connect \n", + "3 2023-01-01 00:00:03 192.168.1.204 10.0.0.207 disconnect \n", + "4 2023-01-01 00:00:04 192.168.1.121 10.0.0.219 connect \n", + "\n", + " bytes_transferred \n", + "0 595 \n", + "1 754 \n", + "2 630 \n", + "3 348 \n", + "4 710 " + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "rows = 1_000_000\n", + "data = {\n", + " \"timestamp\": pd.date_range(start=\"2023-01-01\", periods=rows, freq=\"S\"),\n", + " \"src_ip\": np.random.choice([f\"192.168.1.{i}\" for i in range(1, 256)], rows),\n", + " \"dst_ip\": np.random.choice([f\"10.0.0.{i}\" for i in range(1, 256)], rows),\n", + " \"event_type\": np.random.choice([\"connect\", \"disconnect\", \"data_transfer\"], rows),\n", + " \"bytes_transferred\": np.random.randint(0, 1000, rows),\n", + "}\n", + "df = pd.DataFrame(data)\n", + "df.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "FO2j6eCUB_e2" + }, + "source": [ + "## Phase 2: Measure space usage on-disk with Parquet and CPU in-memory with Pandas\n", + "\n", + "### 4X CPU in-memory compaction with Arrow\n", + "\n", + "The Apache Arrow in-memory computing table format makes analytics fast by **packing data into typed columns** (vs typical row-wise SQL, KV, graph, and log databases). A typical benefit is data also getting smaller\n", + "\n", + "### 20X disk compaction with Parquet\n", + "\n", + "Parquet adds **compression algorithms for each column**, giving another 5X multiple over Arrow\n", + "\n", + "Both Parquet and Arrow have their place. Arrow avoids compression for in-memory use to enable faster in-memory access. Parquet prioritizes compression for better disk storage." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "t9Tb3rlp9p0n", + "outputId": "6fe38e72-07e1-4f66-9516-84e53a649326" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Pandas in-memory size: 209.00 MB\n", + "Arrow in-memory size: 57.36 MB\n", + "Parquet compressed size on disk: 9.57 MB\n" + ] + } + ], + "source": [ + "# Pandas Dataframe size\n", + "pandas_memory = df.memory_usage(index=True, deep=True).sum() / (1024**2)\n", + "\n", + "# Arrow Table size\n", + "arrow_table = pa.Table.from_pandas(df)\n", + "arrow_size = arrow_table.nbytes / (1024**2)\n", + "\n", + "# Parquet compressed size\n", + "pq_file_path = \"compressed_data.parquet\"\n", + "pq.write_table(arrow_table, pq_file_path, compression=\"SNAPPY\")\n", + "parquet_size = os.path.getsize(pq_file_path) / (1024**2)\n", + "\n", + "print(f\"Pandas in-memory size: {pandas_memory:.2f} MB\")\n", + "print(f\"Arrow in-memory size: {arrow_size:.2f} MB\")\n", + "print(f\"Parquet compressed size on disk: {parquet_size:.2f} MB\")\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9O2WNshICBt3" + }, + "source": [ + "## Phase 3: Load data into the GPU with cuDF and PyGraphistry\n", + "\n", + "### 4X GPU compaction with cuDF\n", + "\n", + "`cuDF` is an open source GPU-based dataframe library that matches the Pandas API. Note that cuDF is Arrow-native, so the estimated GPU memory consumption exactly matches the Apache Arrow in-memory size. It maintains the 4X improvement over Pandas even without doing any compute.\n", + "\n", + "### 4X GPU compaction with PyGraphistry\n", + "\n", + "Graph users can automate transferring a graph's tables to the GPU via [g2 = g1.to_cudf()](https://pygraphistry.readthedocs.io/en/latest/api/compute.html#graphistry.compute.ComputeMixin.ComputeMixin.to_cudf), reaping the same benefits over a Pandas-based approach."
+ ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "a2GZAzsP9tLJ", + "outputId": "ed8bf972-9a54-4c99-91e4-8bf01578bc53" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Total gdf size in memory: 57.36 MB\n" + ] + } + ], + "source": [ + "# Convert DataFrame to cuDF for operations\n", + "gdf = cudf.from_pandas(df)\n", + "\n", + "# Calculate the size of the gdf in memory\n", + "gdf_size_bytes = gdf.memory_usage(deep=True).sum()\n", + "gdf_size_mb = gdf_size_bytes / (1024**2) # Convert bytes to MB\n", + "print(f\"Total gdf size in memory: {gdf_size_mb:.2f} MB\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "### Pack in 10X+ more data for real workloads with GPU projections and higher CPU RAM\n", + "\n", + "It is convenient to move the entire dataframe to the GPU when there is a lot of room, so we recommend doing that during prototyping.\n", + "\n", + "However, 10X+ bigger workloads can often be easily handled on the same GPU just by being mindful of which columns to use at the beginning:\n", + "\n", + "```python\n", + " # Only transfer 2 columns from df to the GPU\n", + " df2 = cudf.from_pandas(df[['src_ip', 'dst_ip']])\n", + "```\n", + "\n", + "CPU RAM is often cheaper than GPU RAM, so you may want your CPU to have 1-4X more RAM than your GPUs\n", + "\n", + "### Off-GPU IO Speeds\n", + "\n", + "To handle bigger-than-memory datasets, it helps to keep in mind that data travels through devices of different speeds as it goes from disk to GPU.\n", + "\n", + "It helps to pair your GPU RAM with even more (cheaper) CPU RAM or disk:\n", + "\n", + "* Individual SSDs can do 1-5 GB/s, and arrays of them can do 100+ GB/s\n", + "* Consumer speeds for disk->CPU and CPU->GPU are around 32 GB/s per 1-2 GPUs via PCIe 4.0\n", + "* Server-grade systems are often PCIe 5.0 at 64 GB/s per 1-2 GPUs\n", + "\n", + "For advanced setups, such as for going 
at 100 GB/s on 1-2 GPUs, see our recorded Dask Summit talk on [100GB/s GPU Log Analytics at Graphistry](https://www.youtube.com/watch?v=8ZMzsTbfImU). It reviews broad concepts, architecture, and tricks like skipping the convoluted CPU path via [GPU Direct](https://developer.nvidia.com/gpudirect).\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "x_DJXrTFCELq" + }, + "source": [ + "## Phase 4: GPU Computation - Simple Task and GFQL Traversal\n", + "\n", + "CPU and GPU programs need extra memory on top of the input data structure memory in order to create intermediate data structures. This is often 1-5X the input data size.\n", + "\n", + "### Step A: Simple GPU computation for memory baseline\n", + "\n", + "We see simple `cuDF` dataframe methods like filtering and joining are optimized, so both take < 1X the original input size. Later, however, we will see the overall peak is multiples higher." + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "A-dqwB8fEFGM", + "outputId": "9937d466-7d0b-4a02-b34d-3a1e778f59a0" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Filter & Sum Operation Memory Peak: 31.16 MB\n", + "Join Operation on Subset Memory Peak: 47.92 MB\n" + ] + } + ], + "source": [ + "# Synchronize to ensure a clean memory state before starting\n", + "cudf.cuda.current_context().synchronize()\n", + "\n", + "with rmm.statistics.profiler(name=\"Filter and Sum Operation\"):\n", + " filtered = gdf[gdf[\"event_type\"] == \"data_transfer\"]\n", + " total_bytes = filtered[\"bytes_transferred\"].sum()\n", + "\n", + "subset_gdf = gdf.head(10000) # Smaller subset to avoid large memory requirements\n", + "with rmm.statistics.profiler(name=\"Join Operation on Subset\"):\n", + " joined = subset_gdf.merge(subset_gdf, on=\"src_ip\", how=\"inner\")\n", + "\n", + "\n", + "filter_sum_stats = rmm.statistics.default_profiler_records.records[\"Filter 
and Sum Operation\"]\n", + "filter_sum_peak_mb = filter_sum_stats.memory_peak / (1024**2)\n", + "print(f\"Filter & Sum Operation Memory Peak: {filter_sum_peak_mb: .2f} MB\")\n", + "\n", + "join_stats = rmm.statistics.default_profiler_records.records[\"Join Operation on Subset\"]\n", + "join_peak_mb = join_stats.memory_peak / (1024**2)\n", + "print(f\"Join Operation on Subset Memory Peak: {join_peak_mb: .2f} MB\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "S-fDu2YkCG8L" + }, + "source": [ + "### Step B: GPU Graph analytics with GFQL - 2-hop Traversal\n", + "\n", + "The example below is a 2-hop traversal in PyGraphistry's GFQL in `cuDF` GPU engine mode, including filtering for \"data_transfer\" events and >500 bytes\n", + "\n", + "Graph queries are more like a sequence of database operators, so, unsurprisingly, we see not only the speed benefits of `cuDF` but the memory benefits too. Memory use is essentially the sum over the optimized operators used.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "MbsM-G0r_NI3", + "outputId": "e169e94e-697b-4510-8537-718e1a4e4d51" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "GFQL 2-hop Traversal Memory Peak: 80.58 MB\n" + ] + } + ], + "source": [ + "# Step 4: Profile the GFQL 2-hop Traversal\n", + "g1 = graphistry.edges(gdf, 'src_ip', 'dst_ip') # Example edge specification for Graphistry\n", + "with rmm.statistics.profiler(name=\"GFQL 2-hop Traversal\"):\n", + " g2 = g1.chain([\n", + " n(),\n", + " e(edge_match={'event_type': 'data_transfer'},\n", + " edge_query=\"bytes_transferred > 500\"),\n", + " n()\n", + " ])\n", + "\n", + "gfql_stats = rmm.statistics.default_profiler_records.records[\"GFQL 2-hop Traversal\"]\n", + "gfql_peak_mb = gfql_stats.memory_peak / (1024**2)\n", + "print(f\"GFQL 2-hop Traversal Memory Peak: {gfql_peak_mb: .2f} MB\")" + ] + }, + { + 
"cell_type": "markdown", + "metadata": { + "id": "vyiNm_i0CMR_" + }, + "source": [ + "## Comparison chart\n", + "\n", + "Let's put it all together to see each phase side by side: dataset sizes, and extra intermediate memory\n", + "\n", + "Fascinatingly, the GPU version was able to both store the data and compute on it while taking less memory than Pandas needed just to build the initial data structure, before doing any work on top\n", + "\n", + "In a large-scale production scenario, we would likely aim for another 10X+ by being selective about which columns to put on the GPU and when to retire intermediate structures" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 807 + }, + "id": "CHRekjjdRDnK", + "outputId": "12d4ad7e-c585-4445-f253-190c9785badf" + }, + "outputs": [ + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "import matplotlib.pyplot as plt\n", + "\n", + "# Assumption: overall_peak_mb is not defined in any earlier cell, so take it\n", + "# from RMM's global allocation statistics (enabled during setup)\n", + "overall_peak_mb = rmm.statistics.get_statistics().peak_bytes / (1024**2)\n", + "\n", + "# Labels and sizes for the bar chart\n", + "labels = [\n", + " 'Parquet (disk)',\n", + " 'Arrow (in-mem)',\n", + " 'Pandas (in-mem)',\n", + " 'cuDF (GPU)',\n", + " '+ Filter & Sum (GPU)',\n", + " '+ Join (GPU)',\n", + " '+ GFQL 2-hop (GPU)',\n", + " 'Overall Peak (GPU)'\n", + "]\n", + "sizes = [\n", + " parquet_size,\n", + " arrow_size,\n", + " pandas_memory,\n", + " gdf_size_mb,\n", + " filter_sum_peak_mb,\n", + " join_peak_mb,\n", + " gfql_peak_mb,\n", + " overall_peak_mb\n", + "]\n", + "\n", + "colors = ['#1f77b4', '#6baed6', '#9ecae1', '#2ca02c', '#ff7f0e', '#9467bd', '#8c564b', '#000000']\n", + "\n", + "plt.figure(figsize=(14, 8))\n", + "bars = plt.bar(labels, sizes, color=colors, edgecolor='black')\n", + "\n", + "# Add labels and title with a modern font size\n", + "plt.ylabel('Memory Usage (MB)', fontsize=14)\n", + "plt.title('Memory Usage Comparison', fontsize=16, fontweight='bold')\n", + "plt.xticks(rotation=45, ha=\"right\", fontsize=12) # Rotate and size labels for readability\n", + "plt.yticks(fontsize=12) # Increase y-axis label font size for consistency\n", + "plt.tight_layout()\n", + "\n", + "# Add value labels on top of each bar with a cleaner font style\n", + "for bar, size in zip(bars, sizes):\n", + " plt.text(\n", + " bar.get_x() + bar.get_width() / 2,\n", + " bar.get_height(),\n", + " f'{size:.2f} MB',\n", + " ha='center',\n", + " va='bottom',\n", + " fontsize=11,\n", + " fontweight='medium',\n", + " color='darkblue' # Softer label color for contrast\n", + " )\n", + "\n", + "# Display the plot\n", + "plt.show()\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "NOIMAt0XMiWo" + }, + "source": [ + "## Takeaways, multi-GPU/multi-node, and ML+AI\n", + "\n", + "The chart reveals several key takeaways:\n", + "\n", + "### Parquet is a great on-disk format\n", + "\n", + "It is great at 
compressing big tables, taking a fraction of the space of the in-memory dataset representations\n", + "\n", + "### RAPIDS (cuDF) is a great in-memory format\n", + "\n", + "Its initial GPU memory allocation size matches the Apache Arrow CPU in-memory size\n", + "\n", + "### Memory consumption is a multiple over the input data size\n", + "\n", + "Computing takes extra space. 1X-5X additional GPU RAM was needed to compute on the dataframe compared to just storing it.\n", + "\n", + "### Single-GPU\n", + "\n", + "We recommend assuming the in-memory size needed is 10X+ the size of the compressed Parquet on disk\n", + "\n", + "### Multi-GPU, bigger-than-memory, & dask-cudf\n", + "\n", + "When manually chunking big datasets, such as for bigger-than-memory compute or spreading data across multiple GPUs, or automatically via Dask, we generally recommend 1GB+ chunks. This is ~10X bigger than CPU Dask tasks because GPUs are more throughput-oriented in general. You can see our Dask Distributed Summit talk on [100GB/s GPU Log Analytics at Graphistry](https://www.youtube.com/watch?v=8ZMzsTbfImU) for more on the methodology\n", + "\n", + "### AI/ML Workloads\n", + "\n", + "Modern data science libraries like PyGraphistry's [g.umap()](https://pygraphistry.readthedocs.io/en/latest/gfql/combo.html#umap-fit-transform-for-scaling) use GPUs and machine learning to scale:\n", + "\n", + "#### Training\n", + "\n", + "Training is often called `fit()`. GPU systems can often let AI/ML training phases handle 10X more data within your time budget. As you generally do not train on all your data, this means a 10X+ bigger sample set for a higher-fidelity and more representative model.\n", + "\n", + "#### Inferencing\n", + "\n", + "Often called `transform()`, inference applies a trained model to the rest of your data. This is more scalable than fitting on your entire dataset, giving a massive speedup. 
With GPUs, this goes faster too, essentially matching your GPU budget.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "a0c5WPXeSBa6" + }, + "source": [ + "\n", + "## Next steps\n", + "\n", + "We're preparing follow-on articles on more performance intuitions in general and deeper on the technologies discussed here, including how to more carefully measure your own workloads\n", + "\n", + "Meanwhile, you may find these useful as well:\n", + "\n", + "* [100GB/s GPU Log Analytics at Graphistry](https://www.youtube.com/watch?v=8ZMzsTbfImU) recorded talk at Dask Distributed Summit\n", + "* [PyGraphistry](https://pygraphistry.readthedocs.io/en/latest/10min.html) GPU-accelerated visual graph analytics\n", + "* [PyGraphistry GPU umap()](https://pygraphistry.readthedocs.io/en/latest/gfql/combo.html#umap-fit-transform-for-scaling) for visual graph AI\n", + "* The open source [GFQL dataframe-native graph query language](https://pygraphistry.readthedocs.io/en/latest/gfql/index.html) with optional GPU mode\n", + "* Try for yourself at [Graphistry Hub](https://www.graphistry.com/get-started)\n", + "\n" + ] + } + ], + "metadata": { + "accelerator": "GPU", + "colab": { + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} diff --git a/docs/source/notebooks/gpu.rst b/docs/source/notebooks/gpu.rst index a57ee7ec2f..52293d5575 100644 --- a/docs/source/notebooks/gpu.rst +++ b/docs/source/notebooks/gpu.rst @@ -10,3 +10,4 @@ GPU GPU II: cuDF <../demos/demos_databases_apis/gpu_rapids/part_ii_gpu_cudf.ipynb> GPU IV: cuML UMAP <../demos/demos_databases_apis/gpu_rapids/part_iv_gpu_cuml.ipynb> GPU V: cuGraph <../demos/demos_databases_apis/gpu_rapids/cugraph.ipynb> + GPU Memory Planning <../demos/gfql/GPU_memory_consumption_tutorial.ipynb>