David Sorber edited this page Aug 3, 2021 · 16 revisions

GNU Radio Accelerator Device Support Project

Description

GNU Radio provides a flexible block-based interface for signal processing tasks. Historically, GNU Radio signal processing blocks have been written in software, but there is an increasing need to offload complex signal processing algorithms to accelerator devices including GPUs, FPGAs, and DSPs. Many accelerated blocks have been created using GNU Radio's block interface, but these blocks require manual handling of data movement to and from the accelerator device. The purpose of this project is to add accelerator device support directly to GNU Radio.

Project Goals

  • Maintain backwards compatibility with all existing blocks (both in-tree and from OOT modules)
  • Create a flexible interface for creating "custom buffers" to support accelerated devices
    • Custom buffer interface provides necessary hooks to allow the scheduler to handle data movement
  • Provide infrastructure to support "insert signal processing here" paradigm for common accelerator devices such as NVidia GPUs
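
The "hooks" idea in the goals above can be illustrated with a minimal, self-contained sketch. It does not use GNU Radio's actual API: `custom_buffer_sketch`, `fake_device_buffer`, `post_write`, and `run_once` are hypothetical names invented for this illustration, and the "device" memory is simulated with an ordinary host vector. The point is only that the buffer object, not the block, owns the data movement, and a scheduler-like caller triggers it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical interface (NOT GNU Radio's real API): a buffer exposes a
// hook that a scheduler can call after the upstream block has produced data.
class custom_buffer_sketch
{
public:
    virtual ~custom_buffer_sketch() = default;
    virtual float* write_pointer() = 0;              // upstream block writes here
    virtual const float* read_pointer() = 0;         // downstream block reads here
    virtual void post_write(std::size_t nitems) = 0; // data movement hook
};

// Hypothetical buffer with a host side and a simulated device side.
// A real implementation would issue a device transfer inside post_write().
class fake_device_buffer : public custom_buffer_sketch
{
public:
    explicit fake_device_buffer(std::size_t nitems) : host_(nitems), device_(nitems) {}

    float* write_pointer() override { return host_.data(); }
    const float* read_pointer() override { return device_.data(); }

    void post_write(std::size_t nitems) override
    {
        assert(nitems <= host_.size());
        // Simulated host-to-device transfer.
        std::memcpy(device_.data(), host_.data(), nitems * sizeof(float));
        ++transfers_;
    }

    std::size_t transfers() const { return transfers_; }

private:
    std::vector<float> host_;
    std::vector<float> device_; // stand-in for device memory
    std::size_t transfers_ = 0;
};

// Toy "scheduler" step: run the producer, then let the buffer move the data.
// The producer never copies anything itself.
template <typename Producer>
void run_once(custom_buffer_sketch& buf, std::size_t nitems, Producer produce)
{
    produce(buf.write_pointer(), nitems); // upstream block's work
    buf.post_write(nitems);               // scheduler-driven data movement
}
```

In this sketch a caller would construct a `fake_device_buffer`, pass it to `run_once` together with a function that fills the write pointer, and then read the result through `read_pointer()`. The producer code contains no copy logic at all, which is the property the custom buffer interface is meant to provide.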

High-level Plan

  • Milestone 1 - completed: May 11, 2021
    • refactor existing code and create single mapped buffer abstraction
    • support single accelerated block (block responsible for data movement)
    • simple custom buffer interface
  • Milestone 2 - completed: August 5, 2021
    • support multiple accelerated blocks with zero-copy between
    • more flexible custom buffer interface (scheduler handles data movement)

Overview and Usage

[Figure: the "double copy" problem]

GNU Radio's block-based interface is very flexible and has allowed users to create their own accelerated blocks for some time. However, this approach has some limitations. In particular, if the accelerator device requires special (DMA) buffers for data movement, then the accelerator block must copy data from the GNU Radio buffer into the device's buffer on the input path and vice versa on the output path. This process is inefficient and is known as the "double copy" problem, as shown in the diagram above. In addition to the double copy inefficiency, accelerated blocks written in this fashion require the writer to manage data movement explicitly. While this is doable, it may be challenging for a novice and off-putting for a user who wishes to concentrate on implementing a signal processing algorithm. The new accelerated block interface changes address both of these issues while (very importantly) maintaining backwards compatibility for all existing GNU Radio blocks.
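
To make the double copy pattern concrete, here is a minimal sketch of what a block's work function ends up doing when the scheduler only hands it ordinary host buffers. Everything in it is an assumption made for illustration: `device_buffer`, `scale_on_device`, `copy_stats`, and `work_double_copy` are hypothetical names, the "device" is simulated on the host, and the signature is simplified compared with a real GNU Radio `work()` function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a device-visible (e.g. DMA) buffer.
struct device_buffer {
    std::vector<float> data;
    explicit device_buffer(std::size_t nitems) : data(nitems) {}
};

// Hypothetical "kernel": operates in place on the device buffer.
inline void scale_on_device(device_buffer& buf, std::size_t nitems, float gain)
{
    for (std::size_t i = 0; i < nitems; ++i)
        buf.data[i] *= gain;
}

// Tracks how many bytes the block itself had to copy.
struct copy_stats {
    std::size_t bytes_copied = 0;
};

// Double copy pattern: the block stages data through the device buffer
// on the way in and again on the way out.
inline void work_double_copy(const float* in,
                             float* out,
                             std::size_t nitems,
                             device_buffer& staging,
                             copy_stats& stats)
{
    assert(nitems <= staging.data.size());
    const std::size_t nbytes = nitems * sizeof(float);

    // Copy 1: GNU Radio input buffer -> device buffer.
    std::memcpy(staging.data.data(), in, nbytes);
    stats.bytes_copied += nbytes;

    scale_on_device(staging, nitems, 2.0f);

    // Copy 2: device buffer -> GNU Radio output buffer.
    std::memcpy(out, staging.data.data(), nbytes);
    stats.bytes_copied += nbytes;
}
```

Each call moves every sample twice in addition to the processing itself, and the copy logic sits inside the block where the block author has to write and maintain it.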

Supporting Code

The accelerated block interface changes currently live in this repository; however, the intention is to upstream these changes into GNU Radio. The following repositories contain supporting code that is also intended to be upstreamed to the project, but not directly into the main GNU Radio repository itself (NOTE: both of the repositories below require the accelerated block interface changes, also called "ngsched"):

  • gr-cuda_buffer - This repository contains an OOT module containing the cuda_buffer class which is a "custom buffer" supporting the CUDA runtime for NVidia GPUs. This module is intended to be a base implementation for CUDA blocks and can be used directly when writing CUDA accelerated blocks for NVidia GPUs.
  • gr-blnxngsched - This repository contains an OOT module containing various examples of the accelerated block interface (aka "ngsched") changes. These blocks are described in additional detail in the "Examples" section below. Note that the CUDA-related blocks in this OOT require cuda_buffer from gr-cuda_buffer.

Examples

  • custom_buffer -
  • custom_buf_loopback -
  • cuda_fanout -
  • cuda_loopback -
  • cuda_mult -
  • mixed_2_port_loopback -

How to Use a Custom Buffer

The following instructions illustrate how to write a block using a "custom buffer". The instructions use cuda_buffer from gr-cuda_buffer for example purposes, but the same general concepts can be applied to any custom buffer.

Detailed Design

TODO

Single Mapped Buffer Abstraction

Buffer Type

Replace Upstream

Callback Functions

Custom Lock Interface

host_buffer Class
