Add handling of parameter references in Sphinx documentation (#5707)
Utilize the sphinx_paramlinks plugin, which adds:
* a link target for every parameter
* a new :paramref: role

Add a hook that automatically injects :paramref: before every
single-backticked parameter reference.

The references are validated against the function signature. If the
signature is unavailable, the whole step is skipped.

Numpydocify the docstring syntax by removing backticks around
params in Python docs and in the numpydoc documentation
generator for operators.

Signed-off-by: Krzysztof Lecki <[email protected]>
klecki authored Nov 19, 2024
1 parent 7fd3876 commit b543839
Showing 15 changed files with 237 additions and 169 deletions.
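The hook described above is only sketched in prose, so the following minimal illustration shows how such an injection step can be wired into Sphinx. This is not the code added in this commit: the handler name, the regular expression, and the exact wiring are assumptions; only the sphinx_paramlinks extension, the ``autodoc-process-docstring`` event, and the validate-against-the-signature-or-skip behavior come from the commit message.

```python
import inspect
import re

# Match `name` (single backticks) but not ``name`` (double backticks).
PARAM_RE = re.compile(r"(?<!`)`([A-Za-z_][A-Za-z0-9_]*)`(?!`)")


def inject_paramrefs(app, what, name, obj, options, lines):
    """autodoc-process-docstring handler; mutates ``lines`` in place."""
    try:
        params = set(inspect.signature(obj).parameters)
    except (TypeError, ValueError):
        return  # signature unavailable -> skip the whole step

    def repl(match):
        param = match.group(1)
        if param in params:
            return f":paramref:`{param}`"
        return match.group(0)  # not a parameter of this object; leave untouched

    lines[:] = [PARAM_RE.sub(repl, line) for line in lines]


def setup(app):
    app.setup_extension("sphinx_paramlinks")  # provides the :paramref: role
    app.connect("autodoc-process-docstring", inject_paramrefs)
    return {"parallel_read_safe": True}
```

With a handler like this, a docstring line such as ``If `cycle` is set to "raise" ...`` is emitted as ``If :paramref:`cycle` is set to "raise" ...``, while backticked names that do not match the signature stay as plain code literals.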
8 changes: 4 additions & 4 deletions dali/pipeline/operator/op_schema.h
@@ -140,7 +140,7 @@ class DLL_PUBLIC OpSchema {
* only the first `min` inputs are considered mandatory, the rest are optional
*
* Will generate entry in `Args` section using numpydoc style:
- * `name`: type_doc
+ * name : type_doc
* doc
*/
DLL_PUBLIC OpSchema &InputDox(int index, const string &name, const string &type_doc,
@@ -158,11 +158,11 @@
* """
* Args
* ----
- * `input0`: Type of input
+ * input0 : Type of input
* This is the first input
- * `input1`: TensorList of some kind
+ * input1 : TensorList of some kind
* This is second input
- * `optional_input`: TensorList, optional
+ * optional_input : TensorList, optional
* This is optional input
*
* If the `append_kwargs_section` is true, the docstring generator will append the Keyword args
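To make the schema comment above concrete, here is what a docstring generated in the new style looks like on the Python side: parameter names without backticks, in plain numpydoc form that the :paramref: hook can link. The operator and its inputs are hypothetical, mirroring the example entries above.

```python
def hypothetical_operator(input0, input1, optional_input=None):
    """
    Args
    ----
    input0 : Type of input
        This is the first input
    input1 : TensorList of some kind
        This is second input
    optional_input : TensorList, optional
        This is optional input
    """
```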
46 changes: 23 additions & 23 deletions dali/python/nvidia/dali/_multiproc/messages.py
@@ -1,4 +1,4 @@
- # Copyright (c) 2020-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ # Copyright (c) 2020-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -24,17 +24,17 @@ class ShmMessageDesc(Structure):
It describes placement (shared memory chunk, offset etc.) of actual data to be read
by the receiver of the `ShmMessageDesc` instance.
----------
- `worker_id` : int
+ worker_id : int
Integer identifying a process that put the message, number from [0, num_workers) range
for workers or -1 in case of a main process.
- `shm_chunk_id` : int
+ shm_chunk_id : int
Integer identifying shm chunk that contains pickled data to be read by the receiver
- `shm_capacity` : unsigned long long int
+ shm_capacity : unsigned long long int
Size of the `shm_chunk_id` chunk, receiver should resize the mapping if the chunk
was resized by the writer.
- `offset` : unsigned long long int
+ offset : unsigned long long int
Offset in the shm chunk where the serialized message starts
- `num_bytes` : unsigned long long int
+ num_bytes : unsigned long long int
Size in bytes of the serialized message
"""

@@ -51,23 +51,23 @@ class WorkerArgs:
"""
Pack of parameters passed to the worker process on initialization.
----------
- `worker_id` : Ordinal of the worker in the workers pool
- `start_method` : Python's multiprocessing start method - `spawn` or `fork`
- `source_descs` : Dictionary with External Source's SourceDescription instances as values.
+ worker_id : Ordinal of the worker in the workers pool
+ start_method : Python's multiprocessing start method - `spawn` or `fork`
+ source_descs : Dictionary with External Source's SourceDescription instances as values.
Keys are ordinals corresponding to the order in which callbacks were passed to the pool.
If `callback_pickler` is not None, actual callback in SourceDescription is replaced
with result of its serialization.
- `shm_chunks` : list of BufShmChunk instances that describes all the shared memory chunks
+ shm_chunks : list of BufShmChunk instances that describes all the shared memory chunks
available to the worker (they are identified by ids unique inside the pool).
- `general_task_queue` : Optional[ShmQueue]
+ general_task_queue : Optional[ShmQueue]
Queue with tasks for sources without dedicated worker
or None if all sources have dedicated worker
- `dedicated_task_queue`: Optional[ShmQueue]
+ dedicated_task_queue : Optional[ShmQueue]
Queue with tasks for sources that are run solely in the given worker.
If `dedicated_task_queue` is None, `general_task_queue` must be provided.
- `result_queue`: ShmQueue
+ result_queue : ShmQueue
Queue to report any task done, no matter if dedicated or general.
- `setup_socket` : Optional[socket]
+ setup_socket : Optional[socket]
Python wrapper around Unix socket used to pass file descriptors identifying
shared memory chunk to child process. None if `start_method='fork'`
`callback_pickler`
@@ -189,15 +189,15 @@ class ScheduledTask:
Parameters
----------
- `context_i` : int
+ context_i : int
Index identifying the callback in the order of parallel callbacks passed to pool.
- `scheduled_i` : int
+ scheduled_i : int
Ordinal of the batch that tasks list corresponds to.
- `epoch_start` : int
+ epoch_start : int
The value is increased every time the corresponding context is reset,
this way worker can know if the new epoch started, and if it can restart
iterator that raised StopIteration but is set to cycle=raise.
- `task` : TaskArgs
+ task : TaskArgs
Describes the minibatch that should be computed by the worker. If the given source
is run in batch mode this simply wraps parameters that external source would pass to
the source in non-parallel mode. In sample mode, it is (part of) the list
@@ -217,16 +217,16 @@ class CompletedTask:
Parameters
----------
- `worker_id` : int
+ worker_id : int
Id of the worker that completed the task.
- `context_i` : int
+ context_i : int
Index identifying the callback in the order of parallel callbacks passed to pool.
- `scheduled_i` : int
+ scheduled_i : int
Ordinal of the batch that tasks corresponds to.
- `minibatch_i` : int
+ minibatch_i : int
Computation of batch might be split into number of minibatches, this is the number
that identifies which consecutive part of the batch it is.
- `batch_meta` : nvidia.dali._multiproc.shared_batch.SharedBatchMeta
+ batch_meta : nvidia.dali._multiproc.shared_batch.SharedBatchMeta
Serialized result of the task.
`exception`
Exception if the task failed.
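The `ShmMessageDesc` docstring above pins down the field names and C types, so a plausible layout can be reconstructed. The sketch below is an illustration only; the field order and exact ctypes choices are assumptions inferred from the documented `int` / `unsigned long long int` annotations, not copied from the real class.

```python
from ctypes import Structure, c_int, c_ulonglong


class ShmMessageDescSketch(Structure):
    """Illustrative layout of the documented fields; not the real class."""
    _fields_ = [
        ("worker_id", c_int),           # [0, num_workers) for workers, -1 for main process
        ("shm_chunk_id", c_int),        # which shm chunk holds the pickled payload
        ("shm_capacity", c_ulonglong),  # chunk size; receiver remaps if the writer resized it
        ("offset", c_ulonglong),        # where the serialized message starts in the chunk
        ("num_bytes", c_ulonglong),     # length of the serialized message
    ]
```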
32 changes: 16 additions & 16 deletions dali/python/nvidia/dali/_multiproc/pool.py
@@ -1,4 +1,4 @@
- # Copyright (c) 2020-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ # Copyright (c) 2020-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -539,13 +539,13 @@ class Observer:
Closes the whole pool of worker processes if any of the processes exits. The processes can also
be closed from the main process by calling observer `close` method.
----------
- `mp` : Python's multiprocessing context (depending on start method used: `spawn` or `fork`)
- `processes` : List of multiprocessing Process instances
- `task_queues` : List[ShmQueue]
+ mp : Python's multiprocessing context (depending on start method used: `spawn` or `fork`)
+ processes : List of multiprocessing Process instances
+ task_queues : List[ShmQueue]
Queues that worker processes take tasks from. If `close` method is called and none of
the processes exited abruptly so far, the queues will be used to notify the workers about
closing to let the workers gracefully exit.
- `result_queue` : ShmQueue
+ result_queue : ShmQueue
Queue where worker processes report completed tasks. It gets closed along with the worker
processes, to prevent the main process blocking on waiting for results from the workers.
"""
@@ -627,9 +627,9 @@ def __init__(self, contexts: List[CallbackContext], pool: ProcPool):
"""
Parameters
----------
- `contexts` : List[CallbackContext]
+ contexts : List[CallbackContext]
List of callbacks' contexts to be handled by the Worker.
- `pool` : ProcPool
+ pool : ProcPool
ProcPool instance enabling basic communication with worker processes, it should be
initialized with `contexts`.
"""
@@ -659,21 +659,21 @@ def from_groups(
Parameters
----------
- `groups` : _ExternalSourceGroup list
+ groups : _ExternalSourceGroup list
List of external source groups.
- `keep_alive_queue_size` : int
+ keep_alive_queue_size : int
Number of the most recently produced batches whose underlying shared memory should
remain untouched (because they might still be referenced further in the pipeline).
Note that the actual number of simultaneously kept batches will be greater by the length
of parallel external source prefetching queue which is at least one.
- `batch_size` : int, optional
+ batch_size : int, optional
Maximal batch size. For now, used only to estimate initial capacity of virtual
memory slots.
- `start_method` : str
+ start_method : str
Method of starting worker processes, either fork or spawn.
- `num_workers` : int
+ num_workers : int
Number of workers to be created in ProcPool.
- `min_initial_chunk_size` : int
+ min_initial_chunk_size : int
Minimal initial size of each shared memory chunk.
NOTE it must be enough to accommodate serialized `ScheduledTask` instance.
"""
@@ -764,10 +764,10 @@ def schedule_batch(self, context_i, work_batch: TaskArgs):
Parameters
----------
- `context_i` : int
+ context_i : int
Specifies which callback will be used to run the task, it must be the index
corresponding to the order of callbacks passed when constructing WorkerPool.
- `work_batch` : TaskArgs
+ work_batch : TaskArgs
Wrapper around parameters produced by the ExternalSource describing the next batch.
"""
context = self.contexts[context_i]
@@ -834,7 +834,7 @@ def receive_batch(self, context_i):
Parameters
----------
- `context_i` : int
+ context_i : int
Specifies which callback you want the results from, ordering corresponds to the order of
callbacks passed when constructing the pool.
"""
12 changes: 6 additions & 6 deletions dali/python/nvidia/dali/_multiproc/shared_mem.py
@@ -1,4 +1,4 @@
- # Copyright (c) 2020-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ # Copyright (c) 2020-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -30,9 +30,9 @@ class SharedMem:
Parameters
----------
- `handle` : int
+ handle : int
Handle identifying related shared memory object. Pass None to allocate new memory chunk.
- `size` : int
+ size : int
When handle=None it is the size of shared memory to allocate in bytes, otherwise it must be
the size of shared memory objects that provided handle represents.
"""
@@ -59,7 +59,7 @@ def allocate(cls, size):
Parameters
----------
- `size` : int
+ size : int
Number of bytes to allocate.
"""
return cls(None, size)
@@ -71,9 +71,9 @@ def open(cls, handle, size):
Parameters
----------
- `handle`: int
+ handle : int
Handle pointing to already existing shared memory chunk.
- `size` : int
+ size : int
Size of the existing shared memory chunk.
"""
instance = cls(handle, size)
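The two classmethods documented above cover both sides of a shared-memory handoff. A minimal sketch, assuming the module path from the diff header; the `received_handle` in the comment is hypothetical and would arrive out of band (e.g. over the Unix socket that `setup_socket` in WorkerArgs describes):

```python
from nvidia.dali._multiproc.shared_mem import SharedMem

# handle=None path: allocate a fresh 1 MiB chunk in this process.
chunk = SharedMem.allocate(1 << 20)

# In another process, given the chunk's handle and size, the open() path
# maps the same memory instead of allocating:
#     view = SharedMem.open(received_handle, 1 << 20)
```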
34 changes: 17 additions & 17 deletions dali/python/nvidia/dali/external_source.py
@@ -319,7 +319,7 @@ class ExternalSource:
Args
----
- `source` : callable or iterable
+ source : callable or iterable
The source of the data.
The source is polled for data (via a call ``source()`` or ``next(source)``)
@@ -369,7 +369,7 @@ class ExternalSource:
(accepting :class:`nvidia.dali.types.SampleInfo`, :class:`nvidia.dali.types.BatchInfo`
or batch index) will be resumed from the epoch and iteration saved in the checkpoint.
- `num_outputs` : int, optional
+ num_outputs : int, optional
If specified, denotes the number of TensorLists that are produced by the source function.
If set, the operator returns a list of ``DataNode`` objects, otherwise a single ``DataNode``
@@ -378,7 +378,7 @@
Keyword Args
------------
- `cycle`: string or bool, optional
+ cycle : string or bool, optional
Specifies if and how to cycle through the source.
It can be one of the following values:
@@ -395,20 +395,20 @@
Specifying ``"raise"`` can be used with DALI iterators to create a notion of epoch.
- `name` : str, optional
+ name : str, optional
The name of the data node.
Used when feeding the data with a call to ``feed_input`` and can be omitted if
the data is provided by ``source``.
- `layout` : :ref:`layout str<layout_str_doc>` or list/tuple thereof, optional
+ layout : :ref:`layout str<layout_str_doc>` or list/tuple thereof, optional
If provided, sets the layout of the data.
When ``num_outputs > 1``, the layout can be a list that contains a distinct layout
for each output. If the list has fewer than ``num_outputs`` elements, only
the first outputs have the layout set, the rest of the outputs don't have a layout set.
- `dtype` : `nvidia.dali.types.DALIDataType` or list/tuple thereof, optional
+ dtype : `nvidia.dali.types.DALIDataType` or list/tuple thereof, optional
Input data type.
When ``num_outputs > 1``, the ``dtype`` can be a list that contains a distinct
@@ -420,7 +420,7 @@ class ExternalSource:
This argument will be required starting from DALI 2.0.
- `ndim` : int or list/tuple thereof, optional
+ ndim : int or list/tuple thereof, optional
Number of dimensions in the input data.
When ``num_outputs > 1``, the ``ndim`` can be a list that contains a distinct value for each
@@ -434,7 +434,7 @@ class ExternalSource:
Specifying the input dimensionality will be required starting from DALI 2.0
- `cuda_stream` : optional, ``cudaStream_t`` or an object convertible to ``cudaStream_t``,
+ cuda_stream : optional, ``cudaStream_t`` or an object convertible to ``cudaStream_t``,
such as ``cupy.cuda.Stream`` or ``torch.cuda.Stream``
The CUDA stream is used to copy data to the GPU or from a GPU source.
@@ -453,19 +453,19 @@ class ExternalSource:
buffer is complete, since there's no way to synchronize with this stream to prevent
overwriting the array with new data in another stream.
- `use_copy_kernel` : bool, optional
+ use_copy_kernel : bool, optional
If set to True, DALI will use a CUDA kernel to feed the data
instead of cudaMemcpyAsync (default).
.. note::
This is applicable only when copying data to and from GPU memory.
- `blocking`: bool, optional
+ blocking : bool, optional
**Advanced** If True, this operator will block until the data is available
(e.g. by calling ``feed_input``). If False, the operator will raise an error,
if the data is not available.
- `no_copy` : bool, optional
+ no_copy : bool, optional
Determines whether DALI should copy the buffer when feed_input is called.
If set to True, DALI passes the user memory directly to the pipeline, instead of copying it.
@@ -485,20 +485,20 @@ class ExternalSource:
Automatically set to ``True`` when ``parallel=True``
- `batch` : bool, optional
+ batch : bool, optional
If set to True or None, the ``source`` is expected to produce an entire batch at once.
If set to False, the ``source`` is called per-sample.
Setting ``parallel`` to True automatically sets ``batch`` to False if it was not provided.
- `batch_info` : bool, optional, default = False
+ batch_info : bool, optional, default = False
Controls if a callable ``source`` that accepts an argument and returns batches
should receive class:`~nvidia.dali.types.BatchInfo` instance or just an
integer representing the iteration number.
If set to False (the default), only the integer is passed. If ``source`` is not callable,
does not accept arguments or ``batch`` is set to False, setting this flag has no effect.
- `parallel` : bool, optional, default = False
+ parallel : bool, optional, default = False
If set to True, the corresponding pipeline will start a pool of Python workers to run the
callback in parallel. You can specify the number of workers by passing ``py_num_workers``
into pipeline's constructor.
@@ -566,7 +566,7 @@ class ExternalSource:
Python process, but due to their state it is not possible to calculate more
than one batch at a time.
- `repeat_last` : bool, optional, default = False
+ repeat_last : bool, optional, default = False
.. note::
This is an advanced setting that is usable mainly with Triton Inference Server
with decoupled models.
@@ -581,11 +581,11 @@ class ExternalSource:
is incompatible with specifying the ``source``, which makes the ``external_source``
operate in "pull" mode.
- `prefetch_queue_depth` : int, optional, default = 1
+ prefetch_queue_depth : int, optional, default = 1
When run in ``parallel=True`` mode, specifies the number of batches to be computed in
advance and stored in the internal buffer, otherwise parameter is ignored.
- `bytes_per_sample_hint`: int, optional, default = None
+ bytes_per_sample_hint : int, optional, default = None
If specified in ``parallel=True`` mode, the value serves as a hint when
calculating initial capacity of shared memory slots used by the worker
processes to pass parallel external source outputs to the pipeline. The argument
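Several of the parameters documented above compose in a single call. Below is a hedged sketch using the public ``fn.external_source`` entry point; the pipeline itself is illustrative, while ``source``, ``cycle``, ``batch``, ``dtype``, ``ndim``, and ``layout`` are the arguments documented in this file:

```python
import numpy as np
from nvidia.dali import fn, pipeline_def, types

# Four single-sample batches; cycle="quiet" restarts the iterable silently.
data = [[np.full((2, 2), i, dtype=np.uint8)] for i in range(4)]


@pipeline_def(batch_size=1, num_threads=1, device_id=None)
def my_pipeline():
    return fn.external_source(
        source=data, cycle="quiet", batch=True,
        dtype=types.UINT8, ndim=2, layout="HW",
    )


pipe = my_pipeline()
pipe.build()
(out,) = pipe.run()  # first batch: a 2x2 uint8 TensorList with layout "HW"
```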