Add handling of parameter references in Sphinx documentation (#5707)
Utilize the sphinx_paramlinks plugin, which adds:
* a link target for every parameter
* a new :paramref: role

Add a hook that automatically injects :paramref: before every
single-backticked parameter reference.

The references are validated against the function signature. If the
signature is unavailable, the whole step is skipped.

Numpydocify the docstring syntax by removing backticks around
params in Python docs and in the numpydoc documentation
generator for operators.

Signed-off-by: Krzysztof Lecki <[email protected]>
klecki authored Nov 19, 2024
1 parent 7fd3876 commit b543839
Showing 15 changed files with 237 additions and 169 deletions.
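The hook described above is only sketched in prose, so the following minimal illustration shows how such an injection step can be wired into Sphinx. This is not the code added in this commit: the handler name, the regular expression, and the exact wiring are assumptions; only the sphinx_paramlinks extension, the ``autodoc-process-docstring`` event, and the validate-against-the-signature-or-skip behavior come from the commit message.

```python
import inspect
import re

# Match `name` (single backticks) but not ``name`` (double backticks).
PARAM_RE = re.compile(r"(?<!`)`([A-Za-z_][A-Za-z0-9_]*)`(?!`)")


def inject_paramrefs(app, what, name, obj, options, lines):
    """autodoc-process-docstring handler; mutates ``lines`` in place."""
    try:
        params = set(inspect.signature(obj).parameters)
    except (TypeError, ValueError):
        return  # signature unavailable -> skip the whole step

    def repl(match):
        param = match.group(1)
        if param in params:
            return f":paramref:`{param}`"
        return match.group(0)  # not a parameter of this object; leave untouched

    lines[:] = [PARAM_RE.sub(repl, line) for line in lines]


def setup(app):
    app.setup_extension("sphinx_paramlinks")  # provides the :paramref: role
    app.connect("autodoc-process-docstring", inject_paramrefs)
    return {"parallel_read_safe": True}
```

With a handler like this, a docstring line such as ``If `cycle` is set to "raise" ...`` is emitted as ``If :paramref:`cycle` is set to "raise" ...``, while backticked names that do not match the signature stay as plain code literals.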
8 changes: 4 additions & 4 deletions dali/pipeline/operator/op_schema.h
@@ -140,7 +140,7 @@ class DLL_PUBLIC OpSchema {
* only the first `min` inputs are considered mandatory, the rest are optional
*
* Will generate entry in `Args` section using numpydoc style:
- * `name`: type_doc
+ * name : type_doc
* doc
*/
DLL_PUBLIC OpSchema &InputDox(int index, const string &name, const string &type_doc,
@@ -158,11 +158,11 @@
* """
* Args
* ----
- * `input0`: Type of input
+ * input0 : Type of input
* This is the first input
- * `input1`: TensorList of some kind
+ * input1 : TensorList of some kind
* This is second input
- * `optional_input`: TensorList, optional
+ * optional_input : TensorList, optional
* This is optional input
*
* If the `append_kwargs_section` is true, the docstring generator will append the Keyword args
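To make the schema comment above concrete, here is what a docstring generated in the new style looks like on the Python side: parameter names without backticks, in plain numpydoc form that the :paramref: hook can link. The operator and its inputs are hypothetical, mirroring the example entries above.

```python
def hypothetical_operator(input0, input1, optional_input=None):
    """
    Args
    ----
    input0 : Type of input
        This is the first input
    input1 : TensorList of some kind
        This is second input
    optional_input : TensorList, optional
        This is optional input
    """
```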
46 changes: 23 additions & 23 deletions dali/python/nvidia/dali/_multiproc/messages.py
@@ -1,4 +1,4 @@
- # Copyright (c) 2020-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ # Copyright (c) 2020-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -24,17 +24,17 @@ class ShmMessageDesc(Structure):
It describes placement (shared memory chunk, offset etc.) of actual data to be read
by the receiver of the `ShmMessageDesc` instance.
----------
- `worker_id` : int
+ worker_id : int
Integer identifying a process that put the message, number from [0, num_workers) range
for workers or -1 in case of a main process.
- `shm_chunk_id` : int
+ shm_chunk_id : int
Integer identifying shm chunk that contains pickled data to be read by the receiver
- `shm_capacity` : unsigned long long int
+ shm_capacity : unsigned long long int
Size of the `shm_chunk_id` chunk, receiver should resize the mapping if the chunk
was resized by the writer.
- `offset` : unsigned long long int
+ offset : unsigned long long int
Offset in the shm chunk where the serialized message starts
- `num_bytes` : unsigned long long int
+ num_bytes : unsigned long long int
Size in bytes of the serialized message
"""

@@ -51,23 +51,23 @@ class WorkerArgs:
"""
Pack of parameters passed to the worker process on initialization.
----------
- `worker_id` : Ordinal of the worker in the workers pool
- `start_method` : Python's multiprocessing start method - `spawn` or `fork`
- `source_descs` : Dictionary with External Source's SourceDescription instances as values.
+ worker_id : Ordinal of the worker in the workers pool
+ start_method : Python's multiprocessing start method - `spawn` or `fork`
+ source_descs : Dictionary with External Source's SourceDescription instances as values.
Keys are ordinals corresponding to the order in which callbacks were passed to the pool.
If `callback_pickler` is not None, actual callback in SourceDescription is replaced
with result of its serialization.
- `shm_chunks` : list of BufShmChunk instances that describes all the shared memory chunks
+ shm_chunks : list of BufShmChunk instances that describes all the shared memory chunks
available to the worker (they are identified by ids unique inside the pool).
- `general_task_queue` : Optional[ShmQueue]
+ general_task_queue : Optional[ShmQueue]
Queue with tasks for sources without dedicated worker
or None if all sources have dedicated worker
- `dedicated_task_queue`: Optional[ShmQueue]
+ dedicated_task_queue : Optional[ShmQueue]
Queue with tasks for sources that are run solely in the given worker.
If `dedicated_task_queue` is None, `general_task_queue` must be provided.
- `result_queue`: ShmQueue
+ result_queue : ShmQueue
Queue to report any task done, no matter if dedicated or general.
- `setup_socket` : Optional[socket]
+ setup_socket : Optional[socket]
Python wrapper around Unix socket used to pass file descriptors identifying
shared memory chunk to child process. None if `start_method='fork'`
`callback_pickler`
@@ -189,15 +189,15 @@ class ScheduledTask:
Parameters
----------
- `context_i` : int
+ context_i : int
Index identifying the callback in the order of parallel callbacks passed to pool.
- `scheduled_i` : int
+ scheduled_i : int
Ordinal of the batch that tasks list corresponds to.
- `epoch_start` : int
+ epoch_start : int
The value is increased every time the corresponding context is reset,
this way worker can know if the new epoch started, and if it can restart
iterator that raised StopIteration but is set to cycle=raise.
- `task` : TaskArgs
+ task : TaskArgs
Describes the minibatch that should be computed by the worker. If the given source
is run in batch mode this simply wraps parameters that external source would pass to
the source in non-parallel mode. In sample mode, it is (part of) the list
@@ -217,16 +217,16 @@ class CompletedTask:
Parameters
----------
- `worker_id` : int
+ worker_id : int
Id of the worker that completed the task.
- `context_i` : int
+ context_i : int
Index identifying the callback in the order of parallel callbacks passed to pool.
- `scheduled_i` : int
+ scheduled_i : int
Ordinal of the batch that tasks corresponds to.
- `minibatch_i` : int
+ minibatch_i : int
Computation of batch might be split into number of minibatches, this is the number
that identifies which consecutive part of the batch it is.
- `batch_meta` : nvidia.dali._multiproc.shared_batch.SharedBatchMeta
+ batch_meta : nvidia.dali._multiproc.shared_batch.SharedBatchMeta
Serialized result of the task.
`exception`
Exception if the task failed.
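The `ShmMessageDesc` docstring above pins down the field names and C types, so a plausible layout can be reconstructed. The sketch below is an illustration only; the field order and exact ctypes choices are assumptions inferred from the documented `int` / `unsigned long long int` annotations, not copied from the real class.

```python
from ctypes import Structure, c_int, c_ulonglong


class ShmMessageDescSketch(Structure):
    """Illustrative layout of the documented fields; not the real class."""
    _fields_ = [
        ("worker_id", c_int),           # [0, num_workers) for workers, -1 for main process
        ("shm_chunk_id", c_int),        # which shm chunk holds the pickled payload
        ("shm_capacity", c_ulonglong),  # chunk size; receiver remaps if the writer resized it
        ("offset", c_ulonglong),        # where the serialized message starts in the chunk
        ("num_bytes", c_ulonglong),     # length of the serialized message
    ]
```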
32 changes: 16 additions & 16 deletions dali/python/nvidia/dali/_multiproc/pool.py
@@ -1,4 +1,4 @@
- # Copyright (c) 2020-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ # Copyright (c) 2020-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -539,13 +539,13 @@ class Observer:
Closes the whole pool of worker processes if any of the processes exits. The processes can also
be closed from the main process by calling observer `close` method.
----------
- `mp` : Python's multiprocessing context (depending on start method used: `spawn` or `fork`)
- `processes` : List of multiprocessing Process instances
- `task_queues` : List[ShmQueue]
+ mp : Python's multiprocessing context (depending on start method used: `spawn` or `fork`)
+ processes : List of multiprocessing Process instances
+ task_queues : List[ShmQueue]
Queues that worker processes take tasks from. If `close` method is called and none of
the processes exited abruptly so far, the queues will be used to notify the workers about
closing to let the workers gracefully exit.
- `result_queue` : ShmQueue
+ result_queue : ShmQueue
Queue where worker processes report completed tasks. It gets closed along with the worker
processes, to prevent the main process blocking on waiting for results from the workers.
"""
@@ -627,9 +627,9 @@ def __init__(self, contexts: List[CallbackContext], pool: ProcPool):
"""
Parameters
----------
- `contexts` : List[CallbackContext]
+ contexts : List[CallbackContext]
List of callbacks' contexts to be handled by the Worker.
- `pool` : ProcPool
+ pool : ProcPool
ProcPool instance enabling basic communication with worker processes, it should be
initialized with `contexts`.
"""
@@ -659,21 +659,21 @@ def from_groups(
Parameters
----------
- `groups` : _ExternalSourceGroup list
+ groups : _ExternalSourceGroup list
List of external source groups.
- `keep_alive_queue_size` : int
+ keep_alive_queue_size : int
Number of the most recently produced batches whose underlying shared memory should
remain untouched (because they might still be referenced further in the pipeline).
Note that the actual number of simultaneously kept batches will be greater by the length
of parallel external source prefetching queue which is at least one.
- `batch_size` : int, optional
+ batch_size : int, optional
Maximal batch size. For now, used only to estimate initial capacity of virtual
memory slots.
- `start_method` : str
+ start_method : str
Method of starting worker processes, either fork or spawn.
- `num_workers` : int
+ num_workers : int
Number of workers to be created in ProcPool.
- `min_initial_chunk_size` : int
+ min_initial_chunk_size : int
Minimal initial size of each shared memory chunk.
NOTE it must be enough to accommodate serialized `ScheduledTask` instance.
"""
@@ -764,10 +764,10 @@ def schedule_batch(self, context_i, work_batch: TaskArgs):
Parameters
----------
- `context_i` : int
+ context_i : int
Specifies which callback will be used to run the task, it must be the index
corresponding to the order of callbacks passed when constructing WorkerPool.
- `work_batch` : TaskArgs
+ work_batch : TaskArgs
Wrapper around parameters produced by the ExternalSource describing the next batch.
"""
context = self.contexts[context_i]
@@ -834,7 +834,7 @@ def receive_batch(self, context_i):
Parameters
----------
- `context_i` : int
+ context_i : int
Specifies which callback you want the results from, ordering corresponds to the order of
callbacks passed when constructing the pool.
"""
12 changes: 6 additions & 6 deletions dali/python/nvidia/dali/_multiproc/shared_mem.py
@@ -1,4 +1,4 @@
- # Copyright (c) 2020-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ # Copyright (c) 2020-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -30,9 +30,9 @@ class SharedMem:
Parameters
----------
- `handle` : int
+ handle : int
Handle identifying related shared memory object. Pass None to allocate new memory chunk.
- `size` : int
+ size : int
When handle=None it is the size of shared memory to allocate in bytes, otherwise it must be
the size of shared memory objects that provided handle represents.
"""
@@ -59,7 +59,7 @@ def allocate(cls, size):
Parameters
----------
- `size` : int
+ size : int
Number of bytes to allocate.
"""
return cls(None, size)
@@ -71,9 +71,9 @@ def open(cls, handle, size):
Parameters
----------
- `handle`: int
+ handle : int
Handle pointing to already existing shared memory chunk.
- `size` : int
+ size : int
Size of the existing shared memory chunk.
"""
instance = cls(handle, size)
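The two classmethods documented above cover both sides of a shared-memory handoff. A minimal sketch, assuming the module path from the diff header; the `received_handle` in the comment is hypothetical and would arrive out of band (e.g. over the Unix socket that `setup_socket` in WorkerArgs describes):

```python
from nvidia.dali._multiproc.shared_mem import SharedMem

# handle=None path: allocate a fresh 1 MiB chunk in this process.
chunk = SharedMem.allocate(1 << 20)

# In another process, given the chunk's handle and size, the open() path
# maps the same memory instead of allocating:
#     view = SharedMem.open(received_handle, 1 << 20)
```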
34 changes: 17 additions & 17 deletions dali/python/nvidia/dali/external_source.py
@@ -319,7 +319,7 @@ class ExternalSource:
Args
----
- `source` : callable or iterable
+ source : callable or iterable
The source of the data.
The source is polled for data (via a call ``source()`` or ``next(source)``)
@@ -369,7 +369,7 @@ class ExternalSource:
(accepting :class:`nvidia.dali.types.SampleInfo`, :class:`nvidia.dali.types.BatchInfo`
or batch index) will be resumed from the epoch and iteration saved in the checkpoint.
- `num_outputs` : int, optional
+ num_outputs : int, optional
If specified, denotes the number of TensorLists that are produced by the source function.
If set, the operator returns a list of ``DataNode`` objects, otherwise a single ``DataNode``
@@ -378,7 +378,7 @@
Keyword Args
------------
- `cycle`: string or bool, optional
+ cycle : string or bool, optional
Specifies if and how to cycle through the source.
It can be one of the following values:
@@ -395,20 +395,20 @@
Specifying ``"raise"`` can be used with DALI iterators to create a notion of epoch.
- `name` : str, optional
+ name : str, optional
The name of the data node.
Used when feeding the data with a call to ``feed_input`` and can be omitted if
the data is provided by ``source``.
- `layout` : :ref:`layout str<layout_str_doc>` or list/tuple thereof, optional
+ layout : :ref:`layout str<layout_str_doc>` or list/tuple thereof, optional
If provided, sets the layout of the data.
When ``num_outputs > 1``, the layout can be a list that contains a distinct layout
for each output. If the list has fewer than ``num_outputs`` elements, only
the first outputs have the layout set, the rest of the outputs don't have a layout set.
- `dtype` : `nvidia.dali.types.DALIDataType` or list/tuple thereof, optional
+ dtype : `nvidia.dali.types.DALIDataType` or list/tuple thereof, optional
Input data type.
When ``num_outputs > 1``, the ``dtype`` can be a list that contains a distinct
@@ -420,7 +420,7 @@ class ExternalSource:
This argument will be required starting from DALI 2.0.
- `ndim` : int or list/tuple thereof, optional
+ ndim : int or list/tuple thereof, optional
Number of dimensions in the input data.
When ``num_outputs > 1``, the ``ndim`` can be a list that contains a distinct value for each
@@ -434,7 +434,7 @@ class ExternalSource:
Specifying the input dimensionality will be required starting from DALI 2.0
- `cuda_stream` : optional, ``cudaStream_t`` or an object convertible to ``cudaStream_t``,
+ cuda_stream : optional, ``cudaStream_t`` or an object convertible to ``cudaStream_t``,
such as ``cupy.cuda.Stream`` or ``torch.cuda.Stream``
The CUDA stream is used to copy data to the GPU or from a GPU source.
@@ -453,19 +453,19 @@ class ExternalSource:
buffer is complete, since there's no way to synchronize with this stream to prevent
overwriting the array with new data in another stream.
- `use_copy_kernel` : bool, optional
+ use_copy_kernel : bool, optional
If set to True, DALI will use a CUDA kernel to feed the data
instead of cudaMemcpyAsync (default).
.. note::
This is applicable only when copying data to and from GPU memory.
- `blocking`: bool, optional
+ blocking : bool, optional
**Advanced** If True, this operator will block until the data is available
(e.g. by calling ``feed_input``). If False, the operator will raise an error,
if the data is not available.
- `no_copy` : bool, optional
+ no_copy : bool, optional
Determines whether DALI should copy the buffer when feed_input is called.
If set to True, DALI passes the user memory directly to the pipeline, instead of copying it.
@@ -485,20 +485,20 @@ class ExternalSource:
Automatically set to ``True`` when ``parallel=True``
- `batch` : bool, optional
+ batch : bool, optional
If set to True or None, the ``source`` is expected to produce an entire batch at once.
If set to False, the ``source`` is called per-sample.
Setting ``parallel`` to True automatically sets ``batch`` to False if it was not provided.
- `batch_info` : bool, optional, default = False
+ batch_info : bool, optional, default = False
Controls if a callable ``source`` that accepts an argument and returns batches
should receive class:`~nvidia.dali.types.BatchInfo` instance or just an
integer representing the iteration number.
If set to False (the default), only the integer is passed. If ``source`` is not callable,
does not accept arguments or ``batch`` is set to False, setting this flag has no effect.
- `parallel` : bool, optional, default = False
+ parallel : bool, optional, default = False
If set to True, the corresponding pipeline will start a pool of Python workers to run the
callback in parallel. You can specify the number of workers by passing ``py_num_workers``
into pipeline's constructor.
@@ -566,7 +566,7 @@ class ExternalSource:
Python process, but due to their state it is not possible to calculate more
than one batch at a time.
- `repeat_last` : bool, optional, default = False
+ repeat_last : bool, optional, default = False
.. note::
This is an advanced setting that is usable mainly with Triton Inference Server
with decoupled models.
@@ -581,11 +581,11 @@ class ExternalSource:
is incompatible with specifying the ``source``, which makes the ``external_source``
operate in "pull" mode.
- `prefetch_queue_depth` : int, optional, default = 1
+ prefetch_queue_depth : int, optional, default = 1
When run in ``parallel=True`` mode, specifies the number of batches to be computed in
advance and stored in the internal buffer, otherwise parameter is ignored.
- `bytes_per_sample_hint`: int, optional, default = None
+ bytes_per_sample_hint : int, optional, default = None
If specified in ``parallel=True`` mode, the value serves as a hint when
calculating initial capacity of shared memory slots used by the worker
processes to pass parallel external source outputs to the pipeline. The argument
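Several of the parameters documented above compose in a single call. Below is a hedged sketch using the public ``fn.external_source`` entry point; the pipeline itself is illustrative, while ``source``, ``cycle``, ``batch``, ``dtype``, ``ndim``, and ``layout`` are the arguments documented in this file:

```python
import numpy as np
from nvidia.dali import fn, pipeline_def, types

# Four single-sample batches; cycle="quiet" restarts the iterable silently.
data = [[np.full((2, 2), i, dtype=np.uint8)] for i in range(4)]


@pipeline_def(batch_size=1, num_threads=1, device_id=None)
def my_pipeline():
    return fn.external_source(
        source=data, cycle="quiet", batch=True,
        dtype=types.UINT8, ndim=2, layout="HW",
    )


pipe = my_pipeline()
pipe.build()
(out,) = pipe.run()  # first batch: a 2x2 uint8 TensorList with layout "HW"
```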