This feature adds support for the nvCOMP batch/low-level API, which allows processing multiple chunks in parallel.
The proposed implementation provides an easy way to use this API via the well-known numcodecs Codec API. Using numcodecs also enables seamless integration with libraries, such as zarr, that use numcodecs internally.
Additionally, using the nvCOMP batch API enables interoperability between existing codecs and the nvCOMP batch codec. For example, data can be compressed on the CPU using the default LZ4 codec and then decompressed on the GPU using the proposed nvCOMP batch codec, as sketched below.
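For illustration, here is a minimal sketch of that CPU-to-GPU round trip. It uses the `decode_batch` method introduced just below and assumes, per the interoperability claim above, that the `nvcomp_batch` codec accepts buffers produced by the stock numcodecs `LZ4` codec:

```python
import numcodecs
import numpy as np

# Compress a chunk on the CPU with the default numcodecs LZ4 codec.
cpu_lz4 = numcodecs.LZ4()
chunk = np.random.randn(4, 8).astype(np.float32)
compressed = cpu_lz4.encode(chunk)

# Decompress the same buffer on the GPU with the nvCOMP batch codec.
# Assumes the LZ4 stream produced above is accepted as-is, per the
# interoperability described in this proposal.
gpu_lz4 = numcodecs.registry.get_codec(dict(id="nvcomp_batch", algorithm="lz4"))
(decompressed,) = gpu_lz4.decode_batch([compressed])

# Round-trip check.
np.testing.assert_equal(decompressed.view(np.float32).reshape(4, 8), chunk)
```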
To support batch mode, the Codec interface was extended with two functions, encode_batch and decode_batch.
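A rough skeleton of what the extended interface looks like is shown below. Only the method names come from this proposal; the class name `BatchCodec`, the parameter names, and the optional `out` argument are illustrative assumptions:

```python
from typing import Any, List, Optional

from numcodecs.abc import Codec


class BatchCodec(Codec):
    """Illustrative skeleton only; a real codec also implements encode/decode."""

    codec_id = "nvcomp_batch"

    def encode(self, buf):
        raise NotImplementedError

    def decode(self, buf, out=None):
        raise NotImplementedError

    def encode_batch(self, bufs: List[Any]) -> List[Any]:
        """Compress a batch of chunks, returning one compressed buffer per chunk."""
        raise NotImplementedError

    def decode_batch(self, bufs: List[Any], out: Optional[List[Any]] = None) -> List[Any]:
        """Decompress a batch of chunks, returning one decompressed buffer per chunk."""
        raise NotImplementedError
```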
Note that the current version of zarr does not support chunk-parallel functionality, but there is a proposal for this feature.
Currently the following compression/decompression algorithms are supported:
LZ4
Gdeflate
zstd
Snappy
nvCOMP also supports other algorithms, which can be added to kvikio relatively easily.
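Each supported algorithm is selected through the same registry entry. In the sketch below, "lz4" is the value used in the examples that follow; the other identifier strings are assumed to mirror the nvCOMP algorithm names listed above:

```python
import numcodecs

# Select a batch codec per algorithm; identifiers other than "lz4" are
# assumed to follow the nvCOMP algorithm names.
for algorithm in ("lz4", "gdeflate", "zstd", "snappy"):
    codec = numcodecs.registry.get_codec(dict(id="nvcomp_batch", algorithm=algorithm))
    print(codec)
```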
Examples of usage:
Simple use of the Codec batch API:
```python
import numcodecs
import numpy as np

# Get the codec from the numcodecs registry.
codec = numcodecs.registry.get_codec(dict(id="nvcomp_batch", algorithm="lz4"))

# Create 2 chunks. The chunks do not have to be the same size.
shape = (4, 8)
chunk1, chunk2 = np.random.randn(2, *shape).astype(np.float32)

# Compress data.
data_comp = codec.encode_batch([chunk1, chunk2])

# Decompress.
data_decomp = codec.decode_batch(data_comp)

# Verify.
np.testing.assert_equal(data_decomp[0].view(np.float32).reshape(shape), chunk1)
np.testing.assert_equal(data_decomp[1].view(np.float32).reshape(shape), chunk2)
```
Using with zarr (no parallel chunking yet; see the note above):
```python
import numcodecs
import numpy as np
import zarr

# Get the codec from the numcodecs registry.
codec = numcodecs.registry.get_codec(dict(id="nvcomp_batch", algorithm="lz4"))

shape = (16, 16)
chunks = (8, 8)

# Create data and compress.
data = np.random.randn(*shape).astype(np.float32)
z1 = zarr.array(data, chunks=chunks, compressor=codec)

# Store in compressed format.
zarr_store = zarr.MemoryStore()
zarr.save_array(zarr_store, z1, compressor=codec)

# Read back/decompress.
z2 = zarr.open_array(zarr_store)

np.testing.assert_equal(z1[:], z2[:])
```
If desired, the API can also be used directly, without going through the numcodecs API.
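For instance, something along these lines; the module path and class name `NvCompBatchCodec` are assumptions and may differ in the actual implementation:

```python
import numpy as np

# Hypothetical direct import of the batch codec class, bypassing the
# numcodecs registry; module path and class name are assumptions.
from kvikio.nvcomp_codec import NvCompBatchCodec

codec = NvCompBatchCodec("lz4")
chunks = [np.random.randn(4, 8).astype(np.float32) for _ in range(2)]
compressed = codec.encode_batch(chunks)
decompressed = codec.decode_batch(compressed)
```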