GPU implementation of hamming distance #541

felixpetschko · 2024-08-19T18:53:03Z

Hamming distance implementation with numba.cuda for GPU support.
This builds on the changes in "Hamming distance implementation with Numba" (#512).
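
For readers unfamiliar with the metric, here is a minimal pure-Python reference of what a pairwise Hamming computation with a cutoff produces. It is only a sketch for checking results on tiny inputs, not the numba.cuda kernel from this PR. The names `hamming` and `pairwise_hamming`, the dict-based sparse result, and the choice to skip pairs of unequal length are assumptions made for illustration.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def pairwise_hamming(seqs, seqs2, cutoff):
    """Return {(row, col): distance} for all pairs with distance <= cutoff.

    Assumption: pairs of unequal length are skipped instead of raising.
    """
    result = {}
    for (i, a), (j, b) in product(enumerate(seqs), enumerate(seqs2)):
        if len(a) != len(b):
            continue
        d = hamming(a, b)
        if d <= cutoff:
            result[(i, j)] = d
    return result
```

Storing only the pairs within the cutoff is what keeps the result sparse; a GPU version has to reserve memory for those entries up front, which is where the block parameters discussed in the review come in.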

@grst:

add to API docs
start small "large dataset tutorial", see also Large dataset tutorial #479
update changelog

grst · 2025-02-09T19:35:42Z

src/scirpy/ir_dist/metrics.py

+    gpu_n_blocks:
+        Number of blocks in which the final result matrix should be computed. Each block reserves GPU memory
+        in which the computed result block has to fit in sparse representation. Lower values give better performance
+        but increase the risk of running out of reserved memory. This value should be chosen based on the
+        estimated sparsity of the result matrix and the size of the GPU device memory.
+    gpu_block_width:
+        Maximum width of blocks in which the final result matrix should be computed. Each block reserves GPU memory
+        in which the computed result block has to fit in sparse representation. Higher values allow for a lower
+        number of result blocks (gpu_n_blocks) which increases the performances. This value should be chosen based on
+        the GPU device memory.


Why do we need both? Can't you infer one based on the other?
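
To make the question concrete, here is a rough sketch of how the block count could be derived from the block width, assuming the blocks simply tile the columns of the result matrix. `infer_n_blocks` and `block_ranges` are hypothetical helpers, not functions in this PR, and the actual implementation may have reasons (for example sparsity-dependent memory reservation) to keep both parameters.

```python
import math


def infer_n_blocks(n_columns: int, block_width: int) -> int:
    """Number of column blocks needed to cover n_columns (hypothetical helper)."""
    if block_width < 1:
        raise ValueError("block_width must be at least 1")
    return math.ceil(n_columns / block_width)


def block_ranges(n_columns: int, block_width: int):
    """Half-open (start, stop) column ranges; the last block may be narrower."""
    if block_width < 1:
        raise ValueError("block_width must be at least 1")
    return [
        (start, min(start + block_width, n_columns))
        for start in range(0, n_columns, block_width)
    ]
```

With 10 columns and a width of 4 this yields three blocks, `(0, 4)`, `(4, 8)` and `(8, 10)`, so under this assumption only one of the two parameters would need to be exposed.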

grst · 2025-02-09T19:36:42Z

src/scirpy/ir_dist/metrics.py

+
+        block_offset = start_column
+
+        print(


I suggest using logging.info instead of print.
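
A minimal sketch of the suggested change, using only the standard library. The function name `report_block` and the message text are assumptions, since the body of the original `print` call is not visible in the diff excerpt.

```python
import logging

# Module-level logger; callers decide whether INFO messages are shown.
logger = logging.getLogger(__name__)


def report_block(block_index: int, n_blocks: int, start_column: int) -> None:
    """Log block progress instead of printing it (hypothetical message)."""
    logger.info(
        "Processing block %d/%d starting at column %d",
        block_index + 1,
        n_blocks,
        start_column,
    )
```

Passing the values as arguments rather than formatting the string eagerly means the message is only built when the INFO level is enabled.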

grst · 2025-02-09T19:37:33Z

src/scirpy/ir_dist/metrics.py

+            seqs_mat1, seqs_L1 = _seqs2mat_fast(seqs, max_len=max_seq_len)
+            seqs_mat2, seqs_L2 = _seqs2mat_fast(seqs2, max_len=max_seq_len)
+        except UnicodeError:
+            print(


I suggest using logging.info instead of print.
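
For the fallback path, the same idea could look roughly like this. This is a standalone sketch, not the PR's `_seqs2mat_fast` logic: `encode_sequences` and the UTF-32 fallback are assumptions chosen only to show a `UnicodeError` handler that logs instead of printing.

```python
import logging

logger = logging.getLogger(__name__)


def encode_sequences(seqs):
    """Try the fast ASCII path first; fall back if any sequence is non-ASCII.

    Returns the encoded sequences and the name of the encoding used.
    """
    try:
        # str.encode raises UnicodeEncodeError, a subclass of UnicodeError.
        return [s.encode("ascii") for s in seqs], "ascii"
    except UnicodeError:
        logger.info("Non-ASCII characters found, falling back to UTF-32 encoding")
        return [s.encode("utf-32-le") for s in seqs], "utf-32-le"
```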

felixpetschko and others added 30 commits April 29, 2024 13:28

take static methods out of tcrdist

bad62d8

made _tcrdist_mat a normal class method

72565bf

parent method NumbaDistanceCalculator extracted

add8e7f

numba version of hamming distance implemented

e9c0642

hamming numba tests passed and reference test added

68e0493

hamming numba distance calculator implemented and tested

ef0fa7d

n_jobs parameter handling done in NumbaDistanceCalculator superclass

0b15f8b

documentation adapted

46bfc14

removed unnecessary import

e339e14

[pre-commit.ci] auto fixes from pre-commit.com hooks

7da4519

hamming distance with numba parallelization implemented

82b0259

Merge branch 'numba_hamming' of https://github.com/felixpetschko/scirpy into numba_hamming

b2d28d3

[pre-commit.ci] auto fixes from pre-commit.com hooks

249e626

imports fixed

2fccc6a

Merge branch 'numba_hamming' of https://github.com/felixpetschko/scirpy into numba_hamming

9ee1a2b

[pre-commit.ci] auto fixes from pre-commit.com hooks

a68ab53

implemented parallelization with n_jobs and n_blocks for hamming and tcrdist distance metrics

d68a10b

performance optimization for hamming and tcrdist

0005e63

more documentation added

6f16a3e

Merge branch 'numba_hamming' of https://github.com/felixpetschko/scirpy into numba_hamming

6b32311

[pre-commit.ci] auto fixes from pre-commit.com hooks

ad13f52

documentation adapted

08ad838

Merge branch 'numba_hamming' of https://github.com/felixpetschko/scirpy into numba_hamming

a8d9846

[pre-commit.ci] auto fixes from pre-commit.com hooks

b86030c

documentation adapted

2fb8254

Merge branch 'numba_hamming' of https://github.com/felixpetschko/scirpy into numba_hamming

bb0f430

signature of _calc_dist_mat_block changed

80ae271

the alphabet for the hamming distance is now the unique characters occurring in all sequences

91c1dea

[pre-commit.ci] auto fixes from pre-commit.com hooks

899e2eb

Merge branch 'main' into numba_hamming

a0627b4

felixpetschko and others added 27 commits February 3, 2025 12:44

print library versions for debugging

9864473

[pre-commit.ci] auto fixes from pre-commit.com hooks

4c3f585

data types updated

9d9da76

refactored

2b8cf41

merge

8783b27

[pre-commit.ci] auto fixes from pre-commit.com hooks

9784d77

Update meta.yaml

10a7577

Merge branch 'main' into gpu_hamming

8b2d5bf

Update meta.yaml

4b40c08

Update meta.yaml

825775c

removed time measurement prints and added progress bar

2b677a7

removed cuda synchronize statements used for performance testing

f446f82

added parameters gpu_n_blocks and gpu_block_width

d1fc102

added parameter documentation

48a66dc

cleaned up hamming kernel

641985f

catch sequence conversion error by retrying with non ascii implementation

751387d

use parameters gpu_n_blocks and gpu_block_width for testing

2d5067a

adapted documentation about n_blocks

231603c

Merge branch 'gpu_hamming' of https://github.com/felixpetschko/scirpy into gpu_hamming

b2d0bd6

[pre-commit.ci] auto fixes from pre-commit.com hooks

9729e10

adapted error handling for ascii UnicodeError

4508cf5

Merge branch 'gpu_hamming' of https://github.com/felixpetschko/scirpy into gpu_hamming

7036ad5

[pre-commit.ci] auto fixes from pre-commit.com hooks

f0c809c

removed reference to :class:`~scirpy.ir_dist.metrics.GPUHammingDistanceCalculator` in documentation

571bffc

Merge branch 'gpu_hamming' of https://github.com/felixpetschko/scirpy into gpu_hamming

ad5757e

[pre-commit.ci] auto fixes from pre-commit.com hooks

2be1e8f

WIP: update docs

a1d509e

grst reviewed Feb 9, 2025

grst and others added 2 commits February 9, 2025 20:40

Update changelog

a788cf3

[pre-commit.ci] auto fixes from pre-commit.com hooks

5e648c5
