CPU and GPU ctz implementation (#979)
bench.yml #727, on: push
bench: 3m 24s
bench-windows: 4m 16s
bench-macos: 7m 45s
bench-arm-linux: 9m 12s
bench-arm-macos: 2m 1s