Add nextafter intrinsic #529
Your PR requires formatting changes to meet the project's style guidelines. Suggested changes:
diff --git a/test/device/intrinsics.jl b/test/device/intrinsics.jl
index 7bd537f9..bd67f7ef 100644
--- a/test/device/intrinsics.jl
+++ b/test/device/intrinsics.jl
@@ -358,23 +358,23 @@ end
end
- let # nextafter
- if Metal.is_macos(v"14")
- N = 4
- function nextafter_test(X, y)
- idx = thread_position_in_grid_1d()
- X[idx] = Metal.nextafter(X[idx], y)
- return nothing
- end
- arr = rand(T, N)
- buffer = MtlArray(arr)
- Metal.@sync @metal threads = N nextafter_test(buffer, typemax(T))
- @test Array(buffer) == nextfloat.(arr)
+ let # nextafter
+ if Metal.is_macos(v"14")
+ N = 4
+ function nextafter_test(X, y)
+ idx = thread_position_in_grid_1d()
+ X[idx] = Metal.nextafter(X[idx], y)
+ return nothing
+ end
+ arr = rand(T, N)
+ buffer = MtlArray(arr)
+ Metal.@sync @metal threads = N nextafter_test(buffer, typemax(T))
+ @test Array(buffer) == nextfloat.(arr)
- Metal.@sync @metal threads = N nextafter_test(buffer, typemin(T))
- @test Array(buffer) == arr
+ Metal.@sync @metal threads = N nextafter_test(buffer, typemin(T))
+ @test Array(buffer) == arr
+ end
end
- end
end
end
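The test in the diff above relies on `nextafter(x, y)` stepping to the adjacent representable float in the direction of `y`, so a step toward `typemax(T)` matches `nextfloat` and a following step toward `typemin(T)` returns the original value. A minimal CPU-only sketch of those semantics follows; `nextafter_ref` is a hypothetical helper name used for illustration and is not part of Metal.jl or this PR.

```julia
# Hypothetical CPU reference for the semantics the device test checks.
# It mirrors the usual C `nextafter` contract; it is not the Metal intrinsic.
function nextafter_ref(x::T, y::T) where {T<:AbstractFloat}
    # NaN in either argument propagates.
    (isnan(x) || isnan(y)) && return T(NaN)
    # Equal arguments: nothing to step over, return y.
    x == y && return y
    # Otherwise move one representable value toward y.
    return y > x ? nextfloat(x) : prevfloat(x)
end

# Same round trip as the test: step up, then step back down.
x = 0.25f0
up = nextafter_ref(x, typemax(Float32))
down = nextafter_ref(up, typemin(Float32))
println(up == nextfloat(x), " ", down == x)
```

The second assertion in the test compares against the original `arr` because the buffer is mutated in place by the first kernel launch, so the second launch undoes the first step.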
Force-pushed from 2569fe1 to 18279c9.
Metal Benchmarks
Benchmark suite | Current: 987b73b | Previous: ca092c8 | Ratio |
---|---|---|---|
private array/construct | 28316 ns | 24829.916666666664 ns | 1.14 |
private array/broadcast | 464708.5 ns | 458500 ns | 1.01 |
private array/random/randn/Float32 | 791875 ns | 798750 ns | 0.99 |
private array/random/randn!/Float32 | 637458 ns | 615041.5 ns | 1.04 |
private array/random/rand!/Int64 | 577500 ns | 563000 ns | 1.03 |
private array/random/rand!/Float32 | 596000 ns | 598021 ns | 1.00 |
private array/random/rand/Int64 | 775709 ns | 774083 ns | 1.00 |
private array/random/rand/Float32 | 572500 ns | 611583 ns | 0.94 |
private array/copyto!/gpu_to_gpu | 640125 ns | 654250 ns | 0.98 |
private array/copyto!/cpu_to_gpu | 814562.5 ns | 624208 ns | 1.30 |
private array/copyto!/gpu_to_cpu | 662959 ns | 817708 ns | 0.81 |
private array/accumulate/1d | 1341000.5 ns | 1329333 ns | 1.01 |
private array/accumulate/2d | 1411083 ns | 1398375 ns | 1.01 |
private array/iteration/findall/int | 2085458 ns | 2103583.5 ns | 0.99 |
private array/iteration/findall/bool | 1815291.5 ns | 1824375 ns | 1.00 |
private array/iteration/findfirst/int | 1697833 ns | 1688792 ns | 1.01 |
private array/iteration/findfirst/bool | 1665208 ns | 1643000 ns | 1.01 |
private array/iteration/scalar | 3153250 ns | 3772458 ns | 0.84 |
private array/iteration/logical | 3199875 ns | 3187749.5 ns | 1.00 |
private array/iteration/findmin/1d | 1756291 ns | 1760708 ns | 1.00 |
private array/iteration/findmin/2d | 1338625 ns | 1344437.5 ns | 1.00 |
private array/reductions/reduce/1d | 1047125 ns | 1031583 ns | 1.02 |
private array/reductions/reduce/2d | 662625 ns | 654750 ns | 1.01 |
private array/reductions/mapreduce/1d | 1049625 ns | 1033875 ns | 1.02 |
private array/reductions/mapreduce/2d | 664292 ns | 659000 ns | 1.01 |
private array/permutedims/4d | 2526875.5 ns | 2503500 ns | 1.01 |
private array/permutedims/2d | 1033000 ns | 1028750 ns | 1.00 |
private array/permutedims/3d | 1624958 ns | 1580708 ns | 1.03 |
private array/copy | 563083 ns | 590270.5 ns | 0.95 |
latency/precompile | 8835470833 ns | 8811389416 ns | 1.00 |
latency/ttfp | 3607498375 ns | 3608628500 ns | 1.00 |
latency/import | 1230395917 ns | 1231898292 ns | 1.00 |
integration/metaldevrt | 710709 ns | 713792 ns | 1.00 |
integration/byval/slices=1 | 1488979.5 ns | 1617854.5 ns | 0.92 |
integration/byval/slices=3 | 9653167 ns | 9687812.5 ns | 1.00 |
integration/byval/reference | 1546250.5 ns | 1589625 ns | 0.97 |
integration/byval/slices=2 | 2711292 ns | 2675542 ns | 1.01 |
kernel/indexing | 484333 ns | 470792 ns | 1.03 |
kernel/indexing_checked | 473417 ns | 463208 ns | 1.02 |
kernel/launch | 7917 ns | 9527.666666666666 ns | 0.83 |
metal/synchronization/stream | 14708.5 ns | 15125 ns | 0.97 |
metal/synchronization/context | 15000 ns | 14834 ns | 1.01 |
shared array/construct | 26336.75 ns | 24604.166666666668 ns | 1.07 |
shared array/broadcast | 457916 ns | 461166 ns | 0.99 |
shared array/random/randn/Float32 | 765895.5 ns | 738958.5 ns | 1.04 |
shared array/random/randn!/Float32 | 629917 ns | 633292 ns | 0.99 |
shared array/random/rand!/Int64 | 573375 ns | 561625 ns | 1.02 |
shared array/random/rand!/Float32 | 590250 ns | 600416 ns | 0.98 |
shared array/random/rand/Int64 | 758166.5 ns | 778375 ns | 0.97 |
shared array/random/rand/Float32 | 575978.5 ns | 616000 ns | 0.94 |
shared array/copyto!/gpu_to_gpu | 82917 ns | 79250 ns | 1.05 |
shared array/copyto!/cpu_to_gpu | 82917 ns | 82084 ns | 1.01 |
shared array/copyto!/gpu_to_cpu | 82167 ns | 82750 ns | 0.99 |
shared array/accumulate/1d | 1336312 ns | 1335833 ns | 1.00 |
shared array/accumulate/2d | 1364459 ns | 1388833 ns | 0.98 |
shared array/iteration/findall/int | 1831542 ns | 1871833 ns | 0.98 |
shared array/iteration/findall/bool | 1579875 ns | 1569500 ns | 1.01 |
shared array/iteration/findfirst/int | 1380791 ns | 1396916 ns | 0.99 |
shared array/iteration/findfirst/bool | 1369250 ns | 1367500 ns | 1.00 |
shared array/iteration/scalar | 156750 ns | 154834 ns | 1.01 |
shared array/iteration/logical | 2966333.5 ns | 2987020.5 ns | 0.99 |
shared array/iteration/findmin/1d | 1469833 ns | 1477062.5 ns | 1.00 |
shared array/iteration/findmin/2d | 1359604 ns | 1364708 ns | 1.00 |
shared array/reductions/reduce/1d | 733771 ns | 731750 ns | 1.00 |
shared array/reductions/reduce/2d | 668542 ns | 666250 ns | 1.00 |
shared array/reductions/mapreduce/1d | 739792 ns | 736667 ns | 1.00 |
shared array/reductions/mapreduce/2d | 666958.5 ns | 672459 ns | 0.99 |
shared array/permutedims/4d | 2525854.5 ns | 2493333 ns | 1.01 |
shared array/permutedims/2d | 1006499.5 ns | 1024646 ns | 0.98 |
shared array/permutedims/3d | 1614312 ns | 1576667 ns | 1.02 |
shared array/copy | 246541 ns | 244000 ns | 1.01 |
This comment was automatically generated by workflow using github-action-benchmark.
@@ -274,6 +274,11 @@ end
@device_override Base.trunc(x::Float32) = ccall("extern air.trunc.f32", llvmcall, Cfloat, (Cfloat,), x)
@device_override Base.trunc(x::Float16) = ccall("extern air.trunc.f16", llvmcall, Float16, (Float16,), x)

@static if Metal.is_macos(v"14")
It's possible to target different Metal versions by passing kwargs to `@metal`, so I'm not sure if we should hard-code the dependency on a specific macOS version. The idea was to do these branches, semantically at least, at run time using `metal_version()`. The failure branch could then simply `error`, but maybe we ought to add a GPUCompiler intrinsic that eagerly aborts compilation to facilitate debugging this (which now wouldn't be great either, triggering a MethodError when calling the intrinsic from an unsupported macOS version).
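The shape of the run-time branch described in this comment can be sketched on the CPU as follows. This is only an illustration under stated assumptions: `target_metal_version`, `NEXTAFTER_MIN_METAL`, and `nextafter_or_error` are hypothetical names, the `v"3.1"` threshold is an assumed value (the PR itself gates on macOS 14), and the supported branch uses `nextfloat`/`prevfloat` as a placeholder where device code would call the intrinsic.

```julia
# Hypothetical stand-in for the version query the comment calls `metal_version()`.
# A fixed value keeps the sketch runnable without a GPU.
target_metal_version() = v"3.0"

# Assumed minimum Metal version for nextafter; not taken from the PR.
const NEXTAFTER_MIN_METAL = v"3.1"

function nextafter_or_error(x::Float32, y::Float32; version = target_metal_version())
    if version >= NEXTAFTER_MIN_METAL
        # Placeholder for the intrinsic call: step one float toward y.
        isnan(x) && return x
        return y > x ? nextfloat(x) : (y < x ? prevfloat(x) : y)
    else
        # Failure branch: a plain error instead of a missing method.
        error("nextafter needs Metal $(NEXTAFTER_MIN_METAL) or later, got $(version)")
    end
end

# Passing the version explicitly mimics targeting a different Metal version per launch.
println(nextafter_or_error(1.0f0, 2.0f0; version = v"3.1") == nextfloat(1.0f0))
```

Compared with a `@static if` around the method definition, this keeps the method defined on every system, so an unsupported target reaches the explicit error message rather than a MethodError.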
Force-pushed from 54e1adc to 43147a3.
Error doesn't seem related:
Force-pushed from 43147a3 to 987b73b.
Force-pushed from 987b73b to 9eff7d2.
#373