Add nextafter intrinsic #529

Draft: christiangnrd wants to merge 1 commit into main from nextafter

christiangnrd (Contributor)

github-actions bot commented Jan 29, 2025

Your PR requires formatting changes to meet the project's style guidelines.
Please consider running Runic (git runic main) to apply these changes.

Suggested changes:
diff --git a/test/device/intrinsics.jl b/test/device/intrinsics.jl
index 7bd537f9..bd67f7ef 100644
--- a/test/device/intrinsics.jl
+++ b/test/device/intrinsics.jl
@@ -358,23 +358,23 @@ end
     end
 
 
-    let # nextafter
-        if Metal.is_macos(v"14")
-            N = 4
-            function nextafter_test(X, y)
-                idx = thread_position_in_grid_1d()
-                X[idx] = Metal.nextafter(X[idx], y)
-                return nothing
-            end
-            arr = rand(T, N)
-            buffer = MtlArray(arr)
-            Metal.@sync @metal threads = N nextafter_test(buffer, typemax(T))
-            @test Array(buffer) == nextfloat.(arr)
+        let # nextafter
+            if Metal.is_macos(v"14")
+                N = 4
+                function nextafter_test(X, y)
+                    idx = thread_position_in_grid_1d()
+                    X[idx] = Metal.nextafter(X[idx], y)
+                    return nothing
+                end
+                arr = rand(T, N)
+                buffer = MtlArray(arr)
+                Metal.@sync @metal threads = N nextafter_test(buffer, typemax(T))
+                @test Array(buffer) == nextfloat.(arr)
 
-            Metal.@sync @metal threads = N nextafter_test(buffer, typemin(T))
-            @test Array(buffer) == arr
+                Metal.@sync @metal threads = N nextafter_test(buffer, typemin(T))
+                @test Array(buffer) == arr
+            end
         end
-    end
 end
 end
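The test relies on nextafter(x, y) moving x by one representable value toward y. Toward typemax(T) the result should match nextfloat, and moving back toward typemin(T) should recover the original values. Below is a minimal CPU-side sketch of that behavior. It assumes the Metal intrinsic follows the usual C definition of nextafter, and nextafter_ref is a hypothetical name that is not part of Metal.jl:

```julia
# Hypothetical CPU reference for C-style nextafter(x, y): the next representable
# value after x in the direction of y. Not part of Metal.jl.
function nextafter_ref(x::T, y::T) where {T<:AbstractFloat}
    (isnan(x) || isnan(y)) && return T(NaN)  # NaN in, NaN out
    x == y && return y                        # no movement; C returns y here
    return y > x ? nextfloat(x) : prevfloat(x)
end

# Mirror the two checks from the Metal test on the CPU.
for T in (Float16, Float32)
    arr = rand(T, 4)
    up = nextafter_ref.(arr, typemax(T))    # step toward +Inf
    @assert up == nextfloat.(arr)
    down = nextafter_ref.(up, typemin(T))   # step back toward -Inf
    @assert down == arr
end
```

Because rand returns finite values in [0, 1), one step up followed by one step down always returns the starting value. This is the round trip that the second @test in the diff checks on the GPU.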
 

github-actions bot left a comment:

Metal Benchmarks

| Benchmark suite | Current: 987b73b (ns) | Previous: ca092c8 (ns) | Ratio (Current/Previous) |
|---|---|---|---|
| private array/construct | 28316 | 24829.916666666664 | 1.14 |
| private array/broadcast | 464708.5 | 458500 | 1.01 |
| private array/random/randn/Float32 | 791875 | 798750 | 0.99 |
| private array/random/randn!/Float32 | 637458 | 615041.5 | 1.04 |
| private array/random/rand!/Int64 | 577500 | 563000 | 1.03 |
| private array/random/rand!/Float32 | 596000 | 598021 | 1.00 |
| private array/random/rand/Int64 | 775709 | 774083 | 1.00 |
| private array/random/rand/Float32 | 572500 | 611583 | 0.94 |
| private array/copyto!/gpu_to_gpu | 640125 | 654250 | 0.98 |
| private array/copyto!/cpu_to_gpu | 814562.5 | 624208 | 1.30 |
| private array/copyto!/gpu_to_cpu | 662959 | 817708 | 0.81 |
| private array/accumulate/1d | 1341000.5 | 1329333 | 1.01 |
| private array/accumulate/2d | 1411083 | 1398375 | 1.01 |
| private array/iteration/findall/int | 2085458 | 2103583.5 | 0.99 |
| private array/iteration/findall/bool | 1815291.5 | 1824375 | 1.00 |
| private array/iteration/findfirst/int | 1697833 | 1688792 | 1.01 |
| private array/iteration/findfirst/bool | 1665208 | 1643000 | 1.01 |
| private array/iteration/scalar | 3153250 | 3772458 | 0.84 |
| private array/iteration/logical | 3199875 | 3187749.5 | 1.00 |
| private array/iteration/findmin/1d | 1756291 | 1760708 | 1.00 |
| private array/iteration/findmin/2d | 1338625 | 1344437.5 | 1.00 |
| private array/reductions/reduce/1d | 1047125 | 1031583 | 1.02 |
| private array/reductions/reduce/2d | 662625 | 654750 | 1.01 |
| private array/reductions/mapreduce/1d | 1049625 | 1033875 | 1.02 |
| private array/reductions/mapreduce/2d | 664292 | 659000 | 1.01 |
| private array/permutedims/4d | 2526875.5 | 2503500 | 1.01 |
| private array/permutedims/2d | 1033000 | 1028750 | 1.00 |
| private array/permutedims/3d | 1624958 | 1580708 | 1.03 |
| private array/copy | 563083 | 590270.5 | 0.95 |
| latency/precompile | 8835470833 | 8811389416 | 1.00 |
| latency/ttfp | 3607498375 | 3608628500 | 1.00 |
| latency/import | 1230395917 | 1231898292 | 1.00 |
| integration/metaldevrt | 710709 | 713792 | 1.00 |
| integration/byval/slices=1 | 1488979.5 | 1617854.5 | 0.92 |
| integration/byval/slices=3 | 9653167 | 9687812.5 | 1.00 |
| integration/byval/reference | 1546250.5 | 1589625 | 0.97 |
| integration/byval/slices=2 | 2711292 | 2675542 | 1.01 |
| kernel/indexing | 484333 | 470792 | 1.03 |
| kernel/indexing_checked | 473417 | 463208 | 1.02 |
| kernel/launch | 7917 | 9527.666666666666 | 0.83 |
| metal/synchronization/stream | 14708.5 | 15125 | 0.97 |
| metal/synchronization/context | 15000 | 14834 | 1.01 |
| shared array/construct | 26336.75 | 24604.166666666668 | 1.07 |
| shared array/broadcast | 457916 | 461166 | 0.99 |
| shared array/random/randn/Float32 | 765895.5 | 738958.5 | 1.04 |
| shared array/random/randn!/Float32 | 629917 | 633292 | 0.99 |
| shared array/random/rand!/Int64 | 573375 | 561625 | 1.02 |
| shared array/random/rand!/Float32 | 590250 | 600416 | 0.98 |
| shared array/random/rand/Int64 | 758166.5 | 778375 | 0.97 |
| shared array/random/rand/Float32 | 575978.5 | 616000 | 0.94 |
| shared array/copyto!/gpu_to_gpu | 82917 | 79250 | 1.05 |
| shared array/copyto!/cpu_to_gpu | 82917 | 82084 | 1.01 |
| shared array/copyto!/gpu_to_cpu | 82167 | 82750 | 0.99 |
| shared array/accumulate/1d | 1336312 | 1335833 | 1.00 |
| shared array/accumulate/2d | 1364459 | 1388833 | 0.98 |
| shared array/iteration/findall/int | 1831542 | 1871833 | 0.98 |
| shared array/iteration/findall/bool | 1579875 | 1569500 | 1.01 |
| shared array/iteration/findfirst/int | 1380791 | 1396916 | 0.99 |
| shared array/iteration/findfirst/bool | 1369250 | 1367500 | 1.00 |
| shared array/iteration/scalar | 156750 | 154834 | 1.01 |
| shared array/iteration/logical | 2966333.5 | 2987020.5 | 0.99 |
| shared array/iteration/findmin/1d | 1469833 | 1477062.5 | 1.00 |
| shared array/iteration/findmin/2d | 1359604 | 1364708 | 1.00 |
| shared array/reductions/reduce/1d | 733771 | 731750 | 1.00 |
| shared array/reductions/reduce/2d | 668542 | 666250 | 1.00 |
| shared array/reductions/mapreduce/1d | 739792 | 736667 | 1.00 |
| shared array/reductions/mapreduce/2d | 666958.5 | 672459 | 0.99 |
| shared array/permutedims/4d | 2525854.5 | 2493333 | 1.01 |
| shared array/permutedims/2d | 1006499.5 | 1024646 | 0.98 |
| shared array/permutedims/3d | 1614312 | 1576667 | 1.02 |
| shared array/copy | 246541 | 244000 | 1.01 |

This comment was automatically generated by workflow using github-action-benchmark.

@@ -274,6 +274,11 @@ end
@device_override Base.trunc(x::Float32) = ccall("extern air.trunc.f32", llvmcall, Cfloat, (Cfloat,), x)
@device_override Base.trunc(x::Float16) = ccall("extern air.trunc.f16", llvmcall, Float16, (Float16,), x)

@static if Metal.is_macos(v"14")
Review comment (Member):

It's possible to target different Metal versions by passing kwargs to @metal, so I'm not sure we should hard-code a dependency on a specific macOS version. The idea was to do these branches, semantically at least, at run time using metal_version(). The failure branch could then simply throw an error. We might also want a GPUCompiler intrinsic that eagerly aborts compilation, to make such failures easier to debug. The current approach isn't great either: calling the intrinsic on an unsupported macOS version just triggers a MethodError.
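The run-time branching described above can be sketched in plain Julia so that it runs without a GPU. This is a hypothetical illustration, not Metal.jl code. The `version` argument stands in for what metal_version() would report. The v"3.1" threshold and the names intrinsic_nextafter and checked_nextafter are placeholders, not the actual requirements or API of the nextafter intrinsic:

```julia
# Hypothetical sketch of run-time version gating. `version` stands in for what
# metal_version() would report; the threshold below is a placeholder.
const NEXTAFTER_MIN_VERSION = v"3.1"  # placeholder, not a documented requirement

# CPU stand-in for the device intrinsic (C-style nextafter semantics).
function intrinsic_nextafter(x::T, y::T) where {T<:AbstractFloat}
    (isnan(x) || isnan(y)) && return T(NaN)
    x == y && return y
    return y > x ? nextfloat(x) : prevfloat(x)
end

# One method is always defined; which branch runs depends on the version.
function checked_nextafter(x::T, y::T, version::VersionNumber) where {T<:AbstractFloat}
    if version >= NEXTAFTER_MIN_VERSION
        return intrinsic_nextafter(x, y)
    end
    # Failure branch: a descriptive error instead of a MethodError caused by
    # the method never having been defined on older systems.
    error("nextafter requires version $(NEXTAFTER_MIN_VERSION) or newer, got $(version)")
end

checked_nextafter(1.0f0, 2.0f0, v"3.1") == nextfloat(1.0f0)  # true
```

Unlike `@static if`, which decides at definition time whether the method exists at all, this sketch keeps a single method and branches on the targeted version. That is what would let a caller target a different Metal version and still get a clear error message rather than a missing method.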

christiangnrd force-pushed the nextafter branch 3 times, most recently from 54e1adc to 43147a3 (February 3, 2025 19:08)
christiangnrd (Contributor, Author):

The CI error doesn't seem related to this PR:

ERROR: SystemError: opening file "/Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-macmini-aarch64-3.0/depots/5cd495a2-4a16-4674-ae02-c839447744bb/compiled/v1.11/Metal/ACDsk_nveup.ji": Permission denied

christiangnrd marked this pull request as draft on February 4, 2025 04:24
2 participants