Releases · JuliaGPU/Metal.jl

Metal v0.4.0

Released 22 May 14:25 by github-actions · tag v0.4.0 · commit 6047d50

Diff since v0.3.0

Closed issues:

Restore mtlcall (#17)
mapreduce has poor performance (#87)
Native code reflection (#95)
rand! with Bools sometimes fails in tests in 1.9 (#141)
LLVM assertion failures (#153)
Time macro similar to CUDA.@time (#160)
bug in rand!? (#162)
Why not support threadIdx().x, blockIdx().x, blockDim().x etc? (#163)
Incorrect(?) darwin version in 1.8 with Metal.versioninfo() (#179)

Merged pull requests:

Add native code reflection. (#96) (@maleadt)
Move MPSKernels into a dedicated file (#155) (@tgymnich)
[LU decomposition] Fix types (#156) (@tgymnich)
Update manifest (#161) (@github-actions[bot])
Implement Time macro (#164) (@christiangnrd)
Fix some references to CUDA (#165) (@christiangnrd)
Fix GPUArrays RNG interface implementation. (#166) (@maleadt)
Bump the LLVM back-end. (#169) (@maleadt)
Update manifest (#170) (@github-actions[bot])
Update manifest (#171) (@github-actions[bot])
Update manifest (#172) (@github-actions[bot])
Bump GPUCompiler to v0.20 (#173) (@christiangnrd)
Detect mapreduce threadgroup limits instead of guessing. (#176) (@maleadt)
Remove reference to no longer used library in README.md (#177) (@christiangnrd)
Report package versions as part of versioninfo() (#180) (@christiangnrd)
Fix Darwin version identification (#181) (@christiangnrd)
Topk for MPSMatrix (#182) (@christiangnrd)
Update manifest (#183) (@github-actions[bot])
Don't rely on thread adoption for command buffer callbacks. (#184) (@maleadt)

Contributors

time, maleadt, and 2 other contributors

Metal v0.3.0

Released 31 Mar 14:07 by github-actions · tag v0.3.0 · commit 20ba6a4

Diff since v0.2.0

Closed issues:

Migrate to metal C++? (#2)
Improved errors when calling device functions on CPU (#90)
Improve Objective-C interfacing (#104)
Rename grid to groups (#116)
Add functionality check helper (#121)
Inputting non-isbits types (#128)
@metal docstring out-of-date (#129)
mapreduce kernel uses too many threads (#132)
Powers don't work with complex floats (#142)

Merged pull requests:

Add contributing documentation (#93) (@max-Hawkins)
Reduce multiple consecutive values in each thread to improve efficiency (#112) (@maxwindiff)
Remove libcmt, use native ObjectiveC FFI (#117) (@maleadt)
Rename grid to groups (#119) (@habemus-papadum)
Audit MRR (#122) (@maleadt)
Faster in-place reduction by using broadcasting to initialize partial… (#123) (@maxwindiff)
Add MPS matrix decompositions (#124) (@tgymnich)
Minor documentation formatting (#125) (@asinghvi17)
Switch default mode to private storage (#126) (@christiangnrd)
Update manifest (#127) (@github-actions[bot])
Add some MtlArray docs (#130) (@christiangnrd)
Port MetalKernels (#131) (@maxwindiff)
Adapt to GPUCompiler 0.18. (#134) (@maleadt)
Support passing non-isbits arguments, as long as they're unused. (#135) (@maleadt)
Do not change grain size after pipeline creation (#136) (@maxwindiff)
Bump GPUArrays. (#137) (@maleadt)
Specialize GPUArrays' global_size query. (#139) (@maleadt)
Catch errors that happen during command buffer callbacks. (#140) (@maleadt)
Call the correct current_device() in reflection (#143) (@maxwindiff)
Error when calling device functions on CPU (#144) (@christiangnrd)
Implement MTLGPUFamily and use it to validate gpu (#146) (@christiangnrd)
Add functional() (#147) (@christiangnrd)
Update manifest (#148) (@github-actions[bot])
CompatHelper: add new compat entry for StaticArrays at version 1, (keep existing compat) (#151) (@github-actions[bot])
Update to LLVM.jl 5 and GPUCompiler 0.19. (#154) (@maleadt)

Contributors

maxwindiff, maleadt, and 6 other contributors

Metal v0.2.0

Released 03 Mar 13:00 by github-actions · tag v0.2.0 · commit fdab277

Diff since v0.1.2

Closed issues:

Threadgroup memory breaks on small datatypes (#26)
Int64 not supported on AMD GPUs? (#38)
Base.unsafe_convert is ambiguous (#42)
Support for multiple devices (#44)
Add CITATION file (#55)
XGBoost on Metal.jl (#82)
first try at metal (#84)
Copysign intrinsic possibly wrong (#89)
Metal.jl fails to precompile on Linux (#97)
Silent failure with unsupported(?) Intel Iris Graphics (#109)
I have 2 questions about Metal.jl and Flux.jl (#110)

Merged pull requests:

Update manifest (#57) (@github-actions[bot])
Add GPU profiling capabilities (#58) (@max-Hawkins)
Automatically detect if we need cmt build from source. (#59) (@maleadt)
Update manifest (#60) (@github-actions[bot])
Add queue kernel launch argument (#61) (@tgymnich)
Update manifest (#63) (@github-actions[bot])
Switch pipeline to juliaecosystem (#64) (@vchuravy)
Update manifest (#65) (@github-actions[bot])
Add a function for setting the current device (#66) (@maxwindiff)
Add documentation webpage (#67) (@max-Hawkins)
Wrap simdgroup matrix functions (#70) (@maxwindiff)
Support loading/saving simdgroup matrix from threadgroup memory (#71) (@maxwindiff)
Conditionalize the MtlDeviceArray element-type workaround. (#72) (@maleadt)
Add basic SIMD shuffle up/down (#73) (@max-Hawkins)
Update manifest (#74) (@github-actions[bot])
Optimize warp reduction for mapreduce (#75) (@max-Hawkins)
Specialize GPUArrays.global_index() to improve broadcast performance (#76) (@maxwindiff)
Update manifest (#78) (@github-actions[bot])
Add initial performance shader support (matmul) (#80) (@max-Hawkins)
Use Ninja to build cmt. (#81) (@maleadt)
Update manifest (#83) (@github-actions[bot])
Support Julia 1.9 (#85) (@maleadt)
Add queue parameter to unsafe_copyto (#88) (@tgymnich)
Update manifest (#91) (@github-actions[bot])
Add MPS tests. (#92) (@maleadt)
Support for writing binary archives (#94) (@maleadt)
Support precompilation and loading on non-Apple hardware (#98) (@maleadt)
Update manifest (#99) (@github-actions[bot])
Improve reduce performance by passing CartesianIndices and length statically (#100) (@maxwindiff)
Do not release objects that are autoreleased. (#102) (@habemus-papadum)
Fix path to cmt in the Hacking section of the README (#105) (@habemus-papadum)
Add example showing Metal and Gtk4 integration (#106) (@habemus-papadum)
Fix memory leak. (#107) (@habemus-papadum)
Add a mtl function for simple recursive data conversions. (#114) (@maleadt)
Write profile trace in the current folder. (#115) (@maleadt)