Metal v0.2.0
Closed issues:
- Threadgroup memory breaks on small datatypes (#26)
- Int64 not supported on AMD GPUs? (#38)
- Base.unsafe_convert is ambiguous (#42)
- Support for multiple devices (#44)
- Add CITATION file (#55)
- XGBoost on Metal.jl (#82)
- first try at metal (#84)
- Copysign intrinsic possibly wrong (#89)
- Metal.jl fails to precompile on Linux (#97)
- Silent failure with unsupported(?) Intel Iris Graphics (#109)
- I have 2 questions about Metal.jl and Flux.jl (#110)
Merged pull requests:
- Update manifest (#57) (@github-actions[bot])
- Add GPU profiling capabilities (#58) (@max-Hawkins)
- Automatically detect if we need to build cmt from source. (#59) (@maleadt)
- Update manifest (#60) (@github-actions[bot])
- Add queue kernel launch argument (#61) (@tgymnich)
- Update manifest (#63) (@github-actions[bot])
- Switch pipeline to juliaecosystem (#64) (@vchuravy)
- Update manifest (#65) (@github-actions[bot])
- Add a function for setting the current device (#66) (@maxwindiff)
- Add documentation webpage (#67) (@max-Hawkins)
- Wrap simdgroup matrix functions (#70) (@maxwindiff)
- Support loading/saving simdgroup matrix from threadgroup memory (#71) (@maxwindiff)
- Conditionalize the MtlDeviceArray element-type workaround. (#72) (@maleadt)
- Add basic SIMD shuffle up/down (#73) (@max-Hawkins)
- Update manifest (#74) (@github-actions[bot])
- Optimize warp reduction for mapreduce (#75) (@max-Hawkins)
- Specialize GPUArrays.global_index() to improve broadcast performance (#76) (@maxwindiff)
- Update manifest (#78) (@github-actions[bot])
- Add initial performance shader support (matmul) (#80) (@max-Hawkins)
- Use Ninja to build cmt. (#81) (@maleadt)
- Update manifest (#83) (@github-actions[bot])
- Support Julia 1.9 (#85) (@maleadt)
- Add queue parameter to unsafe_copyto (#88) (@tgymnich)
- Update manifest (#91) (@github-actions[bot])
- Add MPS tests. (#92) (@maleadt)
- Support for writing binary archives (#94) (@maleadt)
- Support precompilation and loading on non-Apple hardware (#98) (@maleadt)
- Update manifest (#99) (@github-actions[bot])
- Improve reduce performance by passing CartesianIndices and length statically (#100) (@maxwindiff)
- Do not release objects that are autoreleased. (#102) (@habemus-papadum)
- Fix path to cmt in Hacking section of the README (#105) (@habemus-papadum)
- Add example showing Metal and Gtk4 integration (#106) (@habemus-papadum)
- Fix memory leak. (#107) (@habemus-papadum)
- Add a mtl function for simple recursive data conversions. (#114) (@maleadt)
- Write profile trace in the current folder. (#115) (@maleadt)