Releases: JuliaGPU/Metal.jl
v0.4.0
Metal v0.4.0
Closed issues:
- Restore mtlcall (#17)
- mapreduce has poor performance (#87)
- Native code reflection (#95)
- rand! with Bools sometimes fails in tests in 1.9 (#141)
- LLVM assertion failures (#153)
- Time macro similar to CUDA.@time (#160)
- bug in rand!? (#162)
- Why not support threadIdx().x, blockIdx().x, blockDim().x etc? (#163)
- Incorrect(?) darwin version in 1.8 with Metal.versioninfo() (#179)
Merged pull requests:
- Add native code reflection. (#96) (@maleadt)
- Move MPSKernels into a dedicated file (#155) (@tgymnich)
- [LU decomposition] Fix types (#156) (@tgymnich)
- Update manifest (#161) (@github-actions[bot])
- Implement Time macro (#164) (@christiangnrd)
- Fix some references to CUDA (#165) (@christiangnrd)
- Fix GPUArrays RNG interface implementation. (#166) (@maleadt)
- Bump the LLVM back-end. (#169) (@maleadt)
- Update manifest (#170) (@github-actions[bot])
- Update manifest (#171) (@github-actions[bot])
- Update manifest (#172) (@github-actions[bot])
- Bump GPUCompiler to v0.20 (#173) (@christiangnrd)
- Detect mapreduce threadgroup limits instead of guessing. (#176) (@maleadt)
- Remove reference to no longer used library in README.md (#177) (@christiangnrd)
- Report package versions as part of versioninfo() (#180) (@christiangnrd)
- Fix Darwin version identification (#181) (@christiangnrd)
- Topk for MPSMatrix (#182) (@christiangnrd)
- Update manifest (#183) (@github-actions[bot])
- Don't rely on thread adoption for command buffer callbacks. (#184) (@maleadt)
v0.3.0
Metal v0.3.0
Closed issues:
- Migrate to metal C++? (#2)
- Improved errors when calling device functions on CPU (#90)
- Improve Objective-C interfacing (#104)
- Rename grid to groups (#116)
- Add functionality check helper (#121)
- Inputting non-isbits types (#128)
- @metal docstring out-of-date (#129)
- mapreduce kernel uses too many threads (#132)
- Powers don't work with complex floats (#142)
Merged pull requests:
- Add contributing documentation (#93) (@max-Hawkins)
- Reduce multiple consecutive values in each thread to improve efficiency (#112) (@maxwindiff)
- Remove libcmt, use native ObjectiveC FFI (#117) (@maleadt)
- Rename grid to groups (#119) (@habemus-papadum)
- Audit MRR (#122) (@maleadt)
- Faster in-place reduction by using broadcasting to initialize partial… (#123) (@maxwindiff)
- Add MPS matrix decompositions (#124) (@tgymnich)
- Minor documentation formatting (#125) (@asinghvi17)
- Switch default mode to private storage (#126) (@christiangnrd)
- Update manifest (#127) (@github-actions[bot])
- Add some MtlArray docs (#130) (@christiangnrd)
- Port MetalKernels (#131) (@maxwindiff)
- Adapt to GPUCompiler 0.18. (#134) (@maleadt)
- Support passing non-isbits arguments, as long as they're unused. (#135) (@maleadt)
- Do not change grain size after pipeline creation (#136) (@maxwindiff)
- Bump GPUArrays. (#137) (@maleadt)
- Specialize GPUArrays' global_size query. (#139) (@maleadt)
- Catch errors that happen during command buffer callbacks. (#140) (@maleadt)
- Call the correct current_device() in reflection (#143) (@maxwindiff)
- Error when calling device functions on CPU (#144) (@christiangnrd)
- Implement MTLGPUFamily and use it to validate gpu (#146) (@christiangnrd)
- Add functional() (#147) (@christiangnrd)
- Update manifest (#148) (@github-actions[bot])
- CompatHelper: add new compat entry for StaticArrays at version 1, (keep existing compat) (#151) (@github-actions[bot])
- Update to LLVM.jl 5 and GPUCompiler 0.19. (#154) (@maleadt)
v0.2.0
Metal v0.2.0
Closed issues:
- Threadgroup memory breaks on small datatypes (#26)
- Int64 not supported on AMD GPUs? (#38)
- Base.unsafe_convert is ambiguous (#42)
- Support for multiple devices (#44)
- Add CITATION file (#55)
- XGBoost on Metal.jl (#82)
- first try at metal (#84)
- Copysign intrinsic possibly wrong (#89)
- Metal.jl fails to precompile on Linux (#97)
- Silent failure with unsupported(?) Intel Iris Graphics (#109)
- I have 2 question about Metal.jl and Flux.jl (#110)
Merged pull requests:
- Update manifest (#57) (@github-actions[bot])
- Add GPU profiling capabilities (#58) (@max-Hawkins)
- Automatically detect if we need cmt build from source. (#59) (@maleadt)
- Update manifest (#60) (@github-actions[bot])
- Add queue kernel launch argument (#61) (@tgymnich)
- Update manifest (#63) (@github-actions[bot])
- Switch pipeline to juliaecosystem (#64) (@vchuravy)
- Update manifest (#65) (@github-actions[bot])
- Add a function for setting the current device (#66) (@maxwindiff)
- Add documentation webpage (#67) (@max-Hawkins)
- Wrap simdgroup matrix functions (#70) (@maxwindiff)
- Support loading/saving simdgroup matrix from threadgroup memory (#71) (@maxwindiff)
- Conditionalize the MtlDeviceArray element-type workaround. (#72) (@maleadt)
- Add basic SIMD shuffle up/down (#73) (@max-Hawkins)
- Update manifest (#74) (@github-actions[bot])
- Optimize warp reduction for mapreduce (#75) (@max-Hawkins)
- Specialize GPUArrays.global_index() to improve broadcast performance (#76) (@maxwindiff)
- Update manifest (#78) (@github-actions[bot])
- Add initial performance shader support (matmul) (#80) (@max-Hawkins)
- Use Ninja to build cmt. (#81) (@maleadt)
- Update manifest (#83) (@github-actions[bot])
- Support Julia 1.9 (#85) (@maleadt)
- Add queue parameter to unsafe_copyto (#88) (@tgymnich)
- Update manifest (#91) (@github-actions[bot])
- Add MPS tests. (#92) (@maleadt)
- Support for writing binary archives (#94) (@maleadt)
- Support precompilation and loading on non-Apple hardware (#98) (@maleadt)
- Update manifest (#99) (@github-actions[bot])
- Improve reduce performance by passing CartesianIndices and length statically (#100) (@maxwindiff)
- Do not release objects that are autoreleased. (#102) (@habemus-papadum)
- Fix path to cmt in the Hacking section of the README (#105) (@habemus-papadum)
- Add example showing Metal and Gtk4 integration (#106) (@habemus-papadum)
- Fix memory leak. (#107) (@habemus-papadum)
- Add a mtl function for simple recursive data conversions. (#114) (@maleadt)
- Write profile trace in the current folder. (#115) (@maleadt)
v0.1.2
Metal v0.1.2
Closed issues:
- installation issue (libz.1.dylib not found) [+workaround] (#51)
- Optimally choosing threads and grid (#54)
Merged pull requests:
- Use Base.active_project. (#43) (@maleadt)
- Update manifest (#45) (@github-actions[bot])
- Add aliases MtlVector and MtlMatrix (#48) (@amontoison)
- Update manifest (#49) (@github-actions[bot])
- Wrap at-metal's output in a let block. (#50) (@maleadt)
- Update manifest (#52) (@github-actions[bot])
- Update manifest (#56) (@github-actions[bot])
v0.1.1
Metal v0.1.1
Closed issues:
- Super slow broadcast (#39)
Merged pull requests:
- Fix typos in unified memory example (#37) (@pitmonticone)
- Fix the launch heuristic. (#40) (@maleadt)
v0.1.0
Metal v0.1.0
v0.0.1
Metal v0.0.1
Closed issues:
- error when using (#1)
- Argument buffer encoding is fragile (#5)
- LLVMType of MtlDeviceArray needs changing/manipulation (#6)
- Errors running on M1 Max (#14)
- I get this, my name isn't Tim (#16)
- Thanks for the previous fix - had a go (#18)
- Custom IR verification (#25)
- cmt: Release build fails install (#27)
Merged pull requests:
- Add device_code_metallib macro (#3) (@max-Hawkins)
- Update README (#8) (@max-Hawkins)
- Implement GPUArrays launch heuristic (#9) (@max-Hawkins)
- Add docstrings (#12) (@max-Hawkins)
- Rework metadata generation (#13) (@maleadt)
- Add CI (#19) (@maleadt)
- Use sw_vers to query the macOS version. (#20) (@maleadt)
- Updates for macOS 13 (Ventura); use bindless argument buffers (#23) (@maleadt)
- Enable the GPUArrays test suite (#24) (@maleadt)
- Use cmt from pre-built JLL. (#28) (@maleadt)
- Package updates (#29) (@maleadt)
- First test with a locally-built cmt. (#30) (@maleadt)
- Use labels to determine whether to build local deps. (#31) (@maleadt)
- Bump GPUArrays. (#32) (@maleadt)
- MTL wrapper clean-ups (#33) (@maleadt)