🤖 I have created a release *beep* *boop*

---

## [0.0.9](v0.0.8...v0.0.9) (2024-07-12)

### Bugfix

* fix the decode kernel segfault in cudagraph mode ([#368](https://github.com/flashinfer-ai/flashinfer/pull/368)) ([c69cfa](https://github.com/flashinfer-ai/flashinfer/commit/c69cfabc540e4a7edd991713df10d575ff3b0c21))
* fix decode kernels output for empty kv cache ([#363](https://github.com/flashinfer-ai/flashinfer/pull/363)) ([ac72b1](https://github.com/flashinfer-ai/flashinfer/commit/ac72b1cc14a6474d601f371c8d69e2600ac28d2f))
* check gpu id in PyTorch APIs and use input tensor's gpu default stream ([#361](https://github.com/flashinfer-ai/flashinfer/pull/361)) ([1b84fa](https://github.com/flashinfer-ai/flashinfer/commit/1b84fab3e4f53fb4fa26952fdb46fa8018634057))

### Performance Improvements

* accelerate alibi ([#365](#365)) ([4f0a9f9](4f0a9f9))
* accelerate gqa performance ([#356](#356)) ([e56ddad](e56ddad))
* Optimize tensor conversions in C++ code to avoid unnecessary copies ([#366](#366)) ([1116237](1116237))

### Acknowledgement

We thank [@Yard1](https://github.com/Yard1), [@Ying1123](https://github.com/Ying1123) and [@zhyncs](https://github.com/zhyncs) for their contributions.

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Zihao Ye <[email protected]>