Releases: meta-llama/llama-stack-client-kotlin
v0.0.58
In this release, we build upon our major update to the Llama Stack Kotlin Library supporting local and remote inference on Android apps (v0.0.54.1). This update introduces several key features: support for Llama Stack server v0.0.58, tool calling, and response streaming for remote inference.
Local Inference Support
- Single and multiple custom tool calling
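For illustration, here is a minimal Kotlin sketch of declaring one custom tool and reading back the model's tool call. The ToolDefinition builder, .tools(...), and .toolCalls() names are hedged assumptions, not the confirmed API; the README and demo app show the real builders.

```kotlin
// Hedged sketch of single custom tool calling. ToolDefinition, .tools(...),
// and .toolCalls() are assumed names for illustration only.
val weatherTool = ToolDefinition.builder()
    .toolName("get_weather")                       // hypothetical tool name
    .description("Return the current weather for a city")
    .parameters(mapOf("city" to "string"))         // hypothetical schema shape
    .build()

val response = client.inference().chatCompletion(  // `client` built as usual
    InferenceChatCompletionParams.builder()
        .modelId("meta-llama/Llama-3.2-3B-Instruct")
        .messages(messages)                        // your existing chat history
        .tools(listOf(weatherTool))                // pass one or more custom tools
        .build()
)

// If the model chose to call the tool, dispatch it in app code.
response.completionMessage().toolCalls().forEach { call ->
    println("tool requested: ${call.toolName()} with args ${call.arguments()}")
}
```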
Remote Inference Support
- Enabled remote support with Llama Stack server v0.0.58
- Fixed a type referencing error in the SDK
- Response streaming
- Custom tool calling is supported in non-streaming cases but not yet in streaming cases
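For illustration, here is a hedged sketch of consuming a streamed remote response. The chatCompletionStreaming call and the event accessors are assumptions based on the SDK's general shape; check the README for the exact API.

```kotlin
import com.llama.llamastack.client.okhttp.LlamaStackClientOkHttpClient

// Hedged sketch of remote response streaming against a Llama Stack server.
val client = LlamaStackClientOkHttpClient.builder()
    .baseUrl("http://localhost:5000")   // your Llama Stack server (v0.0.58)
    .build()

// chatCompletionStreaming(...) and chunk.event().delta() are assumed names;
// `params` is your InferenceChatCompletionParams built as usual.
client.inference().chatCompletionStreaming(params).use { stream ->
    stream.asSequence().forEach { chunk ->
        print(chunk.event().delta())    // render partial text as it arrives
    }
}
```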
Build Support
- Modified build-libs.sh to clean old JARs and other artifacts before a build to avoid stale outputs.
Supporting Models
The models supported in the app vary based on whether you are doing remote or local inferencing.
Remote Model Support
For remote usage, the app supports all models that the Llama Stack server backend supports, ranging from the lightweight 1B Llama models to the massive 405B models.
Local Model Support
For on-device usage, the following models are supported:
- Llama 3.2 Quantized 1B/3B
- Llama 3.2 1B/3B in BF16
- Llama 3.1 8B Quantized
- Llama 3 8B Quantized
Framework: ExecuTorch (commit: 0a12e33)
Getting Started
- Pointer to an Android demo app that developers can use to get started (link with tag)
- Quick start instructions on how to add the Kotlin SDK to their Android app (see the dependency snippet after this list)
- Instructions on how a power developer can contribute to the Kotlin SDK, debug it, or just experiment with it to learn more.
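To make the quick start concrete: pulling the SDK into an app module typically amounts to a single Gradle dependency. The coordinates below are a hedged example; verify the group, artifact, and latest version against the README.

```kotlin
// build.gradle.kts (app module) — coordinates are a hedged example.
dependencies {
    implementation("com.llama.llamastack:llama-stack-client-kotlin:0.0.58")
}
```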
If you have any questions, feel free to raise an issue and we’d be happy to help!
What’s Next?
This is only the beginning of enabling Llama Stack features on Android devices. We will continue to expand Llama Stack’s capabilities to new use cases and applications! Specifically, we plan to focus on:
- Agentic workflow with streaming
- Image and Speech reasoning
- Local/on-device agentic components like memory banks
- Examples with RAG usage
Stay tuned on future releases and updates!
Contributors
In alphabetical order: @cmodi-meta, @dltn, @Riandy, @WuhanMonkey, @yanxi0830, and a big thank you to the ExecuTorch team.
v0.0.54.1
We are excited to announce a major update to the Llama Stack Kotlin Library, which now supports both local and remote inferencing on Android apps. Building on the existing remote inferencing capabilities, this release introduces significant changes to enable seamless local inferencing integration and gives developers more flexibility in their AI workflows.
Release v0.0.54.1 includes the local modules as part of the Kotlin Library dependency in Maven.
Local Inference Support
- Leverage the ExecuTorch framework (commit: 0a12e33) for on-device inferencing.
- Script for downloading the ExecuTorch AAR file.
- Allow passing various configurations from the Android app: .pte model file, tokenizer file, sequence length, and temperature (see the sketch after this list).
- Send stats metrics from ExecuTorch (tok/sec).
- Handle prompt formatting based on model.
- Support conversational history.
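For illustration, here is a hedged sketch of wiring those configurations into the local, ExecuTorch-backed client; the builder setters follow the demo app's general shape but should be verified against the README.

```kotlin
import com.llama.llamastack.client.local.LlamaStackClientLocalClient

// Hedged sketch: point the local client at the exported .pte model and its
// tokenizer, and set sampling options. Builder names are assumptions; the
// demo app and README show the exact API (sequence length may be a similar
// builder option).
val localClient = LlamaStackClientLocalClient.builder()
    .modelPath("/data/local/tmp/llama/llama3_2_3b.pte")      // hypothetical path
    .tokenizerPath("/data/local/tmp/llama/tokenizer.model")  // hypothetical path
    .temperature(0.7f)
    .build()
```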
Remote Inference Support
- Enabled remote support with Llama Stack server v0.0.54
- Fixed lib compile issues caused by invalid types in Stainless-autogenerated code (link and link)
Supporting Models
The models supported in the app vary based on whether you are doing remote or local inferencing.
Remote Model Support
For remote usage, the app supports all models that the Llama Stack server backend supports, ranging from the lightweight 1B Llama models to the massive 405B models.
Local Model Support
For on-device usage, the following models are supported:
- Llama 3.2 Quantized 1B/3B
- Llama 3.2 1B/3B in BF16
- Llama 3.1 8B Quantized
- Llama 3 8B Quantized
Getting Started
- Pointer to an Android demo app (note tag: android-0.0.54.1)
- Quick start instructions on how to add the Kotlin SDK to their Android app
- Instructions on how a power developer can contribute to the Kotlin SDK, debug it, or just experiment with it to learn more.
If you have any questions, feel free to raise an issue and we’d be happy to help!
What’s Next?
This is only the beginning of enabling Llama Stack features on Android devices. We will continue to expand Llama Stack’s capabilities to new use cases and applications! Specifically, we plan to focus on:
- Agentic workflow with streaming
- Image and Speech reasoning
- Local/on-device agentic components like memory banks
- Examples with RAG usage
Stay tuned on future releases and updates!
Contributors
@cmodi-meta, @dltn, @Riandy, @WuhanMonkey, @yanxi0830, and a big thank you to the ExecuTorch team.
v0.0.54
We are excited to announce a major update to the Llama Stack Kotlin Library, which now supports both local and remote inferencing on Android apps. Building on the existing remote inferencing capabilities, this release introduces significant changes to enable seamless local inferencing integration and gives developers more flexibility in their AI workflows.
Local Inference Support
- Leverage the ExecuTorch framework (commit: 0a12e33) for on-device inferencing.
- Script for downloading the ExecuTorch AAR file.
- Allow passing various configurations from the Android app: .pte model file, tokenizer file, sequence length, and temperature.
- Send stats metrics from ExecuTorch (tok/sec).
- Handle prompt formatting based on model.
- Support conversational history.
Remote Inference Support
- Enabled remote support with Llama Stack server v0.0.54
- Fixed lib compile issues caused by invalid types in Stainless-autogenerated code (link and link)
Supporting Models
The models supported in the app vary based on whether you are doing remote or local inferencing.
Remote Model Support
For remote usage, the app supports all models that the Llama Stack server backend supports, ranging from the lightweight 1B Llama models to the massive 405B models.
Local Model Support
For on-device usage, the following models are supported:
- Llama 3.2 Quantized 1B/3B
- Llama 3.2 1B/3B in BF16
- Llama 3.1 8B Quantized
- Llama 3 8B Quantized
Getting Started
- Pointer to an Android demo app
- Quick start instructions on how to add the Kotlin SDK to their Android app
- Instructions on how a power developer can contribute to the Kotlin SDK, debug it, or just experiment with it to learn more.
If you have any questions, feel free to raise an issue and we’d be happy to help!
What’s Next?
This is only the beginning of enabling Llama Stack features on Android devices. We will continue to expand Llama Stack’s capabilities to new use cases and applications! Specifically, we plan to focus on:
- Agentic workflow with streaming
- Image and Speech reasoning
- Local/on-device agentic components like memory banks
- Examples with RAG usage
Stay tuned on future releases and updates!
Contributors
@cmodi-meta, @dltn, @Riandy, @WuhanMonkey, @yanxi0830, and a big thank you to the ExecuTorch team.