Describe the bug
Inference latency is a lot higher when using LLM.swift than when running the same model through LM Studio: roughly 2x the time to first token and about 5x the latency per token.
To Reproduce
You must include minimal code that can reproduce the behavior, for example:
import SwiftUI
import LLM
class ChatBot: LLM {
    convenience init() {
        let url = Bundle.main.url(forResource: "gemma-2-2b-it-Q8_0", withExtension: "gguf")!
        let systemPrompt = "you are helpful, highly intelligent assistant!"
        self.init(from: url, template: .chatML(systemPrompt))
    }
}

struct ChatView: View {
    @ObservedObject var bot: ChatBot
    @State var input = "Give me seven national flag emojis people use the most; You must include South Korea."
    init(_ bot: ChatBot) { self.bot = bot }
    func respond() { Task { await bot.respond(to: input) } }
    func stop() { bot.stop() }
    var body: some View {
        VStack(alignment: .leading) {
            ScrollView { Text(bot.output).monospaced() }
            Spacer()
            HStack {
                ZStack {
                    RoundedRectangle(cornerRadius: 8)
                        .foregroundStyle(.thinMaterial)
                        .frame(height: 40)
                    TextField("input", text: $input)
                        .padding(8)
                }
                Button(action: respond) { Image(systemName: "paperplane.fill") }
                Button(action: stop) { Image(systemName: "xmark") }
            }
        }
        .frame(maxWidth: .infinity)
        .padding()
    }
}
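A rough way to time a full response, using only the respond(to:) and output members from the code above; whitespace word count is only a crude stand-in for token count, so treat the numbers as approximate:

import Foundation

// Rough wall-clock timing of one full response; this is a sketch,
// not how LM Studio or LLM.swift compute their own statistics.
func measureLatency(of bot: ChatBot, prompt: String) async {
    let start = Date()
    await bot.respond(to: prompt)
    let elapsed = Date().timeIntervalSince(start)
    // Whitespace-separated word count as a crude proxy for token count.
    let words = max(bot.output.split(whereSeparator: \.isWhitespace).count, 1)
    print("total: \(String(format: "%.2f", elapsed))s, ~\(String(format: "%.3f", elapsed / Double(words)))s per word")
}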
Expected behavior
As both run on llama.cpp, I would expect the latency to be comparable.
Screenshots
If applicable, add screenshots to help explain your problem.
Desktop (please complete the following information):
Chip: [e.g. Apple M1]
Memory: [e.g. 16GB]
OS: [e.g. macOS 14.0]
Additional context
I tried making the inference settings identical as well, but it did not help; latency was still significantly slower. Am I missing anything here?
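For reference, a sketch of how matching sampling settings might be passed to the initializer; the parameter names below (topK, topP, temp, maxTokenCount) are assumed from the library's convenience initializer and may not match the exact signature of the release you are on:

// Sketch only: parameter names and defaults are assumptions, verify against
// the LLM.swift initializer in your installed version.
class TunedChatBot: LLM {
    convenience init() {
        let url = Bundle.main.url(forResource: "gemma-2-2b-it-Q8_0", withExtension: "gguf")!
        self.init(
            from: url,
            template: .chatML("you are helpful, highly intelligent assistant!"),
            topK: 40,            // match LM Studio's top-k
            topP: 0.95,          // match LM Studio's top-p
            temp: 0.8,           // match LM Studio's temperature
            maxTokenCount: 2048  // context / generation limit used in LM Studio
        )
    }
}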