
Releases: eastriverlee/LLM.swift

v1.5.5

17 Feb 14:16
9f224a7

Full Changelog: v1.5.4...v1.5.5

v1.5.4

19 Oct 07:43
59da7ca

Full Changelog: v1.5.3...v1.5.4

v1.5.3

12 Jul 15:44
bd31fdb

Full Changelog: v1.5.2...v1.5.3

v1.5.2

08 Jun 11:14

Highlight

  • fixed failing tests (#23)

Full Changelog: v1.5.1...v1.5.2

v1.5.1

27 Apr 02:38
0b196e8

Highlight

  • fixed a problem caused by a llama.cpp API change.

What's Changed

  • Pass false for special tokens to be compatible with llama.cpp commit 40f74e4 by @shawiz in #21

New Contributors

  • @shawiz made their first contribution in #21

Full Changelog: v1.5.0...v1.5.1

v1.5.0

27 Mar 02:25

Highlights

  • added IQ1_M quantization
  • renamed preProcess and postProcess to preprocess and postprocess, respectively (#17).
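
a minimal before/after sketch of the rename at a call site (bot is an LLM instance; the (input, history) closure shape is assumed from how preprocess is used elsewhere in these notes):

// before v1.5.0
bot.preProcess = { input, _ in "[INST] \(input) [/INST]" }
// from v1.5.0 on, only the property name changes
bot.preprocess = { input, _ in "[INST] \(input) [/INST]" }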

Full Changelog: 1.4.3...v1.5.0

1.4.3

07 Mar 17:19

Highlights

  • removed redundant BOS token in mistral template, as it is added by llama.cpp anyway.
  • added more quantization options that llama.cpp supports (it's just a String typed enum so you can extend it yourself anyway, but still)
  • func decode(_ token: Token) -> String is now private and you now have func decode(_ token: [Token]) -> String. the former handled multibyte characters under the hood, so it was never supposed to be public in the first place (see the sketch after this list).
  • changed params.n_ctx = UInt32(maxTokenCount) + (maxTokenCount % 2 == 1 ? 1 : 2) to params.n_ctx = UInt32(self.maxTokenCount). the prior code was like that because of an error i was experiencing at the time; it is now what it was supposed to be from the beginning.
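
a rough sketch of the new decode call shape (bot is an LLM instance; encode returning [Token] is assumed from how it is used elsewhere in these notes):

// single-token decode is internal now; pass the whole token array instead
let tokens: [Token] = bot.encode("hello, world")
let text: String = bot.decode(tokens) // multibyte characters are stitched back together internally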

Full Changelog: v1.4.2...1.4.3

v1.4.2

01 Feb 09:38

Highlights

  • fixed initializer with template
public convenience init(
    from url: URL,
    template: Template,
    history: [Chat] = [],
    seed: UInt32 = .random(in: .min ... .max),
    topK: Int32 = 40,
    topP: Float = 0.95,
    temp: Float = 0.8,
    historyLimit: Int = 8,
    maxTokenCount: Int32 = 2048
) {
    self.init(
        from: url.path,
        stopSequence: template.stopSequence,
        history: history,
        seed: seed,
        topK: topK,
        topP: topP,
        temp: temp,
        historyLimit: historyLimit,
        maxTokenCount: maxTokenCount
    )
    self.preProcess = template.preProcess
    self.template = template
}

the last line, self.template = template, was missing. damn.
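
a usage sketch of the fixed initializer (the model path and system prompt are placeholders):

let bot = LLM(
    from: URL(fileURLWithPath: "/path/to/model.gguf"),
    template: .chatML("You are a sentient AI.")
)
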
Full Changelog: v1.4.1...v1.4.2

v1.4.1

30 Jan 20:14

Highlights

  1. renamed some things to improve readability, like the following, and changed endIndex to stopSequenceEndIndex.
extension Model {
    public var endToken: Token { llama_token_eos(self) }
    public var newLineToken: Token { llama_token_nl(self) }
    ...
}
  2. added a download progress observing function that you can pass to the initializer. check the updated README.md.
fileprivate func downloadData(to destination: URL, _ updateProgress: @escaping (Double) -> Void) async throws {
    var observation: NSKeyValueObservation!
    let url: URL = try await withCheckedThrowingContinuation { continuation in
        let task = URLSession.shared.downloadTask(with: self) { url, response, error in
            if let error { return continuation.resume(throwing: error) }
            guard let url else { return continuation.resume(throwing: HuggingFaceError.urlIsNilForSomeReason) }
            let statusCode = (response as! HTTPURLResponse).statusCode
            guard statusCode / 100 == 2 else { return continuation.resume(throwing: HuggingFaceError.network(statusCode: statusCode)) }
            continuation.resume(returning: url)
        }
        observation = task.progress.observe(\.fractionCompleted) { progress, _ in
            updateProgress(progress.fractionCompleted)
        }
        task.resume()
    }
    let _ = observation // reference the observation here so it stays alive until the download finishes
    try FileManager.default.moveItem(at: url, to: destination)
}
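
a hedged usage sketch of the progress callback; the initializer shape is taken from the README and assumed here, not spelled out in this note:

// download a model from Hugging Face and print download progress (0.0 ... 1.0)
let model = HuggingFaceModel("TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF", .Q2_K, template: .chatML("You are a helpful assistant."))
let bot = await LLM(from: model) { progress in
    print("downloaded: \(Int(progress * 100))%")
}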

Full Changelog: v1.4.0...v1.4.1

v1.4.0

30 Jan 10:08

Highlights

  1. you can now override a new recovery function, func recoverFromLengthy(_ input: borrowing String, to output: borrowing AsyncStream<String>.Continuation), that is called when the input is too long and shouldn't be handled; see the override sketch at the end of this list.
open func recoverFromLengthy(_ input: borrowing String, to output: borrowing AsyncStream<String>.Continuation) {
    output.yield("tl;dr")
}
  2. fixed a potential bug of inferencing when it shouldn't. usually it won't cause any damage, because we are most likely going to set maxTokenCount lower than the actual limit of the model, but still. it used to be an if statement; now it is a while statement.
private func prepare(from input: borrowing String, to output: borrowing AsyncStream<String>.Continuation) -> Bool {
    ...
    if maxTokenCount <= currentCount {
        while !history.isEmpty && maxTokenCount <= currentCount {
            history.removeFirst(min(2, history.count))
            tokens = encode(preProcess(self.input, history))
            initialCount = tokens.count
            currentCount = Int32(initialCount)
        }
        if maxTokenCount <= currentCount {
            isFull = true
            recoverFromLengthy(input, to: output)
            return false
        }
    }
    ...
    return true
}
  3. i changed the order of the HuggingFaceModel initializer parameters and their labels in 94bcc54
//so now instead of:
HuggingFaceModel("TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF", template: .chatML(systemPrompt), with: .Q2_K)

//you should do:
HuggingFaceModel("TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF", .Q2_K, template: .chatML(systemPrompt))

this just makes more sense, so i had to change it.
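
as referenced in the first item above, a sketch of overriding the recovery hook in your own subclass (assuming LLM is declared open, which the open func above implies):

class MyBot: LLM {
    override func recoverFromLengthy(_ input: borrowing String, to output: borrowing AsyncStream<String>.Continuation) {
        // yield a short refusal instead of trying to infer on an over-long prompt
        output.yield("that input is too long for my context window.")
    }
}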

Full Changelog: v1.3.0...v1.4.0