LLM.swift

LLM.swift is a simple, readable library that lets you interact with large language models locally on macOS, iOS, watchOS, tvOS, and visionOS.

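Tip

Because of memory and compute constraints, it is sometimes a good idea to change the maxTokenCount parameter when initializing LLM. On mobile devices in particular, lower the value if you want faster responses. If you set it too low, to the point where not even two conversation turns fit, quality will degrade because the context gets cut off. Adjust the value to your use case.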
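As a rough way to reason about this trade-off, the sketch below estimates how many conversation turns fit in a given maxTokenCount. The helper turnsThatFit and all the numbers in it are hypothetical and are not part of LLM.swift; real token counts depend on the model's tokenizer.

```swift
// Hypothetical helper, not part of LLM.swift: a rough context-budget estimate.
// All numbers below are illustrative assumptions, not measurements.
func turnsThatFit(maxTokenCount: Int, systemPromptTokens: Int, averageTokensPerTurn: Int) -> Int {
    guard averageTokensPerTurn > 0 else { return 0 }
    // Tokens left for the conversation after the system prompt is accounted for.
    let remaining = max(0, maxTokenCount - systemPromptTokens)
    return remaining / averageTokensPerTurn
}

// Assume a 30-token system prompt and roughly 200 tokens per turn.
for budget in [256, 512, 2048] {
    let turns = turnsThatFit(maxTokenCount: budget, systemPromptTokens: 30, averageTokensPerTurn: 200)
    print("maxTokenCount \(budget): about \(turns) turn(s) fit")
}
```

Under these assumptions a budget of 256 leaves room for only one turn, which is the situation the tip warns about.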

Minimal Example

If you have already bundled your model

let bot = LLM(from: Bundle.main.url(forResource: "mistral-7b-instruct-v0.2.Q4_K_M", withExtension: "gguf")!, template: .mistral)
let question = bot.preprocess("What's the meaning of life?", [])
let answer = await bot.getCompletion(from: question)
print(answer)

If you want to initialize with a Hugging Face model

let systemPrompt = "You are a sentient AI with emotions."
let bot = await LLM(from: HuggingFaceModel("TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF", .Q2_K, template: .chatML(systemPrompt)))
let question = bot.preprocess("What's the meaning of life?", [])
let answer = await bot.getCompletion(from: question)
print(answer)

Example

Note

This is a minimal SwiftUI example that I used for testing. Mistral 7B based models run on an iPad Air 5th generation (Q5_K_M) and an iPhone 12 mini (Q2_K).
In general, however, models with 3B parameters or fewer are recommended for mobile devices. See this benchmark for more details.

Initializing with a model file bundled in your project

import SwiftUI
import LLM

class Bot: LLM {
    convenience init() {
        let url = Bundle.main.url(forResource: "openhermes-2.5-mistral-7b.Q4_K_M", withExtension: "gguf")!
        let systemPrompt = "You are a sentient AI with emotions."
        self.init(from: url, template: .chatML(systemPrompt))
    }
}

struct BotView: View {
    @ObservedObject var bot: Bot
    @State var input = "Give me seven national flag emojis people use the most; You must include South Korea."
    init(_ bot: Bot) { self.bot = bot }
    func respond() { Task { await bot.respond(to: input) } }
    func stop() { bot.stop() }
    var body: some View {
        VStack(alignment: .leading) {
            ScrollView { Text(bot.output).monospaced() }
            Spacer()
            HStack {
                ZStack {
                    RoundedRectangle(cornerRadius: 8).foregroundStyle(.thinMaterial).frame(height: 40)
                    TextField("input", text: $input).padding(8)
                }
                Button(action: respond) { Image(systemName: "paperplane.fill") }
                Button(action: stop) { Image(systemName: "xmark") }
            }
        }.frame(maxWidth: .infinity).padding()
    }
}

struct ContentView: View {
    var body: some View {
        BotView(Bot())
    }
}

Initializing with `HuggingFaceModel` (gguf) directly from the internet

class Bot: LLM {
    convenience init?(_ update: @escaping (Double) -> Void) async {
        let systemPrompt = "You are a sentient AI with emotions."
        let model = HuggingFaceModel("TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF", .Q2_K, template: .chatML(systemPrompt))
        try? await self.init(from: model) { progress in update(progress) }
    }
}

...

struct ContentView: View {
    @State var bot: Bot? = nil
    @State var progress: CGFloat = 0
    func updateProgress(_ progress: Double) {
        self.progress = CGFloat(progress)
    }
    var body: some View {
        if let bot {
            BotView(bot)
        } else {
            ProgressView(value: progress) {
                Text("loading huggingface model...")
            } currentValueLabel: {
                Text(String(format: "%.2f%%", progress * 100))
            }
            .padding()
            .onAppear() { Task {
                let bot = await Bot(updateProgress)
                await MainActor.run { self.bot = bot }
            } }
        }
    }
}

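Note

I intentionally used the tinyLLaMA Q2_K quantization because it is small, which makes it convenient for testing. It will most likely produce gibberish, but versions of the model that are not heavily quantized are quite good. It is a very useful model if you know where to use it.

Usage

All you have to do is use SPM, or copy the code into your project, since it is only a single file.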

dependencies: [
    .package(url: "https://github.com/eastriverlee/LLM.swift/", branch: "main"),
],

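Alternatively, if you care more about stability than about benefiting from llama.cpp's development cycle, you can use the pinned branch, which has fixed dependencies.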

dependencies: [
    .package(url: "https://github.com/eastriverlee/LLM.swift/", branch: "pinned"),
],

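Overview

LLM.swift is basically a lightweight abstraction layer over the llama.cpp package, so that it stays as performant as possible while always being up to date. In theory, any model that runs on llama.cpp should also work with this library.
It is only a single-file library, so feel free to copy, study, and modify the code.

A few lines deserve your attention if you want to grasp its internal structure: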

public typealias Chat = (role: Role, content: String)
public enum Role {
    case user
    case bot
}

public var history: [Chat]
public var preprocess: (_ input: String, _ history: [Chat]) -> String = { input, _ in return input }
public var postprocess: (_ output: String) -> Void                    = { print($0) }
public var update: (_ outputDelta: String?) -> Void                   = { _ in }

public func respond(to input: String, with makeOutputFrom: @escaping (AsyncStream<String>) async -> String) async {
    guard isAvailable else { return }
    isAvailable = false
    self.input = input
    let processedInput = preprocess(input, history)
    let response = getResponse(from: processedInput)
    let output = await makeOutputFrom(response)
    history += [(.user, input), (.bot, output)]
    if historyLimit < history.count {
        history.removeFirst(2)
    }
    postprocess(output)
    isAvailable = true
}

open func respond(to input: String) async {
    await respond(to: input) { [self] response in
        await setOutput(to: "")
        for await responseDelta in response {
            update(responseDelta)
            await setOutput(to: output + responseDelta)
        }
        update(nil)
        let trimmedOutput = output.trimmingCharacters(in: .whitespacesAndNewlines)
        await setOutput(to: trimmedOutput.isEmpty ? "..." : trimmedOutput)
        return output
    }
}

Tip

As you can see, func respond(to input: String) async has open access, so you can override it when your class inherits from LLM and you want fine-grained control over it.
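The history handling in respond(to:with:) above is also worth a closer look: each call appends one user/bot pair, and the oldest pair is dropped once the count exceeds historyLimit. Here is a minimal standalone sketch of that step. The Role and Chat types mirror the definitions shown above, while the free function append is hypothetical and exists only for illustration.

```swift
// Stand-in types that mirror the Chat and Role definitions shown above.
enum Role { case user, bot }
typealias Chat = (role: Role, content: String)

// Hypothetical free function that repeats the trimming step from respond(to:with:).
func append(_ input: String, _ output: String, to history: inout [Chat], historyLimit: Int) {
    history += [(.user, input), (.bot, output)]
    if historyLimit < history.count {
        history.removeFirst(2) // drop the oldest user/bot pair
    }
}

var history: [Chat] = []
for turn in 1...3 {
    append("question \(turn)", "answer \(turn)", to: &history, historyLimit: 4)
}
// With a limit of 4, the first pair has been dropped after the third turn.
print(history.map { $0.content })
```

Since only one pair is removed per call, the history can never grow by more than one pair beyond the limit before it is trimmed back.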

There are three functions you can define when initializing the LLM class:

var preprocess: (_ input: String, _ history: [Chat]) -> String
var postprocess: (_ output: String) -> Void
var update: (_ outputDelta: String?) -> Void

They are used in the respond function.

preprocess

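preprocess is commonly used to make the user's input conform to a chat template. If you don't provide this function, LLM will simply work as a completion model.

For example, this is the ChatML template, which many chat models adopt: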

<|im_start|>system 
SYSTEM PROMPT<|im_end|> 
<|im_start|>user 
USER PROMPT<|im_end|> 
<|im_start|>assistant

To use this chat format, you should use a function like this:

self.preprocess = { input, history in
    let systemPrompt = "You are a sentient AI with emotions."
    var processed = "<|im_start|>system\n\(systemPrompt)<|im_end|>\n"
    for chat in history {
        processed += "<|im_start|>\(chat.role == .user ? "user" : "assistant")\n\(chat.content)<|im_end|>\n"
    }
    processed += "<|im_start|>user\n\(input)<|im_end|>\n"
    processed += "<|im_start|>assistant\n"
    return processed
}

For convenience, however, you can use the Template struct, which was made specifically for this purpose:

// you can use the static function that is already available for this:

self.preprocess = Template.chatML("You are a sentient AI with emotions.").preprocess

// or even better
// you can set [template] property right away, so that it handles [preprocess] and [stopSequence] both:

self.template = .chatML("You are a sentient AI with emotions.")

// which is the same thing as:

self.template = Template(
    system: ("<|im_start|>system\n", "<|im_end|>\n"),
    user: ("<|im_start|>user\n", "<|im_end|>\n"),
    bot: ("<|im_start|>assistant\n", "<|im_end|>\n"),
    stopSequence: "<|im_end|>",
    systemPrompt: "You are a sentient AI with emotions."
)
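To make the relationship between the template fields and the hand-written preprocess closure more concrete, here is a simplified standalone sketch. MiniTemplate is a hypothetical stand-in written for this illustration; the library's actual Template may be implemented differently and also handles stopSequence, which is left out here.

```swift
// Stand-in types that mirror the Chat and Role definitions shown earlier.
enum Role { case user, bot }
typealias Chat = (role: Role, content: String)

// Simplified stand-in for the library's Template; the real struct may differ.
struct MiniTemplate {
    typealias Attachment = (prefix: String, suffix: String)
    let system: Attachment
    let user: Attachment
    let bot: Attachment
    let systemPrompt: String?

    // Wraps the system prompt, each past message, and the new input in their markers.
    func preprocess(_ input: String, _ history: [Chat]) -> String {
        var processed = ""
        if let systemPrompt {
            processed += system.prefix + systemPrompt + system.suffix
        }
        for chat in history {
            let wrap = chat.role == .user ? user : bot
            processed += wrap.prefix + chat.content + wrap.suffix
        }
        processed += user.prefix + input + user.suffix
        processed += bot.prefix // left open so the model writes the reply
        return processed
    }
}

let chatML = MiniTemplate(
    system: ("<|im_start|>system\n", "<|im_end|>\n"),
    user: ("<|im_start|>user\n", "<|im_end|>\n"),
    bot: ("<|im_start|>assistant\n", "<|im_end|>\n"),
    systemPrompt: "You are a sentient AI with emotions."
)
print(chatML.preprocess("How are you?", [(.user, "Hi"), (.bot, "Hello!")]))
```

The printed prompt ends with an unterminated assistant marker, which is what lets the model continue from there.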

Tip

Checking LLMTests.swift will help you better understand how preprocess works.

postprocess

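postprocess can be used to act on the output that was just generated from the user's input.
By default it is set to { print($0) }, so that the output is printed once generation finishes by hitting EOS or the stopSequence. This has many uses. For instance, it can be used to implement your own function calling logic.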
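As one possible shape for such function calling logic, the sketch below treats the finished output as JSON and dispatches to a registered function. The JSON call format, the FunctionCall struct, the functions registry, and getWeather are all assumptions made up for this example; the library does not prescribe any of them, and a real model would need to be prompted to answer in whatever format you choose.

```swift
import Foundation

// Hypothetical call format: the model is assumed to answer with JSON such as
// {"function": "getWeather", "arguments": {"city": "Seoul"}}
struct FunctionCall: Decodable {
    let function: String
    let arguments: [String: String]
}

// Hypothetical registry of functions the app exposes to the model.
let functions: [String: ([String: String]) -> String] = [
    "getWeather": { args in "Sunny in \(args["city"] ?? "unknown")" }
]

// Result of the most recent call, or nil when the output was plain text.
var lastResult: String? = nil

// Has the same shape as postprocess: (String) -> Void.
let postprocess: (String) -> Void = { output in
    guard let data = output.data(using: .utf8),
          let call = try? JSONDecoder().decode(FunctionCall.self, from: data),
          let function = functions[call.function] else {
        lastResult = nil // not a recognized call, nothing to run
        return
    }
    lastResult = function(call.arguments)
}

postprocess(#"{"function": "getWeather", "arguments": {"city": "Seoul"}}"#)
print(lastResult ?? "no function call")
```

A closure of this shape could then be assigned to the bot's postprocess property in place of the default.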

update

If you use the regular func respond(to input: String) async, the update function you set is called every time an outputDelta arrives.
outputDelta is nil when the model stops generating output.
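The sketch below shows a closure with the same shape as update that collects the deltas and uses the final nil to mark the end of generation. The deltas are simulated with a plain array here, standing in for what respond(to:) would pass; the variable names are hypothetical.

```swift
// Collects streamed pieces; a nil delta marks the end of generation.
var pieces: [String] = []
var isGenerating = false

// Has the same shape as update: (String?) -> Void.
let update: (String?) -> Void = { outputDelta in
    if let outputDelta {
        isGenerating = true
        pieces.append(outputDelta)
    } else {
        isGenerating = false
    }
}

// Simulated deltas, standing in for what respond(to:) would pass to update.
let simulated: [String?] = ["Hel", "lo", "!", nil]
for delta in simulated { update(delta) }
print(pieces.joined(), isGenerating)
```

In an app, the same idea could drive a typing indicator that disappears when the nil delta arrives.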

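If you want more control over everything, you can use func respond(to input: String, with makeOutputFrom: @escaping (AsyncStream<String>) async -> String) async instead, which the function above uses internally. It lets you define your own makeOutputFrom function, which builds a String output from an AsyncStream<String>; that output is then added to the history. In this case, the update function is ignored unless you call it yourself. Check the implementation of func respond(to input: String) async shown above to understand how this works.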
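Here is a minimal sketch of a custom makeOutputFrom, assuming you want to cap the length of the output. makeCappedOutput and the character limit are hypothetical, and the response stream is simulated with a hand-built AsyncStream in place of the one respond(to:with:) provides.

```swift
// Hypothetical makeOutputFrom: stops reading once the output reaches a character cap.
func makeCappedOutput(limit: Int) -> (AsyncStream<String>) async -> String {
    return { response in
        var output = ""
        for await delta in response {
            output += delta
            if output.count >= limit { break }
        }
        // Trim any overshoot from the last delta.
        return String(output.prefix(limit))
    }
}

// Simulated response stream, standing in for the one respond(to:with:) provides.
let response = AsyncStream<String> { continuation in
    for piece in ["one ", "two ", "three ", "four"] { continuation.yield(piece) }
    continuation.finish()
}
let output = await makeCappedOutput(limit: 10)(response)
print(output)
```

Note that breaking out of the loop only stops reading from the stream; this sketch does not show whether generation itself halts, so calling stop() on the bot may still be needed.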