LanguageModelExecutor

A protocol that defines the interface for responding to session requests.

Declaration

protocol LanguageModelExecutor : Sendable

Overview

An executor bridges the framework's types and the system that actually generates tokens, such as a server API or a local inference engine. Each LanguageModel pairs with exactly one executor type, and the framework instantiates the executor from the Configuration the model provides.
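
The pairing described above can be pictured with a pair of protocols linked by associated types. This is only a sketch: `SketchExecutor`, `SketchModel`, `RemoteExecutor`, `RemoteModel`, `executorConfiguration`, and `makeExecutor(for:)` are hypothetical names invented for illustration and are not the framework's declarations.

```swift
// Hypothetical stand-ins that mirror the pairing described above.
// None of these are framework declarations.
protocol SketchExecutor: Sendable {
    associatedtype Configuration: Sendable
    init(configuration: Configuration)
}

protocol SketchModel {
    // Each model type names exactly one executor type.
    associatedtype Executor: SketchExecutor
    // The model supplies the configuration its executor is built from.
    var executorConfiguration: Executor.Configuration { get }
}

struct RemoteExecutor: SketchExecutor {
    struct Configuration: Sendable {
        var endpoint: String
    }
    let configuration: Configuration
    init(configuration: Configuration) {
        self.configuration = configuration
    }
}

struct RemoteModel: SketchModel {
    typealias Executor = RemoteExecutor
    var executorConfiguration: RemoteExecutor.Configuration
}

// Plays the role of the framework: builds the executor from the
// configuration that the model provides.
func makeExecutor<M: SketchModel>(for model: M) -> M.Executor {
    M.Executor(configuration: model.executorConfiguration)
}
```

Because the executor type is fixed by the model type, the hypothetical `makeExecutor(for:)` needs no runtime lookup; the compiler resolves which initializer to call.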

Every request can include preferences that control generation:

GenerationOptions

Configures the sampling strategy, temperature, and maximum response length.

ContextOptions

Configures the prompting behavior and thinking effort.

When the framework calls respond(to:model:streamingInto:), convert the transcript into the format your model expects and apply the generation options. If your model can’t do exactly what the request asks, fall back to the closest behavior it supports, such as using temperature to approximate a sampling option:

// Parse generation and context options
func respond(
    to request: LanguageModelExecutorGenerationRequest,
    model: MyLanguageModel,
    streamingInto channel: LanguageModelExecutorGenerationChannel
) async throws {

    // The request sets sampling to `greedy`, but your
    // model only supports temperature.
    if request.generationOptions.samplingMode == .greedy {
        // Use a temperature of `0` to approximate the intent.
    }

    // ...
}
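
The fallback in the preceding example could be factored into a small mapping function. The following is a minimal sketch that uses stand-in types rather than the framework's own: `SketchSamplingMode`, `BackendParameters`, and `backendParameters(for:requestedTemperature:)` are hypothetical names, and the default temperature of `1.0` is an assumption about an imagined backend.

```swift
// Hypothetical stand-in for the requested sampling mode; not a framework type.
enum SketchSamplingMode: Equatable {
    case greedy
    case topK(Int)
    case nucleus(Double)
}

// Parameters for an imagined backend that only understands temperature.
struct BackendParameters: Equatable {
    var temperature: Double
}

/// Maps a requested sampling mode onto a temperature-only backend.
func backendParameters(
    for mode: SketchSamplingMode?,
    requestedTemperature: Double?
) -> BackendParameters {
    switch mode {
    case .some(.greedy):
        // Greedy decoding always picks the most likely token;
        // a temperature of 0 approximates that intent.
        return BackendParameters(temperature: 0)
    default:
        // The backend can't express top-k or nucleus sampling, so keep
        // the requested temperature, or an assumed default of 1.0.
        return BackendParameters(temperature: requestedTemperature ?? 1.0)
    }
}
```

Keeping the mapping in a pure function makes the fallback policy easy to test separately from the code that talks to the model.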

Use LanguageModelExecutorGenerationChannel to stream incremental events back as generation progresses. You don’t return a value or close the channel explicitly. The channel finishes when the method returns or when an error is thrown.
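
The lifecycle described above, where the channel finishes when the method returns or throws, resembles wrapping a producer closure around Swift's `AsyncThrowingStream`. The sketch below illustrates that pattern with the standard library only. `SketchChannel`, `send(_:)`, and `runGeneration(_:)` are hypothetical names, and the real channel's event types and methods may differ.

```swift
// Hypothetical stand-in for a generation channel; not the framework type.
struct SketchChannel: Sendable {
    let continuation: AsyncThrowingStream<String, Error>.Continuation

    // Streams one incremental piece of text to the consumer.
    func send(_ text: String) {
        continuation.yield(text)
    }
}

// Plays the role of the framework: runs the producer and finishes the
// stream when the producer returns or throws.
func runGeneration(
    _ produce: @escaping @Sendable (SketchChannel) async throws -> Void
) -> AsyncThrowingStream<String, Error> {
    AsyncThrowingStream { continuation in
        let task = Task {
            do {
                try await produce(SketchChannel(continuation: continuation))
                // A normal return finishes the stream.
                continuation.finish()
            } catch {
                // A thrown error finishes the stream with that error.
                continuation.finish(throwing: error)
            }
        }
        // Stop producing if the consumer stops listening.
        continuation.onTermination = { _ in task.cancel() }
    }
}
```

In this sketch the producer never calls a close method; the wrapper owns the stream's lifetime, so a producer can't forget to finish the stream or finish it twice.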

Topics

Creating an executor

Prewarming the model

Handling the response

See Also

Custom language model provider