# silo-labs/swift-context-management
**Swift Context Management** is a Swift package designed to help developers manage language model context windows efficiently. As conversations grow longer, they often exceed the token limits of large language models (LLMs). This package provides a variety of context reduction policies to keep requests within those limits.
## Features
- Progressive Context Reduction: Automatically retries LLM requests with increasingly aggressive context reduction if a context window limit is hit.
- Multiple Reduction Policies: A wide range of strategies from simple sliding windows to advanced hierarchical summarization.
- Structured State Extraction: Automatically extract and maintain important facts, decisions, and constraints from conversations.
- Seamless Integration: Built to work with the FoundationModels framework.
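For intuition, the progressive-reduction behavior can be pictured as a retry loop that escalates through increasingly aggressive reduction levels until the request fits. The names below (`ContextLimitError`, `sendRequest`, `reduce`) are illustrative stand-ins, not the package's actual API:

```swift
import Foundation

// Illustrative error type; the real framework surfaces its own context-limit error.
struct ContextLimitError: Error {}

/// Each level trims the transcript more aggressively than the last.
func reduce(_ turns: [String], level: Int) -> [String] {
    // Level 0: keep everything; level 1: last 4 turns; level 2: last 2 turns.
    let keep = [turns.count, 4, 2][min(level, 2)]
    return Array(turns.suffix(keep))
}

/// Hypothetical model call that fails when the prompt is too long.
func sendRequest(_ turns: [String]) throws -> String {
    guard turns.joined().count <= 20 else { throw ContextLimitError() }
    return "ok (\(turns.count) turns)"
}

/// Retry with progressively stronger reduction until the request fits.
func respondWithProgressiveReduction(_ turns: [String]) throws -> String {
    for level in 0...2 {
        do { return try sendRequest(reduce(turns, level: level)) }
        catch is ContextLimitError { continue }
    }
    throw ContextLimitError()
}

let history = ["hello", "long message here", "short", "hi", "ok"]
print(try! respondWithProgressiveReduction(history)) // ok (2 turns)
```

The key idea is that the caller never sees the intermediate failures; only the final reduced request (or a terminal error) surfaces.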
## Context Reduction Policies

The package uses `ContextReductionPolicy` to define how conversation history should be managed.
### Implemented Policies
- Sliding Window: Keeps only the most recent N conversation turns (or tokens), discarding all earlier history.
- Head-Tail Window: Preserves the initial instructions (head) and the most recent turns (tail), dropping the middle part of the conversation.
- Rolling Summary: Replaces older conversation history with a single running summary while keeping recent turns verbatim.
- Hierarchical Summary: Maintains multiple summaries at different granularities (per turn, per topic, global) and selects the appropriate level when reducing context.
- Structured State: Extracts and stores important facts, constraints, or decisions in structured fields instead of natural language history.
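To make the windowing policies concrete, here is a minimal sketch of a sliding window and a head-tail window over an array of turns. The `Turn` model and function names are hypothetical illustrations, not the package's API:

```swift
// Hypothetical transcript model for illustration.
struct Turn { let role: String; let text: String }

/// Sliding window: keep only the most recent `n` turns.
func slidingWindow(_ turns: [Turn], keeping n: Int) -> [Turn] {
    Array(turns.suffix(n))
}

/// Head-tail window: keep the first `head` turns (the instructions)
/// and the last `tail` turns, dropping the middle of the conversation.
func headTailWindow(_ turns: [Turn], head: Int, tail: Int) -> [Turn] {
    guard turns.count > head + tail else { return turns }
    return Array(turns.prefix(head)) + Array(turns.suffix(tail))
}

let turns = (1...6).map { Turn(role: "user", text: "msg \($0)") }
print(slidingWindow(turns, keeping: 2).map(\.text))        // ["msg 5", "msg 6"]
print(headTailWindow(turns, head: 1, tail: 2).map(\.text)) // ["msg 1", "msg 5", "msg 6"]
```

The head-tail variant matters in practice because system instructions usually live at the start of the transcript and would be lost under a pure sliding window.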
### Planned Policies (to be implemented)
- Salience Pruning: Removes low-importance or low-salience messages, keeping only critical information.
- Semantic Recall: Retrieves only the most semantically relevant past messages using vector embeddings.
- Topic Memory: Segments conversation history by topic and injects only the memory related to the current topic.
- Query Rewriting: Rewrites multi-turn conversational prompts into single standalone queries.
- Dynamic Injection: Dynamically decides which parts of history, summaries, or memory to include based on available context budget.
- dhRAG: Minimizes unnecessary context usage by selectively using history only when it improves retrieval-augmented generation.
- Reflective Memory: Periodically rewrites and refines stored memory to prevent accumulation of outdated information.
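As a rough sketch of how Semantic Recall could work, the snippet below ranks past messages by cosine similarity to a query embedding and keeps the top `k`. The toy three-dimensional vectors stand in for a real embedding model; none of these names come from the package:

```swift
import Foundation

// Toy embedding vectors standing in for a real embedding model.
let messages: [(text: String, embedding: [Double])] = [
    ("We chose PostgreSQL",   [0.9, 0.1, 0.0]),
    ("Lunch was great",       [0.0, 0.1, 0.9]),
    ("Schema uses UUID keys", [0.8, 0.2, 0.1]),
]

/// Cosine similarity between two equal-length vectors.
func cosine(_ a: [Double], _ b: [Double]) -> Double {
    let dot = zip(a, b).map(*).reduce(0, +)
    let na = sqrt(a.map { $0 * $0 }.reduce(0, +))
    let nb = sqrt(b.map { $0 * $0 }.reduce(0, +))
    return dot / (na * nb)
}

/// Semantic recall: return the `k` past messages most similar to the query.
func recall(query: [Double], k: Int) -> [String] {
    messages
        .sorted { cosine($0.embedding, query) > cosine($1.embedding, query) }
        .prefix(k)
        .map(\.text)
}

// A query embedding near the "database" messages.
print(recall(query: [1.0, 0.0, 0.0], k: 2)) // ["We chose PostgreSQL", "Schema uses UUID keys"]
```

The appeal of this policy is that context cost stays bounded by `k` regardless of total conversation length.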
## Usage

To use context management, initialize a `ContextualSession` with your desired policy:
```swift
import FoundationModels
import SwiftContextManagement

let session = LanguageModelSession()
let contextualSession = ContextualSession(
    session: session,
    policy: .rollingSummary()
)

// The session will automatically handle context window errors by applying the policy
let response = try await contextualSession.respond(to: "...")
```

### Streaming Responses
For real-time UI updates, use the streaming API which yields cumulative content as the model generates:
```swift
for try await content in contextualSession.streamResponse(to: "Tell me a story") {
    // Update UI with the latest content
    print(content)
}
```

Streaming also supports structured output with `@Generable` types:
```swift
for try await partial in contextualSession.streamResponse(to: "...", generating: MyType.self) {
    // partial is MyType.PartiallyGenerated with optional fields filled incrementally
}
```

Both streaming methods automatically handle context window overflow with the same retry-and-reduce behavior as `respond(to:)`.
## Contributing
We welcome contributions! Please feel free to submit pull requests or open issues for any of the planned policies or new ideas.