# ContextCore

The ultra-fast Metal context engine for on-device AI. Build optimized context windows in <5 ms with perfect recall on Apple Silicon. 🧠⚡️🚀
## What it does
- Metal-accelerated scoring: custom Metal shaders handle relevance and recency scoring, with measured throughput at 63.36M chunks/sec and 2.45x GPU math speedup on large workloads.
- Four memory tiers: working, episodic, semantic, and procedural memory each have their own retrieval role.
- Progressive compression: lower-signal chunks can be compressed automatically when the token budget gets tight (see the packing sketch after this list).
- Fast window builds: `buildWindow(500, 4096)` measures 4.89 ms p99 on the latest full release run.
- Background consolidation: `consolidate(2000)` measures 15.61 ms p99.
- Attention-aware reranking: context chunks can be reordered by attention centrality.
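In spirit, packing and compression work like the sketch below. Everything here (`Chunk`, `pack`, the greedy budget loop) is illustrative, not the shipped API:

```swift
// Illustrative sketch only; none of these names are ContextCore's real API.
struct Chunk {
    let text: String
    let tokens: Int
    let score: Double  // relevance x recency, as scored on the GPU
}

// Greedy packing under a token budget: high-signal chunks go in verbatim,
// lower-signal chunks are compressed once the budget tightens.
func pack(_ candidates: [Chunk], budget: Int, compress: (Chunk) -> Chunk) -> [Chunk] {
    var window: [Chunk] = []
    var used = 0
    for chunk in candidates.sorted(by: { $0.score > $1.score }) {
        if used + chunk.tokens <= budget {
            window.append(chunk)
            used += chunk.tokens
        } else {
            let small = compress(chunk)  // e.g. an extractive summary
            if used + small.tokens <= budget {
                window.append(small)
                used += small.tokens
            }
        }
    }
    return window
}
```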
## 🏗️ Architecture

```mermaid
flowchart TB
subgraph Client ["Your Application"]
Input([User Input])
end
subgraph Core ["ContextCore Engine"]
direction TB
Orch[AgentContext]
subgraph Metal ["Metal Acceleration ⚡️"]
Scoring[Scoring Kernel]
Attn[Attention Kernel]
end
subgraph Mem ["Memory Tiers"]
Episodic[(Episodic)]
Semantic[(Semantic)]
Procedural[(Procedural)]
end
Packer[Window Packer]
end
Input --> Orch
Orch -->|Query| Mem
Mem -->|Candidates| Scoring
Scoring -->|Ranked Chunks| Attn
Attn -->|Reranked| Packer
Packer -->|Final Prompt| Model([LLM Inference])
style Core fill:#fff,stroke:#000,stroke-width:2px,color:#000
style Metal fill:#000,stroke:#fff,stroke-width:1px,color:#fff
style Scoring fill:#000,stroke:#fff,stroke-width:1px,color:#fff
style Attn fill:#000,stroke:#fff,stroke-width:1px,color:#fff
style Client fill:#fff,stroke:#000,stroke-dasharray: 5 5
style Model fill:#000,color:#fff
```
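The scoring kernel above is a regular Metal compute pass. As a rough sketch of the host-side shape (the `scoreChunks` kernel name and buffer layout are hypothetical; ContextCore's real shaders are internal):

```swift
import Metal

// Hypothetical dispatch sketch; ContextCore's actual kernels are internal.
let chunkCount = 50_000
guard let device = MTLCreateSystemDefaultDevice(),
      let queue = device.makeCommandQueue(),
      let kernel = device.makeDefaultLibrary()?.makeFunction(name: "scoreChunks"),
      let scores = device.makeBuffer(length: chunkCount * MemoryLayout<Float>.stride)
else { fatalError("Metal unavailable") }

let pipeline = try device.makeComputePipelineState(function: kernel)
let commands = queue.makeCommandBuffer()!
let encoder = commands.makeComputeCommandEncoder()!
encoder.setComputePipelineState(pipeline)
encoder.setBuffer(scores, offset: 0, index: 0)  // one Float score per chunk
encoder.dispatchThreads(MTLSize(width: chunkCount, height: 1, depth: 1),
                        threadsPerThreadgroup: MTLSize(width: 64, height: 1, depth: 1))
encoder.endEncoding()
commands.commit()
commands.waitUntilCompleted()
```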
## Why ContextCore

| Feature | ❌ Standard LLM Usage | ✅ With ContextCore |
| :--- | :--- | :--- |
| Recall | Forgets early conversation turns as context fills. | Retrieves relevant turns from earlier in the thread with semantic search. |
| Speed | Slows down linearly as context grows. | Window building stays under 5 ms p99 and consolidation stays under 16 ms p99 on the measured M2 run. |
| Cost | Wastes tokens by re-sending irrelevant history. | Packs higher-value tokens first and compresses the rest. |
| Coherence | Loses track of long-running tasks. | Procedural memory tracks tool usage and task patterns. |
## 📊 Performance
ContextCore is designed to run locally on Apple Silicon.
```mermaid
xychart-beta
title "Window Build Latency (p99) - Lower is Better"
x-axis ["Target Limit", "ContextCore (M2)"]
y-axis "Milliseconds (ms)" 0 --> 25
bar [20.0, 6.54]
```

```mermaid
xychart-beta
title "Consolidation Time (2000 chunks) - Lower is Better"
x-axis ["Target Limit", "ContextCore (M2)"]
y-axis "Milliseconds (ms)" 0 --> 500
bar [500.0, 19.7]
```

```mermaid
xychart-beta
title "GPU Math Speedup (50000 chunks) - Higher is Better"
x-axis ["CPU Baseline", "ContextCore GPU"]
y-axis "Relative Speed" 0 --> 3
bar [1.0, 2.45]
```
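To sanity-check the latency numbers on your own machine, a plain `ContinuousClock` harness around `buildWindow` is enough. This is an illustrative sketch, not a shipped benchmark target; only the `AgentContext` calls (which mirror the Quick Start below) come from ContextCore:

```swift
import ContextCore

// Illustrative p99 harness; everything outside AgentContext is plain Swift.
func p99BuildLatency(iterations: Int = 500) async throws -> Duration {
    let context = try AgentContext()
    try await context.beginSession(systemPrompt: "bench")
    let clock = ContinuousClock()
    var samples: [Duration] = []
    for _ in 0..<iterations {
        let start = clock.now
        _ = try await context.buildWindow(currentTask: "bench", maxTokens: 4096)
        samples.append(start.duration(to: clock.now))
    }
    samples.sort()
    return samples[Int(Double(samples.count - 1) * 0.99)]  // p99 sample
}
```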
## 🚀 Quick Start

```swift
import ContextCore
// 1. Initialize ContextCore
let context = try AgentContext()
// 2. Start a session
try await context.beginSession(systemPrompt: "You are a senior Swift engineer.")
// 3. Append turns
try await context.append(turn: Turn(role: .user, content: "How do I fix this actor leak?"))
// 4. Build a packed window
let window = try await context.buildWindow(
currentTask: "Debug actor isolation",
maxTokens: 4096
)
// 5. Format for your model
let prompt = window.formatted(style: .chatML)
```
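From here the prompt goes to whatever on-device runtime you run. A hedged sketch of closing the loop; `InferenceEngine` is a stand-in for your runtime (MLX, llama.cpp bindings, etc.), and the `.assistant` role is assumed to exist alongside the `.user` role shown above:

```swift
import ContextCore

// Stand-in for your inference runtime; not part of ContextCore.
protocol InferenceEngine {
    func generate(prompt: String) async throws -> String
}

func step(engine: InferenceEngine, context: AgentContext, prompt: String) async throws {
    let reply = try await engine.generate(prompt: prompt)
    // Feed the reply back into memory so later windows can recall it.
    // `.assistant` is an assumed role, mirroring `.user` from the Quick Start.
    try await context.append(turn: Turn(role: .assistant, content: reply))
}
```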
## 📦 Installation

```swift
dependencies: [
.package(url: "https://github.com/christopherkarani/ContextCore.git", from: "0.1.0")
]
```
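Then reference the product from your target; the product name is assumed to match the package name:

```swift
.target(
    name: "YourApp",  // your target
    dependencies: [
        .product(name: "ContextCore", package: "ContextCore")  // assumed product name
    ]
)
```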
## License

ContextCore is available under the MIT license. See LICENSE for details.