chenweisomebody126/SegmentKit
Real-time video object segmentation SDK for iOS. Powered by [EdgeTAM](https://github.com/facebookresearch/EdgeTAM).
Features
- Real-time performance — 30+ FPS on iPhone 14 Pro and later
- Tap-to-track — Point, box, or mask prompts
- Zero-copy Metal pipeline — CoreML + Metal with no CPU overhead
- Streaming memory — Constant memory usage regardless of video length
- Drop-in camera view — SwiftUI SKCameraView with built-in UI
- Offline license validation — Ed25519 signed keys, no network required
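The "streaming memory" feature implies that only a bounded number of past frames is retained. The sketch below shows one generic way such a bound keeps memory constant: a fixed-capacity FIFO buffer that evicts its oldest entry. `FrameMemory` is a hypothetical type for illustration, not the SDK's internal implementation, and the capacity of 7 simply mirrors the documented default for memoryFrames.

```swift
// Hypothetical illustration only; this is not SegmentKit's internal code.
struct FrameMemory<Element> {
    private var storage: [Element] = []
    let capacity: Int

    init(capacity: Int) {
        precondition(capacity > 0, "capacity must be positive")
        self.capacity = capacity
        storage.reserveCapacity(capacity)
    }

    /// Appends an entry, evicting the oldest one once the buffer is full.
    mutating func push(_ element: Element) {
        if storage.count == capacity {
            storage.removeFirst()
        }
        storage.append(element)
    }

    var count: Int { storage.count }
    var entries: [Element] { storage }
}

// Push 1,000 frame indices into a buffer that holds at most 7.
var memory = FrameMemory<Int>(capacity: 7)
for frameIndex in 0..<1_000 {
    memory.push(frameIndex)
}
```

However many frames are pushed, `memory.count` never exceeds the capacity, which is the property the feature list describes.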
Requirements
- iOS 17.0+
- Xcode 16.0+
- iPhone 14 Pro or later (A16+ with Neural Engine)
Installation
Swift Package Manager
Add SegmentKit to your project via Xcode:
- File → Add Package Dependencies
- Enter: https://github.com/chenweisomebody126/SegmentKit
- Select version and add to your target
Or add to your Package.swift:
dependencies: [
    .package(url: "https://github.com/chenweisomebody126/SegmentKit", from: "0.4.1")
]
Quick Start
import EdgeTAMKit
// 1. Configure with your license key (or use free mode)
try SegmentKit.configure(licenseKey: "your-license-key")
// 2. Create a segmenter (liveStream mode for real-time camera)
let options = ETKSegmenterOptions()
options.runningMode = .liveStream
options.liveStreamDelegate = self
let segmenter = try ETKSegmenter(options: options)
// 3. Send camera frames — SDK handles frame dropping automatically
try segmenter.segmentAsync(
    frame: pixelBuffer,
    prompt: .point(tapPoint), // first frame needs a prompt
    timestampMs: timestampMs
)
// 4. Receive results via delegate
func segmenter(_ segmenter: ETKSegmenter,
               didFinishWith result: ETKTrackResult?,
               timestampMs: Int, error: Error?) {
    guard let result else { return }
    // result.mask — 256×256 segmentation mask
    // result.confidence — IoU score
    // result.isTracking — target still visible?
}
Example App
See Examples/SegmentKitDemo/ for a complete, runnable demo app with camera preview, tap-to-track, and mask overlay — localized in English and Chinese.
API Overview
Core Types
| Type | Description |
|------|-------------|
| ETKSegmenter | Main entry point — create, configure, and run segmentation |
| ETKSegmenterOptions | Configuration options (mode, performance, multi-object) |
| ETKPrompt | What to segment — point, box, mask, or combination |
| ETKTrackResult | Per-frame tracking output (mask, confidence, overlay) |
| ETKSegmentResult | Single-image segmentation output |
ETKSegmenter
Three running modes for different use cases:
// Image mode — single frame, no state
options.runningMode = .image
let result = try segmenter.segment(image: photo, prompt: .point(tap))
// Video mode — synchronous, frame-by-frame
options.runningMode = .video
let first = try segmenter.segment(videoFrame: frame, prompt: .point(tap), timestampMs: 0)
let next = try segmenter.segment(videoFrame: frame, timestampMs: 33) // auto-track
// LiveStream mode — async, camera pipeline
options.runningMode = .liveStream
options.liveStreamDelegate = self
try segmenter.segmentAsync(frame: pixelBuffer, prompt: .point(tap), timestampMs: ts)
ETKSegmenterOptions
| Property | Default | Description |
|----------|---------|-------------|
| runningMode | .image | .image / .video / .liveStream |
| maxObjects | 1 | Multi-object tracking (1–10). Each +1 object ≈ +3ms |
| memoryFrames | 7 | Temporal memory depth (1–7). More = stabler tracking |
| interleavingEnabled | true | IE∥MA parallel execution. ~29ms vs ~80ms per frame |
| computeUnit | .auto | .auto / .cpuAndANE / .cpuAndGPU |
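The approximate timings in the table can be combined into a rough frame-budget estimate. The sketch below assumes a simple linear model (base frame time plus about 3 ms per additional object); both function names and the model itself are assumptions for illustration, and real timings will vary by device and scene.

```swift
// Rough estimate built only from the approximate figures in the options table.
// The function names and the linear model are assumptions, not SDK API.
func estimatedFrameTimeMs(maxObjects: Int, interleavingEnabled: Bool) -> Double {
    precondition((1...10).contains(maxObjects), "maxObjects must be in 1...10")
    let baseMs: Double = interleavingEnabled ? 29 : 80   // ~29ms vs ~80ms per frame
    let perExtraObjectMs: Double = 3                      // each +1 object ≈ +3ms
    return baseMs + Double(maxObjects - 1) * perExtraObjectMs
}

/// Converts a per-frame time in milliseconds to frames per second.
func estimatedFPS(frameTimeMs: Double) -> Double {
    1000.0 / frameTimeMs
}

let single = estimatedFrameTimeMs(maxObjects: 1, interleavingEnabled: true)
let triple = estimatedFrameTimeMs(maxObjects: 3, interleavingEnabled: true)
let singleFPS = estimatedFPS(frameTimeMs: single)
let tripleFPS = estimatedFPS(frameTimeMs: triple)
```

Under these assumptions a single object costs about 29 ms per frame (roughly 34 FPS), while three objects cost about 35 ms (roughly 28.6 FPS), so raising maxObjects may push a pipeline below 30 FPS.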
ETKPrompt
All coordinates are normalized (0–1), origin at top-left.
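Touch locations usually arrive in view points, so they need converting before being used as a prompt. A minimal sketch of that conversion follows; `normalizedPoint` is a hypothetical helper, not part of the SDK, and it assumes the camera preview fills the view without cropping or letterboxing.

```swift
import Foundation

// Hypothetical helper (not SDK API): maps a view-space point to the
// normalized 0–1 range with the origin at the top-left corner.
func normalizedPoint(_ point: CGPoint, in viewSize: CGSize) -> CGPoint? {
    // A zero-sized view has no meaningful normalized coordinates.
    guard viewSize.width > 0, viewSize.height > 0 else { return nil }
    // Clamp so touches slightly outside the view still yield valid values.
    let x = min(max(point.x / viewSize.width, 0), 1)
    let y = min(max(point.y / viewSize.height, 0), 1)
    return CGPoint(x: x, y: y)
}

let tap = CGPoint(x: 195, y: 422)
let size = CGSize(width: 390, height: 844)
let normalized = normalizedPoint(tap, in: size)   // center of the view
```

If the preview is aspect-filled, the visible region is a crop of the frame and the mapping needs an extra offset and scale, which this sketch does not cover.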
// Single foreground point
.point(CGPoint(x: 0.5, y: 0.5))
// Foreground + background points
.points([
    ETKLabeledPoint(point: target, label: .foreground),
    ETKLabeledPoint(point: exclude, label: .background),
])
// Bounding box
.boundingBox(CGRect(x: 0.2, y: 0.3, width: 0.4, height: 0.3))
// Combine prompts
.combined([.boundingBox(rect), .points([bg])])
ETKTrackResult
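The result fields listed in this section are related: mask holds raw logits, while binaryMask and probabilityMask are described as thresholded at 0 and sigmoid-transformed. The sketch below reproduces those relationships on a plain array so the semantics are easy to check. The free functions are hypothetical and only mirror the field comments; they are not the SDK's implementation.

```swift
import Foundation

// Assumed relationships, inferred from the field comments in this section:
// probability = sigmoid(logit), binary = logit > 0.
func sigmoid(_ x: Float) -> Float {
    1 / (1 + exp(-x))
}

func probabilities(fromLogits logits: [Float]) -> [Float] {
    logits.map(sigmoid)
}

func binaryMask(fromLogits logits: [Float]) -> [Bool] {
    logits.map { $0 > 0 }
}

/// Fraction of pixels classified as foreground.
func foregroundFraction(of mask: [Bool]) -> Float {
    guard !mask.isEmpty else { return 0 }
    return Float(mask.filter { $0 }.count) / Float(mask.count)
}

// A tiny stand-in for the 256×256 logit array.
let logits: [Float] = [-2.0, -0.5, 0.5, 3.0]
let probs = probabilities(fromLogits: logits)
let binary = binaryMask(fromLogits: logits)   // [false, false, true, true]
```

Because sigmoid(0) is 0.5, thresholding logits at 0 is equivalent to thresholding probabilities at 0.5.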
result.mask // [Float] — 256×256 logits (positive = foreground)
result.confidence // Float — IoU score (0–1)
result.isTracking // Bool — target still visible?
result.binaryMask // [Bool] — thresholded at 0
result.probabilityMask // [Float] — sigmoid probabilities
// Render overlay directly
let overlay = result.overlayImage(on: frame, color: .systemBlue, opacity: 0.4)
Multi-Object Tracking
Track multiple objects simultaneously (requires maxObjects > 1):
options.maxObjects = 3
let segmenter = try ETKSegmenter(options: options)
// First frame — provide all prompts
let results = try segmenter.segment(
    videoFrame: frame,
    prompts: [
        0: .point(leftFoot),
        1: .point(rightFoot),
    ],
    timestampMs: 0
)
// results[0] → left foot mask, results[1] → right foot mask
// Subsequent frames — auto-track all objects
let tracked = try segmenter.segmentMulti(videoFrame: frame, timestampMs: 33)
Pricing
| Plan | Price | Features |
|------|-------|----------|
| Free Trial | $0 / 7 days | Full access, no credit card required |
| Indie | $49/mo | Commercial license, no watermark |
| Pro | $199/mo | Multi-target, custom models, priority support |
| Enterprise | Custom | Dedicated engineer, SLA |
Start your free trial at segmentkit.dev.
Documentation
- Quick Start — Get running in 4 steps
- API Overview — Core types and usage patterns
- Example App — Complete demo with camera + tap-to-track
- segmentkit.dev — Website and pricing
License
SegmentKit is a commercial SDK. See LICENSE for details.
The underlying EdgeTAM model is open source under Apache License 2.0 by Meta Platforms, Inc. See THIRD_PARTY_LICENSES for details.
Support
- Documentation: segmentkit.dev
- Email: chenweisomebody@gmail.com
- Issues: GitHub Issues
Package Metadata
Repository: chenweisomebody126/SegmentKit
Homepage: https://segmentkit.dev/
Stars: 1
Forks: 0
Open issues: 1
Default branch: main
Primary language: swift
License: Other
README: README.md