# speechall-swift-sdk
A Swift SDK for the [Speechall](https://speechall.com) speech-to-text API. Transcribe audio and video files using multiple STT providers through a unified interface.
## Features

- **Simple, developer-friendly API** - get transcriptions with just a few lines of code
- **Memory-efficient streaming** - files are streamed directly from disk using `FileHandle`, never loaded entirely into memory
- **Video file support** - pass video files (mp4, mov, m4v, avi) directly; audio is extracted automatically
- **Multiple output formats** - plain text, SRT subtitles, VTT subtitles, or detailed JSON with timestamps
- **Type-safe** - fully generated from the OpenAPI spec with Swift OpenAPI Generator
## Requirements
- macOS 13+ / iOS 16+ / tvOS 16+ / watchOS 9+
- Swift 5.9+
## Installation

Add `SpeechallAPI` to your `Package.swift`:

```swift
dependencies: [
    .package(url: "https://github.com/Speechall-SDK/speechall-swift-sdk", from: "0.0.1")
]
```

Then add the dependency to your target:

```swift
.target(
    name: "YourTarget",
    dependencies: [
        .product(name: "SpeechallAPI", package: "speechall-swift-sdk")
    ]
)
```

## Quick Start
```swift
import SpeechallAPI

// Initialize the client with your API key
let client = SpeechallClient(apiKey: "your-api-key")

// Transcribe an audio or video file to plain text
let transcription = try await client.transcribe(
    fileAt: URL(filePath: "/path/to/audio.mp3"),
    withModel: .cloudflare_period_whisper
)

print(transcription)
```

## Convenience Methods
The `SpeechallClient` provides three main methods for common transcription tasks:
### Plain Text Transcription

```swift
let text = try await client.transcribe(
    fileAt: audioUrl,
    withModel: .cloudflare_period_whisper,
    inLanguage: .en, // Optional; defaults to .auto
    withInitialContext: "Technical discussion about Swift programming" // Optional
)
```

### Subtitles (SRT or VTT)
```swift
// SRT format
let srtSubtitles = try await client.subtitlesFor(
    fileAt: videoUrl,
    as: .srt,
    withModel: .assemblyai_period_best,
    inLanguage: .auto
)

// VTT format
let vttSubtitles = try await client.subtitlesFor(
    fileAt: videoUrl,
    as: .vtt,
    withModel: .assemblyai_period_best
)
```

### Detailed Transcription with Timestamps
Get word-level or segment-level timestamps:
```swift
let detailed = try await client.detailedTranscription(
    of: audioUrl,
    withModel: .deepgram_period_nova_hyphen_2
)

print("Full text: \(detailed.text)")
print("Language: \(detailed.language ?? "unknown")")

// Access individual segments with timestamps
for segment in detailed.segments ?? [] {
    print("[\(segment.start ?? 0) - \(segment.end ?? 0)] \(segment.text ?? "")")
}

// Access individual words with timestamps
for word in detailed.words ?? [] {
    print("[\(word.start) - \(word.end)] \(word.word)")
}
```

## Available Models
The SDK supports a wide range of speech-to-text models from various providers:
| Provider | Model Identifiers |
|----------|-------------------|
| OpenAI | `openai.whisper_1`, `openai.gpt_4o_transcribe`, `openai.gpt_4o_mini_transcribe` |
| Cloudflare | `cloudflare.whisper` |
| Deepgram | `deepgram.nova_2`, `deepgram.nova`, `deepgram.base`, `deepgram.whisper_large`, etc. |
| AssemblyAI | `assemblyai.best`, `assemblyai.nano` |
| Groq | `groq.whisper_large_v3`, `groq.whisper_large_v3_turbo`, `groq.distil_whisper_large_v3_en` |
| ElevenLabs | `elevenlabs.scribe_v1` |
| Gladia | `gladia.default` |
| Amazon | `amazon.transcribe` |
| RevAI | `revai.default` |
| And more... | |
Use autocomplete on `Components.Schemas.TranscriptionModelIdentifier` to see all available options.
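As the examples in this README show, the dotted identifiers in the table map onto generated Swift enum cases, with `.` becoming `_period_` and `-` becoming `_hyphen_`:

```swift
// "deepgram.nova-2": "." becomes "_period_", "-" becomes "_hyphen_"
let model: Components.Schemas.TranscriptionModelIdentifier = .deepgram_period_nova_hyphen_2

// "openai.whisper-1"
let whisper: Components.Schemas.TranscriptionModelIdentifier = .openai_period_whisper_hyphen_1
```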
## Advanced Usage: Full API Access
For advanced features like speaker diarization, custom vocabulary, text replacement rules, and more, you can use the fully-typed generated client directly:
```swift
import SpeechallAPI
import SpeechallAPITypes
import OpenAPIRuntime
import OpenAPIAsyncHTTPClient

// Create the low-level client
let client = SpeechallAPI.Client(
    serverURL: URL(string: "https://api.speechall.com/v1")!,
    transport: AsyncHTTPClientTransport(),
    middlewares: [AuthenticationMiddleware(apiKey: "your-api-key")]
)
```
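`AuthenticationMiddleware` attaches your API key to each outgoing request. If you need different header handling (for example, a proxy that expects a custom header), a minimal sketch against swift-openapi-runtime's `ClientMiddleware` protocol could look like this; `BearerAuthMiddleware` is a hypothetical name, not part of the SDK:

```swift
import Foundation
import HTTPTypes
import OpenAPIRuntime

// Hypothetical bearer-token middleware; adjust the header logic as needed.
struct BearerAuthMiddleware: ClientMiddleware {
    let apiKey: String

    func intercept(
        _ request: HTTPRequest,
        body: HTTPBody?,
        baseURL: URL,
        operationID: String,
        next: (HTTPRequest, HTTPBody?, URL) async throws -> (HTTPResponse, HTTPBody?)
    ) async throws -> (HTTPResponse, HTTPBody?) {
        var request = request
        // Attach the API key as a Bearer token on every outgoing request.
        request.headerFields[.authorization] = "Bearer \(apiKey)"
        return try await next(request, body, baseURL)
    }
}
```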
### Speaker Diarization

Identify different speakers in the audio:
```swift
let audioData = try Data(contentsOf: audioUrl)

let response = try await client.transcribe(
    query: .init(
        model: .deepgram_period_nova_hyphen_2,
        language: .en,
        output_format: .json,
        diarization: true,
        speakers_expected: 2 // Optional hint
    ),
    body: .audio__ast_(HTTPBody(audioData))
)

if case .ok(let result) = response,
   case .json(let transcription) = result.body,
   case .TranscriptionDetailed(let detailed) = transcription {
    for segment in detailed.segments ?? [] {
        print("Speaker \(segment.speaker ?? "?"): \(segment.text ?? "")")
    }
}
```

### Custom Vocabulary
Improve recognition of specific terms:
```swift
let response = try await client.transcribe(
    query: .init(
        model: .assemblyai_period_best,
        custom_vocabulary: ["Speechall", "OpenAPI", "AsyncHTTPClient"]
    ),
    body: .audio__ast_(HTTPBody(audioData))
)
```

### Temperature Control
Adjust output randomness for supported models:
```swift
let response = try await client.transcribe(
    query: .init(
        model: .openai_period_whisper_hyphen_1,
        temperature: 0.2 // Lower = more deterministic
    ),
    body: .audio__ast_(HTTPBody(audioData))
)
```

### Transcribe from Remote URL
Transcribe audio hosted at a public URL without downloading it first:
```swift
let response = try await client.transcribeRemote(
    body: .json(.init(
        url: "https://example.com/audio.mp3",
        model: .cloudflare_period_whisper
    ))
)
```

### Text Replacement Rulesets
Create reusable text replacement rules for post-processing:
```swift
// Create a ruleset
let rulesetResponse = try await client.createReplacementRuleset(
    body: .json(.init(
        name: "Technical Terms",
        rules: [
            .init(_type: .exact, pattern: "swift", replacement: "Swift"),
            .init(_type: .regex, pattern: "\\bapi\\b", replacement: "API")
        ]
    ))
)

// Use the ruleset in transcription
if case .created(let created) = rulesetResponse,
   case .json(let ruleset) = created.body {
    let response = try await client.transcribe(
        query: .init(
            model: .cloudflare_period_whisper,
            ruleset_id: ruleset.id
        ),
        body: .audio__ast_(HTTPBody(audioData))
    )
}
```

### List Available Models
Discover all available models and their capabilities:
```swift
let modelsResponse = try await client.listSpeechToTextModels()

if case .ok(let result) = modelsResponse,
   case .json(let models) = result.body {
    for model in models {
        print("\(model.display_name) (\(model.id.rawValue))")
        print("  Diarization: \(model.diarization ?? false)")
        print("  Streamable: \(model.streamable ?? false)")
        print("  Languages: \(model.supported_languages?.joined(separator: ", ") ?? "N/A")")
    }
}
```

## OpenAI-Compatible Endpoint
Use the OpenAI-compatible endpoint for easy migration: whichever library you use, set your OpenAI client's base URL to `https://api.speechall.com/v1/openai-compatible/audio/transcriptions`.
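If you are not using an OpenAI client library, you can call the endpoint directly. A minimal `URLSession` sketch, assuming the endpoint accepts OpenAI-style multipart fields (`file` and `model`):

```swift
import Foundation

// Hypothetical helper: POST a file to the OpenAI-compatible endpoint.
func transcribeViaOpenAICompatible(fileURL: URL, apiKey: String) async throws -> String {
    let endpoint = URL(string: "https://api.speechall.com/v1/openai-compatible/audio/transcriptions")!
    let boundary = "Boundary-\(UUID().uuidString)"

    var request = URLRequest(url: endpoint)
    request.httpMethod = "POST"
    request.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")
    request.setValue("multipart/form-data; boundary=\(boundary)", forHTTPHeaderField: "Content-Type")

    // Build the multipart body: a "model" field and a "file" part.
    var body = Data()
    body.append("--\(boundary)\r\n".data(using: .utf8)!)
    body.append("Content-Disposition: form-data; name=\"model\"\r\n\r\n".data(using: .utf8)!)
    body.append("cloudflare.whisper\r\n".data(using: .utf8)!)
    body.append("--\(boundary)\r\n".data(using: .utf8)!)
    body.append("Content-Disposition: form-data; name=\"file\"; filename=\"\(fileURL.lastPathComponent)\"\r\n".data(using: .utf8)!)
    body.append("Content-Type: application/octet-stream\r\n\r\n".data(using: .utf8)!)
    body.append(try Data(contentsOf: fileURL))
    body.append("\r\n--\(boundary)--\r\n".data(using: .utf8)!)
    request.httpBody = body

    let (data, _) = try await URLSession.shared.data(for: request)
    return String(decoding: data, as: UTF8.self)
}
```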
## Error Handling
The convenience methods throw descriptive errors:
```swift
do {
    let text = try await client.transcribe(fileAt: audioUrl, withModel: .cloudflare_period_whisper)
} catch TranscriptionError.invalidFile {
    print("File could not be read or is not a supported format")
} catch TranscriptionError.invalidResponse {
    print("Server returned an invalid response")
} catch TranscriptionError.apiError(let message, let code) {
    print("API error \(code): \(message)")
} catch TranscriptionError.networkError(let error) {
    print("Network error: \(error.localizedDescription)")
}
```

## Configuration
### Custom Base URL for Proxying

```swift
let client = SpeechallClient(
    baseUrl: URL(string: "https://custom-endpoint.example.com/v1")!,
    apiKey: "your-api-key"
)
```

### Request Timeout
The default timeout is 1200 seconds (20 minutes) to accommodate large files:
```swift
let client = SpeechallClient(
    apiKey: "your-api-key",
    timeoutInSeconds: 3600 // 1 hour
)
```

## License
See LICENSE for details.