gaelic-ghost/textforspeech
A Swift package for turning code-heavy, path-heavy, and markdown-heavy developer text into speech-safe text before it reaches a speech model.
Table of Contents
Overview
Status
TextForSpeech is actively available as the shared normalization package used by SpeakSwiftly.
What This Project Is
TextForSpeech owns the text-conditioning step that prepares developer-heavy text before speech generation. It ships one semantic built-in core plus selectable built-in styles, then layers persisted custom profiles on top so callers can tune pronunciation without reimplementing the core normalization behavior.
The package currently has three main responsibilities:
- normalize mixed text such as markdown, logs, CLI output, and prose with embedded code or identifiers
- normalize whole-source input through an explicit source lane
- persist and edit named custom profiles while keeping the built-in base layer always on
Motivation
Speech models do poorly with raw developer text such as file paths, identifiers, markdown links, inline code, repeated separators, repeated-letter runs, currency and measurement forms like $9.39 or 42 km, and terse scalar or math-heavy tokens like f32, cosF32, or WorkerRuntime.swift:42. TextForSpeech centralizes those cleanup rules so the same behavior can be reused across callers instead of being reimplemented in app code or worker code.
Quick Start
Add TextForSpeech as a Swift Package Manager dependency, import TextForSpeech, then call the namespace-first normalization API:
import TextForSpeech
let normalized = try await TextForSpeech.Normalize.text("stderr: WorkerRuntime.swift:42")Add the package from its GitHub repository:
dependencies: [
.package(url: "https://github.com/gaelic-ghost/TextForSpeech.git", from: "0.21.0"),
],
targets: [
.executableTarget(
name: "ExampleApp",
dependencies: [
.product(name: "TextForSpeech", package: "TextForSpeech"),
]
),
]Usage
Normalize mixed text directly when you want the default built-in .balanced style, optional input context, and optional request metadata:
import TextForSpeech
let normalized = try await TextForSpeech.Normalize.text(
"stderr: /workspace/SpeakSwiftly/Sources/SpeakSwiftly/WorkerRuntime.swift",
requestContext: TextForSpeech.RequestContext(
reqPurpose: .speech,
source: "codex",
topic: "normalization",
cwd: "/workspace/SpeakSwiftly",
repoRoot: "/workspace/SpeakSwiftly"
)
)RequestContext.reqPurpose describes whether the request is live speech or a retained audio file. When source or topic is present, live speech requests start with a short preface line such as From codex, normalization.. Retained audio-file requests omit that preface by default. prefacePolicy can override the purpose default with .always or .never; omitting it follows .default. Path fields such as cwd and repoRoot still only provide path-shortening context and do not create a preface by themselves.
The mixed-text path detects the likely outer text format before running normalization. Callers do not provide a text-format hint.
If you want a different shipped listening mode, pass style::
import TextForSpeech
let normalized = try await TextForSpeech.Normalize.source(
sourceText,
as: .swift,
style: .compact
)The shipped styles differ in concrete coding-agent ways:
.compactassumes more visual context and says less. It drops the broad line-based spoken-code expansion, keeps common shapes terse, and keeps::silent, such asfoo()->foo,#123->123, and--help->help..balancedis the default general-purpose mode. It keeps spoken-code expansion for code-like lines, keeps::silent, and speaks common references more explicitly, such asfoo()->foo function,#123->issue 123,--help->double tack help,WorkerRuntime.swift:42->Worker Runtime dot swift at line 42, andWorkerRuntime.swift:42:7->Worker Runtime dot swift line 42 column 7..explicitis the audio-first mode. It keeps the same line-based spoken-code expansion as.balanced, but uses more narrated phrasing for common coding-agent shapes and says::asdouble colon, such asfoo()->foo function call,#123->issue number 123, and--help->long flag help.
The built-in speech layer also expands common numeric scalar shorthands, currency amounts, and measurement suffixes, so tokens such as f32 become float thirty two, $9.39 becomes nine dollars and thirty-nine cents, 42 km becomes forty-two kilometers, 64Gbps becomes sixty four gigabits per second, and combinations such as cosF32 become cosine float thirty two.
The semantic core also ships extension aliases for especially speech-hostile file types. That includes Xcode-heavy forms such as .xcodeproj, .pbxproj, .xcworkspace, .xcconfig, .xcscheme, .xctestplan, .xcresult, .xcassets, .xcstrings, .xcprivacy, and .dSYM, plus mixed-stack formats such as .mdx, .tsx, .jsx, .jsonc, .ipynb, .wasm, .sqlite, and .db.
For repeated file paths in the same utterance, the text path compacts repeated anchors before the built-in path-speaking pass. File-path separators collapse to spacing rather than spoken words, and later repeated mentions can collapse to shorter phrases such as same directory, Worker Runtime dot swift or same path instead of repeating the full spoken prefix.
Configurable URL, markdown-link, and path handling is planned. The current defaults are deterministic and always on; future work will review those behaviors through the existing built-in styles rather than adding a separate normalization policy type. Path context now lives on RequestContext; the previous InputContext type has been removed. Caller-provided text-format and nested-source hints have been removed in favor of detection and generic embedded-code fallback. Codex hook payload cleanup will be reviewed with real examples only if downstream hook-script cleanup proves insufficient. Current Codex-specific hook parsing is intentionally downstream-owned.
Use the source path when the whole input is a source file or editor buffer and the caller already knows the language:
import TextForSpeech
let normalized = try await TextForSpeech.Normalize.source(
"""
struct WorkerRuntime {
let sampleRate: Int
}
""",
as: .swift
)The source path is explicit today but still generic. It normalizes whole-source input more consistently than the mixed-text path, but SwiftSyntax-backed Swift-specific structure is still future roadmap work rather than current behavior.
Summary-Aware Requests
Normalization is deterministic by default. The normalization entrypoints are async so the same ergonomic call can stay local with summarize: false or opt into a model summary with summarize: true:
import TextForSpeech
let normalized = try await TextForSpeech.Normalize.text(
longDeveloperUpdate,
summarizationProvider: .openAIResponses,
summarize: true
)The summarization provider is explicit because each backend option has a different operating surface:
.openAIResponsescalls the OpenAI Responses API and readsOPENAI_API_KEYfrom the process environment..codexExecruns the local Codex CLI throughcodex exec..foundationModelsuses Apple's on-device Foundation Models framework when the framework and operating system support it..testreturns the input unchanged so tests can exercise summary-aware normalization without calling a live provider.
The .foundationModels provider uses the Foundation Models framework directly through LanguageModelSession. It does not use Writing Tools; Writing Tools are a UIKit/AppKit text-view integration surface for user-facing proofreading, rewriting, summarization, and composition rather than a headless package summarization backend.
The summarize argument defaults to false, so deterministic callers do not need a separate convenience method. TextForSpeech.SummarizationProvider selects the backend used when summarize is true.
When summarize is true, caller text may be processed by the selected provider before deterministic normalization continues. TextForSpeech treats that text as untrusted content in provider prompts, applies bounded input and output limits, and keeps .codexExec child-process execution timeout-bound. TextForSpeech does not redact secrets or guarantee prompt-injection removal; downstream callers should redact sensitive text before enabling live providers for untrusted input.
Runtime Profiles
Use TextForSpeech.Runtime when you need an observable owner for stored custom profiles, one active custom profile id, one selected built-in style, one selected summarization provider, and JSON-backed persistence configured through a small enum:
import TextForSpeech
let runtime = try TextForSpeech.Runtime(
builtInStyle: .balanced,
persistence: .default
)
try runtime.style.setActive(to: .compact)
let logs = try runtime.profiles.create(name: "Logs")
try runtime.profiles.addReplacement(
TextForSpeech.Replacement("stderr", with: "standard error", id: "stderr-rule"),
toProfile: logs.id
)
try runtime.profiles.setActive(id: logs.id)
let normalized = try await runtime.normalize.text("stderr and stdout")
try runtime.summarizationProvider.set(.openAIResponses)
let summarized = try await runtime.normalize.text(
longDeveloperUpdate,
summarize: true
)The runtime model is intentionally explicit:
TextForSpeech.Profile.semanticCoreis the always-on semantic built-in layer.TextForSpeech.Profile.builtInStyle(_:)returns one shipped style preset.TextForSpeech.Profile.builtInBase(style:)composessemanticCore + style preset.TextForSpeech.Profile.baseis the default.balancedbuilt-in base for convenience.TextForSpeech.Profile.defaultis the empty default custom profile value.runtime.style.getActive()returns the currently selected shipped style preset.runtime.style.list()returns the available built-in style presets with short summaries.runtime.summarizationProvider.get()returns the provider used by async summary-aware normalization requests.runtime.summarizationProvider.list()returns the available summarization providers with short summaries.runtime.summarizationProvider.set(_:)persists the selected summarization provider.runtime.profiles.getActive()returns the active custom profile's id, a summary, and its replacements.runtime.profiles.getEffective()returns the active custom profile as merged with the currently selected built-in style.runtime.profiles.get(id:)reads one stored custom profile summary and its replacements by id.runtime.profiles.create(name:)creates one stored custom profile and returns its generated id to the caller.runtime.normalize.text(...)andruntime.normalize.source(...)applybuiltInBase(style: style.getActive()) + active customwithout exposing the merged profile value. Request contexts add or omit the source/topic preface according toreqPurposeand optionalprefacePolicy, matching the public normalization API.summarizedefaults tofalse.try await runtime.normalize.text(..., summarize: true)andtry await runtime.normalize.source(..., summarize: true)use the active summarization provider before returning normalized speech-safe text.
Persistence defaults to .default. TextForSpeech.Runtime() writes to Application Support automatically, namespaced by the host bundle identifier when one is available and falling back to TextForSpeech when it is not. Debug builds place the package store under TextForSpeech-Debug, including the fallback namespace, so local debug runs do not touch the production package store. Callers that need an explicit location can pass .file(url). The selected built-in style and selected summarization provider are persisted alongside the active custom profile id and stored custom profiles.
Development
TextForSpeech is a Swift Package Manager library product targeting iOS 17, macOS 14, and Swift 6 language mode.
No generated project setup is required for ordinary local development. For setup, validation, formatting, release workflow, and architecture boundaries, see CONTRIBUTING.md, ROADMAP.md, and the maintainer notes under docs/maintainers.
Repo Structure
.
├── Package.swift
├── Sources/TextForSpeech/
│ ├── API/
│ ├── Models/
│ ├── Normalization/
│ └── Runtime/
├── Tests/TextForSpeechTests/
│ ├── Models/
│ ├── Normalization/
│ └── Runtime/
├── docs/
│ ├── maintainers/
│ ├── releases/
│ └── security/
└── scripts/repo-maintenance/Sources/TextForSpeech is organized by responsibility:
API/contains public namespace-first entrypoints such asNormalize.Models/contains core value types such asProfile,Replacement,RequestContext, andSummarizationProvider, plus the built-in profile composition surface and semantic-role fragments underModels/BuiltInProfiles/.Normalization/contains the text path, source path, structural markdown parsing, replacement-rule engine, speech helpers, format detection, and summary execution support.Runtime/contains runtime ownership, grouped profile, style, summary, and persistence handles, persisted state, and runtime-facing errors.
The current source split keeps structural normalization logic separate from durable lexical policy:
- structural work such as markdown parsing, code-span extraction, and format detection stays in code
- durable lexical policy such as built-in aliases, extension aliases, identifier speaking, path speaking, URL speaking, repeated-letter-run handling, and style-specific speaking policy lives in the built-in profile layers
Tests live under Tests/TextForSpeechTests and are grouped by role, with focused normalization files for path and identifier behavior, markdown and URL behavior, and broader end-to-end flows.
Release Notes
Release notes live under docs/releases. Each release note should stay factual, scoped to the tagged change, and explicit about behavior or API shifts.
The latest release note is v0.22.1.
License
This project is licensed under the Apache License 2.0. See LICENSE for the full text.
Package Metadata
Repository: gaelic-ghost/textforspeech
Default branch: main
README: README.md