
SpeechAnalyzer

Analyzes spoken audio content in various ways and manages the analysis session.

Declaration

final actor SpeechAnalyzer

Overview

The Speech framework provides several modules that can be added to an analyzer to provide specific types of analysis and transcription. Many use cases only need a SpeechTranscriber module, which performs speech-to-text transcriptions.

The SpeechAnalyzer class is responsible for:

  • Holding associated modules

  • Accepting audio speech input

  • Controlling the overall analysis

Each module is responsible for:

  • Providing guidance on acceptable input

  • Providing its analysis or transcription output

Analysis is asynchronous. Input, output, and session control are decoupled and typically occur over several different tasks created by you or by the session. In particular, where an Objective-C API might use a delegate to provide results to you, the Swift API's modules provide their results through an AsyncSequence. Similarly, you provide speech input to this API through an AsyncSequence that you create and populate.
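The following sketch shows the same decoupling with plain Swift concurrency and no Speech framework calls. `ToyModule` and `runDecoupledPipeline()` are hypothetical names used only for illustration: one task populates an input AsyncStream, a second task iterates the module's result AsyncStream, and the calling function drives the "analysis" until the input ends.

```swift
/// A hypothetical stand-in for an analysis module (not part of the Speech
/// framework): it consumes an input sequence and publishes results through
/// its own AsyncSequence.
struct ToyModule: Sendable {
    let results: AsyncStream<String>
    private let resultBuilder: AsyncStream<String>.Continuation

    init() {
        let (stream, continuation) = AsyncStream.makeStream(of: String.self)
        results = stream
        resultBuilder = continuation
    }

    /// Consumes input until the input sequence ends, then ends the results.
    func analyze(_ input: AsyncStream<Int>) async {
        for await chunk in input {
            resultBuilder.yield("chunk \(chunk)")
        }
        resultBuilder.finish()
    }
}

func runDecoupledPipeline() async -> [String] {
    let module = ToyModule()
    let (input, inputBuilder) = AsyncStream.makeStream(of: Int.self)

    // Input task: you create and populate the input sequence.
    Task {
        for chunk in 1...3 { inputBuilder.yield(chunk) }
        inputBuilder.finish()
    }

    // Output task: you iterate the module's result sequence.
    let collector = Task { () -> [String] in
        var collected: [String] = []
        for await result in module.results { collected.append(result) }
        return collected
    }

    // Session control: run the analysis until the input ends.
    await module.analyze(input)
    return await collector.value
}
```

Because AsyncStream buffers by default, the output task receives every result even if it starts iterating after some results were published.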

The analyzer can only analyze one input sequence at a time.

Perform analysis

To perform analysis on audio files and streams, follow these general steps:

  1. Create and configure the necessary modules.

  2. Ensure the relevant assets are installed or already present. See AssetInventory.

  3. Create an input sequence you can use to provide the spoken audio. See helper classes AssetInputSequenceProvider and CaptureInputSequenceProvider.

  4. Create and configure the analyzer with the modules and input sequence.

  5. Supply audio. See helper class AnalyzerInputConverter.

  6. Start analysis.

  7. Act on results.

  8. Finish analysis when desired.

This example shows how you could perform an analysis that transcribes audio using the SpeechTranscriber module:

import AVFoundation
import Speech

// `buffers` stands in for your own source of audio.
func transcribe(_ buffers: [AVAudioPCMBuffer]) async throws {
    // Step 1: Modules
    guard let locale = SpeechTranscriber.supportedLocale(equivalentTo: Locale.current) else {
        /* Note unsupported language */
        return // A guard's else clause must exit the scope.
    }
    let transcriber = SpeechTranscriber(locale: locale, preset: .transcription)

    // Step 2: Assets
    if let installationRequest = try await AssetInventory.assetInstallationRequest(supporting: [transcriber]) {
        try await installationRequest.downloadAndInstall()
    }

    // Step 3: Input sequence
    let (inputSequence, inputBuilder) = AsyncStream.makeStream(of: AnalyzerInput.self)

    // Step 4: Analyzer
    let audioFormat = await SpeechAnalyzer.bestAvailableAudioFormat(compatibleWith: [transcriber])
    let analyzer = SpeechAnalyzer(modules: [transcriber])

    // Step 5: Supply audio from a separate task
    let converter = AnalyzerInputConverter(analyzerFormat: audioFormat)
    Task {
        // End the input sequence even if a conversion throws,
        // so that analyzeSequence(_:) can return.
        defer { inputBuilder.finish() }
        for buffer in buffers {
            for input in try converter.convert(buffer, at: nil) {
                inputBuilder.yield(input)
            }
        }
        // Emit any audio that the converter is still holding.
        for input in try converter.flush() {
            inputBuilder.yield(input)
        }
    }

    // Step 7: Act on results
    // Start consuming results before step 6, because analyzeSequence(_:)
    // doesn't return until the input sequence ends.
    Task {
        do {
            for try await result in transcriber.results {
                let bestTranscription = result.text // an AttributedString
                let plainTextBestTranscription = String(bestTranscription.characters) // a String
                print(plainTextBestTranscription)
            }
        } catch {
            /* Handle error */
        }
    }

    // Step 6: Perform analysis
    let lastSampleTime = try await analyzer.analyzeSequence(inputSequence)

    // Step 8: Finish analysis
    if let lastSampleTime {
        try await analyzer.finalizeAndFinish(through: lastSampleTime)
    } else {
        // SpeechAnalyzer is an actor, so calls from outside it need `await`.
        try await analyzer.cancelAndFinishNow()
    }
}

Analyze audio from files or capture devices

To read audio from a file, asset, or capture device such as a microphone, create an AssetInputSequenceProvider or CaptureInputSequenceProvider object.

Get the provider object's analyzerInputs property to convert the source's audio to a supported format and obtain an asynchronous input sequence of that audio. Pass that sequence to analyzeSequence(_:), start(inputSequence:), or a similar parameter of the analyzer's initializer.

To end the analysis session after processing the audio track or captured audio, call one of the analyzer’s finish methods. Otherwise, by default, the analyzer won’t terminate its result streams and will wait for additional audio input sequences or buffers. See the “Finish analysis” section below for more details.

Analyze audio from audio buffers

You can analyze audio buffers directly without using AssetInputSequenceProvider or CaptureInputSequenceProvider.

To do this:

  1. Create an asynchronous input sequence of AnalyzerInput elements that is appropriate for your use case.

  2. Supply the input sequence to analyzeSequence(_:), start(inputSequence:), or a similar parameter of the analyzer’s initializer.

  3. Convert each audio buffer to a supported audio format, either on the fly or in advance.

  4. Create AnalyzerInput objects for each buffer.

  5. Add the AnalyzerInput objects to the input sequence.

To convert AVAudioBuffer audio buffers to a supported format as AnalyzerInput objects on the fly, use AnalyzerInputConverter.

To convert audio buffers to a supported format in advance or with some other technique:

  1. Determine the format to convert to by calling bestAvailableAudioFormat(compatibleWith:) or individual modules' availableCompatibleAudioFormats methods.

  2. Convert the audio and create AnalyzerInput objects as necessary.

To skip past part of an audio stream, omit the buffers you want to skip from the input sequence. You can resume with a later buffer.

When you resume analysis with a later AVAudioPCMBuffer buffer, you may need to supply the correct time-code to account for skipped audio. To do this, pass the time-code of the later buffer as the bufferStartTime parameter of the corresponding AnalyzerInput object.
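A minimal sketch of the bookkeeping involved in skipping audio, assuming that an input without an explicit start time continues directly from the previous one. `ToyBuffer`, `ToyInput`, and `makeInputs(from:skipping:sampleRate:)` are hypothetical; they model only frame counts and times in seconds, not AVAudioPCMBuffer, AnalyzerInput, or the actual type of bufferStartTime.

```swift
/// Hypothetical stand-ins: a buffer is reduced to its frame count, and an
/// input carries an optional start time in seconds (nil = contiguous).
struct ToyBuffer {
    let frameCount: Int
}

struct ToyInput: Equatable {
    let frameCount: Int
    let startTime: Double?
}

/// Builds inputs from `buffers`, omitting the indices in `skipped`.
/// The first input after a gap gets an explicit start time so that the
/// timeline accounts for the audio that was left out.
func makeInputs(from buffers: [ToyBuffer], skipping skipped: Set<Int>, sampleRate: Double) -> [ToyInput] {
    var inputs: [ToyInput] = []
    var elapsedFrames = 0
    var afterGap = false
    for (index, buffer) in buffers.enumerated() {
        if skipped.contains(index) {
            afterGap = true
        } else {
            let start: Double? = afterGap ? Double(elapsedFrames) / sampleRate : nil
            inputs.append(ToyInput(frameCount: buffer.frameCount, startTime: start))
            afterGap = false
        }
        // Skipped buffers still advance the timeline.
        elapsedFrames += buffer.frameCount
    }
    return inputs
}
```

For example, with 16 kHz audio in one-second buffers, skipping the second buffer gives the third buffer a start time of 2.0 seconds, while the other buffers need no explicit time.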

Analyze autonomously

You can and usually should perform analysis using the analyzeSequence(_:) or analyzeSequence(from:) methods; those methods work well with Swift structured concurrency techniques. However, you may prefer that the analyzer proceed independently and perform its analysis autonomously as audio input becomes available in a task managed by the analyzer itself.

To use this capability, create the analyzer with one of the initializers that has an input sequence or file parameter, or call start(inputSequence:) or start(inputAudioFile:finishAfterFile:). To end the analysis of that input only and start analysis of different input, call one of the start methods again. To end the analysis session when the input ends, call finalizeAndFinishThroughEndOfInput().

Control processing and timing of results

Modules deliver results periodically, but you can manually synchronize their processing and delivery to outside cues.

To deliver a result for a particular time-code, call finalize(through:). To cancel processing of results that are no longer of interest, call cancelAnalysis(before:).
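To make the two time-based controls concrete, here is a toy model, not the Speech framework's implementation. `ToyResultQueue` is hypothetical and uses times in seconds; it only mirrors the described intent: finalizing through a time delivers pending results up to that time, and cancelling before a time drops pending results that are no longer of interest.

```swift
/// A hypothetical toy model of time-based result control. Pending results
/// wait until they're finalized (delivered) or cancelled (dropped).
actor ToyResultQueue {
    private var pending: [(time: Double, text: String)] = []
    private(set) var delivered: [String] = []

    func add(_ text: String, at time: Double) {
        pending.append((time: time, text: text))
    }

    /// Delivers every pending result at or before `time`, in order.
    func finalize(through time: Double) {
        delivered += pending.filter { $0.time <= time }.map { $0.text }
        pending.removeAll { $0.time <= time }
    }

    /// Drops pending results before `time` without delivering them.
    func cancelAnalysis(before time: Double) {
        pending.removeAll { $0.time < time }
    }
}
```

In this model, cancelling before 1.0 and then finalizing through 2.0 delivers only the result at 1.5, and a result at 2.5 stays pending.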

Improve responsiveness

By default, the analyzer and modules load the system resources that they require lazily, and unload those resources when they’re deallocated.

To proactively load system resources and “preheat” the analyzer, call prepareToAnalyze(in:) after setting its modules. This may improve how quickly the modules return their first results.

To delay or prevent unloading an analyzer's resources (caching them for later use by a different analyzer instance), select a SpeechAnalyzer.Options.ModelRetention option and create the analyzer with a SpeechAnalyzer.Options object that specifies it.

To set the priority of analysis work, create the analyzer with a SpeechAnalyzer.Options object with the desired priority value.

Specific modules may also offer options that improve responsiveness.

Finish analysis

To end an analysis session, you must use one of the analyzer’s finish methods or parameters, or deallocate the analyzer.

When the analysis session transitions to the finished state:

  • The analyzer won’t consume additional input from the input sequence (but note that it doesn’t drain or terminate the sequence)

  • Most methods won’t do anything; in particular, the analyzer won’t accept different input sequences or modules

  • Module result streams terminate and modules won’t publish additional results, though the app can continue to iterate over already-published results
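The consumer-side behavior in the last point resembles how a plain AsyncStream behaves once it finishes. This sketch uses only the standard library, and `drainAfterFinish()` is a hypothetical helper; it shows that elements published before finish() remain readable, while anything yielded afterward is dropped.

```swift
/// Elements yielded before finish() remain readable afterward; yields after
/// finish() are dropped. This is standard AsyncStream behavior.
func drainAfterFinish() async -> [String] {
    let (results, builder) = AsyncStream.makeStream(of: String.self)
    builder.yield("first")
    builder.yield("second")
    builder.finish()
    builder.yield("dropped") // The stream has terminated, so this is ignored.
    var seen: [String] = []
    for await result in results { seen.append(result) }
    return seen
}
```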

Respond to errors

When the analyzer or its modules’ result streams throw an error, the analysis session becomes finished as described above, and the same error (or a CancellationError) is thrown from all waiting methods and result streams.

When this happens, you may wish to terminate the input sequence, or create a new analyzer to continue working on the remaining (and any additional) input.
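One way to apply this advice is to stop feeding input as soon as a result stream throws. The sketch below uses standard-library streams in place of a module's results and the analyzer's input sequence; `ToyAnalysisError` and `consumeResults(_:stoppingInput:)` are hypothetical names, and a real app might then create a new analyzer for the remaining input.

```swift
struct ToyAnalysisError: Error {}

/// Iterates a result stream. If the stream throws, the session is over, so
/// this finishes the input continuation to stop the producer and returns the
/// results collected so far along with the error.
func consumeResults(
    _ results: AsyncThrowingStream<String, any Error>,
    stoppingInput inputBuilder: AsyncStream<Int>.Continuation
) async -> (texts: [String], error: (any Error)?) {
    var texts: [String] = []
    do {
        for try await text in results {
            texts.append(text)
        }
        return (texts, nil)
    } catch {
        inputBuilder.finish()
        return (texts, error)
    }
}
```

Results published before the error are still returned, consistent with the app being able to keep already-published results.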

Manage simultaneous analyses

The system normally limits simultaneous analyses to a conservative number, considering hardware capabilities of different devices. If you exceed that number, the system throws an insufficientResources error.

However, in certain use cases the hardware may be able to accommodate additional simultaneous analyses; for example, several simultaneous transcription sessions may use the same language and settings, or may receive audio only on an interleaved schedule. To support these use cases, you can have the system ignore its predefined, conservative resource limits.

To override the normal limits, create an analyzer with a SpeechAnalyzer.Options object with its ignoresResourceLimits value set to true. The system allows an unlimited number of analyzers configured with this option. However, the hardware requirements of numerous analyzers will eventually exceed the system’s actual capacity, and one or more of the analyzers will fail, throwing an unpredictable error.
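If you do ignore the system limits, you might still cap how many analyses your app runs at once. The sketch below is an app-side technique using a task group, not a Speech framework feature: `runBounded(_:limit:analyze:)` is hypothetical, `analyze` stands in for one analysis session, and `limit` is a value you would choose and tune yourself, not a system constant.

```swift
/// Runs `analyze` over `jobs` with at most `limit` calls in flight, and
/// returns outputs in job order.
func runBounded<Job: Sendable, Output: Sendable>(
    _ jobs: [Job],
    limit: Int,
    analyze: @escaping @Sendable (Job) async -> Output
) async -> [Output] {
    precondition(limit > 0, "limit must be positive")
    return await withTaskGroup(of: (index: Int, output: Output).self, returning: [Output].self) { group in
        var outputs = [Output?](repeating: nil, count: jobs.count)
        var next = 0
        // Fill the initial window of up to `limit` sessions.
        while next < min(limit, jobs.count) {
            let index = next
            let job = jobs[index]
            group.addTask { (index: index, output: await analyze(job)) }
            next += 1
        }
        // Whenever a session finishes, start the next waiting job.
        while let finished = await group.next() {
            outputs[finished.index] = finished.output
            if next < jobs.count {
                let index = next
                let job = jobs[index]
                group.addTask { (index: index, output: await analyze(job)) }
                next += 1
            }
        }
        return outputs.compactMap { $0 }
    }
}
```

A sliding window like this keeps at most `limit` sessions alive, so exceeding the hardware's real capacity becomes a tuning decision rather than an unpredictable failure.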

Topics

Creating an analyzer

Managing modules

Performing analysis

Performing autonomous analysis

Finalizing and cancelling results

Finishing analysis

Determining audio formats

Improving responsiveness

Monitoring analysis

Managing contexts
