Contents

Encoding video for low-latency conferencing

Configure a compression session to optimize encoding for video-conferencing apps.

Overview

This sample code project demonstrates how to configure and use a VTCompressionSession object to encode video for low-latency conferencing.

Create a compression session

Create a VTCompressionSession object and specify the required dimensions and codecType. For low latency use, include the kVTVideoEncoderSpecification_EnableLowLatencyRateControl option and set it to true in the videoEncoderSpecification. Optionally, specify sourceImageBufferAttributes to provide a description of the source pixel buffers.

// Set `kVTVideoEncoderSpecification_EnableLowLatencyRateControl` in encoder
// specification when creating a compression session to request low-latency
// rate control.
let videoEncoderSpecification = [kVTVideoEncoderSpecification_EnableLowLatencyRateControl: true as CFBoolean] as CFDictionary

// Specify the pixel format of the uncompressed video.
let sourceImageBufferAttributes = [kCVPixelBufferPixelFormatTypeKey: options.pixelFormat as CFNumber] as CFDictionary

var compressionSessionOut: VTCompressionSession?
let err = VTCompressionSessionCreate(allocator: kCFAllocatorDefault,
                                     width: Int32(options.destWidth),
                                     height: Int32(options.destHeight),
                                     codecType: options.codec,
                                     encoderSpecification: videoEncoderSpecification,
                                     imageBufferAttributes: sourceImageBufferAttributes,
                                     compressedDataAllocator: nil,
                                     outputCallback: nil,
                                     refcon: nil,
                                     compressionSessionOut: &compressionSessionOut)
guard err == noErr, let compressionSession = compressionSessionOut else {
    throw RuntimeError("VTCompressionSession creation failed (\(err))!")
}

Configure the compression session

Get the suggested encoder settings dictionary for the encode preset.

/// Get the suggested encoder settings dictionary for encode preset.
/// - Parameters:
///   - session: A compression session.
///   - encodePreset: The `EncodePreset` enumeration.
private func getEncoderSettingsForPreset(session: VTCompressionSession, encodePreset: EncodePreset) -> [CFString: Any]? {
    var supportedPresetDictionaries: CFDictionary?
    var encoderSettings: [CFString: Any]?

    _ = withUnsafeMutablePointer(to: &supportedPresetDictionaries) { valueOut in
        VTSessionCopyProperty( session, key: kVTCompressionPropertyKey_SupportedPresetDictionaries,
                               allocator: kCFAllocatorDefault, valueOut: valueOut )
    }

    if let presetDictionaries = supportedPresetDictionaries as? [CFString: [CFString: Any]] {
        let presetConstant = switch encodePreset {
        case .videoConferencing: kVTCompressionPreset_VideoConferencing
        }

        encoderSettings = presetDictionaries[presetConstant]
    }

    return encoderSettings
}

Set the encoder settings dictionary. Set kVTCompressionPropertyKey_RealTime to kCFBooleanTrue to indicate that this is a live encoding session. Specify the expected video source frame rate. Optionally specify the codec profile and level and update the average target bit rate. The low-latency rate control that the system enables during the VTCompressionSession object creation takes care of the other encoder configuration for low latency.

/// Configures a compression session for low-latency conferencing.
/// - Parameters:
///   - session: A compression session.
///   - options: The configuration options.
///   - expectedFrameRate: The expected frame rate of the video source.
private func configureVTCompressionSession(session: VTCompressionSession, options: Options, expectedFrameRate: Float) throws {
    // Different encoder implementations may support different property sets, so
    // the app needs to determine the implications of a failed property setting
    // on a case-by-case basis for the encoder. If the property is essential for
    // the use case and its setting fails, the app terminates. Otherwise, the
    // encoder ignores the failed setting and uses a default value to proceed
    // with encoding.

    var err: OSStatus = noErr
    var variableBitRateMode = false

    if let presetTuple = options.presetTuple {
        // Try configuring the encoder using the preset.
        let encoderSettings: [CFString: Any]?
        encoderSettings = getEncoderSettingsForPreset(session: session, encodePreset: presetTuple.0)

        if let encoderSettings {
            if encoderSettings[kVTCompressionPropertyKey_VariableBitRate] != nil {
                variableBitRateMode = true
            }

            // Set the encoder settings dictionary on the compression session.
            err = VTSessionSetProperties(session, propertyDictionary: encoderSettings as CFDictionary)
            try NSError.check(err, "VTSessionSetProperties failed")
        }
    }

    // Indicate real-time compression session, which conferencing requires.
    err = VTSessionSetProperty(session, key: kVTCompressionPropertyKey_RealTime, value: kCFBooleanTrue)
    if err != noErr {
        print("Warning: VTSessionSetProperty(kVTCompressionPropertyKey_RealTime) failed (\(err))")
    }

    // Indicate the expected frame rate, if known. This is just a hint for rate
    // control purposes; the actual encoding frame rate matches the incoming
    // frame rate even if it doesn't match this setting. When
    // `kVTCompressionPropertyKey_RealTime` is `kCFBooleanTrue`, the video
    // encoder may optimize energy usage.
    err = VTSessionSetProperty(session, key: kVTCompressionPropertyKey_ExpectedFrameRate, value: expectedFrameRate as CFNumber)
    if err != noErr {
        print("Warning: VTSessionSetProperty(kVTCompressionPropertyKey_ExpectedFrameRate) failed (\(err))")
    }

    // Specify the profile and level for the encoded bitstream.
    if let profileTuple = options.profileTuple {
        var profileConstant: CFString?

        if options.codec == kCMVideoCodecType_H264 {
            if profileTuple.0 == .h264Main {
                profileConstant = kVTProfileLevel_H264_Main_AutoLevel
            } else if profileTuple.0 == .h264High {
                profileConstant = kVTProfileLevel_H264_High_AutoLevel
            }
        } else if options.codec == kCMVideoCodecType_HEVC {
            if profileTuple.0 == .hevcMain {
                profileConstant = kVTProfileLevel_HEVC_Main_AutoLevel
            } else if profileTuple.0 == .hevcMain10 {
                profileConstant = kVTProfileLevel_HEVC_Main10_AutoLevel
            }
        }

        if let profileConstant {
            err = VTSessionSetProperty(session, key: kVTCompressionPropertyKey_ProfileLevel, value: profileConstant)
            if err != noErr {
                print("Warning: VTSessionSetProperty(kVTCompressionPropertyKey_ProfileLevel) failed (\(err))")
            }
        }
    }

    if let destBitRate = options.destBitRate {
        if variableBitRateMode {
            // Specify the long-term desired variable bit rate in bits per second.
            err = VTSessionSetProperty(session, key: kVTCompressionPropertyKey_VariableBitRate, value: destBitRate as CFNumber)
            if err != noErr {
                print("Warning: VTSessionSetProperty(kVTCompressionPropertyKey_VariableBitRate) failed (\(err))")
            }

            // Set VBV maximum bit rate.
            err = VTSessionSetProperty(session, key: kVTCompressionPropertyKey_VBVMaxBitRate, value: (destBitRate * 3 / 2) as CFNumber)
            if err != noErr {
                print("Warning: VTSessionSetProperty(kVTCompressionPropertyKey_VBVMaxBitRate) failed (\(err))")
            }
        } else {
            // Specify the long-term desired average bit rate in bits per second.
            // It's a soft limit, so the encoder may overshoot or undershoot and
            // the average bit rate of the output video may be over or under the
            // target.
            err = VTSessionSetProperty(session, key: kVTCompressionPropertyKey_AverageBitRate, value: destBitRate as CFNumber)
            if err != noErr {
                print("Warning: VTSessionSetProperty(kVTCompressionPropertyKey_AverageBitRate) failed (\(err))")
            }
        }
    }
}

Encode video frames

Call VTCompressionSessionEncodeFrame(_:imageBuffer:presentationTimeStamp:duration:frameProperties:infoFlagsOut:outputHandler:) and submit each uncompressed frame to the VTCompressionSession object for encoding. The object calls the outputHandler block for each encoded frame. Check whether a frame drop or error occurs after frame encoding.

Call VTCompressionSessionCompleteFrames(_:untilPresentationTimeStamp:) to indicate to the VTCompressionSession object that the app submitted all uncompressed video frames for encoding.

Perform compression and encoding

You can use the movie file /Assets/video.m4v to test this app. Copy the file to your desktop or other working directory, and then open Terminal to that directory and run the following command, where xxx is a unique string that Xcode generates. Use autocomplete before the xxx component to complete the path for that directory.

~/Library/Developer/Xcode/DerivedData/VTEncoderForConferencing-xxx/Build/Products/Debug/VTEncoderForConferencing-Swift video.m4v

Pass the --help option for additional configuration options.

See Also

Compression