m1guelpf/swift-realtime-openai

This library provides a simple interface for implementing multi-modal conversations using OpenAI's new Realtime API.

Installation

Swift Package Manager

The Swift Package Manager lets developers easily integrate packages into their Xcode projects and other packages, and is fully integrated with the Swift compiler.

SPM Through Xcode Project

File > Swift Packages > Add Package Dependency
Add https://github.com/m1guelpf/swift-realtime-openai.git
Select "Branch" with "main"

SPM Through Xcode Package

Once you have your Swift package set up, add the Git link within the dependencies value of your Package.swift file.

dependencies: [
    .package(url: "https://github.com/m1guelpf/swift-realtime-openai.git", branch: "main")
]
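
For context, a complete manifest might look like the sketch below. The package and target name `MyApp`, the tools version, and the platform versions are placeholders. The product name `RealtimeAPI` is an assumption inferred from the `import RealtimeAPI` statements used later in this README, so check the library's own Package.swift for the exact product name and supported platforms.

```swift
// swift-tools-version:5.9
import PackageDescription

let package = Package(
    name: "MyApp", // placeholder name
    platforms: [.iOS(.v17), .macOS(.v14)], // assumed; match the library's requirements
    dependencies: [
        // Track the main branch, as recommended above.
        .package(url: "https://github.com/m1guelpf/swift-realtime-openai.git", branch: "main")
    ],
    targets: [
        .target(
            name: "MyApp", // placeholder name
            dependencies: [
                // Product name assumed from `import RealtimeAPI`.
                .product(name: "RealtimeAPI", package: "swift-realtime-openai")
            ]
        )
    ]
)
```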

Getting started 🚀

You can build an iMessage-like app with built-in AI chat in less than 60 lines of code (UI included!):

import SwiftUI
import RealtimeAPI

struct ContentView: View {
	@State private var newMessage: String = ""
	@State private var conversation = try! Conversation()

	var messages: [Item.Message] {
		conversation.entries.compactMap { switch $0 {
			case let .message(message): return message
			default: return nil
		} }
	}

	var body: some View {
		VStack(spacing: 0) {
			ScrollView {
                VStack(spacing: 12) {
                    ForEach(messages, id: \.id) { message in
                        MessageBubble(message: message)
                    }
                }
                .padding()
			}

			HStack(spacing: 12) {
				HStack {
					TextField("Chat", text: $newMessage, onCommit: { sendMessage() })
						.frame(height: 40)
						.submitLabel(.send)

					if newMessage != "" {
						Button(action: sendMessage) {
							Image(systemName: "arrow.up.circle.fill")
								.resizable()
								.aspectRatio(contentMode: .fill)
								.frame(width: 28, height: 28)
								.foregroundStyle(.white, .blue)
						}
					}
				}
				.padding(.leading)
				.padding(.trailing, 6)
				.overlay(RoundedRectangle(cornerRadius: 20).stroke(.quaternary, lineWidth: 1))
			}
			.padding()
		}
		.navigationTitle("Chat")
		.navigationBarTitleDisplayMode(.inline)
		.task { try! await conversation.connect(ephemeralKey: YOUR_EPHEMERAL_KEY_HERE) }
	}

	func sendMessage() {
		guard newMessage != "" else { return }

		Task {
			try await conversation.send(from: .user, text: newMessage)
			newMessage = ""
		}
	}
}

Or, if you just want a simple app that lets the user talk and the AI respond:

import SwiftUI
import RealtimeAPI

struct ContentView: View {
	@State private var conversation = try! Conversation()

	var body: some View {
		Text("Say something!")
			.task { try! await conversation.connect(ephemeralKey: YOUR_EPHEMERAL_KEY_HERE) }
	}
}

Architecture

`Conversation`

The Conversation class provides a high-level interface for managing a conversation with the model. It wraps the RealtimeAPI class and handles the details of sending and receiving messages, managing the conversation history, recording the user's mic, and playing model responses as they stream in.

Reading messages

You can access the messages in the conversation through the messages property. Note that this won't include function calls and their responses, only the messages exchanged between the user and the model. To access the full conversation history, use the entries property. For example:

ScrollView {
    ScrollViewReader { scrollView in
        VStack(spacing: 12) {
            ForEach(conversation.messages, id: \.id) { message in
                MessageBubble(message: message).id(message.id)
            }
        }
        .onReceive(conversation.messages.publisher) { _ in
            withAnimation { scrollView.scrollTo(conversation.messages.last?.id, anchor: .center) }
        }
    }
}
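
The relationship between entries and messages can be pictured with a small self-contained sketch. The `Entry` and `Message` types below are hypothetical stand-ins, not the library's real `Item` types. They only illustrate how a message-only view can be derived from a full history that also contains function calls.

```swift
// Hypothetical stand-ins for the library's item types (illustration only).
struct Message: Equatable {
    let id: String
    let text: String
}

enum Entry {
    case message(Message)
    case functionCall(name: String)
    case functionCallOutput(output: String)
}

// Keep only the user/model messages, dropping function calls and their outputs.
func messages(in entries: [Entry]) -> [Message] {
    entries.compactMap { entry -> Message? in
        if case let .message(message) = entry { return message }
        return nil
    }
}

let history: [Entry] = [
    .message(Message(id: "1", text: "What's the weather?")),
    .functionCall(name: "get_weather"),
    .functionCallOutput(output: "sunny"),
    .message(Message(id: "2", text: "It's sunny.")),
]

let visible = messages(in: history)
print(visible.map(\.id)) // ["1", "2"]
```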

Customizing the session

You can customize the current session using the setSession(_: Session) or updateSession(withChanges: (inout Session) -> Void) methods. Note that they require that a session has already been established, so it's recommended that you call them from a whenConnected(_: @Sendable () async throws -> Void) callback or await waitForConnection() first. For example:

try await conversation.whenConnected {
    try await conversation.updateSession { session in
        // update system prompt
        session.instructions = "You are a helpful assistant."

        // enable transcription of users' voice messages
        session.inputAudioTranscription = Session.InputAudioTranscription()

        // ...
    }
}
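
The updateSession(withChanges:) signature takes a closure with an `inout` parameter, which lets the caller mutate a copy of the current configuration in place before it is applied as a whole. The sketch below shows that pattern in isolation. `SessionConfig` and `SessionHolder` are hypothetical stand-ins, and the real method is additionally asynchronous and talks to the API.

```swift
// Hypothetical stand-in for the library's Session type (illustration only).
struct SessionConfig: Equatable {
    var instructions = ""
    var transcriptionEnabled = false
}

final class SessionHolder {
    private(set) var session = SessionConfig()

    // Copy the current value, let the caller mutate the copy, then store it.
    // A real implementation would also send the updated session to the API here.
    func updateSession(withChanges changes: (inout SessionConfig) -> Void) {
        var updated = session
        changes(&updated)
        session = updated
    }
}

let holder = SessionHolder()
holder.updateSession { session in
    session.instructions = "You are a helpful assistant."
}
holder.updateSession { session in
    session.transcriptionEnabled = true
}
```

Because each call starts from the current value, fields that a closure does not touch keep whatever an earlier update set.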

Manually sending messages

To send a text message, call send(from: Item.ItemRole, text: String, response: Response.Config? = nil), providing the role of the sender (.user, .assistant, or .system) and the contents of the message. You can optionally provide a Response.Config object to customize the response, for example to enable or disable function calls.

To manually send an audio message (or part of one), call send(audioDelta: Data, commit: Bool = false) with a valid audio chunk. If commit is true, the model will consider the message finished and begin responding to it. Otherwise, it might wait for more audio depending on your Session.turnDetection settings.
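
As a rough sketch of how chunked audio and the commit flag fit together, the helper below splits a buffer into fixed-size chunks and marks only the last one as committed. The `chunks(of:size:)` and `stream(_:chunkSize:send:)` helpers are hypothetical and not part of the library. The `send` closure is a synchronous stand-in with the same shape as send(audioDelta:commit:), whereas the real call is asynchronous and would need an async variant of this loop.

```swift
import Foundation

// Split a buffer into consecutive chunks of at most `size` bytes.
func chunks(of data: Data, size: Int) -> [Data] {
    precondition(size > 0, "chunk size must be positive")
    var result: [Data] = []
    var offset = data.startIndex
    while offset < data.endIndex {
        let end = min(offset + size, data.endIndex)
        result.append(data.subdata(in: offset..<end))
        offset = end
    }
    return result
}

// Feed every chunk to `send`, passing commit == true only for the final chunk.
// `send` is a hypothetical stand-in shaped like send(audioDelta:commit:).
func stream(_ audio: Data, chunkSize: Int, send: (Data, Bool) throws -> Void) rethrows {
    let parts = chunks(of: audio, size: chunkSize)
    for (index, part) in parts.enumerated() {
        try send(part, index == parts.count - 1)
    }
}

// Record what would be sent for a 10-byte buffer split into 4-byte chunks.
var sent: [(bytes: Int, commit: Bool)] = []
stream(Data(count: 10), chunkSize: 4) { chunk, commit in
    sent.append((bytes: chunk.count, commit: commit))
}
```

Committing only on the final chunk mirrors the behavior described above: earlier chunks leave the message open, and the last one tells the model the message is finished.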

Manually sending events

To manually send an event to the API, use the send(event: RealtimeAPI.ClientEvent) method. Note that this bypasses some of the logic in the Conversation class such as handling interrupts, so you should prefer to use other methods whenever possible.

`RealtimeAPI`

To interact with the API directly, create a new instance of RealtimeAPI, providing one of the available connectors. There are helper methods that let you create an instance from an ephemeral key, an auth token, or a URLRequest, like so:

// Over WebRTC, using an ephemeral key:
let api = RealtimeAPI.webRTC(ephemeralKey: YOUR_EPHEMERAL_KEY, model: .gptRealtime) // or RealtimeAPI.webRTC(connectingTo: URLRequest)
// Alternatively, over WebSocket, using your API key (pick one of the two):
let api = RealtimeAPI.webSocket(authToken: YOUR_OPENAI_API_KEY, model: .gptRealtime) // or RealtimeAPI.webSocket(connectingTo: URLRequest)

You can listen for new events through the events property, like so:

for try await event in api.events {
    switch event {
        case let .sessionCreated(event):
            print(event.session.id)
        default:
            break // ignore events you don't handle
    }
}
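
The same consumption pattern can be tried without a network connection. In the sketch below, `ServerEvent` is a hypothetical stand-in for the library's event type and the stream is fed by hand. It only illustrates iterating an asynchronous throwing sequence and ignoring events that are not handled.

```swift
// Hypothetical stand-in for the library's server event type (illustration only).
enum ServerEvent {
    case sessionCreated(String) // carries a session id
    case other
}

// Collect the ids of all "session created" events, ignoring everything else.
func sessionIDs(in events: AsyncThrowingStream<ServerEvent, Error>) async throws -> [String] {
    var ids: [String] = []
    for try await event in events {
        switch event {
        case let .sessionCreated(id):
            ids.append(id)
        default:
            break // events we don't care about
        }
    }
    return ids
}

// A hand-fed stream standing in for `api.events`.
let stream = AsyncThrowingStream<ServerEvent, Error> { continuation in
    continuation.yield(.sessionCreated("sess_123"))
    continuation.yield(.other)
    continuation.finish()
}

let ids = try await sessionIDs(in: stream)
```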

To send an event to the API, call the send method with a ClientEvent instance:

try await api.send(event: .updateSession(session))
try await api.send(event: .appendInputAudioBuffer(encoding: audioData))
try await api.send(event: .createResponse())

License

This project is licensed under the MIT License - see the LICENSE file for details.
