paradigms-of-intelligence/swift-gemini-api
A flexible Swift package for interacting with Google's Gemini API. This library supports both the standard **REST API** (Multimodal generation, Text-to-Speech, Vision) and the **Gemini Live API** (Real-time bidirectional WebSocket streaming).
Features
- Gemini Live Client: Full WebSocket implementation for real-time, bidirectional audio and text interaction with Gemini.
Real-time audio streaming (Input & Output). Voice Activity Detection (VAD) configuration. Tool/Function calling support. Google Search integration. * Customizable voices and system prompts.
- Gemini REST API: A structured facade for standard request/response interactions.
Text: Chat and text generation. Speech: Text-to-Speech (TTS) synthesis. Vision/Audio/Video: Multimodal analysis. Files: Media upload management.
- Swift Concurrency: Built using
async/awaitand Actors for thread safety. - Observation: Uses the
@Observablemacro for seamless SwiftUI integration.
Requirements
- iOS: 14.0+
- macOS: 15.0+
- Swift: 6.0+
Installation
Swift Package Manager
Add the following to your Package.swift file:
dependencies: [
.package(url: "https://github.com/paradigms-of-intelligence/swift-gemini-api.git", from: "1.0.0")
]Demo app
Locate the Example Xcode project in the root folder
[root project folder]
Launch it in Xcode and make sure correct project is selected before you click the run button
[Xcode]
Setup the Gemini API Key
[Click on the red key icon]
Enter Gemini API Key
[Enter Gemini API Key]
Click Save
[Click Save]
Click Start Gemini Live Session
If key is setup correctly you will see Blue Key instead of a Red Key
[Start Gemini Live Session]
Successfully connected
If they Gemini API key was set correctly and valid, you will see the interface connect successfully.
To start streaming the voice (microphone audio input), you will need to click on Start Recording which will prompt you to confirm privacy access to your microphone, once it's confirmed, you can speak to the Gemini from Swift 🎉
[Gemini Live Interface]
Layout
Top red pane is Live API, bottom blue pane is REST API.
[live and rest APIs layout]
Usage
1. Gemini Live (Real-time WebSocket)
The GeminiLiveClient handles persistent connections for continuous conversation.
Note: Audio playback is 24KHz, and audio input should be in 16KHz.
Initialization
import swift_gemini_api
// Initialize the client
let liveClient = GeminiLiveClient(
model: "gemini-live-2.5-flash-preview", // <-- set current Gemini model
systemPrompt: "You are a helpful AI assistant.",
voice: .KORE,
enableGoogleSearch: true,
input_audio_transcription: true,
output_audio_transcription: true
)
Handling Events
Set up closures to handle incoming data, transcriptions, and tool calls before connecting.
// Handle text coming back from the model (what the AI is saying)
liveClient.setOutputTranscription { text in
print("AI: \(text)")
}
// Handle transcription of what the user said
liveClient.setInputTranscription { text in
print("User: \(text)")
}
// Handle connection setup
liveClient.onSetupComplete = { success in
if success {
print("Connected and ready to talk!")
}
}Connecting
liveClient.connect(apiKey: "YOUR_API_KEY")Sending Input
You can send text or raw audio data (PCM).
// Send text
liveClient.sendTextPrompt("Hello, how are you?")
// Send Audio (e.g., from a microphone buffer)
// Expects Base64 string or Data
liveClient.sendAudio(data: pcmData)Function Calling (Tools)
You can define tools that the AI can request to execute.
// 1. Define the tool definition
let lightTool = [
"name": "turn_on_lights",
"description": "Turns on the lights in the room.",
"parameters": [
"type": "object",
"properties": [:], // Add properties if needed
]
]
liveClient.addFunctionDeclarations(lightTool)
// 2. Handle the callback
liveClient.setToolCall { name, id, args in
if name == "turn_on_lights" {
// Execute your native code here
print("Lights turned on!")
// 3. Send the result back to Gemini
let response = ["status": "success"]
liveClient.sendFunctionResponse(id, response: response)
}
}2. Gemini API (REST)
Use GeminiAPI for single-turn requests or media processing.
import swift_gemini_api
let api = GeminiAPI(apiKey: "YOUR_API_KEY")Text Generation
let response = try await api.text.generateText("Explain Quantum Computing")Text-to-Speech
let audioData = try await api.speech.synthesizeSpeech("Hello world")
// Returns PCM Data ready for playbackLicense
Package Metadata
Repository: paradigms-of-intelligence/swift-gemini-api
Default branch: main
README: README.md