# swift-tiktoken

A pure Swift implementation of OpenAI's tiktoken tokenizer.
## Motivation
FFI-based wrappers of tiktoken bundle a roughly 50 MB Rust binary. This library is pure Swift, resulting in a much smaller footprint. Encoding is slightly slower than the Rust implementation (see Benchmarks), while decoding matches its speed.
## Installation

Add the package to your `Package.swift`:

```swift
dependencies: [
    .package(url: "https://github.com/DePasqualeOrg/swift-tiktoken.git", from: "1.0.0")
]
```

## Usage
```swift
import SwiftTiktoken

// Load an encoding
let encoder = try await CoreBPE.cl100kBase() // GPT-3.5/4
let encoder = try await CoreBPE.o200kBase() // GPT-4o
let encoder = try await CoreBPE.forModel("gpt-4o")

// Encode
let tokens = encoder.encodeOrdinary(text: "Hello, world!")
// [9906, 11, 1917, 0]

// Decode
let text = try encoder.decode(tokens: tokens)
// "Hello, world!"

// With special tokens
let tokens = encoder.encodeWithSpecialTokens(text: "Hello<|endoftext|>")
```

## API
| Method | Description |
|--------|-------------|
| `encodeOrdinary(text:)` | Encode text to tokens |
| `encode(text:allowedSpecial:)` | Encode with special token handling |
| `decode(tokens:)` | Decode tokens to string |
| `decodeBytes(tokens:)` | Decode tokens to raw bytes |
| `decodeWithOffsets(tokens:)` | Decode with character offsets |
| `encodeBatch(:)` | Parallel encoding (async) |
| `decodeBatch(:)` | Parallel decoding (async) |
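As a sketch of how the special-token methods above might be used together (the exact parameter types for `allowedSpecial` and the batch methods are assumptions based on the reference tiktoken API, not confirmed signatures):

```swift
import SwiftTiktoken

let encoder = try await CoreBPE.cl100kBase()

// encode(text:allowedSpecial:) only treats the listed special tokens as
// special; any other special-token text is tokenized as ordinary input.
// The Set<String> argument type here is an assumption.
let tokens = try encoder.encode(text: "Hello<|endoftext|>",
                                allowedSpecial: ["<|endoftext|>"])

// Round-trip check: decoding should recover the original string.
let roundTrip = try encoder.decode(tokens: tokens)

// The batch variants are async and encode/decode inputs in parallel;
// the array-of-strings signature shown here is an assumption.
let batches = try await encoder.encodeBatch(["first text", "second text"])
```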
## Supported Encodings
| Encoding | Models |
|----------|--------|
| `cl100k_base` | GPT-3.5, GPT-4 |
| `o200k_base` | GPT-4o, GPT-4.1, GPT-5, o1, o3, o4-mini |
| `o200k_harmony` | gpt-oss-20b, gpt-oss-120b |
| `p50k_base` | Codex |
| `r50k_base` | GPT-2 |
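Because `forModel(_:)` maps a model name to its encoding from the table above, a common pattern is a small token-counting helper. This is a hypothetical convenience wrapper, not part of the library's API:

```swift
import SwiftTiktoken

// Hypothetical helper: count the tokens a given model would see for a
// prompt, using the forModel loader documented above.
func tokenCount(of text: String, model: String) async throws -> Int {
    let encoder = try await CoreBPE.forModel(model)
    return encoder.encodeOrdinary(text: text).count
}

// Usage: the same text can yield different counts under different
// encodings (e.g. cl100k_base for "gpt-4" vs. o200k_base for "gpt-4o").
let count = try await tokenCount(of: "Hello, world!", model: "gpt-4o")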