alfianlosari/GPTEncoder
Swift BPE Encoder/Decoder for OpenAI GPT Models. A programmatic interface for tokenizing text for the OpenAI ChatGPT API.
Supported Platforms
- iOS/macOS/watchOS/tvOS
- Linux
Installation
Swift Package Manager
- File > Swift Packages > Add Package Dependency
- Add https://github.com/alfianlosari/GPTEncoder.git
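For projects built from the command line (for example on Linux, where the Xcode menu is not available), the dependency can also be declared in a Package.swift manifest. The following is a minimal sketch, not taken from the repository: the target name MyApp is hypothetical, the product name GPTEncoder is assumed to match the package name, and the 1.0.3 version is borrowed from the CocoaPods snippet in this README.

```swift
// swift-tools-version:5.5
import PackageDescription

let package = Package(
    name: "MyApp", // hypothetical app name
    dependencies: [
        // Version assumed from the CocoaPods snippet; adjust to the release you need.
        .package(url: "https://github.com/alfianlosari/GPTEncoder.git", from: "1.0.3")
    ],
    targets: [
        .executableTarget(
            name: "MyApp",
            dependencies: [
                // Product name assumed to match the package name.
                .product(name: "GPTEncoder", package: "GPTEncoder")
            ]
        )
    ]
)
```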
CocoaPods
platform :ios, '15.0'
use_frameworks!
target 'MyApp' do
pod 'GPTEncoder', '~> 1.0.3'
end
Usage
// Module name assumed to match the package name.
import GPTEncoder

let encoder = SwiftGPTEncoder()
let str = "The GPT family of models process text using tokens, which are common sequences of characters found in text."
let encoded = encoder.encode(text: str)
print("String: \(str)")
print("Encoded this string looks like: \(encoded)")
print("Total number of token(s): \(encoded.count) and character(s): \(str.count)")
print("We can look at each token and what it represents")
encoded.forEach { print("Token: \(encoder.decode(tokens: [$0]))") }
print(encoded)
let decoded = encoder.decode(tokens: encoded)
print("We can decode it back into:\n\(decoded)")
Encode
To encode a String into an array of Int tokens, call encode(text:) with the string.
let encoded = encoder.encode(text: "The GPT family of models process text using tokens, which are common sequences of characters found in text.")
// Output: [464, 402, 11571, 1641, 286, 4981, 1429, 2420, 1262, 16326, 11, 543, 389, 2219, 16311, 286, 3435, 1043, 287, 2420, 13]
Decode
To decode an array of Int tokens back into a String, call decode(tokens:) with the token array.
let decoded = encoder.decode(tokens: [464, 402, 11571, 1641, 286, 4981, 1429, 2420, 1262, 16326, 11, 543, 389, 2219, 16311, 286, 3435, 1043, 287, 2420, 13])
// Output: "The GPT family of models process text using tokens, which are common sequences of characters found in text."
Clear Cache
Internally, a cache is used to improve performance when encoding tokens. You can reset this cache at any time by calling clearCache().
encoder.clearCache()
Package Metadata
Repository: alfianlosari/GPTEncoder
Stars: 85
Forks: 20
Open issues: 6
Default branch: main
Primary language: Swift
License: MIT
Topics: chatgpt, encoder-decoder, gpt, gpt-3, gpt4, openai, swift, tokenizer
README: README.md
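Example: Trimming Text to a Token Limit
A common reason to tokenize text before calling the ChatGPT API is to keep a prompt within a token budget. The sketch below is one possible way to do that with an encode/decode pair. The truncate helper is hypothetical and not part of GPTEncoder, and the toy tokenizer (one token per Unicode scalar) is a stand-in so the snippet runs without the package.

```swift
// Hypothetical helper, not part of GPTEncoder: trims text to at most `limit` tokens.
// `encode` and `decode` are injected so the logic works with any tokenizer.
func truncate(
    _ text: String,
    toTokenLimit limit: Int,
    encode: (String) -> [Int],
    decode: ([Int]) -> String
) -> String {
    precondition(limit >= 0, "limit must be non-negative")
    let tokens = encode(text)
    // Nothing to trim when the text already fits the budget.
    guard tokens.count > limit else { return text }
    // Keep only the first `limit` tokens and turn them back into text.
    return decode(Array(tokens.prefix(limit)))
}

// Stand-in tokenizer for demonstration only: one token per Unicode scalar.
let toyEncode: (String) -> [Int] = { text in
    text.unicodeScalars.map { Int($0.value) }
}
let toyDecode: ([Int]) -> String = { tokens in
    let scalars: [Unicode.Scalar] = tokens.compactMap { token in
        guard let value = UInt32(exactly: token) else { return nil }
        return Unicode.Scalar(value)
    }
    return String(scalars.map { Character($0) })
}

let trimmed = truncate("Hello, GPT!", toTokenLimit: 5, encode: toyEncode, decode: toyDecode)
print(trimmed) // prints "Hello"
```

With the package imported, the closures could instead be supplied as `{ encoder.encode(text: $0) }` and `{ encoder.decode(tokens: $0) }`. Because BPE tokens do not always align with character boundaries, cutting a real token sequence at an arbitrary index may leave a partial character at the end of the decoded string, so the result is worth checking before use.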