
alfianlosari/GPTEncoder

Swift BPE (byte pair encoding) encoder/decoder for OpenAI GPT models: a programmatic interface for tokenizing text for the OpenAI ChatGPT API.
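Byte pair encoding starts from individual symbols and repeatedly merges the adjacent pair with the lowest rank in a learned merge table. The following is a simplified sketch of that merge loop for a single word, not this package's implementation. The `bpe` function and the three-entry `toyRanks` table are hypothetical; GPT-2-style encoders also split text with a regular expression and map raw bytes to printable characters before merging, and the sketch skips both steps.

```swift
/// Simplified GPT-2-style BPE for a single word (hypothetical helper).
/// `ranks` maps a space-separated symbol pair to its merge priority (lower merges first).
func bpe(_ word: String, ranks: [String: Int]) -> [String] {
    // Start from individual characters.
    var symbols = word.map { String($0) }
    while symbols.count > 1 {
        // Find the adjacent pair with the lowest merge rank.
        var best: (first: String, second: String, rank: Int)? = nil
        for i in 0..<(symbols.count - 1) {
            if let rank = ranks[symbols[i] + " " + symbols[i + 1]],
               rank < (best?.rank ?? Int.max) {
                best = (first: symbols[i], second: symbols[i + 1], rank: rank)
            }
        }
        // No ranked pair left: the word is fully merged.
        guard let pair = best else { break }
        // Merge every non-overlapping occurrence of that pair, left to right.
        var merged: [String] = []
        var j = 0
        while j < symbols.count {
            if j + 1 < symbols.count, symbols[j] == pair.first, symbols[j + 1] == pair.second {
                merged.append(pair.first + pair.second)
                j += 2
            } else {
                merged.append(symbols[j])
                j += 1
            }
        }
        symbols = merged
    }
    return symbols
}

// Hypothetical toy merge table; a real GPT vocabulary has tens of thousands of merges.
let toyRanks = ["l o": 0, "lo w": 1, "e r": 2]
print(bpe("lower", ranks: toyRanks))  // ["low", "er"]
```

Each resulting symbol is then looked up in the vocabulary to get its Int token ID.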

Supported Platforms

  • iOS/macOS/watchOS/tvOS
  • Linux

Installation

Swift Package Manager

  • File > Swift Packages > Add Package Dependency
  • Add https://github.com/alfianlosari/GPTEncoder.git

CocoaPods

Add the pod to your Podfile:

platform :ios, '15.0'
use_frameworks!

target 'MyApp' do
  pod 'GPTEncoder', '~> 1.0.3'
end

Usage

import GPTEncoder

let encoder = SwiftGPTEncoder()

let str = "The GPT family of models process text using tokens, which are common sequences of characters found in text."

// Encode the string into an array of Int token IDs.
let encoded = encoder.encode(text: str)
print("String: \(str)")
print("Encoded this string looks like: \(encoded)")
print("Total number of token(s): \(encoded.count) and character(s): \(str.count)")

// Decode each token on its own to see which piece of text it represents.
print("We can look at each token and what it represents")
encoded.forEach { print("Token: \(encoder.decode(tokens: [$0]))") }
print(encoded)

// Decode the full token array back into the original string.
let decoded = encoder.decode(tokens: encoded)
print("We can decode it back into:\n\(decoded)")

Encode

To encode a String into an array of Int tokens, call encode(text:) with the string:

let encoded = encoder.encode(text: "The GPT family of models process text using tokens, which are common sequences of characters found in text.")
// Output: [464, 402, 11571, 1641, 286, 4981, 1429, 2420, 1262, 16326, 11, 543, 389, 2219, 16311, 286, 3435, 1043, 287, 2420, 13]
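Since encode(text:) returns one Int per token, the array's count is the figure to compare against a model's token limit. The sketch below is hypothetical: `TokenCodec`, `fits`, `truncate`, and the `ScalarCodec` stand-in (one token per Unicode scalar, so the example runs without the package) are illustrative names, not part of GPTEncoder. Assuming SwiftGPTEncoder's `encode(text:)` and `decode(tokens:)` signatures match the protocol exactly, an `extension SwiftGPTEncoder: TokenCodec {}` would let the real encoder be passed in.

```swift
/// Hypothetical protocol mirroring the encode(text:) / decode(tokens:) calls in this README.
protocol TokenCodec {
    func encode(text: String) -> [Int]
    func decode(tokens: [Int]) -> String
}

/// True when `text` encodes to at most `limit` tokens.
func fits<C: TokenCodec>(_ text: String, limit: Int, using codec: C) -> Bool {
    codec.encode(text: text).count <= limit
}

/// Keeps the first `limit` tokens of `text` and decodes them back into a String.
func truncate<C: TokenCodec>(_ text: String, toTokens limit: Int, using codec: C) -> String {
    let tokens = codec.encode(text: text)
    guard tokens.count > limit else { return text }
    return codec.decode(tokens: Array(tokens.prefix(limit)))
}

/// Stand-in codec for illustration only: one token per Unicode scalar.
struct ScalarCodec: TokenCodec {
    func encode(text: String) -> [Int] {
        text.unicodeScalars.map { Int($0.value) }
    }

    func decode(tokens: [Int]) -> String {
        var result = ""
        // Skip values that are not valid Unicode scalars.
        result.unicodeScalars.append(contentsOf: tokens.compactMap { UInt32(exactly: $0).flatMap { Unicode.Scalar($0) } })
        return result
    }
}

let codec = ScalarCodec()
print(fits("hello", limit: 4, using: codec))                 // false
print(truncate("hello world", toTokens: 5, using: codec))    // hello
```

Cutting at a token boundary is harmless for ASCII text, but with byte-level BPE the cut can fall inside a multi-byte character, so the decoded tail may not be clean.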

Decode

To decode an array of Int tokens back into a String, call decode(tokens:) with the token array:

let decoded = encoder.decode(tokens: [464, 402, 11571, 1641, 286, 4981, 1429, 2420, 1262, 16326, 11, 543, 389, 2219, 16311, 286, 3435, 1043, 287, 2420, 13])
// Output: "The GPT family of models process text using tokens, which are common sequences of characters found in text."
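Byte-level BPE, as used by GPT-2-style tokenizers, maps each token ID to a sequence of raw bytes; decoding concatenates those bytes and reads the result as UTF-8. The sketch below uses a tiny hypothetical table (`ToyVocab`, not the real GPT vocabulary and not this package's code) in which the two UTF-8 bytes of "é" land in separate tokens. It illustrates why decoding one token at a time may produce replacement characters for non-ASCII text, while decoding the whole array recovers the original string.

```swift
/// Hypothetical token table: ID -> raw bytes. "é" (U+00E9) is 0xC3 0xA9 in UTF-8,
/// deliberately split across tokens 1 and 2.
enum ToyVocab {
    static let tokenBytes: [Int: [UInt8]] = [
        0: Array("caf".utf8),
        1: [0xC3],
        2: [0xA9],
    ]
}

/// Concatenates each token's bytes, then decodes the whole buffer as UTF-8.
/// Unknown IDs are skipped; invalid UTF-8 is replaced with U+FFFD.
func decodeTokens(_ tokens: [Int]) -> String {
    let bytes = tokens.flatMap { ToyVocab.tokenBytes[$0] ?? [] }
    return String(decoding: bytes, as: UTF8.self)
}

print(decodeTokens([0, 1, 2]))          // café
print(decodeTokens([1]) == "\u{FFFD}")  // true: half of "é" alone is not valid UTF-8
```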

Clear Cache

Internally, the encoder uses a cache to speed up encoding. You can reset it with clearCache():

encoder.clearCache()
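The README does not say how the cache is keyed. In the reference GPT-2 encoder, BPE results are memoized per pre-tokenized word, so a word that was seen before skips the merge loop. A minimal sketch of that pattern follows; `CachedTokenizer` and its character-splitting transform are hypothetical stand-ins, not this package's types.

```swift
/// Hypothetical memoizing wrapper: caches the result of an expensive
/// per-word transform (such as BPE merges), keyed by the word.
final class CachedTokenizer {
    private var cache: [String: [String]] = [:]
    private let transform: (String) -> [String]
    /// Number of times the transform actually ran (cache misses).
    private(set) var misses = 0

    init(transform: @escaping (String) -> [String]) {
        self.transform = transform
    }

    func tokens(for word: String) -> [String] {
        if let hit = cache[word] { return hit }
        misses += 1
        let result = transform(word)
        cache[word] = result
        return result
    }

    /// Drops all memoized entries, analogous in spirit to clearCache().
    func clearCache() {
        cache.removeAll()
    }
}

let tokenizer = CachedTokenizer { word in word.map { String($0) } }
_ = tokenizer.tokens(for: "text")
_ = tokenizer.tokens(for: "text")   // served from the cache
print(tokenizer.misses)             // 1
tokenizer.clearCache()
_ = tokenizer.tokens(for: "text")   // recomputed after clearing
print(tokenizer.misses)             // 2
```

The dictionary grows with every distinct word, which is one plausible reason to clear it in a long-running process. The sketch is also not thread-safe; sharing one instance across threads would need a lock or an actor.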

Package Metadata

Repository: alfianlosari/GPTEncoder

Stars: 85

Forks: 20

Open issues: 6

Default branch: main

Primary language: Swift

License: MIT

Topics: chatgpt, encoder-decoder, gpt, gpt-3, gpt4, openai, swift, tokenizer

README: README.md