mhayes853/swift-structured-grammar
Type-safe BNF grammar construction in Swift.
Overview
[Context-free Grammars](https://en.wikipedia.org/wiki/Context-free_grammar), or CFGs for short, are a formal language specification that describe how valid strings in a language are generated by repeatedly replacing symbols with other symbols or sequences. They are often used in applications like parsing, and are commonly used to describe the structure of various programming languages and DSLs.
Furthermore, in today's LLM world, they are also used to constrain a model's generated tokens to reliably guarantee that the output of a model matches a structurally defined grammar. This is how structured output works reliably in many inference engines and AI SDKs.
CFGs are often written in a notation called Backus Normal Form (BNF), or Extended Backus Normal Form (EBNF) like so:
```ebnf
root ::= expr " = " expr
term ::= [a-z0-9]
expr ::= (expr (" + " | " - ") expr) | term+ expr* | expr
```
Where this is a simple language that accepts strings of equations with basic addition and subtraction such as `"2x + 2 = 4"`.
While this is powerful, the above syntax is unfortunately not universal, and you'll often find that many parsers use their own spin of BNF/EBNF. For example, many inference engines use [GBNF](https://github.com/ggml-org/llama.cpp/blob/master/grammars/README.md), a form of EBNF that feels more regex like whilst other parsers use W3C EBNF. While there have been attempts at standards like ISO/IEC 14977, none have universally taken fruition outside of their domains.
It als goes without saying that higher-level CFG operations such as unioning and concatenation cannot easily be applied to raw EBNF syntax at runtime. Furthermore, raw EBNF syntax makes it difficult to compose together parts of grammars into a single grammar in a manner that's always guaranteed to be syntatically correct.
This package was designed to address these issues by allowing grammar construction to occur in pure Swift with a lightweight runtime surface.
### Quick Start
Use `Grammar` when you want to author a concrete grammar directly.
```swift
import StructuredCFG
let grammar = Grammar(startingSymbol: "expression") {
Rule("expression") {
Ref("term")
ZeroOrMore {
ChoiceOf {
"+"
"-"
}
Ref("term")
}
}
Rule("term") {
Ref("factor")
ZeroOrMore {
ChoiceOf {
"*"
"/"
}
Ref("factor")
}
}
Rule("factor") {
ChoiceOf {
Ref("number")
GroupExpression {
"("
Ref("expression")
")"
}
}
}
Rule("number") {
OneOrMore {
CharacterGroup.digit
}
}
}
```
You can then format the grammar to raw BNF/EBNF syntax in any of the supported formats like so.
```swift
// formatted returns a String
let gbnf = try grammar.formatted(with: .gbnf)
let w3c = try grammar.formatted(with: .w3cEbnf)
let bnf = try grammar.formatted(with: .bnf)
let iso = try grammar.formatted(with: .isoIecEbnf)
```
### Language
When you want to compose multiple grammars together through common CFG operations, you can use the `Language` struct.
```swift
import StructuredCFG
let language = Language {
Union {
Grammar(startingSymbol: .root) {
// ...
}
Grammar(startingSymbol: .root) {
// ...
}
}
ConcatenateLanguages {
Star {
Grammar(startingSymbol: .root) {
// ...
}
}
Reverse {
Grammar(startingSymbol: .root) {
// ...
}
}
}
}
let resolved = language.grammar()
let ebnf = try resolved.formatted(with: .w3cEbnf)
print(ebnf)
```
### Imperative Grammar Construction
You can also build grammars step by step imperatively. `Grammar` exposes mutating and non-mutating APIs that offer the same power as the builder syntax. This is useful if you're attempting to programatically build a grammar from another format (eg. Regex, JSON Schema, Structural Tags).
```swift
import StructuredCFG
var grammar = Grammar(startingSymbol: "expression", [])
grammar.append(
Rule("expression") {
Ref("term")
ZeroOrMore {
ChoiceOf {
"+"
"-"
}
Ref("term")
}
}
)
grammar.append(
Rule("term") {
OneOrMore {
CharacterGroup.digit
}
}
)
grammar.replaceRule(for: "term") {
Ref("number")
}
grammar.append(
Rule("number") {
OneOrMore {
CharacterGroup.digit
}
}
)
let extended = grammar.appending(
Rule("signed-number") {
OptionalExpression { "-" }
Ref("number")
}
)
```
`Language` also exposes imperative APIs.
```swift
import StructuredCFG
let digits = Grammar(Rule("digits") {
OneOrMore {
CharacterGroup.digit
}
})
let identifier = Grammar(Rule("identifier") {
OneOrMore {
CharacterGroup.word
}
})
var language = Language(digits)
language.formUnion(identifier)
language.concatenate(
Grammar(Rule("separator") {
"="
})
)
let repeated = language.starred()
let reversed = repeated.reversed()
let resolved = reversed.grammar()
let gbnf = try resolved.formatted(with: .gbnf)
print(gbnf)
```
### Components
Expression blocks inside `Rule` builder-closures must conform to the `Expression.Component` protocol. You can use this protocol to create reusable grammar components.
```swift
struct Factor: Expression.Component {
var expression: Expression {
ChoiceOf {
Ref("number")
GroupExpression {
"("
Ref("expression")
")"
}
}
}
}
let grammar = Grammar(startingSymbol: "expression") {
// ...
Rule("term") {
Factor()
ZeroOrMore {
ChoiceOf {
"*"
"/"
}
Factor()
}
}
// ...
}
```
### Custom Formats
By default, the library supports a number of built-in syntaxes, but you can also add your own by conforming to the `Grammar.StatementFormatter` protocol.
```swift
struct MyFormatter: Grammar.StatementFormatter {
func format(statement: Grammar.Statement) throws -> String {
// ...
}
}
let mySyntax = try grammar.formatted(with: MyFormatter())
```
### Usage With XGrammar
You can bridge a resolved `StructuredCFG` grammar into [`swift-xgrammar`](https://github.com/mattt/swift-xgrammar) using the following extensions.
```swift
import StructuredCFG
import XGrammar
extension XGrammar.Grammar {
init(
language: Language,
startingSymbol: Symbol = .root,
symbolResolver: some Language.GrammarSymbolResolver = .default
) throws {
try self.init(
grammar: language.grammar(
startingSymbol: startingSymbol,
symbolResolver: symbolResolver
)
)
}
init(grammar: StructuredCFG.Grammar) throws {
try self.init(
ebnf: grammar.formatted(with: .gbnf),
rootRule: grammar.startingSymbol.rawValue
)
}
}
let xgrammar = try XGrammar.Grammar(language: language)
```Installation
You can add Swift Structured Grammar to an Xcode project by adding it to your project as a package.
https://github.com/mhayes853/swift-structured-grammar
If you want to use Swift Structured Grammar in a SwiftPM project, add it to your Package.swift.
dependencies: [
.package(url: "https://github.com/mhayes853/swift-structured-grammar", from: "0.1.0")
]And then add the product to any target that needs access to the library.
.product(name: "StructuredCFG", package: "swift-structured-grammar")License
This library is licensed under an MIT License. See LICENSE for details.
Package Metadata
Repository: mhayes853/swift-structured-grammar
Default branch: main
README: README.md