davidskrundz/regex
A pure Swift implementation of a Regular Expression Engine
Usage
To avoid paying the compilation cost on every match, compile the pattern once into a `Regex` instance and reuse it:
```swift
// Compile the expression
let regex = try! Regex(pattern: "[a-zA-Z]+")
let string = "RegEx is tough, but useful."
// Search for matches
let words = regex.match(string)
/*
words = [
    RegexMatch(match: "RegEx", groups: []),
    RegexMatch(match: "is", groups: []),
    RegexMatch(match: "tough", groups: []),
    RegexMatch(match: "but", groups: []),
    RegexMatch(match: "useful", groups: []),
]
*/
```
If compilation overhead is not a concern, the `=~` operator matches a string against a pattern directly:
```swift
let fourLetterWords = "drink beer, it's very nice!" =~ "\\b\\w{4}\\b" ?? []
/*
fourLetterWords = [
    RegexMatch(match: "beer", groups: []),
    RegexMatch(match: "very", groups: []),
    RegexMatch(match: "nice", groups: []),
]
*/
```
By default, the Global flag is active. To choose which flags are active, start the pattern with `/` and end it with `/<flags>`. The available flags are:

- `g` Global: allows multiple matches
- `i` Case Insensitive: case-insensitive matching
- `m` Multiline: `^` and `$` also match the beginning and end of a line
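The `/pattern/flags` syntax described above can be sketched as a small parser. This is a minimal illustration, not this library's code: `RegexFlags`, `FlagError` and `splitPatternAndFlags` are hypothetical names, and the sketch assumes that a slash-delimited pattern uses exactly the flags listed after the closing `/` (so `/abc/` would end up with no flags at all).

```swift
// Hypothetical flag set; the names are illustrative, not this library's API.
struct RegexFlags: OptionSet {
    let rawValue: Int
    static let global          = RegexFlags(rawValue: 1 << 0)  // g
    static let caseInsensitive = RegexFlags(rawValue: 1 << 1)  // i
    static let multiline       = RegexFlags(rawValue: 1 << 2)  // m
}

enum FlagError: Error {
    case missingClosingSlash
    case unknownFlag(Character)
}

// Splits "/pattern/flags" into its parts. Assumption: without a leading "/",
// the whole string is the pattern and only the default Global flag is set.
func splitPatternAndFlags(_ input: String) throws -> (pattern: String, flags: RegexFlags) {
    guard input.hasPrefix("/") else { return (input, [.global]) }
    let body = input.dropFirst()
    // Flags never contain "/", so the last "/" ends the pattern.
    guard let slash = body.lastIndex(of: "/") else { throw FlagError.missingClosingSlash }
    var flags: RegexFlags = []
    for c in body[body.index(after: slash)...] {
        switch c {
        case "g": flags.insert(.global)
        case "i": flags.insert(.caseInsensitive)
        case "m": flags.insert(.multiline)
        default: throw FlagError.unknownFlag(c)
        }
    }
    return (String(body[..<slash]), flags)
}

let parsed = try! splitPatternAndFlags("/\\w+/ig")
print(parsed.pattern)                           // \w+
print(parsed.flags.contains(.caseInsensitive))  // true
```

Using the last `/` as the separator means a pattern may itself contain `/` (for example the escaped `\/`) without confusing the split.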
```swift
// Global and Case Insensitive search
let regex = try! Regex(pattern: "/\\w+/ig")
```
Supported Operations
Character Classes
| Pattern | Description | Supported |
|---------|-------------|-----------|
| `.` | `[^\n\r]` | [ ] |
| `[^]` | `[\s\S]` | [ ] |
| `\w` | `[A-Za-z0-9_]` | [ ] |
| `\W` | `[^A-Za-z0-9_]` | [ ] |
| `\d` | `[0-9]` | [ ] |
| `\D` | `[^0-9]` | [ ] |
| `\s` | `[\ \r\n\t\v\f]` | [ ] |
| `\S` | `[^\ \r\n\t\v\f]` | [ ] |
| `[ABC]` | Any character in the set | [ ] |
| `[^ABC]` | Any character not in the set | [ ] |
| `[A-Z]` | Any character in the range, inclusive | [ ] |
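To make the shorthand classes in this table concrete, they can be written as predicates over a single Swift `Character`. The helper names (`isWord`, `isDigit`, `isSpace`, `matchesShorthand`) are hypothetical and not part of this library; the sets follow the table exactly, so non-ASCII letters and digits match neither `\w` nor `\d`.

```swift
// Hypothetical predicates mirroring the shorthand classes in the table.

// \w == [A-Za-z0-9_]  (ASCII only)
func isWord(_ c: Character) -> Bool {
    c.isASCII && (c.isLetter || c.isNumber || c == "_")
}

// \d == [0-9]
func isDigit(_ c: Character) -> Bool {
    c.isASCII && c.isNumber
}

// \s == [\ \r\n\t\v\f]; Swift has no \v or \f escapes, so the scalars are spelled out.
// The combined "\r\n" Character is not in this set.
let spaceSet: Set<Character> = [" ", "\r", "\n", "\t", "\u{0B}", "\u{0C}"]
func isSpace(_ c: Character) -> Bool {
    spaceSet.contains(c)
}

// Dispatch on the letter after the backslash; uppercase letters negate the class.
func matchesShorthand(_ c: Character, _ letter: Character) -> Bool? {
    switch letter {
    case "w": return isWord(c)
    case "W": return !isWord(c)
    case "d": return isDigit(c)
    case "D": return !isDigit(c)
    case "s": return isSpace(c)
    case "S": return !isSpace(c)
    default:  return nil  // not a shorthand class
    }
}

print("RegEx is_1!".filter(isWord))  // RegExis_1
```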
Anchors (Match positions not characters)
| Pattern | Description | Supported |
|---------|-------------|-----------|
| `^` | Beginning of string | [ ] |
| `$` | End of string | [ ] |
| `\b` | Word boundary | [ ] |
| `\B` | Not word boundary | [ ] |
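`\b` and `\B` test a position rather than a character: a position is a word boundary when exactly one of the characters on either side is a word character (`\w`), with the outside of the string counting as non-word. A minimal sketch of that rule, using a hypothetical `isWordBoundary` helper over an array of `Character`s:

```swift
// Hypothetical helper: is position i (between chars[i-1] and chars[i]) a \b boundary?
// Positions range over 0...chars.count; the string's edges count as non-word.
func isWordBoundary(_ chars: [Character], at i: Int) -> Bool {
    // \w == [A-Za-z0-9_]
    func isWord(_ c: Character) -> Bool {
        c.isASCII && (c.isLetter || c.isNumber || c == "_")
    }
    let wordBefore = i > 0 && isWord(chars[i - 1])
    let wordAfter = i < chars.count && isWord(chars[i])
    return wordBefore != wordAfter
}

// \B is simply the negation.
func isNotWordBoundary(_ chars: [Character], at i: Int) -> Bool {
    !isWordBoundary(chars, at: i)
}

let chars = Array("hi there")
let boundaries = (0...chars.count).filter { isWordBoundary(chars, at: $0) }
print(boundaries)  // [0, 2, 3, 8]
```

This rule is why `\b\w{4}\b` in the usage example skips `drink`: the position between `n` and `k` is not a boundary.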
Escaped Characters
| Pattern | Description | Supported |
|---------|-------------|-----------|
| `\0` | Octal escaped character | [ ] |
| `\00` | Octal escaped character | [ ] |
| `\000` | Octal escaped character | [ ] |
| `\xFF` | Hex escaped character | [ ] |
| `\uFFFF` | Unicode escaped character | [ ] |
| `\cA` | Control character | [ ] |
| `\t` | Tab | [ ] |
| `\n` | Newline | [ ] |
| `\v` | Vertical tab | [ ] |
| `\f` | Form feed | [ ] |
| `\r` | Carriage return | [ ] |
| `\0` | Null | [ ] |
| `\.` | `.` | [ ] |
| `\\` | `\` | [ ] |
| `\+` | `+` | [ ] |
| `\*` | `*` | [ ] |
| `\?` | `?` | [ ] |
| `\^` | `^` | [ ] |
| `\$` | `$` | [ ] |
| `\{` | `{` | [ ] |
| `\}` | `}` | [ ] |
| `\[` | `[` | [ ] |
| `\]` | `]` | [ ] |
| `\(` | `(` | [ ] |
| `\)` | `)` | [ ] |
| `\/` | `/` | [ ] |
| `\\|` | `\|` | [ ] |
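A sketch of how the escapes in this table could be decoded into Swift `Character`s. The function names are hypothetical and octal escapes are left out. The table lists `\0` both as an octal escape and as Null, so this sketch picks the Null reading. Swift string literals have no `\v` or `\f`, so those are written as `\u{0B}` and `\u{0C}`.

```swift
// Hypothetical decoders for the escapes in the table (the part after "\").

// Single-character escapes. "\0" is read as Null here; octal escapes are not covered.
func decodeSimpleEscape(_ c: Character) -> Character? {
    switch c {
    case "t": return "\t"
    case "n": return "\n"
    case "v": return "\u{0B}"  // vertical tab (Swift has no \v escape)
    case "f": return "\u{0C}"  // form feed (Swift has no \f escape)
    case "r": return "\r"
    case "0": return "\0"
    // Escaped metacharacters match themselves literally.
    case ".", "\\", "+", "*", "?", "^", "$", "{", "}", "[", "]", "(", ")", "/", "|":
        return c
    default:
        return nil
    }
}

// \cA ... \cZ: the control character at the letter's position in the alphabet (A -> 0x01).
func decodeControl(_ letter: Character) -> Character? {
    guard let v = letter.uppercased().first?.asciiValue, v >= 65, v <= 90 else { return nil }
    return Character(Unicode.Scalar(v - 64))
}

// \xFF: exactly two hex digits.
func decodeHex(_ digits: String) -> Character? {
    guard digits.count == 2, digits.allSatisfy(\.isHexDigit),
          let v = UInt8(digits, radix: 16) else { return nil }
    return Character(Unicode.Scalar(v))
}

// \uFFFF: exactly four hex digits; surrogate values are rejected by Unicode.Scalar.
func decodeUnicode(_ digits: String) -> Character? {
    guard digits.count == 4, digits.allSatisfy(\.isHexDigit),
          let v = UInt32(digits, radix: 16), let scalar = Unicode.Scalar(v) else { return nil }
    return Character(scalar)
}

print(decodeHex("41") ?? "?")  // A
```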
Groups and Lookaround
| Pattern | Description | Supported |
|---------|-------------|-----------|
| `(ABC)` | Capture group | [ ] |
| `(<name>ABC)` | Named capture group | [ ] |
| `\1` | Back reference | [ ] |
| `\'name'` | Named back reference | [ ] |
| `(?:ABC)` | Non-capturing group | [ ] |
| `(?=ABC)` | Positive lookahead | [ ] |
| `(?!ABC)` | Negative lookahead | [ ] |
| `(?<=ABC)` | Positive lookbehind | [ ] |
| `(?<!ABC)` | Negative lookbehind | [ ] |
Greedy Quantifiers
| Pattern | Description | Supported |
|---------|-------------|-----------|
| `+` | One or more | [ ] |
| `*` | Zero or more | [ ] |
| `?` | Optional (zero or one) | [ ] |
| `{n}` | Exactly n | [ ] |
| `{,}` | Same as `*` | [ ] |
| `{,n}` | n or fewer | [ ] |
| `{n,}` | n or more | [ ] |
| `{n,m}` | n to m | [ ] |
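Every quantifier in this table amounts to a minimum and an optional maximum repetition count. A sketch of that view, with hypothetical names (`Bounds`, `parseBraces`) that are not part of this library: it parses the text between `{` and `}` and expresses `+`, `*` and `?` as the same kind of bounds.

```swift
// Hypothetical representation: a repetition range with an optional upper bound.
struct Bounds: Equatable {
    let min: Int
    let max: Int?  // nil means unbounded
}

// Parses the text between "{" and "}" for {n}, {,n}, {n,}, {n,m} and {,}.
// A missing lower bound means 0 and a missing upper bound means unbounded.
func parseBraces(_ body: String) -> Bounds? {
    let parts = body.split(separator: ",", omittingEmptySubsequences: false)
    switch parts.count {
    case 1:  // {n}: exactly n
        guard let n = Int(parts[0]), n >= 0 else { return nil }
        return Bounds(min: n, max: n)
    case 2:
        let lower: Int? = parts[0].isEmpty ? 0 : Int(parts[0])
        let upper: Int? = parts[1].isEmpty ? nil : Int(parts[1])
        // Reject non-numeric text on either side.
        guard let low = lower, low >= 0, parts[1].isEmpty || upper != nil else { return nil }
        // Reject reversed ranges such as {5,2}.
        if let high = upper, high < low { return nil }
        return Bounds(min: low, max: upper)
    default:
        return nil
    }
}

// The single-character quantifiers as bounds.
let plus = Bounds(min: 1, max: nil)        // +
let star = Bounds(min: 0, max: nil)        // *, same as {,}
let questionMark = Bounds(min: 0, max: 1)  // ?

print(parseBraces(",") == star)  // true
```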
Lazy Quantifiers
| Pattern | Description | Supported |
|---------|-------------|-----------|
| `+?` | One or more | [ ] |
| `*?` | Zero or more | [ ] |
| `??` | Optional (zero or one) | [ ] |
| `{n}?` | Exactly n | [ ] |
| `{,n}?` | n or fewer | [ ] |
| `{n,}?` | n or more | [ ] |
| `{n,m}?` | n to m | [ ] |
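Lazy quantifiers accept the same repetition counts as their greedy counterparts. Only the preferred match differs: greedy keeps the longest match that still lets the rest of the pattern succeed, lazy the shortest. The sketch below hard-codes the patterns `<.*>` and `<.*?>` instead of going through this library, and it only considers matches starting at the first `<`. `angleMatch` is a hypothetical name.

```swift
// Hypothetical illustration (not this library's API): the match for "<.*>" (greedy)
// or "<.*?>" (lazy), considering only matches that start at the first "<".
func angleMatch(in text: String, lazy: Bool) -> String? {
    let chars = Array(text)
    guard let start = chars.firstIndex(of: "<") else { return nil }
    // Every index where a closing ">" could end the match.
    var ends: [Int] = []
    var i = start + 1
    // "." is [^\n\r]; the combined "\r\n" Character is excluded as well.
    while i < chars.count, chars[i] != "\n", chars[i] != "\r", chars[i] != "\r\n" {
        if chars[i] == ">" { ends.append(i) }
        i += 1
    }
    // Greedy prefers the longest match, lazy the shortest.
    guard let end = lazy ? ends.first : ends.last else { return nil }
    return String(chars[start...end])
}

print(angleMatch(in: "<b>bold</b>", lazy: false) ?? "no match")  // <b>bold</b>
print(angleMatch(in: "<b>bold</b>", lazy: true) ?? "no match")   // <b>
```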
Alternation
| Pattern | Description | Supported |
|---------|-------------|-----------|
| `\|` | Everything before or everything after | [ ] |
Flags
| Pattern | Description | Supported |
|---------|-------------|-----------|
| `i` | Case insensitive | [ ] |
| `g` | Global | [ ] |
| `m` | Multiline | [ ] |
Inner Workings
(Similar to before)
- Lexer (String input to Tokens)
- Parser (Tokens to NFA)
- Compiler (NFA to DFA)
- Optimizer (simplifies the DFA for better performance, e.g. `char(a), char(b)` -> `string(ab)`)
- Engine (matches an input String using the DFA)
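The Compiler step (NFA to DFA) is classically done with the powerset construction listed under Resources: each DFA state stands for the set of NFA states the automaton could be in at once. A minimal sketch, assuming a hypothetical NFA encoding (integer states, `Character` edge labels, `nil` for epsilon edges); it illustrates the technique and is not this library's implementation.

```swift
// Hypothetical NFA encoding: integer states, edges labelled with a Character,
// or nil for an epsilon edge.
typealias Edge = (label: Character?, to: Int)

struct NFA {
    var edges: [Int: [Edge]]
    var start: Int
    var accepting: Set<Int>
}

struct DFA {
    var transitions: [Int: [Character: Int]] = [:]  // state 0 is the start state
    var accepting: Set<Int> = []
}

// All states reachable from `states` through epsilon edges.
func epsilonClosure(_ states: Set<Int>, in nfa: NFA) -> Set<Int> {
    var closure = states
    var stack = Array(states)
    while let s = stack.popLast() {
        for edge in nfa.edges[s, default: []] where edge.label == nil {
            if closure.insert(edge.to).inserted { stack.append(edge.to) }
        }
    }
    return closure
}

// Powerset construction: explore each reachable set of NFA states once.
func determinize(_ nfa: NFA) -> DFA {
    var dfa = DFA()
    let startSet = epsilonClosure([nfa.start], in: nfa)
    var ids: [Set<Int>: Int] = [startSet: 0]
    var queue = [startSet]
    while !queue.isEmpty {
        let current = queue.removeFirst()
        let id = ids[current]!
        if !current.isDisjoint(with: nfa.accepting) { dfa.accepting.insert(id) }
        // Group the labelled moves of every NFA state in the set by character.
        var moves: [Character: Set<Int>] = [:]
        for s in current {
            for edge in nfa.edges[s, default: []] {
                if let c = edge.label { moves[c, default: []].insert(edge.to) }
            }
        }
        for (c, targets) in moves {
            let next = epsilonClosure(targets, in: nfa)
            if ids[next] == nil {
                ids[next] = ids.count  // assign the next unused DFA state number
                queue.append(next)
            }
            dfa.transitions[id, default: [:]][c] = ids[next]!
        }
    }
    return dfa
}

// Runs the DFA over the whole input: one lookup per Character, no backtracking.
func accepts(_ dfa: DFA, _ input: String) -> Bool {
    var state = 0
    for c in input {
        guard let next = dfa.transitions[state]?[c] else { return false }
        state = next
    }
    return dfa.accepting.contains(state)
}

// NFA for a(b|c): 0 -a-> 1, epsilon 1 -> 2 and 1 -> 3, 2 -b-> 4, 3 -c-> 4.
let nfa = NFA(edges: [0: [(label: "a", to: 1)],
                      1: [(label: nil, to: 2), (label: nil, to: 3)],
                      2: [(label: "b", to: 4)],
                      3: [(label: "c", to: 4)]],
              start: 0, accepting: [4])
let dfa = determinize(nfa)
print(accepts(dfa, "ab"), accepts(dfa, "ac"), accepts(dfa, "abc"))  // true true false
```

In the result, DFA state 1 stands for the NFA set {1, 2, 3}, so both branches of `(b|c)` are followed at once.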
Note
Swift treats `\r\n` as a single `Character`. To get `\r` and `\n` as two separate characters, use `\n\r`.
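This follows from Swift counting extended grapheme clusters: `\r\n` forms one cluster while `\n\r` does not, so code that walks a `String` one `Character` at a time sees one element for the first and two for the second. A quick check using only the standard library:

```swift
// "\r\n" is a single extended grapheme cluster, so Swift sees one Character.
let crlf = "\r\n"
let lfcr = "\n\r"
print(crlf.count)                 // 1
print(lfcr.count)                 // 2
// The two Unicode scalars are still there underneath.
print(crlf.unicodeScalars.count)  // 2
```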
Resources
- regexr.com - Regex testing
- swtch.com - Implementing Regular Expressions
- Powerset construction - NFA to DFA
- Minimization
Package Metadata
Repository: davidskrundz/regex
Default branch: master
README: README.md