jaredhowland/html-to-markdown-swift
A robust, fully featured Swift port of [html-to-markdown](https://github.com/JohannesKaufmann/html-to-markdown) — convert HTML (even entire websites) into clean, readable Markdown.
Features
- ✅ Handles deeply nested and malformed HTML
- ✅ Full CommonMark support
- ✅ GitHub Flavored Markdown (GFM) — tables, task lists, strikethrough
- ✅ Extensible plugin system — add custom renderers, pre/post processors, and text transformers
- ✅ Domain resolution — relative links become absolute URLs
- ✅ CSS selector–based include/exclude filtering
- ✅ Smart escaping (only escapes when necessary)
- ✅ Thread-safe converter instances
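To make the domain resolution feature above concrete, here is a minimal sketch of how a relative href can be resolved against a base domain using Foundation's URL type. The helper name `resolve(_:against:)` is hypothetical and not part of this package; the library's own resolution logic is not shown in this README and may differ.

```swift
import Foundation

/// Hypothetical helper (not part of HTMLToMarkdown): resolves `href`
/// against `domain`, returning `href` unchanged if either fails to parse.
func resolve(_ href: String, against domain: String) -> String {
    guard let base = URL(string: domain),
          let resolved = URL(string: href, relativeTo: base) else {
        return href
    }
    // absoluteString combines the base with the relative part.
    return resolved.absoluteString
}

print(resolve("/about", against: "https://example.com"))
print(resolve("https://other.example/page", against: "https://example.com"))
```

A relative path picks up the base's scheme and host, while an href that is already absolute passes through unchanged.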
Usage
Swift Package Manager
Add to your Package.swift:
dependencies: [
    .package(url: "https://github.com/jaredhowland/html-to-markdown-swift.git", from: "0.9.0")
]
Add to your target:
.product(name: "HTMLToMarkdown", package: "html-to-markdown-swift")
Basic Conversion
import HTMLToMarkdown
let html = "<strong>Bold</strong> and <em>italic</em>"
let markdown = try HTMLToMarkdown.convert(html)
// **Bold** and _italic_
With Domain
Convert relative links to absolute URLs:
let html = "<a href=\"/about\">About</a>"
let markdown = try HTMLToMarkdown.convert(html, options: [.domain("https://example.com")])
// [About](https://example.com/about)
With Plugins
let markdown = try HTMLToMarkdown.convert(html, plugins: [
    BasePlugin(),
    CommonmarkPlugin(),
    GFMPlugin()
])
Collapse & Tag Types
Each HTML element has a tag type — block, inline, or remove. This controls how whitespace and newlines are handled around elements. You can override the type for any tag:
// conv is a Converter, for example the one passed to a plugin's initialize(conv:)
// Treat <div> as inline instead of block
conv.Register.tagType("div", .inline, priority: PriorityEarly)
// Remove an element from output
conv.Register.tagType("nav", .remove, priority: PriorityStandard)
Plugins
| Name | Description |
|------|-------------|
| BasePlugin | Core functionality: default tag types, removes <script>, <style>, <input> |
| CommonmarkPlugin | CommonMark spec: headings, bold, italic, links, images, code, lists, blockquotes, etc. |
| GFMPlugin | GitHub Flavored Markdown: bundles Strikethrough, Table, TaskListItems + definition lists, details/summary, sub/sup, abbreviations |
| TaskListItemsPlugin | Converts <input type="checkbox"> in list items to - [x] / - [ ] |
| StrikethroughPlugin | Converts <strike>, <s>, <del> to ~~text~~ |
| TablePlugin | Converts HTML tables to GFM-style pipe tables |
| VimeoEmbedPlugin | Converts Vimeo <iframe> embeds to Title links |
| YouTubeEmbedPlugin | Converts YouTube <iframe> embeds to clickable thumbnail images |
| AtlassianPlugin | Atlassian/Confluence: autolinks, image sizing, Confluence code macros, attachment links |
| MultiMarkdownPlugin | MultiMarkdown 4: sub/sup, definition lists, image attributes, figure/figcaption, footnotes |
| MarkdownExtraPlugin | PHP Markdown Extra: definition lists, footnotes, header IDs {#id}, abbreviation reference list |
| PandocPlugin | Pandoc Markdown: LaTeX math ($...$, $$...$$), definition lists, footnotes, sub/sup ^x^/~x~, header IDs |
| RMarkdownPlugin | R Markdown (extends Pandoc): tabsets → ## sections, figure captions from <figcaption> |
| FrontmatterPlugin | Extracts page metadata (<title>, <meta>) and prepends YAML frontmatter |
| TypographyPlugin | Bundles SmartQuotesPlugin, ReplacementsPlugin, LinkifyPlugin; configure with smartQuotes/replacements/linkify flags and quoteStyle (.english, .german, .french, .swedish) |
| SmartQuotesPlugin | Converts straight " and ' to typographic quotes; locale-aware styles; skips code regions; handles <q> elements |
| ReplacementsPlugin | (c)→©, (r)→®, (tm)→™, +-→±, ...→…, ---→—, --→–; skips code regions |
| LinkifyPlugin | Converts bare https:// or http:// URLs to url links; handles parentheses in URLs; skips code regions and existing Markdown links |
| ReferenceLinkPlugin | Numbered reference-style links at document bottom (deduplication, titles); inlineLinks: true to revert to inline |
| EmojiPlugin | GitHub emoji :shortcode: output from <img class="emoji"> and Unicode emoji conversion; bundled 1900+ entry table |
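The substitutions listed for ReplacementsPlugin depend on ordering: `---` has to be handled before `--`, or the em dash would never be produced. The sketch below illustrates that idea with plain string replacement. It is not the plugin's implementation: the function name `applyReplacements` is hypothetical, and unlike the plugin it does not skip code regions.

```swift
import Foundation

// Ordered pairs rather than a dictionary, so longer patterns are
// replaced before their prefixes ("---" before "--").
let replacements: [(String, String)] = [
    ("(c)", "©"), ("(r)", "®"), ("(tm)", "™"),
    ("+-", "±"), ("...", "…"), ("---", "—"), ("--", "–"),
]

/// Hypothetical helper: applies each substitution in list order.
/// Assumption: case-sensitive matching; code regions are NOT skipped.
func applyReplacements(_ text: String) -> String {
    var result = text
    for (pattern, symbol) in replacements {
        result = result.replacingOccurrences(of: pattern, with: symbol)
    }
    return result
}

print(applyReplacements("Acme(tm) -- est. 1999 --- (c) Acme..."))
// Acme™ – est. 1999 — © Acme…
```

Swapping the last two entries would turn `---` into an en dash followed by a hyphen, which is why an ordered list is used here.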
Writing a Plugin
Implement the Plugin protocol:
import HTMLToMarkdown
public class MyPlugin: Plugin {
    public var name: String { return "my-plugin" }
    public init() {}

    public func initialize(conv: Converter) throws {
        // Render <aside> as a blockquote
        conv.Register.rendererFor("aside", .block, { ctx, w, node in
            w.writeString("> ")
            ctx.renderChildNodes(w, node)
            return .success
        }, priority: PriorityStandard)

        // Pre-process the DOM before rendering
        conv.Register.preRenderer({ ctx, doc in
            // Modify the SwiftSoup document
        }, priority: PriorityEarly)

        // Post-process the final markdown string
        conv.Register.postRenderer({ ctx, result in
            return result.trimmingCharacters(in: .whitespacesAndNewlines)
        }, priority: PriorityStandard)

        // Bundle another plugin as a dependency
        try conv.Register.plugin(CommonmarkPlugin())
    }
}
Available registration methods:
| Method | Purpose |
|--------|---------|
| rendererFor(tagName: String, tagType: TagType, fn: @escaping HandleRenderFunc, priority: Int) | Render a specific HTML tag |
| renderer(fn: @escaping HandleRenderFunc, priority: Int) | Catch-all renderer for all tags |
| preRenderer(fn: @escaping HandlePreRenderFunc, priority: Int) | Transform DOM before rendering |
| postRenderer(fn: @escaping HandlePostRenderFunc, priority: Int) | Transform final markdown string |
| textTransformer(fn: @escaping HandleTextTransformFunc, priority: Int) | Transform text node content |
| escapedChar(chars: Character...) | Mark a character as needing escaping |
| unEscaper(fn: @escaping HandleUnEscapeFunc, priority: Int) | Control when a character is unescaped |
| tagType(tagName: String, type: TagType, priority: Int) | Override block/inline/remove classification |
| plugin(p: Plugin) throws | Register a sub-plugin dependency |
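Most of these methods take a priority. As a rough mental model only, the toy registry below runs handlers in ascending priority order and falls back to registration order on ties. Everything here is hypothetical: `ToyRegistry`, `Handler`, and the numeric values are invented for illustration, and the README does not state how the library orders handlers or what values PriorityEarly and PriorityStandard hold.

```swift
// Hypothetical toy model, not the library's Register type.
struct Handler {
    let name: String
    let priority: Int
    let transform: (String) -> String
}

struct ToyRegistry {
    private(set) var handlers: [Handler] = []

    mutating func register(_ name: String, priority: Int,
                           _ transform: @escaping (String) -> String) {
        handlers.append(Handler(name: name, priority: priority, transform: transform))
    }

    /// Assumption: lower priority values run first; ties keep registration order.
    func run(_ input: String) -> String {
        let ordered = handlers.enumerated().sorted { a, b in
            a.element.priority != b.element.priority
                ? a.element.priority < b.element.priority
                : a.offset < b.offset
        }
        return ordered.reduce(input) { text, entry in entry.element.transform(text) }
    }
}

var registry = ToyRegistry()
// Registered first, but given the larger (later) priority value.
registry.register("exclaim", priority: 500) { $0 + "!" }
// Registered second, but given the smaller (earlier) priority value.
registry.register("wrap", priority: 100) { "[" + $0 + "]" }

print(registry.run("hi")) // [hi]!
```

The output shows that the priority value, not the registration order, decides which handler sees the text first in this model.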
Examples
These examples were generated using html-to-markdown-swift 0.9.0. The Examples/ directory contains complete, runnable sample code and its output:
- 01 - Basic Conversion
- 02 - Vita with Frontmatter
- 03 - Wikipedia Article
- 04 - Exclude Navigation
- 05 - Custom Plugin
- 06 - GFM Features
- 07 - Atlassian Markdown
- 08 - MultiMarkdown
- 09 - YouTube & Vimeo Embeds
- 10 - Atlassian Confluence
- 11 - Markdown Extra
- 12 - Pandoc
- 13 - R Markdown
- 14 - Typography
- 15 - Reference Links
- 16 - Emoji
FAQ
Can I extend the converter with custom rules? Yes — implement the Plugin protocol and register renderers, pre/post processors, or text transformers in initialize(conv:).
Is the output safe to display in a browser? This library converts HTML to Markdown — it does not sanitize HTML. If you need XSS protection, sanitize the input HTML before conversion or the output Markdown before rendering.
Is it thread-safe? Yes. Each Converter instance is protected by an internal lock and safe for concurrent use from multiple threads.
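For readers unfamiliar with the pattern, the following sketch shows what "protected by an internal lock" typically looks like in Swift. `ToyConverter` is a hypothetical stand-in, not the library's Converter, and its "conversion" is a trivial string replacement; only the locking is the point.

```swift
import Foundation
import Dispatch

// Hypothetical stand-in; the real Converter's internals are not shown in this README.
final class ToyConverter: @unchecked Sendable {
    private let lock = NSLock()
    private var conversions = 0

    func convert(_ html: String) -> String {
        lock.lock()
        defer { lock.unlock() }   // released on every exit path
        conversions += 1          // shared state mutated only under the lock
        return html.replacingOccurrences(of: "<br>", with: "\n")
    }

    var conversionCount: Int {
        lock.lock()
        defer { lock.unlock() }
        return conversions
    }
}

/// Calls one shared converter from many threads and reports the call count.
func runConcurrently(iterations: Int) -> Int {
    let converter = ToyConverter()
    DispatchQueue.concurrentPerform(iterations: iterations) { index in
        _ = converter.convert("line \(index)<br>")
    }
    return converter.conversionCount
}

print(runConcurrently(iterations: 100))
```

Because every access to the counter happens under the lock, the printed count equals the number of iterations; without the lock, concurrent increments could be lost.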
Why does my [ get escaped as \[? The converter automatically escapes characters that would trigger unintended Markdown formatting. If you're writing a custom renderer, use w.writeString(...) directly (bypasses text transformation) instead of writing to a child context.
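As an illustration of why context matters, the sketch below escapes `[` only where the text would otherwise parse as an inline link, that is, `[label](`. It is a simplified stand-in: the function name `escapeAccidentalLinks` is hypothetical, and the library's actual escaping rules are not described in this README and are likely more thorough.

```swift
import Foundation

/// Hypothetical helper: adds a backslash before "[" only when the
/// bracketed text is immediately followed by "(", as in "[label](".
func escapeAccidentalLinks(_ text: String) -> String {
    return text.replacingOccurrences(
        of: #"\[([^\]]*)\]\("#,   // "[", label without "]", then "]("
        with: #"\\[$1]("#,        // literal backslash, then the original text
        options: .regularExpression
    )
}

print(escapeAccidentalLinks("see [1](note) for details"))
print(escapeAccidentalLinks("array[0] = 1"))
```

The first input gains a backslash before the bracket because it would otherwise render as a link; the second is left alone because `[0]` cannot be mistaken for one.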
How do I run the tests?
swift test
Many tests use golden files in Tests/data/: an input HTML file paired with an expected Markdown output file. After an intentional output change, edit the corresponding .out.md files to match the new output.
How do I contribute? Issues and pull requests are welcome. Please ensure all tests pass (swift test) and add tests for new behaviour.
License
MIT License. This Swift port is based on html-to-markdown by Johannes Kaufmann. HTML parsing uses SwiftSoup.
Package Metadata
Repository: jaredhowland/html-to-markdown-swift
Default branch: main
README: README.md