jaredhowland/html-to-markdown-swift

A robust, fully featured Swift port of [html-to-markdown](https://github.com/JohannesKaufmann/html-to-markdown) — convert HTML (even entire websites) into clean, readable Markdown.

Features

  • ✅ Handles deeply nested and malformed HTML
  • ✅ Full CommonMark support
  • ✅ GitHub Flavored Markdown (GFM): tables, task lists, strikethrough
  • ✅ Extensible plugin system — add custom renderers, pre/post processors, and text transformers
  • ✅ Domain resolution — relative links become absolute URLs
  • ✅ CSS selector–based include/exclude filtering
  • ✅ Smart escaping (only escapes when necessary)
  • ✅ Thread-safe converter instances

Usage

Swift Package Manager

Add to your Package.swift:

dependencies: [
    .package(url: "https://github.com/jaredhowland/html-to-markdown-swift.git", from: "0.9.0")
]

Add to your target:

.product(name: "HTMLToMarkdown", package: "html-to-markdown-swift")

Basic Conversion

import HTMLToMarkdown

let html = "<strong>Bold</strong> and <em>italic</em>"
let markdown = try HTMLToMarkdown.convert(html)
// **Bold** and _italic_

With Domain

Convert relative links to absolute URLs:

let html = "<a href=\"/about\">About</a>"
let markdown = try HTMLToMarkdown.convert(html, options: [.domain("https://example.com")])
// [About](https://example.com/about)
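
The resolution step itself can be illustrated with Foundation alone. The sketch below is not the library's implementation; `resolve(_:against:)` is a hypothetical helper that shows how a relative `href` combines with a base domain, and how an already absolute link passes through unchanged.

```swift
import Foundation

// Hypothetical helper (not part of HTMLToMarkdown): resolves an href against a base domain.
func resolve(_ href: String, against domain: String) -> String {
    guard let base = URL(string: domain),
          let resolved = URL(string: href, relativeTo: base) else {
        return href  // leave unparseable links untouched
    }
    return resolved.absoluteString
}

print(resolve("/about", against: "https://example.com"))
// https://example.com/about
print(resolve("https://other.org/x", against: "https://example.com"))
// https://other.org/x
```

Falling back to the original `href` when parsing fails keeps the sketch lossless; whether the library does the same is not stated here.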

With Plugins

let markdown = try HTMLToMarkdown.convert(html, plugins: [
    BasePlugin(),
    CommonmarkPlugin(),
    GFMPlugin()
])

Collapse & Tag Types

Each HTML element has a tag type: block, inline, or remove. The tag type controls how whitespace and newlines are handled around the element. You can override the type for any tag:

// `conv` is a Converter, such as the one passed to a plugin's initialize(conv:)
// Treat <div> as inline instead of block
conv.Register.tagType("div", .inline, priority: PriorityEarly)

// Remove an element from output
conv.Register.tagType("nav", .remove, priority: PriorityStandard)
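
To make the effect of the three tag types concrete, here is a minimal, self-contained sketch. It does not use the library: `TagKind`, `wrap`, and `render` are hypothetical names, and the collapsing rule (at most one blank line between blocks) is an assumption about the general idea rather than the converter's exact behaviour.

```swift
import Foundation

// Hypothetical stand-in for the library's tag types; case names mirror the text above.
enum TagKind { case block, inline, remove }

// Wraps already-rendered content according to its tag kind.
func wrap(_ content: String, as kind: TagKind) -> String {
    switch kind {
    case .block:  return "\n\n" + content + "\n\n"  // blocks get blank lines around them
    case .inline: return content                     // inline content flows with its neighbours
    case .remove: return ""                          // removed elements produce no output
    }
}

// Joins the pieces, then collapses runs of three or more newlines to a single blank line.
func render(_ pieces: [(String, TagKind)]) -> String {
    var out = pieces.map { wrap($0.0, as: $0.1) }.joined()
    while out.contains("\n\n\n") {
        out = out.replacingOccurrences(of: "\n\n\n", with: "\n\n")
    }
    return out.trimmingCharacters(in: .whitespacesAndNewlines)
}

let asBlocks = render([("First", .block), ("menu", .remove), ("Second", .block)])
let asInline = render([("First", .inline), ("menu", .remove), ("Second", .inline)])
print(asBlocks)  // "First", a blank line, then "Second"
print(asInline)  // FirstSecond
```

Treating the same content as inline instead of block removes the separating blank line, which is the practical difference an override makes.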

Plugins

| Name | Description |
|------|-------------|
| BasePlugin | Core functionality: default tag types, removes <script>, <style>, <input> |
| CommonmarkPlugin | CommonMark spec: headings, bold, italic, links, images, code, lists, blockquotes, etc. |
| GFMPlugin | GitHub Flavored Markdown: bundles Strikethrough, Table, TaskListItems + definition lists, details/summary, sub/sup, abbreviations |
| TaskListItemsPlugin | Converts <input type="checkbox"> in list items to - [x] / - [ ] |
| StrikethroughPlugin | Converts <strike>, <s>, <del> to ~~text~~ |
| TablePlugin | Converts HTML tables to GFM-style pipe tables |
| VimeoEmbedPlugin | Converts Vimeo <iframe> embeds to Title links |
| YouTubeEmbedPlugin | Converts YouTube <iframe> embeds to clickable thumbnail images |
| AtlassianPlugin | Atlassian/Confluence: autolinks, image sizing, Confluence code macros, attachment links |
| MultiMarkdownPlugin | MultiMarkdown 4: sub/sup, definition lists, image attributes, figure/figcaption, footnotes |
| MarkdownExtraPlugin | PHP Markdown Extra: definition lists, footnotes, header IDs {#id}, abbreviation reference list |
| PandocPlugin | Pandoc Markdown: LaTeX math ($...$, $$...$$), definition lists, footnotes, sub/sup ^x^/~x~, header IDs |
| RMarkdownPlugin | R Markdown (extends Pandoc): tabsets → ## sections, figure captions from <figcaption> |
| FrontmatterPlugin | Extracts page metadata (<title>, <meta>) and prepends YAML frontmatter |
| TypographyPlugin | Bundles SmartQuotesPlugin, ReplacementsPlugin, LinkifyPlugin; configure with smartQuotes/replacements/linkify flags and quoteStyle (.english, .german, .french, .swedish) |
| SmartQuotesPlugin | Converts straight " and ' to typographic quotes; locale-aware styles; skips code regions; handles <q> elements |
| ReplacementsPlugin | (c) → ©, (r) → ®, (tm) → ™, +- → ±, plus replacements for ..., ---, and --; skips code regions |
| LinkifyPlugin | Converts bare https:// and http:// URLs to links; handles parentheses in URLs; skips code regions and existing Markdown links |
| ReferenceLinkPlugin | Numbered reference-style links at document bottom (deduplication, titles); inlineLinks: true to revert to inline |
| EmojiPlugin | GitHub emoji :shortcode: output from <img class="emoji"> and Unicode emoji conversion; bundled 1900+ entry table |

Writing a Plugin

Implement the Plugin protocol:

import HTMLToMarkdown

public class MyPlugin: Plugin {
    public var name: String { return "my-plugin" }
    public init() {}

    public func initialize(conv: Converter) throws {
        // Render <aside> as a blockquote
        conv.Register.rendererFor("aside", .block, { ctx, w, node in
            w.writeString("> ")
            ctx.renderChildNodes(w, node)
            return .success
        }, priority: PriorityStandard)

        // Pre-process the DOM before rendering
        conv.Register.preRenderer({ ctx, doc in
            // Modify the SwiftSoup document
        }, priority: PriorityEarly)

        // Post-process the final markdown string
        conv.Register.postRenderer({ ctx, result in
            return result.trimmingCharacters(in: .whitespacesAndNewlines)
        }, priority: PriorityStandard)

        // Bundle another plugin as a dependency
        try conv.Register.plugin(CommonmarkPlugin())
    }
}

Available registration methods:

| Method | Purpose |
|--------|---------|
| rendererFor(_ tagName: String, _ tagType: TagType, _ fn: @escaping HandleRenderFunc, priority: Int) | Render a specific HTML tag |
| renderer(_ fn: @escaping HandleRenderFunc, priority: Int) | Catch-all renderer for all tags |
| preRenderer(_ fn: @escaping HandlePreRenderFunc, priority: Int) | Transform DOM before rendering |
| postRenderer(_ fn: @escaping HandlePostRenderFunc, priority: Int) | Transform final markdown string |
| textTransformer(_ fn: @escaping HandleTextTransformFunc, priority: Int) | Transform text node content |
| escapedChar(_ chars: Character...) | Mark a character as needing escaping |
| unEscaper(_ fn: @escaping HandleUnEscapeFunc, priority: Int) | Control when a character is unescaped |
| tagType(_ tagName: String, _ type: TagType, priority: Int) | Override block/inline/remove classification |
| plugin(_ p: Plugin) throws | Register a sub-plugin dependency |
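
Each registration above takes a priority, so it may help to see how a priority-ordered handler chain can work in principle. The sketch below is self-contained and does not reflect the library's internals. `RendererChain`, the `.tryNext` status, the numeric priority values, and the rule that lower numbers run first are all assumptions made for illustration; only `.success`, `PriorityEarly`, and `PriorityStandard` appear in this README.

```swift
// Hypothetical result type: .success appears in the README, .tryNext is assumed.
enum RenderStatus { case success, tryNext }

typealias Handler = (String) -> (RenderStatus, String)

// Hypothetical priority values; assumed here that lower numbers run first.
let priorityEarly = 100
let priorityStandard = 500

final class RendererChain {
    private var handlers: [(priority: Int, handler: Handler)] = []

    func register(priority: Int, _ handler: @escaping Handler) {
        handlers.append((priority: priority, handler: handler))
        handlers.sort { $0.priority < $1.priority }  // keep the chain ordered by priority
    }

    // Tries handlers in priority order and returns the first successful output.
    func render(_ tag: String) -> String? {
        for entry in handlers {
            let (status, output) = entry.handler(tag)
            if status == .success { return output }
        }
        return nil
    }
}

let chain = RendererChain()
chain.register(priority: priorityStandard) { tag in
    return (.success, "[default \(tag)]")
}
chain.register(priority: priorityEarly) { tag in
    if tag == "aside" { return (.success, "> aside") }
    return (.tryNext, "")  // decline, so a later handler gets a chance
}

print(chain.render("aside") ?? "nil")  // > aside
print(chain.render("div") ?? "nil")    // [default div]
```

Under these assumptions, an early handler can claim specific tags and decline the rest, leaving a standard-priority handler as the fallback.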

Examples

These examples were generated using html-to-markdown-swift 0.9.0. See the folders under Examples/ for runnable sample code and their output.

FAQ

Can I extend the converter with custom rules?

Yes. Implement the Plugin protocol and register renderers, pre/post processors, or text transformers in initialize(conv:).

Is the output safe to display in a browser?

This library converts HTML to Markdown; it does not sanitize HTML. If you need XSS protection, sanitize the input HTML before conversion or the output Markdown before rendering.

Is it thread-safe?

Yes. Each Converter instance is protected by an internal lock and is safe for concurrent use from multiple threads.
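
As a rough illustration of what lock-protected concurrent use looks like, the sketch below shares one instance across many threads. `LockedConverter` is a hypothetical stand-in rather than the library's Converter, and its trivial `convert` body exists only so the example runs without the package.

```swift
import Foundation
import Dispatch

// Hypothetical stand-in for a converter whose mutable state is guarded by a lock.
final class LockedConverter: @unchecked Sendable {
    private let lock = NSLock()
    private var conversions = 0

    func convert(_ html: String) -> String {
        lock.lock()
        defer { lock.unlock() }
        conversions += 1  // shared state is only touched while holding the lock
        return html.replacingOccurrences(of: "<br>", with: "\n")
    }

    var count: Int {
        lock.lock()
        defer { lock.unlock() }
        return conversions
    }
}

// Calls convert from many threads at once on a single shared instance.
func runConcurrently(iterations: Int) -> Int {
    let converter = LockedConverter()
    DispatchQueue.concurrentPerform(iterations: iterations) { i in
        _ = converter.convert("line \(i)<br>")
    }
    return converter.count
}

print(runConcurrently(iterations: 100))  // 100
```

Because every access to the counter happens under the lock, no increments are lost even though the calls overlap.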

Why does my [ get escaped as \[?

The converter automatically escapes characters that would trigger unintended Markdown formatting. If you're writing a custom renderer, use w.writeString(...) directly, which bypasses text transformation, instead of writing to a child context.

How do I run the tests?

swift test

Many tests use golden files in Tests/data/: each case pairs an input HTML file with an expected Markdown output file. After an intentional output change, edit the corresponding .out.md files to match the new output.
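
For readers new to golden-file testing, the comparison step can be sketched as follows. This is not the project's test code: `firstMismatch(actual:goldenURL:)` is a hypothetical helper, and the file name `bold.out.md` is made up for the example (only the .out.md suffix comes from the text above).

```swift
import Foundation

// Hypothetical helper: compares converter output with a golden file.
// Returns nil on a match, otherwise a description of the first differing line.
func firstMismatch(actual: String, goldenURL: URL) throws -> String? {
    let expected = try String(contentsOf: goldenURL, encoding: .utf8)
    if expected == actual { return nil }

    let expectedLines = expected.components(separatedBy: "\n")
    let actualLines = actual.components(separatedBy: "\n")
    for i in 0..<max(expectedLines.count, actualLines.count) {
        let e = i < expectedLines.count ? expectedLines[i] : "<missing>"
        let a = i < actualLines.count ? actualLines[i] : "<missing>"
        if e != a {
            return "line \(i + 1): expected \"\(e)\", got \"\(a)\""
        }
    }
    return "contents differ"
}

// Usage with a temporary golden file (the file name is illustrative only).
let goldenURL = FileManager.default.temporaryDirectory
    .appendingPathComponent("bold.out.md")
do {
    try "**Bold**\n".write(to: goldenURL, atomically: true, encoding: .utf8)
    let report = try firstMismatch(actual: "__Bold__\n", goldenURL: goldenURL)
    print(report ?? "match")
} catch {
    print("golden check failed: \(error)")
}
```

Reporting the first differing line keeps failures readable when the expected output is long.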

How do I contribute?

Issues and pull requests are welcome. Please ensure all tests pass (swift test) and add tests for new behaviour.

License

MIT License. This Swift port is based on html-to-markdown by Johannes Kaufmann. HTML parsing uses SwiftSoup.

Package Metadata

Repository: jaredhowland/html-to-markdown-swift

Default branch: main

README: README.md