1amageek/app-mcp

> **AI-Powered macOS Application Automation via Model Context Protocol**

✨ Features

🎯 Visual Intelligence

  • Smart Screenshots: Capture high-resolution app windows using ScreenCaptureKit
  • OCR Text Recognition: Extract text from screenshots using Apple's Vision Framework
  • UI Tree Analysis: Extract detailed accessibility hierarchies for precise element targeting
  • Multi-App Discovery: Identify and monitor multiple running applications simultaneously

🛠 Automation Toolkit

  • Precise Interactions: Mouse clicks, keyboard input, and gesture automation
  • Smart Waiting: Intelligent delays and condition-based waiting mechanisms
  • Error Recovery: Robust fallback strategies for reliable automation
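
Condition-based waiting usually amounts to polling a predicate until it holds or a timeout expires. The Python sketch below illustrates that idea only; `wait_until`, its parameters, and the polling interval are hypothetical and do not describe AppMCP's actual implementation.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool],
               timeout: float = 5.0,
               interval: float = 0.1) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds pass.

    Hypothetical helper: AppMCP's own waiting logic may differ.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Example: a stand-in condition that becomes true on the third poll.
calls = {"n": 0}


def window_ready() -> bool:
    calls["n"] += 1
    return calls["n"] >= 3


ready = wait_until(window_ready, timeout=2.0, interval=0.01)
```

In practice the condition would check something observable, such as whether a window or UI element has appeared.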

🔒 Privacy & Security

  • Permission Management: Seamless TCC (Transparency, Consent, and Control) integration
  • Secure Communication: JSON-RPC over STDIO with structured error handling
  • Bundle ID Validation: Verified application targeting for enhanced security
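
A JSON-RPC 2.0 response carries either a `result` member or an `error` object with `code` and `message` fields. A client might unwrap responses roughly as sketched below; `McpError` and `unwrap_response` are hypothetical names, and the sample payloads are made up for illustration rather than captured from AppMCP.

```python
import json


class McpError(Exception):
    """Raised when a JSON-RPC response contains an `error` member."""

    def __init__(self, code: int, message: str):
        super().__init__(f"JSON-RPC error {code}: {message}")
        self.code = code
        self.message = message


def unwrap_response(raw_line: str) -> dict:
    """Parse one JSON-RPC 2.0 response line and return its `result`."""
    response = json.loads(raw_line)
    if "error" in response:
        err = response["error"]
        raise McpError(err["code"], err["message"])
    return response["result"]


# Illustrative payloads (not real AppMCP output).
ok = unwrap_response('{"jsonrpc": "2.0", "id": 1, "result": {"contents": []}}')

try:
    unwrap_response(
        '{"jsonrpc": "2.0", "id": 2, '
        '"error": {"code": -32601, "message": "Method not found"}}'
    )
    failure = None
except McpError as exc:
    failure = exc
```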

🚀 Quick Start

Prerequisites

  • macOS 15.0+ (Sequoia or later)
  • Swift 6.1+
  • Xcode 16.0+

Installation

# Clone the repository
git clone https://github.com/1amageek/app-mcp.git
cd app-mcp

# Build the project
swift build -c release

# Run the daemon
./.build/release/appmcpd --stdio

Permissions Setup

AppMCP requires the following macOS permissions:

  1. 🔓 Accessibility: System Settings → Privacy & Security → Accessibility
  2. 📺 Screen Recording: System Settings → Privacy & Security → Screen Recording

The application will guide you through the permission setup process.


🎮 Usage Examples

Weather App Automation

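import json
import subprocess

# Start the AppMCP server with pipes so requests can be written to its stdin
# and responses read from its stdout (adjust the path to your build,
# e.g. ./.build/release/appmcpd)
process = subprocess.Popen(
    ['./appmcpd', '--stdio'],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)

def send_mcp_request(request):
    # Minimal helper (an assumption, not part of AppMCP): write one
    # newline-delimited JSON-RPC message and read one response line back
    process.stdin.write(json.dumps(request) + "\n")
    process.stdin.flush()
    return json.loads(process.stdout.readline())

# Take screenshot of Weather app
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "resources/read",
    "params": {"uri": "app://app_screenshot"}
}

# Send request and get response
response = send_mcp_request(request)
print(f"📸 Screenshot captured: {response['result']['contents'][0]['text']}")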
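
MCP servers generally expect an `initialize` request followed by a `notifications/initialized` notification before they serve calls such as `resources/read`. The sketch below builds those two messages as newline-delimited JSON, which is how the MCP stdio transport frames messages. The protocol version string, the client name, and the `build_handshake` helper are assumptions for illustration; check which version the server actually negotiates.

```python
import json


def build_handshake(client_name: str = "example-client",
                    protocol_version: str = "2024-11-05") -> list[str]:
    """Return the two handshake messages as newline-terminated JSON lines."""
    initialize = {
        "jsonrpc": "2.0",
        "id": 0,
        "method": "initialize",
        "params": {
            "protocolVersion": protocol_version,  # assumed version string
            "capabilities": {},
            "clientInfo": {"name": client_name, "version": "0.0.1"},
        },
    }
    # Notifications carry no "id" and receive no response.
    initialized = {"jsonrpc": "2.0", "method": "notifications/initialized"}
    return [json.dumps(initialize) + "\n", json.dumps(initialized) + "\n"]


handshake_lines = build_handshake()
```

A client would write the first line to the server's stdin, read one response line, then write the second line before sending further requests.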

UI Element Discovery

# Get accessibility tree
request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "resources/read",
    "params": {"uri": "app://app_accessibility_tree"}
}

tree = send_mcp_request(request)
print(f"🌳 UI Elements: {tree['result']['contents'][0]['text']}")
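
Once the tree is retrieved, a client typically searches it for the element it wants to act on. The sketch below assumes the tree arrives as JSON in which each node has `role`, `title`, and `children` keys. That shape is a guess for illustration, as are the helper names `iter_nodes` and `find_by_role`; inspect the real output before relying on it.

```python
import json
from typing import Iterator


def iter_nodes(node: dict) -> Iterator[dict]:
    """Yield `node` and all of its descendants, depth-first."""
    yield node
    for child in node.get("children", []):
        yield from iter_nodes(child)


def find_by_role(tree: dict, role: str) -> list[dict]:
    """Return every node whose `role` equals the given role."""
    return [n for n in iter_nodes(tree) if n.get("role") == role]


# Made-up sample tree in the assumed shape (not real AppMCP output).
sample_text = json.dumps({
    "role": "AXWindow", "title": "Weather",
    "children": [
        {"role": "AXTextField", "title": "Search", "children": []},
        {"role": "AXGroup", "title": "", "children": [
            {"role": "AXButton", "title": "Add", "children": []},
        ]},
    ],
})

buttons = find_by_role(json.loads(sample_text), "AXButton")
titles = [b["title"] for b in buttons]
```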

Automated Interactions

# Click on coordinates
request = {
    "jsonrpc": "2.0",
    "id": 3,
    "method": "tools/call",
    "params": {
        "name": "mouse_click",
        "arguments": {"x": 300, "y": 150}
    }
}

# Type text
request = {
    "jsonrpc": "2.0",
    "id": 4,
    "method": "tools/call",
    "params": {
        "name": "type_text",
        "arguments": {"text": "Tokyo"}
    }
}
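
Both requests above share the same envelope and differ only in the tool name and arguments, so a small builder can keep ids unique and the structure consistent. `ToolCallBuilder` is a hypothetical helper, not part of AppMCP; the tool names are the ones used in the examples above.

```python
import itertools
import json


class ToolCallBuilder:
    """Builds `tools/call` requests with monotonically increasing ids."""

    def __init__(self, first_id: int = 1):
        self._ids = itertools.count(first_id)

    def call(self, name: str, **arguments) -> dict:
        return {
            "jsonrpc": "2.0",
            "id": next(self._ids),
            "method": "tools/call",
            "params": {"name": name, "arguments": arguments},
        }


builder = ToolCallBuilder(first_id=3)
click = builder.call("mouse_click", x=300, y=150)
typing = builder.call("type_text", text="Tokyo")

# One newline-terminated line per message, ready to write to the server's stdin.
wire_line = json.dumps(click) + "\n"
```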

🏗 Architecture

graph TB
    A[🤖 AI Model] -->|JSON-RPC| B[📡 MCP Server]
    B --> C[🔍 Resources]
    B --> D[🛠 Tools]

    C --> E[📱 App Screenshot]
    C --> F[🌳 Accessibility Tree]
    C --> G[📋 Running Apps]

    D --> H[🖱 Mouse Control]
    D --> I[⌨️ Keyboard Input]
    D --> J[⏱ Wait Functions]

    E --> K[📱 macOS Apps]
    F --> K
    G --> K
    H --> K
    I --> K
    J --> K

Core Components

| Component | Description | Technology |
|-----------|-------------|------------|
| 🎯 AppSelector | Application discovery and targeting | AppKit, NSWorkspace |
| 📸 ScreenCaptureProvider | High-quality screenshot capture | ScreenCaptureKit (macOS 15+) |
| 🌳 AppAXTreeProvider | Accessibility tree extraction | Accessibility API |
| 🖱 MouseClickTool | Precise mouse automation | CGEvent, Quartz |
| ⌨️ KeyboardTool | Text input and shortcuts | CGEvent, Carbon |
| 🔒 TCCManager | Permission management | TCC Framework |


📦 Package Structure

AppMCP/
├── 📁 Sources/
│   └── 📁 AppMCP/
│       ├── 🎯 AppMCP.swift          # Core protocols & types
│       ├── 🖥 MCPServer.swift        # Main MCP server
│       ├── 📁 Resources/            # Data providers
│       ├── 📁 Tools/                # Automation tools
│       └── 📁 Permissions/          # Security management
├── 📁 Sources/appmcpd/
│   └── 🚀 Command.swift             # CLI daemon
├── 📁 Tests/
│   └── 📁 AppMCPTests/              # Comprehensive test suite
├── 📋 Package.swift                 # Swift Package configuration
└── 📖 CLAUDE.md                     # Development guidelines

🧪 Testing

Run All Tests

swift test

Test Categories

  • 🔧 Unit Tests: Core functionality validation
  • 🔗 Integration Tests: End-to-end workflow testing
  • ⚡ Performance Tests: Response time benchmarking
  • 🛡 Security Tests: Permission and validation checks

Example Test Results

Test Suite 'AppMCPTests' passed at 2025-06-04 16:42:04.049
    Executed 19 tests, with 0 failures (0 unexpected) in 0.015 seconds
✅ All tests passing

🛠 API Reference

MCP Tools

AppMCP provides the following specialized tools for macOS automation:

Screenshot & UI Analysis

  • capture_ui_snapshot: Capture screenshot with UI element hierarchy
      - Optional text recognition via Vision Framework
      - Element filtering with queries
      - Returns base64 screenshot + structured UI data
  • recognize_text_in_screenshot: 🆕 OCR text extraction from app windows
      - Multi-language support (en-US, ja-JP, zh-Hans, etc.)
      - Fast vs accurate recognition modes
      - Confidence scores and bounding boxes

Automation Controls

  • click_element: Element-based clicking with multi-button support
  • input_text: Text input with setValue/type methods
  • drag_drop: Drag and drop between elements
  • scroll_window: Scrolling at specific element locations

App Discovery

  • list_running_applications: Get all running apps with metadata
  • list_application_windows: List windows with bounds and visibility
Text Recognition Features

The Vision Framework integration provides powerful OCR capabilities:

{
  "bundleID": "com.apple.TextEdit",
  "includeTextRecognition": true,
  "recognitionLanguages": ["en-US", "ja-JP"],
  "recognitionLevel": "accurate"
}

Recognition Results:

  • Full text extraction in reading order
  • Individual text regions with confidence scores
  • Bounding boxes in normalized coordinates
  • Support for 50+ languages
  • Handwritten text detection
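
Normalized bounding boxes must be scaled to pixels before they can serve as click targets. Vision reports boxes in a 0 to 1 space with the origin at the bottom-left corner, so the y axis usually needs flipping for top-left screen coordinates. The sketch below assumes AppMCP passes those values through unchanged under `x`, `y`, `width`, and `height` keys; the key names, the pass-through, and both helper functions are assumptions for illustration.

```python
def to_pixel_rect(box: dict, image_width: int, image_height: int) -> dict:
    """Convert a normalized, bottom-left-origin box to top-left pixel coords."""
    width = box["width"] * image_width
    height = box["height"] * image_height
    left = box["x"] * image_width
    # Flip y: normalized y measures the box's bottom edge from the image bottom.
    top = (1.0 - box["y"] - box["height"]) * image_height
    return {"left": round(left), "top": round(top),
            "width": round(width), "height": round(height)}


def center_of(rect: dict) -> tuple[int, int]:
    """Return the integer center point of a pixel rectangle."""
    return (rect["left"] + rect["width"] // 2,
            rect["top"] + rect["height"] // 2)


# Made-up OCR region on a 1000x800 screenshot (not real AppMCP output).
region = {"x": 0.25, "y": 0.5, "width": 0.5, "height": 0.25}
rect = to_pixel_rect(region, 1000, 800)
click_point = center_of(rect)
```

The resulting center point could then be passed as the `x` and `y` arguments of a click request, provided the screenshot and the screen share the same scale.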

🎯 Roadmap

🌟 Current (v1.0.0)

  • [x] Weather app automation PoC
  • [x] Basic screenshot & UI tree extraction
  • [x] Mouse & keyboard automation
  • [x] Permission management
  • [x] Vision Framework OCR text recognition

🚀 Near Future (v0.2.0)

  • [ ] Multi-app simultaneous control
  • [ ] DevTools integration
  • [ ] Enhanced error recovery
  • [ ] Performance optimizations

🔮 Long Term (v1.0.0)

  • [ ] HTTP transport support
  • [ ] Shortcuts.app integration
  • [ ] Plugin SDK for extensions
  • [ ] Real-time UI streaming

🤝 Contributing

We welcome contributions! Please see our Contributing Guidelines for details.

Development Setup

# Install dependencies
swift package resolve

# Run tests
swift test

# Format code
swift-format -i -r Sources/ Tests/

# Build for development
swift build

📄 License

AppMCP is released under the MIT License. See LICENSE for details.


🙏 Acknowledgments

  • Model Context Protocol - For the excellent MCP Swift SDK
  • Apple Developer Team - For the powerful macOS automation APIs
  • Swift Community - For the robust Swift ecosystem

<div align="center">

Built with ❤️ for the AI automation community

📖 Documentation • 🐛 Issues • 💬 Discussions

</div>

Package Metadata

Repository: 1amageek/app-mcp

Stars: 21

Forks: 2

Open issues: 0

Default branch: main

Primary language: swift

README: README.md