1amageek/app-mcp
> **AI-Powered macOS Application Automation via Model Context Protocol**
Features
Visual Intelligence
- Smart Screenshots: Capture high-resolution app windows using ScreenCaptureKit
- OCR Text Recognition: Extract text from screenshots using Apple's Vision Framework
- UI Tree Analysis: Extract detailed accessibility hierarchies for precise element targeting
- Multi-App Discovery: Identify and monitor multiple running applications simultaneously
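As a rough illustration of element targeting, the sketch below searches a UI tree for a node by role and title. The node shape (`role`, `title`, and `children` keys) and the `find_element` helper are assumptions made for this example; the tree AppMCP actually returns may be structured differently.

```python
from typing import Any, Optional


def find_element(node: dict[str, Any], role: str, title: str) -> Optional[dict[str, Any]]:
    """Depth-first search for the first node matching both role and title."""
    if node.get("role") == role and node.get("title") == title:
        return node
    for child in node.get("children", []):
        match = find_element(child, role, title)
        if match is not None:
            return match
    return None


# Hypothetical tree, shaped only for illustration.
sample_tree = {
    "role": "AXWindow",
    "title": "Weather",
    "children": [
        {"role": "AXTextField", "title": "Search", "children": []},
        {"role": "AXButton", "title": "Add", "children": []},
    ],
}

button = find_element(sample_tree, "AXButton", "Add")
```

Matching on role plus title keeps the lookup independent of screen coordinates, which tend to change when a window is moved or resized.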
Automation Toolkit
- Precise Interactions: Mouse clicks, keyboard input, and gesture automation
- Smart Waiting: Intelligent delays and condition-based waiting mechanisms
- Error Recovery: Robust fallback strategies for reliable automation
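Condition-based waiting and fallback strategies can be sketched on the client side as follows. This is a generic pattern, not AppMCP's internal implementation; `wait_until` and `with_fallback` are hypothetical helper names introduced for this example.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout: float = 5.0, interval: float = 0.1) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def with_fallback(primary: Callable[[], object], fallback: Callable[[], object]) -> object:
    """Run `primary`; if it raises, run `fallback` instead."""
    try:
        return primary()
    except Exception:
        return fallback()
```

A caller might, for instance, wait until an expected element appears in the UI tree, and fall back to a coordinate-based click if an element-based click fails.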
Privacy & Security
- Permission Management: Seamless TCC (Transparency, Consent, and Control) integration
- Secure Communication: JSON-RPC over STDIO with structured error handling
- Bundle ID Validation: Verified application targeting for enhanced security
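On the client side, structured error handling could look roughly like the sketch below. It relies only on the JSON-RPC 2.0 convention that a failed call returns an `error` object with `code` and `message` fields; `MCPError` and `parse_response` are hypothetical names, and the framing of one message per line is an assumption.

```python
import json
from typing import Any


class MCPError(Exception):
    """Raised when a JSON-RPC response carries an `error` object."""

    def __init__(self, code: int, message: str) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def parse_response(line: str) -> Any:
    """Decode one JSON-RPC 2.0 response line and return its `result`."""
    response = json.loads(line)
    if "error" in response:
        error = response["error"]
        raise MCPError(error["code"], error["message"])
    return response["result"]
```

Raising a typed exception lets calling code distinguish protocol-level failures (for example, code -32601, "Method not found") from ordinary results.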
Quick Start
Prerequisites
- macOS 15.0+ (Sequoia or later)
- Swift 6.1+
- Xcode 16.0+
Installation
# Clone the repository
git clone https://github.com/1amageek/app-mcp.git
cd app-mcp
# Build the project
swift build -c release
# Run the daemon
./.build/release/appmcpd --stdio
Permissions Setup
AppMCP requires the following macOS permissions:
- Accessibility: System Settings → Privacy & Security → Accessibility
- Screen Recording: System Settings → Privacy & Security → Screen Recording
The application will guide you through the permission setup process.
Usage Examples
Weather App Automation
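import json
import subprocess

# Start the AppMCP server with pipes so requests can be written to its stdin
process = subprocess.Popen(
    ['./appmcpd', '--stdio'],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)

def send_mcp_request(request):
    # Minimal helper, assuming one JSON message per line in each direction;
    # a full MCP client would also perform the initialize handshake first
    process.stdin.write(json.dumps(request) + "\n")
    process.stdin.flush()
    return json.loads(process.stdout.readline())

# Take screenshot of Weather app
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "resources/read",
    "params": {"uri": "app://app_screenshot"}
}
# Send request and get response
response = send_mcp_request(request)
print(f"Screenshot captured: {response['result']['contents'][0]['text']}")
UI Element Discovery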
# Get accessibility tree
request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "resources/read",
    "params": {"uri": "app://app_accessibility_tree"}
}
tree = send_mcp_request(request)
print(f"UI Elements: {tree['result']['contents'][0]['text']}")
Automated Interactions
# Click on coordinates
request = {
    "jsonrpc": "2.0",
    "id": 3,
    "method": "tools/call",
    "params": {
        "name": "mouse_click",
        "arguments": {"x": 300, "y": 150}
    }
}
send_mcp_request(request)

# Type text
request = {
    "jsonrpc": "2.0",
    "id": 4,
    "method": "tools/call",
    "params": {
        "name": "type_text",
        "arguments": {"text": "Tokyo"}
    }
}
send_mcp_request(request)
Architecture
graph TB
    A[AI Model] -->|JSON-RPC| B[MCP Server]
    B --> C[Resources]
    B --> D[Tools]
    C --> E[App Screenshot]
    C --> F[Accessibility Tree]
    C --> G[Running Apps]
    D --> H[Mouse Control]
    D --> I[Keyboard Input]
    D --> J[Wait Functions]
    E --> K[macOS Apps]
    F --> K
    G --> K
    H --> K
    I --> K
    J --> K
Core Components
| Component | Description | Technology |
|-----------|-------------|------------|
| AppSelector | Application discovery and targeting | AppKit, NSWorkspace |
| ScreenCaptureProvider | High-quality screenshot capture | ScreenCaptureKit (macOS 15+) |
| AppAXTreeProvider | Accessibility tree extraction | Accessibility API |
| MouseClickTool | Precise mouse automation | CGEvent, Quartz |
| KeyboardTool | Text input and shortcuts | CGEvent, Carbon |
| TCCManager | Permission management | TCC Framework |
Package Structure
AppMCP/
├── Sources/
│   └── AppMCP/
│       ├── AppMCP.swift          # Core protocols & types
│       ├── MCPServer.swift       # Main MCP server
│       ├── Resources/            # Data providers
│       ├── Tools/                # Automation tools
│       └── Permissions/          # Security management
├── Sources/appmcpd/
│   └── Command.swift             # CLI daemon
├── Tests/
│   └── AppMCPTests/              # Comprehensive test suite
├── Package.swift                 # Swift Package configuration
└── CLAUDE.md                     # Development guidelines
Testing
Run All Tests
swift test
Test Categories
- Unit Tests: Core functionality validation
- Integration Tests: End-to-end workflow testing
- Performance Tests: Response time benchmarking
- Security Tests: Permission and validation checks
Example Test Results
Test Suite 'AppMCPTests' passed at 2025-06-04 16:42:04.049
Executed 19 tests, with 0 failures (0 unexpected) in 0.015 seconds
✅ All tests passing
API Reference
MCP Tools
AppMCP provides the following specialized tools for macOS automation:
Screenshot & UI Analysis
- capture_ui_snapshot: Capture screenshot with UI element hierarchy
  - Optional text recognition via Vision Framework
  - Element filtering with queries
  - Returns base64 screenshot + structured UI data
- recognize_text_in_screenshot: OCR text extraction from app windows
  - Multi-language support (en-US, ja-JP, zh-Hans, etc.)
  - Fast vs accurate recognition modes
  - Confidence scores and bounding boxes
Automation Controls
- click_element: Element-based clicking with multi-button support
- input_text: Text input with setValue/type methods
- drag_drop: Drag and drop between elements
- scroll_window: Scrolling at specific element locations
App Discovery
- list_running_applications: Get all running apps with metadata
- list_application_windows: List windows with bounds and visibility
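The tools listed above are invoked through the MCP `tools/call` method. The sketch below builds such requests; `make_tool_call` is a hypothetical helper, and the `bundleID` argument passed to `list_application_windows` is an assumption borrowed from the text recognition example, not a documented parameter of that tool.

```python
import itertools
import json
from typing import Any, Optional

# Monotonically increasing JSON-RPC request ids.
_request_ids = itertools.count(1)


def make_tool_call(name: str, arguments: Optional[dict[str, Any]] = None) -> str:
    """Serialize a JSON-RPC 2.0 `tools/call` request as a single line."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }
    return json.dumps(request)


apps_request = make_tool_call("list_running_applications")
# Assumed argument name; check the tool's schema before relying on it.
windows_request = make_tool_call("list_application_windows", {"bundleID": "com.apple.TextEdit"})
```

Each returned string can be written to the daemon's standard input followed by a newline.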
Text Recognition Features
The Vision Framework integration provides powerful OCR capabilities:
{
  "bundleID": "com.apple.TextEdit",
  "includeTextRecognition": true,
  "recognitionLanguages": ["en-US", "ja-JP"],
  "recognitionLevel": "accurate"
}
Recognition Results:
- Full text extraction in reading order
- Individual text regions with confidence scores
- Bounding boxes in normalized coordinates
- Support for 50+ languages
- Handwritten text detection
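To act on a recognized text region, its normalized bounding box usually has to be mapped to pixels. The sketch below assumes the box is passed through from Vision unchanged, that is, with values in the range 0 to 1 and the origin at the lower-left corner, and that it arrives as a dictionary with `x`, `y`, `width`, and `height` keys. Both the key names and the `to_pixel_rect` helper are assumptions for illustration.

```python
from typing import NamedTuple


class PixelRect(NamedTuple):
    x: int
    y: int
    width: int
    height: int


def to_pixel_rect(box: dict[str, float], image_width: int, image_height: int) -> PixelRect:
    """Map a normalized, bottom-left-origin box to top-left-origin pixels."""
    width = box["width"] * image_width
    height = box["height"] * image_height
    x = box["x"] * image_width
    # Flip the vertical axis: the box's top edge sits (y + height) above the bottom.
    y = (1.0 - box["y"] - box["height"]) * image_height
    return PixelRect(round(x), round(y), round(width), round(height))


rect = to_pixel_rect({"x": 0.25, "y": 0.5, "width": 0.5, "height": 0.25}, 800, 600)
# rect == PixelRect(x=200, y=150, width=400, height=150)
```

The center of the resulting rectangle is a natural target for a coordinate-based click on the recognized text.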
Roadmap
Current (v0.1.0)
- [x] Weather app automation PoC
- [x] Basic screenshot & UI tree extraction
- [x] Mouse & keyboard automation
- [x] Permission management
- [x] Vision Framework OCR text recognition
Near Future (v0.2.0)
- [ ] Multi-app simultaneous control
- [ ] DevTools integration
- [ ] Enhanced error recovery
- [ ] Performance optimizations
Long Term (v1.0.0)
- [ ] HTTP transport support
- [ ] Shortcuts.app integration
- [ ] Plugin SDK for extensions
- [ ] Real-time UI streaming
Contributing
We welcome contributions! Please see our Contributing Guidelines for details.
Development Setup
# Install dependencies
swift package resolve
# Run tests
swift test
# Format code
swift-format -i -r Sources/ Tests/
# Build for development
swift build
License
AppMCP is released under the MIT License. See LICENSE for details.
Acknowledgments
- Model Context Protocol - For the excellent MCP Swift SDK
- Apple Developer Team - For the powerful macOS automation APIs
- Swift Community - For the robust Swift ecosystem
<div align="center">
Built with ❤️ for the AI automation community
Documentation • Issues • Discussions
</div>
Package Metadata
Repository: 1amageek/app-mcp
Stars: 21
Forks: 2
Open issues: 0
Default branch: main
Primary language: swift
README: README.md