Give your AI eyes, hands, and a real iPhone. An MCP server that lets any AI agent see the screen, tap what it needs, and figure the rest out — through macOS iPhone Mirroring. Experimental support for macOS windows. 32 tools, any MCP client.
- macOS 15+
- iPhone connected via iPhone Mirroring
```bash
/bin/bash -c "$(curl -fsSL https://mirroir.dev/get-mirroir.sh)"
```

or via npx:

```bash
npx -y mirroir-mcp install
```

or via Homebrew:

```bash
brew tap jfarcand/tap && brew install mirroir-mcp
```

The first time you take a screenshot, macOS will prompt for Screen Recording and Accessibility permissions. Grant both.
Per-client setup
```bash
claude mcp add --transport stdio mirroir -- npx -y mirroir-mcp
```

Install from the MCP server gallery: search @mcp mirroir in the Extensions view, or add to .vscode/mcp.json:

```json
{
  "servers": {
    "mirroir": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "mirroir-mcp"]
    }
  }
}
```

Add to .cursor/mcp.json in your project root:
```json
{
  "mcpServers": {
    "mirroir": {
      "command": "npx",
      "args": ["-y", "mirroir-mcp"]
    }
  }
}
```

```bash
codex mcp add mirroir -- npx -y mirroir-mcp
```

Or add to ~/.codex/config.toml:

```toml
[mcp_servers.mirroir]
command = "npx"
args = ["-y", "mirroir-mcp"]
```

Install from source
```bash
git clone https://github.com/jfarcand/mirroir-mcp.git
cd mirroir-mcp
./mirroir.sh
```

Use the full path to the binary in your .mcp.json: <repo>/.build/release/mirroir-mcp.
Every interaction follows the same loop: observe, reason, act. describe_screen gives the AI every text element with tap coordinates (eyes). The LLM decides what to do next (brain). tap, type_text, swipe execute the action (hands) — then it loops back to observe. No scripts, no coordinates, just intent.
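The loop above can be sketched in a few lines. This is an illustration only: `observe`, `decide`, and `act` are stand-ins for the MCP round-trips, not mirroir's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class Action:
    kind: str              # "tap", "type_text", "swipe", or "done"
    args: dict = field(default_factory=dict)

def agent_loop(observe, decide, act, max_steps=20):
    """observe() -> screen elements; decide(elements) -> Action; act(action) performs it."""
    for _ in range(max_steps):
        elements = observe()          # eyes: describe_screen
        action = decide(elements)     # brain: the LLM chooses the next step
        if action.kind == "done":
            return True
        act(action)                   # hands: tap / type_text / swipe
    return False                      # step budget exhausted
```

The point of the structure is that nothing is scripted: `decide` gets fresh observations every iteration, so the agent adapts when the screen changes.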
mirroir can explore any iOS app blindly — but it works best when you tell it what to expect. Write an APP.md file and mirroir gets a map before it starts:
```markdown
---
app: Santé
archetype: dashboard
obstacle_mode: auto
---

## Structure
Dashboard with 4 tabs: Résumé, Partage, Parcourir, Profil.

## Résumé Tab
- Summary cards for health metrics that drill down to charts
- Cards often show "Aucune donnée" on test devices

## Obstacles
- Health Access permission → tap "Autoriser"
- Notification permission → tap "Ne pas autoriser"

## Skip
- Supprimer les données de Santé
- Réinitialiser
```

30 seconds of writing saves minutes of blind exploration. The archetype tells mirroir how the app navigates (dashboard = card drill-down, social-feed = infinite scroll, settings-list = chevron rows). Obstacles are auto-dismissed. Skip elements are never tapped. Structure and tab descriptions become AI context in generated skills.
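To make the file format concrete, here is a minimal reader for the frontmatter keys plus the bullet sections. It is an illustrative sketch, not mirroir's actual parser.

```python
import re

def parse_app_md(text):
    """Minimal APP.md reader (sketch): returns (frontmatter dict, {section: bullets})."""
    meta, sections = {}, {}
    front = re.match(r"---\n(.*?)\n---\n(.*)", text, re.S)
    body = text
    if front:
        # Frontmatter: one "key: value" per line
        for line in front.group(1).splitlines():
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
        body = front.group(2)
    current = None
    for line in body.splitlines():
        if line.startswith("## "):          # section heading, e.g. "## Skip"
            current = line[3:].strip()
            sections[current] = []
        elif line.startswith("- ") and current:
            sections[current].append(line[2:].strip())
    return meta, sections
```

With the example above, the "Skip" section would come back as a list of labels the explorer must never tap.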
Three levels of patterns work together — elements (what rows look like), screens (what the page layout means), and apps (what the developer knows). See Patterns & Skills for the full system.
Paste any of these into Claude Code, Claude Desktop, ChatGPT, Cursor, or any MCP client:
Open Messages, find my conversation with Alice, and send "running 10 min late".
Open Calendar, create a new event called "Dentist" next Tuesday at 2pm.
Open my Expo Go app, tap "LoginDemo", test the login screen with
test@example.com / password123. Screenshot after each step.
Start recording, open Settings, scroll to General > About, stop recording.
describe_screen is the AI's eyes. Three backends work together to give the agent a complete picture of what's on screen — text, icons, and semantic UI structure.
The default backend uses Apple's Vision framework to detect every text element on screen and return exact tap coordinates. This is fast, local, and requires no API keys or external services.
Text-only OCR misses non-text UI elements — buttons, toggles, tab bar icons, activity rings. Drop a YOLO CoreML model (.mlmodelc) in ~/.mirroir-mcp/models/ and the server auto-detects it at startup, merging icon detection results with OCR text. The AI gets tap targets for elements that text-only OCR cannot see.
| Mode | ocrBackend setting | Behavior |
|---|---|---|
| Auto-detect (default) | "auto" | Uses Vision + YOLO if a model is installed, Vision only otherwise |
| Vision only | "vision" | Apple Vision OCR text only |
| YOLO only | "yolo" | CoreML element detection only |
| Both | "both" | Always merge both backends (falls back to Vision if no model) |
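The resolution rules in the table can be sketched as a small function (illustrative only; the real implementation is in Swift):

```python
def resolve_ocr_backend(setting, model_installed):
    """Map an ocrBackend setting to the backend(s) actually used (sketch)."""
    if setting in ("auto", "both"):
        # Both modes merge when a YOLO model exists; without one, Vision carries on alone.
        return "vision+yolo" if model_installed else "vision"
    return setting  # "vision" or "yolo", used as-is
```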
Instead of local OCR, describe_screen can send the screenshot to an AI vision model that identifies UI elements semantically — cards, tabs, buttons, icons, navigation structure — not just raw text. This produces richer context for the agent, especially on screens with complex layouts.
The embacle runtime is embedded directly into the mirroir-mcp binary via Rust FFI. describe_screen calls the embedded runtime in-process — no separate server, no network round-trip, no additional setup. The FFI layer (EmbacleFFI.swift → libembacle.a) handles initialization, chat completion requests, and memory management across the Swift/Rust boundary.
embacle routes vision requests through already-authenticated CLI tools (GitHub Copilot, Claude Code) so there is no separate API key to manage. If you have a Copilot or Claude Code subscription, you already have access.
```bash
brew tap dravr-ai/tap
brew install embacle      # CLI tools (embacle-server, embacle-mcp)
brew install embacle-ffi  # Rust FFI static library (libembacle.a)
```

Then rebuild mirroir-mcp from source (or reinstall via Homebrew) so the binary links against libembacle.a:

```bash
# From source
swift build -c release

# Or via Homebrew (rebuilds automatically)
brew reinstall mirroir-mcp
```

When the embacle FFI is linked into the binary, screenDescriberMode defaults to "auto", which automatically resolves to vision mode. No settings change required — install embacle-ffi, rebuild, and describe_screen starts using AI vision.
To force local OCR even when embacle is available, explicitly set "ocr":
```jsonc
// .mirroir-mcp/settings.json
{
  "screenDescriberMode": "ocr"
}
```

See Configuration for all available settings.
When you find yourself repeating the same agent workflow, capture it as a skill. Skills are SKILL.md files — numbered steps the AI follows, adapting to layout changes and unexpected dialogs. Steps like Tap "Email" use OCR — no hardcoded coordinates.
Place files in ~/.mirroir-mcp/skills/ (global) or <cwd>/.mirroir-mcp/skills/ (project-local).
Describe your app's structure to guide exploration — see Describe Your App above. Place APP.md files in ~/.mirroir-mcp/skills/ or the mirroir-skills repo at patterns/apps/.
```markdown
---
version: 1
name: Commute ETA Notification
app: Waze, Messages
tags: ["workflow", "cross-app"]
---

## Steps
1. Launch **Waze**
2. Wait for "Où va-t-on ?" to appear
3. Tap "Où va-t-on ?"
4. Wait for "${DESTINATION:-Travail}" to appear
5. Tap "${DESTINATION:-Travail}"
6. Wait for "Y aller" to appear
7. Tap "Y aller"
8. Wait for "min" to appear
9. Remember: Read the commute time and ETA.
10. Press Home
11. Launch **Messages**
12. Tap "New Message"
13. Type "${RECIPIENT}" and select the contact
14. Type "On my way! ETA {eta}"
15. Press **Return**
16. Screenshot: "message_sent"
```

${VAR} placeholders resolve from environment variables. ${VAR:-default} for fallbacks.
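The substitution rule can be sketched as a small expansion function (an illustration of the described behavior, not the shipped code):

```python
import re

def resolve_placeholders(step, env):
    """Expand ${VAR} and ${VAR:-default} in a skill step from an env mapping (sketch)."""
    def sub(match):
        name, _, default = match.group(1).partition(":-")
        # ${VAR} with no fallback resolves to "" here when unset; the real
        # behavior in that case is an assumption.
        return env.get(name, default)
    return re.sub(r"\$\{([^}]+)\}", sub, step)
```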
Install ready-to-use skills from jfarcand/mirroir-skills:
```bash
git clone https://github.com/jfarcand/mirroir-skills ~/.mirroir-mcp/skills
```

The generate_skill tool lets an AI agent explore an app and produce SKILL.md files. It uses breadth-first search (BFS) to traverse the app as a navigation graph — screens are nodes, tappable elements are edges. The explorer describes each screen, matches elements against component definitions to decide what to tap, visits child screens, and backtracks via the back chevron. Duplicate screens are skipped via structural fingerprinting. See Component Detection below for how the explorer interprets raw elements into structured UI components.
The explorer works viewport-by-viewport: after calibrating the page length, it builds a plan from the current viewport, taps elements top-to-bottom, scrolls down to reveal more content, and rebuilds the plan for each new viewport. This approach works with both OCR and AI vision describers. Pass seed for deterministic ordering across runs.
Exploration is bounded — it does not discover every reachable screen in large apps. Depth, screen count, and time limits keep runs practical. For targeted flows, provide a goal to focus the traversal.
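The bounded BFS described above can be sketched as follows. `children_of` and `fingerprint` are placeholders for "tap each element and observe the resulting screen" and structural fingerprinting; the real explorer also backtracks and classifies transitions.

```python
from collections import deque

def explore_bfs(root, children_of, fingerprint, max_depth=6, max_screens=30):
    """BFS over the navigation graph: screens are nodes, taps are edges (sketch)."""
    seen = {fingerprint(root)}           # skip structurally duplicate screens
    frontier = deque([(root, 0)])
    visited = []
    while frontier and len(visited) < max_screens:   # screen budget
        screen, depth = frontier.popleft()
        visited.append(screen)
        if depth >= max_depth:                       # depth budget
            continue
        for child in children_of(screen):            # tap each untried element
            fp = fingerprint(child)
            if fp not in seen:
                seen.add(fp)
                frontier.append((child, depth + 1))
    return visited
```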
```mermaid
graph TD
    A["Launch App"] --> B["Describe Screen"]
    B --> C{"Calibrated?"}
    C -- No --> D["Scroll Full Page"]
    D --> E{"skip_calibration?"}
    E -- No --> F["Component Detect +\nClassify + Validate"]
    E -- Yes --> G["Classify Elements\nDirectly"]
    F --> H["Build Plan"]
    G --> H
    C -- Yes --> H
    H --> I{"Untried\nElements?"}
    I -- Yes --> J["Tap Element"]
    I -- No --> K["Return to Root"]
    J --> M["Describe +\nClassify Edge"]
    M --> N{"Transition"}
    N -- new screen --> O["Add to Frontier"]
    O --> P["Backtrack"]
    N -- revisited/dead --> P
    P -- push: tap back --> H
    P -- modal: tap close --> H
    P -- tab: tap prev --> H
    K --> Q{"Frontier\nEmpty?"}
    Q -- No --> R["Next Frontier\nScreen"]
    R --> B
    Q -- Yes --> S["Generate SKILL.md"]
```
Two modes: autonomous exploration (BFS) and guided session (manual step-by-step).
Autonomous BFS exploration — the agent explores on its own:
Explore the Settings app and generate a skill that checks the iOS version.
This calls generate_skill(action: "explore", app_name: "Settings", goal: "check iOS version") under the hood. The explorer launches the app, runs BFS from the root screen, and outputs a SKILL.md for the discovered path.
| Parameter | Default | Description |
|---|---|---|
| `app_name` | required | App to explore |
| `goal` | none | Focus exploration toward a specific flow (e.g. "check software version") |
| `goals` | none | Array of goals — one SKILL.md per goal |
| `max_depth` | 6 | Maximum BFS depth |
| `max_screens` | 30 | Maximum screens to visit |
| `max_time` | 300 | Maximum seconds before stopping |
| `strategy` | auto | "mobile" (default), "social" (Reddit, Instagram), or "desktop" (macOS windows) |
| `skip_calibration` | false | Skip component detection during calibration. Scrolling still runs. Useful with AI vision describers that produce clean semantic elements |
| `seed` | random | Integer seed for deterministic exploration ordering. Same seed produces identical tap sequences |
| `fresh` | true | Discard persisted navigation graph and explore from scratch. Set false for incremental exploration |
Guided session — the AI navigates manually, capturing each screen:
1. `generate_skill(action: "start", app_name: "MyApp")` — launch app, OCR first screen
2. Use `tap`/`swipe`/`type_text` to navigate, then `generate_skill(action: "capture")` to record each screen
3. `generate_skill(action: "finish")` — assemble captured screens into a SKILL.md
Run skills deterministically from the CLI — no AI in the loop:
```bash
mirroir test apps/settings/check-about
mirroir test --junit results.xml --verbose        # JUnit output
mirroir test --dry-run apps/settings/check-about  # validate without executing
```

| Option | Description |
|---|---|
| `--junit <path>` | Write JUnit XML report |
| `--screenshot-dir <dir>` | Save failure screenshots (default: ./mirroir-test-results/) |
| `--timeout <seconds>` | wait_for timeout (default: 15) |
| `--verbose` | Step-by-step detail |
| `--dry-run` | Parse and validate without executing |
| `--no-compiled` | Skip compiled skills, force full OCR |
Exit code 0 = all pass, 1 = any failure.
Compile a skill once to capture coordinates and timing. Replay with zero OCR — a 10-step skill drops from 5+ seconds of OCR to under a second.
```bash
mirroir compile apps/settings/check-about  # compile
mirroir test apps/settings/check-about     # auto-detects .compiled.json
mirroir test --no-compiled check-about     # force full OCR
```

AI agents auto-compile skills as a side-effect of the first MCP run. See Compiled Skills for details.
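The fast-path/slow-path split behind compiled replay can be sketched like this; the cache shape here is invented for illustration and is not the actual .compiled.json format:

```python
def replay(steps, compiled, ocr_lookup):
    """Return tap coordinates per step: cached when compiled, OCR otherwise (sketch)."""
    coords = []
    for step in steps:
        if step in compiled:
            coords.append(compiled[step])    # fast path: recorded tap point, zero OCR
        else:
            coords.append(ocr_lookup(step))  # slow path: scan the screen for the label
    return coords
```

Because OCR dominates replay time, a run where every step hits the cache avoids the per-step screen scan entirely, which is where the "5+ seconds down to under a second" claim comes from.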
When a test step fails, pass --agent to get an AI diagnosis of what went wrong and suggested fixes:
```bash
mirroir test --agent gpt-5.3 apps/settings/check-about
mirroir test --agent claude-sonnet-4-6 apps/settings/check-about
mirroir test --agent ollama:llama3 apps/settings/check-about
mirroir test --agent embacle apps/settings/check-about
```

Built-in agents:
| Agent | Provider | API Key |
|---|---|---|
| `gpt-5.3` | OpenAI | OPENAI_API_KEY |
| `claude-sonnet-4-6`, `claude-haiku-4-5` | Anthropic | ANTHROPIC_API_KEY |
| `ollama:<model>` | Ollama (local) | None |
| `embacle`, `embacle:claude` | embacle-server | CLI agent key |
Custom agents can be defined as YAML profiles in ~/.mirroir-mcp/agents/.
No API key? Use embacle
embacle routes requests through already-authenticated CLI tools (GitHub Copilot, Claude Code, etc.) — no separate API key needed:
```bash
brew tap dravr-ai/tap && brew install embacle
mirroir test --agent embacle my-skill
```

The explorer uses a three-layer pattern system to understand iOS apps — the same declarative concept at different scales:

- Element patterns (`patterns/elements/`) — 34 definitions matching row-level UI components (table rows, tab bars, toggles, summary cards). Each specifies match rules, interaction behavior, and grouping logic.
- Screen patterns (`patterns/screens/`) — 7 archetype recipes that identify screen-level navigation models from element composition. Auto-detected during calibration, or declared via `archetype` in APP.md.
- App patterns (`patterns/apps/`) — APP.md files with structure, obstacles, skip lists, and archetype declarations. The developer's source of truth.
Built-in archetypes: settings-list, dashboard, social-feed, content-grid, conversation-list, utility-display, detail-form.
Place custom patterns in ~/.mirroir-mcp/components/ (elements), ~/.mirroir-mcp/recipes/ (screens), or the mirroir-skills repo.
AI vision describers describe UI elements semantically ("Activité chevron") rather than character-by-character ("Activité" + ">"). A vision-indicators.md file maps these descriptions to OCR-compatible characters so the component pipeline works identically with both backends:
```markdown
## Indicators
- chevron: >
- dismiss: ×
- back: <
```

When a vision element ends with a mapped suffix (e.g. "Entraînements chevron"), the normalizer splits it into two elements: "Entraînements" + ">". Place vision-indicators.md alongside your component definitions.
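The suffix-splitting normalizer might look roughly like this (a sketch of the described behavior, not the shipped code):

```python
# Mapping taken from the vision-indicators.md example above
INDICATORS = {"chevron": ">", "dismiss": "×", "back": "<"}

def normalize(element):
    """Split a semantic vision label into OCR-compatible elements (sketch)."""
    for word, symbol in INDICATORS.items():
        if element.endswith(" " + word):
            # "Entraînements chevron" becomes ["Entraînements", ">"]
            return [element[: -len(word) - 1], symbol]
    return [element]  # no mapped suffix: pass through unchanged
```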
See Component Detection for the full definition format, match rule reference, and the detection pipeline.
Giving an AI access to your phone demands defense in depth. mirroir-mcp is fail-closed at every layer.
- Tool permissions — Without a config file, only read-only tools (`screenshot`, `describe_screen`) are exposed. Mutating tools are hidden from the MCP client entirely — it never sees them.
- App blocking — `blockedApps` in permissions.json prevents the AI from interacting with sensitive apps like Wallet or Banking, even if mutating tools are allowed.
- No root required — Runs as a regular user process using the macOS CGEvent API. No daemons, no kernel extensions, no root privileges — just Accessibility permissions.
- Kill switch — Close iPhone Mirroring to kill all input instantly.
```jsonc
// ~/.mirroir-mcp/permissions.json
{
  "allow": ["tap", "swipe", "type_text", "press_key", "launch_app"],
  "deny": [],
  "blockedApps": ["Wallet", "Banking"]
}
```

See Permissions and Security for the full threat model.
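The fail-closed filtering could be sketched as follows; this is illustrative, and the actual enforcement lives inside the server:

```python
READ_ONLY = {"screenshot", "describe_screen"}

def exposed_tools(all_tools, permissions=None):
    """Which tools the MCP client ever sees (sketch of the fail-closed model)."""
    if permissions is None:
        # No permissions.json at all: expose only the read-only tools
        return sorted(all_tools & READ_ONLY)
    allowed = READ_ONLY | set(permissions.get("allow", []))
    allowed -= set(permissions.get("deny", []))   # deny wins over allow
    return sorted(all_tools & allowed)
```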
Record interactions as a skill file:
```bash
mirroir record -o login-flow.yaml -n "Login Flow" --app "MyApp"
```

Verify your setup:

```bash
mirroir doctor
mirroir doctor --json # machine-readable output
```

Set up your keyboard layout for non-US keyboards:

```bash
mirroir configure
```

Update:

```bash
# curl installer
/bin/bash -c "$(curl -fsSL https://mirroir.dev/get-mirroir.sh)"

# npx
npx -y mirroir-mcp install

# Homebrew
brew upgrade mirroir-mcp

# From source
git pull && swift build -c release
```

Uninstall:

```bash
# Homebrew
brew uninstall mirroir-mcp

# From source
./uninstall-mirroir.sh
```

All settings live in settings.json — project-local (.mirroir-mcp/settings.json) or global (~/.mirroir-mcp/settings.json). Project-local settings override global ones. Every setting also has a corresponding environment variable (e.g. MIRROIR_SCREEN_DESCRIBER_MODE).
```json
{
  "screenDescriberMode": "auto",
  "agent": "embacle",
  "ocrBackend": "auto",
  "keystrokeDelayUs": 15000,
  "explorationMaxScreens": 30
}
```

See Configuration Reference for all 40+ settings covering screen intelligence, input timing, scroll behavior, exploration budgets, AI providers, and keyboard layouts.
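One plausible resolution order for a single setting, sketched below. Note the caveat: the priority of environment variables over file settings is an assumption here; only project-over-global precedence is documented above.

```python
import os

def effective_setting(key, env_var, project, global_settings):
    """Resolve one setting (sketch; env-var priority is an assumption)."""
    if env_var in os.environ:
        return os.environ[env_var]       # e.g. MIRROIR_SCREEN_DESCRIBER_MODE
    if key in project:
        return project[key]              # .mirroir-mcp/settings.json (project-local)
    return global_settings.get(key)      # ~/.mirroir-mcp/settings.json (global)
```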
| Doc | Covers |
|---|---|
| Tools Reference | All 32 tools, parameters, and input workflows |
| Configuration | All settings: screen intelligence, input timing, exploration, AI providers |
| FAQ | Security, focus stealing, keyboard layouts, embacle/vision mode |
| Security | Threat model, kill switch, and recommendations |
| Permissions | Fail-closed permission model and config file |
| Known Limitations | Focus stealing, keyboard layout gaps, autocorrect |
| Patterns & Skills | Element patterns, screen recipes, APP.md app descriptions, and the detection pipeline |
| YOLO Icon Detection | Recommended YOLO models, CoreML setup, and configuration |
| Compiled Skills | Zero-OCR skill replay |
| Testing | FakeMirroring, integration tests, and CI strategy |
| Troubleshooting | Debug mode and common issues |
| Contributing | How to add tools, commands, and tests |
| Skills Marketplace | Skill format, plugin discovery, and authoring |
Join the Discord server to ask questions, share skills, and discuss ideas.
Contributions welcome. By submitting a patch, you agree to the Contributor License Agreement — your Git commit metadata serves as your electronic signature.
Why "mirroir"? — It's the old French spelling of miroir (mirror). A nod to the author's roots, not a typo.
