Documentation
Overview
Proberun is a local-first MCP server that lets AI coding agents — Claude Code, Cursor, Codex — drive your iOS Simulator and Android emulator the way Playwright drives a browser. You describe a test in plain language; the agent reads an indexed accessibility tree, calls the right tools, waits between screens, and reports what broke.
Everything runs on your machine. Your source, screenshots, and traces never leave it. The local tier is free and open source; a hosted tier (cloud runs on our simulators and real devices) is opt-in and metered per minute.
Requirements
| What | Why | Notes |
|---|---|---|
| macOS 14+ | iOS Simulator + idb require it | Apple Silicon or Intel |
| Xcode + Command Line Tools | Builds your app, runs the Simulator | xcode-select --install |
| Node 20+ | Runs the MCP server | any LTS or current |
| Python 3.12 | fb-idb (the iOS bridge) needs ≤3.12 | not 3.13/3.14 yet |
| Android SDK (optional) | Android emulator support | adb + emulator auto-detected |
Android is optional — if you only test iOS you can skip the SDK. The server auto-detects both toolchains and exposes whichever is present.
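The Android half of that auto-detection can be pictured with a minimal sketch. It assumes the lookup order is ANDROID_HOME first, then the default ~/Library/Android/sdk, and that an SDK counts as present when platform-tools/adb exists. The function name `find_android_sdk` and the `exists` parameter are hypothetical, not part of Proberun.

```python
import os
from pathlib import Path
from typing import Callable, Mapping, Optional


def find_android_sdk(
    env: Mapping[str, str],
    home: Path,
    exists: Callable[[Path], bool] = Path.exists,
) -> Optional[Path]:
    """Return the SDK directory to use, or None when no usable SDK is found."""
    candidates = []
    if env.get("ANDROID_HOME"):
        candidates.append(Path(env["ANDROID_HOME"]))
    # Default install location named in this guide.
    candidates.append(home / "Library" / "Android" / "sdk")
    for sdk in candidates:
        # Assumption: an SDK is usable when adb is present in platform-tools.
        if exists(sdk / "platform-tools" / "adb"):
            return sdk
    return None


if __name__ == "__main__":
    print(find_android_sdk(os.environ, Path.home()))
```

If this returns None, iOS-only testing is unaffected; only the android_* tools would be unavailable.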
Install
Three system tools, then the server, then register it with your AI editor.
Verify the toolchain any time with proberun doctor — it checks xcrun, simctl, idb, idb_companion, ffmpeg, and reports booted simulators and data dirs.
For Android, set ANDROID_HOME (or install the SDK to the default ~/Library/Android/sdk) and the android_* tools light up automatically.
Quickstart: your first test
Open your AI editor with Proberun connected and describe the test. For example:
Behind the scenes the agent calls these tools in sequence:
No selectors, no test code, no maintenance. The agent reasons over the live UI tree and adapts when the screen changes.
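To make the observe, act, wait rhythm concrete, here is an illustrative plan for a sign-in test written out as data. The tool names come from the tool reference; the argument names, the bundle id, and the on-screen text are all hypothetical, and a real agent chooses its steps from the live snapshot rather than from a fixed list.

```python
# Hypothetical plan for "sign in and confirm the home screen appears".
# Tool names are from the tool reference; every argument name and value is a guess.
PLAN = [
    ("launch_app", {"bundle_id": "com.example.app"}),
    ("ui_snapshot", {}),
    ("tap_text", {"text": "Sign In"}),
    ("wait_for_snapshot_stable", {}),
    ("type_text", {"text": "user@example.com"}),
    ("wait_for_text", {"text": "Welcome"}),
]


def describe(plan):
    """Render each step as a numbered, human-readable call."""
    lines = []
    for number, (tool, args) in enumerate(plan, start=1):
        rendered = ", ".join(f"{key}={value!r}" for key, value in args.items())
        lines.append(f"{number}. {tool}({rendered})")
    return lines


for line in describe(PLAN):
    print(line)
```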
Core concepts
Indexed snapshot
ui_snapshot returns a compact, numbered accessibility tree — [3] Button "Sign In" frame=(32,400,320,44). Indices are stable until the next snapshot; the agent taps by index or by text, never by raw pixels. Cheap in tokens, robust to layout shifts.
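As a rough illustration of why the indexed format is easy to act on, the sketch below parses one snapshot line into a structured element and derives a tap point. It assumes the frame is (x, y, width, height) with a top-left origin and that every line follows the shape of the example above; `Element` and `parse_line` are hypothetical names, not Proberun APIs.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Matches lines shaped like: [3] Button "Sign In" frame=(32,400,320,44)
SNAPSHOT_LINE = re.compile(
    r'\[(?P<index>\d+)\]\s+(?P<role>\w+)\s+"(?P<label>[^"]*)"\s+'
    r'frame=\((?P<x>\d+),(?P<y>\d+),(?P<w>\d+),(?P<h>\d+)\)'
)


@dataclass(frozen=True)
class Element:
    index: int
    role: str
    label: str
    x: int
    y: int
    w: int
    h: int

    def center(self) -> Tuple[int, int]:
        # Assumes frame=(x, y, width, height) with a top-left origin.
        return (self.x + self.w // 2, self.y + self.h // 2)


def parse_line(line: str) -> Optional[Element]:
    """Parse one snapshot line, or return None if it does not match."""
    match = SNAPSHOT_LINE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    return Element(
        index=int(fields["index"]),
        role=fields["role"],
        label=fields["label"],
        x=int(fields["x"]),
        y=int(fields["y"]),
        w=int(fields["w"]),
        h=int(fields["h"]),
    )
```

Under these assumptions, tapping index 3 amounts to looking up the element and tapping its center, so a layout shift changes the frame but not the instruction.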
App Atlas
atlas_build autonomously walks your whole app, fingerprinting each screen and recording transitions into a graph stored at ~/.proberun/atlas/<app>.json. Tests then navigate by name (atlas_path_to "SettingsScreen") instead of re-discovering the UI — roughly 80% fewer tokens per test, and the key to cheap cloud runs (see below).
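Navigating by name boils down to a shortest-path search over the recorded graph. The sketch below uses breadth-first search over a toy atlas; the dictionary layout, the screen names, and the action strings are assumptions made for illustration and do not reflect the actual schema of the atlas JSON file.

```python
from collections import deque

# Toy atlas: screen -> list of (action, destination screen). Layout is hypothetical.
ATLAS = {
    "LoginScreen": [("tap_text 'Sign In'", "HomeScreen")],
    "HomeScreen": [
        ("tap_text 'Profile'", "ProfileScreen"),
        ("tap_text 'Settings'", "SettingsScreen"),
    ],
    "ProfileScreen": [("tap_text 'Settings'", "SettingsScreen")],
    "SettingsScreen": [],
}


def path_to(atlas, start, goal):
    """Return the shortest list of actions from start to goal, or None if unreachable."""
    if start == goal:
        return []
    came_from = {start: None}  # screen -> (previous screen, action taken)
    queue = deque([start])
    while queue:
        screen = queue.popleft()
        for action, nxt in atlas.get(screen, []):
            if nxt in came_from:
                continue
            came_from[nxt] = (screen, action)
            if nxt == goal:
                # Walk back to the start, collecting actions in reverse.
                actions = []
                step = came_from[nxt]
                while step is not None:
                    previous, taken = step
                    actions.append(taken)
                    step = came_from[previous]
                return list(reversed(actions))
            queue.append(nxt)
    return None


print(path_to(ATLAS, "LoginScreen", "SettingsScreen"))
```

Because the path is read from the stored graph, the agent does not need to snapshot and reason about every intermediate screen, which is where the token savings would come from.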
State snapshot / restore
save_state clones the simulator (via simctl clone), so restore_state can bring back a logged-in starting point in 2–5s instead of a 30–60s cold boot plus login. Think of it as Playwright contexts for iOS.
Backend observability
start_log_capture and start_network_capture record the app's logs and full HTTPS traffic during a test, auto-classifying Firebase / Supabase / Stripe / Sentry errors and HTTP 4xx/5xx. When a flow breaks, the agent knows whether it was a 401 from your auth backend or a UI bug, not just "the button didn't work."
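The kind of classification described here can be approximated in a few lines. This is a sketch under stated assumptions: the hostname suffixes are illustrative guesses at how a vendor might be recognized, and the labels and function names are hypothetical; Proberun's actual rules are not documented here.

```python
from typing import Optional

# Illustrative suffix-to-vendor map; not Proberun's actual rule set.
VENDOR_SUFFIXES = {
    "stripe.com": "Stripe",
    "supabase.co": "Supabase",
    "sentry.io": "Sentry",
    "firebaseio.com": "Firebase",
}


def vendor_for(host: str) -> Optional[str]:
    """Return a vendor name when the host matches a known suffix."""
    for suffix, vendor in VENDOR_SUFFIXES.items():
        if host == suffix or host.endswith("." + suffix):
            return vendor
    return None


def classify(host: str, status: int) -> Optional[str]:
    """Label a captured response as a client or server error, or None if it succeeded."""
    if 400 <= status <= 499:
        kind = "client_error"
    elif 500 <= status <= 599:
        kind = "server_error"
    else:
        return None
    vendor = vendor_for(host) or "backend"
    return f"{vendor} {kind} (HTTP {status})"


print(classify("abc.supabase.co", 401))
```

A label like this is what lets a failure report say the sign-in request was rejected rather than only that the next screen never appeared.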
Vision fallback
When the accessibility tree is sparse (React Native, Flutter, Unity, canvas), vision_ocr (Apple Vision, free + local) or vision_describe_llm (bring-your-own Anthropic key) reads the screen so those apps still work.
Tool reference
68 tools. Every tool accepts an optional udid (iOS) or serial (Android); omit it to reuse the last-used or only device.
Lifecycle (iOS)
| Tool | What it does |
|---|---|
| list_simulators | List sims with UDID, name, state, runtime |
| boot_simulator | Boot by name or UDID; opens Simulator.app |
| build_app | xcodebuild an .xcodeproj/.xcworkspace for the sim |
| install_app / launch_app | Install a .app, launch by bundle id |
| terminate_app / uninstall_app | Kill or remove an app |
| open_url | Trigger a deep link |
| reset_simulator | Erase content & settings (fresh context) |
| list_installed_apps | Bundle ids + display names on the sim |
Perception
| Tool | What it does |
|---|---|
| ui_snapshot | Indexed accessibility tree; vision_fallback option |
| screenshot | PNG, returned inline for vision models |
| vision_ocr | Apple Vision OCR — text + bboxes, free & local |
| vision_describe_llm | Vision-LLM screen description (BYOK Anthropic) |
Action
| Tool | What it does |
|---|---|
| tap / tap_index / tap_text | Tap by coords, snapshot index, or fuzzy text |
| long_press | Press-and-hold at coords |
| swipe | Coords or direction (up/down/left/right) |
| type_text | Type into the focused field |
| press_button | HOME / LOCK / SIRI / etc. |
Wait & sync
| Tool | What it does |
|---|---|
| wait_for_text | Block until text appears |
| wait_for_text_disappear | Block until text is gone |
| wait_for_snapshot_stable | Block until the UI settles |
| wait_for_index_change | Confirm a tap changed the screen |
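A wait such as wait_for_snapshot_stable can be thought of as polling until consecutive snapshots stop changing. The sketch below shows that idea only; the parameter names, the two-quiet-polls rule, and the timeout are assumptions, and the real tool may decide stability differently.

```python
import time
from typing import Callable


def wait_until_stable(
    take_snapshot: Callable[[], object],
    *,
    quiet_polls: int = 2,
    interval: float = 0.25,
    timeout: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Return True once the snapshot is unchanged for quiet_polls polls in a row."""
    deadline = clock() + timeout
    previous = take_snapshot()
    unchanged = 0
    while clock() < deadline:
        sleep(interval)
        current = take_snapshot()
        if current == previous:
            unchanged += 1
            if unchanged >= quiet_polls:
                return True
        else:
            # The UI moved; restart the quiet-period count from this snapshot.
            unchanged = 0
            previous = current
    return False
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real delays; in normal use the defaults apply.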
App Atlas
| Tool | What it does |
|---|---|
| atlas_build | Autonomously map the whole app into a graph |
| atlas_path_to | Shortest action path between two screens |
| atlas_which_screen | Identify the current screen by fingerprint |
| atlas_record_screen / atlas_record_transition | Manual graph building |
| atlas_get_screen / atlas_list_screens | Read recorded screens |
State
| Tool | What it does |
|---|---|
| save_state / restore_state | Clone & restore sim state (logged-in, etc.) |
| list_states / delete_state | Manage saved states |
Backend observability
| Tool | What it does |
|---|---|
| start_log_capture / stop_log_capture | Capture + classify app logs |
| get_log_entries / wait_for_log_entry | Query / block on classified log events |
| start_network_capture / stop_network_capture | mitmproxy HTTPS capture |
| get_network_flows / inspect_network_flow | List + inspect full request/response |
| wait_for_network_request | Block until a backend call fires |
Recording & trace
| Tool | What it does |
|---|---|
| start_recording / stop_recording | Record an .mp4 of the run |
| log_thought | Narrate reasoning into the trace (for tracecast) |
| report_issue | Send in-tool feedback to the maintainers |
Cloud cost-savers
| Tool | What it does |
|---|---|
| estimate_cloud_run | Predict cloud minutes + cost before you run |
| export_cloud_context | Bundle local atlas so the cloud skips discovery |
Android
| Tool | What it does |
|---|---|
| android_list_devices / android_boot_emulator | Devices/AVDs; boot an emulator |
| android_install_app / android_launch_app | Install APK, launch a package |
| android_ui_snapshot | uiautomator dump → indexed tree |
| android_tap / android_tap_index / android_tap_text | Tap by coords / index / text |
| android_swipe / android_type_text / android_press_key | Gestures, text, keys |
| android_screenshot | PNG of the device |
Cloud runs & saving money
Local runs are free forever. When you need parallel runs in CI, add --cloud and your tests execute on our hosted simulators (and real devices on Business). You pay only for the minutes you run, and Proberun is built to make that bill small.
Do the expensive work locally, for free
Exploration — mapping your app — is the slow, expensive part. Run it once locally (free), then upload the result so the cloud skips it:
estimate_cloud_run shows the cost before you spend a metered minute. export_cloud_context bundles your atlas so the runner loads the map instead of re-discovering it. Most vendors bill you for that discovery every run; we don't.
| Tier | Flat | Included | Overage |
|---|---|---|---|
| Pro | $29/mo | 200 sim-min | $0.15/min |
| Team | $99/mo · 5 seats | 1,000 sim-min | $0.12/min |
| Business | $499/mo | 300 device-min | $0.40/min |
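The table translates into simple arithmetic: the flat fee plus overage on any minutes beyond the included allowance. The sketch below assumes monthly billing and that overage applies only past the included minutes; it is a reading of the table, not the logic inside estimate_cloud_run, and `monthly_cost` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    flat_usd: float
    included_min: int  # sim-minutes for Pro and Team, device-minutes for Business
    overage_usd_per_min: float


TIERS = {
    "Pro": Tier(29, 200, 0.15),
    "Team": Tier(99, 1000, 0.12),
    "Business": Tier(499, 300, 0.40),
}


def monthly_cost(tier_name: str, minutes_used: int) -> float:
    """Flat fee plus overage for minutes beyond the included allowance."""
    tier = TIERS[tier_name]
    extra_minutes = max(0, minutes_used - tier.included_min)
    return round(tier.flat_usd + extra_minutes * tier.overage_usd_per_min, 2)


print(monthly_cost("Pro", 300))
```

For example, 300 minutes on Pro is 100 minutes over the allowance, so the sketch gives 29 + 100 × 0.15 = 44 dollars for the month.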
CI integration
Run flows on every PR. The same flow specs you author locally run in CI; results come back as a trace + screenshot diff. (GitHub Action ships with the hosted tier.)
Locally, the included eval harness (eval/runner.ts) runs flow specs against the MCP server and writes a pass/fail leaderboard — wire it into any CI that has macOS runners.
Troubleshooting
| Symptom | Fix |
|---|---|
| “idb not found” | Re-run pipx install fb-idb --python python3.12 (must be ≤3.12) |
| “No simulator booted” | Open Simulator.app and boot one, or call boot_simulator |
| ui_snapshot returns few elements | App is RN/Flutter/canvas — pass vision_fallback=true or use vision_ocr |
| long_press no-ops on 2nd call | Fixed in v0.1.1 (settle delay) — update if older |
| Recording file is 0 bytes | Keep the sim booted for the whole recording; don't reset mid-record |
| Android tools missing | Set ANDROID_HOME or install SDK to ~/Library/Android/sdk |
FAQ
Real devices?
Simulator and emulator today; real-device runs are part of the hosted Business tier. Local runs are simulator- and emulator-only.
React Native / Flutter / Unity?
Native views work directly; sparse trees fall back to Apple Vision OCR (free, local) or a vision LLM.
Which assistants?
Anything that speaks MCP — Claude Code, Cursor, Codex. Tested primarily on Claude Code.
Is my code uploaded?
No. The local CLI is local-only. Telemetry is opt-in and anonymous — tool names + error counts, never code or arguments.
License?
The local tool is Apache 2.0. The hosted orchestration is proprietary.