Docs — Proberun

Documentation

Overview

Proberun is a local-first MCP server that lets AI coding agents — Claude Code, Cursor, Codex — drive your iOS Simulator and Android emulator the way Playwright drives a browser. You describe a test in plain language; the agent reads an indexed accessibility tree, calls the right tools, waits between screens, and reports what broke.

Everything runs on your machine. Your source, screenshots, and traces never leave it. The local tier is free and open source; a hosted tier (cloud runs on our simulators and real devices) is opt-in and metered per minute.

New here? Jump to Install, then Quickstart. You'll have an AI testing your app in about five minutes.

Requirements

What	Why	Notes
macOS 14+	iOS Simulator + idb require it	Apple Silicon or Intel
Xcode + Command Line Tools	Builds your app, runs the Simulator	xcode-select --install
Node 20+	Runs the MCP server	any LTS or current
Python 3.12	fb-idb (the iOS bridge) needs ≤3.12	not 3.13/3.14 yet
Android SDK (optional)	Android emulator support	adb + emulator auto-detected

Android is optional — if you only test iOS you can skip the SDK. The server auto-detects both toolchains and exposes whichever is present.
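The version bounds in the table can be checked mechanically. The following is a minimal sketch, assuming version strings in the form printed by `node --version` (for example `v20.11.0`) and `python3 --version` (for example `Python 3.12.4`). The helper names are hypothetical and are not part of Proberun.

```typescript
// Hypothetical helpers, not part of Proberun: they only illustrate the
// version bounds listed in the requirements table.

type Version = { major: number; minor: number };

// Extract "major.minor" from strings like "v20.11.0" or "Python 3.12.4".
function parseVersion(raw: string): Version | null {
  const match = /(\d+)\.(\d+)/.exec(raw);
  if (!match) return null;
  return { major: Number(match[1]), minor: Number(match[2]) };
}

// The MCP server needs Node 20 or newer.
function nodeIsSupported(raw: string): boolean {
  const v = parseVersion(raw);
  return v !== null && v.major >= 20;
}

// fb-idb needs Python 3.12 or older (3.13 and 3.14 are not supported yet).
function pythonIsSupportedByIdb(raw: string): boolean {
  const v = parseVersion(raw);
  return v !== null && v.major === 3 && v.minor <= 12;
}

console.log(nodeIsSupported("v20.11.0"));
console.log(pythonIsSupportedByIdb("Python 3.13.0"));
```

A check like this could run before the install steps to catch a too-new Python early, since that is the constraint most likely to bite on a fresh machine.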

Install

Three system tools, then the server, then register it with your AI editor.

terminal

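# 1. iOS automation bridge (the python3.12 path below is Homebrew's Apple Silicon prefix; on Intel Macs use /usr/local/bin/python3.12)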

$ brew install facebook/fb/idb-companion

$ pipx install fb-idb --python /opt/homebrew/bin/python3.12

# 2. ffmpeg (for video recording / tracecast)

$ brew install ffmpeg

# 3. the Proberun MCP server

$ npm i -g proberun-cli

# 4. register with Claude Code (or Cursor/Codex)

$ claude mcp add proberun -- proberun

✓ Added stdio MCP server proberun (68 tools)

Verify the toolchain any time with proberun doctor — it checks xcrun, simctl, idb, idb_companion, ffmpeg, and reports booted simulators and data dirs.

On Android, set ANDROID_HOME (or install to the default ~/Library/Android/sdk) and the android_* tools light up automatically.
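The lookup order described above (ANDROID_HOME first, then the default location) might look roughly like this. It is a sketch only: the function name is hypothetical, and the environment, home directory, and existence check are passed in as parameters so the logic can be exercised without touching the real file system.

```typescript
// Sketch of the SDK lookup order. The name `resolveAndroidSdk` and its
// injected parameters are assumptions made for illustration.

type Env = Record<string, string | undefined>;

function resolveAndroidSdk(
  env: Env,
  homeDir: string,
  exists: (path: string) => boolean,
): string | null {
  // 1. An explicit ANDROID_HOME wins when it points at a real directory.
  const fromEnv = env["ANDROID_HOME"];
  if (fromEnv && exists(fromEnv)) return fromEnv;

  // 2. Otherwise fall back to the default macOS install location.
  const fallback = `${homeDir}/Library/Android/sdk`;
  return exists(fallback) ? fallback : null;
}

// Example with a fake file system containing only the default location.
const onDisk = new Set(["/Users/me/Library/Android/sdk"]);
console.log(resolveAndroidSdk({}, "/Users/me", (p) => onDisk.has(p)));
```

Returning null rather than throwing matches the idea that Android is optional: a missing SDK simply means the android_* tools are not exposed.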

Quickstart — your first test

Open your AI editor with Proberun connected and describe the test. For example:

Claude Code

you ▸ Use proberun. Boot iPhone 17, build my app at ./MyApp.xcodeproj scheme MyApp, install and launch it, then walk the signup flow and confirm a bad email shows an error.

Behind the scenes the agent calls these tools in sequence:

proberun

$ boot_simulator "iPhone 17"

✓ Booted iPhone 17

$ build_app ./MyApp.xcodeproj MyApp

✓ Built MyApp.app (42s)

$ install_app && launch_app com.you.MyApp

$ ui_snapshot

[3] Button "Sign Up" [7] TextField "Email" value=""

$ tap_index 3 → type_text → tap_text "Continue"

$ wait_for_text "Enter a valid email"

✓ Found in 0.9s — validation works

No selectors, no test code, no maintenance. The agent reasons over the live UI tree and adapts when the screen changes.

Core concepts

Indexed snapshot

ui_snapshot returns a compact, numbered accessibility tree — [3] Button "Sign In" frame=(32,400,320,44). Indices are stable until the next snapshot; the agent taps by index or by text, never by raw pixels. Cheap in tokens, robust to layout shifts.
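To make the index idea concrete, here is a minimal sketch that parses snapshot text in the shape shown in the examples (`[3] Button "Sign In" ...`) and looks an element up by its label. The exact output grammar of ui_snapshot is not specified here, so the regular expression and the `SnapshotElement` type are assumptions based only on those sample lines.

```typescript
// Assumed shape, inferred from the sample output: [index] Role "Label" ...
interface SnapshotElement {
  index: number;
  role: string;
  label: string;
}

// Pull every `[n] Role "Label"` triple out of the snapshot text.
function parseSnapshot(text: string): SnapshotElement[] {
  const pattern = /\[(\d+)\]\s+(\w+)\s+"([^"]*)"/g;
  const elements: SnapshotElement[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    elements.push({
      index: Number(match[1]),
      role: match[2] ?? "",
      label: match[3] ?? "",
    });
  }
  return elements;
}

// Case-insensitive exact match on the label; returns the snapshot index.
function findIndexByLabel(elements: SnapshotElement[], label: string): number | null {
  const wanted = label.toLowerCase();
  const hit = elements.find((e) => e.label.toLowerCase() === wanted);
  return hit ? hit.index : null;
}

const sample = '[3] Button "Sign Up" [7] TextField "Email" value=""';
console.log(findIndexByLabel(parseSnapshot(sample), "email"));
```

Because indices are only valid until the next snapshot, a lookup like this should always run against the most recent snapshot text rather than a cached one.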

App Atlas

atlas_build autonomously walks your whole app, fingerprinting each screen and recording transitions into a graph stored at ~/.proberun/atlas/<app>.json. Tests then navigate by name (atlas_path_to "SettingsScreen") instead of re-discovering the UI — roughly 80% fewer tokens per test, and the key to cheap cloud runs (see below).
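Navigating by name over a recorded graph comes down to a shortest-path search. The sketch below uses breadth-first search over a list of transitions. The `Transition` shape and the action strings are hypothetical; the actual schema of the atlas JSON file is not described here.

```typescript
// Hypothetical edge format: one recorded transition between two screens.
interface Transition {
  from: string;
  to: string;
  action: string; // e.g. 'tap_text "Settings"'
}

// Breadth-first search: the first time a screen is reached, it is reached
// by a path with the fewest actions.
function shortestActionPath(
  transitions: Transition[],
  start: string,
  goal: string,
): string[] | null {
  if (start === goal) return [];
  const cameFrom = new Map<string, Transition>();
  const visited = new Set<string>([start]);
  const queue: string[] = [start];

  while (queue.length > 0) {
    const current = queue.shift() as string;
    for (const t of transitions) {
      if (t.from !== current || visited.has(t.to)) continue;
      visited.add(t.to);
      cameFrom.set(t.to, t);
      if (t.to === goal) {
        // Walk back from the goal to rebuild the ordered action list.
        const actions: string[] = [];
        let screen = goal;
        while (screen !== start) {
          const step = cameFrom.get(screen) as Transition;
          actions.unshift(step.action);
          screen = step.from;
        }
        return actions;
      }
      queue.push(t.to);
    }
  }
  return null; // goal is not reachable from start
}

const graph: Transition[] = [
  { from: "Home", to: "Feed", action: 'tap_text "Feed"' },
  { from: "Home", to: "Profile", action: 'tap_text "Profile"' },
  { from: "Profile", to: "SettingsScreen", action: 'tap_text "Settings"' },
];
console.log(shortestActionPath(graph, "Home", "SettingsScreen"));
```

The token saving comes from the fact that the agent replays a known list of actions instead of taking and reading a fresh snapshot at every step to decide where to go.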

State snapshot / restore

save_state clones the simulator (via simctl clone) so you can restore_state a logged-in starting point in 2–5s instead of a 30–60s cold boot + login. Playwright contexts, for iOS.

Backend observability

start_log_capture and start_network_capture record the app's logs and full HTTPS traffic during a test, auto-classifying Firebase / Supabase / Stripe / Sentry errors and HTTP 4xx/5xx. When a flow breaks, the agent knows whether it was a 401 from your auth backend or a UI bug, not just "the button didn't work."
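As an illustration of what classifying a captured flow could involve, the sketch below tags a response by status range and by a vendor guessed from the hostname. The domain list, the type names, and the "app backend" fallback label are all assumptions; Proberun's real classification rules are not documented here.

```typescript
// Hypothetical shapes for a captured flow and a classified finding.
interface Flow {
  host: string;
  status: number;
}

interface Finding {
  vendor: string;
  kind: "client_error" | "server_error";
  status: number;
}

// Illustrative domain-to-vendor table (an assumption, not Proberun's list).
const VENDOR_DOMAINS: ReadonlyArray<readonly [string, string]> = [
  ["firebaseio.com", "Firebase"],
  ["supabase.co", "Supabase"],
  ["stripe.com", "Stripe"],
  ["sentry.io", "Sentry"],
];

function vendorFor(host: string): string {
  for (const [domain, vendor] of VENDOR_DOMAINS) {
    if (host === domain || host.endsWith("." + domain)) return vendor;
  }
  return "app backend";
}

// Returns null for successful or redirect responses; only 4xx/5xx are findings.
function classifyFlow(flow: Flow): Finding | null {
  if (flow.status < 400 || flow.status > 599) return null;
  return {
    vendor: vendorFor(flow.host),
    kind: flow.status < 500 ? "client_error" : "server_error",
    status: flow.status,
  };
}

console.log(classifyFlow({ host: "api.stripe.com", status: 402 }));
```

A finding like `{ vendor: "Stripe", kind: "client_error", status: 402 }` is what lets the agent report a backend cause instead of a vague UI failure.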

Vision fallback

When the accessibility tree is sparse (React Native, Flutter, Unity, canvas), vision_ocr (Apple Vision, free + local) or vision_describe_llm (bring-your-own Anthropic key) reads the screen so those apps still work.

Tool reference

68 tools. Pass an optional udid/serial to any; omit it to reuse the last/only device.
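The "omit it to reuse the last/only device" rule can be pictured as a small resolution function. This is a sketch under assumptions: the function name is hypothetical, and the precedence between "last used" and "only available" is a guess, since the order is not spelled out here.

```typescript
// Hypothetical resolver for the optional udid/serial argument.
function resolveDevice(
  explicit: string | undefined,
  lastUsed: string | undefined,
  available: string[],
): string {
  // 1. An id passed by the caller always wins.
  if (explicit !== undefined && explicit !== "") return explicit;

  // 2. Otherwise reuse the device from the previous call, if it still exists.
  if (lastUsed !== undefined && available.indexOf(lastUsed) !== -1) return lastUsed;

  // 3. Otherwise fall back to the only available device.
  const only = available.length === 1 ? available[0] : undefined;
  if (only !== undefined) return only;

  throw new Error("Ambiguous device: pass a udid (iOS) or serial (Android) explicitly");
}

console.log(resolveDevice(undefined, undefined, ["SIM-A"]));
```

In practice this means single-device setups never need to pass an id, while setups with several booted devices should pass one on the first call.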

Lifecycle (iOS)

list_simulators	List sims with UDID, name, state, runtime
boot_simulator	Boot by name or UDID; opens Simulator.app
build_app	xcodebuild an .xcodeproj/.xcworkspace for the sim
install_app / launch_app	Install a .app, launch by bundle id
terminate_app / uninstall_app	Kill or remove an app
open_url	Trigger a deep link
reset_simulator	Erase content & settings (fresh context)
list_installed_apps	Bundle ids + display names on the sim

Perception

ui_snapshot	Indexed accessibility tree; vision_fallback option
screenshot	PNG, returned inline for vision models
vision_ocr	Apple Vision OCR — text + bboxes, free & local
vision_describe_llm	Vision-LLM screen description (BYOK Anthropic)

Action

tap / tap_index / tap_text	Tap by coords, snapshot index, or fuzzy text
long_press	Press-and-hold at coords
swipe	Coords or direction (up/down/left/right)
type_text	Type into the focused field
press_button	HOME / LOCK / SIRI / etc.

Wait & sync

wait_for_text	Block until text appears
wait_for_text_disappear	Block until text is gone
wait_for_snapshot_stable	Block until the UI settles
wait_for_index_change	Confirm a tap changed the screen
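One plausible way to decide that the UI has settled is to require several consecutive identical snapshots. The sketch below shows that check in isolation. It is an assumption about how such a wait could work, not a description of how wait_for_snapshot_stable is implemented, and the function name is hypothetical.

```typescript
// `history` holds serialized snapshots in the order they were taken.
// The UI counts as stable when the last `requiredRepeats` entries are equal.
function isSnapshotStable(history: string[], requiredRepeats: number): boolean {
  if (requiredRepeats < 1 || history.length < requiredRepeats) return false;
  const tail = history.slice(-requiredRepeats);
  return tail.every((snapshot) => snapshot === tail[0]);
}

// Still animating: the last two snapshots differ.
console.log(isSnapshotStable(["loading", "half", "done"], 2));
// Settled: the last two snapshots match.
console.log(isSnapshotStable(["loading", "done", "done"], 2));
```

A polling loop would append a fresh snapshot on each tick and stop when this returns true or a timeout expires.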

App Atlas

atlas_build	Autonomously map the whole app into a graph
atlas_path_to	Shortest action path between two screens
atlas_which_screen	Identify the current screen by fingerprint
atlas_record_screen / atlas_record_transition	Manual graph building
atlas_get_screen / atlas_list_screens	Read recorded screens

State

save_state / restore_state	Clone & restore sim state (logged-in, etc.)
list_states / delete_state	Manage saved states

Backend observability

start_log_capture / stop_log_capture	Capture + classify app logs
get_log_entries / wait_for_log_entry	Query / block on classified log events
start_network_capture / stop_network_capture	mitmproxy HTTPS capture
get_network_flows / inspect_network_flow	List + inspect full request/response
wait_for_network_request	Block until a backend call fires

Recording & trace

start_recording / stop_recording	Record an .mp4 of the run
log_thought	Narrate reasoning into the trace (for tracecast)
report_issue	Send in-tool feedback to the maintainers

Cloud cost-savers

estimate_cloud_run	Predict cloud minutes + cost before you run
export_cloud_context	Bundle local atlas so the cloud skips discovery

Android

android_list_devices / android_boot_emulator	Devices/AVDs; boot an emulator
android_install_app / android_launch_app	Install APK, launch a package
android_ui_snapshot	uiautomator dump → indexed tree
android_tap / android_tap_index / android_tap_text	Tap by coords / index / text
android_swipe / android_type_text / android_press_key	Gestures, text, keys
android_screenshot	PNG of the device

Cloud runs & saving money

Local runs are free forever. When you need parallel runs in CI, add --cloud and your tests execute on our hosted simulators (and real devices on Business). You pay only for the minutes you run, and Proberun is built to make that bill small.

Do the expensive work locally, for free

Exploration — mapping your app — is the slow, expensive part. Run it once locally (free), then upload the result so the cloud skips it:

lower your cloud bill

$ atlas_build com.you.MyApp # free, on your Mac

$ estimate_cloud_run com.you.MyApp --flows 5

✓ Atlas present (24 screens) — cloud skips exploration

estimated: 6 min → ~$0.90 (saved ~9 min / ~$1.35)

$ export_cloud_context com.you.MyApp

↘ ~/.proberun/cloud/com.you.MyApp/ — upload with your app

estimate_cloud_run shows the cost before you spend a metered minute. export_cloud_context bundles your atlas so the runner loads the map instead of re-discovering it. Most vendors bill you for that discovery on every run; we don't.
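The figures in the sample output are consistent with a flat per-minute rate: $0.90 for 6 minutes implies $0.15 per minute, and 9 saved minutes at that rate come to $1.35. A minimal sketch of that arithmetic follows, working in integer cents to avoid floating-point drift. The function names are hypothetical, and how estimate_cloud_run actually derives its numbers is not described here.

```typescript
// Cost of a run in cents, given minutes and a per-minute rate in cents.
function estimateRunCents(minutes: number, rateCentsPerMinute: number): number {
  return Math.round(minutes * rateCentsPerMinute);
}

// Format integer cents as a dollar string with two decimals.
function formatUsd(cents: number): string {
  return `$${(cents / 100).toFixed(2)}`;
}

const RATE_CENTS = 15; // assumed rate, inferred from the sample output above

const withAtlas = estimateRunCents(6, RATE_CENTS); // run that skips exploration
const saved = estimateRunCents(9, RATE_CENTS); // exploration minutes avoided

console.log(`estimated: ${formatUsd(withAtlas)}, saved: ${formatUsd(saved)}`);
```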

Tier	Flat	Included	Overage
Pro	$29/mo	200 sim-min	$0.15/min
Team	$99/mo · 5 seats	1,000 sim-min	$0.12/min
Business	$499/mo	300 device-min	$0.40/min
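Read as "flat fee plus overage beyond the included minutes", the table translates into a simple monthly bill calculation. This sketch assumes exactly that reading and assumes overage is charged per minute with no rounding rules beyond whole cents. The type and function names are hypothetical.

```typescript
// One pricing tier, with money in integer cents.
interface Tier {
  name: string;
  flatCents: number;
  includedMinutes: number;
  overageCentsPerMinute: number;
}

// Values copied from the pricing table above.
const TIERS: Tier[] = [
  { name: "Pro", flatCents: 2900, includedMinutes: 200, overageCentsPerMinute: 15 },
  { name: "Team", flatCents: 9900, includedMinutes: 1000, overageCentsPerMinute: 12 },
  { name: "Business", flatCents: 49900, includedMinutes: 300, overageCentsPerMinute: 40 },
];

// Flat fee plus overage for minutes used beyond the included allowance.
function monthlyBillCents(tier: Tier, minutesUsed: number): number {
  const overageMinutes = Math.max(0, minutesUsed - tier.includedMinutes);
  return tier.flatCents + Math.round(overageMinutes * tier.overageCentsPerMinute);
}

for (const tier of TIERS) {
  const cents = monthlyBillCents(tier, 260);
  console.log(`${tier.name}: $${(cents / 100).toFixed(2)} for 260 minutes`);
}
```

For example, 260 simulator minutes on Pro would be the $29 flat fee plus 60 overage minutes at $0.15, or $38.00 under this reading.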

CI integration

Run flows on every PR. The same flow specs you author locally run in CI; results come back as a trace + screenshot diff. (GitHub Action ships with the hosted tier.)

.github/workflows/proberun.yml

- name: Proberun
  run: proberun run --cloud --flow flows/*.json
  env:
    PROBERUN_TOKEN: ${{ secrets.PROBERUN_TOKEN }}

Locally, the included eval harness (eval/runner.ts) runs flow specs against the MCP server and writes a pass/fail leaderboard — wire it into any CI that has macOS runners.

Troubleshooting

Symptom	Fix
“idb not found”	Re-run pipx install fb-idb --python python3.12 (must be ≤3.12)
“No simulator booted”	Open Simulator.app and boot one, or call boot_simulator
ui_snapshot returns few elements	App is RN/Flutter/canvas — pass vision_fallback=true or use vision_ocr
long_press no-ops on 2nd call	Fixed in v0.1.1 (settle delay) — update if older
Recording file is 0 bytes	Keep the sim booted for the whole recording; don't reset mid-record
Android tools missing	Set ANDROID_HOME or install SDK to ~/Library/Android/sdk

FAQ

Real devices?

Simulator and emulator today. Real-device runs are hosted and require the Business tier; local runs use the iOS Simulator and Android emulator only.

React Native / Flutter / Unity?

Native views work directly; sparse trees fall back to Apple Vision OCR (free, local) or a vision LLM.

Which assistants?

Anything that speaks MCP — Claude Code, Cursor, Codex. Tested primarily on Claude Code.

Is my code uploaded?

No. The local CLI is local-only. Telemetry is opt-in and anonymous — tool names + error counts, never code or arguments.

License?

The local tool is Apache 2.0. The hosted orchestration is proprietary.
