GhostDesk

MCP Compatible · Python 3.12+ · FSL-1.1-ALv2 License

Give your AI agent eyes, hands, and a full Linux desktop.
An MCP server that lets LLM agents see the screen, move the mouse, type on the keyboard, launch apps, and run shell commands — all inside a sandboxed virtual desktop.

If a human can do it on a desktop, your agent can too.

Demo video: ghostdesk-google-news.mp4

GhostDesk demo — from a single prompt ("open the browser, go to Google News, and tell me the latest headlines in the Technology section"), the agent launches Firefox, navigates to Google News, switches to the Technology section, and reports the latest stories back.

Why GhostDesk?
How it works
Quick start
Secure local run (TLS + auth)
Tools
Model requirements
From one agent to a workforce
Configuration
Security
Troubleshooting
Custom image
License

Why GhostDesk?

Browser automation tools (Playwright, Puppeteer, Selenium…) were built for human test engineers driving a browser with selectors. They do one thing, and they do it well — inside the browser.

GhostDesk is built from the other end: for AI agents, driving everything a desktop runs. Browsers, native apps, IDEs, terminals, office suites, legacy software, internal tools. If it renders pixels on screen, your agent can see it and use it — in one conversation, across many applications, without a line of glue code.

You don't write selectors. You write a prompt:

"Open the CRM, export last month's leads as CSV, open LibreOffice Calc, build a pivot table, screenshot the chart, and email it to the team."

The agent opens the browser, logs in, downloads the file, switches to LibreOffice, processes the data, captures the result, composes the email, sends it. One prompt, multiple apps, fully autonomous — no glue code, no per-site scraper, no brittle selector chain.

That is what an agent using a desktop looks like.

Runs on models you can actually host

Desktop control needs to be fast — an agent that takes twelve seconds to decide where to click is unusable. GhostDesk is tuned so that vision-language models from the Qwen family running on a single workstation GPU are a first-class target, not an afterthought. No API bill, no screenshots of your desktop leaving your network.

Frontier models (Claude, GPT-4o, Gemini) work too and remain the smoothest path — but they are not the bar. See Model requirements for the supported stacks and the one coordinate-space setting that matters.

How it works

GhostDesk runs a virtual Linux desktop inside Docker and exposes it as an MCP server. Your agent gets a sandboxed desktop with a taskbar, clock, and pre-installed applications — equivalent to what a human sees on their screen.

The agent perceives the screen by calling screen_shot(), which captures the full desktop at native resolution and returns it as WebP (or PNG). An optional region= argument can crop to a sub-rectangle when the agent explicitly wants to narrow its focus.

This works with any application — web apps, native apps, legacy software, Canvas, WebGL.

Quick start

1. Run the container

One command, plain HTTP, no password. Fine for kicking the tires on a laptop you trust — not fit for anything beyond that. Ready to harden it? Jump to Secure local run.

docker run -d --name ghostdesk-demo \
  --shm-size 2g \
  -p 3000:3000 \
  -p 6080:6080 \
  ghcr.io/yv17labs/ghostdesk:latest

The latest image ships with Firefox, the foot terminal, mousepad (text editor), galculator, and passwordless sudo for the agent user — enough to demo a browsing + note-taking workflow out of the box. Need a different app set? Build your own on top of base — see Custom image.

The container boots in the dev posture: plain HTTP on both ports, every auth gate disarmed on purpose. You'll see warnings in the logs reminding you of that — they go away once you follow the secured path below.

2. Connect your AI

GhostDesk speaks MCP over the Streamable HTTP transport — any MCP-compatible client can drive it. Point your client at http://localhost:3000/mcp:

Claude Desktop / Claude Code

{
  "mcpServers": {
    "ghostdesk": {
      "type": "http",
      "url": "http://localhost:3000/mcp"
    }
  }
}

Any other MCP-compatible client — same URL, no headers, no auth. That's the whole demo posture.
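If you want to check the wiring without a full MCP client, the first message of a Streamable HTTP session can be sketched with the Python standard library. This is a minimal illustration, not GhostDesk's own client code: the function name, the `clientInfo` values, and the `protocolVersion` string are assumptions, and the server may negotiate a different protocol version.

```python
import json
import urllib.request

# Demo-posture URL from the Quick start (plain HTTP, no auth).
MCP_URL = "http://localhost:3000/mcp"


def build_initialize_request(url: str = MCP_URL) -> urllib.request.Request:
    """Build (but do not send) an MCP `initialize` JSON-RPC request."""
    body = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            # Assumption: any protocol version the server accepts works here.
            "protocolVersion": "2025-03-26",
            "capabilities": {},
            # Hypothetical client identity, for illustration only.
            "clientInfo": {"name": "smoke-test", "version": "0.1"},
        },
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Streamable HTTP clients accept both plain JSON and SSE replies.
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )


def send_initialize(url: str = MCP_URL) -> int:
    """Send the request to a running container and return the HTTP status."""
    with urllib.request.urlopen(build_initialize_request(url), timeout=10) as resp:
        return resp.status
```

With the demo container running, `send_initialize()` should return a success status; a connection error usually means the port mapping or the container itself is not up.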

3. Watch your agent work

Open http://localhost:6080/ in your browser to see the virtual desktop in real time. No password prompt — the dev posture skips it.

Service	URL
MCP server	`http://localhost:3000/mcp`
noVNC (browser)	`http://localhost:6080/`

Give your agent a first prompt to confirm the wiring is right:

"Take a screenshot of the desktop, list the installed applications, then open Firefox and go to wikipedia.org."

You should see Firefox launch in the noVNC tab, the URL bar fill in, and the page load — all under your agent's control.

4. When you're done

docker stop ghostdesk-demo && docker rm ghostdesk-demo

The demo run creates no named volume, so this leaves nothing behind.

Secure local run (TLS + auth)

The Quick start above drops every gate so you can kick the tires in thirty seconds. The moment you want to expose this to anything beyond your own laptop — another machine on your LAN, a devcontainer port-forward on an untrusted network, a teammate's browser — flip to the secured posture: real TLS + bearer-token auth on MCP + password prompt on noVNC.

GhostDesk couples TLS and auth: mount a cert and you get wss:// + bearer-token on MCP + a single-password prompt on noVNC (see Security → Auth ≡ TLS). mkcert issues a browser-trusted cert for localhost in two commands:

# Issue a locally-trusted cert (first time only — installs a local CA in your trust store)
mkcert -install
mkdir -p tls
mkcert -cert-file tls/server.crt -key-file tls/server.key localhost 127.0.0.1 ::1

# Generate the MCP and VNC secrets
export GHOSTDESK_AUTH_TOKEN=$(openssl rand -hex 32)
export GHOSTDESK_VNC_PASSWORD=$(openssl rand -hex 16)
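On a machine without `openssl`, secrets of the same shape can be generated with Python's standard library. A small sketch; the helper name `generate_secrets` is ours, not part of GhostDesk.

```python
import secrets


def generate_secrets() -> dict[str, str]:
    """Return hex secrets matching `openssl rand -hex 32` and `-hex 16`."""
    return {
        # 32 random bytes, rendered as 64 hex characters
        "GHOSTDESK_AUTH_TOKEN": secrets.token_hex(32),
        # 16 random bytes, rendered as 32 hex characters
        "GHOSTDESK_VNC_PASSWORD": secrets.token_hex(16),
    }
```

Export the two values into your shell (or write them to a `.env` file) before running the container.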

Pick a container name that matches the agent's role — sales-agent, research-agent, accounting-agent… Below we use my-agent as a placeholder; replace it everywhere in the command.

# Run the container — cert mounted, TLS + auth enabled everywhere
docker run -d --name ghostdesk-my-agent \
  --restart unless-stopped \
  --cap-add SYS_ADMIN \
  --shm-size 2g \
  -p 3000:3000 \
  -p 6080:6080 \
  -v ghostdesk-my-agent-home:/home/agent \
  -v "$PWD/tls/server.crt:/etc/ghostdesk/tls/server.crt:ro" \
  -v "$PWD/tls/server.key:/etc/ghostdesk/tls/server.key:ro" \
  -e GHOSTDESK_AUTH_TOKEN \
  -e GHOSTDESK_VNC_PASSWORD \
  -e TZ=America/New_York \
  -e LANG=en_US.UTF-8 \
  ghcr.io/yv17labs/ghostdesk:latest

echo "MCP token:    $GHOSTDESK_AUTH_TOKEN"
echo "VNC password: $GHOSTDESK_VNC_PASSWORD"

Once the container is up, update your MCP client config — same shape as the demo, now over https:// with a bearer token:

Claude Desktop / Claude Code

{
  "mcpServers": {
    "ghostdesk": {
      "type": "http",
      "url": "https://localhost:3000/mcp",
      "headers": {
        "Authorization": "Bearer <paste $GHOSTDESK_AUTH_TOKEN here>"
      }
    }
  }
}

Any other MCP-compatible client — same URL, plus an Authorization: Bearer <token> header in whatever form your client accepts.
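To avoid pasting the token by hand, the secured client entry can be generated from the environment. This is a sketch that mirrors the JSON shape shown above; the function names are illustrative, and where your client stores its config file is up to that client.

```python
import json
import os


def secured_client_config(token: str, url: str = "https://localhost:3000/mcp") -> dict:
    """Return an `mcpServers` entry that carries the bearer token."""
    if not token:
        raise ValueError("GHOSTDESK_AUTH_TOKEN is empty")
    return {
        "mcpServers": {
            "ghostdesk": {
                "type": "http",
                "url": url,
                "headers": {"Authorization": f"Bearer {token}"},
            }
        }
    }


def config_json_from_env() -> str:
    """Render the config as JSON using the token exported in the shell."""
    token = os.environ.get("GHOSTDESK_AUTH_TOKEN", "")
    return json.dumps(secured_client_config(token), indent=2)
```

Printing `config_json_from_env()` in the shell where you exported the secrets gives a block you can merge into your client's configuration.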

Then open https://localhost:6080/ in your browser — the mkcert CA installed by mkcert -install is already in your trust store, so the browser accepts the cert with no warning. noVNC will prompt for $GHOSTDESK_VNC_PASSWORD.

Going to production? Swap the mkcert leaf for a real cert, source both secrets from your secret manager, and front port 6080 with an identity-aware proxy — SECURITY.md has the full contract.

--cap-add SYS_ADMIN — Required by Electron apps (VS Code, Slack, etc.) and other applications that need Linux user namespaces to run their sandbox. Safe to remove if you don't need them.

The named volume persists the agent's home directory across restarts — browser passwords, bookmarks, cookies, downloads, and desktop preferences are all preserved. On the first run, Docker automatically seeds the volume with the default configuration from the image.

Tools

13 tools at your agent's fingertips, grouped by concern (verb_noun naming):

Screen

Tool	Description
`screen_shot`	Capture the screen as a WebP image (pass `format="png"` for lossless). Pass `region=` to crop to a sub-rectangle at native resolution. Set `stabilize=False` to skip the page-stabilization check (default: `True`, which waits up to 5 seconds for the screen to settle)
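On the wire, a tool invocation is a standard MCP `tools/call` JSON-RPC message. The sketch below builds one for `screen_shot` using only the `format` and `stabilize` arguments named in the table; the helper name is ours, and the shape of `region=` is deliberately left out because this README does not specify it.

```python
def screen_shot_call(request_id: int, *, png: bool = False, stabilize: bool = True) -> dict:
    """Build an MCP `tools/call` message for the `screen_shot` tool."""
    arguments: dict = {}
    if png:
        arguments["format"] = "png"      # lossless instead of the WebP default
    if not stabilize:
        arguments["stabilize"] = False   # skip the stabilization wait
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "screen_shot", "arguments": arguments},
    }
```

An MCP client library normally builds this message for you; seeing it spelled out mostly helps when reading server logs or debugging a custom client.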

Mouse

Tool	Description
`mouse_move`	Move the cursor to coordinates without clicking — reveals hover-only menus, tooltips, and CSS `:hover` states (e.g. Gmail action bar)
`mouse_click`	Click at coordinates
`mouse_double_click`	Double-click at coordinates
`mouse_drag`	Drag from one position to another
`mouse_scroll`	Scroll in any direction (up/down/left/right)

Keyboard

Tool	Description
`key_type`	Type text with realistic per-character delays
`key_press`	Press keys or combos (`ctrl+c`, `alt+F4`, `Return`...)

Clipboard

Tool	Description
`clipboard_get`	Read clipboard contents
`clipboard_set`	Write to clipboard

Apps

Tool	Description
`app_list`	List the GUI applications installed on the desktop
`app_launch`	Start a GUI application by name
`app_status`	Check if an application is running and read its logs

Model requirements

Your inference stack must cover four capabilities — all four are mandatory:

Text + vision: the agent perceives the desktop through screenshots and needs a model that can interpret them.
Tool use: GhostDesk exposes 13 tools as function calls; the model must be able to invoke them.
MCP client: the host needs to speak Streamable HTTP MCP to reach the GhostDesk server.
WebP image support: GhostDesk returns screenshots as WebP by default to keep payloads small and inference fast. A stack that can only decode PNG or JPEG will not work out of the box.
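When debugging whether your stack is being handed WebP or PNG, the container format can be identified from the first bytes of the payload. A small sketch, assuming the screenshot arrives as the base64 `data` field of an MCP image content item; the helper names are ours.

```python
import base64


def sniff_image_format(data: bytes) -> str:
    """Identify an image container from its magic bytes."""
    # WebP is a RIFF file whose form type (bytes 8..11) is "WEBP".
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    return "unknown"


def sniff_base64_image(encoded: str) -> str:
    """Decode a base64 payload and identify the image format."""
    return sniff_image_format(base64.b64decode(encoded))
```

If this reports `webp` and your inference server rejects the image, the missing piece is WebP decoding on the inference side rather than anything in GhostDesk.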

Coordinate space — `GhostDesk-Model-Space` header

By default no header is needed: Claude and the other major frontier LLMs work out of the box. Qwen3.5 and Qwen3-VL need the client to send GhostDesk-Model-Space: 1000 on every MCP request.

Example MCP client config:

{
  "mcpServers": {
    "ghostdesk": {
      "url": "https://localhost:3000/mcp",
      "headers": {
        "GhostDesk-Model-Space": "1000"
      }
    }
  }
}
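To make the header concrete: we assume here that a model space of 1000 means the model emits coordinates on a normalized 0 to 1000 grid, which the server scales to the real screen size. The sketch below shows that arithmetic for the default 1280x1024 screen. It illustrates the idea only; the function name is ours, and the server's actual rounding and clamping may differ.

```python
def model_to_pixels(
    x: float,
    y: float,
    width: int = 1280,    # GHOSTDESK_SCREEN_WIDTH default
    height: int = 1024,   # GHOSTDESK_SCREEN_HEIGHT default
    space: int = 1000,    # value of the GhostDesk-Model-Space header
) -> tuple[int, int]:
    """Scale a point from model space to screen pixels (assumed mapping)."""
    px = round(x * width / space)
    py = round(y * height / space)
    # Clamp so the far edge of the grid still lands on a valid pixel.
    return min(max(px, 0), width - 1), min(max(py, 0), height - 1)
```

Under this assumption, a model that answers (500, 500) means the centre of the screen, pixel (640, 512). Without the header the same numbers would be read as raw pixels, which is the kind of large offset described in Troubleshooting.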

Running locally

For self-hosted inference we use and recommend our fork of llama.cpp, which adds WebP decoding and turbo quant on top of upstream: YV17labs/llama.cpp, branch integration/webp-turbo. The day WebP lands upstream we will archive the fork and point there directly.

macOS users: use llama.cpp, not mlx-vlm (as of 2026-04-01). The mlx-vlm stack currently produces inaccurate coordinate outputs for the same models that work correctly under llama.cpp. This is caused by an upstream bug in an Apple dependency, not the model itself. Until the fix lands, llama.cpp is the recommended backend on every platform — including Apple Silicon Macs.

Run whatever local model you like. Three from the Qwen vision family that we've used and that work well for desktop control:

Qwen3.6-35B-A3B — 35B parameters, only 3B active per token.
Qwen3.5-35B-A3B — 35B parameters, only 3B active per token.
Qwen3-VL — the Qwen3 vision-language branch, available in several sizes on the Qwen Hugging Face org.

From one agent to a workforce

Each GhostDesk instance is a container. Spin up one, ten, or a hundred — each agent gets its own isolated desktop, its own apps, its own role. Think of it as hiring a team of digital employees, each with their own workstation.

Scale horizontally

# docker-compose.yml — 3 specialized agents, one command
#
# Prerequisites: the TLS cert + key at ./tls and the two secrets
# (GHOSTDESK_AUTH_TOKEN, GHOSTDESK_VNC_PASSWORD) in your environment or a
# .env file. Generate both exactly as shown in the Secure local run
# section above. See SECURITY.md for the production secret-handling
# contract.

x-ghostdesk-defaults: &ghostdesk-defaults
  image: ghcr.io/yv17labs/ghostdesk:latest
  restart: unless-stopped
  cap_add: [SYS_ADMIN]
  shm_size: 2g
  environment:
    - GHOSTDESK_AUTH_TOKEN
    - GHOSTDESK_VNC_PASSWORD
    - TZ=America/New_York
    - LANG=en_US.UTF-8

services:
  sales-agent:
    <<: *ghostdesk-defaults
    container_name: ghostdesk-sales-agent
    ports: ["3001:3000", "6081:6080"]
    volumes:
      - ghostdesk-sales-agent-home:/home/agent
      - ./tls/server.crt:/etc/ghostdesk/tls/server.crt:ro
      - ./tls/server.key:/etc/ghostdesk/tls/server.key:ro

  research-agent:
    <<: *ghostdesk-defaults
    container_name: ghostdesk-research-agent
    ports: ["3002:3000", "6082:6080"]
    volumes:
      - ghostdesk-research-agent-home:/home/agent
      - ./tls/server.crt:/etc/ghostdesk/tls/server.crt:ro
      - ./tls/server.key:/etc/ghostdesk/tls/server.key:ro

  accounting-agent:
    <<: *ghostdesk-defaults
    container_name: ghostdesk-accounting-agent
    ports: ["3003:3000", "6083:6080"]
    volumes:
      - ghostdesk-accounting-agent-home:/home/agent
      - ./tls/server.crt:/etc/ghostdesk/tls/server.crt:ro
      - ./tls/server.key:/etc/ghostdesk/tls/server.key:ro

volumes:
  ghostdesk-sales-agent-home:
  ghostdesk-research-agent-home:
  ghostdesk-accounting-agent-home:

docker compose up -d   # Your workforce is ready

Each agent runs in parallel, independently, on its own desktop. Connect each to a different LLM, give each a different system prompt, install different apps — full specialization.
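With several agents on one host, it helps to derive each agent's endpoints from its position in the list rather than tracking ports by hand. A sketch that follows the port scheme of the compose file above (3001 and 6081 upward); the helper name is illustrative.

```python
def agent_endpoints(
    names: list[str],
    host: str = "localhost",
    mcp_base: int = 3001,
    vnc_base: int = 6081,
) -> dict[str, dict[str, str]]:
    """Map each agent name to its MCP and noVNC URLs."""
    return {
        name: {
            "mcp": f"https://{host}:{mcp_base + i}/mcp",
            "novnc": f"https://{host}:{vnc_base + i}/",
        }
        for i, name in enumerate(names)
    }
```

Calling it with `["sales-agent", "research-agent", "accounting-agent"]` yields the three URL pairs that match the `ports:` entries in the compose file.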

Secure by design

Every agent is sandboxed in its own container. No access to the host machine. No access to other agents. Network, filesystem, and process isolation come free from Docker.

This makes GhostDesk a natural fit for enterprises:

Concern	How GhostDesk handles it
Data isolation	Each agent lives in its own container — no shared filesystem, no shared memory
Access control	Restrict network access per agent with Docker networking. An agent with CRM access doesn't see finance tools
Auditability	Watch any agent live via VNC, record sessions, review screenshots
Blast radius	If an agent goes wrong, kill the container. Nothing else is affected
Compliance	No data touches your host. Containers can run in air-gapped environments

Specialize each agent

Give each agent a role, like you would a new hire:

Sales agent — monitors the CRM, enriches leads, updates the pipeline
Research agent — browses the web, compiles competitive intelligence, writes reports
Accounting agent — processes invoices in legacy ERP software, reconciles spreadsheets
QA agent — clicks through your app, files bug reports with screenshots
Support agent — handles tickets, looks up customer info across multiple internal tools

Each agent gets its own system prompt defining its mission, its own installed applications, and its own network permissions. Manage AI agents like employees — each with their own desktop, their own tools, and their own clearance level.

Supervise in real time

Every agent exposes a VNC/noVNC endpoint. Open a browser tab and watch your agent work — or open ten tabs and monitor your entire workforce. Intervene at any time: take over the mouse, correct course, or chat with the orchestrating LLM.

Configuration

Every variable GhostDesk reads is namespaced under GHOSTDESK_*. Standard POSIX variables (TZ, LANG) are kept as-is so the existing Unix ecosystem keeps working.

Secrets (required in the secured posture; with a cert mounted, the container refuses to boot without them)

Variable	Description
`GHOSTDESK_AUTH_TOKEN`	Bearer token required on every MCP request. Generate with `openssl rand -hex 32`.
`GHOSTDESK_VNC_PASSWORD`	Password for wayvnc (username is `agent` in the prod image). Generate with `openssl rand -hex 16`.

Both are plain environment variables. Wire them from your secret store (secretKeyRef on Kubernetes; Docker secrets, Vault, or AWS Secrets Manager on Compose). See SECURITY.md for the full contract.

Runtime knobs

Variable	Default	Description
`GHOSTDESK_PORT`	`3000`	MCP server listening port
`GHOSTDESK_HOST`	`127.0.0.1` (standalone) / `0.0.0.0` (container)	Bind address for the MCP endpoint. Defaults to loopback per MCP transports spec; the container's entrypoint exports `0.0.0.0` so Docker's port-publishing layer can reach it.
`GHOSTDESK_ALLOWED_ORIGINS`	(empty)	Comma-separated list of `Origin` headers accepted from browser clients (e.g. `https://app.example.com,https://localhost:8080`). Non-browser clients (Claude Desktop, SDKs, `curl`) send no `Origin` and are always allowed. Required for any browser-based MCP UI; without it, browser requests are rejected with HTTP 403 to mitigate DNS rebinding (per MCP transports spec).
`GHOSTDESK_TLS_CERT`	`/etc/ghostdesk/tls/server.crt`	Path to the TLS certificate. When the file exists, `websockify` and the MCP server auto-switch to `wss://` / `https://`. See Security.
`GHOSTDESK_TLS_KEY`	`/etc/ghostdesk/tls/server.key`	Path to the TLS private key (matching `GHOSTDESK_TLS_CERT`).
`GHOSTDESK_SCREEN_WIDTH`	`1280`	Virtual screen width in pixels
`GHOSTDESK_SCREEN_HEIGHT`	`1024`	Virtual screen height in pixels
`TZ`	`America/New_York`	IANA timezone (POSIX standard, e.g. `Europe/Paris`)
`LANG`	`en_US.UTF-8`	POSIX locale (e.g. `fr_FR.UTF-8`)

Pinned values (not configurable)

Variable	Value	Rationale
`GHOSTDESK_VNC_ADDRESS`	`127.0.0.1`	wayvnc is locked to loopback inside the container's netns; the VNC port is only reachable via the noVNC bridge on 6080. Override attempts are logged and ignored — see SECURITY.md.

Security

GhostDesk owns two things: transport encryption and authentication. Everything else (rate limiting, SSO, WAF, session recording, brute-force protection, per-user identity on noVNC) is a reverse-proxy concern — the container is designed to run behind one, not directly on the internet.

The full threat model, the Auth ≡ TLS posture switch, the wayvnc RFB-type-2-inside-wss:// rationale, the secrets handling contract, and the exhaustive in-scope / out-of-scope table all live in SECURITY.md — single source of truth. Start there before deploying to anything you don't fully trust.

Reporting a vulnerability? Use GitHub's private security advisory — see SECURITY.md § Reporting.

Troubleshooting

My agent's clicks land off-target by a huge margin

Almost always a coordinate-space mismatch. Frontier models (Claude, GPT-4o, Gemini) need no header (default pass-through); the Qwen vision family needs the client to send GhostDesk-Model-Space: 1000 on every MCP request. Full rationale in Model requirements → Coordinate space.

The container refuses to start with a secrets error

The prod posture (cert mounted) requires both GHOSTDESK_AUTH_TOKEN and GHOSTDESK_VNC_PASSWORD to be set — GhostDesk refuses to boot without them on purpose, to prevent an unauthenticated prod container. Generate them as shown in Secure local run and pass them with -e. The demo posture (no cert) has no such requirement.
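The rule can be checked before you start the container. The sketch below restates it as a preflight function you could run against your own environment; it is not the entrypoint's actual logic, and the names are illustrative.

```python
REQUIRED_SECRETS = ("GHOSTDESK_AUTH_TOKEN", "GHOSTDESK_VNC_PASSWORD")


def resolve_posture(cert_present: bool, env: dict[str, str]) -> str:
    """Return "demo" or "secured", or raise if secured secrets are missing."""
    if not cert_present:
        # No cert mounted: demo posture, no secrets required.
        return "demo"
    missing = [name for name in REQUIRED_SECRETS if not env.get(name)]
    if missing:
        raise RuntimeError("missing secrets: " + ", ".join(missing))
    return "secured"
```

Running it with `cert_present=True` and `dict(os.environ)` in the shell you launch Docker from shows which variable was not exported.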

noVNC shows a black screen or the desktop renders with graphical glitches

You're probably short on shared memory. Browsers and other GPU-accelerated apps inside the container need a reasonable /dev/shm — --shm-size 2g is the baseline in every example and should not be trimmed. If you already have --shm-size 2g, check the container logs for wayvnc or compositor errors.

Firefox / Electron apps fail to launch or crash immediately

Electron-based apps (VS Code, Slack, Discord…) need Linux user namespaces for their sandbox. Add --cap-add SYS_ADMIN to your docker run (already present in the Secure local run example). Firefox itself works without it.

Custom image

The base tag provides GhostDesk without any pre-installed GUI application — just the virtual desktop, VNC, and the MCP server. Use it to build your own image with only the tools you need:

FROM ghcr.io/yv17labs/ghostdesk:base

RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        chromium-browser \
        libreoffice-calc \
    && rm -rf /var/lib/apt/lists/*

docker build -t my-agent .

See the project's Dockerfile for a complete example.

Tag	Description
`latest`, `X.Y.Z`, `X.Y`	Full image — Firefox, foot terminal, mousepad, galculator, passwordless sudo
`base`, `base-X.Y.Z`, `base-X.Y`	Minimal image — no GUI app, meant to be extended

License

Functional Source License, Version 1.1, ALv2 Future License (FSL-1.1-ALv2) — see LICENSE for the authoritative terms.

What this means in practice (informal summary — the LICENSE file governs; this is not legal advice):

Permitted purposes cover the use cases that matter for the vast majority of users: internal use and access inside your company, non-commercial education and research, and professional services you provide to a licensee who is using GhostDesk in accordance with the license. Self-hosting GhostDesk to run your own agents — even commercial, revenue-generating workflows that power your product — is a permitted internal use.
Competing Use is prohibited. You may not make GhostDesk available to others in a commercial product or service that substitutes for GhostDesk, substitutes for any product or service the project offers using GhostDesk, or provides the same or substantially similar functionality. In short: you cannot take GhostDesk and rebrand it, host it as a paid service, or build a competing desktop-automation-for-agents product from it.
Apache 2.0 in two years. Each released version of GhostDesk becomes available under the Apache License 2.0 on the second anniversary of its release, automatically and irrevocably. The Competing Use restriction only applies for those first two years.

Commercial licensing. If your intended use falls under Competing Use — you want to resell GhostDesk, offer it as a managed service, or build a competing product — contact the maintainers to discuss a commercial license before deploying. Open a GitHub issue or reach out directly; we are happy to talk.

Version	Changes	Urgency	Date
v7.4.2	## What's Changed Server-only secrets stop leaking into the GUI apps the agent launches, the dependency lockfile is refreshed, and the package finally advertises itself as Production/Stable. ### Security - Server auth/VNC secrets scrubbed from launched app environments. `app_launch` handed every GUI child (`firefox`, `mousepad`, …) the full `{**os.environ, ...}`, so `GHOSTDESK_AUTH_TOKEN` (MCP bearer auth) and `GHOSTDESK_VNC_PASSWORD` were inherited by processes that have no business	High	6/10/2026
v7.4.1	Operator-supplied `LANG` (e.g. `fr_CA.UTF-8`, `de_DE.UTF-8`) is honored again at boot. The Ubuntu 26.04 base regression that crashed any non-default locale on `docker run` is neutralized inside the entrypoint. ### Fixed - Container restart loop when `-e LANG=` is set to anything other than `en_US.UTF-8` or `C.UTF-8`. Ubuntu 26.04 (introduced in v7.3.0) ships `rust-coreutils 0.8.0`, whose `icu_collator` panics with `index out of bounds` when `locale-gen` runs while `LC_COLLATE` points at	High	5/19/2026
v7.3.1	## Highlights - Wallpaper renders on Ubuntu 26.04 prod images. `swaybg` 26.04 routes loaders through `libglycin` + `bubblewrap`; Docker's default AppArmor profile blocks `pivot_root` with a message glycin doesn't recognize, so it kept retrying the sandbox and only the fallback colour painted. Shadowed `bwrap` with a stub that emits the message glycin DOES recognize, forcing the no-sandbox loader path. - Wallpaper migrated PNG → SVG and redesigned (Aurora). `swaybg` rasterizes the SVG	High	5/3/2026
v7.3.0	# Highlights - `app_running` MCP tool. Agents now check for already-open windows before `app_launch` — no more duplicate Firefox / foot / mousepad instances stacked across a session. - Idle session watchdog. Server-side cleanup walks the Sway tree and gracefully closes client windows after `GHOSTDESK_IDLE_TIMEOUT` seconds (default 30 min) of inactivity, sparing Sway / mako / wayvnc / the MCP server itself. - Ubuntu 26.04 LTS + noVNC 1.7.0 in both runtime and devcontainer images,	High	5/2/2026
v7.2.0	## Highlights - Reliable `screen_changed` feedback. Input tools no longer return false negatives. Polling now compares the full screen at quarter resolution via a bounding-box ratio, so any real UI change is caught regardless of where it lands — particularly for keyboard actions, where focus is unrelated to the mouse cursor and the previous zone-based check was systematically wrong. - New `mouse_move` tool. Lets agents trigger hover-only UI reactions (CSS `:hover` states, dropdowns t	High	4/22/2026
v7.1.0	Native MCP surfaces the server wasn't exposing yet (resources, lifespan warm-up, icons, tool annotations), stricter HTTP-transport security, finer-grained tool feedback through MCP `notifications/message`, and a consolidated system-level brief delivered through the spec-canonical `instructions` field. ### Added - MCP resources. `ghostdesk://apps` (JSON catalogue of installed GUI apps) and `ghostdesk://clipboard` (current clipboard text) mirror the `app_list` / `clipboard_get` tools so	High	4/19/2026
v7.0.1	### Fixed - Missing `envsubst` in runtime images. `entrypoint.sh` uses `envsubst` to inject `GHOSTDESK_SCREEN_WIDTH` / `GHOSTDESK_SCREEN_HEIGHT` into the Sway config, but the binary was not part of the runtime stack — containers booted into a crash loop (`envsubst: command not found`). Added `gettext-base` to both `docker/base/Dockerfile` and `.devcontainer/Dockerfile`.	High	4/15/2026
v7.0.0	Major platform overhaul: migration from X11 / Openbox to a native Wayland / Sway stack, end-to-end TLS, per-request coordinate model space for mixed frontier + local model fleets, and a simplified agent-first documentation story. ## Highlights - Native Wayland / Sway stack. The devcontainer and runtime images now boot a Wayland session managed by supervisord. `wl-copy` / `wl-paste` replace the X11 clipboard path and `grim` replaces the X11 capture tool. The input stack drops `dotoo	High	4/15/2026
v6.0.0	## New Features - Grid ruler overlay — `screenshot()` now accepts `grid=True` to draw a coordinate ruler in the margins of a region crop (major ticks every 50px on X / 20px on Y, alternating magenta/cyan minor gridlines), letting smaller vision models read click coordinates straight off the labels instead of estimating pixel offsets - Small-model prompt — New dedicated prompt with an explicit click-coordinate recipe and workflow built around the grid ruler, targeted at compact vision	High	4/10/2026
v5.0.0	## New Features - Visual feedback system — Mouse and keyboard actions now return `screen_changed` and `reaction_time_ms`, giving agents immediate confirmation of their interactions - Ruler-based coordinate system — New `screen/rulers.py` produces zoomed screenshots with coordinate rulers (major ticks every 50px, minor ticks every 25px) for precise, reliable targeting - `process_status` tool — New shell tool to inspect the state and logs of processes launched via `launch()` - **	High	4/8/2026
v4.1.0	## New Features - Base Docker image — Introduced a dedicated base Docker image to separate foundational layers from the application image, improving build times and layer caching - Split CI workflow — CI pipeline now builds base and latest images independently, enabling more granular and efficient deployments - Gnome Keyring support — Added `gnome-keyring-daemon` to supervisor for secure credential storage within the container ## Refactoring - Shared Docker scripts — M	Medium	4/7/2026
v4.0.1	## Bug Fixes - Healthcheck reliability — Replaced `curl`-based healthcheck with `supervisorctl status` to verify the MCP server process is running. This eliminates false-negative healthchecks caused by HTTP endpoint timing issues during container startup ## Documentation - Docker examples improved — Added required environment variables (`DISPLAY`, `RESOLUTION`, etc.) to all Docker run/compose examples for easier onboarding - Restart policy — Added `restart: unless-stopped`	Medium	4/6/2026
v4.0.0	## Major Changes - SOM Grounding (Intelligent UI Detection) — Every call to `screenshot()` now returns structured JSON with every detected UI element (buttons, labels, text fields, links) and their exact `(x, y)` click coordinates via OCR (RapidOCR + ONNX Runtime). Result: ~90% click accuracy on large LLMs and medium-sized models (~30B parameters) - `inspect()` tool — Text-only vision — New tool that returns a complete structured view of the screen (elements, windows, cursor, s	Medium	4/6/2026
v3.0.0	## Major Changes - Removed AT-SPI accessibility layer — Models are capable enough to interact with the desktop using screenshots alone. Removed _atspi.py, clickables.py, and system dependencies (python3-gi, gir1.2-atspi-2.0, at-spi2-core, dconf-cli) - New window listing via xdotool — Screenshot now includes open windows with app name, title, and geometry (x, y, width, height) - Standardized API responses — All tools return consistent `{"result": ...}` format. Screenshot metadata	Medium	4/1/2026
v2.1.0	## Improvements ### Screenshot Metadata Cleanup - Removed redundant `active_window` field from screenshot metadata — the active window is already available in the `windows` list - Filtered phantom windows (e.g. Openbox WM with 1x1 pixel geometry) from the `windows` list, reducing metadata noise - Removed the now-unused `get_active_window()` internal function ### Code Quality - Refactored logging/middleware — split into dedicated modules for clarity - **Coerced malformed xy	Medium	3/31/2026
v2.0.0	### 🚀 Breaking Changes #### API Simplification (22 → 13 Tools) This is a MAJOR refactor of the tool surface. The API has been significantly simplified to focus on core functionality: Removed Tools: - Accessibility tools (screen reader, magnifier, etc.) - - Settings configuration tools - - Module-specific tools that duplicated functionality Result: Cleaner, more maintainable API with better focus on screenshot and desktop control #### Tool Updates - **read_screen refac	Medium	3/28/2026
v1.1.0	## What's Changed * Add full AT-SPI role coverage (130/130 roles) by @maltyxx in https://github.com/YV17labs/GhostDesk/pull/2 Full Changelog: https://github.com/YV17labs/GhostDesk/compare/v1.0.1...v1.1.0	Medium	3/25/2026
v1.0.1	## What's Changed * Fix set_clipboard timeout caused by xclip background process by @maltyxx in https://github.com/YV17labs/GhostDesk/pull/1 ## New Contributors * @maltyxx made their first contribution in https://github.com/YV17labs/GhostDesk/pull/1 Full Changelog: https://github.com/YV17labs/GhostDesk/commits/v1.0.1	Medium	3/25/2026
