Windows 365 for Agents MCP server reference (preview)

Important

This is a preview feature.
Preview features aren't meant for production use and might have restricted functionality. These features are subject to supplemental terms of use, and are available before an official release so that customers can get early access and provide feedback.

Windows 365 for Agents is an MCP server for full operational control of a Windows 365 cloud PC. Use this MCP server to drive a real Windows environment through desktop interaction (mouse, keyboard, screen capture, command execution), browser automation via Microsoft Edge, and semantic UI inspection via Windows UI Automation.

Note

Existing connections that use previous versions of Microsoft MCP servers remain supported.
For all new connections, use the latest Windows 365 for Agents MCP server, which exposes tools across desktop, browser, and accessibility capabilities.
Browser automation operates on Microsoft Edge. Edge launches automatically on the first browser tool call. mcp_desktop_focus_browser can also focus Chrome or Firefox windows, but DOM-level browser tools operate only on the Edge instance.

To learn more about Windows 365 for Agents, see Windows 365 for Agents documentation.

Overview

Server ID	Display name	Description
`mcp_W365AServer`	Windows 365 for Agents MCP server	Full operational control of a Windows 365 cloud PC, including desktop interaction, browser automation, and UI inspection.

Available tools

mcp_desktop_move_mouse

Move the cursor to a screen position. Use mcp_desktop_click instead if you intend to click at the destination.

Required parameters:

x: X coordinate in screen pixels
y: Y coordinate in screen pixels

mcp_desktop_click

Click at a position, or at the current cursor location if coordinates are omitted. Supports single-click, double-click, and all five mouse buttons.

Optional parameters:

x: X coordinate in screen pixels (omit for current position)
y: Y coordinate in screen pixels (omit for current position)
button: Left, Right, Middle, Forward, or Backward (default Left)
clickCount: 1 = single click, 2 = double click (default 1)
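
As a rough illustration, the parameters above could be packaged into an MCP tools/call request as follows. The tool_call helper is hypothetical, the coordinates are made up, and how the request reaches the server depends on your MCP client. Passing button as a string is an assumption based on the values listed above.

```python
import json


def tool_call(request_id, name, arguments=None):
    """Build a JSON-RPC 2.0 'tools/call' request in the shape MCP defines."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }


# Double-click with the left button at (640, 360); the coordinates are made up.
double_click = tool_call(
    1,
    "mcp_desktop_click",
    {"x": 640, "y": 360, "button": "Left", "clickCount": 2},
)

# Omitting x and y clicks at the current cursor position.
right_click_here = tool_call(2, "mcp_desktop_click", {"button": "Right"})

print(json.dumps(double_click, indent=2))
```

Most MCP client libraries build this envelope for you, in which case only the tool name and the arguments dictionary matter.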

mcp_desktop_get_cursor_position

Return the current cursor coordinates. No parameters. Returns {cursorX, cursorY}.

mcp_desktop_drag_mouse

Drag from one position to another. Useful for moving objects, resizing windows, or pixel-precise scrolling.

Required parameters:

startX: Start X coordinate.
startY: Start Y coordinate.
endX: End X coordinate.
endY: End Y coordinate.

Optional parameters:

button: Left, Right, or Middle (default Left)

mcp_desktop_scroll

Scroll at a position using notch units (not pixels). Three notches is approximately one page.

Required parameters:

x: Scroll position X
y: Scroll position Y

Optional parameters:

deltaX: Horizontal notches, positive = right (default 0)
deltaY: Vertical notches, positive = down (default 0)

Note

Values are clamped to the range [-20, 20].
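
A caller might convert a page count into notches and apply the same clamp locally, so that an oversized request is not silently shortened by the server. This is a minimal sketch: the function names are hypothetical, and sending whole-number notches is an assumption.

```python
def clamp_notches(value, low=-20, high=20):
    """Mirror the documented clamp range for deltaX and deltaY."""
    return max(low, min(high, value))


def scroll_args(x, y, pages_down=0.0, notches_per_page=3):
    """Build mcp_desktop_scroll arguments from an approximate page count.

    Three notches per page is the approximation given in this reference.
    """
    delta_y = clamp_notches(round(pages_down * notches_per_page))
    return {"x": x, "y": y, "deltaX": 0, "deltaY": delta_y}


print(scroll_args(500, 400, pages_down=2))   # deltaY is 6
print(scroll_args(500, 400, pages_down=10))  # 30 notches requested, clamped to 20
```

To scroll further than 20 notches, issue several calls rather than one large value.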

mcp_desktop_type_text

Type text via keyboard simulation. For keyboard shortcuts, use mcp_desktop_press_keys. For web form fields, use mcp_browser_type.

Required parameters:

text: Text to type

mcp_desktop_press_keys

Press a key combination simultaneously. Supports modifier keys, function keys, and standard keys.

Required parameters:

keys: Array of key names to press together (for example, ["ctrl","c"], ["alt","tab"], ["ctrl","shift","s"])

mcp_desktop_take_screenshot

Capture the full screen or a cropped region as a PNG image (base64-encoded).

Optional parameters:

x: Crop region left edge
y: Crop region top edge
width: Crop region width
height: Crop region height

Note

Provide all four crop parameters together, or omit all four for a full-screen capture.
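
The all-or-nothing rule for crop parameters is easy to check before sending a request. The sketch below uses hypothetical helper names and only builds the arguments dictionary.

```python
def crop_is_valid(x=None, y=None, width=None, height=None):
    """True when all four crop parameters are given, or none of them."""
    given = [value is not None for value in (x, y, width, height)]
    return all(given) or not any(given)


def screenshot_args(x=None, y=None, width=None, height=None):
    """Build mcp_desktop_take_screenshot arguments, rejecting partial crops."""
    if not crop_is_valid(x, y, width, height):
        raise ValueError("provide x, y, width, and height together, or none of them")
    if x is None:
        return {}  # full-screen capture
    return {"x": x, "y": y, "width": width, "height": height}


print(screenshot_args())                   # {}
print(screenshot_args(100, 50, 800, 600))  # cropped region
```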

mcp_desktop_analyze_screen

Perform OCR on the entire screen. No parameters. Returns {fullText, averageConfidence, boxes[{text, confidence, x, y, width, height}], width, height}.
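
One way to use the OCR result is to look up a piece of text and derive a point to click. The sketch below makes several assumptions that this reference does not state: that x and y are the top-left corner of each box, that confidence is on a 0 to 1 scale, and that the sample result (which is made up) resembles real output.

```python
def find_text_center(ocr_result, needle, min_confidence=0.8):
    """Return the center (x, y) of the first OCR box containing needle, or None."""
    needle = needle.lower()
    for box in ocr_result["boxes"]:
        if needle in box["text"].lower() and box["confidence"] >= min_confidence:
            return (box["x"] + box["width"] // 2, box["y"] + box["height"] // 2)
    return None


# Made-up result in the documented shape.
sample = {
    "fullText": "File Save",
    "averageConfidence": 0.94,
    "boxes": [
        {"text": "File", "confidence": 0.97, "x": 10, "y": 5, "width": 30, "height": 14},
        {"text": "Save", "confidence": 0.91, "x": 100, "y": 5, "width": 40, "height": 14},
    ],
    "width": 1920,
    "height": 1080,
}

print(find_text_center(sample, "save"))  # (120, 12)
```

The returned point could then be passed as x and y to mcp_desktop_click.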

mcp_desktop_get_screen_size

Return the screen resolution. No parameters. Returns {width, height}.

mcp_desktop_list_windows

List all visible windows with their titles, positions, and dimensions. No parameters. Returns an array of {title, processName, handle, x, y, width, height}.

mcp_desktop_activate_window

Bring a window to the foreground using a fuzzy title match.

Required parameters:

titlePattern: Partial window title (case-insensitive substring)

mcp_desktop_focus_browser

Focus a browser window (Edge, Chrome, or Firefox), optionally filtered by URL or title.

Optional parameters:

pattern: URL or title substring to match (omit for any browser window)

mcp_desktop_close_window

Gracefully close a window by fuzzy title match. System-critical processes are protected and cannot be closed.

Required parameters:

titlePattern: Partial window title (80% match threshold)

Returns {matchedTitle, processName, closed}.

mcp_desktop_execute_shell_command

Run a shell command in a sandboxed environment. Commands are validated against an allow list and dangerous patterns are blocked.

Required parameters:

command: Command to execute

Optional parameters:

cwd: Working directory
timeoutMs: Timeout in milliseconds (default 30000, max 30000)

Note

Allowed commands: git, npm, dotnet, python, cargo, node, pip, dir, mkdir, del, copy, move, robocopy, findstr, where, and type.
Blocked patterns include shell metacharacters (|, ;, &, <, >), environment variable expansion (%VAR%), interpreter eval flags (python -c or node -e), git config --global, npm -g, path-prefixed executables, rm -rf, sudo, and disk/system commands.
stdout and stderr are each truncated at 32 KB. Use mcp_desktop_execute_python_code for arbitrary computation. Returns {stdout, stderr, exitCode, success, timedOut, resourceLimitsApplied}.
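
A client can catch many rejected commands before sending them. The following pre-flight check is only a partial, client-side approximation of the rules listed above; the server's validation is authoritative and may differ. The regular expressions and the preflight function are illustrative assumptions.

```python
import re

ALLOWED = {
    "git", "npm", "dotnet", "python", "cargo", "node", "pip", "dir",
    "mkdir", "del", "copy", "move", "robocopy", "findstr", "where", "type",
}

# Partial approximation of the documented block list.
BLOCKED_PATTERNS = [
    (re.compile(r"[|;&<>]"), "shell metacharacter"),
    (re.compile(r"%[^%\s]+%"), "environment variable expansion"),
    (re.compile(r"^(python\s+-c|node\s+-e)\b"), "interpreter eval flag"),
]


def preflight(command):
    """Return None if the command looks acceptable, otherwise a reason string."""
    stripped = command.strip()
    if not stripped:
        return "empty command"
    executable = stripped.split()[0]
    if "\\" in executable or "/" in executable:
        return "path-prefixed executable"
    if executable.lower() not in ALLOWED:
        return f"'{executable}' is not on the allow list"
    for pattern, reason in BLOCKED_PATTERNS:
        if pattern.search(stripped):
            return reason
    return None


for candidate in ["git log --oneline -20", "dir | findstr foo", 'python -c "print(1)"']:
    print(candidate, "->", preflight(candidate))
```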

mcp_desktop_execute_python_code

Execute Python code in a sandboxed environment with resource limits. Ideal for data processing, calculations, file I/O, and any computation that goes beyond simple shell commands.

Required parameters:

code: Python code (max 262,144 characters).

Optional parameters:

cwd: Working directory
timeoutMs: Timeout in milliseconds (default 30000, max 30000).

Returns the same schema as mcp_desktop_execute_shell_command.
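
The documented limits can be applied when building the request, and the shared result schema can be unpacked in one place. This is a sketch under assumptions: the helper names are hypothetical, success and timedOut are assumed to be booleans, and the sample result is made up.

```python
MAX_CODE_CHARS = 262_144
MAX_TIMEOUT_MS = 30_000


def python_code_args(code, cwd=None, timeout_ms=MAX_TIMEOUT_MS):
    """Build mcp_desktop_execute_python_code arguments within documented limits."""
    if len(code) > MAX_CODE_CHARS:
        raise ValueError(f"code is {len(code)} characters; the limit is {MAX_CODE_CHARS}")
    args = {"code": code, "timeoutMs": min(timeout_ms, MAX_TIMEOUT_MS)}
    if cwd is not None:
        args["cwd"] = cwd
    return args


def stdout_or_raise(result):
    """Return stdout from a result in the documented schema, raising on failure."""
    if result["timedOut"]:
        raise TimeoutError("execution exceeded the timeout")
    if not result["success"]:
        raise RuntimeError(f"exit code {result['exitCode']}: {result['stderr']}")
    return result["stdout"]


args = python_code_args("print('ok')\n", timeout_ms=60_000)
print(args["timeoutMs"])  # 30000

# Made-up result in the documented shape.
sample_result = {
    "stdout": "ok\n", "stderr": "", "exitCode": 0,
    "success": True, "timedOut": False, "resourceLimitsApplied": True,
}
print(stdout_or_raise(sample_result))
```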

mcp_desktop_wait_milliseconds

Pause execution to allow animations or transitions to complete. Do not use in polling loops—use mcp_browser_wait_for for DOM polling.

Required parameters:

ms: Wait duration in milliseconds (clamped to [0, 5000])

mcp_browser_navigate

Navigate to a URL and wait for the page to load.

Required parameters:

url: Full URL including protocol (for example, https://example.com)

mcp_browser_back

Navigate back in browser history. No parameters.

mcp_browser_forward

Navigate forward in browser history. No parameters.

mcp_browser_reload

Reload the current page. No parameters.

mcp_browser_get_url

Return the current page URL as a plain string. No parameters.

mcp_browser_get_title

Return the current page title as a plain string. No parameters.

mcp_browser_get_text

Return the visible page text content as a plain string. No parameters. Truncated at 512 KB.

mcp_browser_get_html

Return the full page HTML source as a plain string. No parameters. Truncated at 512 KB.

mcp_browser_click

Click a DOM element by CSS selector. More reliable than coordinate-based clicking for web content.

Required parameters:

selector: CSS selector (for example, #submit-btn or a.nav-link)

mcp_browser_type

Type text into a form element by CSS selector.

Required parameters:

selector: CSS selector of the input element.
text: Text to type

mcp_browser_query_text

Get the text content of the first element matching a CSS selector.

Required parameters:

selector: CSS selector

mcp_browser_wait_for

Wait for a DOM element to appear. Useful for dynamic content that loads asynchronously.

Required parameters:

selector: CSS selector to wait for

Optional parameters:

timeoutMs: Timeout in milliseconds (default 5000, max 30000)

mcp_browser_eval_js

Evaluate a JavaScript expression in the page context and return the result as a string.

Required parameters:

expression: JavaScript expression that returns a string

Note

If your expression returns an object or number, convert it to a string explicitly (for example, JSON.stringify(obj) or .toString()).
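
To illustrate the string-only return value, a caller can wrap the expression in JSON.stringify and parse the returned text. The helper name is hypothetical, and the returned string below is a made-up stand-in for a tool result.

```python
import json


def eval_js_args(js_value_expression):
    """Wrap an expression so the page returns JSON text instead of a raw value."""
    return {"expression": f"JSON.stringify({js_value_expression})"}


args = eval_js_args("Array.from(document.querySelectorAll('a'), a => a.href)")
print(args["expression"])

# Made-up string standing in for what the tool might return.
returned = '["https://example.com/a", "https://example.com/b"]'
links = json.loads(returned)
print(len(links))  # 2
```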

mcp_browser_list_tabs

List all open tabs with their index, title, and URL. No parameters. Returns an array of {index, title, url}.

mcp_browser_switch_tab

Switch to a tab by index.

Required parameters:

tabIndex: 0-based tab index

mcp_browser_new_tab

Open a new tab, optionally navigating to a URL.

Optional parameters:

url: URL to open (blank tab if omitted)

Returns {index, title, url}.

mcp_browser_close_tab

Close a tab by index.

Required parameters:

tabIndex: 0-based tab index

mcp_browser_screenshot

Capture a PNG screenshot of the browser viewport only (not the full screen). No parameters. Returns a base64-encoded PNG.
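
Once the base64 string is extracted from the tool result, it can be decoded and sanity-checked. Where the string sits in the result is not specified here, so the sketch starts from the string itself. The sample bytes are a hand-built PNG header, not a complete image.

```python
import base64
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_png(b64_data):
    """Decode a base64 payload and confirm that it starts with the PNG signature."""
    raw = base64.b64decode(b64_data, validate=True)
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("payload is not a PNG")
    return raw


def png_dimensions(raw):
    """Read width and height from the IHDR chunk (big-endian, byte offsets 16-23)."""
    return struct.unpack(">II", raw[16:24])


# Stand-in header: signature, IHDR length (13), chunk type, then width and height.
sample_header = (
    PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", 1280, 720)
)
sample_b64 = base64.b64encode(sample_header).decode("ascii")

raw = decode_png(sample_b64)
print(png_dimensions(raw))  # (1280, 720)
```

The decoded bytes can be written to a .png file in binary mode.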

mcp_accessibility_get_accessibility_tree

Retrieve the UI element tree for the foreground window. Each element includes its role, name, value, and screen coordinates.

Optional parameters:

maxDepth: Maximum tree traversal depth, 1-10 (default 3)
maxElements: Maximum elements to return, 1-2000 (default 500)

Returns a hierarchical tree of {role, name, value, x, y, width, height, children[...]}.

mcp_accessibility_find_ui_element

Search for UI elements by text content, accessibility role, or name (case-insensitive substring). Returns matching elements with their clickable screen coordinates.

Optional parameters:

text: Text to search for (used as name if name omitted)
role: UI role filter — Button, TextBox, CheckBox, MenuItem, ComboBox, and more
name: Accessible name (takes precedence over text if both provided)
windowHandle: Target window handle (null = foreground window)

Note

At least one of text, role, or name must be provided. Returns an array of {role, name, value, x, y, width, height}.
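
The sketch below builds the search arguments with the at-least-one rule enforced, then forwards a match to mcp_desktop_click. It passes the reported x and y through unchanged, because this reference describes them as clickable coordinates. The helper names and the sample element are made up.

```python
def find_element_args(text=None, role=None, name=None, window_handle=None):
    """Build mcp_accessibility_find_ui_element arguments."""
    if text is None and role is None and name is None:
        raise ValueError("provide at least one of text, role, or name")
    candidates = {"text": text, "role": role, "name": name}
    args = {key: value for key, value in candidates.items() if value is not None}
    if window_handle is not None:
        args["windowHandle"] = window_handle  # omit to target the foreground window
    return args


def click_args_for(element):
    """Forward an element's reported coordinates to mcp_desktop_click."""
    return {"x": element["x"], "y": element["y"]}


print(find_element_args(role="Button", name="Save"))

# Made-up match in the documented shape.
element = {"role": "Button", "name": "Save", "value": "",
           "x": 412, "y": 300, "width": 80, "height": 24}
print(click_args_for(element))
```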

Key features

Desktop interaction

Click, double-click, right-click, and five-button mouse control.
Pixel-precise drag and drop.
Notch-based scrolling (three notches ≈ one page).
Keyboard typing and multi-key shortcut combos.
Cursor position tracking.
Screen resolution detection.

Screen capture and analysis

Full-screen or cropped PNG screenshots.
OCR of the full screen with per-region confidence scores and bounding boxes.
Browser-viewport-only screenshots for web content.

Window management

Enumerate all visible windows with positions and dimensions.
Activate windows by fuzzy title match.
Focus browser windows (Edge, Chrome, Firefox) optionally filtered by URL or title.
Graceful window close with protection for system-critical processes.

Command execution

Sandboxed shell commands with an allow list (git, npm, dotnet, python, cargo, node, pip, dir, mkdir, del, copy, move, robocopy, findstr, where, type).
Sandboxed Python execution up to 262,144 characters of code.
Working-directory and per-call timeout control (max 30 seconds).
Resource limits and hardened block list against shell metacharacters, eval flags, privilege escalation, and destructive operations.

Browser automation

Navigate, back, forward, reload.
Read page URL, title, visible text (512 KB cap), and full HTML (512 KB cap).
DOM-level click, type, and text query by CSS selector.
Wait for dynamic elements with configurable timeout.
Evaluate JavaScript expressions in the page context.
Multi-tab management: list, switch, open, close.
Runs on Microsoft Edge, launched automatically on first use.

UI accessibility

Retrieve the Windows UI Automation tree for the foreground window with configurable depth and element count.
Find UI elements by text, role, or accessible name.
Returns clickable screen coordinates for precise targeting of buttons, text boxes, checkboxes, menu items, and combo boxes.

Timing and synchronization

Short one-shot pauses via mcp_desktop_wait_milliseconds (max five seconds).
DOM-level polling via mcp_browser_wait_for (max 30 seconds).

Notes

All coordinates are in screen pixels with (0,0) at the top-left corner. Coordinates from mcp_desktop_take_screenshot, mcp_desktop_analyze_screen, mcp_accessibility_find_ui_element, and mcp_desktop_list_windows all share the same coordinate space.
A cursor failsafe is active: If the cursor moves within five pixels of any screen corner, mouse operations are cancelled. Avoid targeting the extreme edges of the screen.
Shell pipe operators (|), semicolons (;), ampersands (&), and output redirection (>, <) are blocked. To transform command output, capture it and process it with mcp_desktop_execute_python_code.
Interpreter eval flags are blocked, so python -c "..." and node -e "..." are rejected. Use mcp_desktop_execute_python_code for Python code, or write the code to a file first and run the file.
Command stdout and stderr are each truncated at 32 KB. Use flags to limit verbose output (for example, git log --oneline -20). Because shell output redirection is blocked, to keep larger output, write it to a file from mcp_desktop_execute_python_code and read it separately.
Maximum timeout for mcp_desktop_execute_shell_command and mcp_desktop_execute_python_code is 30 seconds. For longer work, break it into smaller steps or launch a background process from Python and poll.
There is no dedicated file read/write tool. Read files with mcp_desktop_execute_shell_command using the type command; write files with mcp_desktop_execute_python_code using Python's built-in file I/O. Shell output redirection (>, >>) is blocked.
mcp_browser_eval_js always returns a string. Convert objects or numbers explicitly before returning.
Browser DOM tools (mcp_browser_click, mcp_browser_type, mcp_browser_eval_js, etc.) operate only on the Microsoft Edge instance. mcp_desktop_focus_browser can focus Chrome or Firefox windows, but DOM tools will not target them.
mcp_desktop_take_screenshot requires all four crop parameters (x, y, width, height) together, or none for a full-screen capture.
mcp_desktop_scroll uses notch units (clamped to [-20, 20]), not pixels. Three notches is approximately one page.
mcp_accessibility_find_ui_element requires at least one of text, role, or name. When both text and name are provided, name takes precedence.
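
The cursor failsafe described in the notes above can be anticipated on the client side. How the server measures the five-pixel distance is not specified, so this sketch assumes a square zone around each corner. The function names are hypothetical.

```python
def near_corner(x, y, screen_width, screen_height, margin=5):
    """True if (x, y) lies within margin pixels of a screen corner on both axes."""
    near_x = x <= margin or x >= screen_width - 1 - margin
    near_y = y <= margin or y >= screen_height - 1 - margin
    return near_x and near_y


def nudge_inward(x, y, screen_width, screen_height, margin=5):
    """Move a corner-adjacent target just outside the assumed failsafe zone."""
    if not near_corner(x, y, screen_width, screen_height, margin):
        return (x, y)
    safe = margin + 1
    return (
        min(max(x, safe), screen_width - 1 - safe),
        min(max(y, safe), screen_height - 1 - safe),
    )


print(near_corner(0, 0, 1920, 1080))    # True
print(near_corner(960, 0, 1920, 1080))  # False: top edge, but not a corner
print(nudge_inward(0, 0, 1920, 1080))   # (6, 6)
```

The screen dimensions would come from mcp_desktop_get_screen_size.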

Common use cases

Fill out a web form

Call mcp_browser_navigate to open the target page.
Call mcp_browser_wait_for to wait for the form to load.
Call mcp_browser_type to fill each field by CSS selector.
Call mcp_browser_click to submit the form.
Call mcp_browser_wait_for to wait for the confirmation element.
Call mcp_browser_get_text to read and verify the result.
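
The steps above can be sketched as an ordered plan of tool names and arguments. The URL, selectors, and field values are placeholders, waiting on the first field's selector is an assumption about what signals that the form has loaded, and the 10-second confirmation timeout is an arbitrary choice within the documented maximum.

```python
def form_fill_plan(url, fields, submit_selector, confirmation_selector):
    """Return (tool_name, arguments) pairs for a simple form submission."""
    first_selector = next(iter(fields))
    plan = [
        ("mcp_browser_navigate", {"url": url}),
        ("mcp_browser_wait_for", {"selector": first_selector}),
    ]
    plan += [
        ("mcp_browser_type", {"selector": selector, "text": text})
        for selector, text in fields.items()
    ]
    plan += [
        ("mcp_browser_click", {"selector": submit_selector}),
        ("mcp_browser_wait_for", {"selector": confirmation_selector, "timeoutMs": 10000}),
        ("mcp_browser_get_text", {}),
    ]
    return plan


plan = form_fill_plan(
    "https://example.com/contact",
    {"#name": "Ada", "#email": "ada@example.com"},
    "#submit-btn",
    "#confirmation",
)
for tool_name, arguments in plan:
    print(tool_name, arguments)
```

Each pair would be sent as one tool call, in order, checking the result before moving on.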

Automate a desktop application

Call mcp_desktop_activate_window to bring the application to the foreground.
Call mcp_desktop_take_screenshot to capture the current state.
Call mcp_accessibility_find_ui_element to locate a button or field by name.
Call mcp_desktop_click on the element's reported coordinates.
Call mcp_desktop_type_text to enter data.
Call mcp_desktop_press_keys for shortcuts (for example, ["ctrl","s"] to save).
Call mcp_desktop_take_screenshot to verify the result.

Extract data from a web page

Call mcp_browser_navigate to open the page.
Call mcp_browser_get_text to extract visible text content.
Call mcp_desktop_execute_python_code to parse and process the extracted data.
Call mcp_browser_eval_js to query specific values via JavaScript when text extraction isn't enough.

Run development tasks

Call mcp_desktop_execute_shell_command for git pull, npm install, and dotnet build.
Call mcp_desktop_take_screenshot to capture build output.
Call mcp_desktop_execute_python_code to analyze logs or test results.
Call mcp_browser_navigate to open a local dev server in the browser.
Call mcp_browser_screenshot to capture the rendered page.

Read and write files

Read a file with mcp_desktop_execute_shell_command using type C:\path\to\file.txt.
Write a file with mcp_desktop_execute_python_code using Python's open(...) and write(...).
Verify with mcp_desktop_execute_shell_command using dir C:\path\to\output.txt.
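
For the write step, the code parameter is itself Python source. One way to generate it safely is to let repr() handle quoting of the path and content, as in this sketch. The write_file_code helper is hypothetical, and the path is the placeholder used above.

```python
def write_file_code(path, content):
    """Return Python source that writes content to path when executed."""
    return (
        f"with open({path!r}, 'w', encoding='utf-8') as handle:\n"
        f"    handle.write({content!r})\n"
    )


code = write_file_code(r"C:\path\to\output.txt", "hello\n")
print(code)

# The three steps as (tool_name, arguments) pairs.
steps = [
    ("mcp_desktop_execute_shell_command", {"command": r"type C:\path\to\file.txt"}),
    ("mcp_desktop_execute_python_code", {"code": code}),
    ("mcp_desktop_execute_shell_command", {"command": r"dir C:\path\to\output.txt"}),
]
```

Using repr() keeps backslashes and quotes in Windows paths from breaking the generated source.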

Navigate complex UI with accessibility

Call mcp_accessibility_get_accessibility_tree to understand the full UI structure.
Call mcp_accessibility_find_ui_element to find a specific control (for example, role: "MenuItem", name: "Settings").
Call mcp_desktop_click using the element's reported coordinates.
Call mcp_accessibility_find_ui_element again to find the next control in the dialog.
Call mcp_desktop_type_text or mcp_desktop_click to interact with it.

Keep a long-running session alive

Send any MCP request at least once every 30 minutes to prevent idle eviction.
mcp_desktop_get_screen_size is lightweight and works well as a heartbeat.
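
A heartbeat can be as simple as a timer that is reset by real traffic and fires only when the session has been quiet. In this sketch the Heartbeat class and the send callable are hypothetical, and the 25-minute interval is an arbitrary margin under the 30-minute limit.

```python
import time


class Heartbeat:
    """Send a lightweight tool call when no request has gone out recently."""

    def __init__(self, send, interval_s=25 * 60, clock=time.monotonic):
        self._send = send          # callable(tool_name, arguments), supplied by the caller
        self._interval = interval_s
        self._clock = clock
        self._last = clock()

    def touch(self):
        """Record that a request was just sent (call after every real tool call)."""
        self._last = self._clock()

    def tick(self):
        """Send a heartbeat if the interval has elapsed; return True if one was sent."""
        if self._clock() - self._last >= self._interval:
            self._send("mcp_desktop_get_screen_size", {})
            self.touch()
            return True
        return False


# Demonstration with a fake clock and a recording sender.
now = [0.0]
sent = []
heartbeat = Heartbeat(lambda name, args: sent.append(name), clock=lambda: now[0])
now[0] = 25 * 60
print(heartbeat.tick(), sent)
```

In a real agent loop, tick() would be called periodically between other work.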

Last updated on 2026-05-06
