AI Best Practices

Playwright Agent CLI

The official Microsoft binary and its Skills for driving Playwright from an AI agent. Field notes in progress from a personal project: visual QA that happens inside the conversation.

Tool designed by Microsoft Playwright Team · 2026-06-13

What it is

Playwright Agent CLI (`@playwright/agent-cli`) is an official binary published by the Playwright team (Microsoft) that exposes browser control — Chromium, Firefox, WebKit — to an AI agent through a protocol inspired by MCP. The CLI runs alongside the agent; the agent connects to it to click, wait for selectors, screenshot, check text.

The second official pillar is Playwright Skills (`@playwright/skills`), a set of skills installed on the agent side that teach it how to drive the CLI effectively: wait patterns, selector strategies, timeout handling, screenshot manipulation. Without these skills, the agent has to reinvent everything; with them, it follows established usage patterns.

Why I picked it up

On my projects, visual QA is a friction point. Testing a component locally that depends on scroll, hover, or responsive layout is slow by hand and tedious to automate without good tooling. Playwright Agent CLI moves the test into the conversation: no tool switch, no separate test file to write, the agent clicks and observes.

Three concrete reasons. First, it's an official Microsoft tool, so it's maintained, unlike a community wrapper that can disappear. Second, it integrates natively with Claude Code via the matching skill, with no manual plumbing to set up. Finally, it replaces roughly 80% of the e2e tests I used to write by hand during development: the final Playwright tests become an export of the flow validated in conversation.

How it works

Installation takes two steps. First, install the CLI globally via npm and start it as a daemon in a dedicated terminal. Then install the skill on the Claude Code side, which auto-loads the usage context.

# CLI (dedicated terminal)
npm install -g @playwright/agent-cli
playwright-agent serve

# Skill (Claude Code)
claude skill install @playwright/skills

Once the CLI is running, the agent talks to it over the MCP-inspired protocol described above. Concretely, in Claude Code, you say "open http://localhost:3000, click the Contact button and tell me what you see". The agent drives the CLI, screenshots the page, and answers by analyzing the image, all within the same conversation.
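For illustration, a request like the one above maps onto a handful of ordinary Playwright calls. The sketch below is an assumption about the kind of sequence the agent ends up running, not the CLI's actual implementation. `openClickAndCapture`, `PageLike` and `LocatorLike` are hypothetical names; the two interfaces only mirror the subset of Playwright's `Page` and `Locator` API used here (`goto`, `getByRole`, `click`, `screenshot`), so the function can also be exercised with a stub.

```typescript
// Hypothetical structural subset of Playwright's Locator API used by this sketch.
interface LocatorLike {
  click(): Promise<void>;
}

// Hypothetical structural subset of Playwright's Page API used by this sketch.
// A real `Page` from the `playwright` package should satisfy this shape.
interface PageLike {
  goto(url: string): Promise<unknown>;
  getByRole(role: "button", options: { name: string }): LocatorLike;
  screenshot(options?: { fullPage?: boolean }): Promise<Uint8Array>;
}

// Open a URL, click a button by its accessible name, and return a screenshot
// of the resulting viewport as raw image bytes.
async function openClickAndCapture(
  page: PageLike,
  url: string,
  buttonName: string,
): Promise<Uint8Array> {
  await page.goto(url);
  // Role-based lookup targets the button by what the user sees, not by CSS.
  await page.getByRole("button", { name: buttonName }).click();
  return page.screenshot();
}
```

With a real browser, `page` would come from something like `chromium.launch()` followed by `browser.newPage()`; the agent then analyzes the returned image instead of a human looking at it.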

Concrete use cases

Five cases where it changes day-to-day work on a React/Next.js project.

  • Visual validation during a UI refresh. I modify a component, I ask the agent: "navigate to /pricing, screenshot the 3 price cards, tell me if the vertical spacing is consistent". The agent answers by looking at the image, I correct, we iterate.
  • Regression catcher between 2 commits. Before merging a CSS refactor PR, the agent screenshots 5 key routes before and after; if something moved unintentionally, we see it.
  • Production smoke after deploy. After each release, the agent visits the 3 critical routes (home, contact, checkout) and verifies they show the right content. No need to spin up a parallel test environment.
  • Cross-browser e2e tests driven in natural language. "Redo the same contact flow on Firefox, then WebKit, tell me if anything diverges". The agent iterates over the 3 browsers without me having to write 3 test files.
  • Guided visual debugging. "The /contact form is broken in prod, look and tell me why". The agent screenshots, analyzes the React error in the console, suggests a fix. Much faster than a round trip with support.

The common pattern: QA is no longer a separate step after dev, it's the dialogue itself.
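The before/after and cross-browser cases in the list above both come down to capturing the same routes twice and reporting which captures differ. Here is a minimal sketch of that comparison step, assuming the screenshots are already available as raw bytes keyed by route. `changedRoutes` and `sameBytes` are hypothetical helpers, not part of Playwright or the CLI. The comparison is byte-exact, which is stricter than a perceptual diff: anti-aliasing or font rendering noise will be flagged as a change.

```typescript
// Screenshots keyed by route, e.g. { "/pricing": <PNG bytes> }.
type Captures = Record<string, Uint8Array>;

// True when both buffers have the same length and identical content.
function sameBytes(a: Uint8Array, b: Uint8Array): boolean {
  if (a.length !== b.length) return false;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return false;
  }
  return true;
}

// Return the sorted list of routes whose capture differs between the two
// runs, including routes present in only one of them.
function changedRoutes(before: Captures, after: Captures): string[] {
  const routes = Object.keys({ ...before, ...after });
  return routes
    .filter((route) => {
      const a = before[route];
      const b = after[route];
      return a === undefined || b === undefined || !sameBytes(a, b);
    })
    .sort();
}
```

When a tolerance is needed, Playwright Test's own `expect(page).toHaveScreenshot()` accepts options such as `maxDiffPixels` and is probably a better fit for the exported tests than byte equality.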

Limits observed so far

Three limits are visible at this stage of the personal project. First, complex selectors (shadow DOM, nested iframes, non-standard custom components) sometimes require manual intervention. The skill helps but can't work miracles on a poorly built third-party widget.

Second, the agent can take too many screenshots and blow up the conversation context, since each image consumes a lot of tokens. I haven't yet found a stable pattern to limit this automatically; for now I systematically specify "screenshot only the form area". Finally, sensitive authenticated flows (real client login, payment) require a dedicated sandbox strategy that I haven't set up yet.
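One possible way to make "screenshot only the form area" mechanical is to capture a clipped region instead of the full page. The sketch below covers only the geometry, assuming the element's bounding box has already been obtained (Playwright's `locator.boundingBox()` returns this shape) and that the result is passed to `page.screenshot({ clip })`. `clipAround` is a hypothetical helper, and the default padding of 16 pixels is an arbitrary choice.

```typescript
// Rectangle in viewport coordinates, same shape as Playwright's bounding box
// and the `clip` option of `page.screenshot`.
interface Box {
  x: number;
  y: number;
  width: number;
  height: number;
}

interface Size {
  width: number;
  height: number;
}

// Grow the element's box by `padding` pixels on each side, then clamp the
// result so it never extends outside the viewport.
function clipAround(box: Box, viewport: Size, padding = 16): Box {
  const left = Math.max(0, box.x - padding);
  const top = Math.max(0, box.y - padding);
  const right = Math.min(viewport.width, box.x + box.width + padding);
  const bottom = Math.min(viewport.height, box.y + box.height + padding);
  return {
    x: left,
    y: top,
    width: Math.max(0, right - left),
    height: Math.max(0, bottom - top),
  };
}
```

When no surrounding context is needed, calling `screenshot()` directly on the locator captures just the element and avoids the geometry altogether.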

To go further: the official Microsoft documentation is exhaustive on the API and patterns.

Sources & credits