Issue No. 01: The Vibium CLI

Browser automation is almost always written, not typed. You create a test file, import a client library, call methods, run the suite. The feedback loop takes seconds at minimum, longer when there's setup involved. The Vibium CLI offers a different model: a direct line to a live browser, one command at a time.

Navigate

go · back · forward · reload · url · title

Find & Read

find · map · count · attr · value · text · html · a11y-tree

Interact

click · dblclick · fill · type · press · keys · check · uncheck · select · drag · hover · focus · scroll · mouse

Pages & Frames

page · pages · frame · frames · content

Capture

screenshot · record · pdf · highlight · diff

State

cookies · storage · download · geolocation · media · viewport · window · sleep · wait · eval · dialog · is · upload

Browser

start · stop · daemon · install · launch-test · bidi-test · ws-test

Dev Tools

mcp · serve · pipe · paths · completion · add-skill · help · version

Starting the browser

start · stop · daemon · install · launch-test · bidi-test · ws-test

Everything starts with vibium start. This launches a browser session and hands control back to your terminal. The session stays open until you call stop or the process exits.

$ vibium start

Browser session started (Chromium · headless: false)

$ vibium start --headless

Browser session started (Chromium · headless: true)

daemon is the long-running variant. It starts a browser server in the background that persists across terminal sessions — useful in CI when you want a warm browser ready before your test suite begins. stop terminates any running session.
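Because a session outlives the command that started it, a script that fails halfway can leave a browser running. A minimal sketch of a wrapper that always calls stop is below. The function name with_browser is hypothetical, and the sketch assumes vibium start exits non-zero when the browser cannot launch.

```shell
# Run any command inside a throwaway headless session.
# `vibium start --headless` and `vibium stop` come from the text above;
# the wrapper name with_browser is hypothetical.
with_browser() {
  local rc=0
  vibium start --headless || return 1   # assumed: non-zero exit on launch failure
  "$@" || rc=$?                         # run the caller's command, keep its exit code
  vibium stop                           # stop the session even if the command failed
  return "$rc"
}
```

Usage would look like with_browser vibium go https://github.com/login, or with_browser ./smoke-test.sh for a whole script.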

install handles browser binary management. Run vibium install chromium to download the Chromium build that Vibium is tested against. The three diagnostic commands — launch-test, bidi-test, and ws-test — verify that the browser, its BiDi protocol layer, and the WebSocket transport are all healthy. Run them when something isn't working and you're not sure where the failure is.
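The three diagnostics can be chained so the first unhealthy layer is named directly. This is a sketch only: it assumes each diagnostic exits non-zero on failure, which the text does not confirm, and the function name vibium_diagnose is made up for illustration.

```shell
# Run the diagnostics in the order the text lists them and stop at the
# first failure. Assumption: each command exits non-zero when unhealthy.
vibium_diagnose() {
  local check
  for check in launch-test bidi-test ws-test; do
    if ! vibium "$check" >/dev/null 2>&1; then
      echo "failed: $check"
      return 1
    fi
  done
  echo "all checks passed"
}
```

Drop the redirection to /dev/null if you want to see each command's own output while it runs.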

Navigating the page

go · back · forward · reload · url · title

go navigates to a URL and waits for the page to reach network-idle before returning. That implicit wait is load-bearing: because go blocks until the page settles, you can call find immediately after without sleeping.

$ vibium go https://github.com/login

Navigated to https://github.com/login

$ vibium title

Sign in to GitHub · GitHub

$ vibium url

https://github.com/login

back and forward navigate the browser history — the same as clicking the browser's back and forward buttons, but scriptable. reload refreshes the current page; pass --hard to skip the cache.

url and title are read-only — they print the current URL and document title to stdout. Pipe them into other tools or use them as assertions in shell scripts: [ "$(vibium title)" = "Dashboard" ] && echo "pass".
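The one-line assertion above can grow into a small reusable check that also reports what it found. A sketch, using only go and title from the text; the helper name expect_title and its exit codes are hypothetical choices.

```shell
# Navigate, then compare the document title with an expected string.
# Exit codes (hypothetical): 0 = match, 1 = mismatch, 2 = navigation failed.
expect_title() {
  local url="$1" expected="$2" actual
  vibium go "$url" >/dev/null || return 2
  actual=$(vibium title)
  if [ "$actual" = "$expected" ]; then
    echo "pass"
  else
    echo "fail: expected '$expected', got '$actual'"
    return 1
  fi
}
```

For the login page shown earlier, that would be expect_title https://github.com/login "Sign in to GitHub · GitHub".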

Finding and reading elements

find · map · count · attr · value · text · html · a11y-tree

find is the primary selector. It accepts several selector strategies, and the right choice matters for test stability. Three are shown here.

# Role + text: most stable across HTML refactors

$ vibium find --role textbox --text "Username"

# CSS: fast and familiar, brittle under structural changes

$ vibium find --css "#username"

# XPath: powerful, reach for it when CSS can't express the query

$ vibium find --xpath "//input[@name='login']"

The role+text combination is the most stable selector. It targets elements by their ARIA role and visible label, so it stays green through HTML restructuring as long as the UX intent is unchanged.
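One way to act on that preference is to try role and text first and fall back to CSS only when nothing matches. This sketch assumes find exits non-zero when no element matches, which the text does not state; find_with_fallback is a hypothetical name.

```shell
# Prefer the stable role+text selector, fall back to CSS.
# Assumption: `vibium find` exits non-zero when nothing matches.
find_with_fallback() {
  local role="$1" text="$2" css="$3"
  vibium find --role "$role" --text "$text" 2>/dev/null \
    || vibium find --css "$css"
}
```

For the username field above, the call would be find_with_fallback textbox "Username" "#username".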

map returns a flat list of all interactive elements on the page — roles, labels, bounding boxes. Run it when you're exploring a page and aren't sure what's there. It's also what AI agents see when they read a page through Vibium's MCP server.

count returns the number of elements matching a selector. attr reads a named HTML attribute from the found element: vibium attr href --css "a.primary". value reads the current value of an input field. text returns the visible text content. html returns the raw inner HTML.
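Since count prints a number, it lends itself to shell assertions. The sketch below assumes count prints a bare integer and accepts the same selector flags as find and attr; neither detail is confirmed by the text, and expect_count is a hypothetical helper.

```shell
# Succeed only when the number of matching elements equals the expectation.
# Assumptions: `vibium count` prints a bare integer and takes the same
# selector flags (--css, --role, ...) as the other commands.
expect_count() {
  local expected="$1" actual
  shift                                   # the remaining args are the selector
  actual=$(vibium count "$@") || return 2
  [ "$actual" -eq "$expected" ]
}
```

A call such as expect_count 3 --css "nav a" && echo "pass" reads naturally in a script.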

a11y-tree dumps the full accessibility tree of the page. This is the structured representation that screen readers and AI agents consume. It is useful for auditing ARIA roles, labels, and states that are not visible on the rendered page but matter for accessibility and agent automation.

Interacting with elements

click · dblclick · fill · type · press · keys · check · uncheck · select · drag · hover · focus · scroll · mouse

Fourteen commands. Most interactions fit into three: fill for form inputs, click for buttons and links, press for keyboard events.

$ vibium fill --css "#login_field" "your-username"

$ vibium fill --css "#password" "your-password"

$ vibium click --role button --text "Sign in"

fill is the right command for text inputs. It clears the field and enters the value in one step, triggering the input and change events that single-page apps watch for. type is the character-by-character variant — reach for it when you need to exercise autocomplete behavior that fires on each keypress.
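The three login commands shown above can be wrapped in a function so the sequence stops at the first step that fails. This is a sketch: the selectors are the ones from the example, the function name login is hypothetical, and it assumes each command exits non-zero on failure.

```shell
# Fill both fields, then click the submit button.
# `&&` chains the steps so a failed fill never reaches the click
# (assumes each command exits non-zero on failure).
login() {
  local user="$1" pass="$2"
  vibium fill --css "#login_field" "$user" &&
  vibium fill --css "#password" "$pass" &&
  vibium click --role button --text "Sign in"
}
```

In real use, read the password from an environment variable rather than typing it where shell history can record it.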

press fires a named key event without targeting a specific element: vibium press Enter submits the focused form. keys sends a string of keystrokes one at a time — useful for keyboard shortcut testing: vibium keys "Control+Shift+P".

check and uncheck handle checkboxes and radio buttons. select picks an option from a <select> dropdown by visible label or value attribute.

drag drags from one element to another — pass two selectors. hover moves the mouse over an element without clicking, useful for triggering CSS hover states and tooltip visibility. focus focuses an element via the tab order, not via mouse — the way keyboard users actually navigate.

scroll scrolls the page or a specific container. mouse drops down to raw pointer control: vibium mouse move 640 400, vibium mouse down, vibium mouse up — for when higher-level commands can't reach the interaction.

Pages and frames

page · pages · frame · frames · content

The browser can have multiple pages open simultaneously. pages lists them; page switches the active context to a specific one by index.

$ vibium pages

[0] https://github.com/login (active)

[1] https://docs.github.com

$ vibium page 1

Switched to page 1: https://docs.github.com

When JavaScript opens a new tab, via window.open or a link with target="_blank", it appears in the pages list immediately, but the active context stays on the original page. This is one of the patterns that trip people up: after clicking a link that opens a new tab, you need to call vibium page with the new tab's index (vibium page 1 in the listing above) before any subsequent commands will see the new page.
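Hard-coding the index works until a third tab is open. A sketch that reads the index from the pages output instead is below. It assumes the "[n] url" format shown in the listing above and assumes the newest tab is listed last; switch_to_newest_page is a hypothetical name.

```shell
# Switch to the last page in the `vibium pages` listing.
# Assumptions: lines look like "[1] https://..." and new tabs are appended.
switch_to_newest_page() {
  local index
  index=$(vibium pages | sed -n 's/^\[\([0-9][0-9]*\)\].*/\1/p' | tail -n 1)
  [ -n "$index" ] || return 1             # nothing parsed: no pages listed
  vibium page "$index"
}
```

Call it right after the click that opens the tab, before any find or interact command.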

frames lists all iframes on the current page. frame switches the active context into a specific one by name, ID, or URL pattern — all subsequent find and interact commands operate inside that frame until you switch back.

content replaces the entire page's HTML with a string you supply. Pass it a self-contained HTML fragment and Vibium renders it in the browser, which is useful for testing isolated components without spinning up a dev server.

Capturing output

screenshot · record · pdf · highlight · diff

screenshot captures the current viewport to a PNG. Pass --full-page to capture the full scrollable document, or --selector to crop to a specific element.

$ vibium screenshot --path ./login.png

$ vibium screenshot --full-page --path ./full.png

$ vibium screenshot --selector ".hero" --path ./hero.png
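For a quick visual record of a page, the viewport and full-page captures can be taken together. A sketch that uses only the go and screenshot flags shown above; the function name capture_page and the default ./shots directory are hypothetical.

```shell
# Navigate to a URL and save both a viewport and a full-page screenshot.
capture_page() {
  local url="$1" name="$2" dir="${3:-./shots}"
  mkdir -p "$dir" || return 1
  vibium go "$url" >/dev/null || return 1
  vibium screenshot --path "$dir/$name.png" &&
  vibium screenshot --full-page --path "$dir/$name-full.png"
}
```

capture_page https://github.com/login login would write login.png and login-full.png under ./shots.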

record captures a video of the session. vibium record start --path ./session.webm begins recording; vibium record stop writes the file. Recording runs in the background — you can continue interacting with the page while it captures.

pdf renders the current page to a PDF using the browser's print renderer. CSS @media print rules apply, so pages that have print-specific styles will look different here than in a screenshot.

highlight draws a visible bounding box around a matched element and takes a screenshot. Useful for documenting bugs — the screenshot shows exactly which element was targeted, annotated in place.

diff compares the current screenshot against a baseline image and returns a pixel-difference score. The classic visual regression check: run once to establish the baseline, then on every build to catch unintended visual changes.

Managing state

cookies · storage · download · geolocation · media · viewport · window · sleep · wait · eval · dialog · is · upload

The second-largest category after Interact: 13 commands that handle session state, timing, and the tricky interaction patterns that headless test runners routinely struggle with.

cookies has get, set, and delete subcommands. Use it to seed authentication state before a test rather than logging in through the UI on every run. storage works the same way for localStorage and sessionStorage.

$ vibium cookies set --name session --value abc123 --domain .example.com

$ vibium storage local set authToken eyJhbGci...
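Seeding a session cookie and then navigating is the usual shape of a test that skips the login form. A sketch built from the cookies set flags shown above; the cookie name session comes from that example, and the function name seed_session is hypothetical.

```shell
# Set the session cookie first, then open the page that needs it.
seed_session() {
  local token="$1" domain="$2" url="$3"
  vibium cookies set --name session --value "$token" --domain "$domain" &&
  vibium go "$url"
}
```

The token would normally come from an environment variable or a test fixture, not from the script itself.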

viewport sets the browser viewport dimensions before navigation — the right way to simulate a specific device size. geolocation sets the reported GPS position. media overrides the media type and color scheme: vibium media --type print or vibium media --color dark.

wait polls a condition and blocks until it's met. Prefer it over sleep: vibium wait --selector ".spinner" --state hidden waits for a loading indicator to disappear without burning a fixed delay. sleep is the unconditional pause — sometimes necessary, never preferable.
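In practice wait usually sits directly in front of a read. The sketch below waits for a spinner to disappear and then prints an element's text. The wait flags come from the text; passing --css to text is an assumption modeled on the attr example, and text_when_ready is a hypothetical name.

```shell
# Block until the loading indicator is hidden, then read the target's text.
# Assumption: `vibium text` accepts a --css selector like `vibium attr` does.
text_when_ready() {
  local spinner="$1" target="$2"
  vibium wait --selector "$spinner" --state hidden || return 1
  vibium text --css "$target"
}
```

For example, text_when_ready ".spinner" ".result" replaces a fixed sleep followed by a read.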

eval executes JavaScript in the page context and prints the result: vibium eval "window.innerWidth". dialog handles browser dialogs before they block: vibium dialog accept and vibium dialog dismiss cover alerts, confirms, and prompts.

is checks a boolean property on an element and exits with code 0 for true, 1 for false — composable with shell conditionals: vibium is enabled --role button --text "Submit" && vibium click ....
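Spelled out as a function, the same conditional can also say why it skipped the click. A sketch using only the is enabled and click forms shown in the text; click_if_enabled is a hypothetical name.

```shell
# Click a button only when `vibium is enabled` reports true (exit code 0).
click_if_enabled() {
  local label="$1"
  if vibium is enabled --role button --text "$label"; then
    vibium click --role button --text "$label"
  else
    echo "skipped: '$label' is not enabled" >&2
    return 1
  fi
}
```

The message goes to stderr, so stdout stays clean for whatever the click itself prints.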

upload populates a file input without triggering the OS file picker: vibium upload --css "input[type=file]" ./document.pdf. download captures a file download triggered by the page and saves it to a local path.

Dev tools and extensions

mcp · serve · pipe · paths · completion · add-skill · help · version

mcp is the command that changes Vibium's category. It starts Vibium as a Model Context Protocol server, exposing the full command surface as structured tools that Claude, Cursor, and other AI agents can call directly.

$ vibium mcp

MCP server running — 85 tools registered

Listening on stdio

Once vibium mcp is running, an AI agent sees every browser automation operation as a tool call. Navigate, find, interact, capture — the same 66-command surface, but now callable from natural-language instructions. This is the bridge between scripted test automation and agentic browser control.

serve starts Vibium as an HTTP API server — the same capabilities over REST, useful when you need to drive the browser from a process that can't use stdio. pipe accepts a stream of commands from stdin, one per line, for scripting complex multi-step sequences without a shell loop.
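A command stream for pipe can be generated by a small function. This sketch rests on two assumptions the text does not confirm: that each line omits the leading vibium, and that arguments are quoted as they would be in a shell. The function name login_commands is hypothetical.

```shell
# Emit one command per line for `vibium pipe`.
# Assumptions: lines omit the `vibium` prefix and use shell-style quoting.
login_commands() {
  printf '%s\n' \
    'go https://github.com/login' \
    "fill --css \"#login_field\" \"$1\"" \
    "fill --css \"#password\" \"$2\"" \
    'click --role button --text "Sign in"'
}
```

Usage would be login_commands your-username your-password | vibium pipe. Print the stream on its own first to check it before a browser is involved.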

paths prints the filesystem locations Vibium uses — browser binaries, session data, log files — useful for debugging installation issues. completion generates shell completion scripts for bash, zsh, and fish.

add-skill installs a skill package — a named collection of command sequences that extends what the CLI can do. Skills are how custom automation patterns get distributed and reused across teams.

66 commands. 8 categories. One binary that handles everything from a simple page navigation to a full agentic browser session.

The CLI is the fastest way to get hands-on with Vibium — no imports, no test runner, no build step. Type a command, see what happens. Every browser automation pattern you build in the CLI — navigate, find, interact, capture — has an equivalent in the TypeScript/JavaScript, Python, and Java clients. And vibium mcp exposes that same surface to AI agents as structured tool calls.

The next issue covers the MCP server — all 85 tools, how to wire it into Claude and Cursor, and the patterns that work best for agent-driven browser automation.
