Browser automation is almost always written, not typed. You create a test file, import a client library, call methods, run the suite. The feedback loop takes seconds at minimum, longer when there's setup involved. The Vibium CLI offers a different model: a direct line to a live browser, one command at a time.
reload · url · title
attr · value · text
html · a11y-tree
type · press · keys
check · uncheck · select
drag · hover · focus
scroll · mouse
frames · content
pdf · highlight · diff
download · geolocation
media · viewport
window · sleep · wait
eval · dialog · is · upload
install · launch-test
bidi-test · ws-test
paths · completion
add-skill · help · version
Starting the browser
Everything starts with vibium start. This launches a browser session and hands
control back to your terminal. The session stays open until you call stop or
the process exits.
$ vibium start
Browser session started (Chromium · headless: false)
$ vibium start --headless
Browser session started (Chromium · headless: true)
daemon is the long-running variant. It starts a browser server in the background
that persists across terminal sessions — useful in CI when you want a warm browser ready before
your test suite begins. stop terminates any running session.
install handles browser binary management. Run vibium install chromium
to download the Chromium build that Vibium is tested against. The three diagnostic commands —
launch-test, bidi-test, and ws-test — verify that the
browser, its BiDi protocol layer, and the WebSocket transport are all healthy. Run them when
something isn't working and you're not sure where the failure is.
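When you don't know which layer is broken, the three diagnostics can be chained into a single check. This is a minimal sketch, not part of Vibium: the diagnose function is a hypothetical helper, and it assumes each diagnostic command exits non-zero when its layer is unhealthy, which the CLI may or may not do.

```shell
# Run the three diagnostic commands in the order the article lists them
# (browser, BiDi layer, WebSocket transport) and stop at the first failure.
# Assumption: each command exits non-zero when its layer is unhealthy.
diagnose() {
  for check in launch-test bidi-test ws-test; do
    if ! vibium "$check"; then
      echo "failed: $check" >&2
      return 1
    fi
  done
  echo "all checks passed"
}

# Usage: diagnose || echo "fix the layer named above before going further"
```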
Finding and reading elements
find is the primary selector. It accepts four strategies — and the right choice
matters for test stability.
# Role + text: targets the ARIA role and visible label; the most stable option
$ vibium find --role textbox --text "Username"
# CSS: fast and familiar, brittle under structural changes
$ vibium find --css "#username"
# XPath: powerful, reach for it when CSS can't express the query
$ vibium find --xpath "//input[@name='login']"
The role+text combination is the most stable selector. It targets elements by their ARIA role and visible label, so tests that use it keep passing through HTML restructuring as long as the element's role and label are unchanged.
map returns a flat list of all interactive elements on the page — roles, labels,
bounding boxes. Run it when you're exploring a page and aren't sure what's there. It's also
what AI agents see when they read a page through Vibium's MCP server.
count returns the number of elements matching a selector. attr reads
a named HTML attribute from the found element: vibium attr href --css "a.primary".
value reads the current value of an input field. text returns
the visible text content. html returns the raw inner HTML.
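Because count prints a number, it can back a quick assertion in a shell script. The sketch below is a hypothetical helper, and it rests on two assumptions: that count accepts the same --css flag as find, and that it prints a bare integer on stdout.

```shell
# Fail unless exactly the expected number of elements match a CSS selector.
# Assumptions: `count` accepts --css like `find`, and prints a bare integer.
expect_count() {
  selector=$1
  expected=$2
  actual=$(vibium count --css "$selector") || return 2
  if [ "$actual" -ne "$expected" ]; then
    echo "expected $expected matches for $selector, got $actual" >&2
    return 1
  fi
}

# Usage: expect_count "a.primary" 1
```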
a11y-tree dumps the full accessibility tree of the page. This is the structured
representation that screen readers and AI agents consume, which makes it useful for auditing
ARIA roles, labels, and states that the browser computes rather than the markup spelling them
out, but that matter for accessibility and agent automation.
Interacting with elements
Fourteen commands. Most interactions fit into three: fill for form inputs,
click for buttons and links, press for keyboard events.
$ vibium fill --css "#password" "your-password"
$ vibium click --role button --text "Sign in"
fill is the right command for text inputs. It clears the field and enters the value
in one step, triggering the input and change events that single-page
apps watch for. type is the character-by-character variant — reach for it when you
need to exercise autocomplete behavior that fires on each keypress.
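Put together, the fill and click commands above are enough for a complete sign-in step. The function below is a sketch: sign_in is a hypothetical name, and the selectors are the placeholder ones used in this article's examples, not anything your page is guaranteed to have.

```shell
# Sign in by filling both fields and clicking the button.
# "#username", "#password" and "Sign in" are the article's example selectors;
# replace them with whatever your page actually uses.
sign_in() {
  vibium fill --css "#username" "$1" &&
  vibium fill --css "#password" "$2" &&
  vibium click --role button --text "Sign in"
}

# Usage: sign_in "alice" "$PASSWORD"
```

Chaining with && means a failed fill stops the sequence instead of clicking a half-completed form.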
press fires a named key event without targeting a specific element:
vibium press Enter submits the focused form. keys sends a string of
keystrokes one at a time — useful for keyboard shortcut testing:
vibium keys "Control+Shift+P".
check and uncheck handle checkboxes and radio buttons.
select picks an option from a <select> dropdown by visible label
or value attribute.
drag drags from one element to another — pass two selectors.
hover moves the mouse over an element without clicking, useful for triggering CSS
hover states and tooltip visibility. focus focuses an element via the tab order,
not via mouse — the way keyboard users actually navigate.
scroll scrolls the page or a specific container. mouse drops down to
raw pointer control: vibium mouse move 640 400, vibium mouse down,
vibium mouse up — for when higher-level commands can't reach the interaction.
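The three mouse subcommands compose into a coordinate-based drag for cases where drag with two selectors can't reach the target, such as a canvas. This is a sketch using only the subcommand forms shown above; raw_drag is a hypothetical helper name.

```shell
# Drag from (x1, y1) to (x2, y2) with raw pointer commands:
# move to the start, press, move to the end, release.
raw_drag() {
  vibium mouse move "$1" "$2" &&
  vibium mouse down &&
  vibium mouse move "$3" "$4" &&
  vibium mouse up
}

# Usage: raw_drag 640 400 800 400
```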
Pages and frames
The browser can have multiple pages open simultaneously. pages lists them;
page switches the active context to a specific one by index.
$ vibium pages
[0] https://github.com/login (active)
[1] https://docs.github.com
$ vibium page 1
Switched to page 1: https://docs.github.com
When JavaScript opens a new tab, via window.open or a link with
target="_blank", it appears in the pages list immediately. This is
one of the patterns that trips people up: after clicking a link that opens a new tab, you need
to switch to it with vibium page and its index (vibium page 1 in the listing above) before any
subsequent commands will see the new page.
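One way to avoid forgetting the switch is to pair the click and the page change in a single helper. The sketch below is hypothetical: click_into_new_tab is not a Vibium command, and it assumes you already know the index the new tab will get (check with vibium pages if unsure).

```shell
# Click something that opens a new tab, then make that tab the active page.
# $1 is the page index to switch to; every remaining argument is handed to
# `vibium click` unchanged, so any selector form works.
click_into_new_tab() {
  index=$1
  shift
  vibium click "$@" && vibium page "$index"
}

# Usage (the button label is a made-up example):
#   click_into_new_tab 1 --role button --text "Open docs"
```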
frames lists all iframes on the current page. frame switches the
active context into a specific one by name, ID, or URL pattern — all subsequent find and interact
commands operate inside that frame until you switch back.
content replaces the entire page HTML from a string. Pass it a self-contained HTML
fragment and Vibium renders it in the browser. Useful for testing isolated components without
spinning up a dev server.
Capturing output
screenshot captures the current viewport to a PNG. Pass --full-page
to capture the full scrollable document, or --selector to crop to a specific
element.
$ vibium screenshot --full-page --path ./full.png
$ vibium screenshot --selector ".hero" --path ./hero.png
record captures a video of the session.
vibium record start --path ./session.webm begins recording;
vibium record stop writes the file. Recording runs in the background — you can
continue interacting with the page while it captures.
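Since recording runs in the background, it is easy to leave it running when a step fails partway through. A small wrapper can guarantee the stop call. This is a sketch built on the two record subcommands above; with_recording is a hypothetical helper name.

```shell
# Record a video while running any command, and always stop the recording,
# even when the command fails. Returns the wrapped command's exit status.
with_recording() {
  video=$1
  shift
  vibium record start --path "$video" || return 1
  rc=0
  "$@" || rc=$?
  vibium record stop
  return "$rc"
}

# Usage: with_recording ./session.webm vibium click --role button --text "Sign in"
```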
pdf renders the current page to a PDF using the browser's print renderer.
CSS @media print rules apply, so pages that have print-specific styles will look
different here than in a screenshot.
highlight draws a visible bounding box around a matched element and takes a
screenshot. Useful for documenting bugs — the screenshot shows exactly which element was
targeted, annotated in place.
diff compares the current screenshot against a baseline image and returns a
pixel-difference score. The classic visual regression check: run once to establish the baseline,
then on every build to catch unintended visual changes.
Managing state
One of the largest categories: 13 commands that handle session state, timing, and the tricky interaction patterns that headless test runners routinely struggle with.
cookies has get, set, and delete subcommands.
Use it to seed authentication state before a test rather than logging in through the UI on every
run. storage works the same way for localStorage and
sessionStorage.
$ vibium storage local set authToken eyJhbGci...
viewport sets the browser viewport dimensions before navigation — the right
way to simulate a specific device size. geolocation sets the reported GPS position.
media overrides the media type and color scheme: vibium media --type print
or vibium media --color dark.
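The media override pairs naturally with screenshot for checking a dark theme. The sketch below combines only flags that appear in this article; dark_mode_shot is a hypothetical helper, and it assumes the override stays in effect for the screenshot that follows.

```shell
# Switch the page to the dark color scheme, then capture the full page.
# Assumption: the media override persists until it is changed again.
dark_mode_shot() {
  vibium media --color dark &&
  vibium screenshot --full-page --path "$1"
}

# Usage: dark_mode_shot ./dark.png
```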
wait polls a condition and blocks until it's met. Prefer it over
sleep: vibium wait --selector ".spinner" --state hidden waits for
a loading indicator to disappear without burning a fixed delay. sleep is the
unconditional pause — sometimes necessary, never preferable.
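In practice the wait usually follows whatever action started the loading. A sketch of that pairing, with click_and_settle as a hypothetical helper and ".spinner" standing in for your page's own loading indicator:

```shell
# Click, then block until the loading indicator is gone instead of sleeping.
# All arguments are passed to `vibium click`; ".spinner" is the article's
# example selector and will differ on a real page.
click_and_settle() {
  vibium click "$@" &&
  vibium wait --selector ".spinner" --state hidden
}

# Usage: click_and_settle --role button --text "Sign in"
```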
eval executes JavaScript in the page context and prints the result:
vibium eval "window.innerWidth". dialog handles browser dialogs
before they block: vibium dialog accept and vibium dialog dismiss
cover alerts, confirms, and prompts.
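Because eval prints its result, a script can capture that output and branch on it. The sketch below checks the viewport width against a breakpoint; is_wide_enough is a hypothetical helper, and it assumes eval prints a numeric result as a bare number with nothing around it.

```shell
# Succeed when the page is at least $1 CSS pixels wide.
# Assumption: `eval` prints a numeric result as a bare number on stdout.
is_wide_enough() {
  min_width=$1
  width=$(vibium eval "window.innerWidth") || return 2
  [ "$width" -ge "$min_width" ]
}

# Usage: is_wide_enough 1024 && echo "desktop layout"
```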
is checks a boolean property on an element and exits with code 0 for true,
1 for false — composable with shell conditionals:
vibium is enabled --role button --text "Submit" && vibium click ....
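The same exit-code convention works inside a function, which saves repeating the selector. A sketch, with click_if_enabled as a hypothetical helper that relies only on the 0-for-true, 1-for-false behavior described above:

```shell
# Click an element only if `vibium is enabled` reports true (exit code 0).
# The same selector arguments are passed to both commands.
click_if_enabled() {
  if vibium is enabled "$@"; then
    vibium click "$@"
  else
    echo "skipped: element not enabled" >&2
    return 1
  fi
}

# Usage: click_if_enabled --role button --text "Submit"
```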
upload populates a file input without triggering the OS file picker:
vibium upload --css "input[type=file]" ./document.pdf.
download captures a file download triggered by the page and saves it to a local path.
Dev tools and extensions
mcp is the command that changes Vibium's category. It starts Vibium as a
Model Context Protocol server, exposing the full command surface as structured tools that
Claude, Cursor, and other AI agents can call directly.
$ vibium mcp
MCP server running — 85 tools registered
Listening on stdio
Once vibium mcp is running, an AI agent sees every browser automation operation
as a tool call. Navigate, find, interact, capture — the same 66-command surface, but now
callable from natural-language instructions. This is the bridge between scripted test automation
and agentic browser control.
serve starts Vibium as an HTTP API server — the same capabilities over REST,
useful when you need to drive the browser from a process that can't use stdio.
pipe accepts a stream of commands from stdin, one per line, for scripting complex
multi-step sequences without a shell loop.
paths prints the filesystem locations Vibium uses — browser binaries, session data,
log files — useful for debugging installation issues. completion generates shell
completion scripts for bash, zsh, and fish.
add-skill installs a skill package — a named collection of command sequences that
extends what the CLI can do. Skills are how custom automation patterns get distributed and
reused across teams.
66 commands. 8 categories. One binary that handles everything from a simple page navigation to a full agentic browser session.
The CLI is the fastest way to get hands-on with Vibium — no imports, no test runner, no build
step. Type a command, see what happens. Every browser automation pattern you build in the CLI —
navigate, find, interact, capture — has an equivalent in the TypeScript/JavaScript, Python, and Java clients.
And vibium mcp exposes that same surface to AI agents as structured tool calls.
The next issue covers the MCP server — all 85 tools, how to wire it into Claude and Cursor, and the patterns that work best for agent-driven browser automation.