
Browser Agent

Andrius Putna
#browser-agent #computer-use #automation #playwright #claude-computer-use

A lot of work doesn’t have an API: supplier portals, insurance claim sites, legacy admin panels, government forms, internal vendor tools. Until 2024 the options were to hire a person or to fight with a Selenium script. Computer-use models (Claude’s Computer Use, OpenAI’s Operator, and their open-weight cousins) changed that. The agent sees the screen, clicks, types, scrolls, and can tell when an action didn’t work.

What we build

Real browser automation, model-driven. The agent runs a Chromium session (via Playwright, Browserbase, or Anchor, depending on the deployment target), sees screenshots plus the accessibility tree, and acts through the same keyboard and mouse affordances a human would. No brittle CSS selectors hard-coded to a specific page layout.
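As a rough illustration of the observe-and-act loop described above, here is a minimal Python sketch. It assumes a Playwright-style page object (only `page.screenshot()`, `page.mouse.click`, `page.mouse.wheel`, and `page.keyboard.type` are used). The `Action` type, `apply_action`, `run_agent`, and the `choose_action` callback that stands in for the model are hypothetical names, not part of any library or of our production code. The accessibility tree is left out to keep the sketch short.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Action:
    """One model-proposed step (hypothetical schema for this sketch)."""
    kind: str          # "click", "type", "scroll", or "done"
    x: int = 0
    y: int = 0
    text: str = ""
    delta_y: int = 0


def apply_action(page: Any, action: Action) -> bool:
    """Apply one action to the page; return False once the flow is finished."""
    if action.kind == "click":
        page.mouse.click(action.x, action.y)
    elif action.kind == "type":
        page.keyboard.type(action.text)
    elif action.kind == "scroll":
        page.mouse.wheel(0, action.delta_y)
    elif action.kind == "done":
        return False
    else:
        raise ValueError(f"unknown action kind: {action.kind}")
    return True


def run_agent(page: Any,
              choose_action: Callable[[bytes, List[Action]], Action],
              max_steps: int = 25) -> List[Action]:
    """Observe, ask the model for an action, act; stop on "done" or the step cap."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = page.screenshot()               # image bytes of the current view
        action = choose_action(screenshot, history)  # the model call is assumed here
        history.append(action)
        if not apply_action(page, action):
            break
    return history
```

The step cap keeps a confused agent from looping forever, and the returned history doubles as a simple action trace.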

Structured outputs. “Log into our supplier portal, download every invoice from Q1, rename them with vendor+date+amount, upload to Drive” produces a CSV of what was done (and what failed, with screenshots) — not just a loose description.
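To make "a CSV of what was done" concrete, here is a small Python sketch of a per-invoice result record and its report. The field names, the `InvoiceResult` type, and the `make_filename` and `to_csv` helpers are illustrative assumptions, not a fixed schema.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import List


@dataclass
class InvoiceResult:
    vendor: str
    invoice_date: str      # e.g. "2025-01-15"
    amount: str            # kept as text so the report shows it exactly as read
    filename: str
    status: str            # "ok" or "failed"
    screenshot: str = ""   # path to a screenshot, filled in for failures


def make_filename(vendor: str, invoice_date: str, amount: str) -> str:
    """Build a vendor+date+amount filename, replacing unsafe characters in the vendor."""
    safe_vendor = "".join(c if c.isalnum() else "-" for c in vendor).strip("-")
    return f"{safe_vendor}_{invoice_date}_{amount}.pdf"


def to_csv(results: List[InvoiceResult]) -> str:
    """Render all results, successes and failures alike, as CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(InvoiceResult)])
    writer.writeheader()
    for result in results:
        writer.writerow(asdict(result))
    return buf.getvalue()
```

Failed rows stay in the same report as successful ones, so a reviewer sees the whole run in one place.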

Robust to UI changes. When the portal adds a new field or renames a button, the agent adapts. That’s the whole reason model-driven automation beats scripted automation — the script breaks, the agent re-reads the page.

Checkpointing and recovery. Long flows (50-step forms, multi-page wizards) resume from the last good state if anything fails. You don’t rerun the whole thing.
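One simple way to get this resume behaviour is to persist the index of the next step plus whatever data has been collected so far. The Python sketch below assumes each step is a plain function that takes and returns a dict. `load_checkpoint`, `save_checkpoint`, and `run_flow` are hypothetical helper names, and restoring the browser session itself (cookies, current page) is out of scope here.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List

Step = Callable[[Dict], Dict]


def load_checkpoint(path: str) -> Dict:
    """Return the saved state, or a fresh one if no checkpoint exists yet."""
    p = Path(path)
    if p.exists():
        return json.loads(p.read_text())
    return {"next_step": 0, "data": {}}


def save_checkpoint(path: str, state: Dict) -> None:
    """Write to a temp file, then rename, so a crash never leaves a half-written file."""
    p = Path(path)
    tmp = p.with_name(p.name + ".tmp")
    tmp.write_text(json.dumps(state))
    tmp.replace(p)


def run_flow(steps: List[Step], path: str) -> Dict:
    """Run steps in order, starting from the last good state recorded at `path`."""
    state = load_checkpoint(path)
    for i in range(state["next_step"], len(steps)):
        state["data"] = steps[i](state["data"])  # may raise; the checkpoint is unchanged
        state["next_step"] = i + 1
        save_checkpoint(path, state)
    return state["data"]
```

If step 37 of 50 raises, calling `run_flow` again with the same checkpoint path starts at step 37 rather than step 1.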

Bounded and auditable. The agent only navigates domains you allow-list. Every click, every keystroke, every extracted value is logged with the screenshot context. Sensitive fields (passwords, SSNs, card numbers) are masked in the trace.
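The two guardrails above, a domain allow-list and masked traces, can be sketched in a few lines of Python. The domain names, the sensitive field names, and the `is_allowed` and `mask_event` helpers are placeholders for illustration. A real deployment would likely cover more patterns (card numbers, for instance) than the single SSN regex shown.

```python
import re
from typing import Any, Dict, Set
from urllib.parse import urlsplit

ALLOWED_DOMAINS = {"portal.example.com", "example.org"}   # hypothetical allow-list
SENSITIVE_FIELDS = {"password", "ssn", "card_number"}      # hypothetical field names
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def is_allowed(url: str, allowed: Set[str] = ALLOWED_DOMAINS) -> bool:
    """True if the URL's host is an allow-listed domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in allowed)


def mask_event(event: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a trace event that is safe to write to the log."""
    masked: Dict[str, Any] = {}
    for key, value in event.items():
        if key in SENSITIVE_FIELDS:
            masked[key] = "***"                              # drop the value entirely
        elif isinstance(value, str):
            masked[key] = SSN_RE.sub("***-**-****", value)   # scrub SSNs in free text
        else:
            masked[key] = value
    return masked
```

Matching on the parsed hostname, not on a substring of the URL, is what keeps a lookalike such as `example.org.evil.net` off the list.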

Where it beats RPA tools

RPA tools (UiPath, Automation Anywhere) need you to script every click, and when a page changes, the bot breaks. A browser agent reads the page on every run and figures out the path. Maintenance cost can drop by an order of magnitude, and activities that were economically unviable to automate become worth automating.

Where it fits

Operations teams doing repetitive portal work — supplier invoice downloads, insurance submissions, compliance form filing, employee onboarding across SaaS tools that don’t have APIs. Research work that requires pulling structured data from many web sources. Any workflow you’ve wanted to automate but decided was too brittle to bother with.

Agents are deployed either as a scheduled job (a nightly invoice pull) or on demand via API (triggered when a new order lands). A pilot scopes one flow end-to-end in 2-3 weeks; expansion follows.

