Browser Automation

> 📖 This document is translated from the official Hermes Agent documentation.
> Last updated: 2026-04-16


Hermes Agent includes a full browser automation toolset with multiple backend options:

  • Browserbase cloud mode for managed cloud browsers and anti-bot tooling
  • Browser Use cloud mode as an alternative cloud browser provider
  • Firecrawl cloud mode for cloud browsers with built-in scraping
  • Camofox local mode for local anti-detection browsing (Firefox-based fingerprint spoofing)
  • Local Chrome via CDP — connect browser tools to your own Chrome instance using /browser connect
  • Local browser mode via the agent-browser CLI and a local Chromium installation

In all modes, the agent can navigate websites, interact with page elements, fill forms, and extract information.

Overview

Pages are represented as accessibility trees (text-based snapshots), making them ideal for LLM agents. Interactive elements get ref IDs (like @e1, @e2) that the agent uses for clicking and typing.
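To make the ref-ID idea concrete, here is a minimal Python sketch that pulls ref IDs out of a snapshot. The snapshot text below is a hypothetical layout invented for illustration; the real output of browser_snapshot may be formatted differently. Only the `@e1`, `@e2` ref style comes from this document.

```python
import re
from typing import List, Optional

# Hypothetical snapshot text; the real browser_snapshot layout may differ.
SNAPSHOT = """\
- heading "Welcome back"
- textbox "Email" [ref=@e1]
- textbox "Password" [ref=@e2]
- button "Sign In" [ref=@e3]
"""

REF_PATTERN = re.compile(r"@e\d+")


def extract_refs(snapshot: str) -> List[str]:
    """Return ref IDs in order of first appearance, without duplicates."""
    seen = {}
    for ref in REF_PATTERN.findall(snapshot):
        seen.setdefault(ref, None)
    return list(seen)


def find_ref(snapshot: str, label: str) -> Optional[str]:
    """Return the ref on the first line that mentions `label`, if any."""
    for line in snapshot.splitlines():
        if label in line:
            match = REF_PATTERN.search(line)
            if match:
                return match.group(0)
    return None
```

With this sample, `find_ref(SNAPSHOT, "Sign In")` yields the ref that a later browser_click call would target.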

Key capabilities:

  • Multi-provider cloud execution — Browserbase, Browser Use, or Firecrawl — no local browser needed
  • Local Chrome integration — attach to your running Chrome via CDP for hands-on browsing
  • Built-in stealth — random fingerprints, CAPTCHA solving, residential proxies (Browserbase)
  • Session isolation — each task gets its own browser session
  • Automatic cleanup — inactive sessions are closed after a timeout
  • Vision analysis — screenshot + AI analysis for visual understanding

Setup

Browserbase cloud mode

To use Browserbase-managed cloud browsers, add:

# Add to ~/.hermes/.env
BROWSERBASE_API_KEY=***
BROWSERBASE_PROJECT_ID=your-project-id-here

Get your credentials at browserbase.com.

Browser Use cloud mode

To use Browser Use as your cloud browser provider, add:

# Add to ~/.hermes/.env
BROWSER_USE_API_KEY=***

Get your API key at browser-use.com. Browser Use provides a cloud browser via its REST API. If both Browserbase and Browser Use credentials are set, Browserbase takes priority.

Firecrawl cloud mode

To use Firecrawl as your cloud browser provider, add:

# Add to ~/.hermes/.env
FIRECRAWL_API_KEY=fc-***

Get your API key at firecrawl.dev. Then select Firecrawl as your browser provider:

hermes setup tools
# → Browser Automation → Firecrawl

Optional settings:

# Self-hosted Firecrawl instance (default: https://api.firecrawl.dev)
FIRECRAWL_API_URL=http://localhost:3002

# Session TTL in seconds (default: 300)
FIRECRAWL_BROWSER_TTL=600

Camofox local mode

Camofox is a self-hosted Node.js server wrapping Camoufox (a Firefox fork with C++ fingerprint spoofing). It provides local anti-detection browsing without cloud dependencies.

# Install and run
git clone https://github.com/jo-inc/camofox-browser && cd camofox-browser
npm install && npm start   # downloads Camoufox (~300MB) on first run

# Or via Docker
docker run -d --network host -e CAMOFOX_PORT=9377 jo-inc/camofox-browser

Then set in ~/.hermes/.env:

CAMOFOX_URL=http://localhost:9377

Or configure via hermes tools → Browser Automation → Camofox.

When CAMOFOX_URL is set, all browser tools automatically route through Camofox instead of Browserbase or agent-browser.

Persistent browser sessions

By default, each Camofox session gets a random identity — cookies and logins don't survive across agent restarts. To enable persistent browser sessions:

# In ~/.hermes/config.yaml
browser:
  camofox:
    managed_persistence: true

When enabled, Hermes sends a stable profile-scoped identity to Camofox. The Camofox server maps this identity to a persistent browser profile directory, so cookies, logins, and localStorage survive across restarts. Different Hermes profiles get different browser profiles (profile isolation).

VNC live view

When Camofox runs in headed mode (with a visible browser window), it exposes a VNC port in its health check response. Hermes automatically discovers this and includes the VNC URL in navigation responses, so the agent can share a link for you to watch the browser live.

Local Chrome via CDP (/browser connect)

Instead of a cloud provider, you can attach Hermes browser tools to your own running Chrome instance via the Chrome DevTools Protocol (CDP). This is useful when you want to see what the agent is doing in real-time, interact with pages that require your own cookies/sessions, or avoid cloud browser costs.

In the CLI, use:

/browser connect              # Connect to Chrome at ws://localhost:9222
/browser connect ws://host:port  # Connect to a specific CDP endpoint
/browser status               # Check current connection
/browser disconnect            # Detach and return to cloud/local mode

If Chrome isn't already running with remote debugging, Hermes will attempt to auto-launch it with --remote-debugging-port=9222.
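If you want to check for yourself whether Chrome is already listening before connecting, Chrome's remote-debugging port serves a small JSON document at `/json/version` that includes a `webSocketDebuggerUrl` field. The Python sketch below probes that endpoint. It is an independent illustration, not how Hermes performs its own check; the function names are hypothetical.

```python
import json
import urllib.request
from typing import Optional


def version_url(host: str = "localhost", port: int = 9222) -> str:
    """Build the URL of Chrome's remote-debugging version endpoint."""
    return f"http://{host}:{port}/json/version"


def websocket_url_from_payload(payload: str) -> Optional[str]:
    """Extract the browser-level WebSocket URL from a /json/version response."""
    data = json.loads(payload)
    if not isinstance(data, dict):
        return None
    return data.get("webSocketDebuggerUrl")


def probe_cdp(host: str = "localhost", port: int = 9222,
              timeout: float = 2.0) -> Optional[str]:
    """Return the WebSocket URL if Chrome answers on the port, else None."""
    try:
        with urllib.request.urlopen(version_url(host, port), timeout=timeout) as resp:
            return websocket_url_from_payload(resp.read().decode("utf-8"))
    except (OSError, ValueError):
        # OSError covers refused connections and timeouts;
        # ValueError covers a response that is not valid JSON.
        return None
```

A return value of `None` from `probe_cdp()` suggests Chrome is not running with `--remote-debugging-port=9222` on that host.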

When connected via CDP, all browser tools (browser_navigate, browser_click, etc.) operate on your live Chrome instance instead of spinning up a cloud session.

Local browser mode

If you do not set any cloud credentials and don't use /browser connect, Hermes can still use the browser tools through a local Chromium install driven by agent-browser.
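The selection rules described so far can be summarized in a short Python sketch. This is a reading of the prose above, not Hermes's actual resolver. The ordering between a CDP connection and Camofox is an assumption, as is requiring both Browserbase variables. Firecrawl is left out because this document says it is selected through `hermes setup tools` rather than by credentials alone.

```python
from typing import Mapping


def resolve_backend(env: Mapping[str, str], cdp_connected: bool = False) -> str:
    """Pick a browser backend from the rules stated in this document.

    Assumptions: an active /browser connect session wins over everything
    else, and Browserbase needs both its API key and project ID.
    """
    if cdp_connected:
        return "cdp"
    if env.get("CAMOFOX_URL"):
        # Stated: Camofox replaces Browserbase and agent-browser when set.
        return "camofox"
    if env.get("BROWSERBASE_API_KEY") and env.get("BROWSERBASE_PROJECT_ID"):
        # Stated: Browserbase takes priority over Browser Use.
        return "browserbase"
    if env.get("BROWSER_USE_API_KEY"):
        return "browser-use"
    # Stated: with no cloud credentials, fall back to local Chromium.
    return "local"
```

Passing a plain dictionary instead of `os.environ` keeps the function easy to test with different credential combinations.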

Optional Environment Variables

# Residential proxies for better CAPTCHA solving (default: "true")
BROWSERBASE_PROXIES=true

# Advanced stealth with custom Chromium — requires Scale Plan (default: "false")
BROWSERBASE_ADVANCED_STEALTH=false

# Session reconnection after disconnects — requires paid plan (default: "true")
BROWSERBASE_KEEP_ALIVE=true

# Custom session timeout in milliseconds (default: project default)
# Examples: 600000 (10min), 1800000 (30min)
BROWSERBASE_SESSION_TIMEOUT=600000

# Inactivity timeout before auto-cleanup in seconds (default: 120)
BROWSER_INACTIVITY_TIMEOUT=120
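As an illustration of how these variables and their defaults fit together, the following Python sketch reads them from a mapping. The class and function names are hypothetical, and treating only the literal string "true" as enabled is an assumption. The defaults are the ones listed above.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class BrowserbaseOptions:
    proxies: bool
    advanced_stealth: bool
    keep_alive: bool
    session_timeout_ms: Optional[int]  # None means "use the project default"
    inactivity_timeout_s: int


def _flag(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Read a "true"/"false" variable; unset or empty falls back to the default."""
    raw = env.get(name, "").strip().lower()
    if not raw:
        return default
    return raw == "true"


def load_options(env: Mapping[str, str]) -> BrowserbaseOptions:
    """Collect the optional settings, applying the documented defaults."""
    timeout_raw = env.get("BROWSERBASE_SESSION_TIMEOUT", "").strip()
    inactivity_raw = env.get("BROWSER_INACTIVITY_TIMEOUT", "").strip()
    return BrowserbaseOptions(
        proxies=_flag(env, "BROWSERBASE_PROXIES", True),
        advanced_stealth=_flag(env, "BROWSERBASE_ADVANCED_STEALTH", False),
        keep_alive=_flag(env, "BROWSERBASE_KEEP_ALIVE", True),
        session_timeout_ms=int(timeout_raw) if timeout_raw else None,
        inactivity_timeout_s=int(inactivity_raw) if inactivity_raw else 120,
    )
```

Note the unit mismatch that the sketch preserves: the session timeout is in milliseconds, while the inactivity timeout is in seconds.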

Install agent-browser CLI

npm install -g agent-browser
# Or install locally in the repo:
npm install

Available Tools

browser_navigate

Navigate to a URL. Must be called before any other browser tool. Initializes the browser session (a Browserbase session in Browserbase cloud mode).

Navigate to https://github.com/NousResearch

browser_snapshot

Get a text-based snapshot of the current page's accessibility tree. Returns interactive elements with ref IDs like @e1, @e2 for use with browser_click and browser_type.

  • full=false (default): Compact view showing only interactive elements
  • full=true: Complete page content

Snapshots over 8000 characters are automatically summarized by an LLM.
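The size rule can be pictured as a simple threshold check. In this Python sketch the `summarize` callable is a stand-in for the LLM call, whose real interface is not described in this document. Only the 8000-character limit comes from the text.

```python
from typing import Callable

SNAPSHOT_CHAR_LIMIT = 8000  # threshold stated in the text


def prepare_snapshot(snapshot: str,
                     summarize: Callable[[str], str],
                     limit: int = SNAPSHOT_CHAR_LIMIT) -> str:
    """Return the snapshot unchanged unless it is longer than `limit`."""
    if len(snapshot) <= limit:
        return snapshot
    # Hypothetical hook: in Hermes this step is performed by an LLM.
    return summarize(snapshot)
```

A snapshot of exactly 8000 characters passes through untouched under this reading of "over 8000".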

browser_click

Click an element identified by its ref ID from the snapshot.

Click @e5 to press the "Sign In" button

browser_type

Type text into an input field. Clears the field first, then types the new text.

Type "hermes agent" into the search field @e3

browser_scroll

Scroll the page up or down to reveal more content.

Scroll down to see more results

browser_press

Press a keyboard key. Useful for submitting forms or navigation.

Press Enter to submit the form

Supported keys: Enter, Tab, Escape, ArrowDown, ArrowUp, and more.

browser_back

Navigate back to the previous page in browser history.

browser_get_images

List all images on the current page with their URLs and alt text. Useful for finding images to analyze.

browser_vision

Take a screenshot and analyze it with vision AI. Use this when text snapshots don't capture important visual information — especially useful for CAPTCHAs, complex layouts, or visual verification challenges.

The screenshot is saved persistently and the file path is returned alongside the AI analysis. On messaging platforms (Telegram, Discord, Slack, WhatsApp), you can ask the agent to share the screenshot — it will be sent as a native photo attachment via the MEDIA: mechanism.

What does the chart on this page show?

Screenshots are stored in ~/.hermes/cache/screenshots/ and automatically cleaned up after 24 hours.
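For intuition, age-based cleanup of this kind can be sketched in a few lines of Python. This is not Hermes's implementation. Using the file modification time as the age, and the function name itself, are assumptions made for illustration.

```python
import time
from pathlib import Path
from typing import List, Optional

MAX_AGE_SECONDS = 24 * 60 * 60  # 24 hours, as stated in the text


def remove_stale_files(directory: Path,
                       max_age_seconds: float = MAX_AGE_SECONDS,
                       now: Optional[float] = None) -> List[Path]:
    """Delete regular files older than `max_age_seconds`; return what was removed."""
    current = time.time() if now is None else now
    removed: List[Path] = []
    if not directory.is_dir():
        return removed
    for path in sorted(directory.iterdir()):
        # Age is measured from the last modification time (an assumption).
        if path.is_file() and current - path.stat().st_mtime > max_age_seconds:
            path.unlink()
            removed.append(path)
    return removed
```

Pointing it at `Path.home() / ".hermes" / "cache" / "screenshots"` would mimic the documented behavior, though Hermes already performs this cleanup on its own.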

browser_console

Get browser console output (log/warn/error messages) and uncaught JavaScript exceptions from the current page. Essential for detecting silent JS errors that don't appear in the accessibility tree.

Check the browser console for any JavaScript errors

Use clear=True to clear the console after reading, so subsequent calls only show new messages.

Practical Examples

Filling Out a Web Form

User: Sign up for an account on example.com with my email john@example.com

Agent workflow:
1. browser_navigate("https://example.com/signup")
2. browser_snapshot()  → sees form fields with refs
3. browser_type(ref="@e3", text="john@example.com")
4. browser_type(ref="@e5", text="SecurePass123")
5. browser_click(ref="@e8")  → clicks "Create Account"
6. browser_snapshot()  → confirms success
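The same sequence can be written down as data, which is handy for reasoning about ordering rules such as "browser_navigate must come first". The Python sketch below only records the calls and does not drive a browser. The recorder class and the `url` argument name are hypothetical; the tool names, refs, and `ref`/`text` arguments are taken from the workflow above.

```python
from typing import Any, Dict, List, Tuple

ToolCall = Tuple[str, Dict[str, Any]]


class ToolCallRecorder:
    """Hypothetical stand-in that records tool calls instead of running them."""

    def __init__(self) -> None:
        self.calls: List[ToolCall] = []

    def call(self, tool: str, **arguments: Any) -> None:
        self.calls.append((tool, arguments))


def signup_plan(email: str, password: str) -> List[ToolCall]:
    """Reproduce the six-step signup workflow as a list of tool calls."""
    tools = ToolCallRecorder()
    tools.call("browser_navigate", url="https://example.com/signup")
    tools.call("browser_snapshot")                      # read form field refs
    tools.call("browser_type", ref="@e3", text=email)
    tools.call("browser_type", ref="@e5", text=password)
    tools.call("browser_click", ref="@e8")              # "Create Account"
    tools.call("browser_snapshot")                      # confirm the result
    return tools.calls


def starts_with_navigate(calls: List[ToolCall]) -> bool:
    """Check the documented rule that browser_navigate is called first."""
    return bool(calls) and calls[0][0] == "browser_navigate"
```

In a real run the refs are not known in advance; the agent reads them from the snapshot in step 2 before typing.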

Researching Dynamic Content

User: What are the top trending repos on GitHub right now?

Agent workflow:
1. browser_navigate("https://github.com/trending")
2. browser_snapshot(full=true)  → reads trending repo list
3. Returns formatted results

Session Recording

Automatically record browser sessions as WebM video files:

browser:
  record_sessions: true  # default: false

When enabled, recording starts automatically on the first browser_navigate and saves to ~/.hermes/browser_recordings/ when the session closes. Works in both local and cloud (Browserbase) modes. Recordings older than 72 hours are automatically cleaned up.

Stealth Features

Browserbase provides automatic stealth capabilities:

| Feature | Default | Notes |
| --- | --- | --- |
| Basic Stealth | Always on | Random fingerprints, viewport randomization, CAPTCHA solving |
| Residential Proxies | On | Routes through residential IPs for better access |
| Advanced Stealth | Off | Custom Chromium build, requires Scale Plan |
| Keep Alive | On | Session reconnection after network hiccups |

Session Management

  • Each task gets an isolated browser session via Browserbase
  • Sessions are automatically cleaned up after inactivity (default: 2 minutes)
  • A background thread checks every 30 seconds for stale sessions
  • Emergency cleanup runs on process exit to prevent orphaned sessions
  • Sessions are released via the Browserbase API (REQUEST_RELEASE status)
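The inactivity rule above can be modeled with a small registry that tracks the last activity per task. This Python sketch is a simplified illustration, not Hermes's code. The class name is hypothetical, the background thread is omitted, and the clock is injectable so the logic can be exercised without waiting.

```python
import time
from typing import Callable, Dict, List

INACTIVITY_TIMEOUT_SECONDS = 120  # default from the text (2 minutes)
CHECK_INTERVAL_SECONDS = 30       # how often the background check runs


class SessionRegistry:
    """Track last activity per task and report sessions that went stale."""

    def __init__(self,
                 timeout: float = INACTIVITY_TIMEOUT_SECONDS,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._timeout = timeout
        self._clock = clock
        self._last_activity: Dict[str, float] = {}

    def touch(self, task_id: str) -> None:
        """Record activity for a task, creating its session entry if needed."""
        self._last_activity[task_id] = self._clock()

    def reap_stale(self) -> List[str]:
        """Drop and return the task IDs idle for longer than the timeout."""
        now = self._clock()
        stale = sorted(task for task, last in self._last_activity.items()
                       if now - last > self._timeout)
        for task in stale:
            del self._last_activity[task]
        return stale

    def active(self) -> List[str]:
        return sorted(self._last_activity)
```

In a long-running process, a background thread would call `reap_stale()` every `CHECK_INTERVAL_SECONDS` and release each returned session with the provider.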

Limitations

  • Text-based interaction — relies on accessibility tree, not pixel coordinates
  • Snapshot size — large pages may be truncated or LLM-summarized at 8000 characters
  • Session timeout — cloud sessions expire based on your provider's plan settings
  • Cost — cloud sessions consume provider credits; sessions are automatically cleaned up when the conversation ends or after inactivity. Use /browser connect for free local browsing.
  • No file downloads — cannot download files from the browser
