API Server

> 📖 This document is translated from the official Hermes Agent documentation.
> Last updated: 2026-04-16

The API server exposes hermes-agent as an OpenAI-compatible HTTP endpoint. Any frontend that speaks the OpenAI format — Open WebUI, LobeChat, LibreChat, NextChat, ChatBox, and hundreds more — can connect to hermes-agent and use it as a backend.

Your agent handles requests with its full toolset (terminal, file operations, web search, memory, skills) and returns the final response. When streaming, tool progress indicators appear inline so frontends can show what the agent is doing.

Quick Start

1. Enable the API server

Add to ~/.hermes/.env:

API_SERVER_ENABLED=true
API_SERVER_KEY=change-me-local-dev
# Optional: only if a browser must call Hermes directly
# API_SERVER_CORS_ORIGINS=http://localhost:3000

2. Start the gateway

hermes gateway

You'll see:

[API Server] API server listening on http://127.0.0.1:8642

3. Connect a frontend

Point any OpenAI-compatible client at http://localhost:8642/v1:

# Test with curl
curl http://localhost:8642/v1/chat/completions \
  -H "Authorization: Bearer change-me-local-dev" \
  -H "Content-Type: application/json" \
  -d '{"model": "hermes-agent", "messages": [{"role": "user", "content": "Hello!"}]}'

Or connect Open WebUI, LobeChat, or any other frontend — see the Open WebUI integration guide for step-by-step instructions.
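The curl test above can also be issued from Python using only the standard library. This is a minimal sketch, assuming the default host, port, and example key from the quick start; `build_chat_request` and `send` are illustrative helper names, not part of Hermes.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8642/v1"   # default host and port from the quick start
API_KEY = "change-me-local-dev"         # must match API_SERVER_KEY in ~/.hermes/.env


def build_chat_request(prompt, base_url=BASE_URL, api_key=API_KEY):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": "hermes-agent",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=300):
    """Send the request and decode the JSON body (needs a running gateway)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the gateway running, `send(build_chat_request("Hello!"))` should return the parsed completion. The generous timeout is a guess, on the assumption that agent runs involving tools can take a while.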

Endpoints

POST /v1/chat/completions

Standard OpenAI Chat Completions format. Stateless — the full conversation is included in each request via the messages array.

Request:

{
  "model": "hermes-agent",
  "messages": [
    {"role": "system", "content": "You are a Python expert."},
    {"role": "user", "content": "Write a fibonacci function"}
  ],
  "stream": false
}

Response:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1710000000,
  "model": "hermes-agent",
  "choices": [{
    "index": 0,
    "message": {"role": "assistant", "content": "Here's a fibonacci function..."},
    "finish_reason": "stop"
  }],
  "usage": {"prompt_tokens": 50, "completion_tokens": 200, "total_tokens": 250}
}
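Reading the reply out of that response takes a few dictionary lookups. The sketch below parses the sample response shown above; `summarize_completion` is an illustrative name, and the field layout is taken from the sample rather than from a schema.

```python
import json

SAMPLE_RESPONSE = """
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1710000000,
  "model": "hermes-agent",
  "choices": [{
    "index": 0,
    "message": {"role": "assistant", "content": "Here's a fibonacci function..."},
    "finish_reason": "stop"
  }],
  "usage": {"prompt_tokens": 50, "completion_tokens": 200, "total_tokens": 250}
}
"""


def summarize_completion(completion):
    """Return (assistant text, finish reason, total tokens) from a completion dict."""
    choice = completion["choices"][0]
    usage = completion.get("usage", {})
    return (
        choice["message"]["content"],
        choice.get("finish_reason"),
        usage.get("total_tokens"),
    )


text, finish_reason, total_tokens = summarize_completion(json.loads(SAMPLE_RESPONSE))
```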

Streaming ("stream": true): Returns Server-Sent Events (SSE) with token-by-token response chunks. For Chat Completions, the stream uses standard chat.completion.chunk events plus Hermes' custom hermes.tool.progress event for tool-start UX. For Responses, the stream uses OpenAI Responses event types such as response.created, response.output_text.delta, response.output_item.added, response.output_item.done, and response.completed.

Tool progress in streams:

Chat Completions: Hermes emits event: hermes.tool.progress for tool-start visibility without polluting persisted assistant text.
Responses: Hermes emits spec-native function_call and function_call_output output items during the SSE stream, so clients can render structured tool UI in real time.
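A Chat Completions client has to separate the named `hermes.tool.progress` events from the ordinary chunk events. The sketch below is a small SSE line parser plus a collector. It makes several assumptions: chunks follow the standard OpenAI `choices[0].delta.content` shape, the stream may end with the OpenAI-style `[DONE]` sentinel, and the tool progress payload (whose structure is not documented here) is kept as a raw string. The function names are illustrative.

```python
import json


def _field_value(line, name):
    """Return the value of an SSE field line, dropping one optional leading space."""
    value = line[len(name) + 1:]
    return value[1:] if value.startswith(" ") else value


def iter_sse_events(lines):
    """Yield (event_name, data) pairs from an iterable of decoded SSE lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":                       # a blank line terminates one event
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):           # comment or keep-alive line
            continue
        elif line.startswith("event:"):
            event = _field_value(line, "event")
        elif line.startswith("data:"):
            data.append(_field_value(line, "data"))
    if data:                                 # stream ended without a trailing blank line
        yield event, "\n".join(data)


def collect_text(lines):
    """Return (assistant text, raw tool progress payloads) from a chat stream."""
    text, tool_events = [], []
    for event, data in iter_sse_events(lines):
        if event == "hermes.tool.progress":
            tool_events.append(data)         # payload shape is an unknown; keep it raw
        elif data == "[DONE]":               # assumed OpenAI-style end marker
            break
        else:
            choices = json.loads(data).get("choices") or []
            if choices:
                text.append(choices[0].get("delta", {}).get("content") or "")
    return "".join(text), tool_events
```

Keeping tool progress out of the text accumulator mirrors the stated goal of not polluting the persisted assistant text.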

POST /v1/responses

OpenAI Responses API format. Supports server-side conversation state via previous_response_id — the server stores full conversation history (including tool calls and results) so multi-turn context is preserved without the client managing it.

Request:

{
  "model": "hermes-agent",
  "input": "What files are in my project?",
  "instructions": "You are a helpful coding assistant.",
  "store": true
}

Response:

{
  "id": "resp_abc123",
  "object": "response",
  "status": "completed",
  "model": "hermes-agent",
  "output": [
    {"type": "function_call", "name": "terminal", "arguments": "{\"command\": \"ls\"}", "call_id": "call_1"},
    {"type": "function_call_output", "call_id": "call_1", "output": "README.md src/ tests/"},
    {"type": "message", "role": "assistant", "content": [{"type": "output_text", "text": "Your project has..."}]}
  ],
  "usage": {"input_tokens": 50, "output_tokens": 200, "total_tokens": 250}
}
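Because the `output` array interleaves tool calls, tool results, and the final message, a client usually wants to pair each call with its result. The sketch below does that for the sample response above; `summarize_output` is an illustrative name and the item fields are taken from the sample.

```python
import json


def summarize_output(response):
    """Return (tool_calls, final_text) from a Responses API result dict.

    Each tool call is a dict with name, arguments (decoded JSON) and output,
    matched to its result through call_id.
    """
    calls = {}
    text_parts = []
    for item in response.get("output", []):
        kind = item.get("type")
        if kind == "function_call":
            calls[item["call_id"]] = {
                "name": item["name"],
                "arguments": json.loads(item["arguments"]),
                "output": None,
            }
        elif kind == "function_call_output":
            entry = calls.setdefault(
                item["call_id"], {"name": None, "arguments": None, "output": None}
            )
            entry["output"] = item["output"]
        elif kind == "message":
            for part in item.get("content", []):
                if part.get("type") == "output_text":
                    text_parts.append(part["text"])
    return list(calls.values()), "".join(text_parts)
```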

Multi-turn with previous_response_id

Chain responses to maintain full context (including tool calls) across turns:

{
  "input": "Now show me the README",
  "previous_response_id": "resp_abc123"
}

The server reconstructs the full conversation from the stored response chain — all previous tool calls and results are preserved. Chained requests also share the same session, so multi-turn conversations appear as a single entry in the dashboard and session history.
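Reconstruction from a response chain can be pictured as walking `previous_response_id` links back to the first turn. The following is a toy model of that idea, not Hermes' implementation: the `store` dict and its record layout are hypothetical stand-ins for whatever the server actually persists.

```python
def reconstruct_chain(store, response_id):
    """Return stored records from the first turn to response_id, oldest first.

    store maps response IDs to records that carry an optional
    previous_response_id. A missing link raises KeyError.
    """
    chain, seen = [], set()
    current = response_id
    while current is not None:
        if current in seen:                  # guard against a corrupted, cyclic chain
            raise ValueError(f"cycle detected at {current}")
        seen.add(current)
        record = store[current]
        chain.append(record)
        current = record.get("previous_response_id")
    chain.reverse()                          # oldest turn first
    return chain
```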

Named conversations

Use the conversation parameter instead of tracking response IDs:

{"input": "Hello", "conversation": "my-project"}
{"input": "What's in src/?", "conversation": "my-project"}
{"input": "Run the tests", "conversation": "my-project"}

The server automatically chains each request to the latest response in that conversation. This works like the /title command for gateway sessions.
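A small payload builder keeps the two continuation styles apart on the client side. This is a sketch: `responses_payload` is an illustrative helper, and refusing to send both fields at once is a conservative choice made here because the text does not say how the server treats that combination.

```python
def responses_payload(user_input, *, conversation=None, previous_response_id=None,
                      instructions=None, store=True):
    """Build a request body for POST /v1/responses."""
    if conversation is not None and previous_response_id is not None:
        # Combined behaviour is not specified in the documentation; pick one.
        raise ValueError("use either conversation or previous_response_id")
    payload = {"model": "hermes-agent", "input": user_input, "store": store}
    if instructions is not None:
        payload["instructions"] = instructions
    if conversation is not None:
        payload["conversation"] = conversation
    if previous_response_id is not None:
        payload["previous_response_id"] = previous_response_id
    return payload


# Three turns in one named conversation, mirroring the example above.
turns = [responses_payload(text, conversation="my-project")
         for text in ("Hello", "What's in src/?", "Run the tests")]
```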

GET /v1/responses/{id}

Retrieve a previously stored response by ID.

DELETE /v1/responses/{id}

Delete a stored response.

GET /v1/models

Lists the agent as an available model. The advertised model name defaults to the profile name (or hermes-agent for the default profile). Required by most frontends for model discovery.
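Model discovery then reduces to reading the IDs from the list. The response body is not shown in this document, so the sketch below assumes the standard OpenAI list shape (`{"object": "list", "data": [{"id": ...}]}`); `model_ids` is an illustrative name.

```python
def model_ids(models_response):
    """Return the model IDs from a /v1/models response (assumed OpenAI list shape)."""
    return [entry["id"] for entry in models_response.get("data", []) if "id" in entry]
```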

GET /health

Health check. Returns {"status": "ok"}. Also available at GET /v1/health for OpenAI-compatible clients that expect the /v1/ prefix.

System Prompt Handling

When a frontend sends a system message (Chat Completions) or instructions field (Responses API), hermes-agent layers it on top of its core system prompt. Your agent keeps all its tools, memory, and skills — the frontend's system prompt adds extra instructions.

This means you can customize behavior per-frontend without losing capabilities:

Open WebUI system prompt: "You are a Python expert. Always include type hints."
The agent still has terminal, file tools, web search, memory, etc.
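The same extra instruction travels in a different field depending on the endpoint. The sketch below builds both request bodies side by side; `with_extra_instructions` is an illustrative helper, and how Hermes merges the text with its core prompt happens server-side and is not modeled here.

```python
def with_extra_instructions(prompt, extra):
    """Return (chat_completions_body, responses_body) carrying the same extra instruction."""
    chat_body = {
        "model": "hermes-agent",
        "messages": [
            {"role": "system", "content": extra},   # Chat Completions: system message
            {"role": "user", "content": prompt},
        ],
    }
    responses_body = {
        "model": "hermes-agent",
        "input": prompt,
        "instructions": extra,                       # Responses API: instructions field
    }
    return chat_body, responses_body


chat_body, responses_body = with_extra_instructions(
    "Write a fibonacci function",
    "You are a Python expert. Always include type hints.",
)
```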

Authentication

Bearer token auth via the Authorization header:

Authorization: Bearer ***

Configure the key via API_SERVER_KEY env var. If you need a browser to call Hermes directly, also set API_SERVER_CORS_ORIGINS to an explicit allowlist.
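For readers unfamiliar with Bearer auth, the check a server performs can be sketched as follows. This is a generic illustration, not Hermes' code: the function names are hypothetical, and the constant-time comparison is simply a common precaution for comparing secrets.

```python
import hmac


def extract_bearer(header_value):
    """Return the token from an 'Authorization: Bearer <token>' value, or None."""
    if not header_value:
        return None
    scheme, _, token = header_value.partition(" ")
    token = token.strip()
    if scheme.lower() != "bearer" or not token:
        return None
    return token


def is_authorized(header_value, expected_key):
    """Compare the presented token with the configured key in constant time."""
    token = extract_bearer(header_value)
    if token is None:
        return False
    return hmac.compare_digest(token.encode("utf-8"), expected_key.encode("utf-8"))
```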

Configuration

Environment Variables

Variable	Default	Description
API_SERVER_ENABLED	false	Enable the API server
API_SERVER_PORT	8642	HTTP server port
API_SERVER_HOST	127.0.0.1	Bind address (localhost only by default)
API_SERVER_KEY	(none)	Bearer token for auth
API_SERVER_CORS_ORIGINS	(none)	Comma-separated allowed browser origins
API_SERVER_MODEL_NAME	(profile name)	Model name on /v1/models. Defaults to the profile name, or hermes-agent for the default profile.
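Tooling that needs to know where a given Hermes instance listens can read the same variables with the defaults from the table. This is a sketch under stated assumptions: `ApiServerConfig` and `load_config` are illustrative names, and treating only the literal `true` (case-insensitive) as enabled is a guess about how the flag is parsed.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple


@dataclass(frozen=True)
class ApiServerConfig:
    enabled: bool = False
    port: int = 8642
    host: str = "127.0.0.1"
    key: Optional[str] = None
    cors_origins: Tuple[str, ...] = ()
    model_name: Optional[str] = None   # None means: fall back to the profile name


def load_config(env: Optional[Mapping[str, str]] = None) -> ApiServerConfig:
    """Read the API_SERVER_* variables, applying the documented defaults."""
    env = os.environ if env is None else env
    origins = tuple(
        origin.strip()
        for origin in env.get("API_SERVER_CORS_ORIGINS", "").split(",")
        if origin.strip()
    )
    return ApiServerConfig(
        enabled=env.get("API_SERVER_ENABLED", "false").strip().lower() == "true",
        port=int(env.get("API_SERVER_PORT", "8642")),
        host=env.get("API_SERVER_HOST", "127.0.0.1"),
        key=env.get("API_SERVER_KEY") or None,
        cors_origins=origins,
        model_name=env.get("API_SERVER_MODEL_NAME") or None,
    )
```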

config.yaml

# Not yet supported — use environment variables.
# config.yaml support coming in a future release.

Security Headers

All responses include security headers:

X-Content-Type-Options: nosniff — prevents MIME type sniffing
Referrer-Policy: no-referrer — prevents referrer leakage
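A deployment smoke test can confirm that both headers are present. The sketch below checks a plain mapping of response headers; `missing_security_headers` is an illustrative name, and header names are compared case-insensitively because HTTP header names are not case-sensitive.

```python
EXPECTED_SECURITY_HEADERS = {
    "X-Content-Type-Options": "nosniff",
    "Referrer-Policy": "no-referrer",
}


def missing_security_headers(headers):
    """Return the expected header names that are absent or carry another value."""
    normalized = {name.lower(): value.strip().lower() for name, value in headers.items()}
    return [
        name
        for name, value in EXPECTED_SECURITY_HEADERS.items()
        if normalized.get(name.lower()) != value
    ]
```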

CORS

The API server does not enable browser CORS by default.

For direct browser access, set an explicit allowlist:

API_SERVER_CORS_ORIGINS=http://localhost:3000,http://127.0.0.1:3000

When CORS is enabled:

Preflight responses include Access-Control-Max-Age: 600 (10 minute cache)
SSE streaming responses include CORS headers so browser EventSource clients work correctly
Idempotency-Key is an allowed request header — clients can send it for deduplication (responses are cached by key for 5 minutes)

Most documented frontends such as Open WebUI connect server-to-server and do not need CORS at all.
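To benefit from the deduplication mentioned in the list above, a client must reuse the same key when it retries a request. The sketch below generates one key per logical request and reuses it for retries. Using a UUID as the key is a choice made here, not a documented requirement, and `idempotent_headers` is an illustrative name.

```python
import uuid


def idempotent_headers(api_key, idempotency_key=None):
    """Return request headers, creating an Idempotency-Key if none is supplied."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "Idempotency-Key": idempotency_key or str(uuid.uuid4()),
    }


first_attempt = idempotent_headers("change-me-local-dev")
# A retry of the same logical request reuses the key from the first attempt.
retry = idempotent_headers("change-me-local-dev", first_attempt["Idempotency-Key"])
```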

Compatible Frontends

Any frontend that supports the OpenAI API format works. Tested/documented integrations:

Frontend	Stars	Connection
Open WebUI	126k	Full guide available
LobeChat	73k	Custom provider endpoint
LibreChat	34k	Custom endpoint in librechat.yaml
AnythingLLM	56k	Generic OpenAI provider
NextChat	87k	BASE_URL env var
ChatBox	39k	API Host setting
Jan	26k	Remote model config
HF Chat-UI	8k	OPENAI_BASE_URL
big-AGI	7k	Custom endpoint
OpenAI Python SDK	—	OpenAI(base_url="http://localhost:8642/v1")
curl	—	Direct HTTP requests

Multi-User Setup with Profiles

To give multiple users their own isolated Hermes instance (separate config, memory, skills), use profiles:

# Create a profile per user
hermes profile create alice
hermes profile create bob

# Configure each profile's API server on a different port
hermes -p alice config set API_SERVER_ENABLED true
hermes -p alice config set API_SERVER_PORT 8643
hermes -p alice config set API_SERVER_KEY alice-secret

hermes -p bob config set API_SERVER_ENABLED true
hermes -p bob config set API_SERVER_PORT 8644
hermes -p bob config set API_SERVER_KEY bob-secret

# Start each profile's gateway
hermes -p alice gateway &
hermes -p bob gateway &

Each profile's API server automatically advertises the profile name as the model ID:

http://localhost:8643/v1/models → model alice
http://localhost:8644/v1/models → model bob

In Open WebUI, add each as a separate connection. The model dropdown shows alice and bob as distinct models, each backed by a fully isolated Hermes instance. See the Open WebUI guide for details.
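A script that talks to several profiles needs the same mapping of profile to port and key. The sketch below hard-codes the two example profiles from the commands above; `PROFILES` and `connection_for` are illustrative names, and real keys should come from a secret store rather than source code.

```python
PROFILES = {
    "alice": {"port": 8643, "key": "alice-secret"},
    "bob": {"port": 8644, "key": "bob-secret"},
}


def connection_for(profile, host="localhost"):
    """Return base URL, API key and advertised model name for one profile."""
    settings = PROFILES[profile]
    return {
        "base_url": f"http://{host}:{settings['port']}/v1",
        "api_key": settings["key"],
        "model": profile,   # each profile advertises its own name as the model ID
    }
```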

Limitations

Response storage — stored responses (for previous_response_id) are persisted in SQLite and survive gateway restarts. Max 100 stored responses (LRU eviction).
No file upload — vision/document analysis via uploaded files is not yet supported through the API.
Model field is cosmetic — the model field in requests is accepted but the actual LLM model used is configured server-side in config.yaml.
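The storage limit above follows the usual least-recently-used pattern. The toy model below keeps everything in memory to show the eviction rule only. It is not Hermes' SQLite-backed store, and whether reads refresh an entry's recency there is an assumption.

```python
from collections import OrderedDict


class ResponseStore:
    """In-memory LRU store: the least recently used entry is evicted first."""

    def __init__(self, max_items=100):
        self.max_items = max_items
        self._items = OrderedDict()

    def put(self, response_id, response):
        self._items[response_id] = response
        self._items.move_to_end(response_id)        # newest entries live at the end
        while len(self._items) > self.max_items:
            self._items.popitem(last=False)         # drop the oldest entry

    def get(self, response_id):
        response = self._items.get(response_id)
        if response is not None:
            self._items.move_to_end(response_id)    # assumed: reads count as use
        return response

    def __len__(self):
        return len(self._items)
```

Under this model, a client that holds on to an old response ID may find it gone once enough newer responses have been stored.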

Proxy Mode

The API server also serves as the backend for gateway proxy mode. When another Hermes gateway instance is configured with GATEWAY_PROXY_URL pointing at this API server, it forwards all messages here instead of running its own agent. This enables split deployments — for example, a Docker container handling Matrix E2EE that relays to a host-side agent.

See Matrix Proxy Mode for the full setup guide.
