
Multi-Agent Swarm

When a single agent isn’t enough for a task, the swarm orchestrator decomposes the task into subtasks, fans them out to specialist agents, and synthesizes the results. Each specialist has a focused capability set so it can’t wander — the Coding agent can edit files and run shell, the Research agent can search the web and read memory, the Home agent can read IoT sensors and schedule tasks, and so on.

The mental model:

```
User: "Analyze my project's dependencies, check for CVEs, and
       write a short security report I can share with the team"

Router Agent:
│   Classify intent: "security review" → needs coding + research + synthesis
├── Subtask 1 (Coding agent): parse go.mod, list all deps
│   └── returns: { direct: [...], indirect: [...], versions: [...] }
├── Subtask 2 (Research agent): check each dep against CVE databases
│   └── returns: { vulnerable: [...], clean: [...], unknown: [...] }
└── Subtask 3 (General agent): merge results into a team-friendly
    markdown report with severity ratings

Synthesis: structured report delivered to the originating channel.
Audit:     every subagent call is logged with cost, duration, tool calls.
Budget:    total spend across all subagents ≤ configured cap.
```

Every subagent runs with its own capability set, its own memory slice, and its own budget cap. The parent agent cannot delegate an action to a subagent that the parent itself doesn’t have permission for — delegation is a restricted operation, not a capability escalation.
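The no-escalation rule above amounts to a plain set intersection: a subagent is granted only the capabilities that both the parent holds and the role requests. Here is an illustrative sketch of that check; the type and function names are assumptions, not the project's actual API:

```go
package main

import "fmt"

// CapSet is a set of capability names like "shell.exec" or "web.search".
type CapSet map[string]bool

// Delegate returns the capabilities a subagent receives: the intersection
// of what the parent holds and what the role requests. A parent can narrow
// a subagent's permissions but never widen them beyond its own.
func Delegate(parent, requested CapSet) CapSet {
	granted := CapSet{}
	for name := range requested {
		if parent[name] {
			granted[name] = true
		}
	}
	return granted
}

func main() {
	parent := CapSet{"file.edit": true, "shell.exec": true}
	// web.search is requested but the parent doesn't hold it, so it is dropped.
	fmt.Println(Delegate(parent, CapSet{"shell.exec": true, "web.search": true}))
}
```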

| Agent | Capabilities | What it does best |
| --- | --- | --- |
| Coding | `file.*`, `shell.exec`, `git.*` | File edits, builds, test runs, git operations, codebase archaeology |
| Research | `web.search`, `web.fetch`, `memory.search`, `memory.graph` | Gathering + synthesizing external and internal information |
| Home | `iot.sensor.read`, `iot.device.control`, `schedule.*` | Smart-home status, automation setup, sensor reporting |
| Creative | `image.generate`, `video.generate`, `audio.synthesize` (all beta) | Prompt optimization, content generation tasks |
| General | Safe default tool set (memory, channels, basic utilities) | Catch-all for tasks that don’t fit a specialist |

Each specialist is defined by a role spec — name, model preference, allowed tools, a system prompt — and can be added or customized via configurations.agents.custom_roles. See Configuration Reference for the full role-definition syntax.
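A custom role might look like the sketch below. The field names (`model`, `tools`, `system_prompt`) and the model identifier are illustrative assumptions; the Configuration Reference has the authoritative schema:

```yaml
configurations:
  agents:
    custom_roles:
      - name: docs-writer        # identifier the router can target
        model: claude-sonnet     # model preference for this specialist (assumed field)
        tools:                   # allowed tool whitelist
          - file.read
          - file.write
          - memory.search
        system_prompt: |
          You are a documentation specialist. Keep prose concise and
          match the project's existing style.
```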

Router agent — three-layer classification

The router decides which specialist gets each task. Three layers run in order, fastest first, to keep routing cost near-zero for the common case:

  1. Explicit directive — if the user says “ask the coding agent to review this”, the task routes directly to that specialist with no classification cost.
  2. Keyword matching — a small rule set maps common phrasings to specialists ("test" | "build" | "commit" → Coding; "weather" | "calendar" | "remind me" → Home; etc.). Zero LLM calls.
  3. LLM fallback — for ambiguous messages that don’t match either of the above, a small + cheap classifier model (Haiku by default) picks the best specialist. Only runs when the first two layers don’t fire.

This three-layer design keeps 90%+ of routing decisions free of inference cost. Only genuinely ambiguous messages pay the LLM fallback tax.
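The three layers can be pictured as an ordered fall-through. Everything in this sketch (the directive regex, the keyword table, the stand-in classifier) is invented for illustration, not the project's actual routing code:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Layer 1: explicit directive, e.g. "ask the coding agent to review this".
var directive = regexp.MustCompile(`(?i)ask the (\w+) agent`)

// Layer 2: a small keyword-to-specialist rule set. Zero LLM calls.
var keywordMap = map[string]string{
	"test": "coding", "build": "coding", "commit": "coding",
	"weather": "home", "calendar": "home", "remind me": "home",
}

// route returns the chosen specialist and whether the LLM fallback fired.
func route(msg string) (specialist string, usedLLM bool) {
	if m := directive.FindStringSubmatch(msg); m != nil {
		return strings.ToLower(m[1]), false // layer 1: free
	}
	lower := strings.ToLower(msg)
	for kw, agent := range keywordMap {
		if strings.Contains(lower, kw) {
			return agent, false // layer 2: free
		}
	}
	// Layer 3: only genuinely ambiguous messages pay for a classifier call.
	return classifyWithLLM(msg), true
}

// classifyWithLLM stands in for the cheap classifier-model call.
func classifyWithLLM(msg string) string { return "general" }

func main() {
	fmt.Println(route("ask the coding agent to review this")) // coding false
	fmt.Println(route("remind me to water the plants"))       // home false
}
```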

Subagents can publish and subscribe to typed messages on an in-process pub/sub bus. This enables coordination patterns like “the research agent found something the coding agent should know about”:

```go
// Research agent publishes a finding
bus.Publish("dep.cve-found", researchAgentID, map[string]any{
    "package":  "github.com/some/dep",
    "version":  "v1.2.3",
    "cve":      "CVE-2024-12345",
    "severity": "high",
})

// Coding agent is subscribed to dep.cve-found
// and can queue a fix task when one fires
```

The bus is scoped to a single swarm run — messages don’t leak across separate invocations. See internal/agent/bus/ for the implementation.
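For intuition, a minimal run-scoped bus with the `Publish` shape used above could look like the following toy. This is an assumption-laden sketch, not the code in internal/agent/bus/:

```go
package main

import (
	"fmt"
	"sync"
)

// Message is a typed envelope delivered to subscribers of a topic.
type Message struct {
	Topic   string
	Sender  string
	Payload map[string]any
}

// Bus is an in-process pub/sub bus. One Bus per swarm run keeps
// messages from leaking across separate invocations.
type Bus struct {
	mu   sync.Mutex
	subs map[string][]func(Message)
}

func NewBus() *Bus { return &Bus{subs: map[string][]func(Message){}} }

// Subscribe registers a handler for a topic.
func (b *Bus) Subscribe(topic string, fn func(Message)) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.subs[topic] = append(b.subs[topic], fn)
}

// Publish fans a payload out to every subscriber of the topic.
func (b *Bus) Publish(topic, sender string, payload map[string]any) {
	b.mu.Lock()
	handlers := append([]func(Message){}, b.subs[topic]...)
	b.mu.Unlock()
	for _, fn := range handlers {
		fn(Message{Topic: topic, Sender: sender, Payload: payload})
	}
}

func main() {
	bus := NewBus()
	bus.Subscribe("dep.cve-found", func(m Message) {
		fmt.Println("coding agent queues fix for", m.Payload["cve"])
	})
	bus.Publish("dep.cve-found", "research-agent", map[string]any{"cve": "CVE-2024-12345"})
}
```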

Safety controls — why swarm won’t bankrupt you

Every swarm run is bounded by five independent safety controls, each enforced outside the LLM's own decision loop:

| Control | Default | What it does |
| --- | --- | --- |
| Budget cap | $5.00 USD per run | Hard abort if total spend across all subagents exceeds this. Configurable per-run or globally. |
| Depth limit | 4 hops | Prevents unbounded A→B→C→D→E delegation chains. |
| Loop detection | Agent-chain header | Blocks A→B→A cycles before they start. |
| Tool call limit | 50 per subagent | A single subagent can’t spin on tool use forever. |
| Timeout | 5 minutes per subagent | Kills stuck subagents; the parent gets a clear timeout error. |

All five are enforced at the orchestrator level, not the agent level — which means a prompt-injected LLM cannot “talk its way” past them. The budget cap in particular is a hard arithmetic check that runs on every cost-tracked call.
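The budget check described above can be sketched as a mutex-guarded counter the orchestrator charges on every cost-tracked call. The names here are illustrative, not the project's actual API:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var ErrBudgetExceeded = errors.New("swarm budget cap exceeded")

// Budget tracks cumulative spend for one swarm run.
type Budget struct {
	mu     sync.Mutex
	capUSD float64
	spent  float64
}

// Charge records cost and returns an error once the cap is crossed.
// Because this is plain arithmetic in the orchestrator, no amount of
// prompt injection inside a subagent can talk its way past it.
func (b *Budget) Charge(costUSD float64) error {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.spent += costUSD
	if b.spent > b.capUSD {
		return ErrBudgetExceeded
	}
	return nil
}

func main() {
	b := &Budget{capUSD: 5.0}
	fmt.Println(b.Charge(4.50)) // <nil>
	fmt.Println(b.Charge(0.75)) // swarm budget cap exceeded
}
```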

```yaml
configurations:
  swarm:
    enabled: true
    max_depth: 4             # hop limit
    budget_usd: 5.0          # total $ cap per run
    workspace_ttl_hours: 24  # clean up old swarm workspaces after N hours

settings:
  agents_mode: swarm # or: auto | single | multi | custom
```

The agents_mode setting controls routing behavior:

  • single — one agent handles everything; no specialization
  • multi — parallel execution where tools can run concurrently, but no task decomposition
  • swarm — full orchestrator with decomposition + specialists + message bus
  • custom — user-defined role routing via configurations.agents.custom_roles
  • auto (default) — use single until task complexity crosses a threshold, then escalate to swarm
Swarm is still beta, and several pieces are planned but not shipped yet:

  • Swarm cookbook — verified end-to-end flows with live screenshots for the top 10 common task types (dependency audit, bug triage, code review, research report, comparative analysis, etc.)
  • Workspace UI — a live view of running swarm tasks in the web chat’s Home panel (currently only visible via the dashboard API)
  • Cross-session swarms — long-running swarms that span multiple conversations and survive restarts (today a swarm run is scoped to a single session)
  • Specialist customization UI — visual editor for custom roles, instead of hand-writing the YAML
  • External A2A peer delegation — swarm subtasks delegated to A2A-compliant peers, not just local specialists (the handler is ready, see A2A Protocol)
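Returning to agents_mode: the auto default escalates from single to swarm once task complexity crosses a threshold. The actual threshold logic is internal; for intuition only, a heuristic of that shape could look like this sketch, with signals invented for illustration:

```go
package main

import (
	"fmt"
	"strings"
)

// needsSwarm flags messages that look like multi-step, multi-domain tasks.
// Both signals below are made up for this sketch: step count is approximated
// by clause separators, domain count by a few topic keywords.
func needsSwarm(msg string) bool {
	lower := strings.ToLower(msg)
	steps := strings.Count(lower, ",") + strings.Count(lower, " and ")
	domains := 0
	for _, kw := range []string{"code", "research", "report", "deploy"} {
		if strings.Contains(lower, kw) {
			domains++
		}
	}
	return steps >= 2 && domains >= 2
}

func main() {
	fmt.Println(needsSwarm("fix this typo"))                                          // false
	fmt.Println(needsSwarm("research CVEs in my code, rate them, and write a report")) // true
}
```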

The single highest-impact contribution for swarm right now is running real task flows and filing reports. If you have a non-trivial multi-step task you’d normally split across several chat messages, try phrasing it as one message with agents_mode: swarm and see what happens. File an issue with:

  • The original message
  • The resulting decomposition (from the dashboard API’s /api/dashboard/swarm/runs endpoint)
  • What worked, what didn’t
  • What you expected vs what you got

That’s the fastest path from beta to ready. A dozen real-world reports across different task types give us the confidence to promote swarm out of beta for v0.2.