
Multi-Agent Swarm

When a single agent isn’t enough for a task, the swarm orchestrator decomposes the task into subtasks, fans them out to specialist agents, and synthesizes the results. Each specialist has a focused capability set so it can’t wander — the Coding agent can edit files and run shell, the Research agent can search the web and read memory, the Home agent can read IoT sensors and schedule tasks, and so on.

The mental model:

```
User: "Analyze my project's dependencies, check for CVEs, and
       write a short security report I can share with the team"

Router Agent:
│   Classify intent: "security review" → needs coding + research + synthesis
├── Subtask 1 (Coding agent): parse go.mod, list all deps
│   └── returns: { direct: [...], indirect: [...], versions: [...] }
├── Subtask 2 (Research agent): check each dep against CVE databases
│   └── returns: { vulnerable: [...], clean: [...], unknown: [...] }
└── Subtask 3 (General agent): merge results into a team-friendly
    markdown report with severity ratings

Synthesis: structured report delivered to the originating channel.
Audit:     every subagent call is logged with cost, duration, tool calls.
Budget:    total spend across all subagents ≤ configured cap.
```

Every subagent runs with its own capability set, its own memory slice, and its own budget cap. The parent agent cannot delegate an action to a subagent that the parent itself doesn’t have permission for — delegation is a restricted operation, not a capability escalation.
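The no-escalation rule above amounts to a plain set intersection: a subagent is granted only the capabilities that both the parent holds and the role requests. Here is an illustrative sketch of that check; the type and function names are assumptions, not the project's actual API:

```go
package main

import "fmt"

// CapSet is a set of capability names like "shell.exec" or "web.search".
type CapSet map[string]bool

// Delegate returns the capabilities a subagent receives: the intersection
// of what the parent holds and what the role requests. A parent can narrow
// a subagent's permissions but never widen them beyond its own.
func Delegate(parent, requested CapSet) CapSet {
	granted := CapSet{}
	for name := range requested {
		if parent[name] {
			granted[name] = true
		}
	}
	return granted
}

func main() {
	parent := CapSet{"file.edit": true, "shell.exec": true}
	// web.search is requested but the parent doesn't hold it, so it is dropped.
	fmt.Println(Delegate(parent, CapSet{"shell.exec": true, "web.search": true}))
}
```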

| Agent | Capabilities | What it does best |
| --- | --- | --- |
| Coding | `file.*`, `shell.exec`, `git.*` | File edits, builds, test runs, git operations, codebase archaeology |
| Research | `web.search`, `web.fetch`, `memory.search`, `memory.graph` | Gathering + synthesizing external and internal information |
| Home | `iot.sensor.read`, `iot.device.control`, `schedule.*` | Smart-home status, automation setup, sensor reporting |
| Creative | `image.generate`, `video.generate`, `audio.synthesize` (all beta) | Prompt optimization, content generation tasks |
| General | Safe default tool set (memory, channels, basic utilities) | Catch-all for tasks that don’t fit a specialist |

Each specialist is defined by a role spec — name, model preference, allowed tools, a system prompt — and can be added or customized via configurations.agents.custom_roles. See Configuration Reference for the full role-definition syntax.
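A custom role might look like the sketch below. The field names (`model`, `tools`, `system_prompt`) and the model identifier are illustrative assumptions; the Configuration Reference has the authoritative schema:

```yaml
configurations:
  agents:
    custom_roles:
      - name: docs-writer        # identifier the router can target
        model: claude-sonnet     # model preference for this specialist (assumed field)
        tools:                   # allowed tool whitelist
          - file.read
          - file.write
          - memory.search
        system_prompt: |
          You are a documentation specialist. Keep prose concise and
          match the project's existing style.
```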

Router agent — three-layer classification

The router decides which specialist gets each task. Three layers run in order, fastest first, to keep routing cost near-zero for the common case:

  1. Explicit directive — if the user says “ask the coding agent to review this”, the task routes directly to that specialist with no classification cost.
  2. Keyword matching — a small rule set maps common phrasings to specialists ("test" | "build" | "commit" → Coding; "weather" | "calendar" | "remind me" → Home; etc.). Zero LLM calls.
  3. LLM fallback — for ambiguous messages that don’t match either of the above, a small + cheap classifier model (Haiku by default) picks the best specialist. Only runs when the first two layers don’t fire.

This three-layer design keeps 90%+ of routing decisions free of inference cost. Only genuinely ambiguous messages pay the LLM fallback tax.
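The three layers can be pictured as an ordered fall-through. Everything in this sketch (the directive regex, the keyword table, the stand-in classifier) is invented for illustration, not the project's actual routing code:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Layer 1: explicit directive, e.g. "ask the coding agent to review this".
var directive = regexp.MustCompile(`(?i)ask the (\w+) agent`)

// Layer 2: a small keyword-to-specialist rule set. Zero LLM calls.
var keywordMap = map[string]string{
	"test": "coding", "build": "coding", "commit": "coding",
	"weather": "home", "calendar": "home", "remind me": "home",
}

// route returns the chosen specialist and whether the LLM fallback fired.
func route(msg string) (specialist string, usedLLM bool) {
	if m := directive.FindStringSubmatch(msg); m != nil {
		return strings.ToLower(m[1]), false // layer 1: free
	}
	lower := strings.ToLower(msg)
	for kw, agent := range keywordMap {
		if strings.Contains(lower, kw) {
			return agent, false // layer 2: free
		}
	}
	// Layer 3: only genuinely ambiguous messages pay for a classifier call.
	return classifyWithLLM(msg), true
}

// classifyWithLLM stands in for the cheap classifier-model call.
func classifyWithLLM(msg string) string { return "general" }

func main() {
	fmt.Println(route("ask the coding agent to review this")) // coding false
	fmt.Println(route("remind me to water the plants"))       // home false
}
```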

Subagents can publish and subscribe to typed messages on an in-process pub/sub bus. This enables coordination patterns like “the research agent found something the coding agent should know about”:

```go
// Research agent publishes a finding
bus.Publish("dep.cve-found", researchAgentID, map[string]any{
    "package":  "github.com/some/dep",
    "version":  "v1.2.3",
    "cve":      "CVE-2024-12345",
    "severity": "high",
})

// Coding agent is subscribed to dep.cve-found
// and can queue a fix task when one fires
```

The bus is scoped to a single swarm run — messages don’t leak across separate invocations. See internal/agent/bus/ for the implementation.
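For intuition, a minimal run-scoped bus with the `Publish` shape used above could look like the following toy. This is an assumption-laden sketch, not the code in internal/agent/bus/:

```go
package main

import (
	"fmt"
	"sync"
)

// Message is a typed envelope delivered to subscribers of a topic.
type Message struct {
	Topic   string
	Sender  string
	Payload map[string]any
}

// Bus is an in-process pub/sub bus. One Bus per swarm run keeps
// messages from leaking across separate invocations.
type Bus struct {
	mu   sync.Mutex
	subs map[string][]func(Message)
}

func NewBus() *Bus { return &Bus{subs: map[string][]func(Message){}} }

// Subscribe registers a handler for a topic.
func (b *Bus) Subscribe(topic string, fn func(Message)) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.subs[topic] = append(b.subs[topic], fn)
}

// Publish fans a payload out to every subscriber of the topic.
func (b *Bus) Publish(topic, sender string, payload map[string]any) {
	b.mu.Lock()
	handlers := append([]func(Message){}, b.subs[topic]...)
	b.mu.Unlock()
	for _, fn := range handlers {
		fn(Message{Topic: topic, Sender: sender, Payload: payload})
	}
}

func main() {
	bus := NewBus()
	bus.Subscribe("dep.cve-found", func(m Message) {
		fmt.Println("coding agent queues fix for", m.Payload["cve"])
	})
	bus.Publish("dep.cve-found", "research-agent", map[string]any{"cve": "CVE-2024-12345"})
}
```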

Safety controls — why swarm won’t bankrupt you

Every swarm run is bounded by five independent safety controls, each enforced outside the LLM's own decision loop:

| Control | Default | What it does |
| --- | --- | --- |
| Budget cap | $5.00 USD per run | Hard abort if total spend across all subagents exceeds this. Configurable per-run or globally. |
| Depth limit | 4 hops | Prevents unbounded A→B→C→D→E delegation chains. |
| Loop detection | Agent-chain header | Blocks A→B→A cycles before they start. |
| Tool call limit | 50 per subagent | A single subagent can’t spin on tool use forever. |
| Timeout | 5 minutes per subagent | Kills stuck subagents; the parent gets a clear timeout error. |

All five are enforced at the orchestrator level, not the agent level — which means a prompt-injected LLM cannot “talk its way” past them. The budget cap in particular is a hard arithmetic check that runs on every cost-tracked call.
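The budget check described above can be sketched as a mutex-guarded counter the orchestrator charges on every cost-tracked call. The names here are illustrative, not the project's actual API:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var ErrBudgetExceeded = errors.New("swarm budget cap exceeded")

// Budget tracks cumulative spend for one swarm run.
type Budget struct {
	mu     sync.Mutex
	capUSD float64
	spent  float64
}

// Charge records cost and returns an error once the cap is crossed.
// Because this is plain arithmetic in the orchestrator, no amount of
// prompt injection inside a subagent can talk its way past it.
func (b *Budget) Charge(costUSD float64) error {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.spent += costUSD
	if b.spent > b.capUSD {
		return ErrBudgetExceeded
	}
	return nil
}

func main() {
	b := &Budget{capUSD: 5.0}
	fmt.Println(b.Charge(4.50)) // <nil>
	fmt.Println(b.Charge(0.75)) // swarm budget cap exceeded
}
```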

```yaml
configurations:
  swarm:
    enabled: true
    max_depth: 4             # hop limit
    budget_usd: 5.0          # total $ cap per run
    workspace_ttl_hours: 24  # clean up old swarm workspaces after N hours

settings:
  agents_mode: swarm # or: auto | single | multi | custom
```

The agents_mode setting controls routing behavior:

  • single — one agent handles everything; no specialization
  • multi — parallel execution where tools can run concurrently, but no task decomposition
  • swarm — full orchestrator with decomposition + specialists + message bus
  • custom — user-defined role routing via configurations.agents.custom_roles
  • auto (default) — use single until task complexity crosses a threshold, then escalate to swarm
Swarm is still beta, and several pieces are planned but not shipped yet:

  • Swarm cookbook — verified end-to-end flows with live screenshots for the top 10 common task types (dependency audit, bug triage, code review, research report, comparative analysis, etc.)
  • Workspace UI — a live view of running swarm tasks in the web chat’s Home panel (currently only visible via the dashboard API)
  • Cross-session swarms — long-running swarms that span multiple conversations and survive restarts (today a swarm run is scoped to a single session)
  • Specialist customization UI — visual editor for custom roles, instead of hand-writing the YAML
  • External A2A peer delegation — swarm subtasks delegated to A2A-compliant peers, not just local specialists (the handler is ready, see A2A Protocol)
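Returning to agents_mode: the auto default escalates from single to swarm once task complexity crosses a threshold. The actual threshold logic is internal; for intuition only, a heuristic of that shape could look like this sketch, with signals invented for illustration:

```go
package main

import (
	"fmt"
	"strings"
)

// needsSwarm flags messages that look like multi-step, multi-domain tasks.
// Both signals below are made up for this sketch: step count is approximated
// by clause separators, domain count by a few topic keywords.
func needsSwarm(msg string) bool {
	lower := strings.ToLower(msg)
	steps := strings.Count(lower, ",") + strings.Count(lower, " and ")
	domains := 0
	for _, kw := range []string{"code", "research", "report", "deploy"} {
		if strings.Contains(lower, kw) {
			domains++
		}
	}
	return steps >= 2 && domains >= 2
}

func main() {
	fmt.Println(needsSwarm("fix this typo"))                                          // false
	fmt.Println(needsSwarm("research CVEs in my code, rate them, and write a report")) // true
}
```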

The single highest-impact contribution for swarm right now is running real task flows and filing reports. If you have a non-trivial multi-step task you’d normally split across several chat messages, try phrasing it as one message with agents_mode: swarm and see what happens. File an issue with:

  • The original message
  • The resulting decomposition (from the dashboard API’s /api/dashboard/swarm/runs endpoint)
  • What worked, what didn’t
  • What you expected vs what you got

That’s the fastest path from beta to ready. A dozen real-world reports across different task types give us the confidence to promote swarm out of beta for v0.2.