13 Jun 2026 Field notes Identity Software Operations

How the agent fleet is wired, in five layers

From a typed request to a guarded action against a real system, a walk down the stack that turns Claude Code into a fleet of homelab operators: the CLI, the shared agent-core, the agents, their skills, and the actions they perform.

We run a small fleet of AI agents in the homelab, one each for systems, network, security, identity, and endpoints. People ask how that actually hangs together: is it one big prompt? Five chatbots? Something with a queue? None of those. It’s a short, deliberate stack, and every request walks down the same five layers, from a sentence I type to a change on a real box, then the results walk back up.

Here’s the whole thing on one page.

Requests flow down; results flow back up the same path. One framework, five agents, many skills, one place where writes are gated.

Layer 1: the CLI is the orchestrator

There’s no bespoke router or message bus. The front door is Claude Code, and it already holds every agent’s tools at once under the canonical mcp__<agent>__<skill>__<tool> naming. When I ask “is VMID 120 healthy and is its image CVE-clear?”, Claude Code is the thing that knows the infra agent answers the first half and the security agent the second. Cross-agent work happens here, one layer above the agents, not in plumbing between them. That keeps the agents independent and means there’s no orchestration code of mine to maintain.

Layer 2: agent-core does the boring, hard parts once

Every agent is a thin shell around the shared jitc-agent-core framework. The streaming chat loop, prompt caching, the tool-call loop, the token-budgeted history, the BaseSkill base class, config loading, and the dynamic skill discovery that wires it all up at startup: all of that lives in one place, pinned by commit SHA in each agent’s requirements. Nobody copies a chat loop into an agent repo. Fix a bug in the loop once, bump the SHA, and all five agents inherit it.

Layer 3: an agent is a prompt plus a skill set

What’s actually in an agent repo is small: an entry point with a system prompt and a list of default skills. That’s the whole personality. The infra agent knows Proxmox and Docker; the network agent knows UniFi, DNS, and Traefik; security knows Wazuh, CVEs, and reputation lookups; identity knows Authentik and Infisical; endpoint knows package and fleet management. Skills are discovered at startup, and a single SKILL_<NAME>=true/false flag flips one on or off per deployment. The same agent also runs as an MCP stdio server with no API key needed, which is exactly why Claude Code in Layer 1 can call it.

Layer 4: skills are where a tool meets a backend

A skill is one directory: a skill.py that declares its tools (each a name, a description, and a JSON Schema), a client.py of about a hundred lines of standard library against a vendor API, and a config.py. Every tool name is prefixed with the skill name, so there’s never a collision across the fleet. The rule that matters most here: skills are read-only by default. A skill ships its read tools first; any tool that changes state sits behind a safety guard (a kill-switch env flag, a two-phase confirm=true, a blast-radius check) and writes are off unless you deliberately turn them on.

Layer 5: the action, and the gate in front of it

The bottom of the stack is a real change on a real system: a VM rebooted, a DNS record written, an agent quarantined, a user disabled. Two things are always true at this layer. A write never fires on an implied “yes.” It needs an explicit confirm=true (the same kind of gate as GitHub making you type the repo name before it deletes it) after the agent has shown you precisely what it’s about to do. And every write is logged, so a session ties back to an action ties back to a backend change. The read path is open and fast; the write path is narrow and accountable. That asymmetry is the entire safety model.

Why it’s shaped like this

Each layer has exactly one job, and the seams between them are clean: the CLI orchestrates, the framework runs the loop, an agent is a prompt, a skill is a typed tool over a client, and the action is gated. You can add a skill without touching the framework, add an agent without touching the others, and reason about “can this thing change my infrastructure?” by looking at a single layer. It’s less clever than a mesh of agents talking to each other. And that’s the point. The boring stack is the one I trust to hold a write token.

(A companion piece, Skills, MCP, and where the credentials belong, digs into the layer underneath this one: how those backend credentials should be brokered rather than parked in a .env.)