Make the model the exception, not the loop

How a 24/7 AI agent fleet stays affordable on one subscription: deterministic code handles every tick, and the model only runs on real signals.

· 7 min read

  • #ai-agents
  • #automation
  • #llmops
  • #self-hosting

I run a small fleet of AI agents around the clock. The setup is three recycled laptops, each operated by its own headless Claude Code agent; a self-hosted stack of about 22 containers behind a private overlay network (photos, media, DNS-level ad-blocking, private chat, local models); and a status hub that every machine reports to once a minute. The agents watch all of it, talk to me over chat, and quietly fix what they safely can while I sleep.

The first question anyone asks: doesn’t that burn a fortune in tokens?

It would — if the model ran the loop. It doesn’t. The whole system stays affordable on a single subscription because of one rule, applied everywhere: deterministic code handles every tick, and the model is invoked only on a real signal. The model is the exception, not the loop.

The expensive way to say “all good”

I know the anti-pattern intimately because I built it first. An early version of the fleet had a scheduled task that fired every few minutes and unconditionally piped a prompt into a headless agent session: check whether everything is okay. It worked. It was also waste in its purest form, because the expensive path ran on the no-op case. More than 99% of those invocations concluded “nothing happened.” Paying a frontier model to say nothing happened, hundreds of times a day, on every machine, forever, is the fastest way to make 24/7 agents unaffordable — and it adds latency and a new failure surface to a loop that needed neither.

The fix wasn’t a cheaper model. It was a structural inversion: put a cheap deterministic gate in front of every model call, and invoke the model only when the gate opens.

288 runs a day, zero model calls

The clearest example is the watchdog. Every five minutes, a plain PowerShell script runs seven structural checks — no tokens involved:

  1. All 17 expected containers are running.
  2. No container reports itself unhealthy.
  3. Key endpoints answer (the photo server’s ping, the media server’s health route).
  4. DNS and ad-blocking resolve.
  5. Disk free space is above threshold.
  6. The media drives are mounted.
  7. The chat listener process is actually alive.

On a healthy tick it writes one log line and exits:

2026-06-09 22:50:08  OK - all 17 services healthy, 58.9 GB free

That’s it. In normal operation the watchdog runs 288 times a day and never once invokes the model. Container down? Restart it, post an alert — none of that needs intelligence. It needs docker ps and an if-statement.
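The shape of a healthy tick can be sketched in a few lines. The real watchdog is a PowerShell script; the Python below is only an illustration, and every name in it (`CheckResult`, `run_tick`, `status_line`, the example threshold) is an assumption rather than the actual script. Inputs arrive as plain data, so nothing here shells out to Docker.

```python
from dataclasses import dataclass
from typing import Callable, List, Set

# One run every 5 minutes: 24 * 60 / 5 = 288 runs per day.
RUNS_PER_DAY = 24 * 60 // 5


@dataclass
class CheckResult:
    name: str
    ok: bool
    detail: str = ""


def containers_running(expected: Set[str], running: Set[str]) -> CheckResult:
    """Pass only if every expected container name is in the running set."""
    missing = sorted(expected - running)
    return CheckResult("containers", not missing, ", ".join(missing))


def disk_free(free_gb: float, min_gb: float) -> CheckResult:
    """Pass only if free space is at or above the threshold."""
    return CheckResult("disk", free_gb >= min_gb, f"{free_gb:.1f} GB free")


def run_tick(checks: List[Callable[[], CheckResult]]) -> List[CheckResult]:
    """Run every check and return the failures; an empty list means healthy."""
    return [result for result in (check() for check in checks) if not result.ok]


def status_line(failures: List[CheckResult], service_count: int, free_gb: float) -> str:
    """Build the one-line log entry (the FAIL format is hypothetical)."""
    if not failures:
        return f"OK - all {service_count} services healthy, {free_gb:.1f} GB free"
    return "FAIL - " + "; ".join(f"{f.name}: {f.detail}" for f in failures)
```

On a healthy snapshot `run_tick` returns an empty list and `status_line` produces a line of the same shape as the log entry above. No model client is imported anywhere on this path.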

Alerting is a state machine, not a conversation

When a check does fail, the alert path is still deterministic. A problem posts to a dedicated alerts room in our self-hosted chat, and the incident is latched in a small JSON state file that records when it started, a summary, how many pings have been sent, and whether an email has gone out.

The latch enforces three behaviors that people usually reach for an LLM to get:

  • Capped pings. At most 3 chat messages per incident, then silence. Nobody needs the 40th notification that the same container is still down.
  • Auto re-arm. The moment the problem clears, the watchdog posts one recovery line and deletes the flag. No human acknowledgment, no stale alarms.
  • Honest delivery. The chat post only counts as delivered when the server hands back an event ID. A timeout isn’t “probably fine” — it’s a recorded delivery failure.

Deduplication, rate-limiting, and recovery detection are flag files and counters, not model judgment. The model is a terrible (and expensive) dedup engine.
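A latch like this fits in one small function. The sketch below is an assumption about how such a state file could be driven, not the actual watchdog: the field names, the `handle_tick` name, the message wording, and the choice to count only confirmed posts toward the cap are all mine. The `post` callable stands in for the chat client and returns the server's event ID, or `None` when delivery was not confirmed.

```python
import json
from pathlib import Path
from typing import Callable, Optional

MAX_PINGS = 3  # cap stated in the text


def handle_tick(
    state_path: Path,
    problem: Optional[str],
    post: Callable[[str], Optional[str]],
    now: str,
) -> str:
    """Advance the incident latch by one watchdog tick and return the outcome."""
    if problem is None:
        if state_path.exists():
            # Auto re-arm: one recovery line, then drop the latch.
            post("RECOVERED - all checks passing")
            state_path.unlink()
            return "recovered"
        return "ok"

    if state_path.exists():
        state = json.loads(state_path.read_text())
    else:
        state = {"started": now, "summary": problem,
                 "pings_sent": 0, "delivery_failures": 0, "email_sent": False}

    if state["pings_sent"] >= MAX_PINGS:
        outcome = "silenced"  # capped: stay quiet until recovery
    elif post(f"ALERT - {problem}") is not None:
        state["pings_sent"] += 1  # counted only because the server confirmed it
        outcome = "alerted"
    else:
        state["delivery_failures"] += 1  # record the failure, do not assume success
        outcome = "delivery_failed"

    state_path.write_text(json.dumps(state))
    return outcome
```

Calling this once per tick with the same problem yields three `"alerted"` outcomes and then `"silenced"`; the first tick with `problem=None` posts the recovery line and removes the file.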

The model enters through a double-gated door

So when does the model run? On the watchdog path, only when every cheaper option has provably failed:

  1. A real problem exists, and
  2. the chat alert could not be delivered — the server never confirmed it, so I cannot have seen it — and
  3. no email has gone out for this incident yet.

Only then does the watchdog spin up a headless agent session, scoped to read-only diagnosis plus a single email tool, with a prompt that hard-limits the job: diagnose and notify — never repair, never change anything, send exactly one email. The model is the fallback of the fallback, and even then it gets a narrow lane.
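The three conditions above reduce to a single boolean. A minimal sketch, assuming the incident state is available as a small record; `IncidentState`, `should_invoke_model`, and `FALLBACK_PROMPT` are illustrative names, and the prompt string is a paraphrase of the limits described here, not the real prompt.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IncidentState:
    problem: Optional[str]        # None means every check passed
    chat_event_id: Optional[str]  # None means the server never confirmed the alert
    email_sent: bool


def should_invoke_model(state: IncidentState) -> bool:
    """True only when all three conditions from the list above hold."""
    return (
        state.problem is not None
        and state.chat_event_id is None
        and not state.email_sent
    )


# Paraphrase of the job limits described in the text (hypothetical wording).
FALLBACK_PROMPT = (
    "Diagnose and notify only. Never repair, never change anything. "
    "Send exactly one email."
)
```

Keeping the predicate separate from whatever launches the agent session makes the gate trivial to unit-test without ever starting a model.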

Every scheduled job in the system has the same shape. The household reminders script checks a local database and exits silently if nothing is due — the model only runs when a reminder actually fires. A stock-availability monitor has a no-model tier (HTTP fetch plus parsing) that runs hourly, and only the rendered, bot-protected sites fall through to a model tier — on the cheapest model that can do the job. The nightly usage report is pure Python; the model is summoned only on a security-anomaly exit code.
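The exit-code variant of this shape might look like the following. It is a sketch under assumptions: the text does not say which exit code signals an anomaly, so `ANOMALY_EXIT_CODE = 3` is made up, and `run_gated` and `on_anomaly` are hypothetical names for the wrapper and for whatever would hand the report to a model.

```python
import subprocess
import sys
from typing import Callable, List

ANOMALY_EXIT_CODE = 3  # hypothetical; the text does not say which code is used


def run_gated(cmd: List[str], on_anomaly: Callable[[str], None]) -> bool:
    """Run a deterministic job; call on_anomaly only on the anomaly exit code.

    Returns True when the escalation hook was called.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode == ANOMALY_EXIT_CODE:
        on_anomaly(result.stdout)  # the only path that would reach a model
        return True
    return False  # clean exit or ordinary failure: no model involved


if __name__ == "__main__":
    # A stand-in job that exits cleanly, so the hook is never called.
    quiet_job = [sys.executable, "-c", "print('usage normal')"]
    print(run_gated(quiet_job, print))
```

Ordinary failures (any other non-zero code) deliberately do not escalate in this sketch; routing them to the regular alert path is left out for brevity.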

Once a day, the model earns its keep

There is exactly one place a model run is scheduled, and it’s deliberate: the 9 PM digest. That’s not a violation of the rule — it’s the rule applied honestly, because at the end of a day there is guaranteed work: a day’s worth of incidents to triage and a report to write.

Even there, determinism goes first. The script gathers everything before the model sees anything: the day’s log lines, current container status, disk headroom, and the last 24 hours of incidents from the status hub — the same incident stream the watchdog has been feeding all day.

Then the agent triages each open incident, starting with read-only diagnosis, and applies a fix only if it is on a tight allowlist of safe actions:

  • restart an exited or unhealthy container, and verify it comes back healthy;
  • re-seed an expired credential using the documented bootstrap, only for the expired-token symptom;
  • clear a stale alarm flag, only after confirming its cause is actually gone;
  • reload a wedged service.

And the hard nevers: no editing config or code, no touching data, no rotating secrets, no spending money, no messaging anyone new. Anything outside the allowlist is left open, diagnosed, and flagged for me. The prompt carries the reasoning in one line: a wrong unattended change at night is worse than an unfixed incident. Then one warm email lands in my inbox — what broke, what got fixed and how, what’s waiting for my hands. Most incidents are resolved before I ever read about them.
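The text describes this boundary as living in the prompt. One way to back it up in code is to treat each allowed fix as a name plus a precondition, and flag everything else. This is a sketch, not the author's setup: the action names and incident fields (`container_state`, `symptom`, `cause_cleared`, `service_wedged`) are hypothetical.

```python
from typing import Callable, Dict

Incident = Dict[str, object]

# Each safe fix maps to the precondition that must hold before it may run.
SAFE_FIXES: Dict[str, Callable[[Incident], bool]] = {
    "restart_container": lambda inc: inc.get("container_state") in ("exited", "unhealthy"),
    "reseed_credential": lambda inc: inc.get("symptom") == "expired_token",
    "clear_stale_alarm": lambda inc: inc.get("cause_cleared") is True,
    "reload_service": lambda inc: inc.get("service_wedged") is True,
}


def review(action: str, incident: Incident) -> str:
    """Return "apply" for an allowlisted fix whose precondition holds, else "flag"."""
    precondition = SAFE_FIXES.get(action)
    if precondition is None:
        return "flag"  # not on the allowlist: leave open for a human
    return "apply" if precondition(incident) else "flag"
```

Because anything unknown falls through to `"flag"`, the nevers do not need to be enumerated in code: editing config, rotating secrets, or spending money simply have no entry to match.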

The principles

Pulling the pattern out of my basement and into something general — if you’re building agent automation on a subscription budget, these five rules are most of it:

  1. Gate every model call with a deterministic check. Invoke on signals — a message, a state change, an anomaly, a due item — never because a timer fired. Code the common no-op path in bash, PowerShell, or Python.
  2. Do alert hygiene in code: latch, cap, auto-clear. Dedup and rate-limiting are counters and flag files. Don’t pay a model to suppress notifications.
  3. Escalate up a cost ladder, and demand proof at each rung. Chat before email, email before anything heavier — and gate each hop on evidence the cheaper channel actually failed, not on vibes.
  4. When the model acts, hand it an allowlist, not discretion. Enumerate the safe fixes; enumerate the nevers (config, data, secrets, spend); everything else gets flagged, not fixed. That boundary is what lets you sleep while it works.
  5. Schedule the model only where work is guaranteed. A nightly digest has real triage to do every single night. A five-minute “check in with the LLM” timer is a subscription bonfire.

The point isn’t just the bill

The economics work — three agents run continuously on one flat subscription, with the cheapest capable model picked per task. But the discipline pays twice. Every model call you don’t make is also latency you don’t add, nondeterminism you don’t debug, and a failure mode you don’t inherit. Deterministic code is fast, testable, and boring in exactly the way infrastructure should be.

The model is genuinely good at the parts I reserved for it: reading a messy day of incidents, deciding which of four safe fixes applies, writing me a report a human would want to read. So let it do that — and only that. Make it the exception. Keep it out of the loop.