May 19, 2026 · Tim Fraser
Two models, two jobs
Most AI agent systems run every call through the most capable model the budget allows. It's the simplest thing to build and the easiest thing to explain. It's also why their cost per run is set by the priciest model in the loop rather than by how hard the work actually is.
I argued in February that AI costs behave like infrastructure, and that the answer is tiered routing: frontier models for the work that benefits from them, cheap capacity for everything else. This post is what that looks like once you actually build it.
The pattern is two models doing two jobs. One cheap and fast for discovery. One capable for analysis. They don't share a prompt, they don't share a success criterion, and they don't share a budget. They're two different shapes of work that happen to live in the same agent loop.
Discovery is breadth
Discovery walks the surface area. It lists what exists, tags what looks worth a second look, and stops.
The model doing this work doesn't need to be smart. It needs to be cheap enough that you can let it look at everything without flinching. If discovery costs five cents a run, you run it on every region, every service, every account, every resource type, and the bill is still negligible. If discovery costs fifty cents a run, you start cutting corners. You skip the regions that "probably don't matter." You sample instead of enumerate. Coverage gets thin, and the thing you missed is exactly the thing you needed to find.
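The arithmetic behind that is worth seeing once. The sketch below assumes one discovery run per account, region, and service combination, which is only one reading of "a run", and every count in it is made up for illustration. Prices are kept in whole cents so the totals stay exact.

```python
from itertools import product

# Hypothetical estate: every count here is an assumption, not a measurement.
accounts = [f"account-{i}" for i in range(4)]
regions = [f"region-{i}" for i in range(17)]
services = [f"service-{i}" for i in range(12)]


def sweep_cost_cents(price_per_run_cents: int, scopes: list) -> int:
    """Cost of one full discovery sweep, one run per scope."""
    return price_per_run_cents * len(scopes)


# Full enumeration: every account, every region, every service.
scopes = list(product(accounts, regions, services))

# The corner-cutting version: keep only the regions that "probably matter".
trimmed = list(product(accounts, regions[:5], services))

for label, price in (("5 cents a run", 5), ("50 cents a run", 50)):
    full = sweep_cost_cents(price, scopes) / 100
    cut = sweep_cost_cents(price, trimmed) / 100
    print(f"{label}: full sweep ${full:.2f}, trimmed sweep ${cut:.2f}")

print(f"trimmed coverage: {len(trimmed) / len(scopes):.0%} of scopes")
```

With these numbers a full sweep is 816 runs: about $41 at five cents and $408 at fifty. The trimmed sweep at fifty cents is cheaper than the full one, but it only looks at 240 of the 816 scopes.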
What discovery produces is a shortlist. Not an explanation, not a recommendation, just a list of things flagged for closer attention with enough context that the next step knows where to look. Region, resource ID, the signal that caused the flag, maybe a timestamp. That's it.
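A shortlist entry that small fits in a single record type. This is a minimal sketch rather than plainfra's actual schema: the class name, field names, and example values are all hypothetical, chosen to match the fields listed above.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Flag:
    """One shortlist entry: where to look and why, nothing more."""
    region: str
    resource_id: str
    signal: str                             # what caused the flag
    observed_at: Optional[datetime] = None  # optional timestamp


def to_shortlist_json(flags: list) -> str:
    """Serialise the shortlist so the next step can consume it."""
    rows = []
    for flag in flags:
        row = asdict(flag)
        if flag.observed_at is not None:
            # datetime is not JSON-serialisable, so store it as ISO 8601.
            row["observed_at"] = flag.observed_at.isoformat()
        rows.append(row)
    return json.dumps(rows, indent=2)


# Made-up example entries.
shortlist = [
    Flag("eu-west-1", "i-0abc123example", "CPU flat near zero for 14 days",
         datetime(2026, 5, 1, tzinfo=timezone.utc)),
    Flag("us-east-1", "vol-0def456example", "volume not attached to anything"),
]
print(to_shortlist_json(shortlist))
```

There is deliberately no field for an explanation or a recommendation. If the record has nowhere to put narrative, the discovery step has less room to produce it.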
Enumeration isn't the hard part. boto3 lists resources. Cloud Custodian applies rules to them. The cheap model does the part neither of those can: deciding which listings look worth flagging without a rule for every anomaly you'd want to catch.
Discovery should be wrong sometimes. It should over-flag. A discovery pass that returns zero flags every week is either looking at a perfectly quiet environment or, more likely, not looking hard enough. The whole point of running a cheap model across everything is that you can afford to be generous with what it surfaces, because the expensive model is going to filter the noise out in the next step.
The first time I tuned a discovery pass for high precision and got a clean week, I assumed the system was working. It wasn't. Tuning back to over-flag surfaced three things on the next run that the precise version had filtered out.
Analysis is depth
Analysis takes the shortlist and works out what each flag actually means.
This is where the capable model earns its cost. It pulls supporting data for each flagged item, reasons about whether the flag matters, weighs the evidence, and produces a verdict with the reasoning behind it. An anomalous resource gets investigated against its history. A cost spike gets traced to the change that caused it. A misconfiguration gets evaluated against the surrounding context to see whether it's a real risk or a known exception.
Being wrong here is expensive in a way being wrong in discovery isn't. A false positive from discovery costs you a few cents of analysis time. A false positive from analysis ends up in a report that a human acts on. Different stakes, different model.
Analysis doesn't need to be broad. It only ever sees the shortlist, which is usually one or two percent of what discovery looked at. The capable model gets to spend its reasoning budget on a small number of items, deeply. That's the trade. You pay frontier prices on the small slice that benefits from frontier reasoning. The rest of the surface area was handled by something an order of magnitude cheaper.
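To put rough numbers on that trade, here is a back-of-envelope comparison. Every price and count is an assumption picked to match the shape described above (a two percent shortlist, triage ten times cheaper on the small model), not a figure from a real bill.

```python
def run_costs(n_items: int, shortlist_fraction: float,
              cheap_triage: float, frontier_triage: float,
              analysis: float) -> dict:
    """Compare a two-model run with a single-model run, in dollars.

    cheap_triage / frontier_triage: cost to triage ONE item on each model.
    analysis: cost for the capable model to analyse ONE flagged item.
    The analysis cost is the same in both setups; only triage differs.
    """
    n_flagged = round(n_items * shortlist_fraction)
    analysis_total = n_flagged * analysis
    return {
        "flagged": n_flagged,
        "two_model": n_items * cheap_triage + analysis_total,
        "single_model": n_items * frontier_triage + analysis_total,
    }


# Assumed inputs: 10,000 items, 2% shortlisted, triage at a tenth of a cent
# on the cheap model versus one cent on the frontier model, 25 cents per analysis.
costs = run_costs(n_items=10_000, shortlist_fraction=0.02,
                  cheap_triage=0.001, frontier_triage=0.01, analysis=0.25)

print(f"flagged items: {costs['flagged']}")
print(f"two-model run: ${costs['two_model']:.2f}")
print(f"single-model run: ${costs['single_model']:.2f}")
```

Under these assumptions the split run costs about $60 and the single-model run about $150. The analysis spend is identical in both; the whole difference is what it cost to decide which 200 items deserved it.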
The collapse failure mode
A single capable model doing both discovery and analysis is slower, costlier, and worse at the discovery half. It over-reasons about things that didn't need reasoning. It produces narrative when it should produce a list. It spends frontier tokens deciding whether a resource is worth investigating, which is the exact decision a cheap model could have made for a fraction of the cost.
The first version of plainfra ran everything through Sonnet. Discovery passes that should have been a list of region and resource pairs came back as paragraphs of analysis nobody asked for, at frontier-tier cost per run. Splitting the roles out wasn't an optimisation; it was an admission that I'd been asking one model to do two different jobs.
Most implementations marketed as "tiered" are really one model with a discount mode. The cheaper tier kicks in when load spikes or budgets tighten, and the system degrades gracefully. That's not tiering. That's hedging. Real tiering uses both models on every run, on purpose, because they're doing different jobs.
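In code, the difference between tiering and hedging is whether the second model is a fallback or a stage. A minimal sketch of the stage version follows. The two callables stand in for real model calls, and the stubs, field names, and sample items are invented for the example.

```python
from typing import Callable, Iterable, Optional

Item = dict     # one listed resource, as produced by an enumeration step
Flag = dict     # an item plus the signal that got it flagged
Verdict = dict  # a flag plus a decision and the reasoning behind it


def run(items: Iterable[Item],
        discover: Callable[[Item], Optional[Flag]],
        analyse: Callable[[Flag], Verdict]) -> list:
    """Both roles run on every pass; neither is a fallback for the other."""
    # Stage 1: the cheap role sees everything and returns a flag or None.
    shortlist = [flag for item in items if (flag := discover(item)) is not None]
    # Stage 2: the capable role sees only the shortlist.
    return [analyse(flag) for flag in shortlist]


def cheap_stub(item: Item) -> Optional[Flag]:
    """Stand-in for the cheap model. Over-flags on purpose."""
    if item["cpu_avg"] < 5.0:
        return {**item, "signal": "low CPU"}
    return None


def capable_stub(flag: Flag) -> Verdict:
    """Stand-in for the capable model. Filters out known exceptions."""
    known_exception = flag.get("tags", {}).get("purpose") == "standby"
    reasoning = "tagged as standby" if known_exception else "idle with no stated purpose"
    return {**flag, "actionable": not known_exception, "reasoning": reasoning}


# Made-up inventory.
items = [
    {"resource_id": "i-a", "cpu_avg": 61.0},
    {"resource_id": "i-b", "cpu_avg": 1.2},
    {"resource_id": "i-c", "cpu_avg": 0.4, "tags": {"purpose": "standby"}},
]

for verdict in run(items, cheap_stub, capable_stub):
    print(verdict["resource_id"], verdict["actionable"], verdict["reasoning"])
```

Nothing in `run` checks load or budget before choosing a model. The routing is fixed by the job, which is the point.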
The pattern extends across vendors
The two-model split isn't AWS-specific. It's a way of structuring agent work that holds anywhere you have a read-only API surface worth walking.
A discovery model walking AWS does the same job as one walking Sumo Logic: enumerate what's there, flag what looks worth investigating, stop. The analysis model on each side does the same job too: pull the context, reason about it, produce a verdict.
Once you've run both vendors through the pattern, their findings arrive at analysis in a common shape. From there, a single report can combine them. A cost anomaly in AWS sits alongside a query volume spike in Sumo Logic, and the analysis model can reason about whether they're related without caring which vendor produced which signal.
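One way to get that common shape is a small adapter per vendor that maps whatever its discovery pass emits onto one record type. The sketch below is illustrative only: the raw dictionaries are hypothetical discovery outputs, not real AWS or Sumo Logic API payloads, and the `Finding` fields are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Vendor-neutral shape that the analysis step consumes."""
    vendor: str
    scope: str    # region, account, deployment, or similar
    subject: str  # resource ID, query name, or similar
    signal: str   # what caused the flag


def from_aws(raw: dict) -> Finding:
    # Hypothetical shape for an AWS discovery flag.
    return Finding("aws", raw["region"], raw["resource_id"], raw["signal"])


def from_sumo(raw: dict) -> Finding:
    # Hypothetical shape for a Sumo Logic discovery flag.
    return Finding("sumologic", raw["deployment"], raw["query"], raw["signal"])


def build_analysis_input(findings: list) -> str:
    """Render all findings as one list for a single analysis pass."""
    return "\n".join(
        f"- [{f.vendor}] {f.scope} / {f.subject}: {f.signal}" for f in findings
    )


# Made-up flags from two separate discovery passes.
findings = [
    from_aws({"region": "us-east-1", "resource_id": "nat-0example",
              "signal": "data transfer cost up sharply week over week"}),
    from_sumo({"deployment": "us2", "query": "ingest-by-source",
               "signal": "query volume spike over the same period"}),
]
print(build_analysis_input(findings))
```

The analysis step only ever sees `Finding` records, so adding another vendor means writing one more adapter rather than touching the analysis prompt.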
This is the leverage the two-model pattern gives you that single-model agents can't match. Discovery is cheap enough to run broadly across every vendor you care about. Analysis is focused enough to synthesise across them. The expensive model never has to walk a Sumo Logic API or an AWS API directly. It sees the shortlist from each, and it works on the shortlists.
What this is, in one line
Two models, two jobs, on every run. Cheap for breadth, capable for depth, across whatever surface area you point them at.
Once you have that, the question stops being which model to use and starts being which surface to walk next.
Disclosure: I'm building plainfra, a read-only agentic ops assistant for AWS that uses this pattern. In the codebase the discovery role is called Hawk and the analysis role is called Sentinel. Both names earn their keep: a hawk sweeps wide and spots movement, a sentinel stands at a post and watches one thing closely. You can see plainfra at plainfra.com.