I've been watching AI agents handle real incidents, in our own demos and in what customers do once they get their hands on the product, long enough now to see a pattern. The agents that resolve incidents fastest aren't the ones with the smartest model or the fanciest prompt. They're the ones whose data is closest to them.
Not geographically close. Semantically close. One schema, one query language, one access model. That's it. That's the difference.
Everyone is connecting AI agents to observability platforms over MCP right now. The protocol is elegant, the ecosystem is growing fast, and you can wire an agent up to a metrics backend or a log store in an afternoon. But elegance at the protocol layer doesn't tell you much about what happens when an agent is actually under pressure. That's where I keep seeing the same thing: the agent that has to reach across platforms loses to the one that doesn't.
I call this data locality for agents, and I think it's the most underappreciated decision in the whole agentic observability space.
What actually slows agents down
An agent under incident pressure has a pretty tight budget: one context window, a limited number of tool calls, and a clock ticking toward resolution. Anything it does that isn't reasoning about the incident is wasted.
Now picture a typical federated setup. The agent queries metrics in PromQL, logs in another DSL, and traces through a third interface. Each backend has its own field names (service.name in one, service_name in another, a label in the third). The agent has to learn three schemas, translate between three query languages, and reconcile three response formats. Every step is a tool call. Every mismatch is a retry. The context window fills up with API payloads, and there's no room left to think.
I've watched this play out enough times that it stopped feeling like an edge case and started feeling like the default. The agent isn't bad at reasoning. It just never gets to do any.
A platform with semantic locality takes this off the table. One schema means the agent orients with a single discovery call. One query language means syntax validation is a single tool. Correlating a log pattern with a metric anomaly becomes a query parameter change, not three API calls and a reconciliation step in context.
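To make the reconciliation step concrete, here's a toy sketch of the translation layer a federated agent has to carry around. Every field name and payload shape here is invented for illustration; real backends differ in their own ways, but the pattern is the same: the same entity arrives under three names, and someone has to hold the alias table. With semantic locality, this whole table disappears.

```python
# Hypothetical field aliases across three backends. With one schema,
# none of this code needs to exist.
FIELD_ALIASES = {
    "service.name": "service",   # traces convention
    "service_name": "service",   # logs convention
    "service": "service",        # metrics label
}

def normalize(record: dict) -> dict:
    """Map each backend's field names onto one agreed schema."""
    return {FIELD_ALIASES.get(k, k): v for k, v in record.items()}

# Three responses describing the same entity, three shapes:
metric = {"service": "checkout", "p99_ms": 412}
log    = {"service_name": "checkout", "pattern": "timeout"}
trace  = {"service.name": "checkout", "span_count": 87}

# The reconciliation step the agent would otherwise do in context:
merged = {}
for rec in (metric, log, trace):
    merged.update(normalize(rec))
```

In the federated world, some version of this runs inside the agent's context window, on every correlation, for every incident. In the unified world, `service` is just `service`.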
This isn't really about where the data physically lives. You still collect it with distributed agents at the edge, process it through pipelines, and enrich it along the way. The point is that by the time the AI teammate actually queries it, everything speaks one language.
The tradeoffs worth being honest about
There are real tradeoffs here, and I don't want to hand-wave them.
Cost and duplication. If hot-path telemetry goes to the hub but you also archive it in a cheaper tier, yes, there's duplication. But the hub only needs hours to days of retention for agent investigations, not months. That's bounded, and slower investigations usually cost you more than some duplicated storage.
Freshness. Stale mirrored data can be worse than remote live data. If the hub adds meaningful ingestion latency or eventual-consistency lag, the agent ends up reasoning over outdated signals. This is a pipeline problem. Solvable, but you have to actually measure it, not assume it away.
What happens when the hub is degraded during the very incident you're investigating? Fair question, and the answer is the same as for any critical dependency: redundancy, health checks, graceful degradation to cached results or human escalation. The hub pattern doesn't eliminate resilience engineering. It just concentrates where you apply it.
Where this stops being theory
Everything I've said so far is about semantic locality in the abstract. A well-designed federation layer with strong schema normalization could probably get you most of the way there. Where it can't follow is execution.
This is the part I'm most excited about, honestly.
When an investigation on Edge Delta's AI Team produces more data than fits in the model's context window, the agent spins up an isolated execution environment with its own file system. The raw data lands there. Then the agent writes Python and Bash against it: grouping by arbitrary fields, computing distributions, correlating patterns across signal types, cloning repos and grepping through full codebases. Only the distilled findings come back to the model.
The first time I watched this happen on a real investigation, it didn't look like an AI tool at all. It looked like an engineer sitting at a terminal with a local copy of the data, iterating through hypotheses. Which is basically what it was.
The agent isn't stuck with whatever return schema a pre-defined tool hands it. It has the same freedom as that engineer. Each sandbox is isolated to one investigation and deprovisioned when it ends.
This is the point where data locality stops being a metaphor. The data is on a file system. The agent writes code to process it. The computation is local.
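For a feel of what that looks like, here's the kind of throwaway script an agent might write inside the sandbox. This is a sketch, not Edge Delta's actual tooling: the log lines and field names are invented, and the raw data is inlined where a real sandbox would have it on disk. The point is the shape of the work: the heavy grouping happens locally, and only the distilled finding goes back into the model's context.

```python
import json
from collections import Counter

# Stand-in for raw log data the platform dropped into the sandbox's
# file system (fields are hypothetical).
raw_lines = [
    '{"service": "checkout", "status": 500}',
    '{"service": "checkout", "status": 500}',
    '{"service": "payments", "status": 200}',
    '{"service": "checkout", "status": 200}',
]

# Group by an arbitrary field and compute an error distribution.
# This is the local computation step; none of it enters context.
errors = Counter(
    rec["service"]
    for rec in map(json.loads, raw_lines)
    if rec["status"] >= 500
)

# Only the distilled finding returns to the model.
finding = errors.most_common(1)[0]
```

Scale the input from four lines to four million and the structure doesn't change: the sandbox does the grinding, and the model reads a one-line summary.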
You can normalize schemas across platforms. You can build beautiful federation layers. What you can't do is drop an agent into a shell session with the raw data of a third-party backend. That's not a federation problem. It's an architecture problem, and it only gets solved when the data plane and the AI plane are the same platform.
What I'd actually measure
If you're evaluating agentic observability, whether it's ours or anyone else's, these are the metrics I'd look at:
- Tool calls per investigation. Fragmented setups need more of them.
- Malformed query rate. Drops sharply when there's a unified query language.
- Context window utilization. How much of the window is the agent using to reason, and how much is raw API payload crowding it out?
- Sandbox offload ratio. How much data is processed locally versus loaded into context.
- Human handoff rate. When remediation tools live at the hub, this drops too.
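If your platform emits any kind of per-investigation event trace, two of these metrics fall out of a few lines of analysis. This is a rough sketch; the event shapes below are hypothetical, so adapt the field names to whatever your platform actually records.

```python
# Hypothetical event trace for one investigation.
investigation = [
    {"type": "tool_call", "tokens": 1800, "malformed": False},
    {"type": "tool_call", "tokens": 2400, "malformed": True},
    {"type": "reasoning", "tokens": 600},
    {"type": "tool_call", "tokens": 900, "malformed": False},
]

# Malformed query rate: failed queries as a share of all tool calls.
tool_calls = [e for e in investigation if e["type"] == "tool_call"]
malformed_rate = sum(e["malformed"] for e in tool_calls) / len(tool_calls)

# Context utilization proxy: what share of tokens went to reasoning
# rather than raw API payloads.
total_tokens = sum(e["tokens"] for e in investigation)
reasoning_share = sum(
    e["tokens"] for e in investigation if e["type"] == "reasoning"
) / total_tokens
```

Run the same computation across fragmented and unified setups and the comparison does the arguing for you.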
Track these and the direction is pretty consistent: the closer the data is to the agent, the faster and more reliable the investigation.
The hub is the product
Edge Delta started as a telemetry pipeline company. The agents, the schema, the query language, the pipeline engine, all of it was built by the same team. AI Team inherits that data plane relationship, and honestly, that's not something I fully appreciated until I started seeing what agents could actually do with it. The MCP tools aren't wrappers over somebody else's API. They're primitives of the same platform that defines the schema and runs the pipelines. And when even that isn't close enough, there's a sandbox.
This is what Observability 3.0 actually looks like in practice: the platform that processes your telemetry is the platform that reasons over it. Once you've seen an agent operate that way, it's hard to go back.