Edge Delta Benchmarks

Open, neutral benchmarks for AI incident reasoning and observability pipeline performance. Every model gets the same data and the same tools. We measure the reasoning.

Blast Radius Bench

61% top pass rate

Live

Can AI reconstruct the failure chain?

When five services are on fire at once, can a model separate the root cause from the blast radius and recover the directed path the failure took?
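
To make the distinction concrete, here is a minimal Python sketch of separating a root cause from its blast radius. It is not the benchmark's scoring code. It assumes the failure propagation is already known as directed (upstream, downstream) pairs, and the service names and the `split_root_from_blast_radius` helper are hypothetical.

```python
from collections import defaultdict, deque


def split_root_from_blast_radius(edges):
    """Return (root_cause, blast_radius) for a set of failing services.

    `edges` holds hypothetical (upstream, downstream) pairs meaning
    "the failure spread from upstream to downstream".
    """
    downstream = defaultdict(list)
    nodes, has_upstream = set(), set()
    for src, dst in edges:
        downstream[src].append(dst)
        nodes.update((src, dst))
        has_upstream.add(dst)

    # The root cause is the only failing service with no failing upstream.
    roots = nodes - has_upstream
    if len(roots) != 1:
        raise ValueError("expected exactly one root cause candidate")
    root = next(iter(roots))

    # Breadth-first walk: visit services in the order the failure reached them.
    order, seen, queue = [], {root}, deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in downstream[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return root, order[1:]


# Illustrative incident: five failing services, one origin.
edges = [
    ("db", "auth"),
    ("auth", "checkout"),
    ("auth", "search"),
    ("checkout", "frontend"),
]
root, blast_radius = split_root_from_blast_radius(edges)
print(root)          # db
print(blast_radius)  # ['auth', 'checkout', 'search', 'frontend']
```

In a real incident the edges are not given; inferring them from logs, metrics and traces is the hard part that the question above is about.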

Noise Bench

92% top pass rate

Live

Can AI tell a real incident from alert noise?

It's 2am and twenty alerts just fired. A few are real, most are noise. Can a model decide who to wake up, catching every real incident without drowning in the flaps?
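
One way to read that trade-off is as recall (real incidents that got paged) against precision (pages that were real incidents). The Python sketch below illustrates those two numbers only. It is an assumption about how such a decision could be scored, not the benchmark's actual metric, and the alert counts and the `paging_scores` helper are made up for the example.

```python
def paging_scores(is_real, paged):
    """Return (recall, precision) for a set of paging decisions.

    is_real[i] is True when alert i was a real incident;
    paged[i] is True when the model chose to wake someone up for it.
    """
    pairs = list(zip(is_real, paged))
    true_pos = sum(1 for real, page in pairs if real and page)
    false_neg = sum(1 for real, page in pairs if real and not page)
    false_pos = sum(1 for real, page in pairs if not real and page)

    # With nothing to catch (or nothing paged), treat the score as perfect.
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 1.0
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 1.0
    return recall, precision


# Illustrative night: twenty alerts, three real, and the model pages on five.
is_real = [True] * 3 + [False] * 17
paged = [True] * 5 + [False] * 15
recall, precision = paging_scores(is_real, paged)
print(recall, precision)  # 1.0 0.6
```

Here every real incident is caught (recall 1.0), but two of the five pages were flaps (precision 0.6).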

RCA Bench

100% top pass rate

Live

Can AI find the commit that broke prod?

RCA Bench drops a frontier model into a frozen incident and asks the only question that matters at 3am: which commit caused this?

Pipeline Performance Bench

4.6x faster

Live

Can a pipeline handle petabytes of data?

HTTP log ingestion throughput compared across Edge Delta, Cribl, the OpenTelemetry Collector, and Fluentd under identical conditions.

Field notes

Same bug, four minds

One production panic paged four frontier-model agent teams at the same instant. Same diagnosis, same one-line fix, four different definitions of done. A behavioral teardown of a single real incident, model by model.

See Edge Delta in Action

Get hands-on in our interactive playground environment.
