🤖 Meet OnCall AI, our observability copilot that makes troubleshooting easy. Read announcement.

Product

Streamline Your Detection, Alerting, and Troubleshooting Experience

Dec 19, 2022 / 5 minute read

Detect every issue, trigger alerts in real time, and simplify your troubleshooting processes.

DevOps and developer teams need to move quickly to keep up with deployment velocity and customer demands. However, it remains manual and slow to detect, alert on, and troubleshoot issues.

Teams have to manually create monitors to detect anomalies and define alert conditions. This can be time-consuming, and makes it difficult to catch every issue.
Alerts fire only after data has been centralized and analyzed. Larger datasets lead to higher latency and slower mean time to identify (MTTI).
When alerts fire, teams have to manually sift through raw logs, correlating the data to the alert to determine the root cause. This is time-consuming in itself, plus teams might not have access to all of the data they need.

Since log volumes have scaled so much over the years, it makes sense for today’s detection, alerting, and troubleshooting strategies to adjust accordingly. Modern observability practices should be built in a way that enables teams to identify all issues and determine their root cause in a matter of minutes.

Sounds too good to be true?

Edge Delta Has Entered the Chat…

Edge Delta automatically detects every anomaly as soon as it occurs, so you don’t have to manually build monitoring logic to catch issues that arise. Using machine learning, Edge Delta automatically establishes a baseline for your applications. It understands what behaviors are normal and which ones are anomalous. This way, even unknown behaviors don’t go unnoticed and teams aren’t spending excessive time on manual tasks.

Alerts are triggered at the infrastructure level with Edge Delta. This means that you don’t have to wait for data to be centralized and analyzed to be notified of an issue. Alerts are then based on sentiment scoring that ensures teams aren’t experiencing alert fatigue (more on this later).

When you receive an alert, Edge Delta provides context into the issues at hand. You can easily see where and why the issue is occurring without having to manually sift through data line by line.

Let’s explore these capabilities in greater depth.

Request a Demo

Anomaly Detection

When logs are created, Edge Delta…

Automatically aggregates repetitive messages into groups called Patterns
Extracts metrics from your log data

Both of these capabilities are used to automate anomaly detection (along with providing other benefits).

Once logs are sorted into Patterns, Edge Delta automatically detects abnormal spikes in “negative sentiment” patterns. These are based on negative language within a pattern, such as “failed,” “error,” or “exception.” Patterns are also be helpful in troubleshooting because they can help you quickly identify new behaviors without necessarily knowing what to look for.

At the same time, Edge Delta baselines all log metrics and alerts on any that are outside of normal range. So, you don’t have to guess the threshold of anomalous behavior. Additionally, these baselines account for date and time – what’s normal on a Tuesday morning might be anomalous on a Saturday evening.

Alerting

As anomalies are detected, alerts are triggered at the infrastructure level. So, you’re notified as soon as irregular behavior occurs rather than having to wait for data to arrive in your downstream observability platform. Edge Delta reduces false positives by basing alerts on sentiment scoring. This way, teams are only seeing alerts with a high probability of anomalous behavior. In one example, a leading travel technology company was able to reduce alert fatigue by 7% when using Edge Delta – a significant improvement given the scale of their environment.

Anomalies

Our Anomalies feature aids in the troubleshooting process by providing context into an issue. When an issue occurs, Anomalies provides the exact window of time the issue happened and any relevant data that contributed to the anomaly. It also communicates what systems, components, or services may have been affected. This way, when an alert fires, users no longer have to manually sift through their log data in order to figure out what happened and what was affected.

The Anomalies feature proactively highlights all the key information for quicker troubleshooting – it’s as if someone has already looked at your log data and told you exactly what you needed. As Justin Head, VP of DevOps at Super League Gaming noted, “Edge Delta surfaced [anomalies] almost immediately, and we were able to pinpoint the exact service that was throwing the errors, so developers could start looking into it and digging in,”

Kubernetes Overview

The Kubernetes Overview page allows you to observe your environment at a high level. So, you can quickly understand if every resource is behaving normally – and drill into individual components if not. Each resource is represented by a circle. The bigger the circle, the more logs the resource is creating. If a circle is red, there is an issue occurring. Here you can quickly access the insights needed to troubleshoot issues quickly.

Log Search

With Edge Delta’s new Live Search feature, you have the option to access all raw logs when you need to. All full-fidelity log data is stored using a column-oriented backend. This provides a cost-effective alternative to traditional log indexes, so teams no longer have to filter out valuable data.

Immediate Results at Zero Cost

Edge Delta streamlines detection, alerting, and troubleshooting processes. Customers are seeing clear results:

When Super League Gaming adopted Edge Delta, they were able to resolve issues within an hour – a process that previously took them upwards of six hours.
Webscale was able to reduce debugging times by over 3x when they made the switch Edge Delta.

In both cases, our customers have better anomaly detection capabilities, auto-generated insights, interpretable behavior analytics, and more. Wasting time on repetitive tasks related to detection and troubleshooting is a thing of the past. Tasks that were once manual are now done for you, empowering them to respond to issues faster and stay ahead of release cycles.