Agi Increases Alert Speed and Reduces Logging Costs
Agi can detect issues without having to anticipate them in advance, which helps it provide a positive user experience. Before we dive in, let’s provide some background on the company.
Agi is a financial technology company based in Brazil that serves over 4.5 million clients. It provides several banking products and services, including personal loans, international debit and credit cards, insurance, and online banking services. In 2022, it was named to CB Insights’ Fintech 250, a list recognizing the most promising companies in the space.
To ensure the health of its core offerings, Agi relies on several observability and monitoring tools, including Amazon OpenSearch, Dynatrace, and Prometheus. Agi also uses Edge Delta to support the monitoring and alerting of its offers engine – a Kubernetes-based application that extends special offers, discounts, and other benefits from financial providers to customers. The offers engine consists of over 800 services running in a Kubernetes cluster.
The Challenge: Creating Alerts Was Time-Consuming and Issues Were Hard to Predict
Prior to adopting Edge Delta, the Cloud Engineering team would work with application owners to build alerts across each individual service. Since each service had different behaviors and usage patterns, it quickly became difficult to build alerts in a repeatable manner.
The Cloud Engineering and application teams would work together to:
Determine what behaviors to alert on
Instrument code to support custom metrics
Configure the alert within their incident response tool
Test the alert to ensure it worked in production
The process of creating alerts could take anywhere from one to five days, which was a significant drain on all parties involved.
Moreover, it was impossible for the teams to predict every problematic behavior across so many services. In some cases, users reported issues to the team before any alert fired, because the alerting system either failed to detect the problem or detected it too late.
How Agi Improved Alerting with Edge Delta
Agi’s IT Infrastructure and SRE Manager, Rafael Salerno de Oliveira, and his team were determined to resolve this problem. He enlisted Bruno da Silva Verch, a Cloud Engineer Specialist, to help evaluate potential solutions. Together, they sought a monitoring tool that would enable them to anticipate issues as much as possible while also removing the burden of manually configuring alerts at scale. That’s when the team began evaluating Edge Delta.
The team started with a proof-of-concept deployment, and it quickly became clear how simple it was to create alerts with Edge Delta. Once deployed, Edge Delta analyzes every ingested log line in two primary ways:
Grouping together repetitive logs into common patterns
Extracting metrics from log data
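The two analyses listed above can be illustrated with a minimal sketch. This is not Edge Delta’s implementation; it assumes a simple approach in which variable tokens (UUIDs and numbers) are masked so that lines differing only in those values collapse into one pattern, and in which a metric is pulled out with a regular expression. The function names, the masking rules, and the `latency=<n>ms` log format are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical masking rules: replace variable tokens with placeholders so that
# log lines differing only in IDs or numbers collapse into a single pattern.
# UUIDs are masked first so their digits are not masked as <NUM> individually.
_MASKS = [
    (re.compile(r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b", re.I), "<UUID>"),
    (re.compile(r"\d+(?:\.\d+)?"), "<NUM>"),
]

# Hypothetical log format carrying a latency value, e.g. "latency=120ms".
_LATENCY = re.compile(r"latency=(\d+(?:\.\d+)?)ms")


def to_pattern(line: str) -> str:
    """Reduce a raw log line to a pattern by masking variable tokens."""
    for regex, placeholder in _MASKS:
        line = regex.sub(placeholder, line)
    return line


def group_patterns(lines):
    """Count how many raw log lines fall into each pattern."""
    return Counter(to_pattern(line) for line in lines)


def extract_latencies(lines):
    """Extract a numeric metric (latency in ms) from the lines that carry it."""
    return [float(m.group(1)) for line in lines if (m := _LATENCY.search(line))]


lines = [
    "GET /offers/42 status=200 latency=120ms",
    "GET /offers/7 status=200 latency=95ms",
    "payment failed for user 123e4567-e89b-12d3-a456-426614174000",
]
print(group_patterns(lines))
print(extract_latencies(lines))
```

Here the two request lines collapse into the single pattern `GET /offers/<NUM> status=<NUM> latency=<NUM>ms`, which is what makes a sudden change in how often a pattern appears easy to spot.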
These foundational analytics run out of the box. Edge Delta also baselines each dataset, so it can automatically determine which behaviors are normal and which are anomalous. If there is an abnormal spike in negative-sentiment logs, or if a tracked metric falls outside its normal range, Edge Delta triggers an alert immediately.
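Baselining of this kind can be approximated with a rolling window and a z-score test. The sketch below is an illustrative assumption, not Edge Delta’s actual algorithm: the `Baseline` class, its window size, the 3-standard-deviation threshold, and the minimum number of points are all hypothetical choices. Each new observation (for example, the count of an error pattern per minute) is compared against the recent history before being added to it.

```python
import statistics
from collections import deque


class Baseline:
    """Hypothetical rolling baseline over the last `window` observations of one metric."""

    def __init__(self, window=60, threshold=3.0, min_points=5):
        self.values = deque(maxlen=window)  # oldest values drop off automatically
        self.threshold = threshold          # allowed deviation, in standard deviations
        self.min_points = min_points        # observations needed before judging anything

    def observe(self, value):
        """Return True if `value` is anomalous versus the current baseline,
        then add it to the window."""
        anomalous = False
        if len(self.values) >= self.min_points:
            data = list(self.values)
            mean = statistics.fmean(data)
            stdev = statistics.pstdev(data)
            if stdev == 0:
                # Perfectly flat history: any change at all is unusual.
                anomalous = value != mean
            else:
                anomalous = abs(value - mean) / stdev > self.threshold
        self.values.append(value)
        return anomalous


baseline = Baseline()
for per_minute_errors in [100, 102, 98, 101, 99, 101, 150]:
    print(per_minute_errors, baseline.observe(per_minute_errors))
```

Only the final value of 150 is flagged; the earlier values either build the baseline or stay within its normal range. A production system would likely also account for seasonality (such as daily traffic cycles), which this sketch deliberately ignores.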
This approach has dramatically simplified alert creation for the Agi team for two reasons. First, the team could base alerts on existing logs, so there was no longer a need to instrument code to emit custom metrics. As Verch explains, “if the team has information logs, it takes just a few minutes for alerts to start working.”
Second, Edge Delta automatically identifies issues that the team hadn’t anticipated in advance. As a result, the team can resolve a wider range of issues and more confidently deliver a great customer experience.
Additional Benefits of Using Edge Delta
Easy to Deploy and Manage
Once Agi deployed Edge Delta into production, the team started experiencing success. Previously, Agi’s monitoring platforms required extensive configuration, but Edge Delta was easy to set up and realize value from. “We don’t need a specific observability team working to configure Edge Delta. It’s easy to set up and it just works,” notes Verch.
In addition to accelerating alert creation, Edge Delta helped Agi receive alerts faster and more accurately than its other monitoring tools. Edge Delta is deployed at the source, typically within the customer’s compute environment. As a result, it can analyze data and derive insights faster than a traditional monitoring platform that must first receive the data downstream. This can dramatically reduce mean time to detect (MTTD).
Verch recalls one specific instance: “We had a case two days ago, where we received an alert from Edge Delta five minutes faster than our other tool. Both were reporting the same exact issue.” These faster alerts help the team address issues before they impact customers, a significant advantage given how many users Agi supports.
Reduced Logging Costs
On top of these benefits, the Cloud Engineering team also gained more control over its log data. Previously, it was difficult to understand which log data was critical to the team and which wasn’t. With Edge Delta, the team now has visibility into every log pattern and can determine which data is required for troubleshooting. As a result, the team has more control over which data it routes to its OpenSearch instance.
Agi continues to deliver innovative banking solutions to its customers. Edge Delta’s accuracy and ease of use will help the company as it moves forward. “Edge Delta helps us see how everything is working faster, so we can ensure our customers receive the best possible experience,” explains Salerno.