Qualiti Improves Time-to-Triage by Up to 70%
Founded in 2021, Qualiti is the first company to provide end-to-end application testing enabled by artificial intelligence (AI). The core challenge Qualiti solves for its customers is delivering as much testing coverage across an application as possible. As Principal DevOps Engineer Derrick Walton explains, “You can have a fleet of QA engineers and take hours and hours to validate everything from top to bottom. We’re looking to plug in and assist those QA departments.”
Challenge: Poor Search Experience Slows Remediation
Qualiti’s core application runs natively on Amazon Web Services (AWS) and is built using Kubernetes and AWS Lambda. When the application was first released, all of the company’s log data was ingested directly into Amazon CloudWatch, which presented a usability problem. Walton explains, “It was painful for an engineer to look through and find useful information quickly.”
More specifically, Amazon CloudWatch uses a concept called “log streams,” which are sequences of log events that share the same source. When a log stream surpasses a volume threshold, Amazon CloudWatch will automatically create a new stream. Dividing logs up in this manner meant that engineers had to sift through multiple streams whenever they needed to locate specific logs. “When ingesting Kubernetes logs, we could see multiple streams per minute,” Walton continues. “Or we’d see application logs split across multiple streams, so we’d have to look through each one.”
This process resulted in two challenges. First, it took the Qualiti team longer to triage and ultimately resolve issues. Second, developers often required help from operations to pinpoint log data. “I wanted to ease the burden on our development team of locating the logs they needed,” explains Walton. “I didn’t want to have to jump in and continuously locate those logs for them.”
Solution: Edge Delta Streamlines Qualiti’s Search Experience
To overcome these challenges, Walton and his team began evaluating new observability tools to support logging. Early in the process, the team started piloting Edge Delta. “The simplicity of install and performance were enough for us to give it a go,” notes Walton.
After deploying, the team saw several advantages of using Edge Delta Log Search & Analytics. Most notably, Edge Delta streamlined the Qualiti team’s search experience, making it easier to locate the events they needed. “If I need to do a quick search on my logs, I can surface the events, map those events to a time range, and pull other logs ingested from the same time,” Walton notes. “It’s been really helpful to look into any oddities in our stack.”
Moreover, the team has seen a lot of value from Edge Delta’s Anomalies. This feature gives the team more context on irregular events by providing the time window and data contributing to the anomaly. It also highlights affected systems, components, or services. “Edge Delta surfaces real anomalies. When they rattle through, we can just click into the anomaly and see the logs that pertain to the event,” says Walton. “Our ability to quickly identify something in logs and get an idea where to start has improved drastically.”
The operations team also leverages Kubernetes Overview to understand the sizing of their Kubernetes resources at a glance – without jumping into a command line terminal. “The Kubernetes Overview screen provided a really good snapshot of what our footprint looked like on each individual component,” explains Walton. “I can get a t-shirt size of what everything looks like based on namespace.”
As a result of using Edge Delta, the Qualiti team has fixed its usability problem and seen dramatically faster resolution times. “We’ve reduced the time to triage by 40% on average and up to 70% on some of the more difficult issues,” exclaims Walton. “Things that took an hour to fix now take us 15 minutes.”
What’s Next
Now that the team has solved its core challenges, it plans to upgrade its Edge Delta agent to v3. Among other functionality, this will enable the team to create facets, which will further optimize the search experience. “We’ll be able to pre-populate queries and have one-click searches,” says Walton. Additionally, the team plans to begin extracting metrics from its logs that it can monitor over time.