Webscale Networks Reduces Debugging Times by 91%
Webscale Networks is an end-to-end technology provider supporting thousands of e-commerce businesses worldwide. Traditionally, e-commerce companies support their business with several disparate technology providers: one for hosting infrastructure, another for content delivery, and so on. They then compete to hire a team of top-notch engineers to orchestrate everything internally.
Webscale resolves the pain and complexity of running an e-commerce technology stack by providing a one-stop shop, offering everything from a multi-cloud, high-availability infrastructure platform to auto-scaling to security. It delivers all of this as a service.
Simply put, Webscale gives its customers peace of mind regarding their technology stack, so they can focus on running their business.
Monitoring Tool Sprawl Consumes Engineering Hours
As an infrastructure provider, Webscale treats availability and performance as its top priorities. As Nithyanand Mehta, Webscale’s VP of Support and Services, explains, “Every second of latency and every minute of downtime impacts our customers’ customers.” However, ensuring availability and performance was an arduous task for Webscale’s DevOps and SRE teams. The group maintained a sprawl of monitoring tools, including:
Out-of-the-box traffic analytics platform
Home-grown logging tools
Checkmk for infrastructure monitoring
Cloud-native monitoring tools: Amazon CloudWatch, Azure Monitor, and Google Cloud Monitoring
In total, the Webscale team had more than half a dozen monitoring tools in place, generating several TB of data per day. These systems created an alert whenever there was a problem, and each alert was sent to FreshDesk, Webscale’s ticketing platform.
In a given month, Webscale would receive over 2,000 alerts in FreshDesk, each taking upwards of 15 minutes to debug. Most of that time went to working across disparate data sources. “To debug an issue, our engineers had to work across different sources conducting cross-analysis, from metrics to logs to APM,” explains Dhanush J. Suhas, Webscale’s SRE Lead. At that rate, debugging added up to over 500 man-hours per month, time not spent on running queries or infrastructure operations.
Simply put, debugging time needed to come down, and Mehta and Suhas set a goal to reduce this 15-minute process dramatically. They also wanted to ensure they used a scalable, cost-effective approach. “We wanted to make sure we had the most granular insights, and at the same time kept costs in check,” says Mehta.
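The monthly figure follows directly from the two numbers quoted above. As a rough check, the short sketch below multiplies them out; the function and constant names are illustrative only, and the inputs are the lower bounds stated in the text ("over 2,000" alerts, "upwards of 15" minutes), so the real total would be somewhat higher.

```python
# Lower-bound figures quoted in the case study.
ALERTS_PER_MONTH = 2_000   # "over 2,000 alerts"
MINUTES_PER_ALERT = 15     # "upwards of 15 minutes"


def monthly_debug_hours(alerts: int, minutes_per_alert: float) -> float:
    """Total hours spent debugging alerts in one month."""
    return alerts * minutes_per_alert / 60


baseline_hours = monthly_debug_hours(ALERTS_PER_MONTH, MINUTES_PER_ALERT)
print(f"{baseline_hours:.0f} hours per month")  # 500 hours per month
```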
Evaluating Centralized Monitoring Solutions to Enhance Visibility and Stay Within Budget
The Webscale team evaluated centralized monitoring systems, including both enterprise and open-source offerings, to consolidate their data. However, after assessing these platforms, the Webscale team found that centralizing their data sources into these platforms would create exorbitant costs. The group remained focused on gaining 100% visibility into the several TB of data they generated each day, but needed to do so while staying within budget.
That’s when the Webscale team started evaluating Edge Delta to pair with their prospective centralized monitoring platform. Edge Delta analyzes Webscale’s raw observability data at their different sources and centralizes the insights, statistics, and aggregates within their preferred observability platform. This approach gives the Webscale team a consumable yet complete view of their data sources. Furthermore, by decoupling where Webscale analyzes data from where it’s stored, Edge Delta allows the team to keep its SIEM tool spending in check.
Given these factors, it was clear Edge Delta would solve Webscale’s challenge. The team elected to use Edge Delta in tandem with Sumo Logic, eliminating the need for their engineers to work across a disjointed set of monitoring tools whenever there was an alert.
The Impact of Starting Observability at the Source
Once the Webscale team decided to move forward with Edge Delta, onboarding was straightforward. Suhas explains, “The Customer Success team has been great, whether it’s been basic onboarding or troubleshooting – or even delivering a feature request to help us move forward.”
Since implementing Edge Delta, the Webscale team has seen a noticeable improvement in its debugging process. Suhas remarks, “All of our logs and metrics are being optimized and centralized into a single platform, in turn helping our engineers quickly diagnose problems in one or two minutes, versus 15 or 20.”
The value of the platform goes beyond the debugging process. As Edge Delta analyzes raw observability data, it automatically surfaces anomalies and statistical deviations. In one instance, Webscale detected a credential stuffing attack in one of its customers’ environments. Edge Delta started alerting on the issue as it occurred, highlighting both the root cause and the impacted systems. Automated anomaly detection gives both Webscale and its customers greater confidence that the platform is secure and will remain available.
Furthermore, Edge Delta has allowed Webscale to give its customers a level of transparency unmatched by most technology providers. “We have reached a state where we have centralized monitoring in place, and we have given customers access to these environments, so they have complete visibility,” Mehta says.
With Edge Delta in place, the Webscale team has reduced debugging times to less than five minutes per alert on average – a 91% improvement. Moving forward, Suhas and Mehta will work with the Edge Delta team to further accelerate this process. Additionally, the Webscale team will continue embracing innovative technologies to ensure the performance and availability of their customers’ e-commerce websites.
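For readers who want to relate the percentage to minutes, the sketch below works through the arithmetic. It assumes the 15-minute figure quoted earlier as the baseline; the case study does not state the exact before and after values behind the 91% number, so this is an illustration and not the team's own calculation. The function names are hypothetical.

```python
def percent_reduction(before_minutes: float, after_minutes: float) -> float:
    """Percentage by which debugging time dropped."""
    return (before_minutes - after_minutes) / before_minutes * 100


def minutes_after_reduction(before_minutes: float, reduction_percent: float) -> float:
    """Debugging time that remains after a given percentage reduction."""
    return before_minutes * (1 - reduction_percent / 100)


# Assumed baseline: the 15 minutes per alert quoted earlier in the case study.
remaining = minutes_after_reduction(15, 91)
print(f"{remaining:.2f} minutes per alert")  # 1.35 minutes per alert
```

Under that assumption, a 91% reduction leaves roughly a minute and a half per alert, which sits within the "one or two minutes" Suhas describes.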