

Deep Dive: Visual Pipelines Webinar

Jul 20, 2023 / 32:06

Recently, we launched Visual Pipelines – a single, point-and-click interface to manage your observability pipelines. In this webinar, we cover:

  • Different use cases for Visual Pipelines, including building, testing, and monitoring pipelines

  • How the feature will drive developer self-service and autonomy

  • What’s next for the Edge Delta Observability Pipelines product

Transcript

Riley Peronto:

My name is Riley Peronto. I'm the Director of Product Marketing here at Edge Delta. And I'm joined by Zach Quiring. Zach, you wanna take a minute to introduce yourself?

Zach Quiring:

Yeah, sounds good. My name's Zach Quiring, Director of Product for the Visual Pipelines and Agents experience at Edge Delta. I've been here for about four years now, and I'm looking forward to walking through some of the updates.

Riley Peronto:

Awesome. Yeah, so as Zach said, in today's webinar we're gonna talk about Visual Pipelines, which is our new experience for working with observability pipelines within Edge Delta. We've made some really great updates, and we're excited to walk through those with you. Agenda for today: we've got about five to 10 minutes of slides, marketing slides that I'll walk through, and I promise to keep it quick so we can get to the main event, which is Zach walking through the product. With that, let's dive into the talk track. And if you have any questions along the way, feel free to just drop those in the chat.

So to start, let's define Visual Pipelines. What is this update we've made to Edge Delta? We have changed the observability pipelines experience to be a single, point-and-click interface for you to manage your observability pipelines. Taking a step back, I think it's helpful to compare what observability pipelines look like in a lot of other products. Typically, what you'll see is that the user has to build and maintain different config files that reference one another and dictate where data is collected, how that data is processed, and where the data is streamed. As you can imagine, and as I'm sure many of you have experienced, managing these config files can quickly get complex. They're verbose files, and it's really hard to track changes between them. So we really wanted to rethink this experience and provide something that's a lot easier for our customers to work with.

So with that, the first thing I wanted to touch on is the three key actions you're actually going to be doing with this experience. The first, of course, is building pipelines. From this interface, you can configure your pipelines end to end. And again, we focused heavily on simplicity, so everything's done in a point-and-click, drag-and-drop manner with a few text forms here and there. As you can see in this GIF, you're able to onboard new data sources and connect them to the appropriate destination in a very intuitive workflow that's easy for your whole team to use.

The second part here is data processing. As you would expect with any pipelines product, at launch we have over 15 different processors. They range from analytics processors (maybe you want to extract metrics from your logs, or cluster them into similar patterns before they're shipped downstream), to processors that help you shape and transform your data, to processors that help you optimize your data footprint so you're storing less data in your downstream systems. One of the things we really wanted to improve with this release: if you think about being a new user in a config-file-first product, you might not know everything at your disposal, and it might be hard to understand, hey, what's the best way to work with this data set, and what's even possible? By presenting everything in this much simpler way, we're hoping that new users you're onboarding on your team can see what processors are available to them and pick a best-practices approach that's suitable for the given data source.

And then the third key action here is testing and iterating. What we saw a lot with this old way of doing things that I keep referencing is that you might build your configuration and deploy it into some test environment to see if it works the way you thought it was gonna work. You run some data through it, and maybe it works, maybe it doesn't. You have to bounce between your pipelines product and the data that's flowing through it to see if things are working the way you want them to, and it creates this kind of guess-and-check workflow that's not a ton of fun to work through. So Zach and the team worked really hard to embed that whole process directly into the interface for managing pipelines. You don't have to go anywhere: you can build your pipeline, apply it to a given data set, and make sure it's working the way you want it to before you deploy it in production. As I went through those slides, I referenced several challenges, but I think it's important to take a step back and understand, in the big picture, why this is different, what things used to look like, and how it's better now.

What we saw, by nature of this more complex way of shaping and processing your data before it goes somewhere else, is that it creates tribal knowledge, where a small number of team members really understood the product inside and out. That might be one DevOps engineer, it might be a handful, but these folks were the experts on it, and the rest of the team became dependent on this small number of users to help them meet the needs of the entire organization. As I'm sure you can guess, and maybe some of you have experienced, this makes it really hard to scale pipelines across the organization, because you have so many teams dependent on a small number of people to get all the processing work done. As a byproduct, the org as a whole isn't experiencing the full benefit they thought they were going to achieve. And as an extension of that, if you're pointing a newer user at this more complex interface, it's really hard to expect them to just be successful right away in a self-service manner. So we really wanted to improve that whole experience end to end and make it a lot more approachable for the whole team to start collecting, processing, and routing data the way they need to.

So how's Visual Pipelines different? These are what I like to refer to as our three key north stars for the release. I've touched on a lot of these, but I want to emphasize them again. The first is simplicity. This thing that used to be complex to do and wasn't easy to adopt across the organization, we've made dead simple: you can treat your data the way you want and manage it from a single interface. You don't have to work across different tools or do anything that's too complex. You have everything you need in one interface, and you can use clicks and text forms to get things done.

The second part here is transparency. Again, referencing the old way of doing things: if you're looking at a bunch of different config files, it's hard to understand which components are involved in a given pipeline, how they're working with one another, and how data is being processed. We're providing more of a 10,000-foot view of all your pipelines, so you can see that at a glance, demonstrate it to team members, and even onboard new people and expect them to be successful right away. Another extension of that is, when you get this 10,000-foot view of your pipeline and how data is flowing, it's really easy to understand the health of your pipelines and trust that things are working the way you expect them to. And the last part here is self-service. As a byproduct of making things simpler and more transparent, you can now confidently point your developers or your security experts at Edge Delta and expect them to build pipelines and process data according to their needs. So it's a lot easier to drive that self-service motion where everyone isn't entirely dependent on a small number of experts to get things done. Those are the main key benefits of using Visual Pipelines.

At this time I'm going to hand it off to Zach to walk through the demo. Zach, before we do that, while you're pulling the demo up, is there anything you want to add to this?

Zach Quiring:

No, I think that sounds good.

Riley Peronto:

Excellent. I'll give screen sharing to you and you can take it away when you're ready. 

Zach Quiring:

Sounds good. Let me go ahead and share out my screen. You should be able to see that coming through.

Riley Peronto:

Yep, that looks good. Go for it.

Zach Quiring:

Perfect. Yeah, so really appreciate you going into detail about some of the basics, the concepts, and some of the foundations of why we developed and released the Visual Pipelines experience. So far we've gotten a lot of really good feedback, and it's really upleveled the experience of processing and managing observability and telemetry data. In this demo, I basically just want to do a quick walkthrough of some of the core capabilities, go through a couple of different use cases, and show the basics and fundamentals of how the pipeline experience works. Starting with a few key concepts: essentially, at a high level, there are what we call inputs, processors, and outputs. Inputs are all of the various methods and different data sets that are being collected. Here we can see we have a Kubernetes logs input that's defined. When we go into our input catalog, we can see all the other input types that are supported: anything from collecting from files or from various ports, to Kubernetes, Docker, HTTP, so a lot of flexibility in that regard. The center section here, processors, is basically any function that interacts with data in some way and performs some type of manipulation. Those range from something as basic as filter or routing logic to more extensive shaping, transformation, and enrichment. Looking at the library here, we'll see everything from various transformation and enrichment functions, to extracting analytics from log data, which is a core aspect of the Edge Delta platform, to performing different optimizations. Those are typically used for use cases where the goal is to make sure the data sets being routed to downstream tools are as efficient and valuable as possible. And then there are various filter options.

Going into the output section, this is basically anywhere we might want to stream or feed data into. In this example, we have a Datadog destination for our metrics data and a Splunk destination for log data, but then also things like an archive destination. That would typically be low-cost storage for sending a copy of any or all of the raw data that's processed by the pipeline; it's also something that powers the data rehydration aspect of the platform. Streaming destinations are typically the observability platforms you might be using in your environment today, so things like Datadog, Splunk, Sumo Logic, Elastic, and a lot of others. And then there are Edge Delta destinations as well. From a platform perspective, we have a backend for consuming everything from log data, performing log searches and queries, and interacting with that data, to metrics and patterns (I'll go into patterns in a little bit), and then trigger destinations for alerting off of interesting findings right out of the pipeline.

So, a recap of the setup that we're looking at here. We have our Kubernetes logs input, and we can drill in and see what the configuration looks like. In this case, we're basically saying, hey, let's collect everything from all of our Kubernetes namespaces, but exclude a subset based on various Kubernetes metadata, in this case pods and namespaces, though there's a lot of flexibility in how that works. From there, there are a couple of different processor nodes. This mask node type, for masking social security numbers, we can come in here and see the definition: basically, we're matching a social security number pattern and replacing any incoming data that contains that sensitive string with the string redacted. We can also drill in a little bit further, and I'll go into this in more detail in a second in terms of how we can interact, test, and iterate as we're building some of these things. Then there are a couple of what we call log-to-metric nodes. These are for capturing any useful, summarized metrics or analytics out of the raw logs that are being processed. In this example, we're taking basically the entire stream of our raw logs and producing a metric that captures the error rate, broken out across a lot of different dimensions and metadata, as well as the rate of exceptions, and in this case feeding that downstream to Datadog. And then lastly, this one we just call a simple regex filter. It's basically saying, hey, if a log is a trace-level log message, we want to negate it and drop it from coming through.
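
To make those processors concrete, here is a minimal Python sketch of the same ideas: masking an SSN-like pattern, dropping trace-level lines, and deriving a simple error-rate metric from a batch of logs. This is illustrative only, not Edge Delta's implementation; the regexes and function names are made up for the example.

```python
import re
from collections import Counter

SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")   # simple SSN-like pattern
TRACE_PATTERN = re.compile(r"\bTRACE\b")              # trace-level marker

def mask_ssn(log_line: str) -> str:
    """Replace anything that looks like an SSN with the string REDACTED."""
    return SSN_PATTERN.sub("REDACTED", log_line)

def keep_line(log_line: str) -> bool:
    """Return True if the line should be kept (i.e., it is not trace-level)."""
    return not TRACE_PATTERN.search(log_line)

def error_rate(log_lines: list[str]) -> float:
    """A toy log-to-metric step: fraction of lines that contain ERROR."""
    counts = Counter("error" if "ERROR" in line else "other" for line in log_lines)
    total = sum(counts.values())
    return counts["error"] / total if total else 0.0

if __name__ == "__main__":
    sample = [
        "INFO user=jane ssn=123-45-6789 login ok",
        "ERROR payment failed for order 42",
        "TRACE entering handler",
    ]
    masked = [mask_ssn(line) for line in sample]          # mask-SSN node
    kept = [line for line in masked if keep_line(line)]   # drop-trace regex filter
    print(kept)
    print("error_rate:", error_rate(kept))                # log-to-metric node
```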

I'll go through a couple of different key pieces to be aware of from a pipeline perspective, starting with some of the testing and iteration process. If we were, let's say, building one of these processors from scratch, we can come in, maybe update part of the pattern, and then test to make sure that it's aligned with the sample data that we're working with. Here we can simply see that it's able to capture the various patterns and apply some matching logic to make sure what we're building is working as expected. And then there's also this test processor function. What's cool about this is it'll basically take any of the sample data that we're working with and simulate running it through all of the components of the pipeline up until the one that we're testing. In this case, we're working with this mask SSN processor, which is actually just the first node in the pipeline, so any sample data we're working with would come off of this input and hit our mask SSN as is. When we test this processor, we can see the constructs of what we call a data item, which is sort of the structure of the events as they're flowing through. And for this one, which is replacing the SSN, we can see that the incoming data into this node had that value in plain text, but in the outgoing data that would hit all the subsequent components of the pipeline, the SSN component is redacted. So it's a very quick and easy way to ensure that what you're creating and testing is working as expected.

It's also very helpful for just playing around with different processors, understanding how they work and what functions they perform, and getting a feel for the platform. Taking that testing concept a little bit further, let's go into this drop-trace log filter that we have here. In this case, this one is routed from the mask SSN node, so the expectation is that all of the data that hits the mask SSN, then this drop-trace-level filter, and eventually gets fed downstream to Splunk should have the mask applied to it. When we test this processor, we'll see that the incoming data into this node already had the social security number redacted. And from an outgoing perspective, we can see the sample logs that made it through, but this fourth one, which contained the trace level, is no longer in there. So we're confirming that everything's behaving and looking as expected.
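
If it helps to picture what "test this processor" is doing, here is a rough Python sketch of the idea: sample data is run through every upstream node, and then the incoming and outgoing records for the node under test are shown side by side. The pipeline, node names, and data shapes here are hypothetical, purely to illustrate the concept.

```python
from typing import Callable, Optional

# A processor takes a log line and returns the transformed line, or None to drop it.
Processor = Callable[[str], Optional[str]]

def test_processor(pipeline: list[tuple[str, Processor]], target: str, samples: list[str]) -> None:
    """Simulate running samples through all nodes before `target`, then show
    the incoming and outgoing data for the target node itself."""
    incoming = samples
    for name, proc in pipeline:
        if name == target:
            outgoing = [out for line in incoming if (out := proc(line)) is not None]
            print("incoming:", incoming)
            print("outgoing:", outgoing)
            return
        # Run upstream nodes first so the target sees realistic data.
        incoming = [out for line in incoming if (out := proc(line)) is not None]
    raise ValueError(f"node {target!r} not found in pipeline")

# Hypothetical pipeline: mask SSNs, then drop trace-level lines.
pipeline = [
    ("mask_ssn", lambda line: line.replace("123-45-6789", "REDACTED")),
    ("drop_trace", lambda line: None if "TRACE" in line else line),
]

samples = ["ssn=123-45-6789 ok", "ERROR boom", "TRACE noisy detail"]
test_processor(pipeline, "drop_trace", samples)  # incoming already has the SSN masked
```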

So those are a few core constructs and some of the fundamentals. Now I'll go into the process of building out more complex pipeline logic. We might want to do a few things; I'll start with adding an archive destination. Let's say we want to add some type of S3 archive. We'll call this something like our archive rehydration bucket, in this case routing to US West, and we can come back and populate some other details later. So now we have our archive destination there. This might be a case where we say, after all the data has the SSN removed, or maybe straight off the source, we wanna route that in. In this case, I'll just pull this over and add it to our configuration. So we can see that we've made a couple of updates. If this was the only change we wanted to make to the pipeline, we can review our changes, and that'll quickly show us, at a glance, exactly what was modified: in this case, a couple of nodes were added, the S3 output and the link from the SSN node to the S3 node. Then we could go and deploy that, and if this was an active pipeline running in our infrastructure, within a few seconds that update would get rolled out, we'd see the changes, and our S3 bucket would start receiving a copy of the raw data.
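
Conceptually, an archive destination like the one added above just ships a compressed copy of the raw events to object storage. A rough, hedged sketch of that idea with boto3 follows; the bucket name, key layout, and region are made up for illustration and are not Edge Delta's implementation.

```python
import gzip
import json
import time

import boto3

def archive_batch(events: list[dict], bucket: str = "archive-rehydration-bucket") -> str:
    """Write a gzip-compressed batch of raw events to S3 and return the object key."""
    s3 = boto3.client("s3", region_name="us-west-2")
    key = f"raw/{time.strftime('%Y/%m/%d')}/{int(time.time())}.json.gz"
    body = gzip.compress("\n".join(json.dumps(e) for e in events).encode("utf-8"))
    s3.put_object(Bucket=bucket, Key=key, Body=body)
    return key
```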

Now, if we wanted to layer on some additional functions to the pipeline, maybe there's been a request to, I don't know, temporarily drop debug logs because something's spamming the backend and we want to get ahead of that. There are different interactions for adding elements to the pipeline. In this case, for the raw traffic, let's say it's going into Splunk, we might want to inject something here to filter out a given dataset. We can do that in a few different ways. One useful interaction is just directly inline, wherever we want it to go: we can come in and, in this case, add another regex filter here. What's cool about this menu and this interaction experience is that it has awareness of what node types and interactions are supported in different areas. So if you were trying to inject something that wasn't valid based on where it lived in the pipeline, what came before it, and what comes after it, all of that context is automatically displayed for you; you don't have to think through that logic yourself. In this case, I'll add something like, you know, a drop-debug filter. Another area to dig into is this concept of validation, so we understand that things are working as expected as we're building this out. Here it's telling us, hey, the pattern field is required for a filter; let's say we left that off temporarily. When we go to review our changes, we would also be notified that, hey, this node that you added needs to be looked at before you can deploy, maybe a required field is missing or some of the parameters you populated are invalid; all of those details are presented to you. So it's a very intuitive experience, and it gives you the confidence that if you go and make a change, or you give another team member or another organization access to manipulate the pipeline, no one will go in and break anything or make changes that cause problems. All the guardrails are in place to support that.

But we can come back here and say something like debug or DEBUG. We can go and test this to make sure, hey, is this matching what we would expect, and it basically shows us that what we have looks good; then we review the changes and we can deploy. That covers the basics there. A couple of other things we might want to quickly do: one concept that's key to some of the Edge Delta capabilities is pattern analytics. I'll add an Edge Delta destination here for our patterns data, and we could say everything that comes off the mask SSN node, we want to pull that in. I'll actually go and add an analytics node; we'll create a log-to-pattern node. What this does, at a high level, is take any of the incoming raw data and run an automated clustering and pattern analysis algorithm against it to summarize the log events that were seen and extract different types of analytics, things like sentiment analysis: are they negative log types, are they potentially errors or exceptions? There's a ton of automated function baked into this, but setting it up is super simple. We basically just create a single node, route the data that's been masked, and pull it into our patterns. And now we've included that, so we have a summarized dataset of all of our logs, which, from a data optimization standpoint, can be very significant. We might also say, hey, let's also send this into Splunk if we wanna see that data there, then review the changes and make sure everything looks good.
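
As a rough intuition for what pattern analytics does, here is a toy Python sketch: normalize each log line into a template by replacing variable parts (numbers, hex-like IDs), count how often each template occurs, and flag the negative-looking ones. Edge Delta's actual clustering and sentiment analysis are more sophisticated; this only shows the shape of the idea, and the keyword list is an assumption.

```python
import re
from collections import Counter

NUMBER = re.compile(r"\b\d+\b")
HEXID = re.compile(r"\b[0-9a-f]{8,}\b")
NEGATIVE = ("error", "exception", "failed", "timeout")

def to_pattern(line: str) -> str:
    """Collapse variable tokens so similar logs map to the same template."""
    line = HEXID.sub("<id>", line.lower())
    return NUMBER.sub("<num>", line)

def summarize(lines: list[str]) -> list[tuple[str, int, bool]]:
    """Count each template and mark whether it looks negative (error-like)."""
    counts = Counter(to_pattern(line) for line in lines)
    return [(pattern, count, any(word in pattern for word in NEGATIVE))
            for pattern, count in counts.most_common()]

logs = [
    "payment failed for order 1042",
    "payment failed for order 2213",
    "user 77 logged in",
]
for pattern, count, negative in summarize(logs):
    print(f"{count:>3}x  negative={negative}  {pattern}")
```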

So yeah, I just wanted to walk through some of the core capabilities: the fundamentals of the pipeline, adding and updating processors, testing them, and validation. This was a quick overview, and I think we're only scratching the surface of the capabilities and pieces being built out. The first release was just a few weeks ago, but there's a significant amount of updates coming down the pipeline, no pun intended, that we're super excited for. The feedback and implementations from our design partners and early beta customers have been great, they've been really excited, and there's a lot of cool stuff we're excited to release pretty soon. So hopefully that was informative and covered some of the basics, but I think that's it from a demo perspective.

Riley Peronto:

No, that's great, Zach, I appreciate it. We do have a couple of questions coming through in the Q&A section, so I think we can jump into that. And I see quite a few people online right now; if you have more questions, feel free to jump in and leave them in the chat here.

First one, I'll touch on this because you were just kind of showing it on the screen there. But of course, Edge Delta can be used not just as a pipeline to route data to other platforms, but also we have log search and analytics capabilities and you can use us for that as well. One of the questions coming in was, “How would I use Visual Pipelines with Edge Delta Log Search?” 

Zach Quiring:

Yeah, that's a great question. I think in a lot of different ways. Basically, the core of the Visual Pipelines experience is designed to solve a variety of different challenges. One of those is just key to observability pipelines, which is simply transforming, shaping, and enriching: taking incoming log data, which could hit the pipeline in any shape or format, and getting it into a desired, more valuable, more useful, intuitive format, whether that's adding different tags to it, removing pieces that aren't needed, parsing out fields that are relevant, and things like that. In terms of the Edge Delta Log Search experience, we typically use the pipeline experience to take any incoming data, which could be in a variety of shapes and formats, and get it into a structure that's optimized for not only our search experience but whatever downstream tool a customer might be using. So it's kind of the same use case as if the destination was Splunk, Elastic, or Sumo, same type of deal, but it is something that's used to power and shape data feeding into our own platform, which exposes facets, search labels, attributes, and things like that. But yeah, it's pretty generic regardless of the destination.
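
The kind of shaping Zach describes might look like the following in spirit: take a raw line, parse out the fields you care about, and attach tags so a downstream search experience (Edge Delta's or anyone else's) can facet on them. The log format, field names, and tags here are hypothetical, not a real Edge Delta schema.

```python
import re
from typing import Optional

LOG_RE = re.compile(
    r"^(?P<timestamp>\S+) (?P<level>[A-Z]+) service=(?P<service>\S+) (?P<message>.*)$"
)

def shape(line: str, extra_tags: Optional[dict] = None) -> Optional[dict]:
    """Parse a raw log line into a structured event with searchable attributes."""
    match = LOG_RE.match(line)
    if not match:
        return None  # leave unparseable lines for a catch-all route
    event = match.groupdict()
    event["tags"] = {"env": "prod", **(extra_tags or {})}
    return event

print(shape("2023-07-20T10:01:02Z ERROR service=checkout card declined"))
```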

Riley Peronto:

That's great. So you're getting a lot of the same value, and you're getting your data into the right format to make the most use of it, whether that's in Edge Delta, in Splunk, or in another platform. That's great. The second question here is a little bit more tactical. This person is saying, “We use GitOps practices, APIs for automation, and version control systems for configs. How does that work with Visual Pipelines?”

Zach Quiring:

Yeah, no, it's a great question. For context, under the hood there are two pieces driving and powering the pipelines experience. There are the Edge Delta worker agents that run the pipeline and do this processing, typically on the infrastructure the customer is running. In this example, we're looking at data collected from Kubernetes, which means in a lot of cases we would have the Edge Delta worker agents deployed as a DaemonSet; they're really where this pipeline is running. What powers and drives the pipeline under the hood is a YAML configuration file and a set of components that build up that file. So although the visual experience can be driven from a user interface, under the hood it's still creating configuration files with various templating, variable placement, and all those key components. From a GitOps, infrastructure-as-code standpoint, all of the same APIs, the same YAML constructs, et cetera, are supported with this experience. We absolutely have customers who have all of this rolled out and baked into their version control systems and use the APIs to update and edit different components and for validation. But we do see that even those organizations tend to still leverage visual components of the pipeline experience to test and iterate on different things, while using their core automation tooling for the actual rollout, deployment, and management of what gets deployed under the hood.
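
As one illustration of how that can plug into a GitOps flow, a CI job might lint the pipeline YAML before anything is rolled out. The schema below (inputs/processors/outputs sections with a required pattern on regex filters) is a made-up stand-in to show the idea, not Edge Delta's actual config format or API.

```python
import sys

import yaml  # pip install pyyaml

REQUIRED_SECTIONS = ("inputs", "processors", "outputs")

def validate(path: str) -> list[str]:
    """Return a list of problems found in a (hypothetical) pipeline config file."""
    with open(path) as handle:
        config = yaml.safe_load(handle) or {}
    problems = [f"missing section: {s}" for s in REQUIRED_SECTIONS if s not in config]
    for node in config.get("processors", []):
        if node.get("type") == "regex_filter" and not node.get("pattern"):
            problems.append(f"processor {node.get('name', '?')}: pattern is required")
    return problems

if __name__ == "__main__":
    issues = validate(sys.argv[1] if len(sys.argv) > 1 else "pipeline.yaml")
    if issues:
        print("\n".join(issues))
        sys.exit(1)  # fail the CI step before deployment
    print("pipeline config looks valid")
```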

Riley Peronto:

No, that's a great summary. Then the last question we had here is a simple one: “Where are the processors actually running? Is there centralized infrastructure they run on, or where are the processors actually running?”

Zach Quiring:

Yeah. So that goes back to one of the core constructs of Edge Delta and the reason we have “edge” in our name. The core of the actual processing, everything from the collection of the data, to processing, manipulating, shaping, routing, and transformation, to eventually sending it to the various locations, in the majority of deployments that process is running on the Edge Delta worker agents that you deploy. In the case of something like Kubernetes, you're basically deploying an Edge Delta DaemonSet, so there's one pod per node within the cluster, and that is, by its fundamental nature, distributed across the cluster. Each agent is collecting the incoming data that it sees, processing it, and routing it to the various destinations. I would say that's the most common architecture: it's typically the Edge Delta agents that are performing this processing. There are cases where we'll take what we would call a more centralized architecture, where maybe it's a centralized pool of agents that are consuming incoming data, performing this processing, and shipping it out. We also have a concept of hosted agents or pipelines, where data can be shipped to our APIs and backend, processed there, and then routed where it needs to go. But at the end of the day, it's our agents that are performing this function, and in the majority of use cases those are natively distributed across the infrastructure, which allows them to keep a very, very light footprint when compared to a number of other industry-standard agents in the space that perform similar functions: open source ones like Fluentd, Fluent Bit, and Filebeat, and other vendor-native agents like the Splunk Universal Forwarder. That's something we've had a massive emphasis on from day one: because in a lot of cases we're deploying our agents to perform these functions, they need to be as lightweight and use as few resources as possible. There's a lot of benchmarking we do to make sure that, in the majority of cases, they actually come in lighter than the existing agent and pipeline components that might already be deployed.

Riley Peronto:

That's great. No, great summary there. So that's all the questions we had for today. I'm sure a lot of you are wondering what's the best way to get started with Visual Pipelines, and we have a couple of different options. The simplest, most straightforward way is to go to our website, edgedelta.com; at the top of the screen, there's a button to get started with Edge Delta Free. Within that experience, we've exposed a few of our most powerful processors so that you can start experimenting with them and start deriving value. As you start exploring, you'll also see the capabilities that aren't supported in Edge Delta Free, and for a limited time we're offering what we're calling a white-glove onboarding experience. So if you want to do more, if you want to test this out for free, we'll work with you, onboard you, and help you get value from the whole feature set and see if it meets your use case. So, two options here: you can just go with Edge Delta Free and try it for yourself, or if you'd like to work with our engineering team and get more of a personalized onboarding, we can support that as well. And again, we're offering that for free for a limited time to help you figure out if this meets the needs of your use case. With that, we'll follow up with a recording of today's webinar. Really appreciate everyone joining today, and if there are no other questions, I think we can hop off. Zach, anything else you want to add?

Zach Quiring:

No, I think that sounds good and thanks everyone for attending. 

Riley Peronto:

Awesome. Have a good rest of the day everyone. Take care.