Engineering

Monitoring Applications Using Edge Delta Agent + Prometheus

In this post (Part 2/2), we will add the missing piece, the Edge Delta Agent.
Ozgur Batur
Senior Engineering Manager
Sep 9, 2020
7 minute read

In the first part of this series, we deployed the Prometheus Operator Helm chart to start building up our monitoring suite. With the metrics and alerting servers and the integrated dashboards, we can monitor cluster health and the resource usage and status of pods, nodes, and Kubernetes workloads. However, we have yet to monitor the more interesting and relevant pieces: the production applications and services that deliver business value to customers.

As a reminder, we deployed the Prometheus Operator under the Kubernetes namespace “monitoring” with the release name “promop”.

In this post (Part 2/2), we will add the missing piece: the Edge Delta Agent (see Part 1 here). Edge Delta’s Agent is the distributed “edge” component of the Edge Delta observability solution.

The Agent system:

  • Tracks applications, services, and devices at the edge where telemetry data is generated
  • Keeps track of metrics
  • Intelligently analyzes log data
  • Automatically generates baselines against analyzed telemetry data
  • Detects anomalies
  • Collects contextual information when an anomaly occurs
  • Generates alerts and notifications

These additional capabilities make detecting issues and fixing problems much more efficient.

On top of these abilities, the Edge Delta Agent can also function as a Prometheus exporter (on steroids) for organizations that also want to keep their existing monitoring pipeline. So let’s deploy the Edge Delta Agent and really monitor our application.

Subscribing

The Edge Delta Agent connects to a cloud-managed service, so before installing it we need to register in a couple of steps.

First, visit https://app.edgedelta.com and follow the prompts to create an account.


Create an organization where you can manage your agents by clicking ADD NEW ORGANIZATION.


Fill in Organization Name and Description and click CREATE ORGANIZATION.


After creating the organization, we are greeted by the dashboard, which is unsurprisingly empty (we currently have no agents deployed). Let’s click Configurations in the left navigation menu.

Configuration

The Configurations page includes three sample configurations for monitoring common platforms. Since we want to monitor our own Kubernetes pods, click CREATE CONFIGURATION.

Here is a simple configuration to collect logs from all pods in the “myapp” namespace. You can use “default” instead if you do not run your services in a dedicated namespace. Either way, you will need to update the “namespace=myapp” entry to a namespace that exists in your cluster; alternatively, you can target a pod or set of pods via “pod=pod-prefix*”.

The agent collects logs and analyzes them to create log clusters: recurring log patterns that give valuable insight into the status of your app without any further application-specific configuration.

version: v2
agent_settings:
 tag: prometheus-k8s-demo
inputs:
 kubernetes:
   - labels: myapp
     include:
       - "namespace=myapp"
processors:
 cluster:
   name: clustering
   num_of_clusters: 50
   samples_per_cluster: 2
   reporting_frequency: 30s
workflows:
 myapp-wf:
   input_labels:
     - myapp
   processors:
     - clustering

Reminder: Make sure to change the namespace (the “namespace=myapp” entry) to match your cluster.
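If you would rather scope collection by pod name instead of namespace, as mentioned above, the include entry would look like the illustrative fragment below; “pod-prefix” is a placeholder for your own pod name prefix:

     include:
       - "pod=pod-prefix*"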

Paste the above config, changing the namespace, tag, and description as you see fit.


Click the Save button. At the bottom of the Configurations table you can see the new agent configuration we just created.


The UUID starting with 0f4… is the generated API Key for my configuration; yours will be a different key. Keep that key handy, as we will store it in a Kubernetes secret that the Edge Delta agent deployment on your cluster reads as an environment variable.

First we need to deploy the agent itself. The command below deploys the agent without requiring any changes:

Run Command:

kubectl apply -f https://edgedelta.github.io/k8s/edgedelta-agent.yml

Expected Output:

namespace/edgedelta created
daemonset.apps/edgedelta created
clusterrolebinding.rbac.authorization.k8s.io/edgedelta created
clusterrole.rbac.authorization.k8s.io/edgedelta created
serviceaccount/edgedelta created

This deploys the agent pods in the edgedelta namespace, but the agents will not start properly yet because the ed-api-key secret is missing.
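If you check the pods at this point, you will likely see them stuck in a non-Running state (the exact status, for example CreateContainerConfigError or CrashLoopBackOff, depends on how the manifest references the missing secret):

kubectl get pods -n edgedelta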

Let’s create the secret; ed-api-key should be set to the API Key you fetched from the UI above:

Run Command:

kubectl create secret generic ed-api-key \
   --namespace=edgedelta \
   --from-literal=ed-api-key="0f415c6c-83c3-4a02-8532-092abf12d1bc"

Expected Output:

secret/ed-api-key created

Wait a few seconds, then check the pod status:

Run Command:

kubectl get pods -n edgedelta

Expected Output:

NAME              READY   STATUS    RESTARTS   AGE
edgedelta-dpcd8   1/1     Running   0          5s
edgedelta-wdnl8   1/1     Running   0          12s
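If your agent pods do not reach Running on their own after the secret is created, restarting the DaemonSet (named edgedelta by the manifest above) usually resolves it:

kubectl rollout restart daemonset/edgedelta -n edgedelta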

I have two log-generating pods running under the myapp namespace.
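To double-check which of your own workload pods the agent will be watching, list the pods in that namespace (substitute your namespace for myapp):

kubectl get pods -n myapp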

Check the status page.


As seen on the status page, this two-node Kubernetes cluster is visible and events from the myapp namespace are flowing successfully.

One more step is needed to point Prometheus at the metrics exposed locally by the Edge Delta Agent. Download edgedelta-prom-servicemonitor.yml.

If you chose a different Helm release name for Prometheus, update the “promop” value in edgedelta-prom-servicemonitor.yml to match your release name:

release: promop

You can check Helm release names as follows:

Run Command:

helm ls --all-namespaces

Expected Output:

NAME     NAMESPACE   REVISION  UPDATED                               STATUS    CHART                      APP VERSION
promop   monitoring  1         2020-08-28 19:49:30.516141 +0300 +03  deployed  prometheus-operator-9.3.1  0.38.1

You also need to update the ServiceMonitor name in the same file so that it matches the release name:

# name should match default servicemonitors
# kubectl get servicemonitor -n monitoring
name: promop-prometheus-operator-edgedelta
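If your release name differs, a quick way to update both occurrences at once is a search-and-replace over the file. The sed command below assumes GNU sed and a hypothetical release name “myrelease” (on macOS, use sed -i '' instead):

sed -i 's/promop/myrelease/g' edgedelta-prom-servicemonitor.yml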

Now we are ready to apply the Service and ServiceMonitor configuration for Prometheus.

Run Command:

kubectl apply -f edgedelta-prom-servicemonitor.yml

Expected Output:

servicemonitor.monitoring.coreos.com/promop-prometheus-operator-edgedelta created
service/edgedelta-metrics created
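Optionally, you can confirm that Prometheus discovered the new target before moving on to Grafana. Port-forward the Prometheus service and open http://localhost:9090/targets in your browser; the service name below is the usual default created by the prometheus-operator chart for the promop release, so verify yours with kubectl get svc -n monitoring if it differs:

kubectl port-forward svc/promop-prometheus-operator-prometheus 9090:9090 -n monitoring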

After waiting a few seconds, we can connect to our Grafana dashboard. As you may recall from Part 1, here is how to do that:

Run Command:

kubectl port-forward svc/promop-grafana 8080:80 -n monitoring

Expected Output:

Forwarding from 127.0.0.1:8080 -> 3000
Forwarding from [::1]:8080 -> 3000

Monitoring Log Clusters

Open your browser to http://localhost:8080/


The default Grafana username is admin and the password is prom-operator.


Click the + icon and select Dashboard to create a new dashboard.


In the new dashboard, click + Add new panel and add the query sum by (pattern) ({metricType="cluster"}), which gives the number of log clusters collected and analyzed.

For better visualization, set the Legend to {{pattern}} in the same panel.
For better visualization set Legend as {{Pattern}} in same panel.


Set the panel name to Log Clusters and click Apply in the top-right corner.


And we have our application logs automatically clustered according to their variance within the application context. Since my log-generating app produces web server access logs, the logs are grouped by request type, protocol (HTTP or HTTPS), and user agent. You might see a very different distribution depending on your application.


Instead of a graph, you can also view the cluster groupings as a table in Grafana. Click Add panel again, paste the same query sum by (pattern) ({metricType="cluster"}) and legend {{pattern}}, and this time select the Table visualization for the panel.


To get the cluster distribution at the latest reported instant, enable the Instant toggle below the query and click the Value table header to sort log patterns in decreasing order.


Click Apply.


Now we have a table where we can see the grouped log patterns and easily check the most and least frequent events. These are usually the most interesting ones for understanding application behavior or identifying an issue.

Click the disk icon at the top right to save the dashboard so that you can return to it whenever you need.


Monitoring Errors

Cluster visualization is really helpful for getting an overall picture, but you might also want to extract concrete metrics and detect anomalies from your application logs.

Consider this demo application, which generates lots of web access logs. I want to monitor the response code of each request so that I can identify internal errors or unauthorized access attempts. Here is a sample log line from my demo application:

56.122.20.238 - bruen7530 [01/Sep/2020:09:22:50 +0000] "GET /collaborative/whiteboard HTTP/1.1" 403 29347 "http://www.chiefefficient.com/24/7/infrastructures/empower" "Mozilla/5.0 (X11; Linux i686) AppleWebKit/5322 (KHTML, like Gecko) Chrome/40.0.865.0 Mobile Safari/5322"

The status code appears between the closing quote after the HTTP version (HTTP/1.1" in the sample) and the next space.

Let’s match the status codes with the simple regexes below:

  • Unauthorized: HTTP.* 403
  • Internal errors: HTTP.* 500

You can use https://regex101.com with your log sample to find the correct pattern. Select the Golang flavor to make sure you are using the right regex dialect.
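If you want to sanity-check these patterns before touching the agent configuration, a plain grep against a saved log sample works as well; my-pod and sample.log below are placeholders:

kubectl logs my-pod -n myapp > sample.log
grep -E 'HTTP.* 403' sample.log
grep -E 'HTTP.* 500' sample.log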


Let’s update our config with these patterns by defining regex processors and adding them to our workflow as below:

version: v2

agent_settings:
 tag: prometheus-k8s-demo

inputs:
 kubernetes:
   - labels: myapp
     include:
       - "namespace=myapp"
processors:
 cluster:
   name: clustering
   num_of_clusters: 50
   samples_per_cluster: 2
   reporting_frequency: 30s    
 regexes:
   - name: auth-error
     pattern: HTTP.* 403
   - name: internal-error
     pattern: HTTP.* 500

workflows:
 myapp-wf:
   input_labels:
     - myapp
   processors:
     - clustering
     - auth-error
     - internal-error

To update the configuration, open your config under Configurations and click the edit button with the pencil icon.


Paste the config and click Save.


After waiting a few seconds for the agent to fetch the new configuration and start collecting metrics, we can add the new metrics to Grafana.

For authentication failures, we will monitor the rate of change with the Prometheus query:

sum(irate(auth_error_count[30s]))

For internal errors, we will use the Prometheus query:

sum(irate(internal_error_count[30s]))


Here is the dashboard with the new metrics.
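Once these metrics are flowing into Prometheus, you can also alert on them without going through Grafana. Below is a minimal, illustrative PrometheusRule sketch; it assumes the prometheus-operator chart’s default behavior of picking up rules labeled with your Helm release name (promop here), and the rule name, threshold, and labels are placeholders to adapt to your needs:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: edgedelta-app-errors
  namespace: monitoring
  labels:
    release: promop
spec:
  groups:
    - name: edgedelta-app-errors
      rules:
        - alert: InternalErrorsDetected
          expr: sum(irate(internal_error_count[30s])) > 0
          for: 1m
          labels:
            severity: warning
          annotations:
            summary: Internal server errors detected in myapp logs

Save it to a file and apply it with kubectl apply -f, just like the ServiceMonitor above.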


Instead of trying to spot trends manually, you can simply replace the queries with auth_error_anomaly1 and internal_error_anomaly1 to monitor the anomaly scores automatically generated by the Edge Delta agent.


In addition to this Grafana dashboard, we can check anomalies in the Edge Delta dashboard.


We have successfully achieved a high level of visibility into our cluster health and resources, the general state of our applications, and specific metrics extracted from application logs!

Although this was just scratching the surface, Edge Delta has lots of options and configurations for different use cases. My next blog post will delve deeper into more of Edge Delta’s functionality, so stay tuned!

If you don’t want to wait, you can submit a support request or contact the team about additional scenarios such as configuring alerts and anomaly contexts, tracing transactional operations, endpoint security anomaly detection, and more.

References:

Prometheus Aggregation Operators

Grafana Queries
