Kubernetes Quick Start Guide Part I




VMware Aria Operations for Applications provides an integration for monitoring the health and performance of your Kubernetes environments (including TKG, OpenShift, and other Kubernetes distributions).


Download and Installation

  1. Go to your TO instance at https://CLUSTER.wavefront.com/ and click Integrations in the top menu bar.

  2. Click on the Kubernetes tile (you can find it in the Featured section)

  3. Click on the Setup tab and click on "Add Integration"

  4. Select the option that is most applicable to you and click Next.
    • Install in Tanzu Cluster - select this if you are using TKG
    • Install in OpenShift Cluster - select this if you are using OpenShift
    • Install in Kubernetes Cluster - select this for any other flavor of Kubernetes

  5. Let's walk through the steps for Install in Kubernetes Cluster. First, ensure that you have Helm installed. You can verify that Helm is installed by running this command on the command line:
    helm version

    You should see an output similar to the following:
    version.BuildInfo{Version:"v3.8.2", GitCommit:"6e3701edea09e5d55a8ca2aae03a68917630e91b", GitTreeState:"clean", GoVersion:"go1.18.1"}

  6. Run the following command to ensure that the Wavefront repository is configured within Helm:

    helm repo add wavefront https://wavefronthq.github.io/helm/ && helm repo update

    You should see an output similar to the following:
    "wavefront" has been added to your repositories
    Hang tight while we grab the latest from your chart repositories...
    ...Successfully got an update from the "wavefront" chart repository
    Update Complete. ⎈Happy Helming!⎈

  7. Enter your cluster name in the specified field. If you are installing on a test cluster that does not have authentication configured, click "Additional Settings" and select "Use Kubelet's read-only port". Then copy the generated command.

  8. Paste the command into your command line and run it. You should see an output similar to the following:
    NAME: wavefront
    LAST DEPLOYED: Tue May 10 19:29:32 2022
    NAMESPACE: wavefront
    STATUS: deployed
    Wavefront is setup and configured to collect metrics from your Kubernetes cluster. You
    should see metrics flowing within a few minutes.

    You can visit this dashboard in Wavefront to see your Kubernetes metrics:


  9. Installation is now complete. After a couple of minutes, you should see that the Kubernetes integration tile is marked with a green checkmark to indicate that data is flowing successfully.

  10. See the Out-of-the-Box Dashboards section to learn more about exploring the available out-of-the-box dashboards.
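
The command-line checks in steps 5 and 8 can also be scripted. The following is a minimal shell sketch, not part of the integration itself. The helper names helm_major and release_status are hypothetical, and the release name and namespace "wavefront" are assumed from the sample output in step 8.

```shell
#!/bin/sh
# Hypothetical helpers for checking the install steps; not part of the integration.

# Print the Helm major version from the output of `helm version`.
helm_major() {
  printf '%s\n' "$1" | sed -n 's/.*Version:"v\([0-9][0-9]*\)\..*/\1/p'
}

# Print the value of the STATUS line from Helm's install or status output.
release_status() {
  printf '%s\n' "$1" | sed -n 's/^[[:space:]]*STATUS:[[:space:]]*//p'
}

# Demo using text shaped like the sample outputs in steps 5 and 8.
helm_major 'version.BuildInfo{Version:"v3.8.2", GoVersion:"go1.18.1"}'
release_status 'NAME: wavefront
NAMESPACE: wavefront
STATUS: deployed'

# Against a live cluster (assumes release and namespace are both "wavefront"):
#   helm_major "$(helm version)"
#   release_status "$(helm status wavefront --namespace wavefront)"
#   kubectl get pods --namespace wavefront
```

With the sample text, the demo prints the major version (3) and the release status (deployed). If your generated command used a different release name or namespace, adjust the commented commands accordingly.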


Out-of-the-Box Dashboards

Once the integration is set up, you are ready to monitor your Kubernetes environment using the out-of-the-box dashboards.

You can find these dashboards in the Kubernetes integration tile:

  1. After logging in, click Integrations in the top menu bar.

  2. Search for the Kubernetes integration and click it.

  3. Click the Dashboards tab.

Here is a summary of the available dashboards:

Dashboard                                     Description
Kubernetes Summary                            Health summary of all Kubernetes clusters and workloads
Kubernetes Clusters                           Detailed health overview of cluster-level components: clusters, nodes, namespaces, pods, and containers
Kubernetes Nodes                              Detailed health of nodes
Kubernetes Pods                               Detailed health of your pods broken down by node and namespace
Kubernetes Containers                         Detailed health of your containers broken down by namespace, node, and pod
Kubernetes Namespaces                         Details of your pods or containers broken down by namespace
Wavefront Collector for Kubernetes Metrics    Internal stats of the Wavefront Collector for Kubernetes
Kubernetes Control Plane





Out-of-the-Box Alerts

To access the out-of-the-box alerts:

  1. After logging in, click Integrations in the top menu bar.

  2. Search for the Kubernetes integration and click it.

  3. Click the Alerts tab and then click the green Install All button.

  4. Once the alerts are installed, there will be an "edit" link next to each alert.
    Note: There will also be a message in the top right corner indicating that the alerts were installed without targets.

  5. Click edit on the alert of interest, scroll to the "Recipients" field, and add alert target(s) to specify where notifications should go.

  6. Click Save in the upper right corner.

  7. Once an alert fires, the notification will include a link to the Alert Viewer page, which allows you to investigate the alert. 


Example: Investigating an Alert

One key out-of-the-box alert ("K8s too many pods crashing") tracks when too many pods start to crash. In this section, we'll walk through investigating this alert as an example of how you can use the out-of-the-box dashboards and alerts to monitor your Kubernetes environments.

This alert triggers when the following condition has been met:

    count(ts(kubernetes.pod.status.phase, phase="Running" or phase="Succeeded"), cluster) / count(ts(kubernetes.pod.status.phase), cluster) < 0.8
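
In other words, the condition compares the fraction of pods whose phase is Running or Succeeded against a threshold of 0.8, per cluster. As a rough local illustration of that ratio (not the Wavefront query itself), here is a minimal shell sketch. The helper name healthy_ratio is hypothetical, and the live command in the comment assumes kubectl is pointed at a single cluster.

```shell
#!/bin/sh
# Hypothetical helper: read one pod phase per line on standard input and
# print the fraction of pods that are Running or Succeeded.
healthy_ratio() {
  awk '
    { total++ }
    $1 == "Running" || $1 == "Succeeded" { ok++ }
    END {
      if (total == 0) { print "no pods"; exit 1 }
      printf "%.2f\n", ok / total
    }'
}

# Demo with made-up phases: 2 of 5 pods are healthy.
printf 'Running\nRunning\nPending\nPending\nPending\n' | healthy_ratio

# Against a live cluster (covers only the cluster in the current kubectl context):
#   kubectl get pods --all-namespaces \
#     -o jsonpath='{range .items[*]}{.status.phase}{"\n"}{end}' | healthy_ratio
```

The demo prints 0.40, which is below 0.8, so a cluster in that state would meet the alert condition.
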

  1. The notification email sent when the alert triggers serves as the starting point of our investigation.


  2. From the notification, we know that the affected cluster is "mk" (this is specified in the Sources/Labels Affected field). Using the "Kubernetes Pods" out-of-the-box dashboard, we can filter for the "mk" cluster and quickly examine all the pod states in this cluster. This allows us to identify which pods are not in the running state. Locate the "Pods Pending" chart on the dashboard.

    In this example, the alert was triggered because these pods are in the Pending state.

  3. From the previous chart, we notice that the pods are pending because 0/1 nodes are available and 1 node is unschedulable. However, why didn't the pods get scheduled on other nodes? Using the "Kubernetes Summary" out-of-the-box dashboard and filtering for the "mk" cluster, we find that this is because there is only one node in the cluster.


  4. With the out-of-the-box alerts and dashboards, we caught the fact that too many pods were not running. Very quickly, we identified the affected cluster, the affected pods, and the affected node. We now know that the pods are stuck in the Pending state because the cluster has only one node and that node is unschedulable. Armed with this information, you can focus your further investigation on why that node is unschedulable and determine the appropriate remediation:
    • Is there an upgrade underway?
    • Is there a hardware replacement in progress?
    • Why does this cluster have only a single node?
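
To follow up on the questions above from the command line, a minimal shell sketch along these lines may help. The helper name unschedulable_nodes and the node names in the demo are hypothetical, and the commented commands assume kubectl is configured for the affected cluster.

```shell
#!/bin/sh
# Hypothetical helper: print the names of nodes marked unschedulable.
# Expects lines of "<node-name><TAB><unschedulable>" on standard input.
unschedulable_nodes() {
  awk -F '\t' '$2 == "true" { print $1 }'
}

# Demo with made-up node names: only node-a is marked unschedulable.
printf 'node-a\ttrue\nnode-b\t\n' | unschedulable_nodes

# Against a live cluster:
#   kubectl get nodes \
#     -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.unschedulable}{"\n"}{end}' \
#     | unschedulable_nodes
#   kubectl get pods --all-namespaces --field-selector=status.phase=Pending
#   kubectl describe node NODE_NAME
```

The demo prints node-a. On a live cluster, the describe command (with NODE_NAME replaced by a real node name) shows the node's conditions, taints, and recent events, which can indicate whether the node was cordoned for maintenance.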

Learn More

If you would like to learn more about manual installation of the integration or making customizations to the integration, see Part II.