Your team leads and colleagues at Wavefront will often recommend that you ‘understand your data shape’ to craft effective alerts and visualizations. This article explains in detail what understanding your data shape means.
To understand your data shape you need to investigate the following:
- What is the reporting interval of your metrics and does it change?
- Is there a lag in your metrics reporting?
- Are the metrics backfilling?
- Are your metrics gauges, counters or delta counters?
Most of the time, when you land on Wavefront or query your metrics, you start with a 2 hour time window and a ‘line plot’. This is a great view of your recently received metrics, but it does not clearly show the underlying live behavior of your metrics.
To get a clear view of your data shape, we suggest that you change the chart type to ‘point plot’ and the time window to ‘10 minute’ in a ‘live’ view. In a point plot, every point that you see is a metric in its raw form, as it was received. In a 10 minute window, under most scenarios, your points will be far enough apart that you can identify on the x-axis the different times at which they were reported. Live view shows you the metrics as they come in.
What is the reporting interval of your metrics and does it change?
Now that we are in a 10 minute live view window with a ‘point plot’ selected, we can determine what the reporting interval is. The reporting interval is simply the gap between the reported metrics. Hover over each point to determine its time and the gap between consecutive metrics. Keep in mind that the gap may look different if advanced functions are applied in the query, so we suggest looking at the raw metric. For example, if your query is align(5m, sum(ts("my.metric"))), then you want to examine just the raw metric, ts("my.metric").
In most use cases and by default, Wavefront expects the points to be separated by 1 minute. This means that Wavefront expects a metric to come in each minute, that is, your ‘reporting interval’ to be 60 seconds. However, there can be times when metrics are separated by more than a minute, and the 10 minute live view of the point plot will reflect that.
Here are the common reporting interval scenarios:
- Metrics are reported every minute
- Metrics are reported more than once a minute
- Metrics are reported at an interval of X minutes, where X is more than 1 (e.g., 5 minutes)
- The reporting interval is not constant
Understanding your reporting interval helps you determine if you will need missing data functions to craft performant alerts and visualizations.
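The scenarios above can also be checked away from the chart, once you have the timestamps of the raw points. The following is a minimal sketch, assuming the timestamps are Unix seconds that you have exported yourself; the helper names `reporting_gaps` and `describe_interval` and the `tolerance` parameter are hypothetical and not part of Wavefront.

```python
from statistics import median


def reporting_gaps(timestamps):
    """Return the gaps, in seconds, between consecutive point timestamps."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def describe_interval(timestamps, tolerance=1):
    """Classify a series into one of the reporting interval scenarios.

    tolerance is the jitter, in seconds, that we still treat as "constant".
    """
    gaps = reporting_gaps(timestamps)
    if not gaps:
        return "not enough points"
    if max(gaps) - min(gaps) > tolerance:
        return "not constant"
    typical = median(gaps)
    if typical < 60 - tolerance:
        return "more than once a minute"
    if typical <= 60 + tolerance:
        return "every minute"
    return f"every {typical / 60:g} minutes"


print(describe_interval([0, 60, 120, 180]))  # every minute
print(describe_interval([0, 300, 600]))      # every 5 minutes
print(describe_interval([0, 60, 180]))       # not constant
```

A series that falls into any scenario other than "every minute" is a candidate for the missing data functions mentioned above.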
Is there a lag in your metrics reporting?
In a 10 minute live view, you will see your metrics come in to Wavefront in real time. Watch the right side of the chart: as the time window slides forward, you will notice new metrics come in. If your metrics have a reporting interval of 1 minute, the normal assumption is that the next metric will come in within the next minute. While this is true in most cases, it is not always so, and a lag that is not properly accounted for with functions can often lead to misfiring alerts.
As an example, let’s examine the following chart:
We can see that the gap between the points is 1 minute, but in live view the most recent metrics are not present. This means that there is a lag in the reporting of metrics. For performant alerting and dashboarding, you will need to take this lag into account. Wavefront has a set of ‘missing data functions’ to help in this scenario.
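To put a number on the lag, you can compare the timestamp of the newest point with the current time. This is a minimal sketch, assuming Unix timestamps in seconds and a known reporting interval; the function name `reporting_lag` is hypothetical and not part of Wavefront.

```python
def reporting_lag(latest_point_ts, now_ts, interval=60):
    """Return how many seconds the newest point is overdue.

    A point younger than one reporting interval is not considered late,
    so the lag is the age of the newest point beyond one interval.
    """
    age = now_ts - latest_point_ts
    return max(0, age - interval)


# Newest point is 45 seconds old with a 60 second interval: no lag yet.
print(reporting_lag(1000, 1045))  # 0
# Newest point is 200 seconds old: it is 140 seconds overdue.
print(reporting_lag(1000, 1200))  # 140
```

Sampling this value a few times while watching the live view gives a rough idea of how much delay your alerts and dashboards need to tolerate.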
Are the metrics backfilling?
Let’s say there is a lag in your metrics coming in. Another aspect to note is whether, when the metrics do come in, they report up to the current minute or are backfilled only up to a certain time.
As an example, let’s say that there is a delay in reporting of about 10 minutes and that, when the metrics make it to Wavefront, they only come in for the first 5 minutes of that lag. In this scenario, your live metrics at the current minute would be missing.
This again needs attention when crafting alerts and visualizations, and missing data functions will help.
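The 10 minute example above can be made concrete by listing the minutes that are still empty after a partial backfill. This is a minimal sketch, assuming Unix timestamps in seconds and one expected point per interval; the function name `missing_minutes` is hypothetical and not part of Wavefront.

```python
def missing_minutes(point_timestamps, window_start, now_ts, interval=60):
    """List the expected timestamps in [window_start, now_ts) with no point."""
    # Snap every point to the start of its interval so that a point
    # reported at 61 seconds still counts for the minute starting at 60.
    present = {ts - ts % interval for ts in point_timestamps}
    start = window_start - window_start % interval
    return [ts for ts in range(start, now_ts, interval) if ts not in present]


# A 10 minute window in which only the first 5 minutes were backfilled.
backfilled = [0, 60, 120, 180, 240]
print(missing_minutes(backfilled, window_start=0, now_ts=600))
# [300, 360, 420, 480, 540]
```

If the most recent minutes always appear in this list, the series is lagging, and if older minutes fill in later, it is backfilling.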
Are your metrics gauges, counters or delta counters?
After considering the reporting interval and any potential delays in the metrics, the other aspect of determining the data shape is to understand what the metric values represent. There are three main types of metrics, each briefly described here:
A gauge is a metric whose value represents an actual measurement, such as the memory used or the CPU utilization at a specific time. The key thing to note with gauges is that the value can go up and down and usually stays within a range.
A counter is a metric whose value is expected to be the cumulative total of previous values, so the value either increases or stays the same. There can be a counter reset, in which the value drops to 0 and starts to accumulate again from that point forward. As an example, think of system uptime: the value keeps increasing until there is a system restart, or ‘counter reset’.
Other examples would be the number of requests served or the number of errors received. With counter metrics, it is often beneficial to use functions such as rate(), ratediff() or mdiff() as they help calculate the change over time.
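To illustrate why the change over time is usually more useful than the raw counter value, the sketch below converts counter samples into per-second rates and skips over counter resets. It is only an illustration of the idea, not a description of how Wavefront’s rate(), ratediff() or mdiff() are implemented; the function name `per_second_rates` is hypothetical, and the samples are assumed to be sorted with strictly increasing timestamps.

```python
def per_second_rates(samples):
    """Convert (timestamp_seconds, counter_value) samples into rates.

    Returns a list of (timestamp, rate) pairs, one per pair of
    consecutive samples. A drop in value is treated as a counter
    reset and produces no rate for that step.
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if v1 < v0:
            # Counter reset: the difference is meaningless, so skip it.
            continue
        rates.append((t1, (v1 - v0) / (t1 - t0)))
    return rates


# Requests served, sampled every minute, with a reset at t=120.
samples = [(0, 100), (60, 160), (120, 5), (180, 65)]
print(per_second_rates(samples))  # [(60, 1.0), (180, 1.0)]
```

The raw values in this example swing between 5 and 160, while the computed rate shows a steady one request per second on either side of the reset.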
Delta counters differ from traditional counters in that their value represents the delta, or change in value, over a specific time. In Wavefront, delta counters bin to a minute timestamp, and writes to the same bin are treated as deltas. These are helpful for calculating bursts of events, since collisions can occur if traditional gauge or counter metrics are used.
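The binning behavior can be pictured with a small simulation. This is a minimal sketch of the idea only, assuming each write is a (timestamp in seconds, delta) pair; the function name `bin_deltas` is hypothetical and does not reflect Wavefront’s internal implementation.

```python
from collections import defaultdict


def bin_deltas(writes, bin_seconds=60):
    """Sum delta writes that fall into the same time bin.

    Returns a dict mapping the start of each bin to the total of the
    deltas written into it, ordered by bin start.
    """
    totals = defaultdict(int)
    for ts, delta in writes:
        # Snap the write to the start of its bin (one minute by default).
        totals[ts - ts % bin_seconds] += delta
    return dict(sorted(totals.items()))


# Three writes arrive, two of them within the same minute.
print(bin_deltas([(5, 1), (30, 2), (61, 4)]))  # {0: 3, 60: 4}
```

Both writes in the first minute contribute to that bin’s total, which is what makes delta counters suitable for bursts of events reported in the same minute.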
To learn more about Counters and Delta Counters, please visit this doc.
To understand your metrics data shape, you need to understand the following aspects:
- The reporting interval of your metrics
- The reporting cadence of your metrics
- Whether the metrics are current or are backfilling
- Whether the metric is a gauge, counter or delta counter
To get these answers, it’s helpful to view your metrics in a ‘point plot’ chart in a 10 minute live view. Observe as new metrics get reported; that will help you understand your metrics’ data shape.
Now that you understand your data shape, if you are concerned about why your alerts are or are not firing, here is a KB article on this topic.
If the Wavefront documentation and the information in this article did not answer your questions, please reach out to Support for help.