This article applies to:
• Visualizing Data/Querying
• Product edition: Current
• Feature Category: Dashboards/Charts
Overview:
This article describes how to monitor and investigate metrics that have the potential to cause performance issues.
Short bursts of high-cardinality metrics are typically a cause for investigation. Using metric names, source names, or point tags to store high-cardinality data such as timestamps, web session IDs, or login IDs can eventually cause performance issues when querying the data.
Wavefront usage alerts:
During ingestion, Wavefront assigns an ID to each newly added metric name, span name, source name, and key=value string of a point tag or span tag. A new ID generally indicates that a new time series has been introduced. The Wavefront usage integration includes predefined alerts to monitor new metrics being sent, which may include metrics with potentially problematic data quality.
To install the predefined Wavefront usage alerts, click the Install All button under the Wavefront usage integration tab. After the alerts are installed, an alert target needs to be assigned so that notifications are sent when an alert triggers. To change a predefined alert, clone it first and edit the clone.
High rate of host IDs observed:
This predefined alert tracks bursts of new source IDs. Newly created source IDs may result from expected activity in the metrics pipeline (for example, new containers spinning up) or from a configuration issue that needs investigation.
High rate of metric IDs observed:
This predefined alert tracks the number of new metric IDs sent per minute, averaged over a 10-minute moving window. The example below shows a data shape that would trigger this alert. The metrics shown include a session ID in the metric name. This adds no value, because each name is essentially an event rather than a continuous time series. Querying this type of metric data requires a wildcard (*) and can perform poorly because of the data shape.
"http.client.requests.clientName.api-test.us-west-2.aws.method.GET.outcome.SUCCESS.status.200.uri.-v2-order_carts-cc1d3049-c318-4db2-93b7-4c473a5f-delivery_times-.count_95 2669831779"
"http.client.requests.clientName.api-test.us-west-2.aws.method.GET.outcome.SUCCESS.status.200.uri.-v2-order_carts-c9ecefad-79d1-4f87-8f25-a7b79976-.upper 2669831781"
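One way to spot this data shape before it reaches Wavefront is to collapse ID-like segments in metric names and count how many distinct names share the same shape. The following is a minimal sketch, not part of any Wavefront tooling: the regular expression and the function names are assumptions for illustration, and the pattern would need tuning for the identifiers that appear in your own data.

```python
import re

# Heuristic (assumption): an ID-like segment is 8 lowercase hex characters
# followed by at least two hyphen-separated hex groups, which covers the
# session IDs in the example metric names above.
ID_LIKE = re.compile(r"[0-9a-f]{8}(?:-[0-9a-f]{4,12}){2,}")


def normalize_metric_name(name: str, placeholder: str = "<id>") -> str:
    """Replace every ID-like segment in a metric name with a placeholder."""
    return ID_LIKE.sub(placeholder, name)


def shapes_with_many_ids(names):
    """Group metric names by normalized shape.

    Returns a dict mapping each shape to the number of distinct metric
    names that collapse to it, keeping only shapes with more than one name.
    """
    groups = {}
    for name in names:
        groups.setdefault(normalize_metric_name(name), set()).add(name)
    return {shape: len(members) for shape, members in groups.items() if len(members) > 1}


if __name__ == "__main__":
    sample = [
        "uri.-v2-order_carts-cc1d3049-c318-4db2-93b7-4c473a5f-.upper",
        "uri.-v2-order_carts-c9ecefad-79d1-4f87-8f25-a7b79976-.upper",
        "cpu.load",  # hypothetical well-formed metric name
    ]
    for shape, count in shapes_with_many_ids(sample).items():
        print(f"{count} metric names share the shape {shape}")
```

A shape that keeps accumulating distinct names is a candidate for redesign, so that the identifier is no longer part of the metric name.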
High rate of string IDs observed:
This predefined alert tracks new string IDs, which are created for point tags. A high rate of new point tags added to a metric, if left unchecked, can have a detrimental effect on the performance of queries that use the metric. The example below has a random string of characters in the point tag value, which would quickly increase cardinality.
Point Tag: com.docker.swarm.task.name=aci-665-amp.1.dawqwykkaldwm62tekj675ut8a 1610985074
Point Tag: com.docker.swarm.task.name=aci-665-amp.1.i5dsafbi9capxsf76z6zy877i 1610985074
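To see which point tag keys drive cardinality in a sample of data, the distinct values per key can be counted. The sketch below assumes the point tags have already been parsed into dictionaries; the function name and the `env` tag are hypothetical and only serve the illustration.

```python
from collections import defaultdict


def tag_cardinality(points):
    """Count distinct values per point tag key.

    `points` is an iterable of dicts, each mapping tag key -> tag value
    for one data point.
    """
    values = defaultdict(set)
    for tags in points:
        for key, value in tags.items():
            values[key].add(value)
    return {key: len(seen) for key, seen in values.items()}


if __name__ == "__main__":
    sample = [
        {"com.docker.swarm.task.name": "aci-665-amp.1.dawqwykkaldwm62tekj675ut8a", "env": "prod"},
        {"com.docker.swarm.task.name": "aci-665-amp.1.i5dsafbi9capxsf76z6zy877i", "env": "prod"},
    ]
    # Keys whose count grows with the number of points are the ones to review.
    for key, count in sorted(tag_cardinality(sample).items(), key=lambda kv: -kv[1]):
        print(f"{key}: {count} distinct values")
```

A key whose distinct-value count grows roughly in step with the number of points, such as the task name here, is the kind of tag that this alert is meant to surface.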
Spying new IDs
The Wavefront instance includes an HTTP endpoint that can be leveraged to provide a window into the current stream of new IDs. Using this endpoint can quickly expose problematic new ID data points.
To get a list of new ID assignments, open a browser and use the following endpoint; replace <cluster> with the name of the Wavefront instance: https://<cluster>.wavefront.com/api/spy/ids
The API Spy samples the new ID assignments and prints the associated metrics on screen. See the API Spy documentation for further options.
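The same endpoint can also be read from a script instead of a browser. The following is a minimal sketch under stated assumptions: it assumes the endpoint accepts an API token as a Bearer token in the Authorization header and returns its output as lines of text, and the function names are hypothetical. Any query options passed as keyword arguments should be taken from the API Spy documentation.

```python
import urllib.request
from urllib.parse import urlencode


def spy_ids_url(cluster: str, **params) -> str:
    """Build the spy URL for a cluster, with optional query parameters."""
    base = f"https://{cluster}.wavefront.com/api/spy/ids"
    return f"{base}?{urlencode(params)}" if params else base


def stream_new_ids(cluster: str, token: str, max_lines: int = 20, **params):
    """Yield up to `max_lines` lines from the spy endpoint.

    Assumption: the endpoint accepts a Bearer token; check how your
    instance authenticates API requests before relying on this.
    """
    request = urllib.request.Request(
        spy_ids_url(cluster, **params),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        for count, raw in enumerate(response):
            if count >= max_lines:
                break
            yield raw.decode("utf-8", errors="replace").rstrip()


# Example usage (requires network access and a valid token):
#     for line in stream_new_ids("mycluster", "MY_API_TOKEN"):
#         print(line)
```

Capping the number of lines keeps the script from reading the stream indefinitely, since the endpoint reports new IDs for as long as the connection stays open.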
Wftop is another utility that allows ID exploration once it is downloaded and installed on a local machine. See the Wftop documentation for prerequisites and installation steps.
Wftop can also be used to spy on new IDs as they are ingested. After connecting to the instance with Wftop, change the configuration to spy on 'Id Creations' with a sampling rate of 1.0 (100%), then select one of the metric, host, histogram, or span types. The new IDs are displayed in the Wftop pane.
For further details and information please see: