Where's My Data? Troubleshooting for Missing Data

This article applies to: Browsing Data/Querying
Product edition: All
Feature Category: Query


Overview:

On occasion, you may expect to see certain data in Tanzu Observability but, for some reason, it doesn't show up. This can be frustrating and confusing when you need the data quickly. Tanzu Observability retains raw data for 18 months and doesn't otherwise delete data, so why else might data not show up?

To investigate the possible causes and scenarios in which data may not be present, review the topics in this knowledge base article.

 

Contents

  • Incorrect Time Window
  • Obsolete Data
  • Sampling or Filtering is Enabled
  • Delayed Data
  • Blocked Data

Incorrect Time Window

While this may seem elementary, it sometimes happens that the chart's time window does not cover the timestamps of the expected data. Often, this is a result of mis-timestamped data. Try expanding the time window and see if the data of interest shows up.

 

Obsolete Data

Time series that have not had any data points ingested in 28 or more days are categorized as "obsolete". By default, this data will not show up in charts. If you are viewing a chart with a time window that is 28 or more days in the past, try selecting the Include Obsolete Metrics checkbox in the Advanced tab of the chart settings.

 

[Screenshot: Include Obsolete Metrics checkbox in the chart's Advanced settings]

 

See Where is My Old Data? for more details and tips.

 

Sampling or Filtering is Enabled

Both the UI and the query language itself have options for sampling and filtering data. It's possible that the data you are expecting to see is being sampled or filtered.

Sampling

  • User Interface
    The UI includes toggles for applying sampling to the data displayed on charts. This makes data easier to view when there are many different time series. To check whether sampling is enabled, look for either of the following:
      • Under the chart settings tabs:
        [Screenshot: sampling toggle in the chart settings]
      • At the top right corner when viewing a chart:
        [Screenshot: sampling toggle above the chart]

 

  • Query Functions
    The Tanzu Observability Query Language includes a variety of functions, some of which can be used to return a sampling of data rather than all underlying data. Check whether functions like the following are applied to the query (an illustrative example follows the list):
      • downsample
      • limit
      • random
      • sample
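
    As a minimal sketch, assuming a hypothetical metric named my.app.request.count, queries wrapped in these functions return only a subset of the underlying data:

        limit(5, ts(my.app.request.count))           (at most 5 of the matching time series)
        sample(5, ts(my.app.request.count))          (a non-deterministic sample of 5 series)
        downsample(30m, ts(my.app.request.count))    (only one point per 30-minute interval)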

Filtering

  • User Interface
    The UI includes functionality for filtering the data that is displayed on charts. To check if filtering is happening, look for something similar to the following at the top of charts or dashboards:
    [Screenshot: filter bar at the top of a dashboard]

  • Query Functions
    The Tanzu Observability Query Language includes a variety of functions, some of which can be used to filter the data that is returned and displayed on the chart. Check whether functions like the following are applied to the query (an illustrative example follows the list):
      • align
      • bottom
      • bottomk
      • filter
      • globalFilter
      • lowpass
      • highpass
      • retainSeries
      • removeSeries
      • top
      • topk
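
    As a minimal sketch, again assuming the hypothetical metric my.app.request.count and a hypothetical source name, queries like the following display only a subset of the underlying data:

        filter(ts(my.app.request.count), source="app-1")    (keeps only series reported by source app-1)
        topk(5, ts(my.app.request.count))                    (keeps only the 5 highest-valued series)
        highpass(100, ts(my.app.request.count))              (keeps only points with values above 100)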

 

 

Delayed Data

One of the most common reasons data may not appear when expected in Tanzu Observability is ingestion delays. In general, ingestion delays result in data eventually showing up in Tanzu Observability. However, due to the delays, there will be a discrepancy between the timestamp of the data point and when the data point is actually visible on a chart. For example, a data point may be timestamped for 12:00:01 UTC but it may not show up on charts until 12:02:35 UTC due to delays in the ingestion process. There are several possible causes for ingestion delays:

Proxy Queues

When queues build up at the Tanzu Observability Proxy, the data in those queues is ingested with a delay. This is because the Proxy prioritizes live, incoming data and processes the queued backlog when possible. There are several reasons why data queues up at the Proxy, and the Queuing Reasons chart in the Wavefront Service and Proxy Data dashboard is especially helpful for identifying the cause at different points in time:

  • Pushback from Backend
  • Proxy Rate Limit
  • Bursty Data
  • Memory Buffer Overflow
  • Network Latency
  • Memory Pressure


    - Rate of data ingestion is higher than the backend limit -

    The backend limit is typically set in relation to the commit rate specified in your contract. When you attempt to ingest data at a higher rate than the backend limit, the backend will "push back". This results in data getting queued up at the Proxy.

    Troubleshooting & Further Investigation:

    Look for "pushback" in the Queuing Reasons chart. In addition, use the query in the Data Ingestion Rate (Points) chart of the Wavefront Service and Proxy Data dashboard to keep track of your ingestion rate and ensure that it stays within contractual limits to avoid overages. You can also request that the backend limit be raised; however, this may result in overages.
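
    As a rough sketch, a query along these lines shows the per-second rate of points accepted by the backend. The internal metric name (~collector.points.reported, reported by the Wavefront service) is an assumption, and the actual chart query may aggregate it differently:

        rate(ts(~collector.points.reported))    (approximate points per second accepted by the backend)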



    - Rate of data ingestion is higher than the rate limit set at the Proxy -

    If you don't already have access to the Proxy configuration, the Proxy Rate Limiter Active chart in the Wavefront Service and Proxy Data dashboard will provide insight into whether the rate limit at the Proxy is being breached. If it is, data will be queued up whenever the rate limit is exceeded.

    Troubleshooting & Further Investigation:
    Confirm whether the Proxy's rate limit setting (pushRateLimit) is intentional and, if so, look into ways to reduce your data rate. Keep track of the ingest rate at the Proxy in question using the Received Points/Distributions/Spans Per Second charts in the Wavefront Service and Proxy Data dashboard. To narrow things down to the specific Proxy you are interested in, use the Filter feature at the top of each dashboard or chart, or specify that Proxy's source name in the underlying queries (see the sketch below).
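
    For illustration, the rate limit is a property in the Proxy configuration file, and the per-Proxy received rate can be inspected by filtering on the Proxy host's source name. The property value, source name, and internal metric name below (which assumes data arriving on the default port 2878) are illustrative assumptions:

        # wavefront.conf (Proxy configuration), illustrative value only
        # rate limit (points per second) applied at this Proxy
        pushRateLimit=20000

        rate(ts(~proxy.points.2878.received, source="my-proxy-host"))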


    - Bursty data -

    This can be related to either of the two points above. If your rate of data is very bursty, you may also experience queueing. "Burstiness" means that data is sent in bursts rather than being sent evenly over time. For instance, the average PPS (points per second) over a minute may be 1000. That could be the result of 1000 data points sent for each of the 60 seconds within that minute. But, that could also be the result of 60,000 data points sent at one particular second within that minute and no data sent for the rest of the minute. Since rate limits are set assuming a steady rate, that burst of 60,000 PPS for that one second will result in data being queued.

    Troubleshooting & Further Investigation:
    The Received Points/Distributions/Spans Max Burst Rate (top 20) charts in the Wavefront Service and Proxy Data dashboard provide insight into the burstiness of your data rate. The queuing ability of the Proxy normally helps smooth out the data rate through momentary queuing. However, if you find that the Proxy queues persist and continue to grow, the overall data ingest rate is too high. You can either reduce the ingest rate or request that the backend limit be raised (possibly resulting in overages).


    - The rate of data ingestion is so high that the memory buffer fills too quickly -

    The Proxy can hold a certain number of data points in memory (set through the pushMemoryBufferLimit property) before having to store data points on disk. As data in the memory buffers is processed, space is freed up for new incoming points. When the rate of ingest is so high that the buffer fills up more quickly than it is being worked down, more and more data points will get queued up. 

    Troubleshooting & Further Investigation:
    Look for "bufferSize" in the Queuing Reasons chart. Consider lowering the rate of ingest or distributing the load among several Proxies. It is not typically necessary to adjust the pushMemoryBufferLimit Proxy property. However, if you choose to do so, understand that raising this value results in higher memory usage while lowering this value results in more frequent spooling to disk.
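
    For reference, this property also lives in the Proxy configuration file; the value below is purely illustrative:

        # wavefront.conf (Proxy configuration), illustrative value only
        # Maximum number of points held in memory before the Proxy spools new points to disk
        pushMemoryBufferLimit=640000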


    - Network issues resulting in slowness sending data to the backend -

    Queues can also fill up when the rate of ingest is higher than the rate at which data is sent to the backend.

    Troubleshooting & Further Investigation:
    Look at the Network Latency charts in the Proxy Troubleshooting section of the Wavefront Service and Proxy Data dashboard. The underlying metric tracks the time from when the Proxy sends out a data point to when it receives an acknowledgment from the backend. This should typically be in the range of hundreds of milliseconds. Once it reaches the range of seconds, there may be a network latency issue.


    - Memory is running low on the Proxy host -

    The Proxy has a property (memGuardFlushThreshold) that can be set to protect against out-of-memory situations. If heap usage on the Proxy exceeds this threshold, data in memory is flushed to disk and thereby queued.

    Troubleshooting & Further Investigation:
    Look for "memoryPressure" in the Queuing Reasons chart. Consider increasing memory limits for the host server.

 

Data Pipeline

If your data travels through a pipeline before reaching the Tanzu Observability Proxy or before being direct ingested to the Tanzu Observability backend, the pipeline itself may introduce delays to the ingestion process. 

Troubleshooting & Further Investigation:
One place to look for clues is the Data Received Lag charts in the Proxy Troubleshooting section of the Wavefront Service and Proxy Data dashboard. This helps if the data points are timestamped at or near the source of the data. The underlying metric used in these charts tracks the difference between the system time of the Proxy host and the timestamp of data points, which provides some insight into how long it takes a data point to traverse the data pipeline and reach the Proxy. Every pipeline inherently has its own latency. Understanding it helps set expectations for when data should show up in charts, and it also helps with crafting queries that take this latency into account.

 

High Rate of New IDs

Components of each data point are converted into IDs at the backend prior to persistence in storage. These components include the metric name, the source name, and each point tag key and value combination. Whenever a new name is detected by the backend, a new ID must be generated, which adds to the ingestion time. This overhead is negligible when the rate of new IDs is relatively low. However, when the rate is high, it can lead to a backlog of items awaiting an ID, which results in ingestion delays.


Troubleshooting & Further Investigation:
The Wavefront Usage integration includes several alert examples that can be used to catch a high rate of new IDs. A high rate of new IDs could indicate a cardinality issue with your data shape. For instance, if a timestamp were included as a point tag (see the illustration below), this would lead to a high number of unique point tags, inflating the cardinality of the applicable time series and making that data problematic to query. See Tanzu Observability Data Naming Best Practices for guidance.
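
For illustration, the anti-pattern can look like the following in the Tanzu Observability data format (hypothetical metric, source, and tag names); because the capture_time tag value changes with every report, each point creates a brand-new time series and therefore new IDs:

    request.latency.ms 143 1614877200 source=app-1 capture_time="2021-03-04T17:00:00Z"
    request.latency.ms 151 1614877260 source=app-1 capture_time="2021-03-04T17:01:00Z"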

 

Blocked Data

Data may be blocked at the Proxy or at the backend for a variety of reasons. When this happens, the data is dropped and will not be ingested. If data is blocked at the Proxy, the Proxy log will include a message indicating the reason.


Incorrect Timestamps

By default, the Proxy allows data points that are timestamped between 8760 hours (1 year) ago and 24 hours (1 day) ahead of the current time. This allows scenarios where it is necessary to back-fill old data or pre-fill future data. These settings can be changed, although that is not common. Make sure that the timestamps of your data points fall within this range, since anything outside it will be rejected at the Proxy.

The backend has similar settings as well, and any data points outside its accepted time range will not be ingested.
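
As a concrete illustration in the Tanzu Observability data format (hypothetical metric and source names, default limits assumed), suppose the current time is 2021-03-04 17:00:00 UTC (epoch 1614877200):

    cpu.load.1m 0.42 1583168400 source=app-1    (timestamped 2020-03-02, more than 8760 hours ago: rejected)
    cpu.load.1m 0.42 1614880800 source=app-1    (timestamped one hour in the future: accepted)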

 

Invalid Data Format

The Proxy supports a variety of data formats. Typically, different ports are set up to support different formats. Ensure that data is being sent to the proper port.

For data that is in the Tanzu Observability data format, see the data format documentation for more information on what is and is not valid. To summarize, each component of the data point has a set of allowed characters and length limits. There is also a default limit of 20 point tags per data point.
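
For reference, a well-formed line in the Tanzu Observability data format follows this general shape (the second line is an example with hypothetical names; the timestamp is optional, and point tags are optional key="value" pairs, limited to 20 per point by default):

    <metricName> <metricValue> [<timestamp>] source=<sourceName> [<pointTagKey>="<value>" ...]
    system.cpu.loadavg.1m 0.28 1614877200 source=app-1 env="prod" dc="us-west-2"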

 

Proxy Preprocessor Rules

The Proxy supports custom preprocessor rules that allow or block certain data. Ensure that your data meets all the rules set up at the Proxy. You may need to reach out to the team that manages the Proxy and/or those rules.
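
As a rough sketch of what such a rule can look like in the Proxy's preprocessor rules file (the port key, rule name, and regex are hypothetical, and exact action names vary by Proxy version, e.g., block in newer versions versus blacklistRegex in older ones), the following would silently drop any point line whose metric name starts with "dev.":

    # preprocessor_rules.yaml, illustrative only
    '2878':
      - rule   : drop-dev-metrics
        action : block
        scope  : pointLine
        match  : '^dev\..*'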

 

 
