Debugging Open Telemetry

Published by Bill on (Updated: )

I wanted to send metrics from an application platform to an observability solution using Open Telemetry (OTel). I believed everything was set up, but metrics weren’t flowing from the platform to our observability tooling.

Diagram showing the components in my Open Telemetry set-up and a cross indicating where we lost sight of metrics.

This post outlines how I was able to replace our observability tooling with a second instance of the OTel collector and use it to debug the flow of metrics.

My Approach

I wanted to rule out the platform as the source of the problem. If I could show that metrics were leaving the platform as expected then I’d know that the problem lay in the observability toolchain.

To do this, I put a second instance of the OTel collector in place of my observability toolchain. This collector was initially configured to log all metrics to a file for debugging.

Diagram showing the components in my OTEL set-up. The components in green are described in this post.

Logging metrics to a file like this was really useful: it demonstrated that metrics were flowing and gave the observability team a sample of the metric contents.

It was then possible to configure the OTel collector to send metrics to our observability toolchain alongside writing them to the file. Once we had confirmed that metrics were reaching the intended destination, we turned off file writing.

I had assumed that the Open Telemetry collector was only used at the start of a metrics pipeline. However, the flexibility of the OTel collector means it is a great tool to have in your toolbox for debugging metric flows. It can be used as a tap at any point in the chain to confirm that metrics are flowing and that they have the desired format.
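As a rough sketch of the "tap" idea, a collector can sit between two hops, print what it sees and pass the metrics on unchanged. This assumes a recent collector release that ships the debug exporter (older releases called it logging). The listen address and the next-hop endpoint are placeholders I made up, not values from my set-up.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        # Placeholder listen address for the tap
        endpoint: '0.0.0.0:4317'

exporters:
  # Prints each received metric to the collector's own console output
  debug:
    verbosity: detailed
  # Hypothetical next hop in the chain; replace with your real endpoint
  otlp:
    endpoint: 'next-hop.example.internal:4317'
    tls:
      insecure: true

service:
  pipelines:
    metrics:
      receivers:
        - otlp
      exporters:
        - debug
        - otlp
```

Processors are optional in a pipeline, so the tap can be this small. The insecure setting only makes sense if the next hop also accepts plaintext gRPC.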

OTEL Configuration - receive

This is the configuration I used to listen for incoming OTel metrics (without TLS) and log them to a file.

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: '10.20.30.40:4317'

processors:
  batch: null

exporters:
  file:
    path: /tmp/otel.json

service:
  pipelines:
    metrics:
      receivers:
        - otlp
      processors:
        - batch
      exporters:
        - file
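
Once metrics are showing up in the file, the same pipeline can fan out to a second exporter, which is the "file plus observability toolchain" stage described earlier. This is a minimal sketch, assuming the toolchain accepts OTLP over gRPC. The toolchain endpoint below is a made-up placeholder, and TLS is left at the exporter's default.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: '10.20.30.40:4317'

processors:
  batch:

exporters:
  # Keep writing to the file while the new destination is being verified
  file:
    path: /tmp/otel.json
  # Hypothetical toolchain endpoint; replace with the real one
  otlp:
    endpoint: 'observability.example.internal:4317'

service:
  pipelines:
    metrics:
      receivers:
        - otlp
      processors:
        - batch
      exporters:
        - file
        - otlp
```

When the toolchain is confirmed to be receiving metrics, removing file from the pipeline's exporters list turns off file writing.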

OTEL Configuration - send

This is the ‘exporters’ configuration I passed to our application platform. The endpoint here is the IP address and port of the receiving OTel collector configured above.

otlp:
  endpoint: '10.20.30.40:4317'
  tls:
    insecure: true

Generating Metrics

If you need to generate sample metrics to test your pipeline, I recommend telemetrygen. Part of the Open Telemetry Collector project, telemetrygen is a utility that generates traces, metrics, and logs and submits them to an OTel collector.

telemetrygen metrics --otlp-endpoint 10.20.30.40:4317 --otlp-insecure

Useful Documentation