Debugging OpenTelemetry
I wanted to send metrics from an application platform to an observability solution using OpenTelemetry (OTel). I believed everything was set up, but metrics weren’t flowing from the platform to our observability tooling.
This post outlines how I was able to replace our observability tooling with a second instance of the OTel collector and use it to debug the flow of metrics.
My Approach
I wanted to rule out the platform as the source of the problem. If I could show that metrics were leaving the platform as expected then I’d know that the problem lay in the observability toolchain.
To do this, I added a second instance of the OTel collector in place of my observability toolchain. This collector was initially configured to log all metrics to a file for debugging.
Having metrics logged to a file like this was really useful: it demonstrated that metrics were flowing and gave the observability team a sample of the metric contents.
It was then possible to configure the OTel collector to send metrics to our observability toolchain alongside writing them to the file. Once we had confirmed that metrics were reaching the intended destination, file writing could be turned off.
I had assumed that the OpenTelemetry collector was only used at the start of a metrics pipeline. However, the flexibility of the OTel collector makes it a great tool to have in your toolbox for debugging metric flows. It can be used as a tap at any point in the chain to confirm that metrics are flowing and that they have the desired format.
OTel Configuration - receive
This is the configuration I used to listen for incoming OTel metrics (without TLS) and log them to a file.
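receivers:
  otlp:
    protocols:
      grpc:
        # Address and port the collector listens on (no TLS)
        endpoint: '10.20.30.40:4317'
processors:
  batch: null
exporters:
  # Write every received metric to a local file for inspection
  file:
    path: /tmp/otel.json
service:
  pipelines:
    metrics:
      receivers:
        - otlp
      processors:
        - batch
      exporters:
        # Must name an exporter defined in the exporters section above
        - file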
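As a minimal sketch of the later stage described earlier, where metrics are forwarded to the observability toolchain while still being written to the file, the receive configuration could be extended with a second exporter. The downstream endpoint below is a placeholder, and the sketch assumes the toolchain accepts OTLP over gRPC; a real backend will likely need TLS and credentials rather than insecure: true.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: '10.20.30.40:4317'
processors:
  batch: null
exporters:
  # Keep writing to the file while the new destination is verified
  file:
    path: /tmp/otel.json
  # Hypothetical downstream endpoint; replace with your own toolchain's address
  otlp:
    endpoint: 'observability.example.internal:4317'
    tls:
      insecure: true  # assumption for illustration only
service:
  pipelines:
    metrics:
      receivers:
        - otlp
      processors:
        - batch
      exporters:
        - file
        - otlp
```

Once metrics are confirmed at the destination, removing file from the pipeline's exporters list turns off file writing.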
OTel Configuration - send
This is the ‘exporters’ configuration I passed to our application platform. The endpoint here is the IP address and port that your receiving OTel collector listens on.
otlp:
  endpoint: '10.20.30.40:4317'
  tls:
    insecure: true
Generating Metrics
If you need a tool to generate sample metrics for testing your pipeline, I recommend looking at telemetrygen. Part of the OpenTelemetry Collector project, telemetrygen is a utility for generating traces, metrics, and logs and submitting them to the OTel collector.
telemetrygen metrics --otlp-endpoint 10.20.30.40:4317 --otlp-insecure