Why Logs, Metrics and Traces Still Don’t Give You Real Observability

Several years ago, the observability community reached what felt like a consensus around three pillars: logs, metrics and traces. Instrument everything, ship it all to a central platform and you will finally understand what your system is doing.

It’s a tidy framework. Yet it turns out to be incomplete in ways that only become obvious once you’re actually trying to debug a production incident with it.

This article isn’t an argument against logs, metrics and traces; you need all three. However, there’s a growing set of failure modes in modern distributed systems that the three-pillar model struggles to explain — and understanding why is the first step toward building observability that actually works.

The Promise of the Three Pillars

Before we critique the model, let’s be precise about what it promises.

Logs give you a timestamped record of discrete events: A function was called, a request came in, an error was thrown. They’re rich in detail and easy to add to code. The challenge is volume — a high-traffic service can generate millions of log lines per minute, and correlating across services requires discipline and tooling.

Metrics give you aggregated numerical data over time: Request rate, error rate, latency percentiles, CPU usage. They’re cheap to store, easy to alert on and ideal for dashboards. The tradeoff is that aggregation loses information — a p99 latency of two seconds tells you something is slow, but not where or why.

Traces give you a causal record of how a single request moved through your system — which services it touched, how long each hop took, where errors occurred. Distributed tracing, using standards like OpenTelemetry, has matured considerably and can dramatically accelerate root cause analysis.

Together, these three tools cover a lot of ground. For the canonical failure modes — a slow database query, a misconfigured cache, a crashing pod — they work well. The question is what happens when the failure mode is less canonical.

What the Three Pillars Miss

The Known Unknowns Problem

Observability built on logs, metrics and traces is fundamentally a system of known unknowns. You instrument the things you think might go wrong. You define the metrics that seem important. You add trace spans around the code paths you care about.

However, production systems fail in ways you didn’t anticipate. When a new failure mode appears that doesn’t match your existing instrumentation, you’re blind. You’re correlating signals that weren’t designed to explain this kind of problem, often in a rush, in the middle of an incident.

The classic example: An interaction between two microservices that each look perfectly healthy by their own metrics but produce subtly wrong outputs when used together. No individual metric captures this. The logs from each service look normal. The traces show normal latencies. The failure is in the relationship between services, not in any one service’s behavior.

High-Cardinality Blind Spots

Traditional metrics systems have trouble with high-cardinality data. If you want to track latency by user ID, or error rate by a specific combination of API endpoint, region and tenant, every distinct combination of label values becomes its own time series. Most time-series databases either refuse that workload or make it prohibitively expensive.
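To make the cost concrete, here is a rough back-of-the-envelope sketch. The label names and counts are invented for illustration, and the result is a worst-case upper bound (one series per possible combination), not a measurement from any particular database.

```python
from math import prod

# Hypothetical label cardinalities for a single latency metric.
label_cardinalities = {
    "endpoint": 50,
    "region": 10,
    "tenant": 2_000,
}


def worst_case_series(cardinalities: dict[str, int]) -> int:
    """Upper bound on time series: one per distinct label combination."""
    return prod(cardinalities.values())


low = worst_case_series({"endpoint": 50, "region": 10})
high = worst_case_series(label_cardinalities)
# Adding a per-user label multiplies the bound again (assumed 100,000 users).
with_users = worst_case_series({**label_cardinalities, "user_id": 100_000})

print(f"endpoint x region:          {low:,} series")
print(f"endpoint x region x tenant: {high:,} series")
print(f"... plus user_id:           {with_users:,} series")
```

Real systems rarely hit the full cross product, but the multiplication is why a single extra label can turn a cheap metric into an expensive one.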

This is a real operational gap. Many production incidents are caused by a specific cohort of users, a specific region or a specific combination of request parameters that you can’t efficiently query for with low-cardinality metrics. You know something is wrong. You can see the aggregate signal degrading. But you can’t slice to the exact population affected.

Tools such as Honeycomb and Lightstep were built specifically to solve this with high-cardinality event data, treating each request as a structured event with arbitrary fields. This is a fundamentally different mental model from metrics-first observability, and it enables queries that are impractical with traditional metrics tools.

The Context Collapse Problem

Traces give you a picture of how a request flows through your system. However, that picture is only as good as the context propagated through it. In practice, context collapse is endemic:

Asynchronous message queues break trace continuity unless you explicitly propagate trace context in message headers or payloads.

Third-party services you call don’t participate in your tracing infrastructure.

Batch jobs and background workers often lack the proper instrumentation to connect to the traces that triggered them.

Lambda functions and serverless runtimes have historically had spotty tracing support.

The result is that your traces often have gaps right where you need them most. You can see the request arrive and you can see the response go out, but what happened in between — especially anything involving async work — is a black box.
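The queue case is the easiest one to sketch. The example below hand-rolls a header in the W3C traceparent layout (version, trace ID, span ID, flags) and carries it across an in-process queue. The function names and the message shape are hypothetical; in a real system you would more likely rely on the propagators that ship with your tracing library than write this yourself.

```python
import queue
import secrets


def new_traceparent() -> str:
    """Build a traceparent-style value: version-traceid-spanid-flags."""
    trace_id = secrets.token_hex(16)  # 32 hex characters
    span_id = secrets.token_hex(8)    # 16 hex characters
    return f"00-{trace_id}-{span_id}-01"


def child_traceparent(parent: str) -> str:
    """Keep the trace ID, mint a new span ID for the consumer's span."""
    version, trace_id, _parent_span, flags = parent.split("-")
    return f"{version}-{trace_id}-{secrets.token_hex(8)}-{flags}"


def publish(q: queue.Queue, payload: dict, traceparent: str) -> None:
    # Carry the context in message headers so it survives the async hop.
    q.put({"headers": {"traceparent": traceparent}, "body": payload})


def consume(q: queue.Queue) -> dict:
    message = q.get()
    parent = message["headers"].get("traceparent")
    # Without the header, the worker can only start an unrelated trace.
    context = child_traceparent(parent) if parent else new_traceparent()
    return {"traceparent": context, "body": message["body"]}


orders: queue.Queue = queue.Queue()
producer_ctx = new_traceparent()
publish(orders, {"order_id": "12345"}, producer_ctx)
handled = consume(orders)

# Both sides share one trace ID, so their spans can be stitched together.
same_trace = producer_ctx.split("-")[1] == handled["traceparent"].split("-")[1]
print("same trace:", same_trace)
```

The important design choice is that the context travels with the message rather than living in process memory, which is exactly what gets lost when a worker picks the job up later.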

Business Logic Blind Spots

Perhaps the deepest limitation of the three-pillar model is that it’s primarily an infrastructure observability model. It tells you about the technical behavior of your system: Latencies, error rates, resource consumption.

What it doesn’t tell you is whether your system is doing the right thing from a business perspective. A service can have perfect error rates and excellent latency while returning subtly wrong results — prices calculated incorrectly, recommendations served to the wrong users, inventory counts that don’t match reality.

This is the gap that the emerging concept of semantic observability is trying to fill: Instrumenting the business outcomes, not just the technical behavior, so you can detect “this is producing wrong results” rather than just “this is slow or erroring.”
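One way to picture this is a business invariant checked alongside the request and emitted as its own event. The sketch below assumes a hypothetical order shape and a made-up event name; the rule it checks (total equals line items minus discount) is only an illustration of the kind of outcome you might instrument.

```python
import json
from decimal import Decimal


def check_order_total(order: dict) -> dict:
    """Return a structured event saying whether the order total is consistent."""
    line_items = sum(
        (Decimal(item["unit_price"]) * item["quantity"] for item in order["items"]),
        Decimal("0"),
    )
    expected = line_items - Decimal(order["discount"])
    actual = Decimal(order["total"])
    return {
        "event": "order_total_verified",  # hypothetical event name
        "order_id": order["order_id"],
        "expected_total": str(expected),
        "actual_total": str(actual),
        "consistent": expected == actual,
    }


# A response that would look healthy to latency and error-rate dashboards,
# even though the discount was never applied to the total.
order = {
    "order_id": "12345",
    "items": [
        {"unit_price": "19.99", "quantity": 2},
        {"unit_price": "5.00", "quantity": 1},
    ],
    "discount": "4.98",
    "total": "44.98",
}

event = check_order_total(order)
print(json.dumps(event))
```

Alerting on the rate of events where consistent is false gives you a signal that no infrastructure metric would produce.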

What Real Observability Looks Like

Moving beyond the three-pillar model doesn’t mean abandoning it. It means being honest about its limits and layering additional practices on top.

Start With Structured Events

Wherever possible, emit structured events — JSON objects with meaningful fields — rather than unstructured log strings. This is the difference between:

[ERROR] Failed to process order 12345

and:

{
  "level": "error",
  "event": "order_processing_failed",
  "order_id": "12345",
  "user_id": "u-789",
  "region": "us-east-1",
  "payment_method": "stripe",
  "error_code": "STRIPE_TIMEOUT",
  "duration_ms": 4823,
  "trace_id": "abc-def-ghi"
}

The second version is queryable in ways the first is not. You can ask “how many payment timeouts happened in us-east-1 in the last 10 minutes, broken down by payment method?” You cannot meaningfully answer that question from unstructured log strings at scale.
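As a minimal sketch of both halves, the snippet below emits events as one JSON object per log line using only the standard library, then answers a question like the one above over an in-memory list. The emit helper, the sample events and the PAYPAL_TIMEOUT code are assumptions for illustration, and the time-window filter is left out to keep it short; in practice the query would run in your log or event store.

```python
import json
import logging
from collections import Counter

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("orders")


def emit(event: str, **fields) -> dict:
    """Log one JSON object per event and return it for later querying."""
    record = {"level": "error", "event": event, **fields}
    log.error(json.dumps(record))
    return record


# Hypothetical failures; field names follow the structured example above.
events = [
    emit("order_processing_failed", region="us-east-1",
         payment_method="stripe", error_code="STRIPE_TIMEOUT", duration_ms=4823),
    emit("order_processing_failed", region="us-east-1",
         payment_method="stripe", error_code="STRIPE_TIMEOUT", duration_ms=5102),
    emit("order_processing_failed", region="eu-west-1",
         payment_method="stripe", error_code="STRIPE_TIMEOUT", duration_ms=4410),
    emit("order_processing_failed", region="us-east-1",
         payment_method="paypal", error_code="PAYPAL_TIMEOUT", duration_ms=6050),
    emit("order_processing_failed", region="us-east-1",
         payment_method="stripe", error_code="CARD_DECLINED", duration_ms=212),
]

# Timeouts in us-east-1, broken down by payment method.
timeouts = Counter(
    e["payment_method"]
    for e in events
    if e["region"] == "us-east-1" and e["error_code"].endswith("_TIMEOUT")
)
print(dict(timeouts))
```

Because every field is a key rather than a substring, the filter and the grouping are ordinary lookups instead of regular expressions over free text.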

OpenTelemetry is the obvious standard to adopt here — it gives you a vendor-neutral way to emit logs, metrics and traces in a consistent format that works with tools from Grafana, Datadog, Honeycomb, Jaeger and many others.

Embrace High-Cardinality Querying

If your current observability stack can’t answer “show me p95 latency broken down by user plan tier and API endpoint for the last five minutes,” you have a meaningful blind spot. Evaluate whether your tooling supports this.

If it doesn’t, it’s worth looking seriously at event-based observability platforms — or at minimum, adding a tool like ClickHouse as a backend for high-cardinality queries on your structured event data.
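To show what that kind of breakdown buys you, here is a small sketch over invented request events. The plan tiers, endpoint and latencies are all made up, and the percentile uses the simple nearest-rank method; a real backend would compute this for you.

```python
from collections import defaultdict


def p95(values: list[int]) -> int:
    """Nearest-rank 95th percentile."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer math
    return ordered[rank - 1]


# Hypothetical request events; real ones would come from your event store.
events = [
    {"plan_tier": "free", "endpoint": "/search", "duration_ms": ms}
    for ms in range(100, 120)
] + [
    {"plan_tier": "enterprise", "endpoint": "/search", "duration_ms": ms}
    for ms in [*range(100, 118), 2300, 2400]
]

overall = p95([e["duration_ms"] for e in events])

# Group by (plan tier, endpoint), then compute p95 within each group.
by_group: dict[tuple[str, str], list[int]] = defaultdict(list)
for e in events:
    by_group[(e["plan_tier"], e["endpoint"])].append(e["duration_ms"])
breakdown = {group: p95(values) for group, values in by_group.items()}

print("overall p95:", overall, "ms")
for (tier, endpoint), value in sorted(breakdown.items()):
    print(f"{tier:<10} {endpoint:<8} p95={value} ms")
```

With this sample data the overall p95 is 119 ms, which looks healthy, while the enterprise tier on /search sits at 2300 ms. The aggregate hides the affected cohort; the breakdown exposes it.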

Build Continuous Verification Into Your Pipelines

One of the most underused observability practices is continuous verification: Running lightweight synthetic checks in production that validate business correctness, not just technical health. Can a user complete checkout end to end? Is the price returned for product X correct? Is the recommendation engine returning results that match our expected logic?

Reliability-testing platforms such as Steadybit and Gremlin, or even custom health checks, can serve this purpose. The goal is to detect wrong results before your users do.
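A custom check can be very small. The sketch below compares prices returned by a lookup function against known-good values. The SKUs, the expected prices and the fake_price_service stand-in are all hypothetical; in production the lookup would be a real call against the live service, run on a schedule.

```python
from decimal import Decimal
from typing import Callable

# Hypothetical known-good values maintained by the team that owns pricing.
EXPECTED_PRICES = {
    "sku-basic": Decimal("9.99"),
    "sku-pro": Decimal("29.99"),
}


def verify_prices(fetch_price: Callable[[str], Decimal]) -> list[dict]:
    """Call the price lookup for each SKU and compare with the expectation."""
    results = []
    for sku, expected in EXPECTED_PRICES.items():
        actual = fetch_price(sku)
        results.append({
            "check": "price_matches_catalog",
            "sku": sku,
            "expected": str(expected),
            "actual": str(actual),
            "passed": actual == expected,
        })
    return results


def fake_price_service(sku: str) -> Decimal:
    # Stand-in for a real HTTP call; it returns a wrong price for one SKU.
    return {"sku-basic": Decimal("9.99"), "sku-pro": Decimal("2.99")}[sku]


failures = [r for r in verify_prices(fake_price_service) if not r["passed"]]
print(failures)
```

The service in this example answers quickly and without errors, so only a correctness check like this one would flag the bad price.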

Instrument for the Incident You Haven’t Had Yet

After every production incident, do a deliberate review: What information would have made this faster to diagnose? Then add that instrumentation. This is a mundane practice with large compounding returns. Teams that do it consistently become visibly better at debugging over 12–18 months.

The Tooling Landscape

A few tools worth knowing about beyond the standard trio:

Tool	What It’s Good For
Honeycomb	High-cardinality event exploration, fast arbitrary slicing
Grafana Tempo	Scalable distributed tracing, integrates well with Loki + Prometheus
OpenTelemetry Collector	Vendor-neutral telemetry pipeline
Jaeger	Open-source distributed tracing
ClickHouse	Fast analytical queries on high-volume structured events
Robusta	Kubernetes-native alert enrichment and runbooks

None of these replace logs, metrics and traces. All of them extend what’s possible with them.

Conclusion

The three-pillar model is a good starting point, not an ending point. It’s a framework developed at a time when microservice architectures were simpler, cardinality was less of a concern and business logic observability wasn’t part of the conversation.

Modern distributed systems are messier, more dynamic and more business-critical. The observability practices that serve them well need to be messier too — high-cardinality, event-first, contextually rich and explicitly coupled to business outcomes.

If your team can answer the question “Did the system do the right thing?” and not just “Did the system stay up?”, you’re getting close to real observability. If you can only answer the second question, there’s work left to do.
