
Observability at a High Level

Getting software deployed is not the end of the story.

A running system still needs to be understood.

Observability

That is where observability comes in. It is the broad practice of making a system understandable from the outside by exposing useful information about what it is doing.

When a system is live, we want to be able to answer questions like:

  • Is the application running?
  • Is it healthy?
  • Is it responding slowly?
  • Are requests failing?
  • Are errors increasing?
  • What happened before the crash?
  • Which part of the system is having trouble?

At an introductory level, observability is commonly discussed in terms of:

  • logs — records of events and errors
  • metrics — counts, timings, memory usage, request rates
  • traces — request flow across systems and services
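
To make the three signals concrete, here is a minimal sketch of a single request handler that produces all of them. Everything in it is an assumption made for illustration: handleRequest, the metrics object, and the field names are hypothetical, the counters live in memory only, and a shared traceId stands in for real distributed tracing, which would normally come from a dedicated library.

```javascript
const { randomUUID } = require("node:crypto");

// Metrics: hypothetical in-memory counters. A real system would export
// these to a metrics backend instead of keeping them in a plain object.
const metrics = { requestsTotal: 0, errorsTotal: 0, totalDurationMs: 0 };

// Runs one unit of work and records a log line, metrics, and a trace ID for it.
function handleRequest(path, work, traceId = randomUUID()) {
  const start = process.hrtime.bigint();
  metrics.requestsTotal += 1;
  let ok = true;

  try {
    work();
  } catch (err) {
    ok = false;
    metrics.errorsTotal += 1;
    // Log: a record of the error, tagged with the trace ID.
    console.log(JSON.stringify({ level: "error", traceId, path, message: err.message }));
  }

  // Metric: how long the request took, in milliseconds.
  const durationMs = Number(process.hrtime.bigint() - start) / 1e6;
  metrics.totalDurationMs += durationMs;

  // Log: one summary line per request. The traceId is what would let us
  // follow this request if it went on to call other services.
  console.log(JSON.stringify({ level: "info", traceId, path, ok, durationMs }));
  return { traceId, ok, durationMs };
}

handleRequest("/health", () => {});
handleRequest("/orders", () => {
  throw new Error("database unavailable");
});

// Metrics snapshot: two requests in total, one of which failed.
console.log(JSON.stringify(metrics));
```

The log lines answer "what happened?", the counters answer "how often and how slowly?", and the trace ID answers "which request was this?".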

Black Box Deploys

Deploying an application without logs or metrics is like driving a car at night with a shattered windshield and the dashboard taped over. You might be moving, but you won’t know you have a problem until you hit the wall.

For this course, logs are the easiest entry point.

If a Node app is running in a Docker container and writes to standard output, we can inspect that output with tooling such as docker logs rather than guessing blindly.
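
As a rough sketch of what "writes to standard output" can look like, the snippet below emits one JSON object per line. The helper names formatLogLine and log, the field names, and the sample values are all hypothetical. Plain console.log calls would work too; one JSON object per line is just a convenient shape for tools to parse.

```javascript
// Hypothetical helper: builds a single JSON log line.
// `now` is a parameter so the timestamp can be controlled when testing.
function formatLogLine(level, message, fields = {}, now = new Date()) {
  return JSON.stringify({
    time: now.toISOString(),
    level,
    message,
    ...fields,
  });
}

// Writes the line to standard output, which is the stream Docker captures
// from a container's main process.
function log(level, message, fields) {
  process.stdout.write(formatLogLine(level, message, fields) + "\n");
}

// Illustrative events only; a real app would log from its own handlers.
log("info", "server started", { port: 3000 });
log("error", "request failed", { path: "/orders", status: 500 });
```

If this app were running in a container, running docker logs with the container's name or ID would print these lines, and adding the -f flag would keep following new output as it arrives.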

That is a baby step into operational visibility, and it matters.


New Relic: What is Observability?

We have seen the building blocks, but what happens when they are missing? Next, we look at why invisible failure is so punishing for development teams.