Log messages are mostly for the people operating your software

utcc.utoronto.ca

115 points by todsacerdoti 5 days ago


LgWoodenBadger - 10 hours ago

All software should provide something meaningful that lets anybody diagnose a problem, if they’re inclined to. It’s particularly bad in the (Apple) mobile ecosystem, including Apple TV.

I have AdGuard Home but one of my spouse’s streaming services wouldn’t work. “There was a problem.” Gee thanks. Eventually figured out that I had to unblock a few hosts so it would work. Only found which ones by googling and finding some other poor soul who fixed it and documented it.

foresto - 4 hours ago

Rather than indulging the inevitable argument that most users never read log messages, I hope we can remember a more important fact:

Some users do read log messages, just as some users file useful bug reports. Even when they are a tiny minority, I find their discoveries valuable. They give me a view into problems that my software faces out there in the wilds of real-world use. My log messages enable those discoveries, and have led to improvements not only in my own code, but also in other people's projects that affect mine.

This is part of why I include a logging system and (hopefully) understandable messages in my software. Even standalone tools and GUI applications.

(And since I am among the minority who read log messages, I appreciate good ones in software made by other people. Especially when they allow me to solve a problem immediately, on my own, rather than waiting days or weeks to get the developer's attention.)

hinkley - 6 hours ago

For years now I’ve been pushing to move all non-actionable error messages, and all error messages that are only actionable in aggregate, into telemetry data instead.

Not least because log-processing SaaS companies seem to overcharge for their services, even compared with hosted Grafana services, and many of us could do away with the rent-seeking entirely.

The computational cost of finding meaning in log files, versus telemetry data, makes it likely that this will always be the case. It will only change briefly, in cases where VC money is subsidizing your subscription.

If one error shouldn’t trigger operator action, but 1000 should, that’s a telemetry alert, not a Datadog or Splunk problem.
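To make the "one error is noise, 1000 are an incident" distinction concrete, here is a minimal Python sketch of a sliding-window error counter that only signals an alert once a threshold is crossed. `ErrorRateAlert`, the threshold, and the window length are hypothetical; in a real setup you would more likely increment a counter metric and let the monitoring system's alert rules do the windowing.

```python
import time
from collections import deque
from typing import Callable


class ErrorRateAlert:
    """Hypothetical sliding-window error counter.

    Each error is recorded as a timestamp; an alert is signalled only when
    the number of errors inside the window reaches the threshold.
    """

    def __init__(self, threshold: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold
        self.window_seconds = window_seconds
        self.clock = clock      # injectable so tests can control time
        self._events = deque()  # timestamps of recent errors, oldest first

    def record_error(self) -> bool:
        """Record one error; return True if the window is at or over threshold."""
        now = self.clock()
        self._events.append(now)
        # Evict errors that have fallen out of the window.
        while self._events and now - self._events[0] > self.window_seconds:
            self._events.popleft()
        return len(self._events) >= self.threshold


# A single failure is just a data point; 1000 within a minute is actionable.
alert = ErrorRateAlert(threshold=1000, window_seconds=60.0)
if alert.record_error():
    print("page the operator")
```

The point of the design is that individual errors never page anyone; only the aggregate rate does.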

keithnz - 6 hours ago

From recent experience, I'm thinking logs need to be written for AI. Over the last few months, I've had a couple of issues where I took a bunch of logs from a bunch of interacting programs, pointed the AI at the logs and the source code, and it's been really effective at finding the problems, often seeing patterns that would have been really hard for me to spot in all the noise.

nubinetwork - 6 hours ago

I recently went all-in on the systemd ecosystem as much as I could on some recent hardware installs, and my biggest pet peeve is the double timestamps and double logs I find in journalctl... it's like they never intended you to read the logs...
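One common source of doubled timestamps is an application that prefixes its own timestamp while journald adds another. A minimal Python sketch, assuming the service runs under systemd: systemd sets `JOURNAL_STREAM` to the device and inode numbers of the journal stream, and the systemd documentation recommends comparing those with `fstat()` on stderr before trusting the variable. `stderr_is_journal` and `configure_logging` are hypothetical helper names, and this only addresses the timestamp half of the complaint.

```python
import logging
import os
import sys


def stderr_is_journal() -> bool:
    """Return True if systemd reports that stderr is connected to the journal.

    systemd sets JOURNAL_STREAM to "<device>:<inode>" for the journal stream;
    comparing it against fstat() on stderr avoids trusting a stale, inherited value.
    """
    value = os.environ.get("JOURNAL_STREAM")
    if not value:
        return False
    try:
        device, inode = (int(part) for part in value.split(":", 1))
        st = os.fstat(sys.stderr.fileno())
    except (ValueError, OSError, AttributeError):
        # Malformed variable, or stderr has no real file descriptor.
        return False
    return (st.st_dev, st.st_ino) == (device, inode)


def configure_logging() -> None:
    """Hypothetical setup: skip our own timestamp when journald adds one."""
    fmt = "%(levelname)s %(name)s: %(message)s"
    if not stderr_is_journal():
        fmt = "%(asctime)s " + fmt
    # Note: basicConfig() does nothing if the root logger already has handlers.
    logging.basicConfig(stream=sys.stderr, format=fmt, level=logging.INFO)
```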

thangalin - 9 hours ago

Of possible interest:

* https://dave.autonoma.ca/blog/2022/01/08/logging-code-smell/

* https://dave.autonoma.ca/blog/2026/02/03/lloopy-loops/

Both of these posts discuss using event-based frameworks to eliminate duplicative (cross-cutting) logging statements throughout a code base.

My desktop Markdown editor[1] uses this approach to output log messages to a dialog box, a status bar, and standard error, effectively "for free".

[1]: https://repo.autonoma.ca/repo/keenwrite/tree/HEAD/src/main/j...
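The event-based approach can be sketched as a small publish/subscribe bus: code publishes one event, and each output (a status bar, a dialog, standard error) subscribes independently, so no call site needs its own logging statement. This is a hypothetical Python illustration, not KeenWrite's actual (Java) implementation; `StatusEvent` and `EventBus` are made-up names, and a list stands in for the GUI status bar.

```python
import sys
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class StatusEvent:
    """Hypothetical event carrying a message for whoever is listening."""
    message: str
    severity: str = "info"


class EventBus:
    """Minimal synchronous publish/subscribe bus (illustrative only)."""

    def __init__(self) -> None:
        self._handlers: List[Callable[[StatusEvent], None]] = []

    def subscribe(self, handler: Callable[[StatusEvent], None]) -> None:
        self._handlers.append(handler)

    def publish(self, event: StatusEvent) -> None:
        # Every subscriber sees every event, in subscription order.
        for handler in self._handlers:
            handler(event)


bus = EventBus()
status_bar: List[str] = []  # stands in for a GUI status bar widget
bus.subscribe(lambda e: status_bar.append(e.message))
bus.subscribe(lambda e: print(f"[{e.severity}] {e.message}", file=sys.stderr))

# One call site; both outputs are updated without extra logging statements.
bus.publish(StatusEvent("Could not load theme; using default", "warn"))
```

Adding a third output (say, a dialog box) is one more `subscribe` call rather than edits at every place that reports a problem.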

Etheryte - 10 hours ago

While I see the point the author is trying to make, I'm not really sure I agree. Most users don't even read error messages, never mind logs. At best, logs are something they need for compliance; for most, the concept doesn't exist at all. I do agree that the logs should help you understand what went wrong and why, but in that regard the principle is the same for both sysadmins and developers, and I don't really see the difference?

majkinetor - 9 hours ago

Each group of people is the target of a specific log level: INFO for random folks, DEBUG for programmers, etc.
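Python's standard `logging` module can express this per-audience split by attaching handlers with different thresholds to one logger. A minimal sketch; the logger name, the messages, and the in-memory streams standing in for stderr and a debug log file are all hypothetical.

```python
import io
import logging

# Stand-ins for real destinations (hypothetical): operators read stderr or
# the journal, developers read a verbose debug log file.
operator_stream = io.StringIO()
developer_stream = io.StringIO()

logger = logging.getLogger("myapp")  # "myapp" is a placeholder name
logger.setLevel(logging.DEBUG)       # pass everything; handlers do the filtering
logger.propagate = False             # keep records away from the root logger

formatter = logging.Formatter("%(levelname)s %(message)s")

operator_handler = logging.StreamHandler(operator_stream)
operator_handler.setLevel(logging.INFO)    # operators: INFO and above
developer_handler = logging.StreamHandler(developer_stream)
developer_handler.setLevel(logging.DEBUG)  # developers: everything

for handler in (operator_handler, developer_handler):
    handler.setFormatter(formatter)
    logger.addHandler(handler)

logger.debug("cache miss for key=%s", "user:42")
logger.info("config reloaded from %s", "/etc/myapp.conf")
```

Only the INFO line reaches the operator stream; the developer stream gets both.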

hinkley - 6 hours ago

Well the useful ones are. The rest are screaming into the void, or rather, the operator’s ear.

mfuzzey - 9 hours ago

Depends a lot on the context and type of software.

For server side software where there is a sysadmin in charge of keeping it running I generally agree.

But for end-user software (desktop, mobile, embedded), no one will read the logs, and there the logs can, and probably should, be aimed at the developers. Of course you can and should still provide usable and informative end-user-oriented error messages, but they're not the same thing as logs.

brianjlogan - 3 hours ago

Not really true for modern cloud architectures. If you have an appropriately tuned Observability stack you're probably pretty familiar with the logs.

justsomehnguy - 8 hours ago

> But if your software is successful (especially if it gets distributed to other people), most of the people running it won't be the developers, they'll only be operating it.

The biggest problem is that when you wrote the code for a 'totally obvious message', you yourself were in the context. Years, a year, heck, even weeks later you'll stare at it and wonder 'why tf didn't I write something more verbose?'.

Anecdote: I wrote some supporting scripts to 'integrate' two systems three times, totally oblivious the second and third times that I had already done it. Both times I was about 60% of the way through when I thought, 'Wait, I totally recognize this code, but I'm only just writing it! What in Deja-vu-nation?!'.

ignoramous - 10 hours ago

For a FOSS Android app I co-develop, we go out of our way to make verbose logging efficient to collect & easy to share (one-click copy). I've seen users get good mileage out of asking an LLM what exactly has gone wrong. We are adding more structure to log messages, attaching as much state (like the call stack) as possible to each log line, along with diagnostics from procfs on resources held (like memory, threads, and fds).
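A minimal Python sketch of a structured log line that carries state with it: the caller's stack and, where Linux procfs is available, open file descriptors, thread count, and resident memory. `structured_log`, `procfs_snapshot`, and the `upstream` field are hypothetical; the app described above runs on Android, so this only mirrors the idea, not its code.

```python
import json
import os
import threading
import time
import traceback


def procfs_snapshot() -> dict:
    """Resource usage from Linux procfs; empty (or partial) where unavailable."""
    snapshot = {}
    try:
        # Listing the directory itself uses an fd, so treat this as approximate.
        snapshot["open_fds"] = len(os.listdir("/proc/self/fd"))
        with open("/proc/self/status") as status:
            for line in status:
                if line.startswith(("Threads:", "VmRSS:")):
                    key, value = line.split(":", 1)
                    snapshot[key] = value.strip()
    except OSError:
        pass  # not Linux, or procfs not mounted
    return snapshot


def structured_log(level: str, message: str, **fields) -> str:
    """Build one JSON log line carrying the caller's stack and process state."""
    record = {
        "ts": time.time(),
        "level": level,
        "msg": message,
        "thread": threading.current_thread().name,
        # format_stack() ends with this function's own frame; drop it.
        "stack": traceback.format_stack()[:-1],
        "proc": procfs_snapshot(),
        **fields,
    }
    return json.dumps(record)


line = structured_log("ERROR", "upstream query failed", upstream="203.0.113.7")
```

One JSON object per line keeps the output easy to copy, grep, or paste into an LLM, while the extra state travels with the message that needed it.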

naomi_kynes - 10 hours ago

The interesting edge case with AI agents: the "operator" collapses into whoever owns the agent, and the log's job changes fundamentally.

When a regular app logs an error, it's a passive record — the operator investigates at leisure. When an agent logs "I'm about to delete these 47 files — is that right?", it's an active interrupt. The log becomes a decision request, not an event record. "Waiting for human approval" is a semantically different thing than "ERROR: something failed."

Most agent setups treat this badly — write to stderr, fire a webhook, hope the human checks Slack. There's no canonical "agent pausing for human input" primitive in most stacks. It's logging's open problem for the agentic era.
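A minimal Python sketch of what such a "decision request" primitive might look like: the agent emits a structured `approval_requested` record (rather than an ERROR line), blocks until a human answers through some channel, and treats a timeout as a refusal. `ApprovalRequest`, `request_approval`, and the queue-based answer channel are hypothetical; in practice the queue would be fed by a chat, web, or CLI front end.

```python
import json
import queue
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class ApprovalRequest:
    """Hypothetical decision-request record, distinct from an event record."""
    request_id: str
    action: str
    details: dict = field(default_factory=dict)


def request_approval(req: ApprovalRequest, emit, answers: "queue.Queue[bool]",
                     timeout_s: float) -> bool:
    """Emit a decision request and block until a human answers or time runs out.

    Silence is treated as "no", so a missed notification never counts as consent.
    """
    emit(json.dumps({"kind": "approval_requested", **asdict(req)}))
    try:
        approved = bool(answers.get(timeout=timeout_s))
    except queue.Empty:
        approved = False
    emit(json.dumps({"kind": "approval_resolved",
                     "request_id": req.request_id,
                     "approved": approved}))
    return approved


# Usage: a (hypothetical) UI thread would put the human's answer on the queue.
records: List[str] = []
answers: "queue.Queue[bool]" = queue.Queue()
answers.put(True)
req = ApprovalRequest("req-1", "delete_files", {"count": 47})
ok = request_approval(req, records.append, answers, timeout_s=1.0)
```

Both the request and its resolution are logged, so the record still works as an audit trail after the interrupt has been handled.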

jauntywundrkind - 8 hours ago

This is a not-so-subtle advantage JavaScript has over 90% of everything else: the Chrome DevTools Protocol (CDP), which exists, and is great, in large part thanks to JavaScript being an alive language, of the Stop Writing Dead Programs variety (https://jackrusher.com/strange-loop-2022/, https://news.ycombinator.com/item?id=33270235). It's astoundingly capable and very richly exposes a featureful runtime across many dimensions of tooling: REPL, logging, performance, heap, profiling, storage, tracing and others, just for the core, before you get into the browser-based things. https://chromedevtools.github.io/devtools-protocol/

This is such a core advantage of JavaScript: it is an alive language. The runtime makes it very easy to change and modify systems while they are running, and as an operator, that is so so so much better than having a statically compiled binary, in terms of what is possible.

One of my favorite techniques is using SIGUSR1 to start the Node debugger. The performance impact is not that bad. Pick a random container in prod, and... just debug it. Use logpoints instead of breakpoints, since you don't want to halt the world. It takes some scripting to chain an SSH port forward to a Docker port forward into the container, but an LLM can crank that script out in no time. https://nodejs.org/en/learn/getting-started/debugging#enable...

My cherry on top is to make sure the services my apps consume are attached to globalThis, so I can hit those services directly from the running instance in the REPL, without having to trap them wherever they happen to be used.

woeirua - 9 hours ago

I feel like this is an outdated point of view now. Logs are clearly going to be read primarily by agents very soon, if they're not already now.

For example, we're experimenting with having Claude Desktop read log files for remote users. It's often able to troubleshoot and solve issues for our users faster than we can, especially after you give it access to your codebase through GH MCP or something like that. It's wild.