Show HN: You don't need to adopt new tools for LLM observability

github.com

102 points by tomerf2 2 years ago


If you've built any web-based app in the last 15 years, you've probably used something like Datadog, New Relic, Sentry, etc. to monitor and trace your app, right?

Why should it be different when the app you're building happens to be using LLMs?

So today we're open-sourcing OpenLLMetry-JS. It's an open protocol and SDK, based on OpenTelemetry, that provides traces and metrics for JS/TS LLM applications and can be connected to any of the 15+ tools that already support OpenTelemetry. Here's the repo: https://github.com/traceloop/openllmetry-js
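To give a sense of the surface area, wiring it up is only a few lines. This is a rough sketch, not copied from the docs; the package name is from the repo, but treat the option names as placeholders and check the getting-started docs linked below for the exact API:

    // Sketch of initializing the SDK in a Node.js app (option names illustrative).
    import * as traceloop from "@traceloop/node-server-sdk";

    traceloop.initialize({
      appName: "my-llm-app", // assumed option: how spans from this service get labeled
    });

    // From here, supported LLM client calls (OpenAI etc.) are auto-instrumented and
    // emitted as regular OpenTelemetry spans, so any OTel-compatible backend can ingest them.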

A few months ago we launched the Python flavor here (https://news.ycombinator.com/item?id=37843907), and we've now built a compatible one for Node.js.

Would love to hear your thoughts and opinions!

Check it out -

Docs: https://www.traceloop.com/docs/openllmetry/getting-started-t...

GitHub: https://github.com/traceloop/openllmetry-js https://github.com/traceloop/openllmetry

marcklingen - 2 years ago

Fully agree - even as a founder of an ‘LLM observability company’. Observability does not need to be reinvented to get detailed traces/metrics/logs of the LLM part of an application.

LLM Observability usually means: prompts and completions, which model was used, errors and exceptions (rate limits, network errors), as well as metrics (latency, output speed, time to first token when streaming, USD/token and cost breakdowns). All of this is well suited to be captured in the existing observability stack. OpenLLMetry makes this really easy and interoperable - chapeau.
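To make that concrete, here is roughly what this looks like in plain OpenTelemetry terms. The attribute names are invented for illustration (OpenLLMetry defines its own semantic conventions), and `callModel` is a stand-in for whatever LLM client you use:

    // Illustrative only: capturing an LLM call as an ordinary OpenTelemetry span.
    import { trace, SpanStatusCode } from "@opentelemetry/api";

    declare function callModel(prompt: string): Promise<{ text: string }>; // stand-in for your LLM client

    const tracer = trace.getTracer("llm-app");

    async function tracedCompletion(prompt: string) {
      return tracer.startActiveSpan("llm.completion", async (span) => {
        const start = Date.now();
        try {
          const completion = await callModel(prompt);
          span.setAttribute("llm.prompt", prompt);                  // prompts and completions...
          span.setAttribute("llm.completion", completion.text);
          span.setAttribute("llm.latency_ms", Date.now() - start);  // ...plus latency metrics
          return completion;
        } catch (err) {
          span.recordException(err as Error);                       // rate limits, network errors
          span.setStatus({ code: SpanStatusCode.ERROR });
          throw err;
        } finally {
          span.end();
        }
      });
    }

The point is that none of this requires a new backend: the spans land wherever your existing OTel exporter already points.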

In my view, observability is not the core value that solutions like Baserun, Athina, LangSmith, Parea, Arize, Langfuse (my project) and many others solve for. Developing a useful LLM application requires iterative workflows and tinkering. That's what these solutions help with and augment.

There are problems specific to building an LLM application, such as managing/versioning of prompts, running evaluations, blending multiple different evaluation sources, collecting datasets to test/benchmark an application, helping with fine-tuning models on high-quality production completions, debugging root causes of quality/latency/cost issues, ...

Most solutions start by replicating either logs (LLM I/O) or traces, as these are a necessary starting point for building solutions to the other workflow problems. As the observability piece gets more standardized over time, I can see how integrating with the standard makes a ton of sense. Always happy to chat about this.

epistasis - 2 years ago

I wanted to see what the actual metrics for a completion would look like, to figure out whether this is something of interest to me. So I tried to run the example here:

https://www.traceloop.com/openllmetry

Problem 1 (very minor): it's missing an `import os`

Problem 2: I need an API key.

Problem 3: The link that it tells me to go to for an API key is malformed: https://https//app.traceloop.com/settings/api-keys

Is there a way to see what the output is like without getting an account, and presumably also connecting to an observability platform like Grafana? I already made a venv and installed the package, so I'm not sure if I'm ready for even more steps just to see if this is something that might be useful to me.

Aqueous - 2 years ago

I thought observability in this context meant the ability to introspectively make sense of why the LLM output what it did, which is a difficult problem because the model parameters are effectively an unintelligible morass of numbers. Does this help with that, and if so, how?

a_wild_dandan - 2 years ago

What problem(s) does this solve? I have a ticket in my backlog. Your SDK unlocks the solution. What is that ticket's title? (I'm a bit thick, and need concrete examples for things to click.)

lmeyerov - 2 years ago

Re: Python, if we're already doing OTel, how would this interop? E.g., if we don't want to break our current imports and want to control where the new instrumentation goes (see the sketch below).

(Fwiw, this is a great direction!)
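To illustrate what I mean (in JS terms since that's the topic here, and assuming the instrumentations follow the standard OTel pattern; the package/export names below are guesses, not taken from the docs), we'd want to hand the instrumentation our existing provider rather than have the SDK create its own:

    // Hypothetical sketch: plugging the LLM instrumentation into a provider we already own.
    import { NodeTracerProvider } from "@opentelemetry/sdk-trace-node";
    import { registerInstrumentations } from "@opentelemetry/instrumentation";
    import { OpenAIInstrumentation } from "@traceloop/instrumentation-openai"; // assumed package/export name

    const provider = new NodeTracerProvider(); // our existing provider, exporters already configured
    provider.register();

    registerInstrumentations({
      tracerProvider: provider,                // keep control of where the new spans go
      instrumentations: [new OpenAIInstrumentation()],
    });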

tomgs - 2 years ago

Cool! Two questions:

1. Where do you see this observability-for-LLMs thing going? What's the end game? Is it like traditional observability, where all formats will eventually converge to one (which is what OpenTelemetry is trying to be)? I feel it might be a little bit early to tell, tho

2. I noticed you do auto-detection of the framework used, like LlamaIndex et al. Beyond annotations, is there a deeper connection to the LLM framework used? This is auto-instrumentation, so I assume you do most of the heavy lifting, but should users of this framework expect some cool hidden eggs when they look at their telemetry?