Show HN: Okapi – a metrics engine based on open data formats

github.com

9 points by kushal2048 13 hours ago


Hi all, I wanted to share an early preview of Okapi, an in-memory metrics engine that also integrates with existing data lakes. Modern software systems produce a mammoth amount of telemetry. Whether or not all of it is necessary is debatable, but we can all agree that it happens.

Most metrics engines today store data in proprietary formats and don't disaggregate storage from compute. Okapi changes that by leveraging open data formats and integrating with existing data lakes. This makes it possible to use standard OLAP tools like Snowflake, Databricks, DuckDB, or even Jupyter / Polars to run analysis workflows (such as anomaly detection), and it avoids vendor lock-in in two ways: you can bring your own workflows, and the compute engine is swappable. Disaggregation also reduces the ops burden of maintaining your own storage, and compute can be scaled up and down on demand.
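For example, once Okapi has shipped data out as Parquet, the same files could be queried straight from a notebook with DuckDB. A minimal sketch; the export path, metric name, and column layout here are my assumptions, not Okapi's documented schema:

    # Illustrative only: assumes Okapi exported rollups to ./okapi-export/
    # with columns (name, ts, value). Not Okapi's documented layout.
    import duckdb

    con = duckdb.connect()
    rows = con.execute("""
        SELECT name,
               date_trunc('hour', ts) AS hour,
               avg(value)             AS avg_value,
               max(value)             AS max_value
        FROM read_parquet('okapi-export/**/*.parquet')
        WHERE name = 'http_request_latency_ms'
          AND ts >= now() - INTERVAL 7 DAY
        GROUP BY 1, 2
        ORDER BY hour
    """).fetchall()

    # Toy anomaly check: flag hours whose max is far above the weekly average.
    overall_avg = sum(r[2] for r in rows) / max(len(rows), 1)
    anomalies = [r for r in rows if r[3] > 3 * overall_avg]
    print(anomalies)

The same query would run unchanged against Polars, Snowflake, or Databricks pointed at the same Parquet files, which is the point of keeping the storage format open.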

Not all data can reside in a data lake / object store, though; that doesn't work for recent data. To keep real-time queries fast, Okapi first writes all incoming metrics to an in-memory store, and reads on recent data are served from there. Metrics are rolled up as they arrive, which eases memory pressure. They are held in memory for a configurable retention period, after which they get shipped out to object storage / the data lake (currently only Parquet export is supported). This gives fast reads on recent data while offloading query processing for older data. In benchmarks, queries on in-memory data finish in under a millisecond while sustaining write throughput of ~280k samples per second. In a real deployment there'd be network delays, so YMMV.
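To make the flow concrete, here is a toy sketch of that pattern (rollup on ingest, retention-based flush to Parquet). This is not Okapi's code or API, just an illustration of the idea under assumed bucket and retention settings:

    # Toy sketch of the tiered pattern described above, NOT Okapi's actual API:
    # samples are rolled up into fixed buckets in memory, and buckets older
    # than the retention window are flushed to Parquet.
    from collections import defaultdict
    import pyarrow as pa
    import pyarrow.parquet as pq

    BUCKET_SECS = 60          # roll up to 1-minute resolution on ingest
    RETENTION_SECS = 3600     # keep one hour in memory before shipping out

    # (metric name, bucket start) -> [count, sum, min, max]
    buckets = defaultdict(lambda: [0, 0.0, float("inf"), float("-inf")])

    def ingest(name, ts, value):
        key = (name, int(ts) // BUCKET_SECS * BUCKET_SECS)
        b = buckets[key]
        b[0] += 1
        b[1] += value
        b[2] = min(b[2], value)
        b[3] = max(b[3], value)

    def flush_old(now):
        cutoff = now - RETENTION_SECS
        old = {k: v for k, v in buckets.items() if k[1] < cutoff}
        if not old:
            return
        table = pa.table({
            "name":   [k[0] for k in old],
            "bucket": [k[1] for k in old],
            "count":  [v[0] for v in old.values()],
            "sum":    [v[1] for v in old.values()],
            "min":    [v[2] for v in old.values()],
            "max":    [v[3] for v in old.values()],
        })
        pq.write_table(table, f"okapi-export-{int(now)}.parquet")
        for k in old:
            del buckets[k]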

Okapi is still early; feedback, critiques, and contributions are welcome. Cheers!

shaheeraslam - 12 hours ago

This is amazing work, man! Can we set up a demo with you guys?