Why Semantic Layers Matter (and how to build one with DuckDB)

motherduck.com

147 points by secondrow 2 days ago


btbuildem - a day ago

Impressive! An entire article about semantic layers, artfully avoids ever defining what a semantic layer is.

Let me take a swipe at it: a semantic layer helps express queries and their results in terms the end-consumers will care about / prefer to reason in, instead of whatever extremely correct and efficient atrocities the database nerds came up with.

Did I get that right?
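To make that definition concrete, here is a minimal sketch in Python. It uses the standard-library sqlite3 module rather than DuckDB so it runs anywhere. It assumes a hypothetical `orders` table. The `MEASURES`/`DIMENSIONS` dictionaries and the `compile_query` helper are illustrative names, not the article's or any product's API. Business-facing names are defined once and compiled into SQL on demand.

```python
import sqlite3

# Hypothetical semantic model: business-facing names mapped to SQL expressions.
# The table, measure, and dimension names are illustrative assumptions.
MEASURES = {
    "revenue": "SUM(amount)",
    "order_count": "COUNT(*)",
}
DIMENSIONS = {
    "region": "region",
    "order_month": "strftime('%Y-%m', ordered_at)",
}
BASE_TABLE = "orders"


def compile_query(measures, dimensions):
    """Translate requested business terms into a single SQL statement."""
    unknown = [m for m in measures if m not in MEASURES]
    unknown += [d for d in dimensions if d not in DIMENSIONS]
    if unknown:
        raise KeyError(f"unknown fields: {unknown}")
    select = [f"{DIMENSIONS[d]} AS {d}" for d in dimensions]
    select += [f"{MEASURES[m]} AS {m}" for m in measures]
    sql = f"SELECT {', '.join(select)} FROM {BASE_TABLE}"
    if dimensions:
        # Group and sort by the underlying expressions, not the aliases.
        group = ", ".join(DIMENSIONS[d] for d in dimensions)
        sql += f" GROUP BY {group} ORDER BY {group}"
    return sql


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL, ordered_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("EU", 10.0, "2024-01-05"), ("EU", 5.0, "2024-02-01"), ("US", 7.5, "2024-01-20")],
)

sql = compile_query(["revenue", "order_count"], ["region"])
rows = conn.execute(sql).fetchall()
print(sql)
print(rows)  # [('EU', 15.0, 2), ('US', 7.5, 1)]
```

Adding a dimension such as `order_month` makes it usable with every measure without copying SQL around.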

sschnei8 - a day ago

I love a semantic layer as much as the next guy...

Pivoting a decent sized BI shop toward using one instead of splashing the same SQL all over the place is *tough*. It's one of those: "the analyst could have been building important report for director and you want them to create re-usable logic??? we'll do that later, get report done now. Just copy/paste that SQL over here"

This is how you end up with the 1000-model, "the numbers don't match up" hot-mess situations that gain momentum and are hard to slow down.

cool_dude85 - 5 hours ago

>defined once in a single source of truth

As one of the consumers of a "semantic layer" for many years now, I am firmly convinced that a "single source of truth" must either be useless or a lie.

Ok, the DBA has produced some joins that I can count up to decide how many "customers" we have. We immediately have the issue that a "customer count" from the semantic layer cannot always be the meaningful or relevant figure. In my experience, outside of the explicit context it was written in, it cannot be the correct figure. So I have my single-source-of-truth customer count, but my revenue per customer needs to use a different count that's slightly off. Another analyst needs to produce figures on customer calls to our call center, and that uses a slightly different definition. And so on, until the semantic layer is just a special database for predefined executive KPI dashboards and no more.
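The problem described here can be reproduced with a small, hypothetical dataset. In this sketch (Python with sqlite3), the tables and metric names are invented for illustration. Three reasonable definitions of "customer" over the same data give three different counts, and so three different revenue-per-customer figures.

```python
import sqlite3

# Hypothetical data: four sign-ups, one churned, two of them have ordered.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, status TEXT);
CREATE TABLE orders (customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'active'), (2, 'active'), (3, 'churned'), (4, 'active');
INSERT INTO orders VALUES (1, 20.0), (1, 30.0), (3, 10.0);
""")

# Three plausible meanings of "customer count" (illustrative names).
CUSTOMER_COUNTS = {
    # everyone who has ever signed up
    "customers_registered": "SELECT COUNT(*) FROM customers",
    # accounts currently marked active
    "customers_active": "SELECT COUNT(*) FROM customers WHERE status = 'active'",
    # accounts with at least one order
    "customers_purchasing": "SELECT COUNT(DISTINCT customer_id) FROM orders",
}

counts = {name: conn.execute(sql).fetchone()[0] for name, sql in CUSTOMER_COUNTS.items()}
revenue = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]

for name, n in counts.items():
    print(f"{name}: {n} customers, revenue per customer = {revenue / n:.2f}")
```

Which of these is "the" customer count depends entirely on the question being asked.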

mritchie712 - a day ago

We built a transformation library[0] (think a simpler, more performant dbt) for duckdb and I'd really like to create a semantic layer as an extension for it at some point.

Limiting support to only duckdb would make some really useful features trivial to implement. e.g. duckdb has a `json_serialize_sql` function that would handle a lot of the tedious parts of building a semantic layer.

0 - https://github.com/definite-app/crabwalk

cryptonector - a day ago

Is a "semantic layer" nothing more than a fancy name for a SQL VIEW in NoSQL?
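One way to frame the difference assumes that a semantic layer stores measures separately from the dimensions they are grouped by. That assumption is not stated in the thread. Under it, a SQL view is a named, reusable query, but it fixes the aggregation grain when it is defined. This sketch uses Python's sqlite3 with a hypothetical `orders` table. `revenue_by` is an illustrative helper, not a real library function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (region TEXT, channel TEXT, amount REAL);
INSERT INTO orders VALUES
  ('EU', 'web', 10.0), ('EU', 'store', 5.0), ('US', 'web', 7.5);

-- A view is a reusable, named definition, but its GROUP BY is fixed:
CREATE VIEW revenue_by_region AS
  SELECT region, SUM(amount) AS revenue FROM orders GROUP BY region;
""")

print(conn.execute("SELECT * FROM revenue_by_region ORDER BY region").fetchall())

# Revenue by channel cannot be derived from the view, because the channel
# column has already been aggregated away. Storing only the measure
# expression lets the GROUP BY be chosen at query time instead.
REVENUE = "SUM(amount)"
ALLOWED_DIMENSIONS = {"region", "channel"}  # whitelist: names are spliced into SQL


def revenue_by(dimension):
    if dimension not in ALLOWED_DIMENSIONS:
        raise ValueError(f"unknown dimension: {dimension}")
    return conn.execute(
        f"SELECT {dimension}, {REVENUE} AS revenue FROM orders "
        f"GROUP BY {dimension} ORDER BY {dimension}"
    ).fetchall()


print(revenue_by("channel"))  # [('store', 5.0), ('web', 17.5)]
```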

aszen - a day ago

I like the idea of a semantic layer but don't think defining it in YAML is the right way to go about it.

A semantic layer needs proper language and tooling support, which Malloy provides.

kovezd - a day ago

Nothing to do with linear, meaningful projections on embedding spaces, and everything to do with efficient maintenance of legacy data reporting systems.

whitten - a day ago

I think Common Logic ( https://en.m.wikipedia.org/wiki/Common_Logic - ISO/IEC 24707:2007) would be a good addition to any effort trying to add a semantic layer to any database.

This is a good write up that doesn’t require DuckDB as it isn’t specific to a particular database.

Demiurge - a day ago

Yeah, I think it's great that there are ARD formats and you can access bytes via a low-level S3-like protocol. This enables interesting tools like DuckDB, which can abstract away some stuff and be fastish and "serverless". However, there is clearly also some kind of marketing hype train and jargon built around it, and it seems like a concerted movement to displace some other "boring" and "uncool" products and technologies. I actually think it's great to displace proprietary services with open formats and protocols. I hope it takes out "data lakes" and co., but I'd love to keep MVC and not invent completely new terms, APIs and ORMs for things that have been working fine for a long time.

mousematrix - a day ago

hey all, another perspective I have been thinking about is whether semantic layers are like an ORM, but for BI dashboards. Actually, I think it's more than BI dashboards, since a similar idea applies to features. Features in ML land are nothing but a Measure + Entity metadata + TTL. So really it's about higher-order semantics, and as we move up the stack, we need richer expressions to describe our world.
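The "Measure + Entity metadata + TTL" decomposition can be sketched as plain data classes. These classes, their fields, and the `to_sql`/`is_fresh` methods are hypothetical illustrations of the idea, not the API of bsl, Xorq, or Ibis.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Measure:
    name: str
    expression: str  # an aggregate such as "SUM(amount)"


@dataclass(frozen=True)
class Entity:
    name: str
    key: str  # column identifying the entity, e.g. "customer_id"


@dataclass(frozen=True)
class Feature:
    """Hypothetical feature: a measure, keyed by an entity, valid for a TTL."""
    measure: Measure
    entity: Entity
    ttl: timedelta

    def to_sql(self, table: str) -> str:
        # One row per entity key, aggregated with the measure's expression.
        return (
            f"SELECT {self.entity.key}, {self.measure.expression} "
            f"AS {self.measure.name} FROM {table} GROUP BY {self.entity.key}"
        )

    def is_fresh(self, computed_at: datetime, now: datetime) -> bool:
        # A computed value may be served only while it is no older than the TTL.
        return now - computed_at <= self.ttl


customer_spend = Feature(
    measure=Measure("total_spend", "SUM(amount)"),
    entity=Entity("customer", "customer_id"),
    ttl=timedelta(hours=24),
)
t0 = datetime(2024, 1, 1)
print(customer_spend.to_sql("orders"))
print(customer_spend.is_fresh(t0, t0 + timedelta(hours=12)))  # True
print(customer_spend.is_fresh(t0, t0 + timedelta(hours=30)))  # False
```

Here the TTL is simply a freshness check at serving time. Real feature stores typically handle it differently and in more detail.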

Feature stores explored here: https://www.xorq.dev/blog/featurestore-to-featurehouse

I think my key takeaway from building this is that we need better expression systems, and Ibis is a great foundation to build yours. Maybe you want to build a language for some other domain, etc.

PS: I am one of the authors of bsl and co-founder of Xorq.

LargoLasskhyfv - a day ago

OT, but I really like the design of their site.