Launch HN: LlamaFarm (YC W22) – Open-source framework for distributed AI

106 points by mhamann 6 days ago


Hi HN! We're Rob, Matt, and Rachel from LlamaFarm (https://llamafarm.dev). We're building an open-source AI framework based on a simple belief: the future isn't one massive model in the cloud—it's specialized models running everywhere, continuously fine-tuned from real usage.

The problem: we kept falling into the same trap while building AI tools: demos die before production. We built a bunch of AI demos but they were impossible to get to production. It would work perfectly on our laptop, but when we deployed it, something broke, and RAG would degrade. If we ran our own model, it would quickly go out of date. The proof-of-concept that impressed the team couldn't handle real-world data.

Our solution: declarative AI-as-code. One YAML file defines models, policies, data, evals, and deployment. Instead of one brittle giant, we orchestrate a Mixture of Experts: many small, specialized models that you continuously fine-tune from real usage. Combined with RAG for source-grounded answers, systems get cheaper, faster, and auditable.
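
To make "AI-as-code" concrete, here's a minimal sketch of what such a file can look like (the field names are illustrative, not our exact schema; the quickstart linked below has the real format):

    # llamafarm.yaml -- illustrative sketch, not the shipped schema
    name: support-bot
    models:
      - id: local-chat
        provider: ollama          # a small model served locally
        model: llama3.2:3b
      - id: cloud-fallback
        provider: openai          # used only if the local model is unavailable
        model: gpt-4o-mini
    rag:
      sources:
        - ./docs/**/*.pdf         # extracted programmatically, no LLM calls
      embeddings: nomic-embed-text
      store: chroma
    evals:
      - suite: ./evals/smoke.yaml
        min_score: 0.8            # gate releases on eval results
    deploy:
      target: docker-compose

Everything downstream (deploys, evals, routing) is derived from this one file.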

There’s a short demo here: https://www.youtube.com/watch?v=W7MHGyN0MdQ and a more in-depth one at https://www.youtube.com/watch?v=HNnZ4iaOSJ4.

Ultimately, we want to deliver a single, signed bundle—models + retrieval + database + API + tests—that runs anywhere: cloud, edge, or air-gapped. No glue scripts. No surprise egress bills. Your data stays in your runtime.

We believe the AI industry is evolving the way computing did: from mainframes to distributed systems, from monolithic apps to microservices. AI is following the same path: models are getting smaller and better. Mixture of Experts is here to stay. Qwen3 is sick. Llama 3.2 runs on phones. Phi-3 fits on edge devices. Domain models beat GPT-5 on specific tasks.

RAG brings specialized data to your model: you don't need a 1T-parameter model that "knows everything"; you need a smart model that can read your data.

Fine-tuning is democratizing: what cost $100k last year now costs $500. Every company will have custom models.

Data gravity is real: Your data wants to stay where it is: on-prem, in your AWS account, on employee laptops.

Bottom line: LlamaFarm turns AI from experiments into repeatable, secure releases, so teams can ship fast.

What we have working today:

- Full RAG pipeline: 15+ document formats, programmatic extraction (no LLM calls needed), vector-database embedding.

- Universal model layer: the same code runs against 25+ providers, with automatic failover and cost-based routing (rough sketch below).

- Truly portable: identical behavior from laptop → datacenter → cloud.

- Real deployment: Docker Compose works now, with Kubernetes basics and cloud templates on the way.
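
As a rough sketch of the failover and cost-based routing idea (again, hypothetical field names, just to show the shape of the policy):

    routing:
      strategy: cost              # prefer the cheapest healthy provider
      chain:
        - provider: ollama        # local first: free and private
          model: qwen3:8b
        - provider: groq          # spill over when local is saturated
          model: llama-3.1-8b-instant
        - provider: openai        # last resort
          model: gpt-4o-mini
      failover:
        max_retries: 2
        timeout_ms: 20000

The point is that the policy lives in the YAML, not in glue scripts.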

Check out our readme/quickstart for easy install instructions: https://github.com/llama-farm/llamafarm?tab=readme-ov-file#-...

Or just grab a binary for your platform directly from the latest release: https://github.com/llama-farm/llamafarm/releases/latest

The vision is to be able to run, update, and continuously fine-tune dozens of models across environments with built-in RAG and evaluations, all wrapped in a self-healing runtime. We have an MVP of that today (with a lot more to do!).

We’d love to hear your feedback! Think we’re way off? Spot on? Want us to build something for your specific use case? We’re here for all your comments!

jochalek - 6 days ago

Very cool to see a serious local-first effort. Looking back at how far local models have come, I definitely believe their usefulness, combined with RAG or in domain-specific contexts, is soon to be (or already is) on par with general-purpose, GPT-5-like massive-parameter cloud models. The ability to generate quality responses without having to relinquish private data to the cloud used to be a pipe dream. It's exciting to see a team dedicated to making this a reality.

A4ET8a8uTh0_v2 - 6 days ago

I am not sure it is the future, but I am glad there is some movement to hinder centralization in this sector as much as possible (yes, I recognize the future risk, but for now it counts as hindering it).

johnthecto - 6 days ago

So this sounds like an application-layer approach, maybe just shy of a Replit or base44, with the twist that you can own the pipeline. While there's something to that, I think there are some further questions around differentiation that need to be answered. I think the biggest challenge is going to be the beachhead: what client demographic has the cash to want to own the pipeline rather than use SaaS, but doesn't have the staff on hand to do it?

zackangelo - 5 days ago

Just a bit of feedback:

> Instead of one brittle giant, we orchestrate a Mixture of Experts…

"Mixture of experts" is a specific term of art describing an architectural detail of a type of transformer model. It's definitely not using smaller specialized models for individual tasks: experts in an MoE model are routed to on a per-token basis, not on a per-task or per-generation basis.

I know it’s tempting to co-opt this term because it would fit nicely for what you’re trying to do but it just adds confusion.

gus_22 - 5 days ago

Would like to connect. I've got a YC colleague I've asked to set up an intro. Agree with John's point below about the chicken-and-egg scenario. Our community flagged your post. Fundamental problems like "who cares" and "who pays" reinforce why this hasn't already been done (well and at scale), and why this time might be the right time!

Eisenstein - 5 days ago

How do you deal with the space continually evolving? Like, MCP changed in major ways over the course of a few months, new models with significant capability upgrades are released every month, and inference engines like llama.cpp get updated multiple times a day. But organizations want to set up their frameworks and then maintain them. Will this let them do that?

johnthecto - 5 days ago

Heya - one blue-ocean question.

Where are you on Vulkan support? It's hard to find good stacks to use with all this great Intel and non-ROCm AMD hardware. Might be a good angle, too, rather than chasing the usual Nvidia money train.

ivape - 5 days ago

> We built a bunch of AI demos but they were impossible to get to production. It would work perfectly on our laptop, but when we deployed it, something broke, and RAG would degrade.

How did RAG degrade when it went to prod? Do you mean your prod server had throughput issues?

bobbyradford - 6 days ago

I'm a contributor on this project and am very excited to hear your feedback. We really hope that this will become a helpful tool for building AI projects that you own and run yourself!

serjester - 6 days ago

Congrats on the launch. YC 2022? I'm assuming this was a pivot - what led to it, and how do you guys plan on making money long term?

darkro - 6 days ago

I love the ethos of the project! I think your docs link might be broken, however. Looking forward to checking this out!

Unheard3610 - 6 days ago

But wait, why should I use this for my first home-grown orchestration instead of something else? Like, if I want to set up a local LLM running on my old laptop for some kind of RAG over all my hard drives, why is this best? Or if I want agentic monitoring of alarms instead of paying for SimpliSafe or Ring or whatever?

smogs - 6 days ago

Looks great! Congrats on the launch. How is this different from LlamaIndex?

outfinity - 6 days ago

Hope you are right and we decentralize AI properly...

singlepaynews - 6 days ago

Very cool. I jumped in here thinking it was going to be something else, though: a packaged service for distributing on-prem model inference across multiple GPUs.

I'm basically imagining a vast.ai-type deployment of an on-prem GPT: assuming most infra is consumer GPUs on consumer devices, you'd run the "company cluster" as the combined compute of the company's machines.

bityard - 6 days ago

Open source but backed by venture capital, so what is your monetization strategy?

swyx - 5 days ago

Yeah guys, look, I wish you well and respect you for launching and all, but this is just never going to be a venture-scale startup, and you should calibrate your expectations. You could be wasting the best years of your life being a wrapper of a wrapper of a wrapper, competing on developer experience in open source for no money, or you could be building agents.

build agents. please.

olokobayusuf - 6 days ago

This is super interesting! I'm the founder of Muna (https://docs.muna.ai) with much of the same underlying philosophy, but a different approach:

We're building a general purpose compiler for Python. Once compiled, developers can deploy across Android, iOS, Linux, macOS, Web (wasm), and Windows in as little as two lines of code.

Congrats on the launch!