AI founders will learn the bitter lesson
lukaspetersson.com | 311 points by gsky 19 hours ago
There's only one core problem in AI worth solving for most startups building AI-powered software: context.
No matter how good the AI gets, it can't answer questions about what it doesn't know. It can't perform a process for which it doesn't know the steps or the rules.
No LLM is going to know enough about some new drug in a pharma's pipeline, for example, because it doesn't know about the internal resources spread across multiple systems in an enterprise. (And if you've ever done a systems integration in any sufficiently large enterprise, you know that this is a "people problem" and usually not a technical problem).
I think the startups that succeed will understand that it all comes down to classic ETL: identify the source data, understand how to navigate systems integration, pre-process and organize the knowledge, train or fine-tune a model or have the right retrieval model to provide the context.
There's fundamentally no other way. AI is not magic; it can't know about trial ID 1354.006 except for what it was trained on and what it can search for. Even coding assistants like Cursor are really solving an ETL/context problem, and always will be. The code generation is the smaller part; getting it right requires providing the appropriate context.
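To make the classic-ETL point concrete, the shape of the pipeline is roughly this (a minimal sketch in Python; the source names, fields, and helpers are made up, and a real pipeline has to deal with auth, incremental sync, and far messier data):

    import json, sqlite3

    def extract(sources):
        # Pull raw records out of each internal system (stubbed here).
        for name, fetch in sources.items():
            for record in fetch():
                yield name, record

    def transform(name, record):
        # Normalize everything into a flat "document" a retrieval layer can use.
        return {"source": name, "id": record["id"], "text": record["body"].strip()}

    def load(docs, path="context.db"):
        db = sqlite3.connect(path)
        db.execute("CREATE TABLE IF NOT EXISTS docs (source TEXT, id TEXT, text TEXT)")
        db.executemany("INSERT INTO docs VALUES (:source, :id, :text)", docs)
        db.commit()
        return db

    if __name__ == "__main__":
        sources = {  # stand-ins for a wiki, a trial registry, a ticket system...
            "wiki": lambda: [{"id": "w1", "body": " Dosage notes for trial 1354.006 "}],
            "tickets": lambda: [{"id": "t7", "body": " Integration notes for the LIMS export "}],
        }
        docs = [transform(name, rec) for name, rec in extract(sources)]
        load(docs)
        print(json.dumps(docs, indent=2))

Everything downstream (RAG, fine-tuning, tool use) is only as good as this step, and most of the effort is the organizational work of getting access to those sources in the first place.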
This is why I strongly suspect that AI will not play out the way the Web did (upstarts unseat giants) and will instead play out like smartphones (giants entrench and balloon).
If all that matters is what you can put into context, then AI really isn't a product in most cases. The people selling models are actually just selling compute, so that space will be owned by the big clouds. The people selling applications are actually just packaging data, so that space will be owned by the people who already have big data in their segment: the big players in each industry. All competitors at this point know how important data is, and they're not going to sell it to a startup when they could package it up themselves. And most companies will prefer to just use features provided by the B2B companies they already trust, not trust a brand new company with all the same data.
I fully expect that almost all of the AI wins will take the form of features embedded in existing products that already have the data (like GitHub with Copilot), not brand new startups who have to try to convince companies to give them all their data for the first time.
Yup. And it’s already playing out that way. Anthropic, OpenAI, Gemini - technically none of them upstarts. All have hyperscalers backing and subsidizing their model training (AWS, Azure, GCP, respectively). It’s difficult to discern where the segmentation between compute and models is here.
> It’s difficult to discern where the segmentation between compute and models is here.
Startups can outcompete the Foundational Model companies by concentrating on creating a very domain specific model, and providing support and services that come out of having expertise in that specific domain.
This is why OpenAI chose to co-invest in Cybersecurity startups with Menlo Ventures in 2022 instead of building their own dedicated cybersecurity vertical, because a partnership driven growth model nets the most profit with the least resources expended when trying to expand your TAM into a new and very competitive market like Cybersecurity.
This is the same reason why hyperscalers like Microsoft, Amazon, and Google themselves have ownership stakes in the foundational model companies like Anthropic, OpenAI, etc., because at hyperscalers' size and revenue, Foundational Models are just a feature (an important feature, but a feature nonetheless).
Foundational Models are a good first start, but are not 100% perfect in a number of fields and use cases. IME, tooling built with these models is often used to cut down on headcount by 30-50% for the team using it to solve a specific problem. And this is why domain specific startups still thrive - sales, support, services, etc. will still need to be tailored for buyers.
All of what you wrote is mostly true, except that "not 100% perfect in a number of fields and use cases" is quite an understatement. You mention the cybersecurity vertical. As a datapoint, I have put the simplest code security analysis question to ChatGPT (4o mini, for those who might say wait until the next one comes out). I made a novel vulnerable function, so that it would never have been seen before. I chose a very simple and easy vulnerability. Scores of security researchers in my vicinity spotted the vulnerability trivially and instantly. ChatGPT was more than useless, failing miserably to perform any meaningful analysis. The above is anecdotal data. Could be that a different tool would perform better. However, even if such models were harnessed by a startup to solve a specific problem, there is absolutely no way for present capabilities to yield a 30-50% HC reduction in this subdomain.
I agree. Foundational models suck at the high value security work that is needed.
That said, the easiest proof-of-value for foundation models in security today is automating the SOC function by auto-generating playbooks, stitching context from various signal sources, and being able to auto-summarize an attack.
This reduces the need for hiring a junior SOC Analyst, and is a workflow that has already been adopted (or is in the process of being adopted) by plenty of F500s.
At the end of the day, foundational models cannot reason yet, and that kind of capability is still far away.
Interestingly, this is the exact opposite of the point the article makes — which is that over time, more general models and more compute are more capable, and by building a domain-specific model you just build a ceiling past which you can’t reach.
This is not the same as having unique access to domain-specific data, which becomes more valuable as you run it through more powerful domain-agnostic models. It sounds like this latter point is the one you say has value for startups to tackle
> This is not the same as having unique access to domain-specific data, which becomes more valuable as you run it through more powerful domain-agnostic models. It sounds like this latter point is the one you say has value for startups to tackle
Exactly!
> by concentrating on creating a very domain specific model
I don’t disagree with this from an economics perspective (it’s expensive running an FM to handle domain specific queries). But the most accurate domain knowledge always tends to involve internal data. And then it becomes the issue raised above: a people problem involving internal knowledge and data management.
Incumbent hyperscalers and vendors like MS, Amazon, etc (and even third party data managers like snowflake) tend to have more leverage when it becomes this type of data problem.
>Startups can outcompete the Foundational Model companies by concentrating on creating a very domain specific model, and providing support and services that come out of having expertise in that specific domain.
Well put, because the business is focused and to the point from the beginning.
For those applications where this gets you in the door to the domain, or gets you in sooner, this can be a competitive advantage. I think Lukas is pointing out the longer-term limitations of the approach though. I thought this would extend from 1980s electronics myself.
You could edit this however:
>Startups can [prosper] by concentrating on creating a very domain specific model, and providing support and services that come out of having expertise in that specific domain.
And it may hold true anyway, and you may have a lifetime of work ahead of you whether or not the more-generalized capabilities catch up. You don't always have to actually be competitive with capitalized corporations in the market if you are adding real value to begin with, and the sky can still be the limit.
>the most accurate domain knowledge always tends to involve internal data.
>Incumbent hyperscalers . . . tend to have more leverage when it becomes this type of data
That can help as a benchmark to gauge when a person or small team actually can occasionally outperform a billion-dollar corporation in some way or another.
I'm no Mr. Burns, but to this I have slowly said to myself "ex-cel-lent" similarly for decades.
It's good to watch AI approaches come and go and even better to be adaptable over time.
> AI will not play out the way the Web did (upstarts unseat giants)
Yes, I agree. I recently spoke to a doctor who wanted to do a startup, one part of which is an AI agent that can provide consumers second opinions for medical questions. For this to be safe, it will require access to not only patient data, but possibly front-line information from content origins like UpToDate, because that content is a necessity to provide grounded answers for information that's not in the training set and not publicly available via search.
The obvious winner is UpToDate who owns that data and the pipeline for originating more content. If you want to build the best AI agent for medical analysis, you need to work with UpToDate.
> ...not brand new startups who have to try to convince companies to give them all their data for the first time.
Yes. I think of Microsoft and SharePoint, for example. Enterprises that are using SharePoint for document and content storage have already organized a subset of their information in a way that benefits Microsoft as concerns AI agents that are contextually aware of your internal data.
> will instead play out like smartphones (giants entrench and balloon).
Someone correct me if I'm wrong, but didn't smartphones go the "upstarts unseat giants" way? Apple wasn't a phone-maker, and became huge in the phone-market after their launch. Google also wasn't a phone-maker, yet took over the market slowly but surely with their Android purchase.
I barely see any Motorola, Blackberry, Nokia or Sony Ericsson phones anymore, yet those were the giants at one time. Now it's all iOS/Android, two "upstarts" initially.
> Now it's all iOS/Android, two "upstarts" initially.
They weren't upstarts, they were giants who moved into a new (but tightly related) space and pushed out other companies that were in spaces that at first seemed closely related but actually were more different than first appeared.
Android and iOS won because smartphones were actually mobile computers with a cellular chip, not phones with fancy software. Seen that way Apple was obviously not an upstart, they were a giant that grew even further.
Google is perhaps somewhat more surprising since they didn't do hardware at all before, but they did have Chrome, giving them a major in on the web platform side, and were also able to leverage their enormous search revenue. Neither resource is available to an upstart/startup.
Neither Apple nor Google were giants in 2007.
Both were giants in 2007. Google was at peak post-IPO fame and Apple was full steam ahead with Jobs at the helm. They were huge players, far from startups or upstarts.
Seconded. For context, Google was about to launch Chrome and Apple was about to launch the Windows version of Safari. Both companies were clearly on the offense at the time.
(and the parent poster was competing with both :) )
"Not as giant as they are today" is not the same thing as "not giants". Let's compare revenue and net income for Google, Apple, and the other manufacturers that OP mentioned:
* Google: $16.5B revenue, $4.2B net income [0]
* Apple: $24B revenue, $3.5B net income [1]
* Motorola: $36B revenue, -$49M net income [2]
* BlackBerry (RIM): $2B revenue, $382M net income [3]
* Nokia: €51B revenue, €7.9B net income [4]
* Sony: $48B revenue, $799M net income (converted from yen at 158 yen/$) [5]
These numbers are absolutely in the same ballpark as the supposedly-larger players which they beat out, and in terms of profit substantially higher than all but Nokia.
[0] https://abc.xyz/assets/investor/static/pdf/2007_google_annua...
[1] https://d18rn0p25nwr6d.cloudfront.net/CIK-0000320193/4913a18...
[2] https://www.motorolasolutions.com/content/dam/msi/docs/en-xw...
[3] https://www.annualreports.com/HostedData/AnnualReportArchive...
[4] https://www.nokia.com/system/files/files/request-nokia-in-20...
[5] https://www.sony.com/en/SonyInfo/IR/library/ar/SonyAR07-E.pd...
> The people selling models are actually just selling compute
Yes, fully agreed. Anything AI is discovering in your dataset could have been found by humans, and it could have been done by a more efficient program. But that would require humans to carefully study it and write the program. AI lets you skip the novel analysis of the data and writing custom programs by using a generalizable program that solves those steps for you by expending far more compute.
I see it as, AI could remove the most basic obstacle preventing us from applying compute to vast swathes of problems- and that’s the need to write a unique program for the problem at hand.
I think you're downplaying how well Cursor is doing "code generation" relative to other products.
Cursor can do at least the following "actions":
* code generation
* file creation / deletion
* run terminal commands
* answer questions about a code base
I totally agree with you on ETL (it's a huge part of our product https://www.definite.app/), but the actions an agent takes are just as tricky to get right.
Before I give Cursor a task, I often doubt it's going to be able to pull it off, and I'm constantly impressed by how deep it can go to complete a complex task.
This really puzzles me. I tried Cursor and was completely underwhelmed. The answers it gave (about a 1.5M loc messy Spring codebase) were surface-level and unhelpful to anyone but a Java novice. I get vastly better work out of my intern.
To add insult to injury, the IntelliJ plugin threw spurious errors. I ended up uninstalling it and marking my calendar to try again in 6 months.
Yet some people say Cursor is great. Is it something about my project? I can't imagine how it deals with a codebase that is many millions of tokens. Or is it something about me? I'm asking hard questions because I don't need to ask the easy ones.
What are people who think Cursor is great doing differently?
My tinfoil hat theory is that Cursor deploys a lot of “guerilla marketing” with influencers on Twitter/LinkedIn etc. When I tried it, the product was not good (maybe on par with Copilot) but you have people on social media swearing by it. Maybe it just works well for specific types of web development, but I came away thoroughly unimpressed and suspicious that some of the “word of mouth” stuff on them is actually funded by them.
> Maybe it just works well for specific types of web development, but I came away thoroughly unimpressed and suspicious that some of the “word of mouth” stuff on them is actually funded by them.
You can pretty easily disprove this theory by noting the number of extremely well-known, veteran software developers who claim that Cursor (or other kinds of LLM-based coding assistants) are working for them.
Very doubtful any one of them can be bought off, but certainly not all of them.
I tell everyone cursor is awesome because for my use cases cursor is awesome.
I've never thrown 1.5M LoC at it, but for smaller code bases (10k LoC) it is amazing at giving summaries and finding where in the code something is being done.
This is a great question and easy to answer with the context you provided.
I don't think your poor experience is because of you, it's because of your codebase. Cursor works worse (in my experience) on larger codebases and seems particularly good at JS (e.g. React, node, etc.).
Cursor excels at things like small NextJS apps. It will easily work across multiple files and complete tasks that would take me ~30 minutes in 30 seconds.
Trying again in 6 months is a good move. As models get larger context windows and Cursor improves (e.g. better RAG) you should have a better experience.
Cursor's problem isn't bigger context, it's better context.
I've been using it recently with @nanostores/react and @nanostores/router.
It constantly wants to use router methods from react-router and not nanostores so I am constantly correcting it.
This is despite using the rules for AI config (https://docs.cursor.com/context/rules-for-ai). It continually makes the same mistakes and requires the same correction likely because of the dominance of react-router in the model's training. That tells me that the prompt it's using isn't smart enough to know "use @nanostores/router because I didn't find react-router".
I think for Cursor to really nail it, the base prompt needs to have more context that it derived from the codebase. It should know that because I'm referencing @nanostores/router, to include an instruction to always use @nanostores/router.
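Concretely, the rule amounts to something like this (wording is illustrative; it's just free-form text in the rules-for-AI settings or a .cursorrules-style file, and per the above it still gets ignored too often):

    This project uses @nanostores/react and @nanostores/router.
    Never import or suggest react-router or react-router-dom; they are not dependencies here.
    All routing goes through the @nanostores/router store defined in the project's router module.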
I don't think you can fix this with a little bit of prompting with current models, since they seem clearly heavily biased towards react-router (a particular version even, iirc from a year ago).
It might help if you included API reference/docs for @nanostores/router in the context, but I'm not sure even that would fix it completely.
It's for novices and YouTube AI hucksters. It's the coding equivalent of vibrating belts for weight loss.
So isn’t Cursor just a tool for Claude or ChatGPT to use? Another example would be a flight booking engine. So why can’t an AI just talk directly to an IDE? This is hard, as the process has changed due to the human needing to be in the middle.
So isn’t AI useless without the tools to manipulate?
I’m very “bullish” on AI in general but find cursor incredibly underwhelming because there is little value add compared to basically any other AI coding tool that goes beyond autocomplete. Cursor emphatically does not understand large codebases and smaller (few file codebases) can just be pasted into a chat context in the worst case.
Is it really that different to Claude with tools via MCP, or my own terminal-based gptme? (https://github.com/ErikBjare/gptme)
I thought it's basically a subset of Aider[0] bolted into a VS Code fork, and I remain confused as to why we're talking about it so much now, when we didn't about Aider before. Some kind of startup-friendly bias? I for one would prefer OSS to succeed in this space.
--
[0] - https://aider.chat/
I tried aider and had problems having it update code in existing files. Aider uses a search and replace pattern to update existing code. So you often end up with
    <<<<<<< SEARCH
    }
    =======
    }, {'more': 'data'}
    >>>>>>> REPLACE
Of course aider will try to apply this kind of patch even when the search pattern matches several occurrences in the target file. Looking at the GitHub issues, this is a problem that was brought up several times and was never fixed because apparently it's not even problematic. I moved to Cursor, which doesn't have this problem, and never looked back.
For what it's worth, gptme will refuse non-unique matches (and ask the LLM to try again). I thought Aider did too (easy win after all), but apparently not.
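The guard itself is tiny; a sketch of the idea (not gptme's or Aider's actual code), assuming the patch is applied by literal string matching:

    def apply_search_replace(text: str, search: str, replace: str) -> str:
        # Refuse ambiguous patches instead of silently editing the wrong spot.
        count = text.count(search)
        if count == 0:
            raise ValueError("search block not found; ask the LLM to retry with more context")
        if count > 1:
            raise ValueError(f"search block matches {count} places; ask the LLM for a unique anchor")
        return text.replace(search, replace, 1)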
For me this happened at the end of functions in vanilla JS; I used to work around it by putting "// end of foo()" comments after the closing brace. However, Aider has multiple modes for LLM editing, including diff, udiff, and whole file; you can switch between those when needed.
Somewhat cynically, maybe because there's VC in Cursor.
https://techcrunch.com/2024/12/19/in-just-4-months-ai-coding...
Thanks for spreading the word. I hadn’t heard of Aider before and I’m now going to give it a try today.
Aider is the single best tool I’ve tried. And I’d never heard of it until like 2 weeks ago when someone mentioned it here. I love aider.
The irony is, it's sort of a household name on HN for over a year now, being way ahead of what was available commercially on the market - and yet, it seems most people here haven't heard of it.
(The author used to post a lot of insightful comments here about LLMs and other generative models, too.)
> it's sort of a household name on HN for over a year now, being way ahead of what was available commercially on the market - and yet, it seems most people here haven't heard of it.
Can you clarify what you mean? If "most people here haven't heard of it" then it's probably not a household name.
The same is true of my own gptme, which has been pretty much at parity with Aider along the way.
Paul (Aider author) is a lot better at writing useful stuff than me though! (like the amazing blog posts)
Thanks for mentioning this, because I somehow managed to miss gptme all that time! I'll check it out now.
This is why I was asking. My own gptme is also just slightly different from Aider and has been around roughly as long.
I agree with you at this time, but there are a couple things I think will change this:
1. Agentic search can allow the model to identify what context is needed and retrieve the needed information (internally or externally through APIs or search)
2. I received an offer from OpenAI to give me free credits if I shared my API data with them; in other words, they are paying for industry-specific data, probably to fine-tune niche models.
There could be some exceptions for UI/UX going down specific verticals, but the value of these fine-tuned, sector-specific instances will erode over time; this will likely remain a niche, since enterprise wants maximum configuration and more out-of-the-box solutions are oriented around SMEs.
It comes down to moats. Does OpenAI have a moat? It's leading the pack, but the competitors always seem to be catching up to it. We don't see network effects with it yet like with social networks, unless OpenAI introduces household robots for everyone or something, builds a leading marketshare in that segment, and the rich data from these household bots is enough training data that one can't replicate with a smaller robot fleet.
And AI is too fundamental a technology for a "loss leader, biggest wallet wins" strategy, like the one used by Uber, to work.
API access can be restricted. Big part of why Twitter got authwalled was so that AI models can't train from it. Stack overflow added a no AI models clause to their free data dump releases (supposed to be CC licensed), they want to be paid if you use their data for AI models.
I wasn't referring to OAI, but rather:
1. Existing legacy players with massive data lock-ins like ERP providers and Google/Microsoft.
2. Massive consolidation within AI platforms rather than massive fragmentation if these legacy players do get disrupted or opportunities that do pop up.
In other words - the usual suspects will continue to win because they have the data and lock-in. Any marginal value in having a specialized model, agent workflow, or special training data, etc. will not be significant enough to switch to a niche app.
It is indeed unfortunate and niches will definitely exist. What I am referring to is primarily in enterprise.
I don't think OpenAI have a moat in the traditional sense. Other players offer the exact same API so OpenAI can only win with permanent technical leadership. They may indeed be able to attain that but this is no Coca-Cola.
> Agentic search
All you've proposed is moving the context problem somewhere else. You still need to build the search index. It's still a problem of building and providing context.
I disagree; these search indexes already exist, they just need to be navigated, much like how Cursor uses agentic search to navigate your codebase or you call Perplexity to get documentation. If the knowledge exists outside of your mind, it can be searched agentically.
what do you think about these guys: https://exa.ai/
Crawling web data is ETL. I think the case stands: the winners in the AI/LLM SaaS startup space are the ones that really do ETL well. Whether that's ETL across an enterprise data set or a codebase.
The AI and LLM are just the "bake" button. If you want anything good, you still have to prep and put good ingredients in.
To your first point, the LLM still can’t know what it doesn’t know.
Just like you can’t google for a movie if you don’t know the genre, any scenes, or any actors in it, an AI can’t build its own context if it didn’t have good enough context already.
IMO that’s the point most agent frameworks miss. Piling on more LLM calls doesn’t fix the fundamental limitations.
TL;DR an LLM can’t magically make good context for itself.
I think you’re spot on with your second point. The big differentiators for big AI models will be data that’s not easy to google for and/or proprietary data.
Lucky they got all their data before people started caring.
> Just like you can’t google for a movie if you don’t know the genre, any scenes, or any actors in it,
ChatGPT was able to answer "What was the video game with cards where you play against a bear guy, a magic guy and a set of robots?" (it's Inscryption). This is one area where LLMs work.
“Playing cards against a bear guy” is a pretty iconic part of that game… that you, as a human, had the wherewithal to put into that context. Agents don’t have that wherewithal. They’d never come up with “playing cards against a bear guy” if you asked it “what game am I thinking of”.
Let’s do another experiment. Do the same for the game I’m thinking of right now.
There were characters in it and one of them had a blue shirt, but that’s all I can remember.
LLMs are really good at 20 questions, so if you give it a chance to ask some follow-up (which they will do if given such a prompt) it will probably figure it out pretty quick.
Sure, so if I, as a human, play 20 questions and add all that context to an LLM, it can perform.
That’s true. It’s why these things aren’t useless.
I’m saying that LLMs aren’t able to make context for themselves. It’s why these agentic startups are doomed to morph into a more sensible product like search, document QA, or automated browser testing.
You described all of those things to some extent, as much as they apply to video games. No magic here.
It’s not even just the lack of access to the data, so much hidden information to make decisions is not documented at all. It’s intuition, learned from doing something in a specific context for a long time and only a fraction of that context is accessible.
This is where Microsoft has the advantage, all those Teams calls can provide context.
Yes, this is definitely a big problem.
Anyone that's done any amount of systems integration in enterprises knows this.
"Let me talk to Lars; he should know because his team owns that system."
"We don't have any documentation on this, but Mette should know about it because she led the project."
Exactly. Sure, as soon as more humans are replaced by agents who leave the full trace in the logs this fades away but this will take a long time. It will take many tiny steps in this direction.
Context is important, but it takes about two weeks to build a context collection bot and integrate it into Slack. The hard part is not technical (AIs can rapidly build a company-specific and continually updated knowledge base); it's political. Getting a drug company to let you tap Slack and email and docs etc. is dauntingly difficult.
Difficult to impossible. Their vendors are already working on AI features, so why would they risk adding a new vendor when a vendor they've already approved will have substantially the same capabilities soon?
because a vendor just using AI tools will not achieve the same capabilities as a vendor that either is OpenAI or is backed by OpenAI will achieve soon
I don't believe that to be true—OpenAI is plateauing on model capabilities and turning to scaling inference times instead. There's no moat to "just throw more tokens at the problem", and Meta and Anthropic are both hot on their heels on raw model capabilities. I see absolutely no evidence that OpenAI has a major breakthrough up their sleeve that will allow them to retake the lead.
In the end, models are fundamentally a commodity. Data is all that matters, and in the not too distant future you won't gain anything at all by sending your data to OpenAI versus just using the tooling provided by your existing vendors.
they’re plateauing on pretraining returns, quite possibly (if rumors are to be trusted)… but they are just getting more sophisticated at real world complex RL - which is still similar to throwing more tokens at the problem and is creating large returns.
I feel that the current artifact is already quite close to something that can operate in a competent manner if the downstream RL matches the task of interest well enough.
This problem will be eaten by OpenAI et al. the same way the careful prompting strategies used in 2022/2023 were eaten. In a few years we will have context lengths of 10M+ or online fine tuning, combined with agents that can proactively call APIs and navigate your desktop environment.
Providing all context will be little more than copying and pasting everything, or just letting the agent do its thing.
Super careful or complicated setups to filter and manage context probably won't be needed.
Even if your context is a trillion tokens in length, the problem of creating that context still exists. It's still ETL and systems integration.
The model can take actions on the computer - give it access to the company wiki and slack and it can create its own context.
Y'all really are just assuming this technology will stay still and not extrapolating from trends. A model that can get 25% on FrontierMath is probably soon going to be able to navigate your company Slack; that is not a more difficult problem than expert-level math proof development.
Context requires quadratic VRAM. It is why OpenAI hasn't even supported 200k context length yet for its 4o model.
Is there a trick that bypasses this scaling constraint while strictly preserving the attention quality? I suspect that most such tricks lead to performance loss while deep in the context.
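Rough numbers for the naive case, i.e. materializing the full attention matrix in fp16 for one head in one layer:

    n = 200_000                   # context length in tokens
    bytes_per_score = 2           # fp16
    one_matrix = n * n * bytes_per_score
    print(one_matrix / 1e9)       # ~80.0 GB, for a single head in a single layer

So the blow-up is real if done naively. Exact-attention kernels like FlashAttention never materialize that matrix (no quality loss), but compute still grows quadratically with context length; the approximate routes (sparse attention, linear attention, state-space layers) are where the "performance loss deep in the context" question comes in.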
I wouldn't bet against this. Whether it's Ring attention, Mamba layers or online fine tuning, I assume this technical challenge will get conquered sooner rather than later. Gemini are getting good results on needle in a haystack with 1M context length.
I suspect the sustainable value will be in providing context that isn't easily accessible as a copy and paste from your hard drive. Whatever that looks like.
Even subpar attention quality is typically better than human memory - we can imagine models that do some sort of triaging from shorter high-quality attention context and extremely long linear (or something else) context.
> Context requires quadratic VRAM
Even if this is not solved, there is so much economic benefit, tens of TBs of VRAM will become feasible.
> No matter how good the AI gets, it can't answer about what it doesn't know. It can't perform a process for which it doesn't know the steps or the rules
This is exactly the motivation behind https://github.com/OpenAdaptAI/OpenAdapt: so that users can demonstrate their desktop workflows to AI models step by step (without worrying about their data being used by a corporation).
I agree, but I do see one realistic solution to the problem you describe. Every product on the market is independently integrating an LLM right now that has access to their product's silo of information. I can imagine a future where a corporate employee interacts with one central LLM that in turn understands the domains of expertise of all the other system-specific LLMs. Given that knowledge, the central one can orchestrate prompting and processing responses from the others.
We've been using this pattern forever with traditional APIs, but the huge hurdle is that the information in any system you integrate with is often both complex and messy. LLMs handle the hard work of dealing with ambiguity and variations.
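A toy sketch of that central-LLM-routing-to-system-specific-LLMs shape (everything here is a stand-in; the hard part is the messy integration behind each "expert", not the routing):

    # The callables stand in for per-product LLM endpoints (CRM, ERP, wiki, ...).
    EXPERTS = {
        "crm":  lambda q: f"[CRM assistant] answer to: {q}",
        "erp":  lambda q: f"[ERP assistant] answer to: {q}",
        "wiki": lambda q: f"[wiki assistant] answer to: {q}",
    }

    def route(question: str) -> str:
        # In practice this would itself be an LLM call, with each expert's
        # description in context; a keyword stub keeps the sketch runnable.
        q = question.lower()
        if "invoice" in q or "purchase order" in q:
            return "erp"
        if "customer" in q or "account" in q:
            return "crm"
        return "wiki"

    def central_assistant(question: str) -> str:
        return EXPERTS[route(question)](question)

    print(central_assistant("Which customer accounts renew this quarter?"))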
There is a second, related problem: continuous learning. AI models won’t go anywhere as long as their state resets on each new session, and they revert to being like the new intern on their first day.
Startups can still win against big players by building better products faster (with AI), collecting more / better data to feed AI, and then feeding that into better AI automation for customers. Big players won't automatically win, but more data is a moat that gives them room to mess up for a long time and still pull out ahead. Even then, big companies already compete against one another and swallowing a small AI startup can help them and therefore starting one can also make sense.
There are not really any startups in the position to feed AI the great data they have.
I somewhat agree. The agent will be able to find the information autonomously. But some data will be proprietary and out of reach for the agent.
Startups should really try to get such a moat. Chapter 2 will cover this.
I found that fine-tuning and RAG can be replaced with tool calling for some specialized domains, e.g. real-time data. Even things like the user's location can be tool-called, so context can be obtained reliably. I also note that GPT-4o and better are smart enough to chain together different functions you give it, but not reliably. System prompting helps some, but the non-determinism of AI today is both awesome and a curse.
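As an example of what that looks like with function/tool calling via the OpenAI Python SDK (sketch only: the tool name, schema, and get_price stub are made up, and real code still has to send the tool result back in a follow-up message and handle errors):

    import json
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def get_price(ticker: str) -> float:
        return 123.45  # stand-in for a real market-data lookup

    tools = [{
        "type": "function",
        "function": {
            "name": "get_price",
            "description": "Latest trading price for a ticker symbol",
            "parameters": {
                "type": "object",
                "properties": {"ticker": {"type": "string"}},
                "required": ["ticker"],
            },
        },
    }]

    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "What's AAPL trading at right now?"}],
        tools=tools,
    )

    for call in resp.choices[0].message.tool_calls or []:
        args = json.loads(call.function.arguments)
        print(call.function.name, "->", get_price(**args))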
Tool calling is just systems integration with a different name. The job of the tool is still to provide context from some other system.
All of these comments are premised on this technology staying still. A model with memory and the ability to navigate the computer (we are already basically halfway there) would easily eliminate the problems you describe.
HN, I find, also has a tendency to fall prey to the bitter lesson.
AI code copilots like Cursor provide a more immersive context than most other AI products.
And how does that differ from any person without that information?
It doesn't.
And that's why the teams that really want to unlock AI will understand that the core problem is really systems integration and ETL; the AI needs to be aware of the entire corpus of relevant information through some mechanism (tool use, search, RAG, graph RAG, etc.) and the startups that win are the ones that are going to do that well.
You can't solve this problem with more compute or better models.
I've said it elsewhere in this discussion, but the LLM is just a magical oven that's still reliant on good ingredients being prepped and put into the oven before hitting the "bake" button if you want amazing dishes to pop out. If you just want Stouffer's Mac & Cheese, it's already good enough for that.
Wouldn't this just be foundational model + RAG in the limit?
RAG is the action of retrieval to augment generation. Retrieval of what? From where?
The process that feeds RAG is all about how you extract, transform, and load source data into the RAG database. Good RAG is the output of good ETL.
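A deliberately tiny sketch of that retrieval step over whatever the ETL stage loaded (toy lexical scoring stands in for embeddings; real systems add chunking, rerankers, permissions, freshness, etc.):

    def score(query: str, doc_text: str) -> int:
        # Toy lexical overlap; a stand-in for embedding similarity.
        q, d = set(query.lower().split()), set(doc_text.lower().split())
        return len(q & d)

    def retrieve(query: str, docs: list, k: int = 3) -> list:
        return sorted(docs, key=lambda d: score(query, d["text"]), reverse=True)[:k]

    def build_prompt(query: str, docs: list) -> str:
        context = "\n".join(f"[{d['source']}:{d['id']}] {d['text']}" for d in docs)
        return f"Answer using only the context below.\n\n{context}\n\nQuestion: {query}"

    docs = [
        {"source": "wiki", "id": "w1", "text": "Trial 1354.006 dosing schedule ..."},
        {"source": "tickets", "id": "t7", "text": "LIMS export breaks on unicode ..."},
    ]
    hits = retrieve("dosing schedule trial 1354.006", docs)
    print(build_prompt("What is the dosing schedule for trial 1354.006?", hits))

The generation side barely appears here; the quality of what comes back is decided by what got extracted, transformed, and loaded upstream.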
Yeah seems like context is the AI version of cache invalidation, in the sense of the joke that "there's only 2 hard problems in computer science, cache invalidation and naming things". It all boils down to that (that, and naming things)
Also, there's only one hard problem in software engineering: people.
Seems to apply to AI as well.
> There's only one core problem in AI worth solving for most startups building AI powered software: context.
Is this another way of saying "content is king"?
I think this argument only makes sense if you believe that AGI and/or unbounded AI agents are "right around the corner". For sure, we will progress in that direction, but when and if we truly get there–who knows?
If you believe, as I do, that these things are a lot further off than some people assume, I think there's plenty of time to build a successful business solving domain-specific workflows in the meantime, and eventually adapting the product as more general technology becomes available.
Let's say 25 years ago you had the idea to build a product that can now be solved more generally with LLMs–let's say a really effective spam filter. Even knowing what you know now, would it have been right at the time to say, "Nah, don't build that business, it will eventually be solved with some new technology?"
I don't think it's that binary. We've had a lot of progress over the last 25 years; much of it in the last two. AGI is not a well defined thing that people easily agree on. So, determining whether we have it or not is actually not that simple.
Mostly people either get bogged down into deep philosophical debates or simply start listing things that AI can and cannot do (and why they believe why that is the case). Some of those things are codified in benchmarks. And of course the list of stuff that AIs can't do is getting stuff removed from it on a regular basis at an accelerating rate. That acceleration is the problem. People don't deal well with adapting to exponentially changing trends.
At some arbitrary point when that list has a certain length, we may or may not have AGI. It really depends on your point of view. But of course, most people score poorly on the same benchmarks we use for testing AIs. There are some specific groups of things where they still do better. But also a lot of AI researchers working on those things.
What acceleration?
Consider OpenAI's products as an example. GPT-3 (2020) was a massive step up in reasoning ability from GPT-2 (2019). GPT-3.5 (2022) was another massive step up. GPT-4 (2023) was a big step up, but not quite as big. GPT-4o (2024) was marginally better at reasoning, but mostly an improvement with respect to non-core functionality like images and audio. o1 (2024) is apparently somewhat better at reasoning at the cost of being much slower. But when I tried it on some puzzle-type problems I thought would be on the hard side for GPT-4o, it gave me (confidently) wrong answers every time. 'Orion' was supposed to be released as GPT-5, but was reportedly cancelled for not being good enough. o3 (2025?) did really well on one benchmark at the cost of $10k in compute, or even better at the cost of >$1m – not terribly impressive. We'll see how much better it is than o1 in practical scenarios.
To me that looks like progress is decelerating. Admittedly, OpenAI's releases have gotten more frequent and that has made the differences between each release seem less impressive. But things are decelerating even on a time basis. Where is GPT-5?
>Let's say 25 years ago you had the idea to build a product
I resemble that remark ;)
>that can now be solved more generally with LLMs
Nope, sorry, not yet.
>"Nah, don't build that business, it will eventually be solved with some new technology?"
Actually I did listen to people like that to an extent, and started my business with the express intent of continuing to develop new technologies which would be adjacent to AI when it matured. Just better than I could at my employer where it was already in progress. It took a couple years before I was financially stable enough to consider layering in a neural network, but that was 30 years ago now :\
Wasn't possible to benefit with Windows 95 type of hardware, oh well, didn't expect a miracle anyway.
Heck, it's now been a full 45 years since I first dabbled in a bit of the ML with more kilobytes of desktop memory than most people had ever seen. I figured all that memory should be used for something, like memorizing, why not? Seemed logical. Didn't take long to figure out how much megabytes would help, but they didn't exist yet. And it became apparent that you could only go so far without a specialized computer chip of some kind to replace or augment a microprocessor CPU. What kind, I really had no idea :)
I didn't say they resembled 25-year-old ideas that much anyway ;)
>We've had a lot of progress over the last 25 years; much of it in the last two.
I guess it's understandable this has been making my popcorn more enjoyable than ever ;)
Agreed. There's a difference between developing new AI, and developing applications of existing AI. The OP seems to blur this distinction a bit.
The original "Bitter Lesson" article referenced in the OP is about developing new AI. In that domain, its point makes sense. But for the reasons you describe, it hardly applies at all to applications of AI. I suppose it might apply to some, but they're exceptions.
You think it will be 25 years before we have a drop in replacement for most office jobs?
I think it will be less than 5 years.
You seem to be assuming that the rapid progress in AI will suddenly stop.
I think if you look at the history of compute, that is ridiculous. Making the models bigger or work more is making them smarter.
Even if there is no progress in scaling memristors or any exotic new paradigm, high speed memory organized to localize data in frequently used neural circuits and photonic interconnects surely have multiple orders of magnitude of scaling gains in the next several years.
> You seem to be assuming that the rapid progress in AI will suddenly stop.
And you seem to assume that it will just continue for 5 years. We've already seen the plateau start. OpenAI has tacitly acknowledged that they don't know how to make a next generation model, and have been working on stepwise iteration for almost 2 years now.
Why should we project the rapid growth of 2021–2023 5 years into the future? It seems far more reasonable to project the growth of 2023–2025, which has been fast but not earth-shattering, and then also factor in the second derivative we've seen in that time and assume that it will actually continue to slow from here.
At this point, the lack of progress since April 2023 is really what is shocking.
I just looked on midjourney reddit to make sure I wasn't missing some new great model.
Instead what I notice is the small variations on the themes I have already seen a thousand times a year ago now. Midjourney is so limited in what it can actually produce.
I am really worried that all this is much closer to a parlor trick than AGI. "simple trick or demonstration that is used especially to entertain or amuse guests"
It all feels more and more like that to me than any kind of progress towards general intelligence.
> OpenAI has tacitly acknowledged that they don't know how to make a next generation model
Can you provide a source for this? I'm not super plugged into the space.
There's this [0]. But also o1/o3 is that acknowledgment. They're hitting the limits of scaling up models, so they've started scaling compute [1]. That is showing some promise, but it's nowhere near the rate of growth they were hitting while next gen models were buildable.
[0] https://www.wsj.com/tech/ai/openai-gpt5-orion-delays-639e769...
[1] https://techcrunch.com/2024/11/20/ai-scaling-laws-are-showin...
I think you're suffering from some survivorship bias here. There are lot of technologies that don't work out.
Computation isn't one of them so far. Do you believe this is the end of computing efficiency improvements?
Also office jobs will be adapted to be a better fit to what AI can do, just as manufacturing jobs were adapted so that at least some tasks could be completed by robots.
Not my downvote, just the opposite but I think you can do a lot in an office already if you start early enough . . .
At one time I would have said you should be able to have an efficient office operation using regular typewriters, copiers, filing cabinets, fax machines, etc.
And then you get Office 97, zip through everything and never worry about office work again.
I was pretty extreme having a paperless office when my only product is paperwork, but I got there. And I started my office with typewriters, nice ones too.
Before long Google gets going. Wow. No-ads information superhighway, if this holds it can only get better. And that's without broadband.
But that's besides the point.
Now it might make sense for you to at least be able to run an efficient office on the equivalent of Office 97 to begin with. Then throw in the AI or let it take over and see what you get in terms of output, and in comparison. Microsoft is probably already doing this in an advanced way. I think a factor that can vary over orders of magnitude is how does the machine leverage the abilities and/or tasks of the nominal human "attendant"?
One type of situation would be where a less-capable AI could augment a defined worker more effectively than even a fully automated alternative utilizing 10x more capable AI. There's always some attendant somewhere so you don't get a zero in this equation no matter how close you come.
Could be financial effectiveness or something else, the dividing line could be a moving target for a while.
You could even go full paleo and train the AI on the typewriters and stuff just to see what happens ;)
But would you really be able to get the most out of it without the momentum of many decades of continuous improvement before capturing it at the peak of its abilities?
> You seem to be assuming that the rapid progress in AI will suddenly stop.
> I think if you look at the history of compute, that is ridiculous. Making the models bigger or work more is making them smarter.
It's better to talk about actual numbers to characterise progress and measure scaling:
" By scaling I usually mean the specific empirical curve from the 2020 OAI paper. To stay on this curve requires large increases in training data of equivalent quality to what was used to derive the scaling relationships. "[^2]
"I predicted last summer: 70% chance we fall off the LLM scaling curve because of data limits, in the next step beyond GPT4.
[…]
I would say the most plausible reason is because in order to get, say, another 10x in training data, people have started to resort either to synthetic data, so training data that's actually made up by models, or to lower quality data."[^0]
“There were extraordinary returns over the last three or four years as the Scaling Laws were getting going,” Dr. Hassabis said. “But we are no longer getting the same progress.”[^1]
---
[^0]: https://x.com/hsu_steve/status/1868027803868045529
o1 proved that synthetic data and inference time is a new ramp. There will be more challenges and more innovations. There is a lot of room in hardware, software, model training and model architecture left.
> There is a lot of room in hardware, software, model training and model architecture left.
Quantify this please? And make a firm prediction with approximate numbers/costs attached?
It's not realistic to make firm quantified predictions any more specific than what I have given.
We will likely see between 3 and 10000 times improvement in efficiency or IQ or speed of LLM reasoning in the next 5 years.
> It's not realistic to make firm quantified predictions any more specific than what I have given.
Then do you actually know what you're talking about or are you handwaving? I'm not trying to be offensive but business plans can't be made based on a lack of predictions.
> We will likely see between 3 and 10000 times improvement in efficiency or IQ or speed of LLM reasoning in the next 5 years
That variance is too large to take you seriously, unfortunately. That's unfortunate because I was really hoping you had an actionable insight for this discussion. :(
If I, for instance, tell my wife I can improve our income by 3x or 1000x but I don't really know, there's no planning that can be done and I'll probably have to sleep on the couch until I figure out what the hell I'm doing.
> business plans can't be made based on a lack of predictions.
They can. It's called "taking a risk". Which is what startups are about, right?
It's hard to give a specific prediction here (I'm leaning towards 10x-1000x in the next 5 years), but there's also no good reason to believe progress will stop, because a) there's many low and mid-hanging fruits to pick, as outlined by GP, and b) because it never did so far, so why would it stop now specifically?
Why did we stop going to the moon and flying commercial supersonic?
Some things that are technologically possible are not economically viable. AI is a marvel but I'm not convinced it will actually plug into economic gains that justify the enormous investment in compute.
> If I, for instance, tell my wife I can improve our income by 3x or 1000x but I don't really know, there's no planning that can be done and I'll probably have to sleep on the couch until I figure out what the hell I'm doing.
For most people, even a mere 3x in the next 5 years is huge, it's 25% per year growth.
3x in 5 years is a reasonable low-ball for hardware improvements alone. Caveat: top-end silicon is now being treated as a strategic asset, so there may be wars over it, driving up prices and/or limiting progress, even on the 5-year horizon.
I'm unclear why your metaphor would have you sleeping on the sofa: If tonight you produce a business idea for which you can be 2σ-confident that it will give you an income 5 years from now in the range [3…1000]x, you can likely get a loan for a substantially bigger house tomorrow than you were able to get yesterday; in the UK that's a change slightly larger than going from the median average full-time salary to the standard member of parliament salary.
(The reason behind this, observed lowering of compute costs, has been used even decades ago to delay investment in compute until the compute was cheaper).
The arguments I've seen elsewhere for order-of-10,000x* cost improvements (which is a proxy for efficiency and speed if not IQ) is based on various different observations cost reductions** since ChatGPT came out — personally, I doubt that the high end of that would come to pass, my guess is those all represent low-hanging fruit that can't be picked twice, but even then I would still expect there to be some opportunity for further gains.
* The original statement had one more digit in it than yours, but this doesn't make much difference to the argument either way
** e.g. https://www.wing.vc/content/plummeting-cost-ai-intelligence
We already have AGI in some ways though. Like I can use Claude for both generating code and helping with some maths problems and physics derivations.
It isn't a specific model for any of those problems, but a "general" intelligence.
Of course, it's not perfect, and it's obviously not sentient or conscious, etc. - but maybe general intelligence doesn't require or imply that at all?
For me, general intelligence from a computer will be achieved when it knows when it's wrong. You may say that humans also struggle with this, and I'd agree - but I think there's a difference between general intelligence and consciousness, as you said.
Being wrong is one thing, on the other hand knowing that they don't know something is something humans are pretty good at (even if they might not admit to not knowing something and start bullshitting anyways). Current AI predictably fails miserably every single time.
> knowing that they don't know something is something humans are pretty good at (even if they might not admit to not knowing something and start bullshitting anyways)
I'd like to believe this, but I'm not a mind reader and I feel like the last decade has eroded a lot of my trust in the ability of adults to know when they're wrong. I still have hope for children, at least.
I think one thing ignored here is the value of UX.
If a general AI model is a "drop-in remote worker", then UX matters not at all, of course. I would interact with such a system in the same way I would one of my colleagues and I would also give a high level of trust to such a system.
If the system still requires human supervision or works to augment a human worker's work (rather than replace it), then a specific tailored user interface can be very valuable, even if the product is mostly just a wrapper of an off-the-shelf model.
After all, many SaaS products could be built on top of a general CRM or ERP, yet we often find a vertical-focused UX has a lot to offer. You can see this in the AI space with a product like Julius.
The article seems to assume that most of the value brought by AI startups right now is adding domain-specific reliability, but I think there's plenty of room to build great experiences atop general models that will bring enduring value.
If and when we reach AGI (the drop-in remote worker referenced in the article), then I personally don't see how the vast majority of companies - software and others - are relevant at all. That just seems like a different discussion, not one of business strategy.
The value of UX is being ignored, as the magical thinking has these AIs being fully autonomous, which will not work. The phrase "the devil's in the details" needs to be imprinted on everyone's screens, because the details of a "drop-in remote worker" are several Grand Canyons yet to be realized. This civilization is vastly more complex than you, dear reader, realize, and the majority of that complexity is not written down.
Also, the UX of your potential "remote workers" are vitally important! The difference between a good and a bad remote worker is almost always how good they are at communicating - both reading and understanding tickets of work to be done and how well they explain, annotate, and document the work they do.
At the end of the day, someone has to be checking the work. This is true of humans and of any potential AI agent, and the UX of that is a big deal. I can get on a call and talk through the code another engineer on my team wrote and make sure I understand it and that it's doing the right thing before we accept it. I'm sure at some point I could do that with an LLM, but the worry is that the LLM has no innate loyalty or sense of its own accuracy or honesty.
I can mostly trust that my human coworker isn't bullshitting me and any mistakes are honest mistakes that we'll learn from together for the future. That we're both in the same boat where if we write or approve malicious or flagrantly defective code, our job is on the line. An AI agent that's written bad or vulnerable code won't know it, will completely seriously assert that it did exactly what it was told, doesn't care if it gets fired, and may say completely untrue things in an attempt to justify itself.
Any AI "remote worker" is a totally different trust and interaction model. There's no real way to treat it like you would another human engineer because it has, essentially, no incentive structure at all. It doesn't care if the code works. It doesn't care if the team meets its goals. It doesn't care if I get fired. I'm not working with a peer, I'm working with an industrial machine that maybe makes my job easier.
It's hilarious that people don't see this. The UX of an "llm product" is the quality of the text in text out. An "aligned model" is one with good UX. Instruct tuning is UX. RLHF is UX.
I guess part of the point is that the value of the UX will quickly start to decrease as more tasks or parts of tasks can be done without close supervision. And that is subject to the capabilities of the models which continues to improve.
I suggest that before we satisfy _everyone_'s definition of AGI, more and more people may decide we are there as their own job is automated.
The UX at that point, maybe in 5 or 10 or X years, might be a 3d avatar that pops up in your room via mixed reality glasses, talks to you, and then just fires off instructions to a small army of agents on your behalf.
Nvidia actually demoed something a little bit like that a few days ago. Except it lives on your computer screen and probably can't manage a lot of complex tasks on its own. Yet.
Or maybe at some point it doesn't need sub agents and can just accomplish all of the tasks on its own. Based on the bitter lesson, specialized agents are probably going to have a limited lifetime as well.
But I think it's worth having the AGI discussion as part of this because it will be incremental.
Personally, I feel we must be pretty close to AGI because Claude can do a lot of my programming for me. I still have to make important suggestions, and routinely for obvious things, but it is much better at me at filling in all the details and has much broader knowledge.
And the models do keep getting more robust, so I seriously doubt that humans will be better programmers overall for much longer.
Which is an easier way to interact with your bank? Writing a business letter, or filling out a form?
I suspect that we will still be filling out forms, because that’s a better UI for a routine business transaction. It’s easier to know what the bank needs from you if it’s laid out explicitly, and you can also review the information you gave them to make sure it’s correct.
AI could still be helpful for finding the right forms, auto-filling some fields, answering any questions you might have, and checking for common errors, but that’s only a mild improvement from what a good website already does.
And yes, it’s also helpful for the programmers writing the forms. But the bank still needs people to make sure that any new forms implement their consumer interactions correctly, that the AI assist has the right information to answer any questions, and that it’s all legal.
A drop in remote worker will still require their work to be checked and their access to the systems they need to do their work secured in case they are a bad actor.
Chat models make UI redundant. Who will want to learn some app's custom interface when they are used to just asking it to do what they want or need? Chat is the most natural interface for humans. UX will eventually be just trying to steer models to kiss your butt in the right way, and the bar for this will be low, as language-interaction problems are going to be obvious even to teenagers.
The amount of work going into RLHF/DPO/instruct tuning and other types of post training is because UX is very important. The bar is high and the difficulty of making a model with a good UX for a given use case is high.
I think the core problem at hand for people trying to use AI in user-facing production systems is "how can we build a reliable system on top of an unreliable (but capable) model?". I don't think that's the same problem that AI researchers are facing, so I'm not sure it's sound to use "bitter lesson" reasoning to dismiss the need for software engineering outright and replace it with "wait for better models".
The article sits on an assumption that if we just wait long enough, the unreliability of deep learning approaches to AI will just fade away and we'll have a full-on "drop-in remote worker". Is that a sound assumption?
It's a little depressing how many high valued startups are basically just wrappers around LLMs that they don't own. I'd be curious to see what percentage of YC latest batch is just this.
> 70% of Y Combinator’s Winter 2024 batch are AI startups. This is compared to ~57% of YC Summer 2023 companies and ~32% from the Winter batch one year ago (YC W23).
The thinking is that the models will get better, which will improve our product; but in reality, as the article states, the generalized models get better, so your value add diminishes because there's no need to fine-tune.
On the other hand, crypto funds made a killing off of "me too" blockchain technology before it got hammered again. So who knows about the 2-5 year term, but the 10-year term almost certainly won't have these billion-dollar companies that are wrappers around LLMs.
https://x.com/natashamalpani/status/1772609994610835505?mx=2
How is being a wrapper for LLMs you don’t own any different from being a company based on cloud infrastructure you don’t own?
LLMs are a platform.
Bill Gates definition of a platform was “A platform is when the economic value of everybody that uses it exceeds the value of the company that creates it.”
It's relatively easy to move to different cloud infrastructure (or host your own) later on down the line.
If you rely on an OpenAI LLM for your business, they can basically do whatever they want to you. Oh, prices went up 10x? What are you gonna do, train your own AI?
Anyone who says it’s relatively easy to go to a different cloud has never led a major migration (I have). That’s kind of part of my day job - cloud consulting.
And if you think it’s hard to move to another LLM you haven’t done a major implementation using an LLM and used LangChain (I have). It abstracts a lot of the work and people can choose which LLM they want to use.
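To make that concrete, here is a minimal sketch of the kind of abstraction meant here, assuming the langchain-openai and langchain-anthropic packages and their shared chat-model interface; the model names and prompt are illustrative placeholders, not a recommendation.

```python
# Minimal sketch of swapping the underlying LLM behind a common LangChain chat interface.
# Assumes the langchain-openai and langchain-anthropic packages; model names are
# illustrative placeholders.
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic

def build_llm(provider: str):
    # The rest of the application only ever sees the common chat-model interface.
    if provider == "openai":
        return ChatOpenAI(model="gpt-4o-mini", temperature=0)
    if provider == "anthropic":
        return ChatAnthropic(model="claude-3-5-sonnet-20240620", temperature=0)
    raise ValueError(f"unknown provider: {provider}")

llm = build_llm("anthropic")
print(llm.invoke("Summarize our refund policy in one sentence.").content)
```

Switching providers then becomes a configuration change rather than a rewrite, which is the "abstracts a lot of the work" point above.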
You don’t train your LLM. You use your LLM along with RAG.
I have no direct experience with this, but I’ve read that prices went down by 10x or so in 2024, and it seems that OpenAI has plenty of competition?
https://simonwillison.net/2024/Dec/31/llms-in-2024/#llm-pric...
An LLM wrapper adds near-zero value. If I type some text into a "convert to Donald Trump style" tool, it produces the exact same output as typing it into ChatGPT following "Convert this text to Donald Trump style:" because that's what the tool actually does. Implementing ChatGPT is 99.999% of the value creation. Prepending the prompt is 0.001%. The surprising fact is that the market assigns a non-zero value to the tool anyway.
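For illustration, a hedged sketch of what such a thin wrapper amounts to, assuming the OpenAI Python SDK (v1 interface); the prompt, function name, and model are made up for illustration and are not any particular tool's actual implementation.

```python
# Sketch of a thin "LLM wrapper": nearly all of the work happens inside the API call.
# Assumes the OpenAI Python SDK (v1); the prompt and model name are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def convert_to_style(text: str, style: str = "Donald Trump") -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": f"Convert the user's text to {style} style."},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content

print(convert_to_style("Our quarterly results were satisfactory."))
```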
Startups that use cloud servers still write the software that goes on those servers, which is 90% of the value creation.
I also think the proliferation of "GPTs" really took the wind out of thousands of these "NextJS Frontend + Custom System Context + LLM" wrapper apps as well.
That’s not what I see from the companies I work with (cloud consulting).
Almost all of them are using LLMs along with “tools” and RAG.
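As a rough illustration of that pattern, here is a minimal retrieval-augmented-generation sketch: embed a handful of documents, pick the closest match to the question, and pass it to the model as context. It assumes the OpenAI Python SDK and numpy; the documents, model names, and prompt wording are illustrative, and a real deployment would use a vector store and tool-calling on top of this.

```python
# Minimal RAG sketch: embed documents, retrieve the closest one, answer from it.
# Assumes the OpenAI Python SDK and numpy; documents and model names are illustrative.
import numpy as np
from openai import OpenAI

client = OpenAI()
docs = [
    "Supplier A manufactures stainless steel fasteners for the automotive sector.",
    "Refunds are processed within 14 days of a returned order.",
]

def embed(texts):
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

doc_vecs = embed(docs)

def answer(question: str) -> str:
    q = embed([question])[0]
    # Cosine similarity between the question and each document.
    scores = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
    context = docs[int(np.argmax(scores))]
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return resp.choices[0].message.content

print(answer("What does Supplier A make?"))
```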
The author discusses the problem from the point of view of engineering, not business. When you look at it from a business perspective, there is a big advantage in not waiting and using whatever exists right now to solve the business problem, so that you can get traction, get funding, grab market share, and build a team; when a better model comes along the next day, you can rewrite your code, and you will be in a much better position to leverage whatever new capabilities the new models provide: you know your users, you have the funds, you built the right UX...
The best strategy, from your experience, is to jump on a problem as soon as there is an opportunity to solve it and generate lots of business value within the next 6 months. The trick is finding the subproblem that is worth a lot right now and could not be solved 6 months ago. A couple of AI-sales startups "succeeded" quite well doing that (e.g. 11x), and they are now in a good position to build from there (whether they will succeed in building a unicorn is another question; it just looks like they are in a good position now).
Very true. Most code written today will probably be obsolete in 2050. So why write it? Because it puts you in a good strategic position to keep leading in your space.
Well. We were working on a search engine for industry suppliers since before the whole AI hype started (even applied to YC once), and hit a brick wall at some point where it got too hard to improve search result quality algorithmically. To understand what that means: We gathered lots of data points from different sources, tried to reconcile that into unified records, then find the best match for a given sourcing case based on that. But in a lot of cases, both the data wasn’t accurate enough to identify what a supplier was actually manufacturing, and the sourcing case itself wasn’t properly defined, because users found it too hard to come up with good keywords for their search.
Then, LLMs entered the stage. Suddenly, we became able to both derive vastly better output from the data we got, and also offer our users easier ways to describe what they were looking for, find good keywords automatically, and actually deliver helpful results!
This was only possible because AI augments our product well and really provides a benefit in that niche, something that would just not have been possible otherwise. If you plan on founding a company around AI, the best advice I can give you is to choose a problem that similarly benefits from AI, but does exist without it.
> the data wasn’t accurate enough to identify what a supplier was actually manufacturing
how did the LLM help with that challenge?
A guess: Their ability to infer reality from incomplete and indirect information.
Controversial opinion: I don't believe in the bitter lesson. I just think that the current DNN+SGD approaches are not that good at learning deep, general, expressive patterns. With less inductive bias the model memorizes a lot of scenarios and is able to emulate whatever real-world scenario you are trying to make it learn. However, it fails to simulate that scenario well. So it's kind of misleading to say that it's generally better to have less inductive bias. That is only true if your model architecture and optimization approach are just a bit crap.
My second controversial point regarding AI research and startups: doing research sucks. It's risky business. You are not guaranteed success. If you make it, your competitors will be hot on your tail and you will have to keep improving all the time. I personally would rather leave the model building to someone else and focus more on building products with the available models. There are exceptions like finetuning for your specific product or training bespoke models for very specific tasks at hand.
I also don't believe in the 'bitter lesson' when extrapolated to apply to all 'AI application layer implementations' - at least in the context of asserting that the universe of problem scopes are affected by it.
I think it is true in an AI research context, but an unstated assumption is that you have complete data, E2E training, and the particular evaluated solution is not real-world unbounded.
It assumes infinite data, and it assumes the ability to falsify the resulting model output. Most valuable, 'real world' applications of AI when trying to implement in practice have an issue with one or both of those. So in other words: where a fully unsupervised AI pathway is viable due to the structure of the problem, absolutely.
I'm not convinced of the universality of this. That doesn't mean the core point of this essay, on the futility of startups basing their business around one of the off-the-shelf LLMs, isn't valid - I think many of them do risk being generalized away.
> I just think that the current DNN+SGD approaches are just not that good
I'll add even further. The transformers and etc that we are using today are not good either.
That's evidenced by the enormous amount of memory they need to do any task. We have just taken the one approach that was working a bit better for sensorial tasks and pattern matching, and went all in, adding hardware after hardware so we could brute-force some cognitive tasks out of it.
If we do the same to other ML architectures, I don't think they would stay much behind. And maybe some would get even better results.
The "bitter lesson" is self evidently true in one way as was a quantum jump in what AI's could do once we gave them enough compute. But as a "rule of AI" I think it's being over generalised, meaning it's being used to make predictions where it doesn't apply.
I don't see how the bitter lesson could not be true for the current crop of LLMs. They seem to have memorised just about everything mankind has written down, and squished it into something of the order of 1TB. You can't do that without a lot of memory to recognise the common patterns and eliminate them. The underlying mechanism is nothing like zlib's deflate, but when it comes to the memory you have to throw at it, they are the same in this respect. The bigger the compression window, the better deflate does. When you are trying to recognise all the patterns in everything humans have written down to a deep level (such as discovering that mathematical theorems are generally applicable), the memory window and/or compute you have to use must be correspondingly huge.
That was also true, to a lesser extent, when DeepMind taught an AI to play Pong in 2013. They had 1M pixels arriving 24 times a second, and it had to learn to pick out bats and balls in that sea of data. It's clearly going to require a lot of memory and compute to do that. Those resources simply weren't available on a researcher's budget much before 2013.
Since 2013, we've asked our AIs to ingest larger and larger datasets using much the same techniques used in 2013 (but known long before that) and been enchanted with the results. The "bitter lesson" predicts you need correspondingly more compute and memory to compress those datasets. Is it really a lesson, or an engineering rule of thumb that only became apparent when we had enough compute to do anything useful with AI?
I'm not sure this rule of thumb has much applicability outside of this "let's compress enormous amounts of data, looking for deep structure" realm. That's because if we look at neural networks in animals, most are quite small. A mosquito manages to find us for protein, find the right plant sap for food, find a mate, and find water with enough algae for its eggs, using data from vision, temperature sensors, and smell, and it uses that to activate wings, legs and god knows what else. It does all that with 100,000 neurons. That's not what a naive reading of "the bitter lesson" tells you it should take.
Granted, it may take an AI of enormous proportions to discover how to do it with 100,000 neurons. Nature did it by iteratively generating trillions upon trillions of these 100,000-neuron networks over millennia, and used a genetic algorithm to select the best at each step. If we have to do it that way, it will be a very bitter lesson. The 10-fold increases in compute every few years that made us aware of the bitter lesson are ending. If the prediction of the bitter lesson is that we have to rely on them continuing in order to build our mosquito emulation, then it's predicting it will take us centuries to build all the sorts of robots we need to do all the jobs we have.
But that's looking unlikely. We have an example. On one hand we have Tesla FSD, throwing more and more resources at conventional AI training in the way the bitter lesson says you must in order to progress. On the other we have Waymo, using a more traditional approach. It's pretty clear which approach is failing and which is working, and it's not going the way the bitter lesson says it should.
This (particularly the figure 1 illustration) discounts the "distribution" layer for apps
Single app/feature startups will lose (true long before AI). A few will grow large enough to entrench distribution and offer a suite of services, creating defensibility against competitors
The distributors (eg. a SaaS startup that rapidly landed/expanded) will continue to find bleeding edge ways to offer a 6-12mo advantage against foundation models and incumbents
GitLab is a great example of this model. The equivalent bitter lesson of the web is that every cutting edge proprietary technology will eventually be offered free open source. However, there is a commercial advantage to purchasing the bleeding edge features with a strong SLA and customer service
The mistake is to think technology is a business. Business has always been about business. Good technology reduces the cost of sale (CAC) and cost of goods sold (COGS) to create an 85-90% margin. Good technology does not create a moat
Resilient businesses do not rely on singular technology advantages. They invest heavily in long term R&D to stay ahead of EACH wave. Resting on one's laurels after catching a single wave, or sitting out of the competition because there will be bigger waves later, are both surefire ways to lose the competition
>Eventually, you’ll just need to connect a model to a computer to solve most problems - no complex engineering required.
The word "eventually" is doing a lot of work here. Yes, it's true in the abstract, but over what time horizon? We have to build products to solve today's problems with today's technology, not wait for the generalized model that can do everything but may be decades away.
True, but it tells that if you are a founder of a niche AI company then you should take money out of it instead of investing everything back into the company, because eventually the generalist-AI will destroy your business and you will be left with nothing.
Not if the generalist AI arrives after you have made your returns, which is the sentiment of the post you’re responding to.
Based on the company the author founded, I assume he believes this technology is just years away.
I think with a lot of AI folk in San Francisco, this is a tacit assumption when having these sorts of conversations.
Anyone that thinks this is just years away is utterly ignoring human history, nature, and relationship with technology. My own view is that this will never be achieved, and it's not even just about the tech.
Let's imagine for a moment that this is even achieved. Then, there is still complex engineering required in the world: to maintain and continually improve the AI engines and their interfaces. Unless you want to say that, past some point, the AI will be self-improving without any human input whatsoever. Unless the AI can read our minds, I'm not sure it can continue to serve human interests without human input.
But never mind, we will never get there. At this very moment, tech is capable of so much more, yet most sites I visit have bad UI, are bloated from downloading and executing massive amounts of JS, are riddled with annoying ads that serve no real useful purpose to society, and are riddled with bugs. Even as an engineer, I really struggle to find any good no-code tools to create anything truly sophisticated without digging into hard-core code. Heck, they are now talking about adding more HTTP methods to HTML forms.
This might be true on a very long timescale, but that's not really relevant for VCs. Literally every single VC I've talked to raised the question of whether our moat is anything more than just having better prompts; it's usually the first question. If a VC really invested in a company whose moat got evaporated by o1, that's on the VC. Everyone saw technology like o1 coming from a mile away.
For the slightly more complex stuff, sure at some point some general AI will probably be able to do it. But with two big caveats, the first being: when? and the second being: for how much?.
In theory every deep and wide enough neural network should be able to be trained to do object detection in images, yet no one is doing that. Technologies specifically designed to process images, like CNNs, reign supreme. Likewise for the architectures of LLMs.
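As one concrete example of such a specialized architecture, here is a hedged sketch using torchvision's pretrained Faster R-CNN detector; the weights, threshold, and image path are illustrative choices, not a claim about what anyone in this thread uses.

```python
# Sketch of a specialized, CNN-based object detector (as opposed to a general-purpose LLM).
# Assumes torchvision with pretrained Faster R-CNN weights; the image path is a placeholder.
import torch
from PIL import Image
from torchvision.models.detection import fasterrcnn_resnet50_fpn, FasterRCNN_ResNet50_FPN_Weights
from torchvision.transforms.functional import to_tensor

weights = FasterRCNN_ResNet50_FPN_Weights.DEFAULT
model = fasterrcnn_resnet50_fpn(weights=weights).eval()

image = to_tensor(Image.open("example.jpg").convert("RGB"))
with torch.no_grad():
    prediction = model([image])[0]  # boxes, labels, scores for one image

for box, label, score in zip(prediction["boxes"], prediction["labels"], prediction["scores"]):
    if score > 0.8:  # keep only confident detections
        print(weights.meta["categories"][label], box.tolist(), float(score))
```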
At some point your specialization might become obsolete, but that point might be a decade or more from now. Until then, specializations will have large economic and performance advantages making the advancements in AI today available to the industry of tomorrow.
I think it's the role of the VC to determine not if there's an AI breakthrough behind a startups technology, but if there's a market disruption and if that market disruption can be leveraged to establish a dominant company. Similar to how Google leveraged a small and easily replicable algorithmic advantage into becoming one of the most valuable companies on earth.
On your object detection point, Gemini 2.0 Flash has bounding box detection: https://ai.google.dev/gemini-api/docs/models/gemini-v2#bound....
I haven't found it to work particularly well for some more domain-specific things I tried, but it was surprisingly good for an LLM.
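For reference, a minimal sketch of calling that capability, assuming the google-generativeai Python package; the API key, image, model name, and especially the response format are assumptions to verify against the linked docs (which reportedly describe boxes as [ymin, xmin, ymax, xmax] normalized to 0-1000).

```python
# Sketch of asking Gemini for bounding boxes. Assumes the google-generativeai package;
# the model name and the exact response format are assumptions; per the linked docs the
# coordinates are reportedly [ymin, xmin, ymax, xmax] normalized to 0-1000.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")  # placeholder
model = genai.GenerativeModel("gemini-2.0-flash")

image = Image.open("warehouse_shelf.jpg")  # placeholder image
response = model.generate_content(
    [image, "Return bounding boxes for every pallet as JSON objects with a label and box_2d."]
)
print(response.text)  # parse the JSON and rescale the normalized coordinates yourself
```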
More computation cannot improve the quality or domain of data. Maybe the bitter lesson is: lobby bitterly for copyright laws that favor what you are doing, and for weakened antitrust, to give you the insurmountable moat of exclusive data in a walled-garden media network.
A human does not need billions of driving hours to learn how to drive competently. The issue with the current method is not the quality of data but the methodology. More computation might unlock newer approaches that do better with less, and worse-quality, data.
A human is not a blank slate. There's millennia of evolutionary history that goes into making a brain adapted and capable of learning from its environment.
A human is a mostly blank slate...but it's a really sophisticated slate that as you say has taken many millions of years of development.
I think there's a more fundamental problem at play here: what seems to work in 'AI', search, is made better by throwing more data into more compute. You then store the results in a model, that amounts to pre-computed solutions waiting for a problem. Interacting with the model is then asking questions and getting answers that hopefully fit your needs.
So, what we're doing on the whole seems to be a lot of coding and decoding, hoping that the data used in training can be adequately mapped to the problem domain realities. That would mean that the model you end up with is somehow a valid representation of some form of knowledge about the problem domain. Trouble is, more text won't yield higher and higher resolution of some representation of the problem domain. After some point, you start to introduce noise.
> A human does not need billions of driving hours to learn how to drive competently.
But humans DO need ~16 years of growth and development "to learn how to drive competently" and then will also know how to ride a bicycle, mow grass, build shelves, cook pizza, use a smart phone, ...! There's a lesson in that somewhere ....
You don't need the 16, you can get a much younger person to drive too. It only supports the fact that data amount/quality is not the problem.
You're forgetting a few billion years of evolution; we come with a fair amount of pretraining encoded in our DNA
You raise a fair point. But I think it still aligns with what I was going for, the millions of years in evolution are not more data on the same system but the system itself changing to be able to cope with limited data and novel scenarios.
Its also not given or constant, modern humans are super young in the evolutionary scale and very different even if very similar from other animals. Funny enough we might have cracked the difference (language processing) but are still very far in the rest for AGI.
It's not pretraining data that we have encoded in our DNA - it's an optimized learning/prediction architecture that is encoded ... how to build a brain that will then be able to learn efficiently.
The amount of knowledge/behavior that is innate - encoded in our DNA - is pretty minimal, consisting of things like opposite sex sexual attraction, fear of heights, fear of snakes, disgust at rotting smell, etc - basic survival stuff.
It is still years of continuous data input from a diverse sensory array
> It only supports the fact that data amount/quality is not the problem.
I agree with that. I.e., for "drive competently", the way humans learn it is more general, i.e., it also brings the list I gave with bicycles, pizza, grass mowing, etc.
Also, for "drive competently", the way humans do it requires a lot of judgment and maturity, which is not very well defined!
Guys, I don't understand how to make progress on AI. E.g., I just asked X > Grok about "similitude", and it gave a good answer, maybe like a long dictionary answer. But it looked like Grok had done some impressive plagiarism -- I can't tell the difference.
Yeah well. That was a bad analogy, and everyone I know who used to say that, admits error.
Great essay. This is 100% right about the technical side, but I think it misses the “product” aspect.
Building a quality product (AI or otherwise) involves design and data at all levels: UX, on-boarding, marketing, etc. The companies learning important verticals and getting in the door with customers will have a pretty huge advantage as models get better. Both in terms of install base, and knowing what customers need. Really great products don’t simply do what a customer asks, but are built by talking to a ton of customers over and over, and solving their problems better than any one of them can articulate.
It’s true we will need less and less custom software for problems. But it isn’t realistic to say the software wrapper effort is going to zero when models improve.
Plus: a lot of software effort is needed for getting the data AI needs. This is going to be a huge area - think Google maps with satellites, camera cars, network effect products (ratings), data collection (maps, traffic), etc.
This all sounds very similar to the hundreds of excel/spreadsheet "killers" out there.
But Excel math is easy to outperform; ChatGPT etc. won't be.
Sure. But it also sounds very similar to every successful database-backed company where each tenant has < 1m rows (almost all of them).
Right. It boils down to federated learning on edge to have access to relevant data.
I disagree with the author based on timelines and VC behaviour. There is sufficient time to create a product and raise massive capital before the next massive breakthrough hands the value back to OpenAI/Google/Anthropic/MS. Secondly the execution of a solution in a vertical is sufficiently differentiating that even if the underlying problem could be solved by a next gen model there's very little reason to believe it will be. Big Cos don't have the interest to attack niches while there are billion user markets to go after. So don't build "photo share with AI", build "plant fungus photo share for farmers".
Yeah I can't make this article work as a day to day advice piece. In the time it takes to get a generational change in AI from computational resources, we might find a worthy result for your PhD, business, drug design, or killer application.
It can be simultaneously true that a tech is doomed to obsolescence from over specialization, and that it does incredibly useful things in the mean time.
Maybe I'm being dumb, but isn't the AI approach somewhat like brute-force hacking of a password? I mean, humans don't learn that way - yes, we do study code to become better coders, but not in the way AI learning does?
Will someone truly discover a way to start from fundamentals and do better, rather than just crunch zillions of petabytes faster and faster?
Or is that completely wrong? Happy to stand corrected
That's part of the problem though, isn't it? We still don't really understand how human intelligence works. We don't know how we learn. We don't even know where or how memories are stored.
We have ideas about how it works, sure. We know a bit about how the basics of the brain works and we know some correlations of what areas of the brain electrically light up in various conditions, but that's about the extent of it.
It's hard to really create an artificial version of something without pretty well understanding the real thing. In the meantime, brute-forcing it is probably the best (and most common) approach.
Is it really the best approach though if we sink all this capital into it if it can never achieve AGI? It’s wildly expensive and if it doesn’t achieve all the lofty promises, it will be a large waste of resources IMO. I do think LLMs have use cases, but when I look at the current AI hype, the spend doesn’t match up with the returns. I think AI could achieve this, but not with a brute force like approach.
There's still even a more fundamental question before getting there, how are we defining AGI?
OpenAI defines it based on the economic value of output relative to humans. Historically it had a much less financially oriented definition and general expectation.
You really can't take anything OpenAI says about this kind of thing seriously at this point. It's all self-serving.
It's still important though, they are the ones many are expecting to lead the industry (whether that's an accurate expectation is surely up for debate).
Market will sort that out just like it did dotcom or tulip madness.
Another big pushback is copyrighted content. Without a proper revenue model, how do you pay for that?
That will also restrict what can be "learned". Already there are lawsuits and allegations of using pirated books, etc.
I'll be surprised if anything meaningful comes of those issues in the end.
Copyright issues here feel very similar to claims against Microsoft in the 80s and 90s.
Yes for sure AI or even basic computing prior to that has done wonders, chess being one example. Simply removing some of the issues with humans - inconsistency, errors due to mood or form etc.
I agree this approach is the only viable one, but I do hope the other way is also being tried. Who knows, there may be a breakthrough. Like trying to create life from the basic chemicals present in nature.
As long as people are not willing to acknowledge that we're not blank slates, but possess vast intelligence inherently, and that even the simplest life forms are infinitely more intelligent than LLMs, there won't be progress.
Is that something we can acknowledge without at least somewhat understanding how that works?
It sure seems like we have a lot of inherent knowledge, but by the book we're little more than the product of an instruction set that is our DNA and there's no explanation that includes inherited knowledge in that DNA.
I think the harder thing we need to acknowledge is that we understand much less than we think we know. Concepts like inherent, or collective, knowledge would all roll down hill from that.
I think the brilliant minds working on this know this. It shouldn't stop things, just as the fact that we'll die one day doesn't stop us from accumulating wealth or working hard.
But at some point I hope there's a breakthrough from an entirely different paradigm.
Yes, humans are different learners. It's not a problem. At least we don't know if it's a problem.
Regarding brute force, of course not. We are 2 years into ChatGPT and people are still thinking it's just n-gram statistical models, smh. Don't listen to bad YouTubers... not even people like 3b1b, Sabine, Thor, or Primeagen. Why, when you can take it from so many people actually working in AI instead?
Anyway, yes, ChatGPT is huge. But if you used n-gram statistical models like it's 1980, you'd need the whole universe as a server and it still wouldn't be as good. Big != infinitely big. It's not "brute force".
Maybe that's not what you meant. But I've heard this a lot. Anyway, sorry for venting!
I've observed that every time OpenAI makes an announcement, 2 dozen startups die. The pace was quite rapid a year ago but has slowed down now. I've advised startups of this "bitter lesson". The value is in proprietary data and your ability to build a moat around domain expertise in a specific industry vertical.
I think this misses a crucial dynamic. It’s not “custom scaffolding vs. wait for better models”. There is also fine-tuning.
In specialized domains you can’t just rely on OpenAI finding all the training data required to render your experts obsolete.
If you can build a data flywheel you can fine-tune models and build that lead into a moat; whoever has the best training set will have the best product. In the short term you might start fine-tuning on OpenAI but once you’ve established your dataset you can potentially gain independence by moving onto OSS models.
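As a rough sketch of the first turn of such a flywheel, here is what kicking off a fine-tune from a proprietary dataset could look like on OpenAI's fine-tuning API, assuming the v1 Python SDK; the file name and base model are placeholders, and the same JSONL of chat-formatted examples could later be reused against an open-weights model.

```python
# Sketch of starting a fine-tune from a proprietary dataset. Assumes the OpenAI Python
# SDK (v1); file name and base model are placeholders.
from openai import OpenAI

client = OpenAI()

training_file = client.files.create(
    file=open("domain_examples.jsonl", "rb"),  # one {"messages": [...]} example per line
    purpose="fine-tune",
)

job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",
)
print(job.id, job.status)
```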
If you are building a software assistant, sure, this is clearly something that OpenAI will get better at very quickly, and Altman has commented as much that many companies are building things which are certain to get steamrolled by core product development. But I think there is a very interesting strategic question around areas like law, medicine, and probably more so niche technical areas, where OpenAI will/can not absorb knowledge as quickly.
I think this is largely right. I look at this space as somebody who has been plugging together bits and pieces of software components for a living for a few decades. I don't need to deeply understand how each of those things works to be effective. I just need to know how to use them.
From that point of view, AI is more of the same. I've done a few things with the OpenAI apis. Easy work. There's not much to it. Scarily simple actually. With the right tools and frameworks we are talking a few lines of code mostly. The rest is just the usual window dressing you need to turn that into an app or service. And LLMs can generate a lot of that these days.
The worry for VC funded companies in this space is that a lot of stuff is becoming a commodity. For example, the llama and phi models are pretty decent. And you can run them yourself. Claude and OpenAI are a bit better and larger so you can't run them yourself. But increasingly those cheap models that you can run yourself are actually good enough for a lot of things. Model quality is a really hard to defend moat long term. Mostly the advantage is temporary. And most use cases don't actually need a best in class LLM.
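As a sketch of the "run it yourself" option, here is what serving one of those small open models locally could look like, assuming the Hugging Face transformers library; the Phi-3 mini checkpoint and the prompt are illustrative choices, not what any particular company runs.

```python
# Sketch of running a small open-weights model locally instead of calling a hosted API.
# Assumes the Hugging Face transformers library; the checkpoint name is illustrative and
# a GPU (or patience on CPU) is assumed.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="microsoft/Phi-3-mini-4k-instruct",
    trust_remote_code=True,  # some checkpoints reportedly require this; drop it otherwise
)

prompt = "Draft a two-sentence status update for a delayed shipment."
result = generator(prompt, max_new_tokens=120, do_sample=False)
print(result[0]["generated_text"])
```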
So, I'm not a believer in the classic winner takes all approach here where one company turns into this trillion dollar behemoth and the rest of the industry pays the tax to that one company in perpetuity. I don't see that happening. The reality already is that the richest company in this space is selling hardware, not models. Nvidia has a nice (temporary) moat. The point of selling hardware is that you want many customers. Not just a few. And training requires more hardware than inference. So, Nvidia is rich because there are a lot of companies busy training models.
> So, I'm not a believer in the classic winner takes all approach here where one company turns into this trillion dollar behemoth and the rest of the industry pays the tax to that one company in perpetuity.
I agree with this sentiment. There are a lot of frontier model players that are very competent (OpenAI, Anthropic, Google, Amazon, DeepSeek, xAI) and I'm sure more will come onboard as we find ways to make models smaller and smaller.
The mental framework I try to use is that AI is this weird technology that is an enabler of a lot of downstream technology, with the best economic analogy being electricity. It'll change our society in very radical ways, but it's unclear who's going to make money off of it. In the electricity era Westinghouse and GE emerged as the behemoths because of their ability to manufacture massive turbines (which are the equivalent of today's NVIDIA and perhaps Google).
For me the issue has been that local models seem to be pretty bad (and it’s entirely possible it’s a user error on my end) compared to calling gpt-4o.
And even gpt-4o needs constant attention to ensure it doesn’t analyze input in a haphazard way.
I made a simple AI app for myself where I ran a bunch of recipes I’ve saved over the years through batch analysis. It tagged recipes with “kosher salt” as kosher, lol. I had to babysit the prompt for this simple pet-project analysis for like 4 hours until I felt confident enough. And even then I was at maybe 60%.
Now imagine for a business application. This kind of incorrect analysis would be detrimental.
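One common mitigation (not necessarily what the commenter did) is to constrain the model to a closed tag vocabulary and validate its output in code. A hedged sketch, assuming the OpenAI Python SDK; the tag set, prompt, and model name are illustrative.

```python
# One way to reduce haphazard analysis: force a closed tag vocabulary and JSON output,
# then validate it in code. Assumes the OpenAI Python SDK; tags and model are illustrative.
import json
from openai import OpenAI

client = OpenAI()
ALLOWED_TAGS = {"vegetarian", "vegan", "gluten-free", "kosher", "dairy-free"}

def tag_recipe(recipe_text: str) -> list[str]:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": (
                "Tag the recipe. Respond as JSON: {\"tags\": [...]}. Only use these tags: "
                + ", ".join(sorted(ALLOWED_TAGS))
                + ". 'Kosher salt' is an ingredient name and does not make a recipe kosher."
            )},
            {"role": "user", "content": recipe_text},
        ],
    )
    tags = json.loads(resp.choices[0].message.content).get("tags", [])
    return [t for t in tags if t in ALLOWED_TAGS]  # drop anything outside the vocabulary
```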
This is not only not true, it has no basis in reality. In the "real" world there are tradeoffs and constraints. Scaling does not work infinitely, and products delivered by good engineering cultures show non-linear growth (bad vs very good).
Coming back to research, one of the frontier models, DeepSeek, was able to come close to SOTA with a relatively small budget, in part because of their mixture-of-experts approach.
Yeah knowing that having infinite perfect data and infinite compute to train an end-to-end DNN is the way to solve anything is great, but not exactly helpful when you don't have either of those things. Not to mention having to actually deploy it on a low power system in the end.
> We’ve seen this pattern in speech recognition, computer chess, and computer vision.
I think the bitter lesson is true for AI researchers, but OP overstates its relevance to actual products. For example, the best chess engines are still very much specific to chess. They incorporate more neural networks now, but they are still quite specific.
The smarter AI app founders know that more advanced AI will easily replace their current tech. What they're trying to do now is lock in a userbase.
The thing is, what do we do with the bitter lesson once we’re essentially ingesting the entire internet? More computation runs the risk of overfitting and there just isn’t any more data. Is the bitter lesson here telling me that we’ve basically maxed out?
> More computation runs the risk of overfitting and there just isn’t any more data.
At this scale you can't overfit. The model might not improve in a meaningful way, but it can't overfit because the amount of data is much, much larger than the size of the model.
That said, as the state of the art open models show, the way to get better models is not "use more data", but use "more high quality data" (e.g. look at the graphs here comparing the datasets: https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu). I don't know how much high quality data we can extract from the Internet (most of the data on the Internet is, as you'd guess, garbage, which is why aggressively filtering it can improve performance so much), but I'd wager we're still nowhere near running out.
There's also a ton of data that's not available on the public Internet in an easily scrapable form, or even at all; e.g. Anna's Archive apparently contains almost 1PB of data, and even that is (by their estimate) only ~5% of the world's books.
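For anyone who wants to eyeball that quality-filtered data, a small sketch using the Hugging Face datasets library; the "sample-10BT" config name is an assumption taken from the dataset card and should be verified.

```python
# Sketch of streaming a slice of FineWeb-Edu to inspect its quality-filtered text.
# Assumes the Hugging Face `datasets` library; the "sample-10BT" config name is an
# assumption from the dataset card.
from datasets import load_dataset

stream = load_dataset(
    "HuggingFaceFW/fineweb-edu",
    name="sample-10BT",
    split="train",
    streaming=True,  # avoid downloading terabytes just to look at a few rows
)

for i, row in enumerate(stream):
    print(row["text"][:200].replace("\n", " "))
    if i == 4:
        break
```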
I'd think that this only means that a model cannot suffer from overfitting on average. So, it might totally have been overfitted on your specific problem.
We’re nowhere near ingesting the whole internet.
Though personally, I think we’re missing whatever architecture / mathematical breakthrough will make online learning (or even offline incremental, I.e. dreams) work.
At that point we could give the AI a robot body and train it on lived experience.
> "We’re nowhere near ingesting the whole internet."
We don't need to ingest the whole internet. I'd wager that upwards of 75% of the internet is spam, which would be useless for LLM training purposes. By the way, spam and useless information on the internet is only going to get worse, largely thanks to LLMs.
Only a subset of the internet contains "useful" information, an even a smaller subset contains information which is "clean enough" to be used for training purposes, and an even smaller subset can be legally scraped and used for training purposes.
It's highly likely that we've reached "peak training data" a long time ago, for many areas of knowledge and activities which are available on the internet.
With humans, while you can't read the whole internet you can maybe read everything in a narrow niche. Then the thing is to go out and do something or make something. Maybe that's the future for AI.
The current, intricate post-training / fine-tuning going on in models like o1 is feature engineering.
One day LLMs may replace traditional search engines… in the meantime Google has built a $2 trillion business on specialized engineering and millions of human-designed features and optimizations.
The Bitter Lesson is an elegant asymptotic result. But from a business perspective it pays to distinguish problems that general deep learning approaches will disrupt in 1 year vs. 5 years vs. 30 years.
I agree the article presents good points, especially if you consider current LLM models and the business models (and questionable business practices) being used.
It seems unlikely at this point, but it might be the case that new methods, models and/or algorithms arise due to the trend of high expectations and investment in this area.
But yes, AI business stuff will be swallowed by those companies, much like the companies that are unwillingly providing content to them.
It does feel like we are having the petfood.com moment in B2B AI. Bespoke solutions, bespoke offerings for very narrow B2B needs. Of course, waiting around for AI to get so good that no bespoke solution is needed might be a bad strategy as well. I'm not sure how it will play out, but I am certain there will be significant consolidation in the B2B agent space.
In the UK I bought most of my pet food from zooplus.co.uk. I don't think there's anything inherently wrong with the petfood.com idea, it's just that the best executing company will win as usual.
Correction, I meant pets.com. Liquidated in 2002 for $125K, while Amazon is worth $2T. My point, generally, is the market doesn't want overly specialized products.
OTOH, Amazon did start by focusing exclusively on mastering the online book market.
Of course, o1 is a lot more expensive than smaller models. So the startups that have spent time on tuning their prompts may yet get the last laugh, at least in any competitive market where customers are price sensitive. Bear in mind that supposedly OpenAI is losing money even at a $200 a month price point, so it's unclear that the current cost structure of model access is genuinely sustainable.
The author appears to be confused about the difference between research and production. In research more generic approaches typically win because they get resourced much better (plus the ever-growing compute and data have helped, but there is no guarantee that these will continue).
On the production side of "AI" (I don't love the term being thrown around this loosely, as true AI should include planning, etc., not just inference), the only question is how well you solve the one problem that's in front of you. In most business use cases today that problem is narrow.
LLMs drive a minuscule (but growing) amount of value today. Recommender systems drive huge amount of value. Recommender systems are very specialized.
I think there is value in companies built around AI. The value comes from UX, proprietary supplementary datasets, and market capture. Businesses built now will be best positioned to take advantage of future improvements in general AI. That is why I am building in the AI space. I’m not naive to the predictable improvement in foundational models.
love it when a 25 year old founder says we have been here multiple times in the past
In which specific ways are they wrong, though?
Being able to learn from others' mistakes is generally considered to be a sign of aptitude.
They picked the wrong analogy, they misunderstand how much human tuning happens via specific data sets that show how to arrive at an answer (does that sound like feature engineering?), and they misunderstand how specific vertical solutions get.
Why not pick databases wrapped in a UI as an analogy? That analogy would go the other way - despite the fact it seems simple and sensible from an integration perspective to have a single database with all the UI a business would need, we have a bajillion specialist SAAS products which then jump through hoops to integrate with each other? Why? Because the workflows are so damned specific to the tasks at hand, some "generic" solution doesn't work.
Conceptually that seems right, but I think there is a decade+ worth of runway left on working in the "glue" space, i.e. connecting the AI and our lived realities.
I don't see the general-ness of AI overcoming the challenges there...cause there is a lot of non-obvious messiness there.
It's endless. Hence the endless SaaS, despite it all being React UIs wrapping a Postgres DB or similar.
So what is the appropriate course of action if you are a founder? Just wait until the models inevitably get better, or help a customer with their problem now?
Computer power has been increasing all the time, but that hasn't kept people from experimenting with the limited power of their time (which ultimately led to better solutions) rather than waiting for the more powerful machines.
Find a real problem and solve it cheaply, or solve it so much better than anyone else that your high price is worthwhile to enough people.
Don't focus on the method of solving the problem until you have identified the problem and all the current solutions and avoidances.
You can help the customer now, but you should keep in mind that the people working on the more general solutions will at some point take your moat.
E.g. you can build a drawing program where you can tell it in natural language to apply certain effects and make some changes. But another company might be working on a OS assistant that can help the user in any program currently running on the computer.
Or you can build a robot that can pick bolts from an assembly line, but at some point there will be a company building a cheap robot that can do anything from folding t-shirts to knitting a sweater.
I feel like the author glosses over some nuance in the point he is trying to make and his conclusion doesn't incorporate the nuance.
The nuance is that even though AI researchers have learned this lesson, they are still building purpose built AIs because it's not possible to build an AI that can learn to do anything (i.e. AGI). Therefore, building on top of an existing AI model to meet a vertical market demand is not that crazy.
It's the same risk when building on any software platform. That is the provider of the platform may add the feature/app you are building as a free addition and your business becomes obsolete.
And they are generating oodles of training data which... goes step by step on how to do specific tasks....
TFA admits that specialization can gain a temporary edge, but says that using that specialization is useless because the next gen will eclipse that edge using raw CPU.
Even if it is true that the next generational change in AI is based on computational improvements, how can it be true that it's hopeless to build products by specializing this generation of tech?
Moreover if I specialize gen 1, and gen 2 is similar enough, can't specialization maintain an edge?
There seems to be a timescale mismatch.
The main point here seems to be around boxing in the AI with a rigid set of rules, and then not being able to adapt to model improvements over time. I think if those "rules" are just in the prompt, you can still adapt, but if you're starting to code a lot of rule based logic in the product, you can get into trouble.
I'm not an AI founder, but I'd say there is still some value to be added by better architectures.
These AI wrappers run ship engines on a kayak. For example, most coding companions ignore my code base the moment I close other files. Somehow their context is minimal. Such systems can be improved dramatically just by changing the software around the LLM.
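To make "the software around the LLM" concrete, here is a toy, hedged sketch of one such improvement: instead of sending only the open file, also pull in the local modules it imports as extra context. The function names, paths, and character budget are made up for illustration; a real assistant would rank and truncate far more carefully.

```python
# Toy sketch of "changing the software around the LLM": gather locally imported modules
# from the current file and include them as extra prompt context, instead of sending
# only the open file. Purely illustrative.
import ast
from pathlib import Path

def build_context(entry_file: str, repo_root: str, budget_chars: int = 20_000) -> str:
    root = Path(repo_root)
    source = Path(entry_file).read_text()
    chunks = [f"# file: {entry_file}\n{source}"]

    tree = ast.parse(source)
    for node in ast.walk(tree):
        names = []
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            names = [node.module]
        for name in names:
            candidate = root / (name.replace(".", "/") + ".py")
            if candidate.exists():  # only local modules, not third-party packages
                chunks.append(f"# file: {candidate}\n{candidate.read_text()}")

    return "\n\n".join(chunks)[:budget_chars]  # crude budget so the prompt stays bounded
```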
But I get it, you have to be quick and don't waste time on MVPs. If you get critical mass, you can add that later... hopefully
In order to win you often have to start before the problem is fully solved and then count on it being solved better as you build and scale.
Thus starting with engineering effort to fit some of the AI limitations make sense but realize many of those will be temporary and will be replaced in the future.
But there is always a new tech or framework or something that emerges after you start and adopting it will improve your product measurably.
I think the goal is just to get as much funding from VCs and government rackets as possible.
The previous administration had Harris as the "AI czar". Which means that nothing was expected to happen.
The following administration has Sacks as the "Crypto & AI czar". I'm not aware that Sacks has any particular competence in the AI area, but he has connections. So government money is likely to flow, presumably to "defense" startups.
This All-In podcast has paid off: From peddling gut bacteria supplements and wrong explanations on how moderator rods work in nuclear power plants to major influence.
The bitter lesson is a powerful heuristic, but its adherents treat it as dogma and wield it as a club to win arguments.
There is a circular argument going on in this article. Basically:
> "Flexible is better. Why? Because specific has been consistently worse"
I mean, I don't deny the historical trend. But really, why is it so? It makes sense to follow the trend, but knowing more about the reason why would be cool.
Also, I feel that the human cognitive aspects of "engineering some shit" are being ignored.
People are engineering solutions not only to be efficient, but to get to specific vantage points in which they can see further. They do it so they can see what the "next gen flexible stuff" looks like before others.
Finally, it assumes the option to scale computation is always available and ignores the diminishing returns of trying to scale vanguard technology.
The scale requirements for AI stuff are getting silly real fast due to this unshakable belief in infinite scaling. To me, they're already too silly. Maybe we need to cool down, engineer some stuff, and figure out where the comfortable threshold lies.
> But really, why is it so?
I don't think the bitter lesson is (or should be?) that you don't need to engineer biases in because with enough compute you could learn them instead ... The real lesson is that finding the right (minimal) set of biases is hard, and you should therefore always learn them instead if possible. Systems that learn are liable to outperform those that don't because by letting the data speak for itself, they are learning the right biases.
I think this is why historically the bitter lesson has been correct - because people have built too much into AI systems, got it wrong, and made them limited and brittle as a result.
The same thing is continuing to happen today with pre-trained LLMs (a fixed set of patterns!) and bespoke test-time reasoning approaches (of which at most one, likely none, are optimal). Of course it's understandable how we got here, even if a bit like the drunk looking for his lost car keys under the street lamp rather than in the dark spot he lost them. Continuous incremental learning (no training vs inference time distinction) is an unsolved problem, and the basis of intelligence/reasoning is not widely understood.
> finding the right (minimal) set of biases is hard
I'm not familiar with this concept of a minimal set of biases.
The way I see it, it's a series of loosely community-defined thresholds. If there's something like a theory that defines those biases in a formal way, I would very much like to read it.
The bitter lesson is a wonderful essay and well worth a read: http://www.incompleteideas.net/IncIdeas/BitterLesson.html
I also believe the original bitter lesson essay is correct. General methods on more powerful compute will supersede anything specific built today.
However I believe the lesson in this blog post is an incorrect conclusion. It is roughly analogous to "all people die, so do not bother having children because they will die one day." It is true that what you build today will be replaced in the future. That is OK. Don't get caught out when it happens. But building specific systems today is the only way to make progress with products.
Waymo started some 12 years ago as Google Chauffeur, built on all sorts of image/object recognition systems that I am sure have fallen to the bitter lesson. Should Google have refused to start a self-driving car project 12 years ago, because generalized transformers would replace a lot of their work in the future? No. You have to build products with what you've got, where you are. Adopt better tools when they come along.
> “The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin”.
> He points out that throughout AI’s history, researchers have repeatedly tried to improve systems by building in human domain knowledge. The “bitter” part of the title comes from what happens next: systems that simply use more computing power end up outperforming these carefully crafted solutions. We’ve seen this pattern in speech recognition, computer chess, and computer vision. If Sutton wrote his essay today, he’d likely add generative AI to that list. And he warns us: this pattern isn’t finished playing out.
According to Chomsky (as recalled by him in relatively recent years), this is why he didn’t want to work on the AI/linguistics intersection when someone asked him in the mid-50s: because he thought that successful approaches would just use machine learning and have nothing to do with linguistics (which he cares about).
Seems intellectually dishonest (given how his theory of universal grammar is all about indispensible biological factors in human languages).
They will learn their lesson how exactly? By getting money for free from VCs?
Remember, most of these startups are grifters. Only few of them really believe in their products.
Cool article, a lot more technical and informational than I thought from the headline.
The article gave the TL;DR as below, for those who skip to the comments:
Historically, general approaches always win in AI.
Founders in AI application space now repeat the mistakes AI researchers made in the past.
Better AI models will enable general purpose AI applications. At the same time, the added value of the software around the AI model will diminish.
AI founders will also make sure that users - the ones that are not capable of telling if you're lying to them by taking their money and energy - will learn the bitter lesson of taking it right up the arse. Again. Just like a repetition of history. Humanity is like ever before caught in the action of serving the individuals that made us fuck ourselves over. And the only thing you do is thanking them for it by applying to their tactics. Middle class humanity and above is morally bankrupt, and AI is the next thing they will fight over and with, but of course without adding anything positive to the world.
I wish I knew what "AI" is.
Please correct me if I'm wrong, but to my understanding, every single one of these "AI" products is based on a model, i.e. it's just a normal program with a very big configuration file. The only thing "AI" about it is the machine learning performed when training the model. At the generation stage, there is nothing "AI" about it anymore. The AI is done already. What runs is a normal program.
I know it's used in computer vision, recommendation algorithms, and generative software like ChatGPT, Stable Diffusion, etc. I don't know if there is anything besides these.
From what I know, the biggest problems with AI are: 1) the program pretends to be intelligent when it isn't, e.g. by using natural language. And 2) the program doesn't give the user enough control; they only get one text-box prompt and that's it, no forms, no dialogs, so the product has to somehow generate the right answer from the most uncontrolled piece of data imaginable.
These two things combined not only limit the product's potential by giving it unreasonable expectations, but also make it feel a bit of a scam: the product is a program, but what sells is its presentation. It has to present itself as being MORE than just a program by hiding as much of its internal workings from the consumers as possible and covering its warts with additional layers of logic.
I don't know if the author would consider natural language to be a "hardcoding" created to temporarily solve a problem that should be solved by using more compute (AGI), but to me it feels like it is.
The best application of AI is probably going to be using AI internally to solve existing problems in a given domain your business is familiar with rather than try to come up with some AI solution that you have to sell other people.
I was going to keep this to myself to maintain a competitive advantage, but I will just drop two hints:
1) Have AI turn natural language interactions into programs
2) Use test-driven development on the domain data
There is a third thing that’s far more crucial but if you want to find out what it is, contact me. I’m building a general-purpose solution for a few domains.
Unlike the other two platforms I built (Web 2.0 [1] and Web 3.0 [2]) I am not planning to open-source this project. So I don’t want to discuss more. I will however say that open source is a huge game-changer — because organizations / customers want to be able to be in control and you don’t want to be reliant on some third-party platform for an API provider. We all know how well that one turned out for startups building on Web 2.0 platforms.
Isn't the premise of TDD to do that process in the opposite order?
Anyway, I don't follow what your premise is about the relationship between "open source being a game changer," "customers want control," and "startups are not successful on Web 2.0 platforms". Maybe you're trying to cram too much insight into one sentence and I'm not seeing what the thesis is.
Basically, if you rely on some external computers owned by others to do your work (API, external platform, etc) they can pull the rug from under you, or change the rules, pricing etc. They can misuse your data and more.
That’s why I built these open source solutions, so our customers can know that it will never change, THEY choose the hosting, THEY own the data, they can even hire a different certified developer or agency to add features or maintain it. They have all the choice, and they have a direct relationship with all their members.
This assumes that the AI bros care about building useful products. They don't, for the most part; they care only about building a startup that can plausibly get VC funding. So what if in five years your LLM-on-the-blockchain app fails? You got to blow through a few million dollars over a few years going around to conferences and living the high life.
That's a success in anyone's book.
thanks for the tl;dr
Dunno why anyone in the startup space would be excited about new more powerful AI models unless they are just throwing a mask over their existential fears