The gap between open weights LLMs and closed source LLMs

blog.doubleword.ai

180 points by kkm 10 hours ago


profsummergig - 9 hours ago

IMHO, the biggest problem with the future of open weights models is that they are currently the result of philanthropy by private orgs (e.g. DeepSeek).

The spigot can be turned off at any time.

Until there's some sort of "community owned hardware", open weights models are always at risk of being discontinued.

christina97 - 8 hours ago

The Chinese models will not overtake the frontier US ones given the current way things are going. The US models derive their lead from incredible efforts to source more and higher quality (mostly synthetic) data via great feats (e.g. generating with humongous teacher models that could never feasibly serve interactive traffic). The Chinese models advance via heroic efforts to optimize models and great feats to secure more and higher quality training data from the US frontier models.

For a (Chinese) open weight model to surpass the (US lab) frontier models, this equation must flip: the Chinese labs must entirely retool from harvesting frontier model data to building the systems and making the efforts needed to produce novel data, as well as procuring latest-generation hardware en masse for this. This does not happen easily. Also, training a frontier scale model is actually not such an unimaginable feat: doing all the inference with the teacher models is where the hardware goes.

cedws - 7 hours ago

I haven’t seen it discussed anywhere, but closed models can essentially cheat benchmarks, right? What Anthropic or OpenAI brand as a model doesn’t necessarily have to be just weights; it can be a whole backend system that augments the model itself. With this they can score better on benchmarks than an open source model that is weights alone.

gehsty - 9 hours ago

Interesting to consider this in line with the recent US export bans. Could the US be squandering its lead by giving the open source, largely Chinese labs a chance to catch up (in terms of model quality available to the masses)? Will US labs be able to maintain the lead without users being able to use their latest models?

jacobgold - 9 hours ago

It would be interesting to know how much of a boost the closed model companies are giving the open models.

If the closed models stop improving, will the progress of open models slow?

linzhangrun - 5 hours ago

The USA, a country known as the land of freedom, is now restricting frontier models to the point where non-Americans cannot even use them.

China, an "authoritarian state", "the antonym of freedom", with a software industry that is especially capitalist, has produced all the competitive open-weight models.

It really is IRONIC.

Disclosure: I am Chinese, and I understand this strategy comes from being behind, using open source as an asymmetric way to compete and make up for missing compute by sharing the burden, etc. But still, very ironic.

tzs - 7 hours ago

I wonder if a lot of the companies and governments that seem to think it is essential to be at the forefront of applying leading-edge LLMs, to the point of starting to become dependent on them, are going to find themselves in a situation like the one in the Arthur C. Clarke short story "Superiority" [1] [2].

[1] The story: https://nob.cs.ucdavis.edu/classes/ecs153-2019-04/readings/s...

[2] Wikipedia: https://en.wikipedia.org/wiki/Superiority_(short_story)

samat - 9 hours ago

The article confuses open source models with open weights models.

Not the same thing.

The terms are used correctly in the article’s body, but the title is misleading.

dabinat - 8 hours ago

I believe the open model party will eventually end. Perhaps because companies realize it’s too much of a commercial advantage, countries don’t want to give other countries commercial or military help, or maybe even an outright ban after someone uses an open model to guide them through how to make a bomb.

doctoboggan - 8 hours ago

If the Chinese government is as involved in LLM development strategy as many people claim, wouldn't you expect them to immediately cease releasing open weight models and restrict access as soon as they start producing the frontier models? I am assuming this is what the USG thinks and is why they are trying to cut off the flow to foreign nationals ASAP.

LLMs are an undeniably valuable tool, and governments like to control those.

_pdp_ - 8 hours ago

Frankly, it does not matter if there is a gap, because for most practical use-cases the end user can barely perceive the difference in intelligence.

On paper frontier models will be ahead of the curve, but I don't think anyone will be able to tell whether a piece of work, say a landing page, was created with Fable or GLM, and that is the point. The perceptible intelligence will reach a point beyond which it is no longer considered, except for some narrow use-cases.

JumpCrisscross - 9 hours ago

Now let’s look at the economics of buying versus renting. I’ve seen a lot of attention given to hardware capital costs. But a comment the other day got me thinking about power costs, too—at what performance differential do these factors intersect to make on-prem economically competitive with datacenters for businesses?

jackconsidine - 9 hours ago

Achilles and the tortoise [0] is usually a fallacy. If the tortoise has a head start, then Achilles will never catch it, because in the time it takes Achilles to reach the tortoise's location the tortoise has moved some degree further, ad infinitum. Obviously not real, because Achilles will pass the tortoise -- I think it's a fallacy because the framing creates a fake asymptote (they will both pass the point where they're approaching a tie).

In this case it may actually apply though, no? Open models get better from closed model distillation?

[0] https://en.wikipedia.org/wiki/Zeno%27s_paradoxes

ChrisArchitect - 6 hours ago

Related:

The unbearable cheapness of open weight models

https://news.ycombinator.com/item?id=48668255

zb3 - 4 hours ago

I just hope the CCP doesn't follow the US government and pull the plug before their companies release something on par with the US frontier models. The question is whether US models not available to the general public will count.

The question is not whether they'll prohibit open-weight models better than the US ones, because we all know the obvious answer.

maxiniol - 7 hours ago

Am I the only one flagging inconsistencies in the different evaluations on the 18 benchmarks? Why is the closed frontier model sometimes Grok, and then Opus 4.8? And why is it compared to GLM 5.2 once, and sometimes to Kimi 2.6?

justindotdev - 9 hours ago

At first glance, these graphs are confusing.

casey2 - 5 hours ago

This is just an example of "lying with statistics". Going by compute efficiency, the gap has already closed (both in training and inference, coincidentally).

llmslave - 9 hours ago

The gap is huge and I'm tired of reading these articles constantly.