DeepSeek-v3.2: Pushing the frontier of open large language models [pdf]
huggingface.co
943 points by pretext a day ago
https://huggingface.co/deepseek-ai/DeepSeek-V3.2
https://api-docs.deepseek.com/news/news251201
Well, props to them for continuing to improve, winning on cost-effectiveness, and continuing to publicly share their improvements. Hard not to root for them as a force to prevent an AI corporate monopoly/duopoly.

How could we judge whether anyone is "winning" on cost-effectiveness when we don't know what everyone's profits/losses are?

If you're trying to build AI-based applications you can and should compare the costs between vendor-based solutions and hosting open models on your own hardware. On the hardware side you can run some benchmarks (or use other people's benchmarks) and get an idea of the tokens/second you can get from the machine. Normalize this for your usage pattern (and do your best to implement batch processing where you are able to, which will save you money with both methods) and you have a basic idea of how much it would cost per token. Then you compare that to the cost of something like GPT-5, which is a bit simpler because the cost per (million) tokens is something you can grab off a website; a rough sketch of this arithmetic follows a few comments below. You'd be surprised how much money running something like DeepSeek (or, if you prefer a more established company, Qwen3) will save you over the cloud systems. That's just one factor though. Another is what hardware you can actually run things on. DeepSeek and Qwen will function on cheap GPUs that other models will simply choke on.

> with your own hardware

Or with somebody else's. If you don't have strict data residency requirements, and if you aren't doing this at an extremely large scale, doing it on somebody else's hardware makes much more economic sense. If you use MoE models (all modern >70B models are MoE), GPU utilization increases with batch size. If you don't have enough requests to keep GPUs properly fed 24/7, those GPUs will end up underutilized. Sometimes underutilization is okay, if your system needs to be airgapped for example, but that's not an economics discussion any more. Unlike e.g. video streaming workloads, LLMs can be hosted on the other side of the world from where the user is, and the difference is barely going to be noticeable. This means you can keep GPUs fed by bringing in workloads from other timezones when your cluster would otherwise be idle. Unless you're a large, worldwide organization, that is difficult to do with your own hardware.

> If you use MoE models (all modern >70B models are MoE), GPU utilization increases with batch size

Isn't that true for any LLM, MoE or not? In fact, doesn't that apply to most of ML: as long as it's possible to do batching at all, you can scale it up and utilize more of the GPU, until you saturate some part of the process.

Mixture-of-Experts models benefit from economies of scale because they can process queries in parallel and expect different queries to hit different experts at a given layer. This leads to higher utilization of GPU resources. So unless your application is already getting a lot of use, you're probably under-utilizing your hardware.

> That's just one factor though. Another is what hardware you can actually run things on. DeepSeek and Qwen will function on cheap GPUs that other models will simply choke on.

What's cheap nowadays? I'm out of the loop. Does anything run on the integrated AMD Ryzen AI hardware that comes in Framework motherboards? Is under $1k American cheap?

Not really in the loop either, but when DeepSeek R1 was released, I stumbled upon this YouTube channel [1] that made local AI PC builds in the $1,000-2,000 range.
But he doesn't always use GPUs; maybe the cheaper builds were CPU plus a lot of RAM, I don't remember.

[1] https://youtube.com/@digitalspaceport?si=NrZL7MNu80vvAshx

Digital Spaceport is a really good channel, I second that - the author spares no detail.
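To make the self-hosting vs. API cost comparison a few comments up concrete, here is a minimal back-of-envelope sketch in Python. Every number in it (hardware price, power draw, throughput, utilization, API price) is an illustrative placeholder to be replaced with your own benchmarks, not a measured figure.

    # Rough cost-per-million-tokens comparison: self-hosted GPU box vs. a metered API.
    # All numbers are illustrative placeholders -- substitute your own measurements.

    hardware_cost_usd   = 8000.0   # one-off price of the GPU box (placeholder)
    amortization_years  = 3.0      # how long you expect to run it
    power_draw_watts    = 800.0    # wall power under load
    electricity_usd_kwh = 0.15     # local electricity rate
    throughput_tok_s    = 1500.0   # tokens/second you measured with batching enabled
    utilization         = 0.40     # fraction of the day the box does useful work

    tokens_per_year = throughput_tok_s * 365 * 24 * 3600 * utilization
    energy_kwh_year = power_draw_watts / 1000.0 * 24 * 365
    yearly_cost_usd = hardware_cost_usd / amortization_years + energy_kwh_year * electricity_usd_kwh

    self_hosted_usd_per_mtok = yearly_cost_usd / (tokens_per_year / 1e6)
    api_usd_per_mtok = 1.25        # posted price of the hosted model (placeholder)

    print(f"self-hosted: ${self_hosted_usd_per_mtok:.2f} per million tokens")
    print(f"API:         ${api_usd_per_mtok:.2f} per million tokens")

The utilization term is where the batching discussion above bites: a box that is busy only a few hours a day has its effective cost per token multiplied accordingly, which is the argument for renting somebody else's well-fed GPUs unless you can keep your own saturated.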
The cheaper options always use CPU only, or sharding between different cheap GPUs (without SLI/switching) - which is not good for all use cases (he also highlights this).
But some of his prices are one-off bargains for used stuff. And RAM prices doubled this year, so you won't buy 2x256 GB DDR4 for $336, no matter what: https://digitalspaceport.com/500-deepseek-r1-671b-local-ai-s...

Well, the seemingly cheap option comes with significantly degraded performance, particularly for agentic use. Have you tried replacing Claude Code with some locally deployed model, say, on a 4090 or 5090? I have. It is not usable.

DeepSeek and Kimi both have great agentic performance. When used with crush/opencode they are close to Claude performance. Nothing that runs on a 4090 would compete, but DeepSeek on OpenRouter is still 25x cheaper than Claude.

> DeepSeek on OpenRouter is still 25x cheaper than Claude

Is it? Or only when you don't factor in Claude cached context? I've consistently found it pointless to use open models because the price of the good ones is so close to cached context on Claude that I don't need them.

DeepSeek via their API also has cached context, although the tokens/s was much lower than Claude when I tried it. But for background agents the price difference makes it absolutely worth it.

Yes, if you try using Kilo Code/Cline via OpenRouter the cost will be much cheaper using DeepSeek/Kimi vs Claude Sonnet 4.5.

Strictly speaking, you have not deployed any model on a 5090 because a 5090 card has never been produced. And without specifying your quantization level it's hard to know what you mean by "not usable". Anyway, if you really wanted to try cheap distilled/quantized models locally you would be using used V100 Teslas and not 4-year-old single-chip gaming GPUs.

Are you a time traveller from the past? https://www.nvidia.com/en-gb/geforce/graphics-cards/50-serie... You can just buy a 5090 now for $3k. Have you confused it with something else?

Well, those are also extremely limited-VRAM cards that wouldn't be able to run anything in the ~70B parameter space. (Can you run 30B even?) Things get a lot easier at lower quantisation and higher parameter counts, and there are a lot of people whose AI jobs are "extract sentiment from text" or "bin into one of these 5 categories", where that's probably fine.

they took the already ridiculous v3.1 terminus model, added this new deepseek sparse attention thing, and suddenly it's doing 128k context at basically half the inference cost of the old version with no measurable drop in reasoning or multilingual quality. like, imo gold medal level math and code, 100+ languages, all while sipping tokens at 14 cents per million input. that's stupid cheap.
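As a rough illustration of why a sparse-attention scheme helps so much at 128K context, the sketch below compares the per-token attention work of dense attention against a generic top-k selection scheme. It is not DeepSeek's actual DSA kernel, and the selection budget k used here is an arbitrary illustrative value.

    # Toy per-token attention cost: dense attention scores every cached token,
    # a generic top-k sparse scheme only scores a fixed budget of k tokens.
    # Illustrative only; this is not DeepSeek's actual sparse-attention design,
    # and k below is an arbitrary placeholder.

    context_len = 128_000   # tokens already sitting in the KV cache
    k_budget    = 2_048     # tokens a hypothetical sparse scheme attends to

    dense_work  = context_len   # key/value pairs scored per newly generated token
    sparse_work = k_budget

    print(f"dense : {dense_work:,} scores per new token")
    print(f"sparse: {sparse_work:,} scores per new token")
    print(f"ratio : {dense_work / sparse_work:.0f}x less attention work")

Attention is only one slice of the per-token cost (the MLP/expert layers don't shrink, and a selection mechanism adds its own overhead), which is why the end-to-end saving reported above is roughly 2x on long prompts rather than the raw attention ratio.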
the rl recipe they used this time also seems way more stable. no more endless repetition loops or random language switching you sometimes got with the earlier open models. it just works.
what really got me is how fast the community moved. vllm support landed the same day, huggingface space was up in hours, and people are already fine-tuning it for agent stuff and long document reasoning.
i’ve been playing with it locally and the speed jump on long prompts is night and day. feels like the gap to the closed frontier models just shrank again.
anyone else tried it yet?

Furthermore, paid models are heavily subsidized by bullish investors playing for monopoly. So that tips the scales further towards DeepSeek.

I believe this was a statement on cost per token to us as consumers of the service.

Training cost-effectiveness doesn't matter for open models since someone else ate the cost. In this case, Chinese taxpayers.

DeepSeek is a private corporation funded by a hedge fund (High-Flyer). I doubt much public money was spent by the Chinese state on this. Like with LLMs in the US, the people paying for it so far are mainly investors who are betting on a return in the medium to long term.

Do you actually believe what you just wrote, or are you trolling? One version at least has a foot planted in reality. The other one, well...

Apart from measuring prices from venture-backed providers, which might or might not correlate with cost-effectiveness, I think the measures of intelligence per watt and intelligence per joule from https://arxiv.org/abs/2511.07885 are very interesting (a rough sketch of the energy side of those metrics appears at the end of the thread).

You can use tokens/sec on something like AWS Bedrock (which hosts both open and closed models) as a proxy for "cost per token" for the closed providers.

Good point. Could usage patterns + inference costs give us proxy metrics? What would be a fair baseline?

Well, consumers care about the cost to them, and those we know. And DeepSeek is destroying everything in that department.

Yes. Though we don't know for sure whether that's because they actually have lower costs, or whether it's just the Chinese taxpayer being forced to serve us a treat.

Third-party providers are still cheap though. The closed models are the ones where you can't see the real cost of running them.

Oh, I was mostly talking about the Chinese taxpayer footing the training bill. You are right that we can directly observe the cost of inference for open models.

Not sure the Chinese taxpayer is footing the bill though - of course, it might not be net zero, there might be secondary effects, etc. A few days ago I read an article saying the Chinese utilities have a pricing structure that favors high-tech industries (say, an AI data center), making up the difference by charging more to energy-intensive but less sophisticated industries (an aluminium smelter, for example). Admittedly, there are some advantages when you do central and long-term economic planning.

As much as I agree with your sentiment, I doubt the intention is singular. It's like AMD open-sourcing FSR or Meta open-sourcing Llama. It's good for us, but it's nothing more than a situational and temporary alignment of self-interest with the public good. When the tables turn (they become the best instead of fourth best, or AMD develops the best upscaler, etc.), the decision that aligns with self-interest will change, and people will start complaining that they've lost their moral compass.

> situational and temporary alignment of self-interest with the public good

That's how it's supposed to work.

It's not. This isn't about competition in a company sense but sanctions and wider macro issues.

It's like it in the sense that it's done because it aligns with self-interest, even if the nature of that self-interest differs.

I don't care if this kills Google and OpenAI. I hope it does, though I'm doubtful, because distribution is important. You can't beat "ChatGPT" as a brand in laypeople's minds (unless perhaps you give them a massive "Temu: Shop Like A Billionaire" commercial campaign).
Closed source AI is almost by design morphing into an industrial, infrastructure-heavy rocket science that commoners can't keep up with. The companies pushing it are building an industry we can't participate or share in. They're cordoning off areas of tech and staking ground for themselves. It's placing a steep fence around tech. I hope every such closed source AI effort is met with an equivalent open source effort, and that the investments made into closed AI go to zero.

The most likely outcome is that Google, OpenAI, and Anthropic win and every other "lab"-shaped company dies an expensive death. RunwayML spent hundreds of millions and they're barely noticeable now. These open source models hasten the deaths of the second-tier also-ran companies. As much as I hope for dents in the big three, I'm doubtful.

I can't think of a single company I've worked with as a consultant that I could convince to use DeepSeek because of its ties with China, even if I explained that it was hosted on AWS and none of the information would go to China. Even when the technical people understood that, it would be too much of a political quagmire within their company when it became known to the higher-ups. It just isn't worth the political capital. They would feel the same way about using xAI or maybe even Facebook models.

AirBnB is all in on DeepSeek and Qwen. https://sg.finance.yahoo.com/news/airbnb-picks-alibabas-qwen...

TIL that Chinese models are considered better at multiple languages than non-Chinese models.

It's a customer service bot? And Airbnb is a vacation home booking site. It's pretty inconsequential.

Airbnb has ~$12 bn annual revenue, and is a counterexample to the idea that no companies can be "convinced to use DeepSeek". The fact that it's customer service means it's dealing with text entered by customers, which has privacy and other consequences. So no, it's not "pretty inconsequential". Many more companies fit a profile like that than whatever arbitrary criteria you might have in mind for "consequential".

This is the real cause. At the enterprise level, trust outweighs cost. My company hires agencies and consultants who provide the same advice as our internal team; this is not to imply that our internal team is incorrect; rather, there is credibility in the fact that if something goes wrong, the consequences of the decision can be shifted, and there is a reason why companies continue to hire the same four consulting firms. It's trust, whether it's real or perceived.

I have seen it much more nuanced than that.

2020 - I was a mid-level (L5) cloud consultant at AWS with only two years of total AWS experience, and that was only at a small startup before then. Yet every customer took my (what in hindsight might not have been the best) advice all of the time without questioning it, as long as it met their business goals. Just because I had @amazon.com as my email address.

Late 2023 - I was the subject matter expert in a niche of a niche of AWS that the customer focused on, and it was still almost impossible to get someone to listen to a consultant from a shitty third-rate consulting company.

2025 - I left the shitty consulting company last year after only a year and now work for one with a much better reputation, and I have a better title, "staff consultant". I also play the game and make sure to mention that I'm former "AWS ProServe" when I'm doing introductions. Now people listen to me again.

Children do the same thing intuitively: parents continually complain that their children don't listen to them.
But as soon as someone else tells them to "cover their nose", "chew with their mouth closed", "don't run with scissors", whatever, they listen and integrate that guidance into their behavior. What's harder to observe is all the external guidance they get that they don't integrate until their parents tell them. It's internal vs. external validation.

Or in many cases they go over to their grandparents' house, the grandparents let them run wild, and all of a sudden your parents have "McDonald's money" for their grandkids when they never had it for you.

So much worse for American companies. This only means that they will be uncompetitive with similar companies that use models with realistic costs.

I can't think of a single major US company that is big internationally that is competing on price.

Any car company. Uber. All tech companies offering free services.

Is a "cheaper" service going to come along and upend Google or Facebook? I'm not saying this to insult the technical capabilities of Uber, but it doesn't have the economics that most tech companies have - high fixed costs and very low marginal costs. Uber has high marginal costs; saving a little on inference isn't going to make a difference.
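Circling back to the intelligence-per-watt and intelligence-per-joule metrics mentioned earlier in the thread: the energy side of the bookkeeping is simple once you have a measured throughput and power draw. The sketch below uses placeholder numbers and a crude accuracy weighting; it is not the exact definition used in the cited paper.

    # Energy-normalized throughput with a crude quality weighting.
    # Placeholder numbers; the benchmark weighting in the cited paper is more
    # involved -- this only shows the watts-to-joules bookkeeping.

    throughput_tok_s = 1500.0   # measured tokens/second for the deployment
    power_watts      = 800.0    # measured wall power while serving
    accuracy         = 0.62     # score on your benchmark of choice, in [0, 1]

    tokens_per_joule = throughput_tok_s / power_watts   # 1 W = 1 J/s, so (tok/s) / W = tok/J
    weighted_tokens_per_joule = accuracy * tokens_per_joule

    print(f"{tokens_per_joule:.2f} tokens/joule")
    print(f"{weighted_tokens_per_joule:.2f} accuracy-weighted tokens/joule")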