OpenAI unveils its first custom chip, built by Broadcom

800 points by jamdesk a day ago

Announcement: https://openai.com/index/openai-broadcom-jalapeno-inference-...

https://decrypt.co/371971/openai-broadcom-jalapeno-first-cus...

https://www.cnn.com/2026/06/24/tech/openai-broadcom-jalapeno...

> Developed from design to production in nine months, accelerated by OpenAI’s models

> the use of OpenAI models to accelerate parts of the design and optimization process.

I wish there were more detail about this. As it stands, I kind of have to assume that this is just meaningless marketing, like saying development was accelerated by Microsoft Office or their 5K LG UltraFine 40-inch monitors.

Like, if this was as big a deal as it kind of vaguely implies, they would be making a bigger deal of it, right?

zgao - a day ago

Chip CEO here. It really depends on what "design" or "production" means. Does "design" mean that the design was complete? Does "production" mean the beginning of production, i.e. tapeout? If measuring from RTL-freeze to tapeout, this is a fairly typical (even somewhat unimpressive) timeline (accounting for some unexpected issues) for a large, complex 3nm chip. If measuring from concept (no RTL at all, block diagram of architecture) to tapeout, this is an amazing timeline. The truth is probably somewhere in between. A more concrete statement would use actual technical milestones and gates.
- otterdude - a day ago
  
  Not a chip CEO, but I read this article and thought that they're working on some kind of application specific chip only for serving models. Similar to how an FPGA can optimize certain tasks.
  Given constant weights / biases of a Transformer / DNN, you could use pipelining to feed forward calculations through the array one layer at a time. For DNNs with thousands of layers you might see a 1:1 speedup per layer channel.
  I doubt they would undergo this process for marginal gains.
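One possible reading of the pipelining idea, sketched under loudly hypothetical assumptions: one hardware stage per layer, each stage holding its layer's weights fixed, one cycle per stage, and a ReLU between layers. The helper names (`run_pipeline`, `pipeline_cycles`, `sequential_cycles`) are made up for illustration and say nothing about how OpenAI's chip actually works. Once the pipeline fills, one result comes out per cycle, so for a long stream of inputs the speedup over a single shared unit approaches the number of layers, while each individual input still takes one cycle per layer.

```python
import numpy as np


def sequential_cycles(num_inputs: int, num_layers: int) -> int:
    # One shared compute unit: each input occupies it for one cycle per layer.
    return num_inputs * num_layers


def pipeline_cycles(num_inputs: int, num_layers: int) -> int:
    # One stage per layer: the first result needs num_layers cycles,
    # then one more result completes on every following cycle.
    return num_layers + num_inputs - 1 if num_inputs else 0


def run_pipeline(inputs, weights):
    """Simulate a layer-per-stage pipeline whose weights never change.

    Returns the outputs (in input order) and the number of cycles used.
    """
    num_layers = len(weights)
    held = [None] * num_layers  # value each stage produced on the previous cycle
    pending = list(inputs)
    outputs, cycles = [], 0
    while len(outputs) < len(inputs):
        cycles += 1
        new_held = [None] * num_layers
        for i in range(num_layers):
            # Stage 0 reads the next input; stage i reads what stage i-1 held.
            if i == 0:
                src = pending.pop(0) if pending else None
            else:
                src = held[i - 1]
            if src is not None:
                # Every busy stage works on a different input during the same cycle.
                new_held[i] = np.maximum(weights[i] @ src, 0.0)
        held = new_held
        if held[-1] is not None:
            outputs.append(held[-1])
    return outputs, cycles
```

Under this model, 4 layers and 100 inputs take 400 cycles on one shared unit but 103 (4 + 100 - 1) pipelined. What the sketch ignores is that every layer needs its own silicon and its own copy of the weights.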
  - kmacdough - 13 hours ago
    
    With a striking lack of numbers, I'm not confident. In my experience, everything underspecified in a marketing release is unflattering. They're also not a chip design company, but they're probably trying to keep up in the eyes of investors. As the article mentions, several of their competitors are chip designers and already have working production inference chips.
    
    SwellJoe - 13 hours ago
    
    When you have a few billion dollars you can hire chip people and partner with a chip company.
    That's not to say I expect they'll ship something competitive with Google's custom AI hardware on the first go, since Google has been at it for quite a while, but there are very few technical problems that large sums of money won't solve.
    
    IX-103 - 10 hours ago
    
    Yeah, I'm not sure how competitive it is without any specs. Just from it being "inference only" that puts it on the same level as Google's 2015 TPUv1.
  - zgao - a day ago
    
    Yes, my statement was not about the quality or performance of the chip -- simply the tapeout timeline that was stated, by itself.
  - xdavidliu - a day ago
    
    i don't understand what the second paragraph is saying.
    
    nine_k - a day ago
    
    In very crude terms, AFAICT, if you have a bunch of matrix multiplications but one of the matrices (the one with the model weights) doesn't change, you can seriously speed up the computation. For one thing, you don't need to re-fetch the elements of the constant matrix; you can keep it near the ALUs. Then you can maybe detect and ignore sparse / empty blocks by marking them once.
    IDK how the custom hardware exploits this; would love to hear any ideas!
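A rough software analogue of those two ideas, assuming the weight matrix is fixed and split into square tiles. The `BlockSparseWeights` class and the tile size are hypothetical; real hardware would keep the tiles in on-chip memory next to the ALUs, which Python can only stand in for. The constant weights are scanned once, only tiles containing a nonzero are kept, and every later matrix-vector product skips the all-zero tiles.

```python
import numpy as np


class BlockSparseWeights:
    """Fixed weight matrix with a one-time map of its nonzero tiles."""

    def __init__(self, weights: np.ndarray, tile: int):
        rows, cols = weights.shape
        if rows % tile or cols % tile:
            raise ValueError("matrix dimensions must be multiples of the tile size")
        self.tile = tile
        self.shape = weights.shape
        # Scan the constant weights once; store only tiles that contain a nonzero.
        self.tiles = {}
        for r in range(0, rows, tile):
            for c in range(0, cols, tile):
                block = weights[r:r + tile, c:c + tile]
                if np.any(block):
                    self.tiles[(r, c)] = block.copy()

    def matvec(self, x: np.ndarray) -> np.ndarray:
        # Reuse the stored tiles on every call; all-zero tiles are never touched.
        y = np.zeros(self.shape[0])
        t = self.tile
        for (r, c), block in self.tiles.items():
            y[r:r + t] += block @ x[c:c + t]
        return y

    def skipped_fraction(self) -> float:
        # Share of tiles that are skipped on every product.
        total = (self.shape[0] // self.tile) * (self.shape[1] // self.tile)
        return 1.0 - len(self.tiles) / total
```

Because the mask is built once at load time, its cost is amortized over every inference that reuses the same weights, which only works because the weights never change.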
    
    guyomes - a day ago
    
    > IDK how the custom hardware exploits this; would love to hear any ideas!
    You might like this article [1], titled "FPGA-based CNN Acceleration using Pattern-Aware Pruning". More context and details can be found in the PhD thesis of Léo Pradels [2].
    [1]: https://inria.hal.science/hal-04689673/document
    [2]: https://theses.hal.science/tel-05021575v1/file/PRADELS_Leo.p...
    
    cm2187 - a day ago
    
    Random thought. Once models stabilise, could you possibly hardcode the model in gates? Or are they too large for a single chip?
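To make "hardcode the model in gates" concrete for a single weight, here is a sketch assuming integer-quantized weights (an assumption; nothing in the thread specifies a number format). Multiplying by a known constant needs no general multiplier, only fixed shifts plus a few adders or subtractors. The sketch uses canonical signed-digit (CSD) recoding, a standard multiplierless-design technique, to find those shifts; `csd_digits` and `shift_add_multiply` are hypothetical helper names, not taken from any linked project.

```python
def csd_digits(k: int) -> list[tuple[int, int]]:
    """Canonical signed-digit recoding of an integer constant.

    Returns (sign, shift) pairs with k == sum(sign * 2**shift) and no two
    nonzero digits in adjacent positions, the fewest nonzero digits of any
    signed-binary form (so the fewest adders/subtractors).
    """
    terms = []
    n, shift = k, 0
    while n != 0:
        if n & 1:
            # Pick +1 when n % 4 == 1 and -1 when n % 4 == 3, so the remaining
            # value is a multiple of 4 and the next digit is forced to zero.
            digit = 2 - (n & 3)
            terms.append((digit, shift))
            n -= digit
        n >>= 1
        shift += 1
    return terms


def shift_add_multiply(x: int, terms: list[tuple[int, int]]) -> int:
    # What a hardwired constant multiplier computes: fixed shifts (just wiring)
    # combined by adders and subtractors, with no general-purpose multiplier.
    acc = 0
    for sign, shift in terms:
        term = x << shift
        acc = acc + term if sign > 0 else acc - term
    return acc
```

For example, `csd_digits(7)` gives `[(-1, 0), (1, 3)]`, i.e. 7x = (x << 3) - x, one subtractor instead of a multiplier. This only shows what one hardwired weight reduces to; whether a whole model fits on a die is the open question.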
    
    8note - a day ago
    
    https://www.anuragk.com/blog/posts/Taalas.html
    
    lsaferite - 18 hours ago
    
    https://taalas.com/
    
    jwHollister - 17 hours ago
    
    Wow, if they can get something like this working, what happens to all this infrastructure? Hyperscalers must be assuming the wrong lifespan for that stuff, considering the next gen will be 1000x more efficient.
    
    otterley - 16 hours ago
    
    The question isn’t whether it works (it does); the question is whether there are buyers for hardware that is obsolete the day it ships. Models evolve much more quickly than hardware can keep up.