GPT-OSS 120B Runs at 3000 tokens/sec on Cerebras

cerebras.ai

46 points by samspenc 3 days ago


KronisLV - 3 days ago

The Cerebras GLM-4.6 post might also be of (some?/more?) interest to the people here, since it's more useful for programming: https://news.ycombinator.com/item?id=45852751

I don't think that this is a dupe or anything and 3000 t/s is really cool, the other post just has more discussion of Cerebras and people's experiences with using GLM 4.6 for software development.

petesergeant - 3 days ago

It’s an absolute beast. I run it via OpenRouter, where I have Groq and Cerebras as the providers. Cheap enough to be almost free, strong performance, and lightning fast.
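A minimal sketch of what that OpenRouter setup might look like as a request payload, pinning the provider order to Cerebras and Groq. The model slug and the "provider" routing field follow OpenRouter's documented request shape, but verify both against the current OpenRouter docs before relying on them; no network call is made here.

```python
import json

# Hypothetical request body for OpenRouter's chat-completions endpoint.
# In a real call you would POST this to
# https://openrouter.ai/api/v1/chat/completions with an Authorization header.
payload = {
    "model": "openai/gpt-oss-120b",
    "messages": [{"role": "user", "content": "Summarize this thread."}],
    "provider": {
        # Try Cerebras first, then Groq; don't fall back to other providers.
        "order": ["cerebras", "groq"],
        "allow_fallbacks": False,
    },
}

print(json.dumps(payload, indent=2))
```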

drewbitt - 2 days ago

It's a decent general model too - I've had it plugged into llm and Raycast since August at great speeds. I wish Cerebras would host MiniMax M2, which would be an upgrade and a replacement if it were just faster. It would never be as fast as gpt-oss-120b, though.

freak42 - 3 days ago

I absolutely hate it when a website says "try this" and then, after you've gone through the trouble of writing something, comes up with a sign-up link first. Makes me leave instantly, never to come back.

sunpazed - 3 days ago

This is really impressive. At these speeds, it’s possible to run agents with multi-tool turns within seconds. Consider it a feature rich, “non-deterministic API” for your platform or business.

iFire - 2 days ago

Does anyone know how much one system costs?