DeepMind and OpenAI win gold at ICPC

codeforces.com

156 points by notemap 7 hours ago


https://x.com/MostafaRohani/status/1968360976379703569

https://x.com/HengTze/status/1968359525339246825

amluto - 4 hours ago

I've contemplated this a bit, and I think I have a somewhat unconventional take:

First, this is really impressive.

Second, with that out of the way, these models are not playing the same game as the human contestants, in at least two major regards. First, and quite obviously, they have massive amounts of compute power, which is kind of like giving a human team a week instead of five hours. Second, the competing models have absolutely massive memorization capacity, whereas the teams are allowed to bring only a 25-page PDF, and they need to manually transcribe anything from that PDF that they actually want to use in a submission.

I think that, if you gave me the ability to search the pre-contest Internet and a week to prepare my submissions, I would be kind of embarrassed if I didn't get gold, and I'd find the contest to be rather less interesting than I would find the real thing.

modeless - 6 hours ago

More information on OpenAI's result (which seems better than DeepMind's) from the X thread:

> our OpenAI reasoning system got a perfect score of 12/12

> For 11 of the 12 problems, the system’s first answer was correct. For the hardest problem, it succeeded on the 9th submission. Notably, the best human team achieved 11/12.

> We had both GPT-5 and an experimental reasoning model generating solutions, and the experimental reasoning model selecting which solutions to submit. GPT-5 answered 11 correctly, and the last (and most difficult) problem was solved by the experimental reasoning model.

I'm assuming that "GPT-5" here is a version with the same model weights but higher compute limits than even GPT-5 Pro, with many instances working in parallel, and some specific scaffolding and prompts. Still, extremely impressive to outperform the best human team. The stat I'd really like to see is how much money it would cost to get this result using their API (with a realistic cost for the "experimental reasoning model").
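
Since none of the real numbers are public, any estimate has to be a back-of-envelope sketch. Here's the shape of one in Python, where every constant is a made-up assumption rather than anything OpenAI has disclosed:

    # Back-of-envelope only: every number below is a guess, since token
    # counts, sample counts, and the experimental model's pricing are
    # all unpublished.
    ASSUMED_USD_PER_M_TOKENS = 60.0       # hypothetical premium-tier output rate
    ASSUMED_TOKENS_PER_ATTEMPT = 200_000  # reasoning + code per attempt, guessed
    ASSUMED_ATTEMPTS_PER_PROBLEM = 50     # parallel samples, guessed
    PROBLEMS = 12

    total_tokens = ASSUMED_TOKENS_PER_ATTEMPT * ASSUMED_ATTEMPTS_PER_PROBLEM * PROBLEMS
    cost = total_tokens / 1_000_000 * ASSUMED_USD_PER_M_TOKENS
    print(f"{total_tokens:,} tokens -> ${cost:,.0f}")  # ~$7,200 under these guesses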

JohnKemeny - 6 hours ago

I went to ICPC's web pages, downloaded the first problem (problem A) and gave it to GPT-5, asking it for code to solve it (stating it was a problem from a recent competitive programming contest).

It thought for 7m 53s and replied with:

    # placeholder
    # (No solution provided)
birktj - 6 hours ago

They apparently managed gold in the IOI as well, a result that was extremely surprising to me and that makes me rethink a lot of assumptions I have about current LLMs. Unfortunately there was very little transparency about how they achieved those results, and the only source was a Twitter post. I want to know: was there any third-party oversight, what kind of compute did they use, how much power, what kind of models, and how were they set up? In this case I see that DeepMind at least has a blog post, but as far as I can tell it does not answer any of my questions.

I think this is huge news, and I cannot imagine anything other than models with this capability having a massive impact all over the world. It makes me more worried than excited; it is very hard to tell where this will lead, which is probably what makes it scary for me.

However, with so little transparency from these companies and extreme financial pressure to perform well in these contests, I have to be quite sceptical of how truthful these results are. If true, it is really remarkable, but I want more solid proof before I change my worldview.

smokel - 4 hours ago

The best thing about the ICPC is the first C, which stands for "collegiate". It means that you get to solve a set of problems as a team of three, but with only one computer.

This means that you have to be smart about who is going to spend time coding, thinking, or debugging. The time pressure is intense, and it really is a team sport.

It's also extra fun if one of the team members prefers a Dvorak keyboard layout and vi, and the others do not.

I wonder how three different AI vendors would cooperate. It would probably lift reinforcement learning to the next level.

patrickhogan1 - an hour ago

This is impressive.

Here is the published 2025 ICPC World Finals problemset. The "Time limit: X seconds" printed on each problem is the maximum runtime a program is allowed; if any judged run of the program takes longer than that, the submission fails, even if other runs finish in time.

https://worldfinals.icpc.global/problems/2025/finals/problem...
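
To make that rule concrete, here is a minimal sketch of a judge loop enforcing it; the two-second limit and the whole interface are placeholders of my own, not the actual ICPC judging system:

    import subprocess

    TIME_LIMIT = 2.0  # seconds; placeholder, the real limit is on the problem statement

    def judge(executable: str, test_inputs: list[str]) -> str:
        """Run the contestant's program once per test case; any single
        run over the limit fails the whole submission."""
        for path in test_inputs:
            with open(path) as f:
                try:
                    subprocess.run([executable], stdin=f, capture_output=True,
                                   timeout=TIME_LIMIT, check=True)
                except subprocess.TimeoutExpired:
                    return "Time Limit Exceeded"  # one slow run sinks it
                except subprocess.CalledProcessError:
                    return "Runtime Error"
        return "OK (outputs still need checking against expected answers)"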

NitpickLawyer - 7 hours ago

So this year SotA models have gotten gold at the IMO, the IOI, and the ICPC, and beat 9/10 humans in that AtCoder contest that tested optimisation problems. Yet the most reposted headlines and rhetoric are "wall this", "stagnation that", "model regression", "winter", "bubble", doom, etc.

HarHarVeryFunny - 4 hours ago

ICPC = The International Collegiate Programming Contest. These are college-level programmers, not elite competitive programmers.

Apparently Gemini solved one problem (running on who knows what kind of cluster) by burning 30 minutes of "thinking" time on it, at a cost that Google have declined to disclose.

According to one prior competition participant, writing in the comments section of this Ars Technica coverage, each year the organizers include one "time sink" problem that smart humans will avoid until they have tackled everything else.

https://arstechnica.com/google/2025/09/google-gemini-earns-g...

This would all seem to put a rather different spin on things. It's not a case of Google outwitting the world's best programmers, but rather that, by searching for solutions for 30 minutes on god knows what kind of cloud hardware, they were able to get something done that the college kids did not have time to complete, or did not deem worth starting.

Imnimo - 3 hours ago

My understanding is that the way they do this is to have some number of model instances generating solution proposals, and then another model that chooses which candidates to submit.

I haven't been able to find information on how many proposals were generated before a solution was chosen to submit. I'm curious to know whether this is "you can get ICPC gold medal performance with a handful of GPT-5 instances" or "you will drown yourself in API credit debt if you try this".

Still extremely impressive either way.
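
A minimal sketch of that generate-then-select loop, with every model call stubbed out; the function names, the candidate count, and the submission cap (nine, echoing the reported ninth-attempt solve) are my assumptions, not anything OpenAI or DeepMind have published:

    def generate_candidate(problem: str) -> str:
        """Stand-in for one generator call (e.g. a GPT-5 instance).
        Hypothetical: returns candidate source code for the problem."""
        raise NotImplementedError

    def rank_candidates(problem: str, candidates: list[str]) -> list[str]:
        """Stand-in for the selector model: orders candidates from most
        to least promising. Hypothetical interface."""
        raise NotImplementedError

    def submit_to_judge(code: str) -> bool:
        """Stand-in for a real judge submission; True means accepted."""
        raise NotImplementedError

    def solve(problem: str, n_candidates: int = 32, max_submissions: int = 9) -> str | None:
        # Many generator instances propose solutions (in parallel, in reality)...
        candidates = [generate_candidate(problem) for _ in range(n_candidates)]
        # ...and a separate model decides which few are worth submitting.
        for code in rank_candidates(problem, candidates)[:max_submissions]:
            if submit_to_judge(code):
                return code
        return None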

ferguess_k - 6 hours ago

I think in the future information will be more walled off, because AI companies are not paying anyone for it. I encourage everyone to put their knowledge on their own website and, on each page, add a few URLs that humans won't stumble upon (but can still click if they know where to look) yet that AI crawlers will follow. Those URLs should lead to pages containing falsified information ("oh, the information at URL blah is actually incorrect, here you can find the correct version, with all those explanations, blah blah"), when of course the original page is the only correct version.

Essentially, we need to poison AI in every way we can without impacting human readers. The companies then either have to hire more humans to filter the information, or hire more humans to improve the crawlers.
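
A minimal sketch of the trick being described, assuming a naive crawler that follows every anchor tag; the CSS class, file names, and decoy URL are all made up:

    # Writes a page with a link hidden from human readers (via CSS) but
    # present in the markup, so a naive crawler will still follow it.
    DECOY = '<a class="offscreen" href="/decoy-correction.html">corrected version</a>'
    PAGE = f"""<!doctype html>
    <html><head><style>.offscreen {{ position: absolute; left: -9999px; }}</style></head>
    <body>
    <p>The real article, readable by humans.</p>
    {DECOY}  <!-- invisible to readers; the decoy page holds the falsified "correction" -->
    </body></html>"""

    with open("article.html", "w") as f:
        f.write(PAGE)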

Or we can simply stop sharing knowledge. I'm fine with it, TBF.

ototot - 6 hours ago

Given that ICPC problems are in general easier than IOI problems, I wouldn't be surprised to see them get gold (even perfect scores) at the ICPC.

Nonetheless, I'm still wondering what the cost is and how long it will take for us to get access to these models.

Still great work, but it's less useful if the cost is actually higher than hiring someone at the same level.

jaggs - 6 hours ago

I think it's becoming clear that these mega AI corps are juicing their models at inference time to produce unrealistically good results. By that I mean they seem to be cranking up the compute beyond reasonable levels in order to score PR points against each other.

The fact is, most ordinary mortals never get access to a fraction of that kind of power, which explains the commonly reported issues with AI models failing to complete even rudimentary tasks. It has turned into a whole marketing circus (maybe to justify those ludicrous billion-dollar valuations?).

ChrisArchitect - 6 hours ago

Sharing links to a couple of tweets is not a blog post.

Google source post: https://deepmind.google/discover/blog/gemini-achieves-gold-l... (https://news.ycombinator.com/item?id=45278480)

OpenAI tweet: https://x.com/OpenAI/status/1968368133024231902 (https://news.ycombinator.com/item?id=45279514)

d--b - 4 hours ago

Two words: Uh oh

antegamisou - 3 hours ago

Make that shit cure cancer/disease and abstain from that modern Space race equivalent BS ffs.

bgwalter - 5 hours ago

A database is good at leetcode; who would have thought? Give humans a database and they'll outperform your "AI" (which probably uses an extraordinary amount of graphics cards and electricity).

It is an idiotic benchmark, in line with the rest of the "AI" propaganda.

sameermanek - 3 hours ago

What's the point? These models are still unreliable in everyday work. And they're getting fat! For a moment they were getting cheaper, but now they're only getting bigger, and that is not going to be cheap in the future. The point is, what are we investing a trillion dollars in?

z7 - 2 hours ago

Current cope collection:

- It's not a fair match, these models have more compute and memory than humans

- Contestants weren't really elite, they're just college level programmers, not the world's best

- This doesn't matter for the real world, competitive programming is very different from regular software engineering

- It's marketing, they're just cranking up the compute to unrealistic levels to gain PR points

- It's brute force, not intelligence