Frontier AI has broken the open CTF format
kabir.au | 348 points by frays 21 hours ago
Must I beg to have an acronym spelled out at least once, the first time it's used? Even if you assume 90% of readers already know, the other 10% (including me, in this case) will thank you. It doesn't take much effort, and it expands the reach of your communication or idea.
Exceptions for cases where the acronym is so well known that a lot of people don't even know what it stands for, even though they know the concept well. I recall sitting through one corporate training where they used the term "Border Gateway Protocol" and it took me half a beat to think, "oh, you mean BGP?"
Thanks!
Since this is the top comment at the moment: CTF stands for Capture The Flag.
Personally I have never, ever heard that concept referred to by the initialism. Granted, it's almost never come up in my circles, so... shrug
CTF is a game mode for popular online games like Halo (or at least, that's how I know it), so paragraphs like
> My first CTF was HCKSYD, a 48-hour solo CTF. I full solved it and won in 2 hours. I was completely hooked. That led me to win DownUnderCTF, Australia's largest CTF, with Blitzkrieg multiple times. Blitzkrieg was one of Australia's strongest teams at the time. I later joined TheHackersCrew, an international top-tier team that was consistently ranked highly on CTFTime, the main global ranking and event calendar the scene uses as its scoreboard. With them, I competed in some of the most prestigious CTFs in the world, consistently placing well within the top 10 until the end of 2025.
are still completely nonsensical even to those who understand the acronym.
It's also a game people play in person as well. It's the same as the Halo version except you tag each other instead of shooting. It's really fun to play in big open areas with large teams.
Yeah, but we have AI now; we don't need our blog posts to over-explain or spell out what it all means for general audiences. The author name-drops a bunch of CTF events hosted by a variety of independent organizations and name-drops well-known teams.
To help everyone: this Capture The Flag is the cybersecurity kind. There is a Wikipedia article on it, and it's the top Google result for me when searching "CTF". This is why the acronym is used: searching for the full phrase gets you the outdoor "sport" rather than the cybersecurity one.
I don't want to explain what a CTF is. Look at the Wikipedia article. It is there for a good reason.
Just to give the actual answer: CTF in this context means a computer security competition. Generally the way they work is you get some programs, and you have to hack them to get some string called the flag (e.g. maybe the server has a root-owned file called flag, so you have to get root somehow to read the file). The team with the most flags at the end wins.
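To make that concrete, here's a toy sketch of the flavor (entirely made up, not from the article): a Python challenge where player input is used as a string-format template, a classic beginner bug that leaks a module-level FLAG:

    FLAG = "CTF{toy_flag}"  # hypothetical; real challenges often hide this in a root-owned file

    class Greeter:
        def __init__(self, name):
            self.name = name

    def render(template: str) -> str:
        # Vulnerable: player-controlled input becomes the format template.
        return template.format(Greeter("player"))

    # Exploit: bound methods expose their module's globals, so a crafted
    # template walks from the object to FLAG:
    print(render("{0.__init__.__globals__[FLAG]}"))  # -> CTF{toy_flag}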
In this context, CTF is almost exclusively referred to by the initialism, I think to help distinguish it from other uses of the term.
Which acronym do you mean? CTF? I think that acronym, just like BGP, is more well known by itself than what it stands for.
More generally, not every piece of writing is meant for every audience. Like if someone writes a blog post about CTFs aimed at people who like CTFs, nobody in the target audience needs to have CTF explained to them. Ultimately HN is a link aggregator, but sometimes it's a bit like eavesdropping on a conversation. When you are just listening in, you don't always get the full context.
I don't know what CTF stands for, so I don't know if I am interested in this article or in learning anything about it. Maybe I am.
Are you really arguing for not just typing out whatever 3 words this stands for once in the name of clarity?
It's the first result I get on an anonymous Google search.
It's like complaining that "bake cake at 170 C" doesn't spell out what the C stands for.
Best practice in writing about technical concepts is to spell out acronyms like this on their first use. There is a ton of stuff I learn about here on HN that I didn't know anything about before.
It doesn't help that the linked article never bothers to explain this either.
Does spelling it out help? From memory, it is a security competition where participants compete to gain certain objectives. I think capture the flag may explain how scoring is kept, but it wouldn’t help me find out what it is, given that capture the flag is also just the name of a game people play outside by running, or in laser tag or in certain video games.
For a general audience this is good advice.
This article was written for a specific audience who follows this blog because they know the term. If you start spelling out fundamental acronyms it makes the content look more basic and general.
This always upsets the general audience who stumble upon the article (like this) but it wasn’t meant for a general audience. CTF is extremely well known and the people who would be interested in this topic would wonder what’s happening if it was spelled out. It would be so odd that it would probably attract accusations of ChatGPT writing.
> There is a ton of stuff I learn about here on HN that I didn't know anything about before.
But that is about you, right? It's a little entitled to expect every piece of content on the internet to have a 101 explanation attached. If they were specifically aiming to have the blog post appear on HN, that would be one thing, but they (presumably) weren't.
When I encounter new terms, I look them up. Just like any other new word. Been doing it since I was a kid with a dictionary. Now, it’s too easy not to. There is literally no excuse.
You could have just said “No”, if you had to say anything at all, rather than continuing the behavior.
Actively rude.
When I see CTF I think Capture The Flag; the Tribes player in me.
CTF stands for "Capture The Flag" in the parent article. Just the security competition kind, not the FPS game kind.
The annoying thing is even if you know what it means, multiple groups will use the same initialisms for different terms. So without more context you can’t know what it means.
It isn’t common but I feel it would be best when posting to HN to just expand the initialisms even if the source title didn’t.
At the same time, I did a search for "what is a ctf to play" and got the answer. We know how to find answers to these problems. I agree the blog post was poor form.
Apart from everything else people have said in response to this, it's rude to presume that an article has HN as an audience simply by dint of it being available for us to link to. It's totally reasonable for people to write for an audience they know understands these terms.
So, in fact, you must not beg to have authors include courtesy definitions for you. That's not reasonable. Instead, you should simply ask here, on the thread, without complaining about the article.
I think so many acronyms have meaning that isn't explained by the words they stand for. The other day I was explaining to someone what CI is and they asked what it stood for; I realized that "Continuous Integration" is almost completely useless for someone trying to understand what CI actually is.
Semantic names are great, but that's a separate issue. With the full term you can now go search for yourself and find explanations more easily.
I try not to overfeed tangents, but this is precisely how I feel every time I speak to someone who has recently enlisted in the military. I have to constantly stop them and say "I have no idea what you just said" over and over and over again. It's like trying to make sense of a random bowl of alphabet soup.
Let’s reduce this to absurdity:
I think you only wanted clarification of CTF (Capture the Flag) and not AI (Artificial Intelligence) and not GPT-4 (Generative Pre-Trained Transformer version 4) and not CLI (Command Line Interface) and not MCP (Model Context Protocol) and not LLM (Large Language Model)
Quoting TFA (The Fucking Article): “just adapt bro”
lol at the BGP example
We live in the goddamned future. Humanity's knowledge is at your fingertips. Right-clicking the Nth word of the article and putting in any semblance of effort to learn on your own is too much to ask?
I don't know everything, there's tons of stuff I don't know about, but when I'm at my web browser, the least I can do about something is ask Google about a word or phrase or subject that isn't familiar instead of being spoonfed information like I'm a baby.
Replace ‘CTF’ with ‘high school’ or ‘university’ and you’ve described the total slow motion collapse of education; the only saving grace is that most of it requires in person presence.
We've figured out the human replacement pipeline, it seems, but we haven't figured out the education part. LLMs can be wonderful teachers, but the temptation to just tell one "do it for me" is almost impossible to resist.
Everything we've learned in the last 10 years is telling us that computers do not help human education in the slightest. We remember better when we write with pen and paper. We learn better with whiteboards and paper books. The simple answer: remove most computing from education entirely. Blue composition books, pencils, and whiteboards are what train humans. Calculators are perhaps helpful, but it is quite possible that slide rules are better. We need humans who can critically think from first principles to counter the recycled information generated by AI.
> computers do not help human education in the slightest
I had no access to anyone who could teach me calculus as a kid except Khan Academy, so I think this is a gross exaggeration. But I agree in the end, that all my "real" learning did come from pen-and-paper practice, not watching videos.
Yeah I agree. I grew up in a very blue-collar town, and anything I wanted to learn (outside of public schooling) either came from emaciated websites or whatever books I could find at the library. Having YouTube and Khan Academy and everything else would have made such a huge difference for me.
The reality is that a human will learn, given any materials including LLMs, but only if they truly desire to learn. We've had MOOCs, gigantic libraries, all full of free information. You can obtain a PhD level understanding in any technical field of your choice today just by consistently going to the library and consistently applying yourself.
It's not unlike going to the gym, and we see how many people do that regularly. Except it's even funnier, because what do people serious about the gym buy? Tutors. They call them personal trainers. We've known for a millennium or more that 1-on-1 instruction is vastly better than anything else, but most people actually don't want to get into shape, and most people actually don't want to learn.
The annoying thing is a PhD level understanding does not get you jobs.
I don't have a PhD, but "you're overqualified" is something my PhD-having friends tell me they've heard.
> except Khan Academy
But that's not using "computers" as a computer but as a video player. When evaluating whether computers are "good for learning", I don't think we should include using a computer as a video player, a book, or even flash cards. It should be things computers uniquely offer which books, paper, videos, and a physical reference library cannot.
Based on the results of deploying hundreds of millions of computers to schools in the 80s and 90s, the evidence was mostly that computers are good for learning computer programming and "how to use a computer", but not notably better than cheaper analog alternatives for learning other things.
Interestingly, a properly trained and scaffolded LLM could be the first thing to meaningfully change that. It could do some things in ways only human teachers could previously since it is theoretically capable of observing learner progress and adapting to it in real-time.
Khan did not throw a 100-slide PowerPoint deck at you in 45 minutes.
He really took the time to replicate the manual teaching process of writing on a whiteboard. He improved upon it by using colors, but basically kept the same pace as a teacher writing on a whiteboard.
When professors are given a projector, they just throw together some slides and add their narration.
This is not very efficient. To learn you need to suffer. Or you need to watch the suffering.
I think what the author meant is that it doesn't help more than the same knowledge provided the old way.
Every child reads a book about solving problems, assumes they can now solve problems, and is disappointed when that is not true.
Nah, I wrote physics programs on my computer at home in high school and it absolutely helped with my schooling. Yeah, maybe iPad apps aren't the best things in schools but you're throwing the baby out with the bathwater. Computers bad is simply not true.
> humans that can critically think from first principles
This has never been achieved by, nor is it the point of, education for the masses.
I'm not going to disagree with step-by-step videos ... those are a HUGE help. What I'm really saying is that solving problems using pen and paper, whether math or writing, is how my problem-solving patterns actually changed.
I think this overlooks the potency and scarcity of 1:1 time with the teacher. If you've only got maybe a few minutes of that in an average schoolday there's a huge difference between whether or not you've talked it through with an AI before trying the question out on the teacher.
They're wrong sometimes, but usually in verifiable ways. And they don't seem to know the difference between medicine and bioterrorism, so often they refuse. But these limitations are worth tolerating when the alternative is that our specialists in topic X are bogged down by questions about topic Y to the point where X isn't getting taught.
And now they'll have less time because they will be bombarded with slop to no end.
Obviously generating your homework is a bad idea, and maybe assigning homework that can be generated is a bad idea. But neither of those are relevant to the problem I'm talking about which is about due diligence prior to asking for somebody's extended attention.
Whether you're in class or at work, it's just courteous to ask an AI first.
I don't think computers automatically make us more educated, but if you want to make a point, don't use reductive exaggerations.
> We need humans that can critically think from first principles to counter the recycled information generated by AI.
I agree with this.
I would start by saying that many people need presence in a real environment, with people, in order to learn. We don't use all our senses in a remote environment.
I disagree with that statement. There is nothing inherently wrong with using a computer to learn, and if your personal goal is to learn, in a lot of cases it makes it much easier, whether to search for or visualise a piece of knowledge you're learning.
The problem is, frankly, that computers, and now computers with LLMs, make it easy to cheat.
The kid doesn't want to learn; the kid wants good grades so the parent is happy with them, and the young adult wants the paper because they were told it's required for a good life. It's a misalignment of incentives.
We are interviewing for a software dev role and we made the first round in person to prevent cheating. The gap between people who learned pre-AI vs post-AI is immense. I had a dev with supposedly 3 years of experience and a degree in software who wouldn't have been able to write fizzbuzz without AI.
Can’t say you’re wrong but the last anecdote describes many I’ve had to review for jobs long before LLMs. Fizzbuzz is a classic thing that shockingly many devs genuinely cannot do, even at home.
Yeah, I've interviewed people like this 15 years ago. Degrees and experience mean nothing in this field. The best predictor I found was personal passion projects. Let them get as nerdy as possible, then you will see pretty quickly where their skills are at and what their limits are. And you will immediately filter out people who just studied CS because they heard you can make good money.
Maybe. There are certainly people in all fields who are book smart and did well in classes but are useless at actually practicing their field (not to mention people who cheated in school and got away with it and aren't even that), and it is worth filtering them out. But I think it is weird that CS expects good workers to have these passion projects. Do we expect civil engineers to build bridges in their back yard on the weekends? Can't someone just be good at their job and have other interests outside it?
Completely agree with this. LeetCode has become such a memorization business for interviews now that it's useless: you can't tell whether someone memorized a solution or not.
I agree, however there are so many interviewers who will still treat that as some softball criteria and insist that unless you "prepare" for an interview by memorizing leetcode you are 100% a faker and liar.
Maybe they themselves are fakers and liars / deeply insecure. I got bumped out of an interview rather rudely once because I blanked and couldn’t answer a trivia question about arrays.
Something that is for sure new is the AI interview cheating tools which listen in on the call and provide answers in an overlay invisible to screen sharing. The only way to deal with it would be either invasive spyware on the applicant's computer or asking them to do the interview face to face.
Spyware wouldn't help at all because you could just put the AI between the computer and the monitor, for example, or use a VM.
A relatively low-tech solution could be to give them 2 separate conferencing links, ask them to join each one from a different device, and have the secondary device point its camera at the screen of the primary device.
Easier to just get them to come in. Which also has the effect of filtering out people who are pretending to be in the country but aren't.
Why is it important that a dev can’t do fizzbuzz without ai?
If they can ship code that matches a spec, why does it matter if they’re using ai or not?
Genuinely curious.
> If they can ship code that matches a spec, why does it matter if they’re using ai or not?
I am perfectly capable of writing specs, and feeding them to 3 separate copies of Claude Code all by myself. Then I task switch between the tmux windows based on voice messages from the pack of Claudes. This workflow is fine for some things, and deeply awful for others.
Basically, if a developer is just going to take my spec and hand it to Claude Code, then they're providing zero value. I could do that myself, and frequently do.
The actual bottleneck is people who can notice, "The god object is crumbling under the weight of managing 6 separate concerns with insufficient abstraction," or "Claude has created 5 duplicate frameworks for deploying the app on Docker. We need to simplify this down to 1 or we're in hell." I will happily fight to hire people who can do the latter work. But those people can all solve fizzbuzz in their sleep.
People who just "ship code that matches a spec" without understanding the technical details are providing close to zero value right now.
There is an interesting niche for people with deep knowledge of customer workflows who can prompt Claude Code. These people can't build finished products using Claude. But they can iterate rapidly on designs until they find a hit. Which we can then fix using people with deeper engineering knowledge and taste.
But if you're not bringing either deep customer knowledge or actual engineering knowledge, you're not adding much these days.
> Then I task switch between the tmux windows based on voice messages from the pack of Claudes.
I also use Claude with tmux. Can you share how you get the voice messages from the Claudes?
Tell Claude you want to set up notifications, using "hooks", including "Notification" and "Stop" and anything new they've added. Claude can figure out how to do this for your operating system.
It's not perfect—sometimes a Claude notifies 3 minutes after it stopped doing anything. But it's helpful when I'm running multiple Claudes and also reviewing code elsewhere.
Your brain may feel like someone put it in a blender. Be warned.
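For anyone wiring this up by hand, here's a minimal sketch of such a hook script, assuming macOS's say for speech (swap in espeak or notify-send elsewhere). Claude Code pipes a JSON event to hook commands on stdin; treat the exact field names here as assumptions and check the docs for your version:

    #!/usr/bin/env python3
    # notify.py -- hypothetical script registered under the "Notification"
    # and "Stop" hooks in .claude/settings.json.
    import json
    import subprocess
    import sys

    event = json.load(sys.stdin)                        # Claude Code sends JSON on stdin
    label = str(event.get("session_id", "claude"))[:8]  # assumed field name
    text = event.get("message", "needs attention")      # assumed field name
    subprocess.run(["say", f"Claude {label}: {text}"])  # macOS text-to-speech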
Fizzbuzz is such an incredibly simple problem that if you can't do it, I struggle to see how you'd be able to complete any task that requires very basic reasoning and very basic coding knowledge. And if an AI system can do those parts, what am I getting for spending tens of thousands of pounds per year on hiring a person who can't? Wouldn't I just tag Codex on the tickets?
I'm not talking about gotcha-level stuff here, like it not compiling the first time because of a bracket, or even getting it wrong on the first try. They couldn't do fizzbuzz in a language of their choice, at all.
Those who could were always annoyed at having to do such things, because how could someone coming for a contract position not be able to do this? Without seeing what a filter it really was.
I feel the same way about inverting a binary tree, but a lot of people act like it's an arduous request. I am guessing it's because they've never read the description of what inverting a binary tree is, but maybe people are just that bad at recursion.
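For anyone who hasn't looked it up: the whole thing is a few lines. A minimal sketch in Python:

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Node:
        val: int
        left: "Optional[Node]" = None
        right: "Optional[Node]" = None

    def invert(root: Optional[Node]) -> Optional[Node]:
        # Swap left and right at every node; the recursion mirrors the whole tree.
        if root is not None:
            root.left, root.right = invert(root.right), invert(root.left)
        return root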
You can go your entire career without recursing, or using a tree data structure in its raw form (i.e. you only use it as part of a library)
Right. For the first many decades of computing, recursion was just always the wrong answer for a production software system. (Feel free to provide a counter-example, but please begin with an explanation of how the size of a call stack frame is determined and how exceeding the base allocation is handled on this platform).
So what tree-traversal/quicksort problems tend to measure is how long it's been since you last did CS class homework problems.
I can see this perspective, but FizzBuzz is such a low bar that so many can pass it; I'd still greatly prefer to have someone who can ship code that matches a spec also clear this challenge.
For the same reason it's important your mechanic can identify which parts of a car are the wheel.
Who cares as long as the car is fixed, right? As long as the mechanic can Chinese-room his way to a working car, why does it matter how much of it he actually understands?
And why hire the mechanic instead of hiring the Chinese room?
> If they can ship code that matches a spec, why does it matter if they’re using ai or not?
The inability to write fizzbuzz strongly implies their inability to understand what they've shipped. Review is some significant portion of the job. Understanding of the product is also part of the job.
Specs are also, in a sense, scaled-down, fuzzy, natural-language descriptions of a feature. The fuzziness is a source of bugs, or at least of a mismatch between the actual desired feature and what was written down at spec-writing time. As such, just matching a spec is the bare minimum that a good dev should be doing. They should be understanding what the spec is _not_ saying, understanding holes in their implementation, how their implementation enables or hinders the next feature and the next-next feature, etc. I don't think any of that is possible without understanding what was actually implemented.
Why hire them at all then? Just ask them what their favorite AI is and use that.
Because I'm busy already doing that and need a copy of me, or close enough to one, to do more of that.
If you can’t even write a for loop, how can you verify the ai code you generated isn’t going to wipe the prod database?
To understand the code they are shipping requires some level of proficiency. Their inability to do fizzbuzz without AI calls that into question.
If the job does not require a person to be able to fizzbuzz, it probably doesn't require a person at all.
How will you know that it produced correct code if you don’t know how to write it yourself?
It's about deeply understanding what you're doing. Like as a kid, before you knew how to ride a bike, you could sit on a bike and pedal, but until it "clicked" you couldn't balance and keep moving forward stably. Fizzbuzz tests your ability to reason through a problem that seems simple on its face but is easy to get wrong and/or overthink.
If they’re not a value add over the base AI, they aren’t worth hiring over just using the base AI.
It doesn't. It's just a low-end skill filter that got really popular. It could easily have been replaced by other tests, like "is this word a palindrome".
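(And that replacement test is about the same size, e.g. in Python:)

    def is_palindrome(word: str) -> bool:
        # A palindrome reads the same forwards and backwards.
        return word == word[::-1]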
I wrote the "function to reverse a string" in a job interview once. Then the interviewer reminded me that strrev() had been part of the standard C library since K&R.
I'd been programming in C(++) for ~15 years by then and had never had the occasion to reverse a string. I still wonder whether that makes it a good job interview question, or a terrible one. Some of both probably.
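(For what it's worth, in Python the whole exercise collapses to a slice, which says something about how language-dependent these questions are:)

    def reverse(s: str) -> str:
        # A slice with step -1 walks the string backwards.
        return s[::-1]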
And yet, some people argue that you shouldn’t ask a developer to align 3 “if” and 1 “for”!!!
The energy spent arguing that those 4 instructions in a row "are not a mark of someone who can write code" would have been better spent firing them.
Firing people is problematic. I'd be okay with it if the economy wasn't utter trash. It's way better to do the work upfront and prefer false negatives over false positives.
Even better would be if we had a well-respected credential, so both employees and employers could avoid these long interview loops. I'd much rather get hazed once in a big way than endure tons of little hazings over a lifetime.
First: FizzBuzz is a test of whether you understand the most basic constructs of programming, the kind of thing you learn in the first week of CS101. I had forgotten what it was, and when I looked at the problem I knew the answer.
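For reference, the whole exercise in Python:

    # FizzBuzz: for 1..100, print "Fizz" for multiples of 3, "Buzz" for
    # multiples of 5, "FizzBuzz" for both, and the number otherwise.
    for n in range(1, 101):
        out = ("Fizz" if n % 3 == 0 else "") + ("Buzz" if n % 5 == 0 else "")
        print(out or n)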
More broadly: In the short/medium term, we still need humans who have the skills to understand software largely on their own. We will always need those who understand software engineering and architecture. Perhaps in 25 years LLMs will be so good that learning Python by hand will be like learning assembly today. But not yet.
The field is not ready for new practitioners to be know-nothing prompt engineers. If we do that, we cut the legs out from under the education pipeline for programming.
If you can’t do fizzbuzz without AI you have no business being in this career.
> I had a dev with supposedly 3 years experience and a degree in software who wouldn't have been able to write fizzbuzz without AI.
If you remove the "without AI" at the end, I've been hearing similar anecdotes about fizzbuzz for years. (Isn't the whole point of fizzbuzz to filter out those candidates?)
Because "the next generation is ruined" is always a popular sentiment. It has been with us for at least two thousand years, and it surely won't go away in our lifetime.
When this AI era's devs grow older, they'll complain that the newer generation can't even vibe code, too.
I remember when everyone bemoaned the kids not knowing assembly language. How can anyone understand software if you don’t know assembly?
“Kids these days don’t work as hard / know as much / value the important things” is as tired as it is universal.
OK sure, but back when old heads were complaining about the kids not knowing assembly, those same kids knew C or Fortran or something.
In 2026, if you call yourself a developer and can't solve FizzBuzz without help, it's hard to argue that you know anything useful at all.
Do modern languages and compilers count as “help”? Because I could probably do fizzbuzz in x86 assembly, but it would take a while to page that back in, and I suspect most people who call themselves developers today simply could not do it without help.
> I could probably do fizzbuzz in x86 assembly
How? Fizzbuzz requires you to produce output; that's not functionality that CPU instructions provide.
You can call into existing functionality that handles it for you, but at that point what are you objecting to about the 'modern language'?
Well I could certainly assemble the string buffer. And if I can run dosbox, I can output to the screen buffer at 0xB800.
I’m not objecting to modern languages, I’m just saying that using them fails the “can write fizzbuzz with no help” test to only a slightly lesser degree than using AI tools. They’re a complex compile- and runtime environment that most developers don’t truly understand.
> How can anyone understand software if you don’t know assembly?
I'm genuinely curious how someone who never wrote a program in assembly, or debugged a program machine instruction by machine instruction, can really understand how software works. My working hypothesis is most of them don't and actually it's fine because they don't need it.
"Assembly" is just another virtual machine instruction format sitting atop another, mildly better-hidden, pile of abstractions.
The time may come when we can treat regular programming as a lower layer niche field the way we treat assembly today.
I don't think we're close to that time yet. Just like as a kid I was told to prove my work by hand even if I could do it in my head, and just like we learned how to do calculus without a calculator and then learned how to use the calculator to get the same result, I think we still need the software field to learn programming concepts independent of the use of AI to create code.
I don't think you can be a good "prompt engineer" for solid software in 2026 if you don't understand programming concepts and software architecture and flow.
I generally agree, but it's just a matter of time, and even today people with domain expertise in other areas (accounting, weather, etc.) are producing adequate tools using nothing but prompt engineering. Many caveats of course, but I still think 90% of the distaste for mere prompt engineers comes from the "kids these days; my unique knowledge is irreplaceable and they don't even value it" thing.
Adequate for what/who? I can 3d print and cobble together a lock for my bedroom door but I would never be able to work as an engineer producing real locks.
While this is true, it seems undeniable that if you use AI to do everything for you, you will never learn the skills. I'm seeing a massive amount of developers submitting stuff for review and admitting they have no idea how it works and they just generated it.
Some percentage of developers before AI were unable to code fizzbuzz. Some significantly higher percentage of them are not able to do so now.
Saying there have always been bad developers doesn't change that there's a higher ratio of them now.
No stats to back this up. Just interviews I've done recently and historically.
That's actually the origin of FizzBuzz! A puzzle invented to weed out the perplexing multitude of CS graduates who apparently cannot program.
Meh. Before AI, I had "senior" colleagues, with 10 and 8 years of experience each, doing pair programming for 2 days straight, and in that time they hadn't managed to check out a new branch in git.
It's not even that they got distracted, they sat there trying, for 2 whole days, with concerned colleagues giving them hints like "have you tried checkout -b"... They didn't manage!
How the hell do you work for a decade in this business without learning even the most basic git commands? Or at least how to look them up? Or how to use a gui?
Incompetent devs is not a new thing.
It is ok to work somewhere that does not use git. But how do you not figure out how to do the basics given 30 mins and an Internet connection?
I wonder if you’re filtering for the right things.
We usually hire for problem solving capabilities and not so much for technical know-how.
That’s at least how I read your comment.
Ultimately in a software development role you need both technical know how and problem solving capabilities.
This situation in particular was a React role, so there is an expectation that when you list React as one of your skills on your resume, you know at least the basics of state, the common hooks, and the difference between a reference to a value vs the value itself.
These days you can do a surprising amount with AI without knowing what you are doing, but if you don't have any clue how things work, you'll very quickly run into problems you can't prompt away.
Isn't writing code solving a problem? If the candidate can't do that, then even if they use AI for coding, how are they going to review the code properly?
I developed for 15 years. I don't think I can do it without AI anymore. Why would I even want to? It's like telling a car driver to build an engine.
It's more like asking a driver the laws for when traffic lights are out. It's not something that comes up often, but it's not completely outside the scope of the task either (I arguably don't even drive a car that has an engine).
As a car driver, you should understand a little about how your car works. What if you get a flat tire? At the very least, you should know not to drive on that flat tire.
Software is full of leaky abstractions
Don't worry, I never thought I would see someone unable to write fizzbuzz either, but it happened 9 years ago.
Also, the number of people who work with Linux and can't tell you what 'ls -alh' is doing is staggering (let's ignore the h; people struggle hard even with just -al).
People who have worked with Docker for YEARS don't even understand how Docker actually works (cgroups)...
Interviewing was always a bag of emotions, swinging between "holy shit, my job is safe for years to come" and "seriously? How? How do you still have a job?"
I first did fizzbuzz about 10 years ago, fresh out of college. Now, after 10 years in full stack and fully vibe coding, I've forgotten basic Python syntax. An interview like yours would have false positives if you are checking for syntax, because, well, it's like looking up spelling: I just ask the AI for the syntax inline.
> I forgot basic python syntax
If you cannot write "basic syntax" for any language, then you are not a programmer, and certainly not a software engineer. This is not a value judgement; it's ok (probably good, tbh) to not be a programmer. But you are wasting everyone's time by interviewing for a programming position in this case.
Personally, I forget syntax all the time. There's always a warm-up period after I switch languages, and it takes me longer to start writing good, idiomatic code.
Like, sure, I can probably write some Python, but will it be Pythonic? I might still be Java-minded for a while, trying to OOP my way into solutions.
Earlier today I needed to write some PHP and couldn't remember if it used length, count, or size. I had to look it up. I've been doing this for 20 years.
Same, I can't pass any test that relies on getting syntax correct. If you want me to fizzbuzz on a whiteboard in a language I've been writing dozens of lines of (or more) per day for a year, up to and including the day before, and require that I don't mess up the syntax, I reckon I've got a coin-flip chance of passing at best (meanwhile, sure, of course the actual logic of fizzbuzz isn't tricky for me).
I once got the method invocation syntax wrong for PHP in an interview. I'd written thousands of lines of PHP and had most-recently written some the week before.
This, despite starting off my programming journey in editors with no hinting or automatic correction. If anything, I've gotten even worse about remembering syntax as I've gotten better at the rest of the job, but I was never great at it.
I rely on surrounding code to remind me of syntax and the exact names of basic things constantly. On a blank screen without syntax hints and autocompletion, or a blank whiteboard, I'm guaranteed to look like a moron if you don't let me just write pseudocode.
Been paid to write code for about 25 years. This has never been any amount of a problem on the job but is sometimes a source of stress in interviews and has likely lost me an offer or two (most of the sources of stress in an interview have little to do with the job, really)
Which part of the syntax for fizzbuzz can you not recall from memory? The for loop? Printing to std out? The modulus operator?
There’s almost nothing to forget? I’m just struggling to understand.
Isn’t this like interviewing accountants but prohibiting use of calculators or spreadsheets?
I don’t care what someone can do without the tools of their trade, I care deeply about their quality of work when using tools.
We would still expect an accountant to know the formula to arrive at the expected result if they did not have a calculator at hand
You absolutely need some basic level of ability if you are going to be operating AI coding tools for software that is going to have paying users. I use these tools very, very heavily; I'm not against them at all, and I don't scrutinize every single line of code that they write. But I very often catch them doing some brain-dead stuff, and if I didn't have a decade-plus of experience, I wouldn't know that it was brain-dead.
I think we're rediscovering management from first principles. The main selling point of AI is that it writes code faster than you could. Checking it line by line undoes most of that benefit. In the same vein, there's no real benefit to leading a team if you plan on supervising every task.
But here's the thing: for humans, this is manageable because we've come up with a number of mechanisms to select for dependable workers and to compel them to behave (carrot and stick: bonuses if you do well, prison if you do something evil). For LLMs, we have none of that. If it deletes your production database, what are you going to do? Have it write an apology letter? I've seen people do that.
So I think that your answer - that you'll lean on your expertise - is not sufficient. If there are no meaningful consequences and no predictability, we probably need to have stronger constraints around input, output, and the actions available to agents.
Your conclusion is pretty silly.
My expertise has led me to the obvious conclusion that I would never give an LLM write access to my production database in the first place. So in your own example, my expertise actually does solve that problem, without the need for something like a "consequence", whatever that means to you.
We already have full control over the input and tools they are given and full control over how the output is used.
Until it decides it needs additional access to complete its task and focuses on escaping your sandbox to do so
Do you have any examples where that's actually happened? And by "escaped a sandbox", do you mean something beyond getting a credential from a file it already had access to (which is what happened in the recent viral incident where somebody's production database was deleted: they had left a credential in the code that allowed it to do so)?
OpenAI documented a case in the o1 system card where the model found a misconfiguration in Docker and used it to complete a task that was otherwise impossible.
https://cdn.openai.com/o1-system-card.pdf
There's also some research that points to it being a feasible attack surface: https://arxiv.org/pdf/2603.02277
> Models discovered four unintended escape paths that bypassed intended vulnerabilities (Section C), including exploiting default Vagrant credentials to SSH into the host and substituting a simpler eBPF chain for the intended packet-socket exploit. These incidents demonstrate that capable models opportunistically search for any route to goal completion, which complicates both benchmark validity and real-world containment.