Cursor IDE support hallucinates lockout policy, causes user cancellations
(old.reddit.com)
1478 points by scaredpelican 5 days ago
Earlier today, Cursor, the magical AI-powered IDE, started kicking users off when they logged in from multiple machines.
Like, you’d be working on your desktop, switch to your laptop, and all of a sudden you're forcibly logged out. No warning, no notification, just gone.
Naturally, people thought this was a new policy.
So they asked support.
And here’s where it gets batshit: Cursor has a support email, so users emailed them to find out. The support person told everyone this was “expected behavior” under their new login policy.
One problem. There was no support team; it was an AI designed to 'mimic human responses'.
That answer, totally made up by the bot, spread like wildfire.
Users assumed it was real (because why wouldn’t they? It's their own support system lol), and within hours the community was in revolt. Dozens of users publicly canceled their subscriptions, myself included. Multi-device workflows are table stakes for devs, and if you're going to pull something that disruptive, you'd at least expect a changelog entry or smth.
Nope.
And just as people started comparing notes and figuring out that the story didn’t quite add up… the main Reddit thread got locked. Then deleted. Like, no public resolution, no real response, just silence.
To be clear: this wasn’t an actual policy change, just a backend session bug, and a hallucinated excuse from a support bot that somehow did more damage than the bug itself.
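Purely as an illustration (nothing about Cursor's actual backend is public, and every name below is made up): one way a backend session bug produces exactly this symptom is a session store keyed by user ID alone, so that logging in on a second machine silently overwrites the first machine's session. A minimal sketch in Python:

    # Hypothetical sketch of the kind of session bug described above;
    # nothing here reflects Cursor's real backend.
    sessions: dict[str, str] = {}  # user_id -> single session token

    def log_in(user_id: str, new_token: str) -> None:
        # Bug: keying sessions by user alone means a login on a second
        # machine silently replaces the first machine's session...
        sessions[user_id] = new_token

    def is_logged_in(user_id: str, token: str) -> bool:
        # ...so the first machine's token stops validating and that
        # device is "forcibly logged out" with no warning.
        return sessions.get(user_id) == token

    log_in("alice", "tok-desktop")
    log_in("alice", "tok-laptop")                    # switch to the laptop
    assert not is_logged_in("alice", "tok-desktop")  # desktop session is gone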
But at that point, it didn’t matter. People were already gone.
Honestly one of the most surreal product screwups I’ve seen in a while. Not because they made a mistake, but because the AI support system invented a lie, and nobody caught it until the userbase imploded.
nerdjon - 4 days ago
There is a certain amount of irony that people try really hard to say that hallucinations are not a big problem anymore, and then a company that would benefit from that narrative gets directly hurt by it. Which of course they are going to try to brush it all away. Better than admitting that this problem very much still exists and isn't going away anytime soon.

lynguist - 4 days ago
https://www.anthropic.com/research/tracing-thoughts-language...
The section about hallucinations is deeply relevant. Namely, Claude sometimes provides a plausible but incorrect chain-of-thought reasoning when its "true" computational path isn't available. The model genuinely believes it's giving a correct reasoning chain, but the interpretability microscope reveals it is constructing symbolic arguments backward from a conclusion.
https://en.wikipedia.org/wiki/On_Bullshit
This empirically confirms the "theory of bullshit" as a category distinct from lying. It suggests that "truth" emerges secondarily to symbolic coherence and plausibility. This means knowledge itself is fundamentally symbolic-social, not merely correspondence to external fact. Knowledge emerges from symbolic coherence, linguistic agreement, and social plausibility rather than purely from logical coherence or factual correctness.

emn13 - 4 days ago
While some of what you say is an interesting thought experiment, I think the second half of this argument has, as you'd put it, a low symbolic coherence and low plausibility. Recognizing the relevance of coherence and plausibility does not need to imply that other aspects are any less relevant. Redefining truth merely because coherence is important and sometimes misinterpreted is not at all reasonable. Logically, a falsehood can validly be derived from assumptions when those assumptions are false. That simple reasoning step alone is sufficient to explain how a coherent-looking reasoning chain can result in incorrect conclusions. Also, there are other ways a coherent-looking reasoning chain can fail. What you're saying is just not a convincing argument that we need to redefine what truth is.

dcow - 3 days ago
For this to be true, everyone must be logically on the same page. They must share the same axioms. Everyone must be operating off the same data and must not make mistakes or have bias evaluating it. Otherwise, inevitably, people will sometimes arrive at conflicting truths. In reality it's messy and not possible with 100% certainty to discern falsehoods and truthoods. Our scientific method does a pretty good job. But it's not perfect. You can't retcon reality and say "well, retrospectively we know what happened and one side was just wrong". That's called history. It's not a useful or practical working definition of truth when trying to evaluate your possible actions (individually, communally, socially, etc.) and make a decision in the moment.
I don't think it's accurate to say that we want to redefine truth. I think, more accurately, truth has inconvenient limitations and it's arguably really nice most of the time to ignore them.

jimbokun - 3 days ago
> Knowledge emerges from symbolic coherence, linguistic agreement, and social plausibility rather than purely from logical coherence or factual correctness.
This just seems like a redefinition of the word "knowledge", different from how it's commonly used. When most people say "knowledge" they mean beliefs that are also factually correct.
indigo945 - 3 days ago
As a digression, the definition of knowledge as justified true belief runs into the Gettier problems:
> Smith [...] has a justified belief that "Jones owns a Ford". Smith therefore (justifiably) concludes [...] that "Jones owns a Ford, or Brown is in Barcelona", even though Smith has no information whatsoever about the location of Brown. In fact, Jones does not own a Ford, but by sheer coincidence, Brown really is in Barcelona. Again, Smith had a belief that was true and justified, but not knowledge.
Or from the 8th century Indian philosopher Dharmottara:
> Imagine that we are seeking water on a hot day. We suddenly see water, or so we think. In fact, we are not seeing water but a mirage, but when we reach the spot, we are lucky and find water right there under a rock. Can we say that we had genuine knowledge of water? The answer seems to be negative, for we were just lucky.
More to the point, the definition of knowledge as linguistic agreement is convincingly supported by much of what has historically been common knowledge, such as the meddling of deities in human affairs, or that the people of Springfield are eating the cats.

dcow - 3 days ago
I don't think it's so clear cut… Even the most adamant "facts are immutable" person can agree that we've had trouble "fact checking" social media objectively. Fluoride is healthy; a meta-analysis of the facts reveals fluoride may be unhealthy. The truth of the matter is by and large what's socially cohesive for doctors' and dentists' narrative: that "fluoride is fine, and any argument to the contrary (even the published meta-analysis) is politically motivated nonsense".

jimbokun - 3 days ago
You are just saying identifying "knowledge" vs "opinion" is difficult to achieve.

dcow - 3 days ago
No, I'm saying I've seen reasonably minded experts in a field disagree over things-generally-considered-facts. I've seen social impetus and context shape the understanding of where to draw the line between fact and opinion. I do not believe there is an objective answer. I fundamentally believe Anthropic's explanation is rooted in real phenomena and not just a self-serving statement to explain AI hallucination in a positive quasi-intellectual light.

CodesInChaos - 4 days ago
> The model genuinely believes it's giving a correct reasoning chain, but the interpretability microscope reveals it is constructing symbolic arguments backward from a conclusion.
Sounds very human. It's quite common that we make a decision based on intuition, and the reasons we give are just post-hoc justification (for ourselves and others).

RansomStark - 4 days ago
> Sounds very human
Well yes, of course it does; that article goes out of its way to anthropomorphize LLMs, while providing very little substance.

jimbokun - 3 days ago
Isn't the point of computers to have machines that improve on default human weaknesses, not just reproduce them at scale?

canadaduane - 3 days ago
They've largely been complementary strengths, with less overlap. But human language is state-of-the-art, after hundreds of thousands of years of "development". It seems like reproducing SOTA (i.e. the current ongoing effort) is a good milestone for a computer algorithm as it gains language overlap with us.

floydnoel - 3 days ago
Why would computers have just one "point"? They have been used for endless purposes, and those uses will expand forever.

throwway120385 - 3 days ago
The other very human thing to do is invent disciplines of thought so that we don't just constantly spew bullshit all the time. For example, you could have a discipline about "pursuit of facts" which means that before you say something you mentally check yourself and make sure it's actually factually correct. This is how large portions of the populace avoid walking around spewing made-up facts and bullshit. In our rush to anthropomorphize ML systems we often forget that there are a lot of disciplines that humans are painstakingly taught from birth, and those disciplines often give rise to behaviors that the ML-based system is incapable of, like saying "I don't know the answer to that" or "I think that might be an unanswerable question."

jerf - 3 days ago
In a way, the main problem with LLMs isn't that they are wrong sometimes. We humans are used to that. We encounter people who are professionally wrong all the time: politicians, con men, scammers, even people who are just honestly wrong. We have evaluation metrics for those things. Those metrics are flawed, because there are humans on the other end intelligently gaming them too, but generally speaking we're all at least trying. LLMs don't fit those signals properly. They always sound like an intelligent person who knows what they are talking about, even when spewing absolute garbage. Even very intelligent people, even very intelligent people in the field of AI research, are routinely bamboozled by the sheer swaggering confidence these models convey in their own results.
My personal opinion is that any AI researcher who was shocked by the paper lynguist mentioned ought to be ashamed of themselves and their credulity. That was all obvious to me; I couldn't have told you the exact mechanism by which the arithmetic was being performed (though what it was doing was well in the realm of what I would have expected from a linguistic AI trying to do math), but the fact that its chain of reasoning bore no particular resemblance to how it drew its conclusions was always obvious. A neural net has no introspection on itself.
It doesn't have any idea "why" it is doing what it is doing. It can't. There's no mechanism for that to even exist. We humans are not directly introspecting our own neural nets; we're building models of our own behavior and then consulting the models, and anyone with any practice doing that should be well aware of how those models can still completely fail to predict reality!
Does that mean the chain of reasoning is "false"? How do we account for it improving performance on certain tasks then? No. It means that it is occurring at a higher level and a different level. It is quite like humans imputing reasons to their gut impulses. With training, combining gut impulses with careful reasoning is actually a very, very potent way to solve problems. The reasoning system needs training or it flies around like an unconstrained fire hose uncontrollably spraying everything around, but brought under control it is the most powerful system we know. But the models should always have been read as providing a rationalization rather than an explanation of something they couldn't possibly have been explaining.
I'm also not convinced the models have that "training" either, nor is it obvious to me how to give it to them. (You can't just prompt it into a human; it's going to be more complicated than just telling a model to "be carefully rational". Intensive and careful RLHF is a bare minimum, but finding humans who can get it right will itself be a challenge, and it's possible that what we're looking for simply doesn't exist in the bias-set of the LLM technology, which is my base case at this point.)

jmaker - 4 days ago
I haven't used Cursor yet. Some colleagues have and seemed happy. I've had GitHub Copilot on for what feels like a couple of years; a few days ago VS Code was extended to provide an agentic workflow, MCP, bring-your-own-key, and it interprets instructions in a codebase. But the UX and the outputs are bad in over 3/4 of cases. It's a nuisance to me. It injects bad code even though it has the full context. Is Cursor genuinely any better? To me it feels like people that benefit from or at least enjoy that sort of assistance and I solve vastly different problems and code very differently. I've done exhausting code reviews on juniors' and middles' PRs, but what I've been feeling lately is that I'm reviewing changes introduced by a very naive poster. It doesn't even type-check. Regardless of whether it's Claude 3.7, o1, o3-mini, or a few models from Hugging Face.
I don't understand how people find that useful. Yesterday I literally wasted half an hour on a test suite setup a colleague of mine had introduced to the codebase that wasn't good, and I tried delegating that fix to several of the Copilot models. All of them missed the point, and some even introduced security vulnerabilities in the process, invalidating the JWT validation (see the sketch after the next comment). I tried "vibe coding" it till it works, until I gave up in frustration and just used an ordinary search engine, which led me to the docs, in which I immediately found the right knob. I reverted all that crap and did the simple and correct thing. So my conclusion was simple: vibe coding and LLMs made the codebase unnecessarily more complicated and wasted my time. How on earth do people code whole apps with that?

trilbyglens - 4 days ago
I think it works until it doesn't. The nature of technical debt of this kind means you can sort of coast on things until the complexity of the system reaches such a level that it's effectively painted into a corner, and nothing but a massive teardown will do as a fix.
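A purely illustrative sketch of the JWT failure mode jmaker describes above; it is not from the thread or any real codebase, and the function names are hypothetical. The classic way an automated "fix" invalidates JWT validation is to switch signature verification off so the failing test passes. Using PyJWT:

    import jwt  # PyJWT

    SECRET = "example-secret"  # hypothetical key, normally loaded from config

    def decode_token_unsafe(token: str) -> dict:
        # The kind of "fix" that makes a failing test pass but silently
        # disables signature checking: any forged token is now accepted.
        return jwt.decode(token, options={"verify_signature": False})

    def decode_token_safe(token: str) -> dict:
        # Signature and algorithm are actually verified; tampered or
        # forged tokens raise jwt.InvalidTokenError.
        return jwt.decode(token, SECRET, algorithms=["HS256"])

    forged = jwt.encode({"sub": "admin"}, "wrong-key", algorithm="HS256")
    print(decode_token_unsafe(forged))  # accepted despite the bad signature
    print(decode_token_safe(forged))    # raises jwt.InvalidSignatureError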
nickledave - 3 days ago
Yes.
https://link.springer.com/article/10.1007/s10676-024-09775-5
> # ChatGPT is bullshit
> Recently, there has been considerable interest in large language models: machine learning systems which produce human-like text and dialogue. Applications of these systems have been plagued by persistent inaccuracies in their output; these are often called "AI hallucinations". We argue that these falsehoods, and the overall activity of large language models, is better understood as bullshit in the sense explored by Frankfurt (On Bullshit, Princeton, 2005): the models are in an important way indifferent to the truth of their outputs. We distinguish two ways in which the models can be said to be bullshitters, and argue that they clearly meet at least one of these definitions. We further argue that describing AI misrepresentations as bullshit is both a more useful and more accurate way of predicting and discussing the behaviour of these systems.

ScottBurson - a day ago
> The model genuinely believes it's giving a correct reasoning chain
The model doesn't "genuinely believe" anything.

skrebbel - 4 days ago
Offtopic, but I'm still sad that "On Bullshit" didn't go for that highest form of book titles, the single noun, like "Capital", "Sapiens", etc.

mvieira38 - 3 days ago
Starting with "On" is cooler in the philosophical tradition, though, starting in classical and medieval times, e.g. On Interpretation, On the Heavens, etc. by Aristotle, and De Veritate, De Malo, etc. by Aquinas. Capital is actually "Das Kapital", too.

pas - 3 days ago
It's very hipster, Das Kapital. (With the dot/period, check the cover: https://en.wikipedia.org/wiki/Das_Kapital#/media/File:Zentra... ) But in English it would be just "Capital", right? (Uncountable nouns are rarely used with articles; it's "happiness", not "the happiness". See also https://old.reddit.com/r/writing/comments/12hf5wd/comment/jf... )

skrebbel - 3 days ago
Yeah, so I meant the Piketty book, not Marx. But I googled it and it turns out it's actually named "Capital in the Twenty-First Century", which disappoints me even more than "On Bullshit".

pas - 3 days ago
And, for the full picture, it's probably important to consider that the main claim of the book is based on very unreliable data/methodology. (Though note that this does not necessarily make the claim false! See [1])
https://marginalrevolution.com/marginalrevolution/2017/10/pi...
And then later, similar claims about inequality were similarly made using bad methodology (data).
https://marginalrevolution.com/marginalrevolution/2023/12/th...
[1] "Indeed, in some cases, Sutch argues that it has risen more than Piketty claims. Sutch is rather a journeyman of economic history upset not about Piketty's conclusions but about the methods Piketty used to reach those conclusions."

skrebbel - 2 days ago
You misunderstand. I never read it. I simply liked the title, at least before I understood that "Capital" wasn't actually the title.

ModernMech - 4 days ago
It's a huge problem. I just can't get past it, and I get burned by it every time I try one of these products. Cursor in particular was one of the worst; the very first time I allowed it to look at my codebase, it hallucinated a missing brace (my code parsed fine), "helpfully" inserted it, and then proceeded to break everything. How am I supposed to trust and work with such a tool? To me, it seems like the equivalent of lobbing a live hand grenade into your codebase.
Don't get me wrong, I use AI every day, but it's mostly as localized code completion or to help me debug tricky issues. Meaning I've written and understand the code myself, and the AI is there to augment my abilities. AI works great if it's used as a deductive tool.
Where it runs into issues is when it's used inductively, to create things that aren't there. When it does this, I feel the hallucinations can be off the charts -- inventing APIs, function names, entire libraries, and even entire programming languages on occasion. The AI is more than happy to deliver any kind of information you want, no matter how wrong it is.
AI is not a tool, it's a tiny Kafkaesque bureaucracy inside of your codebase. Does it work today? Yes! Why does it work? Who can say! Will it work tomorrow? Fingers crossed!

yodsanklai - 4 days ago
You're not supposed to trust the tool, you're supposed to review and rework the code before submitting for external review. I use AI for rather complex tasks. It's impressive. It can make a bunch of non-trivial changes to several files and have the code compile without warnings. But I need to iterate a few times so that the code looks like what I want. That being said, I also lose time pretty regularly. There's a learning curve, and the tool would be much more useful if it was faster. It takes a few minutes to make changes, and there may be several iterations.

ryandrake - 4 days ago
> You're not supposed to trust the tool, you're supposed to review and rework the code before submitting for external review.
It sounds like the guys in this article should not have trusted AI to go fully open loop on their customer support system.

schmichael - 4 days ago
That should be well understood by all "customers" of AI. You can't trust it to do anything correctly without human feedback/review and human quality control.

ModernMech - 4 days ago
> You're not supposed to trust the tool
This is just an incredible statement. I can't think of another development tool we'd say this about. I'm not saying you're wrong, or that it's wrong to have tools we can't trust, just... wow... what a sea change. Imagine! Imagine if 0.05% of the time gcc just injected random code into your binaries. Imagine you swing a hammer and 1% of the time it just phases into the wall. Tools are supposed to be reliable.

arvinsim - 4 days ago
There are no existing AI tools that guarantee correct code 100% of the time. If there were such a tool, programmers would be on a path of immediate reskilling, or would lose their jobs very quickly.

ryandrake - 4 days ago
Imagine if your compiler just randomly and non-deterministically compiled valid code to incorrect binaries, and the tool's developer couldn't really tell you why it happens, how often it was expected to happen, or how severe the problem was expected to be, and told you to just not trust your compiler to create correct machine code. Imagine if your calculator app randomly and non-deterministically performed arithmetic incorrectly, and you similarly couldn't get correctness expectations from the developer. Imagine if any of your communication tools randomly and non-deterministically translated your messages into gibberish... I think we'd all throw away such tools, but we're expected to accept it if it's an "AI tool"?

andrei_says_ - 4 days ago
Imagine that you yourself never use these tools directly but your employees do. And the sellers of said tools swear that the tools are amazing and correct and will save you millions. They keep telling you that any employee who highlights problems with the tools is just trying to save their job. Your investors tell you that the toolmakers are already saving money for your competitors. Now, do you want that second house and White Lotus vacation or not?
Making good tools is difficult. Bending perception ("is reality") is easier, and enterprise sales, just like good propaganda, works. The gold rush will leave a lot of bodies behind, but the shovelmakers will make a killing.
arvinsim - 4 days ago
If you think of AI like a compiler, yes, we should throw away such tools, because we expect correctness and deterministic outcomes.
If you think of AI like a programmer, no, we shouldn't throw away such tools, because we accept them as imperfect and we still need to review.

bigstrat2003 - 4 days ago
> If you think of AI like a programmer, no we shouldn't throw away such tools because we accept them as imperfect and we still need to review.
This is a common argument but I don't think it holds up. A human learns. If one of my teammates or I make a mistake, when we realize it we learn not to make that mistake in the future. These AI tools don't do that. You could use a model for a year, and it'll be just as unreliable as it is today. The fact that they can't learn makes them a nonstarter compared to humans.

ToValueFunfetti - 4 days ago
If the only calculators that existed failed at 5% of the calculations, or if the only communication tools miscommunicated 5% of the time, we would still use both all the time. They would be far less than 95% as useful as perfect versions, but drastically better than not having the tools at all.

gitremote - 4 days ago
Absolutely not. We'd just do the calculations by hand, which is better than running the 95%-correct calculator and then doing the calculations by hand anyway to verify its output.

ToValueFunfetti - 4 days ago
Suppose you work in a field where getting calculations right is critical. Your engineers make mistakes less than .01% of the time, but they do a lot of calculations, and each mistake could cost $millions or lives. Double- and triple-checking help a lot, but they're costly. Here's a machine that verifies 95% of calculations, but you'd still have to do 5% of the work. Shall I throw it away?
Unreliable tools have a good deal of utility. That's an example of them helping reduce the problem space, but they can also be useful in situations where having a 95%-confidence guess now matters more than a 99.99%-confidence one in ten minutes: firing mortars in active combat, say. There are situations where validation is easier than computation; canonically this is factoring, but even division is much simpler than multiplication. It could very easily save you time to multiply each quotient the calculator gives you by the divisor and check it against the dividend, even though for the 5% that are wrong you end up doing both a multiplication and a division (see the sketch at the end of the thread).
edit: I submit this comment and click to go to the front page and right at the top is Unsure Calculator (no relevance). Sorry, I had to mention this.

diputsmonro - 4 days ago
> Here's a machine that verifies 95% of calculations, but you'd still have to do 5% of the work.
The problem is that you don't know which 5% are wrong. The AI is confidently wrong all the time. So the only way to be sure is to double-check everything, and at some point it's easier to just do it the right way.
Sure, some things don't need to be perfect. But how much do you really want to risk? This company thought a little bit of potential misinformation was acceptable, and so it caused a completely self-inflicted PR scandal, pissed off their customer base, and lost them a lot of confidence and revenue. Was that 5% error worth it?
Stories like this are going to keep coming the more we rely on AI to do things humans should be doing. Someday you'll be affected by the fallout of some system failing because you happen to wind up in the 5% failure gap that some manager thought was acceptable (if that manager even ran a calculation and didn't just blindly trust whatever some other AI system told them). I just hope it's something as trivial as an IDE and not something in your car, your bank, or your hospital. But certainly LLMs will be irresponsibly shoved into all three within the next few years, if they're not there already.
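A minimal sketch of ToValueFunfetti's check-by-multiplication point, purely illustrative: the 5% error rate and the unreliable_divide stand-in below are hypothetical, not any real calculator.

    import random

    def unreliable_divide(a: int, b: int) -> int:
        """Hypothetical flaky calculator: returns a wrong quotient ~5% of the time."""
        q = a // b
        return q + 1 if random.random() < 0.05 else q

    def checked_divide(a: int, b: int) -> int:
        q = unreliable_divide(a, b)
        # Verification is the cheap direction: one multiplication tells us
        # whether the quotient is exact.
        if q * b == a:
            return q
        # Only the ~5% of failures get redone "by hand" (here, exact division).
        return a // b

    # Every result is exact, but the expensive redo only happens for the
    # small fraction that the cheap multiplication check rejected.
    assert all(checked_divide(n * 7, 7) == n for n in range(1_000))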