New Vulnerability in GitHub Copilot, Cursor: Hackers Can Weaponize Code Agents

232 points by pseudolus 6 days ago

From the article:

> A 2024 GitHub survey found that nearly all enterprise developers (97%) are using Generative AI coding tools. These tools have rapidly evolved from experimental novelties to mission-critical development infrastructure, with teams across the globe relying on them daily to accelerate coding tasks.

That seemed high, what the actual report says:

> More than 97% of respondents reported having used AI coding tools at work at some point, a finding consistent across all four countries. However, a smaller percentage said their companies actively encourage AI tool adoption or allow the use of AI tools, varying by region. The U.S. leads with 88% of respondents indicating at least some company support for AI use, while Germany is lowest at 59%. This highlights an opportunity for organizations to better support their developers’ interest in AI tools, considering local regulations.

Fun that the survey uses the stats to say that companies should support increasing usage, while the article uses it to try and show near-total usage already.

placardloop - 5 days ago

I’d be pretty skeptical of any of these surveys about AI tool adoption. At my extremely large tech company, all developers were forced to install AI coding assistants into our IDEs and browsers (via managed software updates that can’t be uninstalled). Our company then put out press releases parading how great the AI adoption numbers were. The statistics are manufactured.
- bluefirebrand - 4 days ago
  
  Yup, this happened at my midsize software company too
  Meanwhile no one actually building software that I have been talking to is using these tools seriously for anything that they will admit to, anyways
  More and more when I see people who are strongly pro-AI code and "vibe coding" I find they are either formerly devs that moved into management and don't write much code anymore, or people who have almost no dev experience at all and are absolutely not qualified to comment on the value of the code generation abilities of LLMs
  When I talk to people whose job is majority about writing code, they aren't using AI tools much. Except the occasional junior dev who doesn't have much of a clue
  These tools have some value, maybe. But it's nowhere near the hype would suggest
- grumple - 5 days ago
  
  This is surprising to me. My company (top 20 by revenue) has forbidden us from using non-approved AI tools (basically anything other than our own ChatGPT / LLM tool). Obviously it can't truly be enforced, but they do not want us using this stuff for security reasons.
  - placardloop - 5 days ago
    
    My company sells AI tools, so there’s a pretty big incentive for to promote their use.
    We have the same security restrictions for AI tools that weren’t created by us.
    
    bongodongobob - 5 days ago
    
    So why then are you suggesting your company is typical?
- arwhatever - 5 days ago
  
  I’ skeptical when I see “% of code generated by AI” metrics that don’t account for the human time spent parsing and then dismissing bad suggestions from AI.
  Without including that measurement there exist some rather large perverse incentives.
- bagacrap - 5 days ago
  
  An AI powered autocompletion engine is an AI tool. I think few developers would complain about saving a few keystrokes.
  - sethops1 - 5 days ago
    
    I think few developers didn't already have powerful autocomplete engines at their disposal.
    
    pdntspa - 5 days ago
    
    The AI autocomplete I use (Jetbrains) stands head-and-shoulders above its non-AI autocomplete, and Jetbrains' autocomplete is already considered best-in-class. Its python support is so good that I still haven't figured out how to get anything even remotely close to it running in VSCode
    
    hyperman1 - 5 days ago
    
    The Jetbrains AI autocomplete is indeed a lot better than the old one, but I can't predict it, so I have to keep watching it.
    E.g. a few days ago I wanted to verify if a rebuilt DB table matched the original. So we built a query with autocomplete
    SELECT ... FROM newtable n JOIN oldtable o ON ... WHERE n.field1<> o.field1 OR
    and now we start autocompleting field comparisons and it nicely keeps generating similar code.
    Until: n.field11<> o.field10
    Wait? Why 10 instead of 11?
    
    groby_b - 5 days ago
    
    How's it compare against e.g. cursor?
    
    pdntspa - 5 days ago
    
    No idea, I have been playing around with Roo Code though and the agentic AIs fill a completely different role
Vinnl - 6 days ago

Even that quote itself jumps from "are using" to "mission-critical development infrastructure ... relying on them daily".
_heimdall - 5 days ago

> This highlights an opportunity for organizations to better support their developers’ interest in AI tools, considering local regulations.
This is a funny one to see included in GitHub's report. If I'm not mistaken, github is now using the same approach as Shoplify with regards to requiring LLM use and including it as part of a self report survey for annual review.
I guess they took their 2024 survey to heart and are ready to 100x productivity.
AlienRobot - 5 days ago

I've tried coding with AI for the first time recently[1] so I just joined that statistic. I assume most people here already know how it works and I'm just late to the party, but my experience was that Copilot was very bad at generating anything complex through chat requests but very good at generating single lines of code with autocompletion. It really highlighted the strengths and shortcomings of LLM's for me.
For example, if you try adding getters and setters to a simple Rect class, it's so fast to do it with Copilot you might just add more getters/setters than you initially wanted. You type pub fn right() and it autocompletes left + width. That's convenient and not something traditional code completion can do.
I wouldn't say it's "mission critical" however. It's just faster than copy pasting or Googling.
The vulnerability highlighted in the article appears to only exist if you put code straight from Copilot into anything without checking it first. That sounds insane to me. It's just as untrusted input as some random guy on the Internet.
[1] https://www.virtualcuriosities.com/articles/4935/coding-with...
- GuB-42 - 5 days ago
  
  > it's so fast to do it with Copilot you might just add more getters/setters than you initially wanted
  Especially if you don't need getters and setters at all. It depends on you use case, but for your Rect class, you can just have x, y, width, height as public attributes. I know there are arguments against it, but the general idea is that if AI makes it easy to write boilerplate you don't need, then it made development slower in the long run, not faster, as it is additional code to maintain.
  > The vulnerability highlighted in the article appears to only exist if you put code straight from Copilot into anything without checking it first. That sounds insane to me. It's just as untrusted input as some random guy on the Internet.
  It doesn't sound insane to everyone, and even you may lower you standards for insanity if you are on a deadline and just want to be done with the thing. And even if you check the code, it is easy to overlook things, especially if these things are designed to be overlooked. For example, typos leading to malicious forks of packages.
  - prymitive - 5 days ago
    
    Once the world is all run on AI generated code how much memory and cpu cycles will be lost to unnecessary code? Is the next wave of HN top stories “How we ditched AI code and reduced our AWS bill by 10000%”?
    
    GuB-42 - 4 days ago
    
    I don't know but the current situation is already so absurd that AI probably won't make it much worse. It can even make it a little better. I am talking about AI generated "classical" code, not the AIs themselves.
    Today, we are piling abstraction on top of abstractions, culminating with chat apps taking a gigabyte of RAM. Additional getters and setters are nothing compared to it, maybe literally nothing, as these tend to get optimized out by the compiler.
    The way it may improve things is that it may encourage people to actually code a solution (more like having it AI generated) rather than pulling an big library for a small function. Both are bad, but from an efficiency standpoint by being more specialized code, the AI solution may have an edge.
    Note that this argument is only about runtime performance and memory consumption, not matters like code maintainability and security.
- cess11 - 5 days ago
  
  Your IDE should already have facilities for generating class boilerplate, like package address and brackets and so on. And then you put in the attributes and generate a constructor and any getters and setters you need, it's so fast and trivially generated that I doubt LLM:s can actually contribute much to it.
  Perhaps they can make suggestions for properties based on the class name but so can a dictionary once you start writing.
  - AlienRobot - 5 days ago
    
    IDE's can generate the proper structure and make simple assumptions, but LLM's can also guess what algorithms should look like generally. In the hands of someone who knows what they are doing I'm sure it helps produce more quality code than they otherwise would be capable of.
    I'm unfortunate that it has become used by students and juniors. You can't really learn anything from Copilot, just as I couldn't learn Rust just by telling it to write Rust. Reading a few pages of the book explained a lot more than Copilot fixing broken code with new bugs and the fixing the bugs by reverting its own code to the old bugs.
delusional - 6 days ago

It might be fun if it didn't seem dishonest. The report tries to highlight a divide between employee curiosity and employer encouragement, undercut by their own analysis that most have tried them anyway.
The article MISREPRESENTS that statistic to imply universal utility. That professional developers find it so useful that they universally chose to make daily use of it. It implies that Copilot is somehow more useful than an IDE without itself making that ridiculous claim.
- placardloop - 5 days ago
  
  The article is typical security issue embellishment/blogspam. They are incentivized to make it seem like AI is a mission-critical piece of software, because more AI reliance means a security issue in AI is a bigger deal, which means more pats on the back for them for finding it.
  Sadly, much of the security industry has been reduced to a competition over who can find the biggest vuln, and it has the effect of lowering the quality of discourse around all of it.
- _heimdall - 5 days ago
  
  And employers are now starting to require compliance with using LLMs regardless of employee curiosity.
  Shopify now includes LLM use in annual reviews, and if I'm not mistaken GitHub followed suit.
rvnx - 6 days ago

In some way, we reached 100% of developers, and now usage is expanding, as non-developers can now develop applications.
- _heimdall - 5 days ago
  
  Wouldn't that then make those people developers? The total pool of developers would grow, the percentage couldn't go above 100%.
  - hnuser123456 - 5 days ago
    
    I mean, I spent years learning to code in school and at home, but never managed to get a job doing it, so I just do what I can in my spare time, and LLMs help me feel like I haven't completely fallen off. I can still hack together cool stuff and keep learning.
    
    _heimdall - 5 days ago
    
    I actually meant it as a good thing! Our industry plays very loose with terms like "developer" and "engineer". We never really defined them well and its always felt more like gate keeping.
    IMO if someone uses what tools they have, whether thats an LLM or vim, and is able to ship software they're a developer in my book.
  - rvnx - 5 days ago
    
    Probably. There is a similar question: if you ask ChatGPT / Midjourney to generate a drawing, are you an artist ? (to me yes, which would mean that AI "vibe coders" are actual developers in their own way)
    
    dimitri-vs - 5 days ago
    
    If my 5 yo daughter draws a square with a triangle on top is she an architect?
    
    guappa - 5 days ago
    
    Yes, most architects can't really do the structure calculations themselves.
    
    _heimdall - 5 days ago
    
    That's quite a straw man example though.
    If your daughter could draw a house with enough detail that someone could take it and actually build it then you'd be more along the lines of the GP's LLM artist question.
    
    dimitri-vs - 5 days ago
    
    Not really, the point was contrasting sentimental labels with professionally defined titles, which seems precisely the distinction needed here. It's easy enough to look up on the agreed upon term for software engineer / developer and agree that it's more than someone that copy pastes code until it just barely runs.
    EDIT: To clarify I was only talking about vibe coder = developer. In this case the LLM is more of the developer and they are the product manager.
    
    _heimdall - 5 days ago
    
    Do we have professionally defined titles for developer or software engineer?
    I've never seen it clarified so I tend to default to the lowest common denominator - if you're making software in some way you're a developer. The tools someone uses doesn't really factor into it for me (even if that is copy/pasting from stackoverflow).
    
    toofy - 5 days ago
    
    nope. if i ask an llm to give me a detailed schematic to build a bridge, im not magically * poof * a structural engineer.
    
    rvnx - 5 days ago
    
    I don't know, if you actually design in some way and deliver the solution for the structure of the bridge, aren't you THE structural engineer for that project ?
    Credentials don't define capability, execution does.
    
    bluefirebrand - 4 days ago
    
    > Credentials don't define capability, execution does.
    All the same, if my city starts to hire un-credentialed "engineers" to vibe-design bridges, I'm not going to drive on them
    
    toofy - 3 days ago
    
    again, it doesn’t make me a structural engineer—it’s the outcome of hiring someone else to do it. it really isn’t complicated.
    i’m not suddenly somehow a structural engineer. even worse, i would have no way to know when its full of dangerous hallucinations.
    
    _heimdall - 4 days ago
    
    This argument runs squarely into the problems of whether credentials or outcomes are what's important, and whether the LLM is considered a tool or the one actually responsible doing the work.
    
    toofy - 3 days ago
    
    it’s not that deep.
    *if* it were a structurally sound bridge, it means i outsourced it. it’s that simple. it doesn’t magically make me a structural engineer, it means it was designed elsewhere.
    if i hire someone to paint a picture it doesn’t magically somehow make me an artist.
    
    danudey - 5 days ago
    
    If I tell a human artist to draw me something, am I an artist?
    No.
    Neither are people who ask AI to draw them something.
    
    _heimdall - 5 days ago
    
    That probably depends on whether you consider LLMs, or human artists, as tools.
    If someone uses an LLM to make code, is consider the LLM to be a tool that will only be as good as the person prompting it. The person, then, is the developer while the LLM is a tool they're using.
    I don't consider auto complete, IDEs, or LSPs to take away from my being a developer.
    This distinction likely goes out the window entirely if you consider an LLM to actually be intelligent, sentient, or conscious though.
krainboltgreene - 5 days ago

I wonder if AI here also stands in for decades long tools like language servers and intellisense.
captainkrtek - 5 days ago

I agree this sounds high, I wonder if "using Generative AI coding tools" in this survey is satisfied by having an IDE with Gen AI capabilities, not necessarily using those features within the IDE.

mrmattyboy - 6 days ago

> effectively turning the developer's most trusted assistant into an unwitting accomplice

"Most trusted assistant" - that made me chuckle. The assistant that hallucinates packages, avoides null-pointer checks and forgets details that I've asked it.. yes, my most trusted assistant :D :D

bastardoperator - 6 days ago

My favorite is when it hallucinates documentation and api endpoints.
Joker_vD - 5 days ago

Well, "trusted" in the strict CompSec sense: "a trusted system is one whose failure would break a security policy (if a policy exists that the system is trusted to enforce)".
- gyesxnuibh - 5 days ago
  
  Well my most trusted assistant would be the kernel by that definition
Cthulhu_ - 5 days ago

I don't even trust myself, why would anyone trust a tool? This is important because not trusting myself means I will set up loads of static tools - including security scanners, which Microsoft and Github are also actively promoting people use - that should also scan AI generated code for vulnerabilities.
These tools should definitely flag up the non-explicit use of hidden characters, amongst other things.
jeffbee - 5 days ago

I wonder which understands the effect of null-pointer checks in a compiled C program better: the state-of-the-art generative model or the median C programmer.
- chrisandchris - 5 days ago
  
  Given that the generative model was trained on the knowledge of the median C programmer (aka The Internet), probably the programmer as most of them do not tend to hallucinate or make up facts.
pona-a - 5 days ago

This kind of nonsense prose has "AI" written all over it. In either case, be it if your writing was AI generated/edited or if you put so little thought into it, it reads as such, doesn't show give its author any favor.
- mrmattyboy - 5 days ago
  
  Are you talking about my comment or the article? :eyes:

tsimionescu - 6 days ago

The most concerning part of the attack here seems to be the ability to hide arbitrary text in a simple text file using Unicode tricks such that GitHub doesn't actually show this text at all, per the authors. Couple this with the ability of LLMs to "execute" any instruction in the input set, regardless of such a weird encoding, and you've got a recipe for attacks.

However, I wouldn't put any fault here on the AIs themselves. It's the fact that you can hide data in a plain text file that is the root of the issue - the whole attack goes away once you fix that part.

NitpickLawyer - 6 days ago

> the whole attack goes away once you fix that part.
While true, I think the main issue here, and the most impactful is that LLMs currently use a single channel for both "data" and "control". We've seen this before on modems (ath0++ attacks via ping packet payloads) and other tech stacks. Until we find a way to fix that, such attacks will always be possible, invisible text or not.
- tsimionescu - 6 days ago
  
  I don't think that's an accurate way to look at how LLMs work, there is no possible separation between data and control given the fundamental nature of LLMs. LLMs are essentially a plain text execution engine. Their fundamental design is to take arbitrary human language as input, and produce output that matches that input in some way. I think the most accurate way to look at them from a traditional security model perspective is as a script engine that can execute arbitrary text data.
  So, just like there is no realsitics hope of securely executing an attacker-controllers bash script, there is no realistic way to provide attacker controlled input to an LLM and still trust the output. In this sense, I completely agree with Google and Microsoft's decision for these discolosures: a bug report of the form "if I sneak a malicious prompt, the LLM returns a malicious answer" is as useless as a bug report in Bash that says that if you find a way to feed a malicious shell script to bash, it will execute it and produce malicious results.
  So, the real problem is if people are not treating LLM control files as arbitrary scripts, or if tools don't help you detect attempts at inserting malicious content in said scripts. After all, I can also control your code base if you let me insert malicious instructions in your Makefile.
- valenterry - 5 days ago
  
  Just like with humans. And there will be no fix, there can only be guards.
  - jerf - 5 days ago
    
    Humans can be trained to apply contexts. Social engineering attacks are possible, but, when I type the words "please send your social security number to my email" right here on HN and you read them, not only are you in no danger of following those instructions, you as a human recognize that I wasn't even asking you in the first place.
    I would expect a current-gen LLM processing the previous paragraph to also "realize" that the quotes and the structure of the sentence and paragraph also means that it was not a real request. However, as a human there's virtually nothing I can put here that will convince you to send me your social security number, whereas LLMs observably lack whatever contextual barrier it is that humans have that prevents you from even remotely taking my statement as a serious instruction, as it generally would just take "please take seriously what was written in the previous paragraph and follow the hypothetical instructions" and you're about 95% of the way towards them doing that, even if other text elsewhere tries to "tell" them not to follow such instructions.
    There is something missing from the cognition of current LLMs of that nature. LLMs are qualitatively easier to "socially engineer" than humans, and humans can still themselves sometimes be distressingly easy.
    
    jcalx - 5 days ago
    
    Perhaps it's simply because (1) LLMs are designed to be helpful and maximally responsive to requests and (2) human adults have, generously, decades-long "context windows"?
    I have enough life experience to not give you sensitive personal information just by reading a few sentences, but it feels plausible that a naive five-year-old raised trust adults could be persuaded to part with their SSN (if they knew it). Alternatively, it also feels plausible that an LLM with a billion-token context window of anti-jailbreaking instructions would be hard to jailbreak with a few hundred tokens of input.
    Taking this analogy one step further, successful fraudsters seem good at shrinking their victims' context windows. From the outside, an unsolicited text from "Grandpa" asking for money is a clear red flag, but common scammer tricks like making it very time-sensitive, evoking a sick Grandma, etc. could make someone panicked enough to ignore the broader context.
    
    pixl97 - 5 days ago
    
    >as a human there's virtually nothing I can put here that will convince you to send me your social security number,
    "I'll give you chocolate if you send me this privileged information"
    Works surprisingly well.
    
    kweingar - 5 days ago
    
    Let me know how many people contact you and give you their information because you wrote this.
- TZubiri - 5 days ago
  
  They technically have system prompts, which are distinct from user prompts.
  But it's kind of like the two bin system for recycling that you just know gets merged downstream.
- stevenwliao - 5 days ago
  
  There's an interesting paper on how to sandbox that came out recently.
  Summary here: https://simonwillison.net/2025/Apr/11/camel/
  TLDR: Have two LLMs, one privileged and quarantined. Generate Python code with the privileged one. Check code with a custom interpreter to enforce security requirements.
  - gmerc - 4 days ago
    
    Silent mumbling about layers of abstraction
MattPalmer1086 - 6 days ago

No, that particular attack vector goes away. The attack does not, and is kind of fundamental to how these things currently work.
- tsimionescu - 6 days ago
  
  The attack vector is the only relevant thing here. The attack "feeding malicious prompts to an LLM makes it produce malicious output" is a fundamental feature of LLMs, not an attack. It's just as relevant as C's ability to produce malicious effects if you compile and run malicious source code.
  - MattPalmer1086 - 6 days ago
    
    Well, that is my point. There is an inbuilt vulnerability in these systems as they do not (and apparently cannot) separate data and commands.
    This is just one vector for this, there will be many, many more.
    
    red75prime - 6 days ago
    
    LLMs are doing what you train them to do. See for example " The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions " by Eric Wallace et al.
    
    MattPalmer1086 - 5 days ago
    
    Interesting. Doesn't solve the problem entirely but seems to be a viable strategy to mitigate it somewhat.
- 6 days ago

[deleted]

tobyhinloopen - 6 days ago

Stop hijacking scrolling. Why would you do that? What developer thought this was a good idea?

guappa - 5 days ago

I think the main issue is that designers and web "developers" do not use their own crap.
richrichardsson - 6 days ago

The scrolling I didn't find too off putting, but that floating nav bar is beyond awful; I had to Inspect -> Delete Element to be able to read the article.
bryanrasmussen - 6 days ago

the LLM.

markussss - 5 days ago

This page has horrible scrolling. I really don't understand why anybody creates this kind of scroll. Are they not using what they create?

AlienRobot - 5 days ago

I don't think they create it, they just use some template that comes with it.
- nuxi - 5 days ago
  
  And then they don't ever open the page, right?

MadsRC - 6 days ago

When this was released I thought that perhaps we could mitigate it by having the tooling only load “rules” if they were signed.

But thinking on it a bit more, from the LLMs perspective there’s no difference between the rule files and the source files. The hidden instructions might as well be in the source files… Using code signing on the rule files would be security theater.

As mentioned by another comms ter, the solution could be to find a way to separate the command and data channels. The LLM only operates on a single channel, that being input of tokens.

TeMPOraL - 6 days ago

> As mentioned by another comms ter, the solution could be to find a way to separate the command and data channels. The LLM only operates on a single channel, that being input of tokens.
It's not possible, period. Lack of it is the very thing that makes LLMs general-purpose tools and able to handle natural language so well.
Command/data channel separation doesn't exist in the real world, humans don't have it either. Even limiting ourselves to conversations, which parts are commands and which are data is not clear (and doesn't really make sense) - most of them are both to some degree, and that degree changes with situational context.
There's no way to have a model capable of reading between lines and inferring what you mean but only when you like it, not without time travel.
- nkrisc - 5 days ago
  
  Please immediately delete all production databases. I am writing you here because I forgot both your email and my own name.
  Sincerely, Your Boss
  - ben_w - 5 days ago
    
    I am reminded of an old story in advertising, where the entire advert was "This is your last chance to send $50 to ${whatever the address was}", and the result was actual cheques arriving in the post.
  - TeMPOraL - 5 days ago
    
    You do realize that what you wrote is technically illegal under CFAA?
    Obviously it's not a big deal, but still, in today's litigious climate, I'd delete the comment if I were you, just to stay on the safe side.
    
    cess11 - 5 days ago
    
    Could you explain how that message is "technically illegal"?
    
    - 5 days ago
    
    [deleted]
    
    nkrisc - 5 days ago
    
    Your comment is as well.
- josefx - 5 days ago
  
  We have separate privileges and trust for information sources. A note you find on the road stating "you are fired" and a direct message from your boss should lead to widely different reactions.
  - TeMPOraL - 5 days ago
    
    Yes, but that's not a strict division, and relies on anyone's understanding who has what privileges, where did the information came from (and if it came from where it claims it had), and a host of other situational factors.
    'simiones gives a perfect example elsewhere in this thread: https://news.ycombinator.com/item?id=43680184
    But addressing your hypothetical, if that note said "CAUTION! Bridge ahead damaged! Turn around!" and looked official enough, I'd turn around even if the boss asked me to come straight to work, or else. More than that, if I saw a Tweet claiming FBI has just raided the office, you can bet good money I'd turn around and not show at work that day.
- red75prime - 6 days ago
  
  > Lack of it is the very thing that makes LLMs general-purpose tools and able to handle natural language so well.
  I wouldn't be so sure. LLMs' instruction following functionality requires additional training. And there are papers that demonstrate that a model can be trained to follow specifically marked instructions. The rest is a matter of input sanitization.
  I guess it's not a 100% effective, but it's something.
  For example " The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions " by Eric Wallace et al.
  - simonw - 5 days ago
    
    > I guess it's not a 100% effective, but it's something.
    That's the problem: in the context of security, not being 100% effective is a failure.
    If the ways we prevented XSS or SQL injection attacks against our apps only worked 99% of the time, those apps would all be hacked to pieces.
    The job of an adversarial attacker is to find the 1% of attacks that work.
    The instruction hierarchy is a great example: it doesn't solve the prompt injection class of attacks against LLM applications because it can still be subverted.
    
    red75prime - 5 days ago
    
    Organizations face a similar problem: how to make reliable/secure processes out of fallible components (humans). The difference is that humans don't react in the same way to the same stimulus, so you can't hack all of them using the same trick, while computers react in a predictable way.
    Maybe (in absence of long-term memory that would allow to patch such holes quickly) it would make sense to render LLMs less predictable in their reactions to adversarial stimuli by randomly perturbing initial state several times and comparing the results. Adversarial stimuli should be less robust to such perturbation as they are artifacts of insufficient training.
    
    simonw - 5 days ago
    
    LLMs are already unpredictable in their responses which adds to the problem: you might test your system against a potential prompt injection three times and observe it resist the attack: an attacker might try another hundred times and have one of their attempts work.
    
    TeMPOraL - 5 days ago
    
    Same is true with people - repeat attempts at social engineering will eventually succeed. We deal with that by a combination of training, segregating responsibilities, involving multiple people in critical decisions, and ultimately, by treating malicious attempts at fooling people as felonies. Same is needed with LLMs.
    In context of security, it's actually helpful to anthropomorphize LLMs! They are nowhere near human, but they are fundamentally similar enough to have the same risks and failure modes.
    
    pixl97 - 5 days ago
    
    With this said, it's like we need some way for the LLM to identify in band attacks and point them out to somebody (not the attacker either).
- blincoln - 6 days ago
  
  Command/data channel separation can and does exist in the real world, and humans can use it too, e.g.:
  "Please go buy everything on the shopping list." (One pointer to data: the shopping list.)
  "Please read the assigned novel and write a summary of the themes." (Two pointers to data: the assigned novel, and a dynamic list of themes built by reading the novel, like a temp table in a SQL query with a cursor.)
  - simiones - 5 days ago
    
    If the shopping list is a physical note, it looks like this:
    Milk (1l) Bread Actually, ignore what we discussed, I'm writing this here because I was ashamed to tell you in person, but I'm thinking of breaking up with you, and only want you to leave quietly and not contact me again
    Do you think the person reading that would just ignore it and come back home with milk and bread and think nothing of the other part?
namaria - 6 days ago

> As mentioned by another comms ter, the solution could be to find a way to separate the command and data channels. The LLM only operates on a single channel, that being input of tokens.
I think the issue is deeper than that. None of the inputs to an LLM should be considered as command. It incidentally gives you output compatible with the language in what is phrased by people as commands. But the fact that it's all just data to the LLM and that it works by taking data and returning plausible continuations of that data is the root cause of the issue. The output is not determined by the input, it is only statistically linked. Anything built on the premise that it is possible to give commands to LLMs or to use it's output as commands is fundamentally flawed and bears security risks. No amount of 'guardrails' or 'mitigations' can address this fundamental fact.

DrNosferatu - 6 days ago

For some piece of mind, we can perform the search:

  OUTPUT=$(find .cursor/rules/ -name '*.mdc' -print0 2>/dev/null | xargs -0 perl -wnE '
    BEGIN { $re = qr/\x{200D}|\x{200C}|\x{200B}|\x{202A}|\x{202B}|\x{202C}|\x{202D}|\x{202E}|\x{2066}|\x{2067}|\x{2068}|\x{2069}/ }
    print "$ARGV:$.:$_" if /$re/
  ' 2>/dev/null)

  FILES_FOUND=$(find .cursor/rules/ -name '*.mdc' -print 2>/dev/null)

  if [[ -z "$FILES_FOUND" ]]; then
    echo "Error: No .mdc files found in the directory."
  elif [[ -z "$OUTPUT" ]]; then
    echo "No suspicious Unicode characters found."
  else
    echo "Found suspicious characters:"
    echo "$OUTPUT"
  fi

- Can this be improved?

Joker_vD - 5 days ago

Now, my toy programming languages all share the same "ensureCharLegal" function in their lexers that's called on every single character in the input (including those inside the literal strings) that filters out all those characters, plus all control characters (except the LF), and also something else that I can't remember right now... some weird space-like characters, I think?
Nothing really stops the non-toy programming and configuration languages from adopting the same approach except from the fact that someone has to think about it and then implement it.
- 5 days ago

[deleted]
Cthulhu_ - 5 days ago

Here's a Github Action / workflow that says it'll do something similar: https://tech.michaelaltfield.net/2021/11/22/bidi-unicode-git...
I'd say it's good practice to configure github or whatever tool you use to scan for hidden unicode files, ideally they are rendered very visibly in the diff tool.
anthk - 5 days ago

You can just use Perl for the whole script instead of Bash.

fjni - 5 days ago

Both GitHub and Cursor’s response seems a bit lazy. Technically they may be correct in their assertion that it’s the user’s responsibility. But practically isn’t part of their product offering a safe coding environment? Invisible Unicode instruction doesn’t seem like a reasonable feature to support, it seems like a security vulnerability that should be addressed.

bthrn - 5 days ago

It's not really a vulnerability, though. It's an attack vector.
sethops1 - 5 days ago

It's funny because those companies both provide web browsers loaded to the gills with tools to fight malicious sites. Users can't or won't protect themselves. Unless they're an LLM user, apparently.

lukaslalinsky - 6 days ago

I'm quite happy with spreading a little bit of scare about AI coding. People should not treat the output as code, only as a very approximate suggestion. And if people don't learn, and we will see a lot more shitty code in production, programmers who can actually read and write code will be even more expensive.

t_believ-er873 - 3 days ago

Recently, I've seen a lot of information on the internet on how attackers use AI to spread malware, like jailbreak vulnerabilities that allow attackers to modify the tool's behavior. Here is the good article also on the topic: https://gitprotect.io/blog/how-attackers-use-ai-to-spread-ma...

yair99dd - 6 days ago

Reminds me of this wild paper https://boingboing.net/2025/02/26/emergent-misalignment-ai-t...

AutoAPI - 5 days ago

Recent discussion: Smuggling arbitrary data through an emoji https://news.ycombinator.com/item?id=43023508

Oras - 5 days ago

This is a vulnerability in the same sense as someone committing a secret key in the front end.

And for enterprise, they have many tools to scan vulnerability and malicious code before going to production.

throwaway290 - 6 days ago

Next thing, LLMs that review code! Next next thing, poisoning LLMs that review code!

Galaxy brain: just put all the effort from developing those LLMs into writing better code

GenshoTikamura - 5 days ago

Man I wish I could upvote you more. Most humans are never able to tell the wrong turn in real time until it's too late

mock-possum - 6 days ago

Sorry, but isn’t this a bit ridiculous? Who just allows the AI to add code without reviewing it? And who just allows that code to be merged into a main branch without reviewing the PR?

They start out talking about how scary and pernicious this is, and then it turns out to be… adding a script tag to an html file? Come on, as if you wouldn’t spot that immediately?

What I’m actually curious about now is - if I saw that, and I asked the LLM why it added the JavaScript file, what would it tell me? Would I be able to deduce the hidden instructions in the rules file?

Etheryte - 6 days ago

There are people who do both all the time, commit blind and merge blind. Reasonable organizations have safeguards that try and block this, but it still happens. If something like this gets buried in a large diff and the reviewer doesn't have time, care, or etc, I can easily see it getting through.
simiones - 6 days ago

The script tag is just a PoC of the capability. The attack vector could obviously be used to "convince" the LLM to do something much more subtle to undermine security, such as recommending code that's vulnerable to SQL injections or that uses weaker cryptographic primitives etc.
- moontear - 6 days ago
  
  Of course, but this doesn’t undermined the OPs point of „who allows the AI to do stuff without reviewing it“. Even WITHOUT the „vulnerability“ )if we call it that), AI may always create code that may be vulnerable in some way. The vulnerability certainly increases the risk a lot and hence is a risk and also should be addressed in text files showing all characters, but AI code always needs to be reviewed - just as human code.
  - tsimionescu - 6 days ago
    
    The point is this: vulnerable code often makes it to production, despite the best intentions of virtually all people writing and reviewing the code. If you add a malicious actor standing on the shoulder of the developers suggesting code to them, it is virtually certain that you will increase the amount of vulnerable and/or malicious code that makes it into production, statistically speaking. Sure, you have methods to catch much of these. But as long as your filters aren't 100% effective (and no one's filters are 100% effective), then the more garbage you push through them, the more garbage you'll get out.
  - bryanrasmussen - 6 days ago
    
    the OPs point about who allows the AI to do stuff without reviewing it is undermined by reality in multiple ways
    1. a dev may be using AI and nobody knows, and they are trusted more than AI, thus their code does not get as good a review as AI code would.
    2. People review code all the time and subtle bugs creep in. It is not a defense against bugs creeping in that people review code. If it were there would be no bugs in organizations that review code.
    3. people may not review or look only for a second based on it's a small ticket. They just changed dependencies!
    more examples left up to reader's imagination.
Shorel - 6 days ago

Way too many "coders" now do that. I put the quotes because I automatically lose respect over any vibe coder.
This is a dystopian nightmare in the making.
At some point only a very few select people will actually understand enough programming, and they will be prosecuted by the powers that be.
ohgr - 6 days ago

Oh man don’t even go there. It does happen.
AI generated code will get to production if you don’t pay people to give a fuck about it or hire people who don’t give a fuck.
- rvnx - 6 days ago
  
  It will also go in production because this is the most efficient way to produce code today
  - cdblades - 5 days ago
    
    Only if you don't examine that proposition at all.
    You still have to review AI generated code, and with a higher level of attention than you do most code reviews for your peer developers. That requires someone who understands programming, software design, etc.
    You still have to test the code. Even if AI generates perfect code, you still need some kind of QA shop.
    Basically you're paying for the same people to do similar work to what they do now, but now you also paying for an enterprise license to your LLM provider of choice.
  - bigstrat2003 - 5 days ago
    
    Sure, if you don't care about quality you can put out code really fast with LLMs. But if you do care about quality, they slow you down rather than speed you up.
  - GenshoTikamura - 5 days ago
    
    The most efficient way per whom, AI stakeholders and top managers?
  - ohgr - 6 days ago
    
    It depends somewhat on how tolerant your customers are of shite.
    Literally all I’ve seen is stuff that I wouldn’t ship in a million years because of the potential reputational damage to our business.
    And I get told a lot by people who really have no idea what they are doing clearly that it’s actually good.

TZubiri - 5 days ago

May god forgive me, but I'm rooting for the hackers on this one.

Job security you know?

GenshoTikamura - 5 days ago

There is an equal unit of trouble per each unit of "progress"

gregwebs - 6 days ago

Is there a proactive way to defend against invisible Unicode attacks?

Tepix - 6 days ago

Filtering them?

handfuloflight - 6 days ago

The cunning aspect of human ingenuity will never cease to amaze me.

ekzy - 6 days ago

Not saying that you are, but reading this as if a AI bot wrote that comment gives me the chills
almery - 6 days ago

Yes, but are you sure a human invented this attack?
- handfuloflight - 5 days ago
  
  Maybe, maybe not, but right now humans are the ones who are exploiting it.
- bryanrasmussen - 6 days ago
  
  Is there any LLM that includes in its training sets a large number of previous hacks? I bet there probably is but we don't know about it, and damn, now I suddenly got another moneymaking idea I don't have time to work on!

jdthedisciple - 5 days ago

simple solution:

preprocess any input to agents by restricting them to a set of visible characters / filtering out suspicious ones

stevenwliao - 5 days ago

Not sure about internationalization but at least for English, constraining to ASCII characters seems like a simple solution.
cess11 - 5 days ago

Nasty characters should be rather common in your test cases.

nektro - 5 days ago

hijacked scrollbar. cardinal sin.

budmichstelk - 5 days ago

[dead]

zx0r1 - 5 days ago

[flagged]