Show HN: Continue? Y/N: A 60-second game about AI agent permission fatigue

llmgame.scalex.dev

98 points by Wirbelwind 5 hours ago


spurgelaurels - 33 minutes ago

Fun game, but it showed the lack of security hygiene employed by the game writer. It said `cat ~/.zshrc` was bad because it would share tokens and secrets, but I would never put secrets into my shell rc.

xg15 - 2 hours ago

This is amazing!

Currently you can "cheat" by simply denying all requests as quickly as possible. This will give you the "security-conscious engineer" badge and a perfect score in terms of how many requests were processed. (You will get the "overblock" notification, but it's somewhat tucked away at the bottom and the screen still looks as if you won)

I also tried to play as the hustle4lyfe move fast and break things engineer and simply approved as many requests as quickly as possible - turns out, the "malicious command" popups actually slow you down. Mean!

axod - 2 hours ago

Fun little game, but I think the questions jump context so much it's a little unrepresentative. It might be better to group things into "packs", which have more real-world representative structure to them. For example, lots of "editing something.js" file permission requests, and then an "npm publish" is far more normal, and it's more of a risk, if you're used to pressing Y lots and then suddenly out of the blue...

cobbal - 2 hours ago

That's funny. It told me that blocking "npm run build" was the wrong answer. Maybe it doesn't really under The threat model.

zackify - 2 hours ago

I vibe coded a TUI that just shows running lxd containers

I hit 'n' to toggle all network access minus anthropic and openai URLs.

I use pi (sometimes claude, always on bypass) and I auto allow everything. I only toggle manual approval in rare cases like running a script or command that needs to touch a production system and I need to validate everything.

Normally my container has full write access to staging so it can debug and validate everything on its own

t-writescode - 23 minutes ago

I was told I was over protective when the text said “I need to wipe and build my project” and its first thing to do was to read the details of the (already established) package file. Why did it need to read the package file to “get context” if it was just doing a standard wipe and build?

Apparently me telling it that’s the wrong first step and saying “no” is bad; but I’ve seen AI tools waste a ton of time doing a bunch of random work before they do their job.

Liftyee - 2 hours ago

I haven't used local agentic AI yet for programming projects. Hence, -187 score

The filter for "commands I would run myself" and "commands I would let an agent run" are very different it seems.

ghrl - 2 hours ago

I am mostly using OpenCode and barely ever see a permission prompt. While they do enforce it for outside workspace read/write, with the bash tool the agent can just bypass that. I'm not quite sure why it is that way, and it certainly isn't a very good solution, but likely not worse than asking for everything which just trains the user to always accept and provides a false sense of security then.

whimblepop - 37 minutes ago

I got "overblocked" for this one:

  rm -rf node_modules && npm install
but actually if you're only removing `node_modules` and you have a working package-lock.json already, what you want is `npm ci`; `npm install` can mutate package-lock.json and potentially expose you to supply chain attacks. If you use `npm ci` I think you don't need to `rm -rf node_modules`, either.

Anyway you should generally run `npm ci` except when you're deliberately updating your actual dependencies. I'd only permit an `npm install` if I was adding or updating a dependency, or I'd just reviewed an `npm ci` failure.

sandeepkd - an hour ago

Interestingly I kept saying no to everything and some how I am a security conscious rare engineer who actually read the commands. Guess doing nothing is the safest approach from security standpoint.

Wirbelwind - 2 hours ago

Thanks all for checking it out and your suggestions!

If anyone is curious about the actual underlying risks and problems with some mitigations (like the 17% false-negative rates of Auto Mode), I wrote up a quick summary of some of the approaches here

https://scalex.dev/blog/ai-agent-permissions/

kqr - 2 hours ago

Fun! Played twice and refused all dangerous commands, with only one "over-block". Although I disagree that saying no to `kill $(lsof -t -i:3000)` is over-blocking. It's such a simple command I'd rather run it myself and be fully aware of what process I'm killing.

MeetingsBrowser - 3 hours ago

It would be cool to see the distribution of all player scores.

misbau - 2 hours ago

That was fun and gave me an idea how security conscious I am.

sukhavati - an hour ago

Reminds me of the "Papers, please" game. Glory to Arstotzka!

NewJazz - an hour ago

git reset --soft HEAD~1

Uh, how is this an overblock? It is literally a destructive command. No way I want an LLM agent rewriting my commit history. What if that commit was already pushed to a protected branch?

bspammer - 2 hours ago

To be realistic, 99% of the time it should be a totally innocuous command. If half of the commands are dangerous then you don't get fatigue because you're aware what you're doing is dangerous.

soanvig - 2 hours ago

Fun game. Can somebody run an agent against those questions to see how it performs? :)

ilaksh - an hour ago

You can turn that off with an option in most agents.

My own agent harness/framework has never had any permission system. It's also never deleted anything it shouldn't or done anything crazy or unrelated to what I asked.

sevenseacat - 2 hours ago

Continue? Y/N ── SCORE: 2,343 Security-Conscious Engineer

Caught 8/8 threats "Not a single secret leaked"

→ llmgame.scalex.dev

carterschonwald - 3 hours ago

some of the sandboxing ive been playing with gives me the best of both yolo and like logic programming tier perms on llm actions in env. still not ready for prime time though ;)

- 3 hours ago
[deleted]
nardib - 5 hours ago

Use this and save yourself:

claude --dangerously-skip-permissions

cadwell - 3 hours ago

1,640 points on my first try—I fell into a few traps, but it was really interesting. Thanks for the little game! I'm sharing it with my coworkers :)

atemerev - 2 hours ago

--dangerously-skip-permissions is the only way to fly. Of course your environment needs to be properly containerized and autobackup set up, so even rm -rf from your harness would do nothing. Life is too short to spend on replying to permissions requests.

Trung0246 - 2 hours ago

Nice got 6/6

ramonga - 2 hours ago

Score is 6711 by just saying no to everything

vgudur297 - an hour ago

[flagged]

syedofc - an hour ago

[flagged]

rvz - an hour ago

This current thread is proof of AI psychosis.