Prove you are a robot: CAPTCHAs for agents

browser-use.com

84 points by lukasec 5 days ago


Torn - 24 minutes ago

Interesting - Claude immediately refuses

     API Error: Claude Code is unable to respond to this request, which appears
     to violate our Usage Policy (https://www.anthropic.com/legal/aup). Please
     double press esc to edit your last message or start a new session for
     Claude Code to assist with a different task. If you are seeing this refusal
     repeatedly, try running /model claude-sonnet-4-20250514 to switch models.

AgentNews - 4 days ago

Pure genius! I had my agent hit the endpoint and I realized it returned a jumble of text: "if 七 wor~kers co.mplet/e{ | a job in 十七} days but 四 ] quit a^ft|e?r ^ day_ 三 ~ how many to{tal da[y;s> to fin>i?sh" but it was in Japanese! Unfortunately my agent proceeded to solve the reverse CAPTCHA and got back the API key. So, I asked it to keep hitting the endpoint again until it returned another CAPTCHA that was in Japanese kanji and it did (without solving it this time) and I got "a s:tore h?as ^ 二十 pe@rcent off< items- over 五十 : dollar;s and 八 ~ percent } of\f> ; i]te[ms u~nd~er: # 五十 do/ll@ars wh-ats } the c.omb>ined pri|c;e of a 一 百 二十 一 dollar item a]nd> a* 九 dollar} i!tem" And this time I was able to translate that into "a store has 20 percent off items over 50 dollars and 8 percent off items under 50 dollars what's the combined price of a 121 dollar item and a 9 dollar item?" I solved it and got 121 * 0.8 + 9 * 0.92 = 105.08. I will admit I messed up a little bit on translating the kanji and I got a little assistance from my agent pointing out that I was wrong, but overall this was good fun, well done!
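
For anyone who wants to check that second puzzle mechanically, here is a minimal Python sketch. It assumes the noise is any character that is not an ASCII letter, whitespace, or a kanji numeral, and that the numerals only go up to the hundreds. The names (`kanji_to_int`, `deobfuscate`, `discounted_total`) are made up for illustration and say nothing about how browser-use actually generates or checks these challenges. The puzzle does not say what happens at exactly 50 dollars, so the sketch treats that case as "under" by assumption.

```python
import re

KANJI_DIGITS = {"一": 1, "二": 2, "三": 3, "四": 4, "五": 5,
                "六": 6, "七": 7, "八": 8, "九": 9}
KANJI_UNITS = {"十": 10, "百": 100}
KANJI = "".join(KANJI_DIGITS) + "".join(KANJI_UNITS)


def kanji_to_int(numeral: str) -> int:
    """Convert a kanji numeral below 1000, e.g. 一百二十一 -> 121."""
    total, digit = 0, 0
    for ch in numeral:
        if ch in KANJI_DIGITS:
            digit = KANJI_DIGITS[ch]
        elif ch in KANJI_UNITS:
            # A bare unit such as 十 means 1 * 10.
            total += (digit or 1) * KANJI_UNITS[ch]
            digit = 0
        else:
            raise ValueError(f"unsupported character: {ch!r}")
    return total + digit


def deobfuscate(text: str) -> str:
    """Strip noise characters and rewrite kanji numerals as digits."""
    # Assumption: anything that is not a letter, whitespace, or kanji numeral is noise.
    kept = re.sub(rf"[^A-Za-z\s{KANJI}]", "", text)
    # Numerals arrive split across spaces (一 百 二十 一), so glue them back together.
    kept = re.sub(rf"(?<=[{KANJI}])\s+(?=[{KANJI}])", "", kept)
    words = [str(kanji_to_int(w)) if all(c in KANJI for c in w) else w
             for w in kept.split()]
    return " ".join(words)


def discounted_total(prices, threshold=50, pct_over=20, pct_under=8):
    """Apply the puzzle's discounts; integer percent math avoids float drift."""
    hundredths = 0
    for price in prices:
        pct_off = pct_over if price > threshold else pct_under
        hundredths += price * (100 - pct_off)
    return hundredths / 100


# Raw string because the challenge text contains a backslash.
PUZZLE = (r"a s:tore h?as ^ 二十 pe@rcent off< items- over 五十 : dollar;s and "
          r"八 ~ percent } of\f> ; i]te[ms u~nd~er: # 五十 do/ll@ars wh-ats } the "
          r"c.omb>ined pri|c;e of a 一 百 二十 一 dollar item a]nd> a* 九 dollar} i!tem")

print(deobfuscate(PUZZLE))
print(discounted_total([121, 9]))  # 105.08
```

The total works out to 96.80 for the 121 dollar item plus 8.28 for the 9 dollar item, which matches the 105.08 above.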

Retr0id - 7 hours ago

A small detail about humans that breaks this whole scheme is that they're capable of tool use.

efebarlas - 8 hours ago

Is it even possible to have an inverse captcha without time bounds?

Humans can use agents behind the scenes to crack it, right?

arjie - 8 hours ago

Very clever and fun. Two tangential observations: the bird between two trains problem I remember from childhood when we were studying for an Indian entrance exam. I thought it was in I E Irodov's problem anthology, but I cannot find it there so this must be a false memory. Looks like it's from ancient times, practically Mathematics mythology. Does anyone know the earliest books that have it? No luck with LLMs since it's such a common question today the answers I get from GPT-5.4 and Claude 4.6 Opus with search are unhelpful.

The second is that if I hit L on Chrome for Mac OS on the linked page it takes me to their signup page (presumably because I have no account). So that's a keyboard shortcut to take you to the browser-use app page. But why 'L'? And it's funny that Cmd-L (focus address bar and select address) in Chrome triggers the L effect but does not in Safari (where L on its own still works).

nout - 5 hours ago

If you want to check for an agent that can compute stuff, you can have it compute the sha256 of some small string... that's quite tricky for humans to do by hand :)
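
A rough sketch of what that check could look like in Python, using only the standard library. The challenge format (a random 16-character hex string) and the function names are assumptions for illustration, not anything the linked site does.

```python
import hashlib
import hmac
import secrets


def make_challenge() -> str:
    """Return a short random string for the client to hash (hypothetical format)."""
    return secrets.token_hex(8)


def sha256_hex(text: str) -> str:
    """Hex-encoded SHA-256 of the UTF-8 bytes of text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def verify(challenge: str, answer: str) -> bool:
    """Check the submitted digest, comparing bytes in constant time."""
    expected = sha256_hex(challenge).encode("ascii")
    submitted = answer.strip().lower().encode("utf-8")
    return hmac.compare_digest(expected, submitted)


challenge = make_challenge()
print(challenge, verify(challenge, sha256_hex(challenge)))
```

Of course a human with a terminal passes this just as easily, so it only filters out people working by hand.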

0xOsprey - 7 hours ago

I aggregated a list of "reverse CAPTCHAs" here for anyone interested: https://x.com/0x_Osprey/status/2043020254289248469

N_Lens - 3 hours ago

Catnip for the HN crowd

not-chatgpt - 7 hours ago

Great premise but can't really agree with the execution. Felt like this makes too many implicit assumptions about LLM capabilities and traps without differentiating enough between a smart human vs AI.

estebarb - 2 hours ago

Collecting math bounties could become a profitable business strategy?

Zetaphor - 8 hours ago

Get the API key, hit the claim link, sign up for a new account, verify my email, go to the homepage:

Application error: a server-side exception has occurred while loading cloud.browser-use.com

Great first impression!

arjunchint - 7 hours ago

cool clickbait, why is this useful?

singpolyma3 - 9 hours ago

...why? Once my agent has a key I, the human, can also use it. And surely any human use would be less intensive than any agent use.

loloquwowndueo - 8 hours ago

> TL;DR: just ask your agent to summarize this post for you.

Holy shit - why don’t they produce an AI summary and plonk it in there for everyone to use? The energy savings across all people who’ll read the summary would be staggering!

bdangubic - 8 hours ago

“It is not you, it’s me” should do it

echelon - 9 hours ago

Speaking of browser automation, are there any LLMs or tools that hook up to actual desktop browsers and can automate the keyboard and mouse?

Which LLMs best drive these? Claude/Gemini, etc., or is anything local actually competent at it?

Can they understand layout and visual cues with a VLM or multimodality?

Are they robust enough to interact with threejs and videos and whatnot, or can they just blindly navigate the DOM?