Why are anime catgirls blocking my access to the Linux kernel?

lock.cmpxchg8b.com

372 points by taviso 14 hours ago


johnklos - 6 hours ago

This is a usually technical crowd, so I can't help but wonder if many people genuinely don't get it, or if they are just feigning a lack of understanding to be dismissive of Anubis.

Sure, the people who make the AI scraper bots are going to figure out how to actually do the work. The point is that they hadn't, and this worked for quite a while.

As the botmakers circumvent it, new methods of proof-of-notbot will be made available.

It's really as simple as that. If a new method comes out and your site is safe for a month or two, great! That's better than dealing with fifty requests a second, wondering if you can block whole netblocks, and if so, which.

This is like those simple things on submission forms that ask you what 7 + 2 is. Of course everyone knows that a crawler can calculate that! But it takes a human some time and work to tell the crawler HOW.

eqvinox - an hour ago

TFA — and most comments here — seem to completely miss what I thought was the main point of Anubis: it counters the crawler's "identity scattering"/sybil'ing/parallel crawling.

Any access will fall into one of the following categories:

- client with JS and cookies. In this case the server now has an identity to apply rate limiting to, from the cookie. Humans should never hit it, but crawlers will be slowed down immensely or ejected. Of course the identity can be rotated — at the cost of solving the puzzle again.

- amnesiac (no cookies) clients with JS. Each access is now expensive.

(- no JS - no access.)

The point is to prevent parallel crawling and overloading the server. Crawlers can still start an arbitrary number of parallel crawls, but each one costs to start and needs to stay below some rate limit. Previously, the server would collapse under thousands of crawler requests per second. That is what Anubis is making prohibitively expensive.
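
To make that concrete, here is a minimal sketch of cookie-keyed rate limiting in Go. This is not Anubis's actual code; the cookie name, the limits, and the golang.org/x/time/rate dependency are my own choices, purely for illustration.

    package main

    import (
        "net/http"
        "sync"

        "golang.org/x/time/rate"
    )

    // perTokenLimiter keys a rate limiter on the challenge cookie, so every
    // solved challenge (i.e. every "identity") gets its own request budget.
    type perTokenLimiter struct {
        mu       sync.Mutex
        limiters map[string]*rate.Limiter
    }

    func (p *perTokenLimiter) get(token string) *rate.Limiter {
        p.mu.Lock()
        defer p.mu.Unlock()
        l, ok := p.limiters[token]
        if !ok {
            l = rate.NewLimiter(2, 10) // 2 req/s, burst 10: illustrative numbers
            p.limiters[token] = l
        }
        return l
    }

    func (p *perTokenLimiter) middleware(next http.Handler) http.Handler {
        return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            c, err := r.Cookie("challenge-token") // hypothetical cookie name
            if err != nil {
                // No identity yet: send the client to the (not shown) challenge page.
                http.Redirect(w, r, "/challenge", http.StatusTemporaryRedirect)
                return
            }
            if !p.get(c.Value).Allow() {
                http.Error(w, "slow down", http.StatusTooManyRequests)
                return
            }
            next.ServeHTTP(w, r)
        })
    }

    func main() {
        lim := &perTokenLimiter{limiters: map[string]*rate.Limiter{}}
        http.Handle("/", lim.middleware(http.FileServer(http.Dir("."))))
        http.ListenAndServe(":8080", nil)
    }

Rotating the identity just means landing in the no-cookie branch again, i.e. paying for another puzzle.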

Arnavion - 9 hours ago

>This dance to get access is just a minor annoyance for me, but I question how it proves I’m not a bot. These steps can be trivially and cheaply automated.

>I think the end result is just an internet resource I need is a little harder to access, and we have to waste a small amount of energy.

No need to mimic the actual challenge process. Just change your user agent to not have "Mozilla" in it; Anubis only serves you the challenge if the UA contains that string. For myself I just made a sideloaded browser extension to override the UA header for the handful of websites I visit that use Anubis, including those two kernel.org domains.

(Why do I do it? Most of them I don't enable JS or cookies for, so the challenge wouldn't pass anyway. For the ones that I do enable JS or cookies for, various self-hosted GitLab instances, I don't consent to my electricity being used for this any more than if it were mining Monero or something.)
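
For anyone doing the same thing outside a browser, the whole trick is one request header. A sketch in Go, assuming (as described above) that the default Anubis policy only challenges user agents containing "Mozilla":

    package main

    import (
        "fmt"
        "io"
        "net/http"
    )

    func main() {
        // One of the Anubis-fronted kernel.org hosts mentioned above.
        req, err := http.NewRequest("GET", "https://git.kernel.org/", nil)
        if err != nil {
            panic(err)
        }
        // No "Mozilla" anywhere in the UA, so the challenge page is skipped
        // (per the default policy described above).
        req.Header.Set("User-Agent", "plain-fetcher/1.0")

        resp, err := http.DefaultClient.Do(req)
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()

        body, _ := io.ReadAll(resp.Body)
        fmt.Println(resp.Status, len(body), "bytes")
    }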

ChocolateGod - 6 minutes ago

[delayed]

thayne - 14 minutes ago

I can't find any documentation that says Anubis does this (although it seems odd to me that it wouldn't, and I'd love a reference), but it could do the following:

1. Store the nonce (or some other identifier) of each JWT it hands out in the data store

2. Track the number or rate of requests from each token in the data store

3. If a token exceeds the rate limit threshold, revoke the token (or do some other action, like tarpit requests with that token, or throttle the requests)

Then if a bot solves the challenge it can only continue making requests with the token if it is well behaved and doesn't make requests too quickly.

It could also do things like limit how many tokens can be given out to a single IP address at a time, to prevent a single server from generating a bunch of tokens.
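
As far as I can tell Anubis doesn't ship any of this, so the following is only a sketch of steps 1-3 (the header name, limits, and window are invented; a real deployment would read the nonce out of the signed JWT cookie and keep the records in a shared data store):

    package main

    import (
        "net/http"
        "sync"
        "time"
    )

    type tokenRecord struct {
        windowStart time.Time
        count       int
        revoked     bool
    }

    type tokenTracker struct {
        mu     sync.Mutex
        tokens map[string]*tokenRecord // keyed by the JWT's nonce
        limit  int                     // max requests per window
        window time.Duration
    }

    // Allow counts requests per token (step 2) and revokes any token that
    // exceeds the limit (step 3). Step 1 happens the first time a nonce is
    // seen; a real version would record it when the token is issued.
    func (t *tokenTracker) Allow(nonce string) bool {
        t.mu.Lock()
        defer t.mu.Unlock()
        rec, ok := t.tokens[nonce]
        if !ok {
            rec = &tokenRecord{windowStart: time.Now()}
            t.tokens[nonce] = rec
        }
        if rec.revoked {
            return false
        }
        if time.Since(rec.windowStart) > t.window {
            rec.windowStart, rec.count = time.Now(), 0
        }
        rec.count++
        if rec.count > t.limit {
            rec.revoked = true // misbehaving token: force a re-solve
            return false
        }
        return true
    }

    func main() {
        tracker := &tokenTracker{tokens: map[string]*tokenRecord{}, limit: 30, window: time.Minute}
        http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
            nonce := r.Header.Get("X-Challenge-Nonce") // stand-in for the real token nonce
            if nonce == "" || !tracker.Allow(nonce) {
                http.Error(w, "token missing or revoked, re-solve the challenge", http.StatusForbidden)
                return
            }
            w.Write([]byte("ok\n"))
        })
        http.ListenAndServe(":8080", nil)
    }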

ksymph - 14 hours ago

This is neither here nor there but the character isn't a cat. It's in the name, Anubis, who is an Egyptian deity typically depicted as a jackal or generic canine, and the gatekeeper of the afterlife who weighs the souls of the dead (hence the tagline). So more of a dog-girl, or jackal-girl if you want to be technical.

bawolff - 5 hours ago

> This… makes no sense to me. Almost by definition, an AI vendor will have a datacenter full of compute capacity. It feels like this solution has the problem backwards, effectively only limiting access to those without resources or trying to conserve them.

Counterpoint - it seems to work. People use Anubis because it's the best of bad options.

If theory and reality disagree, it means either you are missing something or your theory is wrong.

rootsudo - 9 hours ago

The instant I read it, I knew it was Anubis. I hope the anime catgirls never disappear from that project :)

sidewndr46 - 5 hours ago

> The CAPTCHA forces visitors to solve a problem designed to be very difficult for computers but trivial for humans

I'm unsure if this is deadpan humor or if the author has never tried to solve a CAPTCHA that is something like "select the squares with an orthodox rabbi present"

userbinator - 3 hours ago

As I've been saying for a while now - if you want to filter for only humans, ask questions only a human can easily answer; counting the number of letters in a word seems to be a good way to filter out LLMs, for example. Yes, that can be relatively easily gotten around, just like Anubis, but with the benefit that it doesn't filter out humans and has absolutely minimal system requirements (a browser that can submit HTML forms), possibly even less than the site itself.

There are forums which ask domain-specific questions as a CAPTCHA upon attempting to register an account, and as someone who has employed such a method, it is very effective. (Example: what nominal diameter is the intake valve stem on a 1954 Buick Nailhead?)
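
The whole mechanism is an extra form field plus a string comparison on the server. A toy sketch in Go using the letter-counting idea (the question and accepted answers here are placeholders I picked, not anything from a real forum):

    package main

    import (
        "net/http"
        "strings"
    )

    // A single question; a real site would rotate through many.
    const question = "How many times does the letter r appear in the word strawberry?"

    var accepted = []string{"3", "three"}

    func main() {
        http.HandleFunc("/register", func(w http.ResponseWriter, r *http.Request) {
            if r.Method == http.MethodGet {
                // Plain HTML form: works in any browser, no JS or cookies needed.
                w.Write([]byte(`<form method="POST">` + question +
                    ` <input name="answer"> <button>Register</button></form>`))
                return
            }
            answer := strings.ToLower(strings.TrimSpace(r.FormValue("answer")))
            for _, ok := range accepted {
                if answer == ok {
                    w.Write([]byte("welcome aboard"))
                    return
                }
            }
            http.Error(w, "try again", http.StatusForbidden)
        })
        http.ListenAndServe(":8080", nil)
    }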

sugarpimpdorsey - 8 hours ago

Every time I see one of these I think it's a malicious redirect to some pervert-dwelling imageboard.

On that note, is kernel.org really using this for free and not the paid version without the anime? Linux Foundation really that desperate for cash after they gas up all the BMWs?

ok123456 - 5 hours ago

Why is kernel.org doing this for essentially static content? Cache-Control headers and ETags should solve this. Also, the Linux kernel has solved the C10K problem.
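
For what it's worth, the caching side is a few lines in a handler: set Cache-Control and an ETag, and answer If-None-Match with a 304 so repeat fetches cost almost nothing. A sketch (not kernel.org's actual setup, and with no path sanitization):

    package main

    import (
        "crypto/sha256"
        "fmt"
        "net/http"
        "os"
    )

    func main() {
        http.HandleFunc("/files/", func(w http.ResponseWriter, r *http.Request) {
            data, err := os.ReadFile("." + r.URL.Path) // naive mapping, illustration only
            if err != nil {
                http.NotFound(w, r)
                return
            }
            // Strong ETag derived from the content; unchanged files keep the same tag.
            etag := fmt.Sprintf(`"%x"`, sha256.Sum256(data))
            w.Header().Set("ETag", etag)
            w.Header().Set("Cache-Control", "public, max-age=86400")
            if r.Header.Get("If-None-Match") == etag {
                w.WriteHeader(http.StatusNotModified) // 304: no body, almost no work
                return
            }
            w.Write(data)
        })
        http.ListenAndServe(":8080", nil)
    }

Of course this only helps against clients that bother to send conditional requests, which is part of what the other comments are complaining about.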

leumon - 9 hours ago

Seems like AI bots are indeed bypassing the challenge by computing it: https://social.anoxinon.de/@Codeberg/115033790447125787

bogwog - 9 hours ago

I wonder if the best solution is still just to create link mazes with garbage text like this: https://blog.cloudflare.com/ai-labyrinth/

It won't stop the crawlers immediately, but it might lead to an overhyped and underwhelming LLM release from a big name company, and force them to reassess their crawling strategy going forward?

hansjorg - 8 hours ago

If you want a tip my friend, just block all of Huawei Cloud by ASN.

jimmaswell - 14 hours ago

What exactly is so bad about AI crawlers compared to Google or Bing? Is there more volume or is it just "I don't like AI"?

xphos - 6 hours ago

Yeah, the PoW is minor for botters but annoying for people. I think the only positive is that if enough people see anime girls on their screens, there might actually be political pressure to make laws against rampant bot crawling.

listic - 8 hours ago

So... Is Anubis actually blocking bots because they didn't bother to circumvent it?

extraduder_ire - 5 hours ago

With the asymmetry of doing the PoW in JavaScript versus compiled C code, I wonder if this type of rate limiting is ever going to be directly implemented into regular web browsers. (I assume there are already plugins for curl/wget)

Other than Safari, mainstream browsers seem to have given up on considering browsing without JavaScript enabled a valid use case. So it would purely be a performance improvement thing.
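
For a sense of the asymmetry: the work itself is the generic hashcash-style search, which is trivial in native code. A sketch (not Anubis's exact challenge format; the challenge string and difficulty are placeholders, and four hex zeros finishes in milliseconds on a modern CPU):

    package main

    import (
        "crypto/sha256"
        "encoding/hex"
        "fmt"
        "strconv"
        "strings"
        "time"
    )

    // solve finds a nonce such that sha256(challenge + nonce) starts with
    // `difficulty` hex zeros. The real challenge string and difficulty would
    // come from the server.
    func solve(challenge string, difficulty int) (int, string) {
        prefix := strings.Repeat("0", difficulty)
        for nonce := 0; ; nonce++ {
            sum := sha256.Sum256([]byte(challenge + strconv.Itoa(nonce)))
            if h := hex.EncodeToString(sum[:]); strings.HasPrefix(h, prefix) {
                return nonce, h
            }
        }
    }

    func main() {
        start := time.Now()
        nonce, hash := solve("example-challenge-string", 4)
        fmt.Printf("nonce=%d hash=%s in %s\n", nonce, hash, time.Since(start))
    }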

auggierose - an hour ago

Would it not be more effective just to require payment for accessing your website? Then you don't need to care whether it's a bot or not.

heap_perms - 6 hours ago

> I host this blog on a single core 128MB VPS

No wonder the site is being hugged to death. 128MB is not a lot. Maybe it's worth upgrading if you post to Hacker News. Just a thought.

iefbr14 - 13 hours ago

I wouldn't be surprised if just delaying the server response by some 3 seconds will have the same effect on those scrapers as Anubis claims.
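
If anyone wants to test that theory, the delay is a one-line middleware; whether it actually deters scrapers is the hypothesis here, not a measured result, and it does tie up a connection per in-flight request:

    package main

    import (
        "net/http"
        "time"
    )

    // tarpit delays every response by a fixed amount before handing off to
    // the real handler.
    func tarpit(d time.Duration, next http.Handler) http.Handler {
        return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            time.Sleep(d)
            next.ServeHTTP(w, r)
        })
    }

    func main() {
        http.ListenAndServe(":8080", tarpit(3*time.Second, http.FileServer(http.Dir("."))))
    }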

galaxyLogic - 3 hours ago

I think the solution to CAPTCHA-rot is micro-payments. It does consume resources to serve a web page, so who's gonna pay for that?

If you want to do advertising then don't require a payment, and be happy that crawlers will spread your ad to the users of AI bots.

If you are a non-profit site then it's great to get a micro-payment to help you maintain and run the site.

johnisgood - 7 hours ago

I like hashcash.

https://github.com/factor/factor/blob/master/extra/hashcash/...

https://bitcoinwiki.org/wiki/hashcash

Borg3 - 8 hours ago

Oh, it's time to bring the Internet back to humans. Maybe it's time to treat the first layer of the Internet just as transport. Then, layer large VPN networks on top and put services there. People will just VPN to a vISP to reach content. Different networks, different interests :) But this time don't fuck up abuse handling. Someone is doing something fishy? Depeer him from the network (or his un-cooperating upstream!).

herf - 4 hours ago

We deployed hashcash for a while back in 2004 to implement Picasa's email relay - at the time it was a pretty good solution because all our clients were kind of similar in capability. Now I think the fastest/slowest device is a broader range (just like Tavis says), so it is harder to tune the difficulty for that.

qwertytyyuu - 5 hours ago

Isn’t Anubis a dog? So it should be an anime dog/wolf girl rather than a cat girl?

andromaton - 7 hours ago

Hug of death https://archive.ph/BSh1l

spiritplumber - 3 hours ago

For the same reason why cats sit on your keyboard. Because they can

ksymph - 13 hours ago

Reading the original release post for Anubis [0], it seems like it operates mainly on the assumption that AI scrapers have limited support for JS, particularly modern features. At its core it's security through obscurity; I suspect that as usage of Anubis grows, more scrapers will deliberately implement the features needed to bypass it.

That doesn't necessarily mean it's useless, but it also isn't really meant to block scrapers in the way TFA expects it to.

[0] https://xeiaso.net/blog/2025/anubis/

Philpax - 14 hours ago

The argument isn't that it's difficult for them to circumvent - it's not - but that it adds enough friction to force them to rethink how they're scraping at scale and/or self-throttle.

I personally don't care about the act of scraping itself, but the volume of scraping traffic has forced administrators' hands here. I suspect we'd be seeing far fewer deployments if the scrapers behaved themselves to begin with.

jchw - 6 hours ago

> This… makes no sense to me. Almost by definition, an AI vendor will have a datacenter full of compute capacity. It feels like this solution has the problem backwards, effectively only limiting access to those without resources or trying to conserve them.

A lot of these bots consume a shit load of resources specifically because they don't handle cookies, which causes some software (in my experience, notably phpBB) to consume a lot of resources. (Why phpBB here? Because it always creates a new session when you visit with no cookies. And sessions have to be stored in the database. Surprise!) Forcing the bots to store cookies to be able to reasonably access a service actually fixes this problem altogether.
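
The fix amounts to "don't allocate server-side state until the client proves it can return a cookie." A hedged sketch of that gate, not phpBB's actual code (the expensive session call is only a comment here):

    package main

    import "net/http"

    // sessionGate only creates a (hypothetically expensive, DB-backed) session
    // once the client has shown it can echo a cookie back. First visit: set a
    // marker cookie and serve the page statelessly. Later visits: the cookie
    // is present, so it is safe to allocate real session state.
    func sessionGate(next http.Handler) http.Handler {
        return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            if _, err := r.Cookie("seen"); err != nil {
                http.SetCookie(w, &http.Cookie{Name: "seen", Value: "1", Path: "/"})
                next.ServeHTTP(w, r) // no session row written for this request
                return
            }
            // A real app would call its createSession(r) equivalent here --
            // but only for clients that actually keep cookies.
            next.ServeHTTP(w, r)
        })
    }

    func main() {
        http.ListenAndServe(":8080", sessionGate(http.FileServer(http.Dir("."))))
    }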

Secondly, Anubis specifically targets bots that try to blend in with human traffic. Bots that don't try to blend in with humans are basically ignored and out-of-scope. Most malicious bots don't want to be targeted, so they want to blend in... so they kind of have to deal with this. If they want to avoid the Anubis challenge, they have to essentially identify themselves. If not, they have to solve it.

Finally... If bots really want to durably be able to pass Anubis challenges, they pretty much have no choice but to run the arbitrary code. Anything else would be a pretty straight-forward cat and mouse game. And, that means that being able to accelerate the challenge response is a non-starter: if they really want to pass it, and not appear like a bot, the path of least resistance is to simply run a browser. That's a big hurdle and definitely does increase the complexity of scraping the Internet. It increases more the more sites that use this sort of challenge system. While the scrapers have more resources, tools like Anubis scale the resources required a lot more for scraping operations than it does a specific random visitor.

To me, the most important point is that it only fights bot traffic that intentionally tries to blend in. That's why it's OK that the proof-of-work challenge is relatively weak: the point is that it's non-trivial and can't be ignored, not that it's particularly expensive to compute.

If bots want to avoid the challenge, they can always identify themselves. Of course, then they can also readily be blocked, which is exactly what they want to avoid.

In the long term, I think the success of this class of tools will stem from two things:

1. Anti-botting improvements, particularly in the ability to punish badly behaved bots, and possibly share reputation information across sites.

2. Diversity of implementations. More implementations of this concept will make it harder for bots to just hardcode fastpath challenge response implementations and force them to actually run the code in order to pass the challenge.

I haven't kept up with the developments too closely, but as silly as it seems I really do think this is a good idea. Whether it holds up as the metagame evolves is anyone's guess, but there's actually a lot of directions it could be taken to make it more effective without ruining it for everyone.

0003 - 4 hours ago

Soon any attempt to actually do it would indicate you're a bot.

fluoridation - 13 hours ago

Hmm... What if instead of using plain SHA-256 it was a dynamically tweaked hash function that forced the client to run it in JS?

serf - 8 hours ago

I don't care that they use anime catgirls.

What I do care about is being met with something cutesy in the face of a technical failure anywhere on the net.

I hate Amazon's failure pets, I hate Google's failure mini-games -- it strikes me as an organizational effort to get really good at failing rather than spending that same effort to avoid failures altogether.

It's like everyone collectively thought the standard old Apache 404 not found page was too feature-rich and that customers couldn't handle a 3 digit error, so instead we now get a "Whoops! There appears to be an error! :) :eggplant: :heart: :heart: <pet image.png>" and no one knows what the hell is going on even though the user just misplaced a number in the URL.

jmclnx - 9 hours ago

>The CAPTCHA forces vistors to solve a problem designed to be very difficult for computers but trivial for humans

Not for me. I have nothing but a hard time solving CAPTCHAs; about 50% of the time I give up after 2 tries.

johnea - 9 hours ago

My biggest bitch is that it requires JS and cookies...

Although the long term problem is the business model of servers paying for all network bandwidth.

Actual human users have consumed a minority of total net bandwidth for decades:

https://www.atom.com/blog/internet-statistics/

Part 4 shows bots out-using humans in 1996 8-/

What are "bots"? This needs to include goggleadservices, PIA sharing for profit, real-time ad auctions, and other "non-user" traffic.

The difference between that and the LLM training-data scraping is that the previous non-human traffic was assumed, by site servers, to increase their human traffic through search engine ranking, and thus their revenue. However, the current training-data scraping is likely to have the opposite effect: capturing traffic with LLM summaries instead of redirecting it to the original source sites.

This is the first major disruption to the internet's model of finance since ad revenue took over after the dot bomb.

So far, it's in the same category as the environmental disaster in progress: ownership is refusing to acknowledge the problem and insisting on business as usual.

Rational predictions are that it's not going to end well...

zb3 - 8 hours ago

Anubis doesn't use enough resources to deter AI bots. If you really want to go this way, use React, preferably with more than one UI framework.

ge96 - 8 hours ago

Oh I saw this recently on ffmpeg's site, pretty fun

raffraffraff - 8 hours ago

HN hug of death

WesolyKubeczek - 14 hours ago

I disagree with the post author in their premise that things like Anubis are easy to bypass if you craft your bot well enough and throw the compute at it.

Thing is, the actual lived experience of webmasters is that the bots that scrape the internets for LLMs are nothing like well-crafted software. They are more like your neighborhood shit-for-brain meth junkies competing with one another over who pulls off more robberies in a day, no matter the profit.

Those bots are extremely stupid. They are worse than script kiddies’ exploit searching software. They keep banging the pages without regard to how often, if ever, they change. If they were 1/10th like many scraping companies’ software, they wouldn’t be a problem in the first place.

Since these bots are so dumb, anything that is going to slow them down or stop them in their tracks is a good thing. Short of drone strikes on data centers or accidents involving owners of those companies that provide networks of botware and residential proxies for LLM companies, it seems fairly effective, doesn’t it?

yuumei - 14 hours ago

> The CAPTCHA forces visitors to solve a problem designed to be very difficult for computers but trivial for humans.

> Anubis – confusingly – inverts this idea.

Not really, AI easily automates traditional captchas now. At least this one does not need extensions to bypass.

lxgr - 14 hours ago

> This isn’t perfect of course, we can debate the accessibility tradeoffs and weaknesses, but conceptually the idea makes some sense.

It was arguably never a great idea to begin with, and stopped making sense entirely with the advent of generative AI.

anotherhue - 14 hours ago

Surely the difficulty factor scales with the system load?

jonathanyc - 5 hours ago

> The idea of “weighing souls” reminded me of another anti-spam solution from the 90s… believe it or not, there was once a company that used poetry to block spam!

> Habeas would license short haikus to companies to embed in email headers. They would then aggressively sue anyone who reproduced their poetry without a license. The idea was you can safely deliver any email with their header, because it was too legally risky to use it in spam.

Kind of a tangent but learning about this was so fun. I guess it's ultimately a hack for there not being another legally enforceable way to punish people for claiming "this email is not spam"?

IANAL so what I'm saying is almost certainly nonsense. But it seems weird that the MIT license has to explicitly say that the licensed software comes with no warranty that it works, but that emails don't have to come with a warranty that they are not spam! Maybe it's hard to define what makes an email spam, but surely it is also hard to define what it means for software to work. Although I suppose spam never e.g. breaks your centrifuge.

lousken - 14 hours ago

aren't you happy? at least you get to see a catgirl

immibis - 11 hours ago

The actual answer to how this blocks AI crawlers is that they just don't bother to solve the challenge. Once they do bother solving the challenge, the challenge will presumably be changed to a different one.

tonymet - 7 hours ago

So it's a paywall with good intentions -- and even more accessibility concerns. Thus accelerating enshittification.

Who's managing the network effects? How do site owners control false positives? Do they have support teams granting access? How do we know this is doing any good?

It's convoluted security theater mucking up an already bloated, flimsy, and sluggish internet. It's frustrating enough having to guess school buses every time I want to get work done; now I have to see pornified kitty waifus too

(openwrt is another community plagued with this crap)

xena - 12 hours ago

[dead]

pinoy420 - 4 hours ago

[dead]

throwaway984393 - 5 hours ago

[dead]

slowdoorsemillc - 14 hours ago

[dead]

naikrovek - 8 hours ago

[flagged]

easterncalculus - 9 hours ago

[flagged]

PaulHoule - 14 hours ago

[flagged]

alt187 - 3 hours ago

[flagged]

senectus1 - 5 hours ago

The action is great; Anubis is a very clever idea, I love it.

I'm not a huge fan of the anime thing, but I can live with it.

superkuh - 9 hours ago

Kernel.org* just has to actually configure Anubis rather than deploying the default broken config: enable the meta-refresh proof of work instead of relying on the bleeding-edge JavaScript proof-of-work application that only the corporate browsers can run.

* or whatever site the author is talking about, his site is currently inaccessible due to the amount of people trying to load it.

efilife - 8 hours ago

This cartoon mascot has absolutely nothing to do with anime

If you disagree, please say why

valiant55 - 9 hours ago

I really don't understand the hostility towards the mascot. I can't think of a bigger red flag.

rnhmjoj - 14 hours ago

I don't understand why people resort to this tool instead of simply blocking by UA string or IP address. Are there so many people running these AI crawlers?

I blackholed some IP blocks of OpenAI, Mistral and another handful of companies and 100% of this crap traffic to my webserver disappeared.
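
I did it at the network level, but even at the application layer it's just a CIDR check per request. A sketch with RFC 5737 documentation prefixes standing in for the companies' real ranges:

    package main

    import (
        "net"
        "net/http"
    )

    // blocked holds the CIDR ranges to refuse; fill in the crawler operators'
    // published ranges (these are documentation prefixes, i.e. placeholders).
    var blocked []*net.IPNet

    func init() {
        for _, cidr := range []string{"192.0.2.0/24", "198.51.100.0/24"} {
            _, n, _ := net.ParseCIDR(cidr)
            blocked = append(blocked, n)
        }
    }

    func blackhole(next http.Handler) http.Handler {
        return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            host, _, _ := net.SplitHostPort(r.RemoteAddr)
            ip := net.ParseIP(host)
            for _, n := range blocked {
                if n.Contains(ip) {
                    http.Error(w, "forbidden", http.StatusForbidden)
                    return
                }
            }
            next.ServeHTTP(w, r)
        })
    }

    func main() {
        http.ListenAndServe(":8080", blackhole(http.FileServer(http.Dir("."))))
    }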

jayrwren - 14 hours ago

Literally the top link when I search for his exact text "why are anime catgirls blocking my access to the Linux kernel?": https://lock.cmpxchg8b.com/anubis.html Maybe Tavis needs more google-fu. Maybe that includes using DuckDuckGo?