State of AI: An Empirical 100T Token Study with OpenRouter

openrouter.ai

184 points by anjneymidha 12 hours ago


lukev - 10 hours ago

Super interesting data.

I do question this finding:

> the small model category as a whole is seeing its share of usage decline.

It's important to remember that this data is from OpenRouter... an API service. Small models are exactly the ones that can be self-hosted.

It could be the case that total small model usage has actually grown, but people are self-hosting rather than using an API. OpenRouter would not be in a position to determine this.

greatgib - 8 hours ago

I like to see stats like this, but I find it very concerning that OpenRouter doesn't mind inspecting its user/customer data without shame.

Even if you pretend that the classifier respects anonymity: if I pay for the inference, I would expect a closed pipe, with my privacy respected. If it were at least for "safety" checks, I wouldn't like it, but I could almost understand; instead it's so they can have "marketing data".

Imagine, and given the state of the world it might come soon, that WhatsApp or Telegram inspected all the messages you send to publish reports like:

- 20% of our users speak about their health issues

- 30% of messages are about annoying coworkers

- 15% are messages comparing dick sizes

majdalsado - 9 hours ago

Very interesting how Singapore ranks 2nd in terms of token volume. I wonder if this is potentially Chinese usage via VPN, or if Singaporean consumers and firms are dominating in AI adoption.

Also interesting how the 'roleplaying' category is so dominant; it makes me wonder whether Google's classifier sees a system prompt with "Act as an X" and classifies that as roleplay rather than as the specific industry the roleplay was intended to serve.

syspec - 10 hours ago

According to the report, 52% of all open-source AI is used for *roleplaying*. They attribute it to fewer content filters and higher creativity.

I'm pretty surprised by that, but I guess it also selects for people who would use OpenRouter.

m0rde - 9 hours ago

> The noticeable spike [~20 percentage points] in May in the figure above [tool invocations] was largely attributable to one sizable account whose activity briefly lifted overall volumes.

The fact that one account can have such a noticeable effect on token usage is kind of insane. And also raises the question of how much token usage is coming from just one or five or ten sizeable accounts.

armcat - 4 hours ago

These are fantastic insights! I work in the legaltech space, so something to keep in mind is that the legal space is very sensitive about data storage and security (apart from this, of course: https://alexschapiro.com/security/vulnerability/2025/12/02/f...). So models hosted in e.g. Azure, or on-prem deployments, are more common. I have friends in the health space, and it's a similar story there. Finance (banking especially) is the same. That's why those categories look more or less constant over time and have the smallest contributions in this study.

paulirish - 8 hours ago

I worry that OpenRouter's Apps leaderboard incentivizes tools (e.g. Cline/Kilo) to burn through tokens to climb the ranks, while penalizing context efficiency.

https://openrouter.ai/rankings#apps

trebligdivad - 8 hours ago

The 'Glass slipper' idea makes sense to me: people have a bunch of different ideas to try on AIs, they try them as new models come out, and once a model does one well they stick with it for a while.

adidoit - 2 hours ago

With studies like these it's important to keep in mind selection effects.

Most of the high-volume enterprise use cases run through their cloud providers (e.g., Azure).

What we have here is mostly from smaller players. Good data but obviously a subset of the inference universe.

IgorPartola - 8 hours ago

Here is the thing: they made good enough open weight models available and affordable, then found that people used them more than before. I am not trying to diminish the value here but I don’t think this is the headline.

sosodev - 10 hours ago

The open weight model data is very interesting. I missed the release of Minimax M2. The benchmarks seem insanely impressive for its size. I would suspect benchmaxing but why would people be using it if it wasn’t useful?

themanmaran - 11 hours ago

> The metric reflects the proportion of all tokens served by reasoning models, not the share of "reasoning tokens" within model outputs.

I'd be interested in a clarification on the reasoning vs non-reasoning metric.

Does this mean the reasoning total is (input + reasoning + output) tokens, or just (input + output)?

Obviously the reasoning tokens would add a ton to the overall count, so it would be interesting to see an apples-to-apples comparison with non-reasoning models.
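To make the distinction concrete: the report's metric counts every token served by a reasoning-capable model, which is not the same as the fraction of reasoning tokens inside outputs. A minimal sketch with purely hypothetical token counts:

```python
# Hypothetical token counts for one request to each kind of model.
reasoning_model = {"input": 1000, "reasoning": 3000, "output": 500}
plain_model = {"input": 1000, "output": 600}

# Metric the report describes: share of ALL tokens (input + reasoning +
# output) served by reasoning models, across both requests.
reasoning_total = sum(reasoning_model.values())   # 4500
plain_total = sum(plain_model.values())           # 1600
share_served_by_reasoning = reasoning_total / (reasoning_total + plain_total)

# The other reading: share of reasoning tokens within the reasoning
# model's own generated tokens.
reasoning_share_of_generation = reasoning_model["reasoning"] / (
    reasoning_model["reasoning"] + reasoning_model["output"]
)

print(f"served by reasoning models: {share_served_by_reasoning:.1%}")
print(f"reasoning share of generation: {reasoning_share_of_generation:.1%}")
```

With these made-up numbers the two metrics come out around 74% and 86% respectively, so which one a chart shows changes the story considerably.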

asadm - 10 hours ago

Who is using grok code and why?

skywhopper - 9 hours ago

This is interesting, but I found it moderately disturbing that they spend a LOT of effort up front talking about how they don’t have any access to the prompts or responses, and then reveal that they did in fact have access to the text and spend 80% of the rest of the paper analyzing the content.

swyx - 6 hours ago

my highlights of this report: https://news.smol.ai/issues/25-12-04-openrouter

shubhamjain - 6 hours ago

I am someone who wants to keep a distance from the AI-hype train, but seeing a chart like this [1], I can't help but think that we are nowhere near the peak. Weekly token consumption keeps rising, it's already in the trillions, and this ignores a lot of consumption happening directly through APIs.

Nvidia could keep delivering record-breaking numbers, and we may well see multiple companies hit six, seven, or even eight trillion dollars in market cap within a couple of years. While I am skeptical of claims that AI will make programming obsolete, it’s clear that adoption is still growing like crazy, and it's hard to anticipate when the plateau will happen.

[1]: https://openrouter.ai/state-of-ai#open-vs_-closed-source-mod...

meander_water - 9 hours ago

Overall really interesting read, but I'm having trouble processing this:

> OpenRouter performs internal categorization on a random sample comprising approximately 0.25% of all prompts

How can you arrive at any conclusion with such a small random sample size?
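For a proportion estimate, what matters is the absolute sample count, not the sampling percentage: 0.25% of a very large prompt volume is still a big sample. A quick sketch using the standard worst-case margin-of-error formula, with a hypothetical weekly volume of ten million prompts (the real volume is not in the report):

```python
import math

total_prompts = 10_000_000             # hypothetical weekly prompt volume
sample = int(total_prompts * 0.0025)   # 0.25% sample -> 25,000 prompts

# Worst-case 95% margin of error for an estimated proportion (p = 0.5
# maximizes the variance term p * (1 - p)).
p = 0.5
moe = 1.96 * math.sqrt(p * (1 - p) / sample)

print(f"sample size: {sample}, margin of error: +/-{moe:.2%}")
```

At 25,000 sampled prompts the margin of error is well under one percentage point, so category shares quoted to whole percents are statistically fine; the bigger threat to validity is selection bias, not sample size.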

adamraudonis - 4 hours ago

Very cool study

typs - 11 hours ago

This is really amazing data. Super interesting read

nextworddev - 9 hours ago

*State of non-enterprise, indie AI

All this data confirms that OpenRouter’s enterprise ambitions will fail. It’s a nice product for running Chinese models tho