Was my $48K GPU server worth it?
rosmine.ai
499 points by apwheele 4 days ago
In the last year, I have bought an M3 Ultra Mac Studio with 512 GB, a MacBook Pro M5 Max with 128 GB and an RTX 6000 Pro. I have spent around $25k so far, not including electricity. I figured that in the worst case I can sell them in the next year and only take a haircut as opposed to losing my entire investment.
Compared with just paying for tokens, the tokens would have been much cheaper and much, much faster. I've been running Gemma4:31b and Qwen3.5 and 3.6, getting local LLMs to solve AMC 8/10 math questions, and it's about 10-100x slower than just doing it online. When I tried it with ChatGPT late last year, it took about one night and $25 to solve about 1000 questions. Using my RTX 6000 and M3 Ultra with Gemma4:31b on both, it answered about 40 questions in 7 hours at around 800 watts (600 for the RTX and 200 for the M3 Ultra), and I haven't checked how good the answers are yet.
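To put the figures in this comment side by side, here is a rough sketch of the arithmetic. The wattage, hours, and question counts are the ones quoted above; the electricity price `USD_PER_KWH` is a hypothetical value that does not come from the thread, so the electricity cost per question is only illustrative.

```python
# Figures quoted in the comment above.
POWER_WATTS = 600 + 200   # RTX 6000 plus M3 Ultra
HOURS = 7
LOCAL_QUESTIONS = 40
API_COST_USD = 25.0
API_QUESTIONS = 1000

# ASSUMPTION: hypothetical electricity price, not taken from the thread.
USD_PER_KWH = 0.15


def local_run_stats(watts, hours, questions, usd_per_kwh):
    """Energy and time per question for a local run."""
    energy_kwh = watts * hours / 1000
    return {
        "energy_kwh": energy_kwh,
        "kwh_per_question": energy_kwh / questions,
        "minutes_per_question": hours * 60 / questions,
        "electricity_usd_per_question": energy_kwh * usd_per_kwh / questions,
    }


stats = local_run_stats(POWER_WATTS, HOURS, LOCAL_QUESTIONS, USD_PER_KWH)
api_usd_per_question = API_COST_USD / API_QUESTIONS

print(stats)
print("API cost per question (USD):", api_usd_per_question)
```

This ignores hardware cost entirely and says nothing about answer quality, which the comment notes has not been checked yet.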
At the very least I'm going to try to sell my M3 Ultra if I can find a reliable place to sell it without getting ripped off by scammers.
This is, sadly, obvious and inevitable in retrospect.
The two major drivers of inference costs are GPUs and electricity. You can't get cheaper GPUs, but you can make existing GPUs not sit idle, and you do that by utilizing them 24/7, processing user B's request when user A is thinking, and handling many requests in parallel, neither of which you can do as an individual. You can get cheaper electricity... by moving, and it's much easier to move your AI workload than to move yourself.
This is a completely different dynamic than renting houses or apartments, as you can't really rent out the same house to different people at different times of day.
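The utilization argument can be sketched as a one-line cost model: the same hourly hardware-plus-electricity cost is spread over however many tokens the GPU actually produces. Every number below is hypothetical and chosen only to show the shape of the effect, and the function name is made up for illustration.

```python
def usd_per_million_tokens(hourly_cost_usd, peak_tokens_per_second, utilization):
    """Amortized cost per million tokens at a given utilization (0 < u <= 1)."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    tokens_per_hour = peak_tokens_per_second * 3600 * utilization
    return hourly_cost_usd / tokens_per_hour * 1_000_000


# ASSUMPTION: hypothetical $2/hour all-in cost and 1000 tokens/s peak throughput.
shared = usd_per_million_tokens(2.0, 1000, utilization=0.90)  # busy shared GPU
solo = usd_per_million_tokens(2.0, 1000, utilization=0.05)    # mostly idle at home

print(f"shared: ${shared:.2f}/M tokens, solo: ${solo:.2f}/M tokens")
```

Under this model the cost per token scales with the inverse of utilization, so a GPU that sits idle 95% of the time is 18x more expensive per token than one kept 90% busy.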
Yea. LLM inference requires batch processing to have a shred of hope at being cost efficient. Batch processing requires a not insignificant amount of scale (but probably not as much as people think).
I'm very pro local models, but not to have parity with SoTA frontier models. Just contextually trained small models doing smaller specific tasks.
Trying to run bigger LLMs for an individual user to do big tasks is not going to be a good time.
On top of that, AI providers are also eating a big loss on the service.
Are they? I only ever see unsubstantiated claims for this, whereas I see many arguments that inference is comfortably profitable in isolation.
SpaceX has disclosed in their IPO documents that they're losing $2bln a quarter on AI, and rising.
Anthropic told the Department of War-nee-Defence that they'd made $5bln total, which is a lot LOT less than what they're spending.
We'll see what's in OpenAI's IPO later this year I guess. I'll be very surprised if they're losing less than $100bln a year.
It's basic math, go calculate max sessions for a certain tps on any hardware. Session# * tps * 86400 (secs in a day) * 30 days.
You'll realize real quick it's not profitable. You can't just say things you don't like to hear are unsubstantiated without verifying.
Not to mention subscriptions: $2M in GPUs being given out for 5 hrs a day at a cost of $200 a month.
I could easily say that everyone who says it's profitable is making unsubstantiated claims lol.
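Taken at face value, the formula in this comment gives an upper bound on monthly token output, which can then be multiplied by a price. A minimal sketch follows; the session count, tokens per second, and price per million tokens are all hypothetical inputs, and the function names are made up for illustration. It assumes every session generates tokens nonstop and does not model hardware or electricity cost.

```python
SECONDS_PER_DAY = 86_400
DAYS_PER_MONTH = 30


def max_monthly_tokens(sessions, tokens_per_second):
    """Session# * tps * 86400 * 30, as written in the comment above."""
    return sessions * tokens_per_second * SECONDS_PER_DAY * DAYS_PER_MONTH


def max_monthly_revenue_usd(sessions, tokens_per_second, usd_per_million_tokens):
    """Revenue ceiling if every generated token were billed at the given price."""
    tokens = max_monthly_tokens(sessions, tokens_per_second)
    return tokens / 1_000_000 * usd_per_million_tokens


# ASSUMPTION: hypothetical inputs, not figures from the thread.
tokens = max_monthly_tokens(sessions=64, tokens_per_second=30)
revenue = max_monthly_revenue_usd(64, 30, usd_per_million_tokens=3.0)

print(f"{tokens:,} tokens/month, at most ${revenue:,.2f}/month")
```

Whether that ceiling covers costs depends on the real values of all three inputs plus the hardware's cost and lifetime, none of which are public for the large providers.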
To some degree I think there's a hope that it becomes like a gym membership. If everybody used their membership, the gym would be too crowded. It's all of those memberships that people feel like they need to have but don't use where the extra profit comes in.
As long as the power users are paying per token, everything is good.
> It's basic math
Yes, once you have modeled the problem correctly and you know all the input parameters. This is not that: Session# * tps * 86400 (secs in a day) * 30 days.
I don't think there is enough public information to check Anthropic's claims regarding inference profitability. It depends not just on unknown technical factors but also on agreements they have with other companies.
We should specify which subscription plan we are talking about. You seem to be talking about the Anthropic Claude Max plan. I think the consensus is that these flat-rate subscriptions are loss leaders, as their T&C restrict how you can use them, namely only with Claude Code et al. They are meant to hook developers into their products.
Shouldn't we compare the API pricing, where we pay per token? The whole point of local inference is that we don't have any restrictions regarding product use or time limits, so it would only be fair if we compare it to a plan that offers the same. And even that is only a first approximation, because the commercial models are usually much more capable than the open weight models.
> I could easily say that everyone who says it's profitable is making unsubstantiated claims lol.
And people who don't understand the difference between capex and opex are making uneducated claims. It's not basic math.
Running an inference data center is a mix of variable and fixed costs. The fixed costs are currently in the billions of dollars for pretty much any investment in this space. Many of those fixed costs have (currently) unknown refresh cycles. So, unless you have access to the financial books of these companies, it's currently just speculation whether inference is profitable.
You got numbers? Because it seems perfectly possible to me. OpenAI and Anthropic’s marginal cost for inference is certainly far less than their API pricing.
See: https://www.wheresyoured.at/ He's been "numbering" for quite a while now.
Everything there is extremely speculative and I don't see anything that contradicts that inference itself could be profitable at massive scale. See https://youtu.be/xmkSf5IS-zw for example.
Whether the companies as a whole are destined to be profitable, or worth their valuations, is a very different question. The only people who can truly answer that have time machines.
How can you say that with such certainty? You have no idea what it costs to run a 10T parameter model at extremely high concurrency.
These 1T param models running at <$3.00 per 1M tokens are certainly not profitable.
Because I’ve looked at what it would cost my company to self-host a SOTA sized model. For us it wasn’t worth it because the hardware is all bought up by frontier labs and we can’t get any supply. But if we could, at the prices they’re paying, it would pay for itself in 10-ish months. I assume further that they have economies of scale on top of what I was estimating.
Supposedly Anthropic just reported that they’re operationally profitable. So maybe not?
"operationally" implies that capex (which I would assume includes datacenters, gpus, and r&d) is not in. So the big news is that they can now pay for electricity and sysadmin.
I believe they also excluded stock-based compensation from their calculation, which could easily tip them in the non-profitable direction.
You can definitely run many requests in parallel as a single user, you just have to be OK with a significant slowdown for any single request. Cloud inference can't reach that ratio of total throughput per hardware cost since they are heavily incented to get the most expensive hardware available and to then minimize latency (and RAM occupation over time) even at the cost of throughput. Running slower inference with cheaper hardware is just not workable in a cloud setting.
Historically it was not uncommon for beds to be rented out to multiple people.
The word for this type of boarding is “flophouse.”
This is the type of place one might be “waiting for the other shoe to drop.” Which carries a variety of potential meanings in this moment of AI.
Tangentially related: Mack and the boys lived in the “Palace Flophouse and Grill” in Cannery Row.
I suppose I must have looked up flophouse when reading all the Steinbeck I could get my hands on and it’s stuck with me.
It is unfortunately still common practice among irregular agricultural workers in many parts of the world (I’m Italian so I definitely remember news about busts in southern Italy)
Yeah there are good accounts of this in Down and Out in Paris and London and also one of Hemingway's books - forgot which one.
High usage seems to change the economics. The author of the article had a payback period of about 14 months which is excellent by any standards and an order of magnitude better than rent vs buy for a house in most places.
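The payback figure can be reproduced with simple division. The sketch below takes the $48K from the title as the full up-front cost, which is an assumption, and backs out the monthly savings a 14-month payback would imply. It ignores electricity, resale value, and depreciation, and the function names are made up for illustration.

```python
def payback_months(upfront_cost_usd, monthly_savings_usd):
    """Months until cumulative savings cover the up-front cost."""
    if monthly_savings_usd <= 0:
        raise ValueError("monthly savings must be positive")
    return upfront_cost_usd / monthly_savings_usd


def implied_monthly_savings(upfront_cost_usd, months):
    """Monthly savings needed to pay back the cost in the given number of months."""
    return upfront_cost_usd / months


# $48K is from the title and 14 months from the comment above.
# ASSUMPTION: the $48K is treated as the entire cost being paid back.
savings = implied_monthly_savings(48_000, 14)
print(f"implied savings: ${savings:,.2f}/month")
print(f"payback check: {payback_months(48_000, savings):.1f} months")
```

A 14-month payback on $48K therefore implies displacing roughly $3,400 of token spend per month, which is only realistic at sustained high usage.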
I’m not usually one to ask this because learning to do a thing can be fun, but why exactly have you spent 25 thousand dollars on getting an LLM someone else made to answer maths exam questions?
The cost is obviously not as big a factor for OP as it might be for others. It's actually refreshing to hear the candid viewpoint that he expresses here.
25k is definitely a lot, but I did the risk analysis and figured that in the worst case I would lose $1,000-2,000 after a year of playing around with it, so I look at it more like renting (I'm going to keep the MacBook Pro no matter what since I needed a new one).
Nitpicking, but the worst case of spending $25k is unforeseen circumstances that write off the entire asset. I don’t think -$2000 is a conservative enough figure for standard depreciation either (a lot can happen in a year)
I wouldn't call this nitpicking. This is how people who are careful with money think. I learned embarrassingly late to stop justifying purchases by making predictions about future returns. I treat everything as having zero value as soon as I purchase it. Thinking otherwise is, for me, always a dangerous rationalization -- always a craving that's trying to outmaneuver sense.
Either I don't understand the used Apple market, or I agree this is crazy. Someone spends $25k on new hardware, waits a year, and expects to sell it for $23k? Unless the RAM shortage saves him and the cost of new hardware goes up, I don't see how that was going to work.
Well, Apple is literally not offering the M3 512GB studios currently. You can’t even back order one.
They are selling on eBay for over $20k, used.
It’s hard to know if any of these eBay listings are real or actual sales. Lots of scams.
The ones sold for $25k from established sellers are legit. Filter by "sold."
The 0-reputation account in Spain selling an M3U 512GB for $4200 is 100% fraud.