My iPhone 16 Pro Max produces garbage output when running MLX LLMs
journal.rafaelcosta.me262 points by rafaelcosta 11 hours ago
Methodology is one thing; I can't really agree that deploying an LLM to do sums is a great test. Almost as hilarious as asking "What's moon plus sun?"
But the phenomenon is another thing. Apple's numerical APIs are producing inconsistent results on a minority of devices. That is something worth Apple's attention.
(This is a total digression, so apologies)
My mind instantly answered that with "bright", which is what you get when you combine the sun and moon radicals to make 明(https://en.wiktionary.org/wiki/%E6%98%8E)
Anyway, that question is not without reasonable answers. "Full Moon" might make sense too. No obvious deterministic answer, though, naturally.
FTR the Full Moon was exactly 5 hours ago (It's not without humour that this conversation occurs on the day of the full moon :)
> What's moon plus sun?
Eclipse, obviously.
That’s sun minus moon. Moon plus sun is a wildly more massive, nuclear furnace of a moon that also engulfs the earth.
Reminds me of this AI word combination game recently shared on HN, with almost exactly these mechanics:
https://neal.fun/infinite-craft/
For the record, Sun+Moon is indeed eclipse.
>Moon plus sun is a wildly more massive, nuclear furnace of a moon that also engulfs the earth.
I just looked up the mass of the sun vs the mass of the moon (they differ by ~10^30 kg vs ~10^22 kg), and the elemental composition of the sun: the moon would entirely disappear into the insignificant digits of the trace elements, which are in the range of 0.01% of the sun. I could be off by orders of magnitude all over the place and it would still disappear.
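A quick back-of-the-envelope check in Swift, using approximate fact-sheet values (the exact figures are assumptions, and they don't matter at these scales):

    // Rough masses in kilograms (approximate fact-sheet values).
    let sunMass: Double = 1.99e30
    let moonMass: Double = 7.35e22

    // The moon is roughly 27 million times lighter than the sun...
    print(sunMass / moonMass)           // ~2.7e7

    // ...and even the ~0.01% of the sun that is trace elements
    // outweighs the whole moon by a factor of thousands.
    let traceElements = sunMass * 1e-4  // ~2e26 kg
    print(traceElements / moonMass)     // ~2.7e3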
This thread reminds me of Scribblenauts, the game where you conjure objects to solve puzzles by describing them. I suspect it was an inspiration for Baba Is You.
Scribblenauts was also an early precursor to modern GenAI/word embeddings. I constantly bring it up in discussions of the history of AI for this reason.
Not obvious. Astronomers are actively looking for signatures of exomoons around exoplanets. So "sun plus moon" could mean that too.
The OP said moon + sun, rather than sun + moon. We have no idea yet if celestial math is non-commutative.
Well, you find the signature by looking for a dip in the sun's luminosity. So minus might be the better relationship here.
I wish he would have tried on a different iPhone 16 Pro Max to see if the defect was specific to that individual device.
So true! And as any sane Apple user or the standard-template Apple Support person would have suggested (and as they actually suggest): did they try reinstalling the OS from scratch after resetting the data (after backing it up first, of course; preferably with a hefty iCloud+ plan)? Because that's the thing to do for such issues, and it's very easy.
Reinstalling the OS sucks. I need to pull all my bank cards out of my safe and re-add their CVVs to the wallet, and sometimes authenticate over the phone. And re-register my face. And log back in to all my apps. It can take an hour or so, except it's spread out over weeks as I open an app and realize I need to log in, a dozen times over.
Latest update at the bottom of the page.
"Well, now it's Feb. 1st and I have an iPhone 17 Pro Max to test with and... everything works as expected. So it's pretty safe to say that THAT specific instance of iPhone 16 Pro Max was hardware-defective."
Low-level numerical operation optimizations are often not reproducible. For example: https://www.intel.com/content/dam/develop/external/us/en/doc... (2013)
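A minimal Swift sketch of the effect: summing the same data left-to-right versus in four interleaved partial sums (a crude stand-in for how a SIMD or threaded reduction reassociates the adds) usually rounds differently:

    // One million small floats, summed two ways.
    let xs = (0..<1_000_000).map { _ in Float.random(in: 0..<1) }

    // Plain left-to-right accumulation.
    let forward = xs.reduce(0, +)

    // Four interleaved partial sums, combined at the end -- a crude
    // stand-in for a 4-lane SIMD reduction.
    var lanes = [Float](repeating: 0, count: 4)
    for (i, x) in xs.enumerated() { lanes[i % 4] += x }
    let vectorized = lanes.reduce(0, +)

    print(forward == vectorized)  // usually false: same data, different rounding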
But it's still surprising that the LLM doesn't work on the iPhone 16 at all. After all, LLMs are known for their tolerance to quantization.
Yes, "floating point accumulation doesn't commute" is a mantra everyone should have in their head, and when I first read this article, I was jumping at the bit to dismiss it out of hand for that reason.
But, what got me about this is that:
* every other Apple device delivered the same results
* Apple's own LLM silently failed on this device
To me, that behavior suggests an unexpected failure rather than a fundamental issue; it seems Bad (TM) that Apple would ship devices where their own LLM didn't work.
> floating point accumulation doesn't commute
It is commutative (except for NaN). It isn't associative though.
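A two-line Swift illustration of the difference:

    let a: Float = 1e8, b: Float = -1e8, c: Float = 1

    // Commutative: swapping the two operands never changes the rounded result.
    print(a + b == b + a)  // true

    // Not associative: grouping changes where the rounding error lands.
    print((a + b) + c)     // 1.0
    print(a + (b + c))     // 0.0, because b + c rounds back to -1e8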
I think it commutes even when one or both inputs are NaN? The output is always NaN.
NaNs are distinguishable. /Which/ NaN you get doesn't commute.
I guess at the bit level, but not at the level of computation? Anything that relies on bit patterns of NaNs behaving in a certain way (like how they propagate) is in dangerous territory.
> Anything that relies on bit patterns of NaNs behaving in a certain way (like how they propagate) is in dangerous territory.
Why? This is well specified by IEEE 754. Many runtimes (e.g. for JavaScript) use NaN boxing. Treating floats as a semi-arbitrary selection of rational numbers plus a handful of special values is /more/ correct than treating them as real numbers, but treating them as they are actually specified gives even more flexibility and power.
Can you show me where in the IEEE spec this is guaranteed?
My understanding is the exact opposite - that it allows implementations to return any NaN value at all. It need not be any that were inputs.
It may be that JavaScript relies on it and that has become more binding than the actual spec, but I don't think the spec actually guarantees this.
Edit: actually it turns out NaN-boxing does not involve arithmetic, which is why it works. I think my original point stands: if you are doing something that relies on how bit values of NaNs are propagated during arithmetic, you are on shaky ground.
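For anyone curious, a minimal sketch of the NaN-boxing idea (an illustrative layout, not any particular engine's): the payload goes in and out with pure bit operations, so the boxed value never passes through floating-point arithmetic:

    // Quiet double NaN with all payload bits clear.
    let quietNaNBits: UInt64 = 0x7FF8_0000_0000_0000

    // Stash a 48-bit value in the NaN's payload bits.
    func box(_ value: UInt64) -> Double {
        precondition(value < (1 << 48))
        return Double(bitPattern: quietNaNBits | value)
    }

    // Recover it by masking the payload bits back out.
    func unbox(_ d: Double) -> UInt64 {
        return d.bitPattern & 0x0000_FFFF_FFFF_FFFF
    }

    let boxed = box(0xBEEF)
    print(boxed.isNaN)             // true
    print(unbox(boxed) == 0xBEEF)  // true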
Don't have the spec handy, but specifically binary operations combining two NaN inputs must result in one of the input NaNs. For all of Intel SSE, AMD SSE, PowerPC, and ARM, the left-hand operand is returned if both are signaling or both are quiet. x87 does weird things (but when doesn't it?), and ARM does weird things when mixing signaling and quiet NaNs.
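You can poke at this from Swift; which payload survives below is hardware-dependent, so the exact bit patterns printed are not portable:

    // Two quiet NaNs with different payloads.
    let a = Float(nan: 1, signaling: false)
    let b = Float(nan: 2, signaling: false)

    // At the value level, addition still "commutes": NaN either way.
    print((a + b).isNaN, (b + a).isNaN)  // true true

    // At the bit level, which input's payload survives is up to the
    // hardware, so these two lines may print different patterns.
    print(String((a + b).bitPattern, radix: 16))
    print(String((b + a).bitPattern, radix: 16))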