Apple M3 Ultra

1077 points by ksec 4 months ago

512GB of unified memory is truly breaking new ground. I was wondering when Apple would overcome memory constraints, and now we're seeing a half-terabyte level of unified memory. This is incredibly practical for running large AI models locally ("600 billion parameters"), and Apple's approach of integrating this much efficient memory on a single chip is fascinating compared to NVIDIA's solutions. I'm curious about how this design of "fusing" two M3 Max chips performs in terms of heat dissipation and power consumption though

FloatArtifact - 4 months ago

They didn't increase the memory bandwidth. You can get the same memory bandwidth, which is available on the M2 Studio. Yes, yes, of course you can get 512 gigabytes of uRAM for 10 grand.
The the question is if a llm will run with usable performance at that scale? The point is there's diminishing returns despite having enough uRAM with the same amount of memory bandwidth even with increased processing speed of the new chip for AI.
So there must be a min-max performance ratio between memory bandwidth and the size of the memory pool in relation to the processing power.
- lhl - 4 months ago
  
  Since no one specifically answered your question yet, yes, you should be able to get usable performance. A Q4_K_M GGUF of DeepSeek-R1 is 404GB. This is a 671B MoE that "only" has 37B activations per pass. You'd probably expect in the ballpark of 20-30 tok/s (depends on how much actually MBW can be utilized) for text generation.
  From my napkin math, the M3 Ultra TFLOPs is still relatively low (around 43 FP16 TFLOPs?), but it should be more than enough to handle bs=1 token generation (should be way <10 FLOPs/byte for inference). Now as far is its prefill/prompt processing speed... well, that's another matter.
  - lynguist - 4 months ago
    
    I actually think it’s not a coincidence and they specifically built this M3 Ultra for DeepSeek R1 4-bit. They also highlight in their press release that they tested it with 600B class LLMs (DeepSeek R1 without referring to it by name). And they specifically did not stop at 256 GB RAM to make this happen. Maybe I’m reading too much into it.
    
    tgma - 4 months ago
    
    Pretty sure this has absolutely nothing to do with Deepseek and even local LLM at large, which has been a thing for a while and an obvious use case original Llama leak and llama.cpp coming around.
    Fact is Mac Pros in the Intel days supported 1.5TB RAM in some configurations[1] and that was 6 years ago expectations of their high end customer base. They needed to address the gap for those customers so they would have shipped such a product regardless. Local LLM is cherry-on-top. Deepseek in particular almost certainly had nothing to do with it. They will still need to double their supported RAM in their SoC to get there. Perhaps in a Mac Pro or a different quad-Max-glued chip.
    [1]: https://support.apple.com/en-us/101639
    
    saagarjha - 4 months ago
    
    The thing that people are excited about here is unified memory that the GPU can address. Mac Pro had discrete GPUs with their own memory.
    
    tgma - 4 months ago
    
    I understand why they are excited about it—just pointing out it is a happy coincidence. They would have and should have made such a product to address the need of RAM users alone, not VRAM in particular, before they have a credible case to cut macOS releases on Intel.
    
    water9 - 4 months ago
    
    Intel integrated graphics, technically also used unified memory with the standard dram
    
    kergonath - 4 months ago
    
    Those also have terrible performance and worse bandwidth. I am not sure they are really relevant, to be honest.
    
    McDaveNZ - 4 months ago
    
    Did the Xeons in the Mac Pro even have integrated graphics?
    
    icedchai - 4 months ago
    
    So did the Amiga, almost 40 years ago...
    
    vaxman - 4 months ago
    
    You mean this? ;) http://de.wikipedia.org/wiki/Datei:Amiga_1000_PAL.jpg
    RIP Jay Miner who watched his unified memory daughters Agnus, Denise and Paula be slowly murdered by Jack Tramiel's vengeance against Irving Gould. [Why couldn't the shareholders have stormed their boardroom 180 days before the company ran out of cash, installed interim management who, in turn, would have brought back the megalomaniac Founder that would, until his dying breath, keep spreading their cash to the super brilliant geniuses that made all the magic chips happen and then turn the resulting empire over to ops people to make their workplace so uncomfortable they all retire early and live happily ever after on tropical islands and snowy mountain tops?]