Nvidia's new Llama-3.1 Nemotron Ultra outperforms DeepSeek R1 at half the size

11 points by hochmartinez 7 days ago

Half the size is not a great metric when comparing a dense model against a MoE.

Llama needs roughly an order of magnitude more compute per token than DeepSeek.
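A quick back-of-the-envelope check of the dense vs MoE point. This is a rough sketch, assuming the common approximation of ~2 forward-pass FLOPs per *active* parameter per token, and assuming the publicly stated parameter counts (Nemotron Ultra: 253B dense; DeepSeek R1: 671B total, ~37B active per token). It ignores attention-over-context cost and any implementation details of either model.

```python
# Rough per-token forward-pass compute: ~2 FLOPs per *active* parameter
# (one multiply + one add per weight). Ignores attention-over-context cost.

def flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs per generated token."""
    return 2.0 * active_params

# Parameter counts (assumed, taken from public model descriptions):
NEMOTRON_ULTRA_TOTAL = 253e9   # dense: every parameter is used for every token
DEEPSEEK_R1_TOTAL = 671e9      # MoE: parameters summed across all experts
DEEPSEEK_R1_ACTIVE = 37e9      # MoE: parameters actually activated per token

# "Size" compares total parameters; "compute" compares active parameters.
size_ratio = NEMOTRON_ULTRA_TOTAL / DEEPSEEK_R1_TOTAL
compute_ratio = flops_per_token(NEMOTRON_ULTRA_TOTAL) / flops_per_token(DEEPSEEK_R1_ACTIVE)

print(f"size ratio (Nemotron / R1):    {size_ratio:.2f}")
print(f"compute ratio (Nemotron / R1): {compute_ratio:.1f}x")
```

Under these assumptions the dense model is about 0.38x the size but needs about 6.8x the per-token compute. The size ratio still matters for memory, since an MoE has to keep all of its experts loaded even though only a few run per token.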