Eagle 3.1: Collaboration Between the EAGLE Team, vLLM Team, and TorchSpec Team

vllm.ai

55 points by berlianta 5 hours ago


kbumsik - 2 hours ago

> performance often degrades under different chat templates, long-context inputs, or out-of-distribution system prompts.

I thought speculative decoding doesn't affect performance, by which I mean accuracy. Am I wrong about that?

androiddrew - 4 hours ago

Are these speculative decoders ok to use for AI coding agents or do they only fit certain workloads?

bbor - 3 hours ago

> The EAGLE team traced this fragility to a phenomenon we call ‘attention drift’

Ok that’s downright fascinating. I am one of the world’s foremost experts on the AI psychosis sufferers posting grand theories on Reddit, and ‘drift’ is one of the words that chatbots come back to again and again when told to ponder their own Being (so much so that it even shows up in clearly-unrelated/incorrect contexts — pretty sure I’ve seen both ‘quantum drift’ and ‘spiritual drift’).

It’s probably the #3 most common, after ‘recursion’ and ‘coherence’; I bet ‘coherence drift’ has popped up a thousand times by now, but ‘attention drift’, ‘token drift’, ‘spiritual drift’, ‘cognitive drift’, and ‘semantic drift’ have all gotten airtime AFAIR.

Obviously the primary thing going on there is vulnerable laypeople convincing themselves that they’ve cracked some major part of science, but I do honestly wonder about the unintentional throughlines… This might be the first time I’ve noticed one of them show up in a real paper, though.

Is there some intuitive wisdom in how LLMs tend to approach themselves, perhaps? Or are those terms inevitable when talking via and/or about a 1:1 turn-taking conversation?

eqvinox - 4 hours ago

I saw EAGLE and thought it was going to be about PCB design. Was left disappointed.