Attention Is the New Big-O: A Systems Design Approach to Prompt Engineering

alexchesser.medium.com

90 points by alexc05 2 days ago


trjordan - 2 days ago

A few of these points are right, but a lot of it is so far removed from the reality of LLMs that it isn't even wrong.

Yes: structure beats vibes. Primacy/recency bias is real. Treating prompts as engineering artifacts is empirically helpful.

But:

- “Reads all tokens at once.” Not quite. Decoder LLMs do a parallel prefill over the prompt, then a sequential token-by-token decode under a causal mask. That nuance is why primacy/recency and KV-cache behavior matter, and why instruction position can swing results (toy sketch after this list).

- Embeddings & “labels.” Embeddings are learned via self-supervised next-token prediction, not from a labeled thesaurus. “Feline ≈ cat” emerges statistically, not by annotation.

- "Structure >> content". Content is what actually matters. “Well-scaffolded wrong spec” will give you confidently wrong output.

- Personas are mostly style. Yes, users like output written in a familiar style, but a persona can also hide information that a "senior software engineer" ostensibly wouldn't know.
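
A toy numpy sketch of that prefill/decode split (shapes and names here are purely illustrative, not how any production model is implemented):

    import numpy as np

    # Toy single-head self-attention with a causal mask.
    def causal_attention(x):
        scores = x @ x.T / np.sqrt(x.shape[-1])
        mask = np.triu(np.ones(scores.shape, dtype=bool), k=1)
        scores[mask] = -np.inf                 # position i can't see j > i
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)     # row-wise softmax
        return w @ x

    prompt = np.random.randn(5, 8)             # 5 prompt tokens, dim 8

    # "Prefill": every prompt position is computed in one parallel pass.
    ctx = causal_attention(prompt)

    # "Decode": each new token is appended and attends to everything
    # before it, one token at a time -- only the last row is new work.
    step = np.vstack([prompt, np.random.randn(1, 8)])
    next_ctx = causal_attention(step)[-1]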

I don't really get the Big-O analogy, either. Models constantly change how they direct attention, which is the exact opposite of the durably true nature of algorithmic complexity. Memorizing how current models like their system prompts written is hardly the same thing.

eric-burel - 2 days ago

I don't buy the arguments made here. You can't call it attention-optimized without opening up the LLM's brain and assessing what happens in the attention layers. Did the quoted paper do any of that? I know Anthropic is advanced in this area, but I haven't seen many results elsewhere yet. I mean, "optimizing for attention makes better prompts" is a solid hypothesis, but I don't see a proof here.

gashmol - 2 days ago

Aside - It's funny to me how many developers still don't like to call their craft engineering and how fast LLM users jumped on the opportunity.

nateroling - 2 days ago

Can you write a prompt to optimize prompts?

Seems like an LLM should be able to judge a prompt, and collaboratively work with the user to improve it if necessary.
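
Rough sketch of what such a meta-prompt could look like (wording entirely made up):

    Here is a prompt I plan to send to an LLM. Before answering it,
    critique it: list ambiguities, missing context, and conflicting
    instructions; ask me up to three clarifying questions; then
    propose a revised version of the prompt.

    <prompt to review goes here>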

Xorakios - 2 days ago

FWIW, after my programming skills became hopelessly outdated, with an economics degree and too old to start over, I generally promoted my skillset as a translator between business and tech teams.

A lot of what I received as input was more like the first type of instruction; what I sent to the actual development team was closer to the second.

cobbzilla - 2 days ago

This article has some solid general advice on prompt-writing for anyone, even though the examples are technical.

I found the “Big-O” analogy a bit strained & distracting, but still a good read.

mavilia - 2 days ago

This was a great refresher on things I’ve seen written about but never thought deeply about. A lot of it already “made sense,” yet in my day to day I’m still doing the bad versions of the prompts.

Do you have a preference, in a more continual system like Claude Code, for one big prompt, or do you just do one task and then start something new?

esafak - 2 days ago

Could you share an example with results so we can see what difference it made?

energy123 - 2 days ago

I can vouch for this prompting best practice. It leads to better results and better instruction following, whatever the cause.

lubujackson - 2 days ago

To add to these good points: for bigger changes, don't just have LLMs restructure your prompt; break it down into a TODO list, a summary, and/or a scaffolded result, then continue from that. LLMs thrive on structure, and the more architectural you can make both your inputs and outputs, the more consistent your results will be.

For example, pass an LLM a JSON structure with keys but no values and it tends to do a much better job populating the values than trying to fully generate complex data from a prompt alone. Then you can take that populated JSON to do something even more complex in a second prompt.
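
A rough sketch of that skeleton-first idea (field names invented for illustration):

    import json

    # Keys and types are fixed up front; the model only fills in values.
    skeleton = {
        "title": "",
        "summary": "",
        "risks": [],              # short strings
        "effort_estimate": "",
    }

    prompt = (
        "Fill in the values of this JSON. Keep the keys and types exactly "
        "as given and return only valid JSON:\n"
        + json.dumps(skeleton, indent=2)
    )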

ath3nd - a day ago

I hate to be a purist here, but structuring your sentences into coherent chunks is basic communication, and giving it fancy names like "prompt engineering" does a disservice to the term "engineering," which is already a concept stretched pretty thin.

LLMs are random and unpredictable, the opposite of what real engineering is. We better start using terms like "invocations", "incantations", "spells" or "rain dance"/"rituals" to describe how to effectively "talk" to LLMs, because a science it most definitely isn't.

And yeah, taking the five seconds extra to do the bare minimum in structuring your communication will yield better results in literally any effort. Don't see why this concept deserves an article.

PS: I am also extremely triggered by the idea of comparing Big-O, an exact scientific concept with well-understood and predictable outcomes, to "prompt engineering," which is basically "my random thoughts and anecdotal biases about how to communicate better with one of the many similar-but-different fancy autocompletes with randomness built in."

jwilber - 2 days ago

Nowadays, basically no model served behind an API uses standard attention anymore. There are all kinds of attention alternatives (e.g. Hyena) and tricks (e.g. sliding-window attention) that make this analogy, as presented, flat-out incorrect.

In addition, for the technical aspect to make sense, a more effective article would show its points alongside evals. For example, if you're making a point about where to put important context in the prompt, show a classic needle-in-the-haystack eval, or a Jacobian matrix, alongside the results. Otherwise it's largely more prompt fluff.
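
For what it's worth, a bare-bones needle-in-the-haystack harness is only a few lines; call_llm below is a placeholder you'd swap for a real model client:

    NEEDLE = "The vault code is 4921."
    FILLER = "Nothing notable happened that day. " * 200

    def make_doc(position: float) -> str:
        # Drop the needle at a relative position in the filler text.
        cut = int(len(FILLER) * position)
        return FILLER[:cut] + NEEDLE + " " + FILLER[cut:]

    def call_llm(prompt: str) -> str:
        return ""   # placeholder: wire up a real model client here

    for pos in (0.0, 0.25, 0.5, 0.75, 1.0):
        doc = make_doc(pos)
        answer = call_llm(doc + "\n\nWhat is the vault code?")
        print(f"needle at {pos:.0%}: retrieved = {'4921' in answer}")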