For algorithms, a little memory outweighs a lot of time
quantamagazine.org
336 points by makira 3 days ago
Minus the fuzz: A multitape Turing machine running in time t can be simulated using O(sqrt(t log t)) space (and typically more than t time).
It's unfortunate that Quanta links are so popular when they include so much pseudo-poetic fluff around the mathematics. Below, there's an entire thread dismissing a misconception introduced by the Quanta article.
"I think it is very intuitive that more space beats the pants off of more time." (poster is absolutely right) The The article say "Until now, the only known algorithms for accomplishing certain tasks required an amount of space roughly proportional to their runtime, and researchers had long assumed there’s no way to do better.", which is interpreted as that there's a proportional relation between time and space. However, a quick look at the complexity hierarchy would never suggest such a thing. Reading more carefully, it says "known algorithms" for "certain tasks", but then where do you get a general intuition from such a particular statement?
I think they used to be better but have really taken a blatant turn. I really thought that wormhole fiasco would have killed them. Going 4 whole months before adding the editor's note is beyond egregious[0]. Mistakes happen, but 4 months kills all credibility. You have to act fast on those things! There were big names raising serious concerns on day 1, and it really shows they didn't do the due diligence to get outside verification before running a piece they knew would be really popular.
All this accomplishes is discrediting science. It trades personal gain for eroding the very thing they make their money off of. This is a major part of why Americans (and people generally) have such high distrust of science. News outlets, and in particular science-focused news outlets, constantly spew inaccurate information. It really should be no wonder that so many people are confused about so many scientific topics: unless they actually take the years it takes to become an expert in a field, they are going to have a difficult time distinguishing fact from fiction. And why shouldn't the average person expect to trust a source like Quanta? They're "full of experts", right? smh
[0] This is the earliest archive I see with the note. Press back one day and it should not be there. Article was published on Nov 30 2022, along with a youtube video https://web.archive.org/web/20230329191417/https://www.quant...
I'm the author of this article. If you ask a complexity theorist, they will tell you that they did in fact have a general intuition that certain problems require space close to linear in their runtime to solve (see e.g. Ryan's comment #22 on Scott Aaronson's blog post about the result: https://scottaaronson.blog/?p=8680, and the comments after that). The most intuitive way to see this is in a circuit/DAG picture, where the goal is to get from the input nodes of the graph to the output nodes. Some graphs are very "wide": cut the graph at some intermediate point, and there will be a lot of edges crossing the cut, each of which represents some information from an earlier stage in the computation that you'll need to remember to get to the output. Ryan's first result is a general-purpose method for doing any computation, even ones whose graph structure looks like this, in asymptotically far less space. That is precisely what makes the result so surprising!
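To make that cut picture concrete, here's a toy sketch (a deliberately "wide" layered DAG of my own; not the construction from the paper): evaluated the obvious way, every node of the next layer reads the entire current layer, so the whole layer has to sit in memory at once.

    # Toy illustration (not the construction from the paper): a "wide"
    # layered DAG where each node reads the entire previous layer.
    def eval_wide_dag(inputs, num_layers, f):
        layer = inputs
        for _ in range(num_layers):
            # All len(layer) values must be remembered at once to produce
            # the next layer: the naive evaluation uses O(width) space.
            layer = [f(i, layer) for i in range(len(layer))]
        return layer

    # Example: each node is a weighted sum over the whole previous layer.
    print(eval_wide_dag([1, 2, 3, 4], num_layers=3,
                        f=lambda i, prev: sum((i + 1) * v for v in prev)))

The surprise in the result is that even computations shaped like this can be simulated in far less space than the layer width suggests.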
My article was quite explicit in multiple places that the universal/comprehensive character of the result was the counterintuitive part:
- In the first paragraph: "memory was more powerful than computer scientists believed: A small amount would be as helpful as a lot of time in all conceivable computations."
- Further down in the introduction, in the passage you quoted: "Until now, the only known algorithms for accomplishing certain tasks required an amount of space roughly proportional to their runtime, and researchers had long assumed there’s no way to do better. Williams’ proof established a mathematical procedure for transforming any algorithm — no matter what it does — into a form that uses much less space."
- In the third section, I explicitly state that researchers do believe space is more powerful than time in the specific sense that you're criticizing my article for misrepresenting: "But complexity theorists suspect that PSPACE is a much larger class, containing many problems that aren’t in P. In other words, they believe that space is a far more powerful computational resource than time. This belief stems from the fact that algorithms can use the same small chunk of memory over and over, while time isn’t as forgiving — once it passes, you can’t get it back."
- In the fourth section, I explain why researchers didn't think the HPV75 result could be improved further, despite their intuition that space is more powerful than time in the above sense: "While many problems can be solved with much less space than time, some intuitively seemed like they’d need nearly as much space as time."
TCS (and complexity theory specifically) are complicated subjects. I spend a lot of time interviewing researchers and thinking about how to distill the results of my reporting into a form that is accessible to readers with widely varying levels of familiarity with the subject matter. You are of course well within your rights to critique my stylistic choices, the narrative aspects of the story, and the order in which I presented information, but I will push back against the claim that my article is spreading misinformation about complexity theory. You're referring to a misconception that arises, by your own admission, when you don't read carefully. If it's the headline you object to, you could lodge a similar complaint against the complexity theorist Lance Fortnow: https://blog.computationalcomplexity.org/2025/02/you-need-mu....
It's kind of insulting to the reader that they explain the P complexity class without using the word polynomial ("all problems that can be solved in a reasonable amount of time").
Be generous - it saves a lot of time. Once you say "polynomial" readers will think, "like, ANY polynomial, even like n^100?!" and you'll have to explain, yes, but that's STILL better than exponential, etc. They avoided all of that
Quanta targets people who are above average, so I don't think it is too much for them to give a sentence or two explaining it. Even a little graphic could do wonders. I don't think it would take much time or effort to make a graphic like the one on Wikipedia[0] and just throw in some equations within the rings. You can easily simplify, too, by removing NL and merging EXP. Hell, look at the graphics here[1]. That's much more work.
I don't think Quanta should be afraid of showing math to people. That's really their whole purpose. Even if I think they've made some egregious mistakes that make them untrustworthy...[2]
[0] https://en.wikipedia.org/wiki/PSPACE#/media/File:Complexity_...
[1] https://www.quantamagazine.org/june-huh-high-school-dropout-...
I suppose my point is that the readers who will wonder about this are a) very likely to know about complexity classes already, or b) capable of learning about it themselves. Perhaps a simple link to something like https://complexityzoo.net/Petting_Zoo would have been a nice middle ground.
Edit: Aaronson even mentions the n^100 problem in the section about P!
I disagree, and even think that this is beside the point. It is hard to wonder about what you don't know to wonder about. It is the job of the communicator to prime that and provide any critical information that the reader is not expected to know. Without some basic explanation here, these terms might as well be black boxes to readers.
The point is that a single line[0] and a minimal graphic could substantially improve the reader's comprehension while simultaneously providing them the necessary nomenclature to find relevant material to further increase their understanding.
Look at this line:
| One of the most important classes goes by the humble name “P.”
It tells us almost nothing, except its importance. Only to be followed by:
| Roughly speaking, it encompasses all problems that can be solved in a reasonable amount of time. An analogous complexity class for space is dubbed "PSPACE."
This tells us nothing... My first thought would be "why not PTIME and PSPACE?" if I didn't already know what was going on. The whole work is about bridging these two concepts! How can we understand that if we don't know what we're building a bridge between? It's like reporting on a bridge being built connecting England and France but just calling it a bridge. Is it important? Sounds like it by the way they talk, but how can you even know the impact of such a thing when not given such critical context? You get tremendous amounts of additional context with the addition of so few words.
I think this is actually a pretty reasonable description but I also have read Quantum Computing Since Democritus.
From the "Camel Book", one of my favorite programming books (not because it was enlightening, but because it was entertaining), on the Perl philosophy:
“If you’re running out of memory, you can buy more. But if you’re running out of time, you’re screwed.”
This can work both ways. If the program needs more memory than the computer has, it can't run until you buy more. But if it takes twice as long, at least it runs at all.
Modern computers have so much memory it feels like it doesn't matter. Spending that memory on arrays for algorithms or things like a garbage collector just makes sense. And unused memory is worthless: you WANT the sum of all your programs to use all your memory. The processor, on the other hand, can context switch and do everything in its power to make sure it stays busy.
The CPU is like an engine and memory is your gas tank. Idling the engine is bad; leaving gas in the tank doesn't hurt, but it doesn't help either. I'm not gonna get to my destination faster because I have a full tank.
Only if you're running one such memory-hungry program at a time, a luxury you usually can't afford. Multi-program workloads are much more common, and the strategy of using as much RAM as possible doesn't work in that environment.
The Camel book was written when Moore’s Law was trucking along. These days you can’t buy much more time but you used to be able to just fine. Now it’s horizontal scaling. Which is still more time.
Lookup tables with precalculated things for the win!
In fact I don’t think we would need processors anymore if we were centrally storing all of the operations ever done in our processors.
Now fast retrieval is another problem for another thread.
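The serious kernel here is the classic space-for-time trade: spend memory on a table so repeated work becomes a lookup. A minimal sketch:

    from functools import lru_cache

    def fib_slow(n):          # no table: exponential time
        return n if n < 2 else fib_slow(n - 1) + fib_slow(n - 2)

    @lru_cache(maxsize=None)  # spend memory on a lookup table...
    def fib_fast(n):          # ...and time drops to linear
        return n if n < 2 else fib_fast(n - 1) + fib_fast(n - 2)

    print(fib_fast(90))       # instant; fib_slow(90) would take centuries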
Reminds me of when I started working on storage systems as a young man and once suggested pre-computing every possible 4KB block and just using pointers to the correct block as data is written, until someone pointed out that the number of unique 4KB blocks (2^32768) far exceeds the number of atoms in the universe.
It seems like you weren’t really that far off from implementing it, you just need a 4 KB pointer to point to the right block. And in fact, that is what all storage systems do!
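The arithmetic behind both comments, spelled out (a quick sketch):

    block_bits = 4096 * 8              # bits in one 4 KB block: 32768
    distinct_blocks = 2 ** block_bits  # number of unique 4 KB blocks
    pointer_bytes = block_bits // 8    # bytes needed to address them all
    print(pointer_bytes)               # 4096: the pointer IS the block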
Reminds me of when I imagined brute-forcing every possible small picture as simply 256 shades of gray for each pixel x (640 x 480 = 307200 pixels) = 78 million possible pictures.
Actually, I don't have any intuition for why that's wrong, except that if we concatenate the rows into one long row, then the picture can be considered a number 307200 digits long in base 256, and then I see that it could represent 256^307200 possible different values. Which is a lot: https://www.wolframalpha.com/input?i=256%5E307200
78 million is how many pixels would be in 256 different pictures with 307200 pixels each. You're only counting each pixel once for each possible value, but you actually need to count each possible value on each pixel once per possible combination of all of the other pixels.
The number of possible pictures is indeed 256^307200, which is an unfathomably larger number than 78 million. (256 possible values for the first pixel * 256 possible values for the second pixel * 256 possi...).
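Counting it out for tiny images makes the blow-up concrete (a quick sketch):

    # An n-pixel, 256-level image has 256**n possibilities, not 256*n.
    for side in (1, 2, 3):
        n = side * side
        print(f"{side}x{side}: 256*n = {256*n:,} vs 256**n = {256**n:,}")
    # 2x2 already gives ~4.3 billion; 3x3 gives ~4.7 * 10**21.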
Yeah, I had a similar thought back in the 90s and made a program to iterate through all possible images at a fairly low res. I left it running while I was at school and got home after many hours to find it had hardly got past the first row of pixels! This was a huge eye-opener about how big a possibility space digital images really exist in!
I had the same idea when I first learned about programming as a teenager. I wonder how many young programmers have had this exact same train of thought?
I think at some point you should have realized that there are obviously more than 78 million possible greyscale 640x480 pictures. There are a lot of intuitive examples, but just think of this:
https://images.lsnglobal.com/ZFSJiK61WTql9okXV1N5XyGtCEc=/fi...
If there were only 78 million possible pictures, how could that portrait be so recognizably one specific person? Wouldn't that mean that your entire picture space wouldn't even be able to fit a single portrait of everyone in Germany?
"At some point" I do realise it. What I don't have is an intuitive feel for why a number can be three digits 000 to 999 and each place has ten choices, but it's not 10 x 3 possibles. I tried to ask ChatGPT to give me an intuition for it, but all it does is go into an explanation of combinations. I know it's 10 x 10 x 10 meaning 10^3 I don't need that explanation again, what I'm looking for is an intuition for why it isn't 10x3.
> "if there were only 78 million possible pictures, how could that portrait be so recongizably one specific person? wouldnt that mean that your entire picture space wouldnt even be able to fit a single portrait of everyone in Germany?"
It's not intuitive that "a 640x480 computer picture must be able to fit a single portrait of everyone in Germany"; a human couldn't check it. A human couldn't remember 78 million distinct pictures, look through them, and see that they all look sufficiently distinct and that at no point is one picture representing 50k people; human attention and memory aren't enough for that.
Try starting with a 2x2, then 3x3, etc. image and manually list all the possibilities.
That's focusing on the wrong thing; as I said, "I know it's 10 x 10 x 10 meaning 10^3 I don't need that explanation [for the correct combinations], what I'm looking for is an intuition for why it isn't 10x3".
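One way to make the difference concrete (a quick sketch): 10 x 3 counts the individual (position, digit) options on the menu, while outcomes multiply, because every digit choice pairs with every combination of the other positions.

    from itertools import product

    digits = "0123456789"
    # 10 x 3 counts single (position, digit) OPTIONS -- the menu size:
    options = [(pos, d) for pos in range(3) for d in digits]
    print(len(options))                  # 30

    # An OUTCOME is a whole 3-digit string; each digit choice combines
    # with every combination of the other two positions:
    outcomes = list(product(digits, repeat=3))
    print(len(outcomes))                 # 1000 = 10**3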
I had a friend who had the same idea for pixel fonts with only two colors and a 16x16 canvas. It was still 2^256. Watching that thing run and trying to estimate when it would finish made me understand encryption.
The other problem is that (if we take literally the absurd proposal of computing "every possible block" up front) you're not actually saving any space by doing this, since your "pointers" would be the same size as the blocks they point to.
If you don't do _actually_ every single block then you have Huffman Coding [1].
I imagine if you have a good idea of the incoming data you could probably do a similar encoding scheme, where you use 7 bits to point to a ~512 bit blob and the 8th bit means the next 512 couldn't be compressed.
In some contexts, dictionary encoding (which is what you're suggesting, approximately) can actually work great. For example common values or null values (which is a common type of common value). It's just less efficient to try to do it with /every/ block. You have to make it "worth it", which is a factor of the frequency of occurrence of the value. Shorter values give you a worse compression ratio on one hand, but on the other hand it's often likelier that you'll find it in the data so it makes up for it, to a point.
There are other similar lightweight encoding schemes like RLE and delta and frame of reference encoding which all are good for different data distributions.
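A minimal sketch of the dictionary-encoding idea (toy code, not any particular system's format): each distinct value is stored once and replaced by a small index, which pays off exactly when values repeat often.

    def dict_encode(values):
        dictionary, codes, encoded = [], {}, []
        for v in values:
            if v not in codes:              # first sighting: add to dictionary
                codes[v] = len(dictionary)
                dictionary.append(v)
            encoded.append(codes[v])        # store the small index instead
        return dictionary, encoded

    print(dict_encode(["null", "null", "foo", "null", "foo"]))
    # (['null', 'foo'], [0, 0, 1, 0, 1])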
The other problem is that to address all possible 4096-byte blocks, you need a 4096-byte address. I suppose we would expect the actual number of blocks computed and reused to be a sparse subset.
Alternately, have you considered 8 byte blocks?
If your block pointers are 8-byte addresses, you don't need to count on block sparsity, in fact, you don't even need to have the actual blocks.
A pointer type that implements self-reads and writes, with null allocations and deletes, is easy to implement incredibly efficiently in any decent type system. A true zero-cost abstraction, if I have ever seen one!
(On a more serious note, a memory heap and CPU that cooperated to interpret pointers with the top bit set as a 63-bit linear-access/write self-storage "pointer" is an interesting thought.)
The idea is not too far off. You could compute a hash of an existing data block and store the hash-to-block mapping. Now you can use the hash anywhere that data block resides, i.e. any duplicate data blocks can use the same hash. That's how storage deduplication works in a nutshell.
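A toy sketch of such a content-addressed store (assuming SHA-256; not any particular product's implementation):

    import hashlib

    store = {}                            # hash -> block

    def put(block: bytes) -> str:
        h = hashlib.sha256(block).hexdigest()
        store.setdefault(h, block)        # duplicates share one copy
        return h                          # the hash now stands in for the block

    a = put(b"x" * 4096)
    b = put(b"x" * 4096)
    print(a == b, len(store))             # True 1: stored only once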
Except that there are collisions...
This might be completely naive but can a reversible time component be incorporated into distinguishing two hash calculations? Meaning when unpacked/extrapolated it is a unique signifier but when decomposed it folds back into the standard calculation - is this feasible?
Some hashes do have verification bits that are used not just to verify that the hash is intact, but to distinguish one "identical" hash from another. However, they do tend to be slower hashes.
Do you have an example? That just sounds like a hash that is a few bits longer.
Mostly use of GCM (Galois/Counter Mode). Usually you tag the key, but you can also tag the value to check verification of collisions instead.
But as I said, slow.
Hashes by definition are not reversible. You could store a timestamp together with a hash, and/or you could include a timestamp in the digested content, but the timestamp can't be part of the hash.
> Hashes by definition are not reversible.
Sure they are. You could generate every possible input, compute its hash & compare with the given one.
OK, it might take an infinite amount of compute (time/energy). But that's just a technicality, right?
> Sure they are. You could generate every possible input
Depends entirely on what you mean by reversible. For every hash value, there are an infinite number of inputs that give that value. So while it is certainly possible to find some input that hashes to a given value, you cannot know which input I originally hashed to get that value.
Can use cryptographic hashing.
How does that get around the pigeonhole principle?
I think you'd have to compare the data value before purging, and you can only do the deduplication (purge) if the block is actually the same; otherwise you have to keep the block (you can't replace it with the hash, because the hash link in the pool points to different data).
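A sketch of that compare-before-purge variant (toy code): on a hash match, do a full byte comparison, so a collision costs an extra stored copy instead of silent corruption.

    import hashlib

    store = {}                                  # hash -> list of distinct blocks

    def put_verified(block: bytes):
        h = hashlib.sha256(block).hexdigest()
        blocks = store.setdefault(h, [])
        for i, existing in enumerate(blocks):
            if existing == block:               # bytes match: true duplicate
                return h, i                     # safe to dedupe
        blocks.append(block)                    # collision (or new): keep it
        return h, len(blocks) - 1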
The hash collision chance is extremely low.
For small amounts of data, yeah. As the data grows, the chance of a collision grows more than proportionally. So in the context of working on storage systems (like S3 or so), that won't work unless customers actually accept the risk of a collision as okay. For example, when storing media data (movies, photos), I could imagine that, but not for data in general.
The chance of a cryptographic hash collision is very, very small: you could run until the end of the universe many times over without seeing one. It's smaller than the chance of AWS burning down with all backups lost, leading to data loss.
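For scale, the birthday bound backs this up: with n random blocks and a b-bit hash, the collision probability is roughly n^2 / 2^(b+1). A quick sketch:

    def collision_prob(n, bits):
        # birthday-bound approximation: n*n / 2**(bits + 1)
        return n * n / 2 ** (bits + 1)

    # a trillion 4 KB blocks (~4 exabytes) under SHA-256:
    print(collision_prob(10**12, 256))   # ~4e-54: negligible, as claimed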