Billion-Parameter Theories

worldgov.org

96 points by seanlinehan 16 hours ago


wavemode - 11 hours ago

Not to sound condescending, but this reads like someone familiar with LLMs but very unfamiliar with statistics in general.

If we could understand economics, or poverty, or any number of other social structures, simply by cramming data into a statistical model with billions of parameters, we would've done that decades ago and these problems would already be understood.

In the real world, though, there is a phenomenon called overfitting. In other words you can perfectly model the training data but be unable to make useful predictions about new data (i.e. the future).
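The overfitting point is easy to sketch in a few lines (an illustrative toy, not a claim about any particular model; the polynomial degree, noise level, and seed are arbitrary choices):

```python
# Toy overfitting demo: a degree-9 polynomial fits 10 noisy training
# points almost perfectly, yet extrapolates far worse than a straight
# line fit to the same data.
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 10)
y_train = 2.0 * x_train + rng.normal(0.0, 0.1, 10)  # truly linear + noise

overfit = np.polyfit(x_train, y_train, 9)  # one parameter per data point
simple = np.polyfit(x_train, y_train, 1)   # the "right" two-parameter model

x_new = np.linspace(1.0, 1.5, 50)  # "the future": outside the training range
y_new = 2.0 * x_new                # noiseless ground truth

err_overfit = np.mean((np.polyval(overfit, x_new) - y_new) ** 2)
err_simple = np.mean((np.polyval(simple, x_new) - y_new) ** 2)

print(err_overfit > err_simple)  # True: near-perfect training fit, worse predictions
```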

harperlee - 15 hours ago

Two handwavey ideas upon reading this:

- Even for billion-parameter theories, a small number of vectors might dominate the behaviour. A coordinate-shift approach (PCA) might surface new concepts that enable us to model that phenomenon. "A change in perspective is worth 80 IQ points", said Alan Kay.

- There may be an analogue of how we come up with cognitive metaphors of the mind ("our models of the mind resemble our latest technology: abacus, mechanisms, computer, neural network") that could be applied to other complicated areas of reality.
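The first idea (a few directions dominating, surfaced by a coordinate shift) can be sketched with synthetic data; the dimensions, sample count, and noise level here are made up for illustration:

```python
# Sketch of the "few vectors dominate" idea: 50-dimensional observations
# secretly driven by 2 latent factors. PCA (via SVD of the centered data)
# reveals that nearly all the variance lives in the first two components.
import numpy as np

rng = np.random.default_rng(1)
latent = rng.normal(size=(1000, 2))         # two hidden causes
mixing = rng.normal(size=(2, 50))           # spread across 50 observed dims
data = latent @ mixing + 0.01 * rng.normal(size=(1000, 50))  # small noise

centered = data - data.mean(axis=0)
_, s, _ = np.linalg.svd(centered, full_matrices=False)
explained = s**2 / np.sum(s**2)             # variance ratio per component

print(round(explained[:2].sum(), 3))  # ~1.0: two directions explain almost everything
```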

b450 - 15 hours ago

Reminds me of the blog post about Waymo's "World Model". Training on real-world data results in a sufficiently rich model to start simulating novel scenarios that aren't in the training data (like the elephant wandering into the street), which in turn can feed back into training. One could imagine scientific inquiry working the same way.

It strikes me that many of these complex systems have indeterminate boundaries, and a fair amount of distortion might be baked into the choice of training data. Poverty (to take an example from this post) probably has causes at economic, psychological, ecological, physiological, historical, and political levels of description (commenters please note I didn't think too hard about this list). What data we feed into our models, and how those data are understood as operationalizations of the qualitative phenomena we care about, might matter.

roughly - 11 hours ago

There's a lot of ink in this spent on how Poverty, Climate Change, Urban Decay, and Financial Markets are Complex Hard Complicated problems.

The problem with these is they're also problems where there are actors profiting from the failure to fix the system - the issue isn't that we don't understand the complex nature of the domain, it's that the components of the system actively and agentically resist changes to the system. George Soros called this Reflexivity - the fact that the system responds to your manipulations means you can't treat yourself and the system as separate agents, and you can't treat the system as a purely mechanistic/passive recipient of your changes. It's maybe the biggest blind spot for people who want to apply the rules and methods of physics to social issues - the universe may be indifferent, but your neighbors are not.

niemandhier - 15 hours ago

He talks about the Santa Fe Institute and how they failed to carry their findings into the real world.

They did not.

They showed that for certain problems one could not do more than figure out some invariants and scaling laws. Showing what is impossible is not failure.

For the rest: Modern gene networks and lots of biological modelling is based on their work as well as quite a few other things. That’s also not failure.

I agree that modern AI is alchemy.

js8 - 15 hours ago

I disagree with the article. I think it is always possible to come up with reasonably small theories that capture most of the given phenomena. So in a sense, you don't need complex theories in the form of large NNs (models? functions? programs?), other than for more precise prediction.

For example - global warming. It's nice to have AOGCMs that have everything and the carbon sink in them. But if you want to understand, a two layer model of atmosphere with CO2 and water vapor feedback will do a decent job, and gives similar first-order predictions.
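For flavor, here is roughly the kind of tiny model the comment means; this sketch is the even simpler textbook n-layer grey atmosphere (no feedbacks), with standard values for the solar constant and albedo:

```python
# Idealized grey-atmosphere layer model: each fully absorbing longwave
# layer raises the surface temperature by a factor (n+1)^(1/4) over the
# bare-planet value. Deliberately crude -- a sketch, not an AOGCM.
S = 1361.0       # solar constant, W/m^2
albedo = 0.3     # planetary albedo
sigma = 5.67e-8  # Stefan-Boltzmann constant, W m^-2 K^-4

absorbed = S * (1 - albedo) / 4           # absorbed flux per m^2 of surface
T_effective = (absorbed / sigma) ** 0.25  # ~255 K, no greenhouse at all

def surface_temp(n_layers):
    """Surface temperature with n opaque longwave layers."""
    return T_effective * (n_layers + 1) ** 0.25

print(round(T_effective))      # ~255 K
print(round(surface_temp(1)))  # ~303 K, already near the observed ~288 K
```

One opaque layer overshoots reality because the real atmosphere is only partially opaque in the longwave, but the first-order structure (more absorbing layers, warmer surface) is already there.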

I also don't think poverty is a complex problem, but that's a minor point.

curuinor - 15 hours ago

Connectionist models have lots of theory by theoreticians explicitly pissed off about Chomsky's assertion that there is an inbuilt ability for language. Jay McClelland's office had a little corkboard thingy with Chomsky mockery on the side, for example. Putting forth even the implicature that the present direct descendants are intellectual descendants of Chomsky is like saying Protestants are intellectual descendants of Pope Leo X.

quinndupont - 15 hours ago

Summary: good scientific theories have “reach,” which is not defined in any precise way. Reach has complexity and this can be handled with large parameter neural networks. Assumptions: mechanistic and deterministic worldview; epistemological perfection is the goal (perfect knowledge of facts).

mistivia - 5 hours ago

> The deepest truths fit on a napkin.

If you have really done physics or engineering, you would never believe this. Simple and elegant formulas usually can only solve the "spherical chicken in a perfect vacuum" kind of problems. The real world is incredibly messy. Beneath those clean and beautiful-looking partial differential equations lies a mathematical nightmare. And these equations often only hold at certain scales or rely on extremely strict boundary conditions.

lkm0 - 15 hours ago

It's an optimistic point of view. Still, when people use large neural nets to model physics, they also have a lot of parameters but they replicate very simple laws. So there's something deeper about this. Something like a simulation of theory.

dakiol - 15 hours ago

> You could capture the behavior of every falling object on Earth in three variables and describe the relationship between matter and energy in five characters.

What we can do is approximate. Newton came up with a good approximation of gravitation some time ago (force equals a constant times two masses divided by distance squared; super readable indeed). But nowadays there's a better one that looks nothing like Newton's theory (Einstein's field equations, which are compact but bear no resemblance to Newton's law). So, what if in 1000 years we have a yet better approximation to gravity in the universe, but it's encoded in millions of variables (perhaps in the form of a neural network of some futuristic AI model)?
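Newton's "super readable" approximation, literally in code, using standard textbook constants:

```python
# Surface gravity from Newton's law: g = G*M / r^2.
# Three physical quantities, one line of arithmetic.
G = 6.674e-11  # gravitational constant, m^3 kg^-1 s^-2
M = 5.972e24   # Earth's mass, kg
r = 6.371e6    # Earth's mean radius, m

g = G * M / r**2
print(round(g, 2))  # ~9.82 m/s^2
```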

My point is: whatever we know about the universe now doesn't necessarily mean that it has "captured" the underlying essence of the universe. We approximate. Approximations are useful and handy and will move humanity forward, but let's not forget that "approximations != truth".

If we ever discover the underlying "truth" of the universe, we would look back and confidently say "Newton was wrong". But I don't think we will ever discover such a thing, therefore sure, approximations are our "truth" - but sometimes people forget.

ileonichwiesz - 15 hours ago

This might be an unkind reading, but to me this just sounds like an attempt to reinvent the very same kind of mysticism that it mentions in the first paragraph.

“No need to study the world around you and wonder about its rules, peasant - it’s far beyond your understanding! Only ~the gods~ computers can ever know the truth!”

I shudder to think about a future where people give up on working to understand complex systems because it’s hard and a machine can do it better, so why bother.

rbanffy - 13 hours ago

If we think of spacetime as some sort of cellular automaton, where each state of a given point is a function (with some randomness, because God likes to throw dice) of the previous states of the surrounding points, then extremely complex state-generation rules imply significant overhead in dimensions we don't see, because the rules need to be represented somewhere outside observable reality. Another issue with this idea is that while the rules might be "outside", the parameters themselves have to be encoded in the state of a cell, and can't propagate faster than light, i.e. one cell (an indivisible unit of space) per indivisible unit of time, which limits the parameters accessible to any given cell to the ones immediately surrounding it.

Disclaimer: I hope it's obvious, but I'm no physicist. This is just how I would build a universe.
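The locality constraint in that comment is easy to demonstrate with a toy 1D cellular automaton (rule 110 is an arbitrary choice; any nearest-neighbour rule behaves the same way):

```python
# Toy 1D cellular automaton (rule 110, periodic boundaries) showing the
# locality constraint: flipping one cell can influence at most one extra
# cell per time step -- a built-in "speed of light".
import numpy as np

RULE_110 = np.array([0, 1, 1, 1, 0, 1, 1, 0])  # output per 3-cell pattern

def step(cells):
    left, right = np.roll(cells, 1), np.roll(cells, -1)
    return RULE_110[4 * left + 2 * cells + right]

rng = np.random.default_rng(2)
a = rng.integers(0, 2, 201)
b = a.copy()
b[100] ^= 1  # perturb a single cell in the middle

spread_ok = True
for t in range(1, 50):
    a, b = step(a), step(b)
    diff = np.flatnonzero(a != b)
    if diff.size and (diff.min() < 100 - t or diff.max() > 100 + t):
        spread_ok = False  # perturbation outran the light cone

print(spread_ok)  # True: the disturbance never spreads faster than 1 cell/step
```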

zkmon - 14 hours ago

> It's remarkable how much of reality turned out to be modelable by theories that fit in a few symbols.

The admiration for "remarkable" things puts humanity on a dangerous path that is disconnected from the real goals of human progress as a species. You don't need any of this compression of knowledge or truths. Folklore tales about celestial bodies are fine and good enough. The vulgar pursuit of knowledge is paving the way for the extinction of humans as biological creatures.

jjk166 - 9 hours ago

> And the epistemology shifts in ways that might be uncomfortable. Instead of "I understand the causal mechanism and can predict what happens if I change X," you get something more like "I have a sufficiently rich model that I can simulate what happens if I change X, with probabilistic confidence." The answers are distributions, not deterministic outputs. That's a different kind of knowing.

Being able to simulate something is not a kind of knowing. It is, in fact, the opposite of knowing. If you know how a system behaves, there is no need to simulate it. In particular, if the model you need to simulate it is way more complicated than the phenomenon itself, you really really don't understand it.

I'm reminded of Feynman's observation that to simulate a quantum system, like an atom, with classical methods requires a tremendous number of atoms, and his intuition that there should be a much smaller way to perform such calculations. This is the conceptual underpinning of quantum computation.

A billion parameter neural network may work as a functional tool, but the fact is these supposedly complex problems simply don't have billions of relevant free parameters. You're not going to understand a hurricane by feeding terabytes of data to find the butterfly that flapped its wings in just the wrong way at just the wrong time. Sure, extremely small differences in starting conditions can lead to radically different outcomes, and a butterfly flapping its wings could have influenced a hurricane in some way. But if you understand how hurricanes work, you know that butterfly's influence is just noise - the hurricane started and progresses as it does because of temperature gradients on the ocean. If you found and stopped the butterfly from flapping its wings, the conditions for the hurricane would still exist and something else would set it in motion.

Billion parameter theories work in practice because if you throw everything at the wall, the small amount of stuff that can stick will. Likewise, if you throw enough data at a problem, whatever data is actually relevant will be analyzed. This can be useful as a stepping stone to understanding: interrogating the model to reveal which parameters have more relevance and the weights of their interactions. But the idea that having a tool that addresses a symptom of your ignorance means you are no longer ignorant is folly.

brunohaid - 15 hours ago

Very skeptical Adam Curtis hat on while reading this, but it is quite well written. Thanks & kudos!

us-merul - 15 hours ago

I think this also creates a vulnerability: the more time and effort is spent crafting the "correct" solution, the easier it becomes to dismiss topics out of hand. Even if our modeling tools have changed, emotions and the human mind have not.

ashton314 - 12 hours ago

The core of this little essay seems to be this:

Instead of "I understand the causal mechanism and can predict what happens if I change X," you get something more like "I have a sufficiently rich model that I can simulate what happens if I change X, with probabilistic confidence." The answers are distributions, not deterministic outputs. That's a different kind of knowing.

At the beginning this sounded like, "hard problems are complex, machine learning can help us manage complexity, therefore we will be able to solve hard problems with machine learning", which betrays a shallowness of understanding. I think what this essay argues here is a little deeper than that trite tech-bro hype meme.

But I disagree with this conclusion: I don't know that we can build these models in the first place, or that our new LLM/transformer-powered tools can help solve these problems. If simulation were the answer to everything, why would new ML tools make a significant difference in ways that existing simulation tools do not?

Stuff like AlphaFold is amazing—I'm not saying that better medical results won't come about from ML—but I feel like there's some substance missing and that even this level of excitement that the author expresses here needs more and better backing.

meltyness - 10 hours ago

... but then why not a model of the model to perform that outer analysis and overcome the representational shortcomings of an encoder network?

bigbuppo - 13 hours ago

Maybe I missed the point, but this read like Big Think Thought Leadership that would make a good TED talk but not much else. I'll just put it on the big pile over there.

gnarlouse - 12 hours ago

AI slop DNR

bbor - 14 hours ago

  There's a parallel in linguistics. Chomsky showed that all human languages share deep recursive structure. True, and essentially irrelevant to the language modeling that actually learned to do something with language.

...this is so absurdly and blatantly wrong that it's hard to move past. Has the author ever heard of programming languages??

usgroup - 14 hours ago

[dead]

xikrib - 14 hours ago

Let's gather authors of 15 different world languages together in a room and see if they can collaboratively write a short story. Surely their inability to do so will prove their inadequacy in their native language. /s

Simplicity brings us closer to truth — Occam's razor has underpinned the development of our species for centuries. It's enterprise, empire, and capital that feed off of complexity.

We're entering a period of human history where engineers and businesspeople drive academic discourse, rather than scientists or philosophers. The result is intellectual chicken scratch like this article.