Biohub releases a world model of protein biology

biohub.org

134 points by gmays 4 days ago


a_bonobo - 6 hours ago

The accompanying preprint is interesting: https://www.biorxiv.org/content/10.64898/2026.06.03.729735v1

Modeling protein-protein binding is still a massively unsolved problem, mainly because we don't really have the data. AlphaFold2 was great but didn't actually 'solve' protein folding, since all of its input data comes from single-'state' X-ray crystallography of the proteins, which is not 'really' how these proteins behave in the wild. So it's still very, very hard to predict what binds to what, which of course is a multi-billion-dollar industry.

I work in pharma and I wish we could easily design molecular binders. We still spend millions every year finding targets that could 'smuggle' our drugs into cells.

Some other players in this field are Boltz Lab and Isomorphic Labs (the AlphaFold Google spinoff led by Hassabis). None of them can predict anything complex or 'big'; everything is peptide-level. OP's work is another step toward something better.

The most interesting part of the preprint is that they find no matches for their designed binders in the worldwide protein database. An open question with protein designers is whether they just regurgitate training material, which is far easier to test for English-language models.
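A rough way to picture that regurgitation check: align each designed sequence against known sequences and flag anything that is nearly identical to one of them. The sketch below is a toy stand-in, not the preprint's method. It assumes plain Needleman-Wunsch global alignment with +1/-1/-1 scoring, a tiny in-memory reference list, and a hypothetical 0.9 identity cutoff; real searches use tools like BLAST or MMseqs2 against full databases.

```python
# Toy novelty check: does a designed sequence nearly match anything already known?
# Assumptions (hypothetical): +1/-1/-1 scoring, tiny reference list, 0.9 cutoff.

def global_identity(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> float:
    """Needleman-Wunsch global alignment; returns identical positions / alignment length."""
    n, m = len(a), len(b)

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] holds the best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # align two residues
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )

    # Trace back from the bottom-right corner, counting identities and alignment columns
    i, j, identical, length = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            identical += int(a[i - 1] == b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        length += 1
    return identical / length if length else 0.0


def looks_memorized(design: str, references: list[str], threshold: float = 0.9) -> bool:
    """True if the design is at least `threshold` identical to any reference."""
    best = max((global_identity(design, ref) for ref in references), default=0.0)
    return best >= threshold


refs = ["MKTAYIAKQR", "GSHMLEDPVA"]  # hypothetical reference sequences
print(looks_memorized("MKTAYIAKQR", refs))  # True: exact copy of a reference
print(looks_memorized("WWCWWCWWCW", refs))  # False: nothing similar
```

A "no matches" result in this framing just means every design falls below the cutoff against every reference; how strict that is depends entirely on the database and the threshold chosen.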

Frannky - 11 hours ago

It's interesting that there are almost no comments on this. This feels like one of the most exciting and impactful fields of the next few years. A couple of years ago I worked with a cracked researcher who was generating molecules. She spent most of her time fighting CUDA bugs and trying to install packages. I wonder whether the ecosystem has matured by now. There are people studying cells to see what enters and what exits, and engineering ways to stop, for example, resources from feeding a bad cell. The possibilities feel endless. I am a little worried about side effects, since bio is way more chaotic than silicon, but hopefully AI will help with that level of chaos too.

swyx - 11 hours ago

we interviewed Alex Rives, cofounder of EvoScale and Head of Science at BioHub - here https://www.latent.space/p/esmfold2

also 3 paper coauthors walked thru it with us: https://youtu.be/4g1bURdKN0Q

all this is part of the new AI for Science effort we are spinning up at Latent Space - all guidance and support would be greatly appreciated as this is a much harder domain to cover than software

tmoertel - 7 hours ago

> Our mission is to cure or prevent all disease

Okay, now you have my attention.

What's the deal on the company behind it? “Biohub is a 501(c)(3) biomedical research organization...” Nonprofit. Nifty!

This all sounds great, but as we have recently seen with, say, OpenAI, there is nonprofit and then there is nonprofit. Anyone know which kind Biohub is?

trilogic - 5 hours ago

It's nice work; however, domain-specific fine-tuning will always give higher prediction accuracy. Another thing worth noting is the sequence length used for training (usually truncated to 1024 or 2048 residues), which would be a game changer if left untruncated.
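To make the truncation point concrete: cutting at a fixed length simply throws away every residue past the limit, while one common workaround is to split long sequences into overlapping windows. A minimal sketch, assuming the 1024-residue limit mentioned above and a hypothetical 256-residue overlap; this is not how ESM2 or Biohub's model actually preprocesses its data.

```python
# Truncation vs. overlapping windows for long protein sequences.
# Assumptions: 1024-residue limit (from the comment above), hypothetical 256-residue overlap.

def truncate(seq: str, max_len: int = 1024) -> str:
    """Keep the first max_len residues; everything after them is never seen."""
    return seq[:max_len]


def sliding_windows(seq: str, max_len: int = 1024, overlap: int = 256) -> list[tuple[int, str]]:
    """Split seq into overlapping chunks of at most max_len residues.

    Returns (start_offset, chunk) pairs that together cover every residue.
    """
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must be in [0, max_len)")
    step = max_len - overlap
    windows = []
    start = 0
    while True:
        windows.append((start, seq[start:start + max_len]))
        if start + max_len >= len(seq):
            break  # this window already reaches the end of the sequence
        start += step
    return windows


long_protein = "M" + "A" * 2999  # hypothetical 3000-residue sequence
print(len(truncate(long_protein)))  # 1024: the last 1976 residues are dropped
print([start for start, _ in sliding_windows(long_protein)])  # [0, 768, 1536, 2304]
```

Windowing keeps every residue in play, but each window still sees at most `max_len` residues at once, so interactions between residues farther apart than that are never modeled jointly. That is why it is not a substitute for training on uncut sequences.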

I had a bit of fun myself fine-tuning ESM2 on domain-specific bacterial data (because it gives a better score) and comparing it to a model I built myself; mine beat it by 25% in accuracy. Then, for the 3D structure, I coded a 3D protein hypergraph visualizer with a file-upload option that renders the result instantly. Two days' work :)

rguiscard - 9 hours ago

A similar work is Foundry (https://github.com/RosettaCommons/foundry). While both of them are good, the main issue is that they are not accurate enough at the atomic level. There is a good chance that a predicted or designed active site differs slightly from the real structure solved by X-ray crystallography, NMR or cryo-electron microscopy. A side chain or two may point the other way, which changes how the interaction is interpreted. So the tools are good and convenient now, but the design or prediction is often hit-and-miss.
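The "side chain pointing the other way" case can be put into numbers with side-chain dihedral angles. A minimal sketch, assuming you already have matching N, CA, CB and CG coordinates (the four atoms that define chi1 for many residues) from a predicted and an experimental structure. The function names and the 60° flip cutoff are hypothetical, and the example coordinates are toy points, not real bond geometry.

```python
import math

Vec = tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Torsion angle about the p1-p2 bond, in degrees between -180 and 180."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, wrapping at 360 degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def chi1_flipped(pred: tuple[Vec, Vec, Vec, Vec], exp: tuple[Vec, Vec, Vec, Vec],
                 cutoff: float = 60.0) -> bool:
    """pred/exp are (N, CA, CB, CG) coordinates; the 60-degree cutoff is hypothetical."""
    return angle_diff(dihedral(*pred), dihedral(*exp)) > cutoff


# Toy coordinates: same backbone, CG placed on opposite sides
n, ca, cb = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
predicted = (n, ca, cb, (1.0, 0.0, 1.0))      # CG eclipsing N: chi1 = 0
experimental = (n, ca, cb, (-1.0, 0.0, 1.0))  # CG opposite N: chi1 = 180
print(chi1_flipped(predicted, experimental))  # True
```

The wrap-around in `angle_diff` matters because rotamer angles near +180 and -180 are physically almost the same conformation.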

Den_VR - 8 hours ago

Incredible, but also scary if you think about what it may be lowering the barrier of entry to…

RobotToaster - 6 hours ago

> This model is released under the MIT License.

Huh, appears to be actually open source, that's a pleasant surprise. Usually these academic models have some weird license attached to them.

ethanwillis - 8 hours ago

> a scientific engine for prediction, design, and discovery that can map proteins across the tree of life, predict their structures, and design new protein binders that function in laboratory experiments.

So, my issue with this is that, just like in a lot of other areas of bio, we're not able to explore outside the semantics of what is "known." Even the simpler task of doing proper assembly is plagued by this. De novo assembly of an alien/novel organism mixed with samples from other alien organisms would be impossible with what we can do today. Even with organisms we're familiar with, we struggle with metagenomic assembly.
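For readers outside genomics, a toy de Bruijn graph shows why mixed samples are hard. Reads are cut into k-mers, each k-mer becomes an edge between its (k-1)-mer prefix and suffix, and any sequence shared between organisms creates branch points that can't be resolved without extra information. A minimal sketch with made-up reads and k=3; real (meta)genomic assemblers add error correction, coverage information and much more.

```python
from collections import defaultdict


def de_bruijn(reads: list[str], k: int) -> dict[str, set[str]]:
    """Nodes are (k-1)-mers; each k-mer in a read adds an edge prefix -> suffix."""
    graph: dict[str, set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def branching_nodes(graph: dict[str, set[str]]) -> set[str]:
    """Nodes with more than one successor or predecessor: places where
    the assembler has to guess which path belongs to which genome."""
    indegree: dict[str, int] = defaultdict(int)
    for successors in graph.values():
        for node in successors:
            indegree[node] += 1
    nodes = set(graph) | set(indegree)
    return {n for n in nodes if len(graph.get(n, ())) > 1 or indegree[n] > 1}


organism_a = ["ACGTTGCA"]  # hypothetical reads from one organism
organism_b = ["GGCGTA"]    # hypothetical reads from a second organism

print(branching_nodes(de_bruijn(organism_a, 3)))  # set(): one unambiguous path
print(sorted(branching_nodes(de_bruijn(organism_a + organism_b, 3))))  # ['CG', 'GC', 'GT']
```

On its own, organism A's read forms a single unambiguous path; mixing in organism B's read, which shares short substrings with it, creates three branch points even in this tiny example.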