ESMFold2 creates ESM Atlas of 1 billion predictions

ESMFold2 open-source – A new open-source protein-structure atlas, generated with ESMFold2, predicts the structures of more than one billion proteins and catalogs sequences for 6.8 billion proteins—surpassing prior efforts and drawing fresh attention to how much of biology remains un
By the time biologists finish decoding one protein’s job, another thousand have already been discovered, or found hiding in genomes from soil and oceans. On May 30, 2026, that slow grind got a jolt: researchers at the Chan Zuckerberg Initiative’s Biohub unveiled the ESM Atlas, an open resource containing an AI-generated map of more than one billion predicted protein structures and information on the sequences of 6.8 billion proteins.
The atlas is built using ESMFold2, a protein-structure prediction model Biohub says surpasses the performance of AlphaFold3, Google DeepMind’s latest system, and other protein-structure prediction AIs. The work, described in a preprint released that day, is positioned as more than a bigger dataset. Biohub science head Alex Rives, who led the effort, framed it as a way to expose parts of protein biology that have been hardest to reach. “What this atlas does is it shows the totality of protein biology and especially the parts that are most unknown,” Rives said. “We think it’s going to be a really powerful substrate for the discovery of new biology.”
The numbers are hard to miss. The ESM Atlas eclipses the AlphaFold Database of predicted protein structures by more than 800 million entries. It also surpasses a previous ESM Atlas by some 300 million. In other words, the tool doesn’t just improve accuracy; it expands the scale of the protein universe researchers can interrogate.
ESMFold2 comes from a “protein language” approach that Rives’s team unveiled in 2024. The model’s training set includes billions of proteins drawn from across the tree of life, and it also incorporates metagenomic sequences from environments like soil and ocean, data that Biohub says are absent from the AlphaFold database of predicted protein structures. The inclusion matters because vast portions of biology never show up in the lab-friendly proteins that dominate many early benchmarks.
In a field racing toward practical speed, the most pointed claim in the preprint is about protein interactions. Rives’s team says ESMFold2 outperforms existing methods, including AlphaFold3, at determining the correct structure of complexes of interacting proteins. They cite examples that include antibody molecules binding to antigen molecular targets.
That capability is part of why the preprint also reads like a test drive. Rives’s group describes using ESMFold2 to design new antibodies and other proteins intended to strongly attach to proteins implicated in cancers and immunological conditions. When those designs were created and tested in the lab, a high proportion of the designs worked as predicted.
The atlas itself is designed to bridge the gap between what’s known and what’s still foggy. Rives hopes the freely accessible resource will help scientists make connections between familiar and poorly characterized regions of protein space. Using the dataset, the researchers report finding structural similarities between CRISPR microbial defense proteins and a gene-editing protein identified in a soil fungus in 2023 (seen in other eukaryotic species as well).
Protein prediction has become a battleground of models, but the ESM Atlas has landed in the middle of a different kind of tension: open science versus closed tools. Other researchers say the results are compelling, particularly because ESMFold2 is fully open source.
Gemma Atkinson, a computational biologist at Lund University in Sweden, called the atlas “an extraordinary resource for biology.” She added that it’s exciting to see how large-scale protein language models can capture fundamental rules of protein biology.
Christine Orengo, a computational biologist at University College London, said the predictions will first need evaluating, but they could help uncover new protein folds and functions—work with implications for protein design and for basic understanding of biology.
Still, not everyone is satisfied with “bigger” as a substitute for “right in the hard cases.” Martin Steinegger, a computational biologist at Seoul National University, said his biggest question is how well ESMFold2 can predict the structure of proteins that are very different from those already known. His team found that the first edition of ESMFold wasn’t especially good at predicting unusual protein structures, especially those found in metagenome data.
Sergey Ovchinnikov, a computational biologist at the Massachusetts Institute of Technology in Cambridge, sees the ESM Atlas as a supplement rather than a replacement. He points to the widely used AlphaFold database of more than 200 million protein structures. Ovchinnikov also notes that ESMFold2’s impressive performance on interacting proteins is not entirely surprising: earlier this year, Isomorphic Labs, Google DeepMind’s biopharma spin-off, unveiled a proprietary model that made substantial gains at predicting interacting structures.
Open-source models, he added, have also achieved strong results in predicting protein interactions, though Ovchinnikov said his view is based on what’s been reported rather than direct comparisons to ESMFold2. He emphasized the practical difference that ESMFold2 offers because of its fully open-source nature, including “no restrictions on commercial use.” “I expect many people will be excited to try ESMFold2,” he said.
For now, the ESM Atlas arrives as a floodlight aimed at protein biology’s darker corners: metagenomic sequences that have been poorly characterized, new structure predictions at a scale that dwarfs earlier atlases, and a tool designed to make interactions, where biology often becomes most consequential, more tractable. The stakes are simple and immediate for researchers who have been trying to chase down what proteins can do. If enough of these predictions hold up in evaluation, ESMFold2 and its billion-plus structures won’t just add entries to a database; they could change what scientists choose to test next.
This article was reproduced with permission and was first published on May 27, 2026.
So it’s like… a map of proteins? That’s kinda wild.
Wait I thought AlphaFold3 already did all this. Like where do they even get the protein info from if it’s just predictions? Seems like another AI flex to me.
I don’t get it, 1 billion predictions sounds made up. If it predicts the structure, doesn’t that mean it’s guessing what the protein does too? Also why is it only “uncovering” stuff now, the genomes been around forever.
This is good but I swear these AI protein things always come out and everyone acts surprised like biology isn’t basically code already. 6.8 billion proteins?? That’s like the whole ocean and dirt in a spreadsheet. Next thing you know somebody’s gonna say it proves vaccines or proves nothing, idk. I’m just glad it’s open-source though, even if half of it is wrong.