Wednesday, March 04, 2026

Mapping the yeast structural interactome with AlphaFold3: an open call for collaboration

 
We are excited to announce the early-stage release of our S. cerevisiae structural interactome mapping project. Using AlphaFold3 (AF3), we are systematically predicting protein-protein interactions and their 3D structures across the yeast proteome.

We are currently about 25–30% of the way through the project. Rather than waiting until the end, we are releasing our data and current benchmarks early. We want to avoid duplicate work in the community, provide researchers with immediate access to high-confidence structural models, and put out an open call for collaborators to help us finish this project.


Predicting structures for every possible protein pair individually is computationally expensive. To make this tractable, we are using the pooling approach that we recently published together with Horia Todor in Carol Gross' lab at UCSF. By packing randomly sampled proteins into pools of up to 5,000 tokens, we can cover the interactome far more efficiently. Surprisingly, in addition to being more time-efficient, the pooling approach also yields more accurate confidence scores, perhaps because in silico competition between the pooled proteins reduces the false positive rate.
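To make the pooling idea concrete, here is a minimal sketch of random pooling under a token budget. It is an illustration only, not the published procedure: the function names are hypothetical, one token is assumed to equal one residue, and the greedy fill shown here is just one simple way to stay under the 5,000-token limit.

```python
import random
from itertools import combinations

MAX_TOKENS = 5000  # pool size limit quoted in the post


def make_random_pool(lengths, max_tokens=MAX_TOKENS, rng=None):
    """Build one pool: visit proteins in random order and keep each one that still fits.

    `lengths` maps a protein name to its length in tokens
    (assumed here to be one token per residue).
    """
    rng = rng or random.Random()
    names = list(lengths)
    rng.shuffle(names)
    pool, used = [], 0
    for name in names:
        if used + lengths[name] <= max_tokens:
            pool.append(name)
            used += lengths[name]
    return pool


def pairs_in_pool(pool):
    """Every unordered protein pair that a single pooled prediction covers."""
    return {tuple(sorted(pair)) for pair in combinations(pool, 2)}


def covered_pairs(lengths, n_pools, seed=0):
    """Union of the pairs covered by `n_pools` independently sampled pools."""
    rng = random.Random(seed)
    covered = set()
    for _ in range(n_pools):
        covered |= pairs_in_pool(make_random_pool(lengths, rng=rng))
    return covered
```

A pool of k proteins covers k(k-1)/2 pairs in a single prediction, which is where the efficiency gain comes from. Note that this greedy fill skips proteins that no longer fit and so favors short proteins toward the end of each pool; a real screen would likely need to correct for that.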

Current progress and early access

For our yeast interactome, we are currently limiting our target list to a subset of approximately 4,000 proteins that are expressed under standard conditions. By grouping these, we condensed the interactome down to 300,000 unique pools. To optimize the screen further, we benchmarked the impact of reducing the number of recycles. We found that dropping from 10 recycles to just 3 cuts the compute time in half without a meaningful loss in predictive power. In our tests against STRING scores, the AUC shifted only marginally, from 0.85 (10 recycles) to 0.84 (3 recycles). Predictions are scored using a corrected ipTM value described in the mycoplasma paper.
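For readers who want to run a similar comparison on their own subset, the sketch below computes a ROC AUC from per-pair scores and binary labels. It rests on assumptions that the post does not confirm: each pair is labeled positive or negative by thresholding its STRING score, the function name is hypothetical, and the toy numbers are placeholders rather than project data.

```python
def roc_auc(scores, labels):
    """ROC AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win. This pairwise form is quadratic in the number
    of pairs, so it is only meant for small benchmark subsets.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy placeholder values, not project data: the same four pairs
# scored once with 10 recycles and once with 3 recycles.
labels = [1, 1, 0, 0]  # assumed: 1 = pair supported by STRING above a chosen cutoff
iptm_10_recycles = [0.82, 0.64, 0.31, 0.18]
iptm_3_recycles = [0.80, 0.29, 0.33, 0.20]

auc_10 = roc_auc(iptm_10_recycles, labels)
auc_3 = roc_auc(iptm_3_recycles, labels)
```

Computing both AUCs over the same set of labeled pairs is what makes the 10-recycle and 3-recycle values directly comparable.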

We have so far computed close to 30% of the target set, corresponding to prediction scores and structures for over 4 million pairs of yeast proteins. We are making the data available through a Docker image found on this GitHub page. The Docker image also creates a simple web app for browsing the list and selecting individual files, but at the moment this release will be most useful for those comfortable parsing large-scale datasets.



A call for collaborators

While our optimizations have made this project possible, completing the remaining ~70% of the interactome is still a big challenge, and we would certainly welcome collaborations. We are looking for labs or institutions with significant access to high-end compute. We also welcome collaborations focused on downstream biological analysis of these structures and on applying the network to specific biological questions. If you have the compute power to help us process the remaining pools, or if you are interested in diving into the analysis, please reach out. We do ask that researchers refrain from publishing proteome-wide or large-scale data mining studies until our formal publication is released.