DeepMind's AlphaFold Seeing Uptake for Protein-Protein Interaction Studies
NEW YORK – Since its introduction in 2018, DeepMind's AlphaFold program has become a key tool in biological research, allowing scientists to predict protein structures with high accuracy based on their amino acid sequences.
More recently, researchers have begun using AlphaFold and its newer iteration, AlphaFold2, for protein-protein interaction (PPI) work, exploring its utility for predicting and validating protein interactions as well as generating models of their structures.
These efforts remain limited by the intensive computing resources required, but studies indicate that AI-based approaches could prove useful for large-scale PPI studies and could complement existing tools like mass spectrometry and yeast two-hybrid systems.
"It's a very interesting space," said Juri Rappsilber, professor of proteomics at the University of Edinburgh and professor of bioanalytics at the Berlin Institute of Technology. In April, he and colleagues published a study in Molecular Systems Biology on combining crosslinking mass spectrometry and co-fractionation mass spec with the AlphaFold-Multimer software — an extension of AlphaFold2 intended for PPI research — to predict and validate PPIs in Bacillus subtilis.
One common use, Rappsilber said, is what he called an "Alpha pulldown," in which researchers use the software to screen candidate protein interactors against a particular protein of interest, much as they might with an immune-pulldown mass spec experiment.
"They have a protein that they are interested in, and they have a number of candidate proteins that they think may interact with that protein, and they just toss them one-by-one against their protein of interest," Rappsilber said.
"If AlphaFold is positive, then it is highly likely that the two are interacting," he said. "So you go from having 10 or 20 or 50 candidates down to a handful of candidates, and that is more plausible to follow up on."
Rappsilber added that, importantly, unlike other approaches for validating PPIs, researchers come out of such an experiment with structural models of the interactions.
"And that is a very clear instruction of what to do next as an experiment," he said, noting that with this structural information researchers can design point mutants at the interaction sites, allowing them to disrupt the interaction and investigate its biological effect.
"The major limitation is computational power," Panagiotis Kastritis, junior professor of cryo-EM at the Martin-Luther University of Halle-Wittenberg and ERA chair for cryo-EM at Greece's National Hellenic Research Foundation, said of using AlphaFold for PPI work. "Most of these calculations have been done on institute-scale computers."
Kastritis noted, though, that this will likely become less of a challenge over the next five to 10 years as computing power continues to become less expensive and more accessible.
He also suggested that certain computing strategies could reduce the computational power required to use AlphaFold for large-scale PPI work. For instance, he said, AlphaFold2 predicts protein structures using what are called multiple sequence alignments (MSAs), which it produces by taking a protein's amino acid sequence and searching protein sequence databases for similar sequences that it then uses to construct its models. Kastritis said that as more and more of these MSAs are generated, they can be stored in a way that allows the software to access them directly rather than having to search the sequence databases again.
"If we have predetermined and precalculated [MSAs] it would of course be faster and easier," he said.
In April, researchers from Microsoft and the Free University of Berlin published a bioRxiv preprint that used several computational approaches, including ones similar to those suggested by Kastritis, to speed AlphaFold2's prediction of PPIs. According to the authors, when they applied their approach to predicting the pairwise interactions of 1,000 proteins, it reduced the time required for the predictions by 40-fold while reducing the disk space required by 4,460-fold.
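To see why all-against-all prediction is so demanding, it helps to count the pairs. The short sketch below assumes that "pairwise interactions" means every unordered pair of distinct proteins, which the preprint summary above does not spell out; under that assumption the number of predictions grows roughly with the square of the number of proteins.

```python
from math import comb


def unordered_pairs(n_proteins: int) -> int:
    """Number of distinct two-protein combinations, n * (n - 1) / 2."""
    return comb(n_proteins, 2)


for n in (10, 100, 1000):
    print(n, unordered_pairs(n))
# 10 45
# 100 4950
# 1000 499500
```

Under this assumption, 1,000 proteins imply close to half a million pairwise predictions, which is why reductions in per-prediction time and storage matter at this scale.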
One of the preprint authors, Patrick Bryant, a postdoctoral fellow at the Free University, was also first author on a 2022 Nature Communications paper that detailed a new pipeline for AlphaFold2-based PPI prediction called FoldDock. In January 2023, Bryant and a team led by researchers at the SciLifeLab at Stockholm University (where Bryant had been a graduate student) and the European Bioinformatics Institute used the FoldDock pipeline to predict structures for 65,484 human PPIs, generating 3,137 high-confidence PPI models.
Kastritis said researchers are also using experimental data produced by techniques like mass spec and cryo-electron microscopy to make AlphaFold2 predictions less computationally intensive. For instance, he said, a researcher might provide AlphaFold2 with crosslinking mass spec data or the shape and 3D structure of proteins as determined by cryo-EM and ask it to predict only protein structures that agree with the experimental data.
"Using this kind of information, we can reduce the computational costs," he said.
"You have to go in candidate-driven," Rappsilber said, likewise highlighting the usefulness of experimental data in combination with AlphaFold.
In their MSB study, Rappsilber and his coauthors started by using crosslinking mass spectrometry in whole B. subtilis cells to identify potential protein-protein interactions. They identified a total of 560 PPIs, 384 of which had not been previously detected. They followed this up with co-fractionation mass spec experiments, which identified 667 candidate PPIs, resulting in a total of 878 candidate PPIs generated by the two methods.
The researchers then downloaded known high-quality PPIs from the B. subtilis database SubtiWiki and combined them with their experimentally derived PPIs to create a set of 2,032 candidate PPIs that they submitted to AlphaFold-Multimer. The software was able to generate high-quality structural models for 114 of these interactions.
AlphaFold-Multimer was also able to predict high-quality structures for 14 trimeric protein complexes, indicating its potential for moving beyond binary PPIs.
Predicting protein complexes consisting of multiple proteins or other molecules remains a difficult challenge, noted Kastritis, who was not involved in the MSB study. A major issue with such larger complexes, he said, has been that minor inaccuracies in protein structure predictions can propagate throughout the broader complex, leading to larger inaccuracies.
Looking ahead, Rappsilber said he sees three main routes — all of which are currently being pursued — by which AlphaFold and other AI-based tools will grow more useful for large-scale PPI and protein complex work.
The first, which he described as the "brute force" approach, is to simply leverage the continuing improvements in computing power.
"Wait a bit and your smartwatch will be able to do it," he joked.
The second path is the ongoing development of faster and more efficient computational strategies for doing such work, such as the FoldDock pipeline noted above.
The third is more effective integration of experimental data to aid AI-based predictions. This is where much of his lab's effort is focused, Rappsilber said, pointing to a recent paper from his lab detailing a version of AlphaFold2 called AlphaLink that incorporates experimental data, such as crosslinking mass spec results, that can provide the software with information on the distance between certain amino acid residues.
With crosslinking data "we could get structures for challenging targets where AlphaFold alone failed," he said.