Aathreya Kadambi L&S Math & Physical Sciences
Use of Coevolutionary Information in AlphaFold2
In this project, I will use a generalized Potts model, a statistical-physics model that captures pairwise residue interactions in protein sequences, to determine the degree to which AlphaFold2’s implicit energy function relies on coupling information between different positions in a multiple sequence alignment (MSA). AlphaFold2 (AF2), a neural network for protein structure prediction, has been remarkably successful, but its underlying learned representation remains a black box. If AF2 implicitly learns something akin to a Potts model, this would suggest that its predictions rest largely on pairwise coevolutionary couplings; if its behavior departs from a Potts model, this would provide evidence that its neural network has learned certain structural constraints beyond what is captured by a traditional Potts model.
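To make the idea of pairwise residue interactions concrete, here is a minimal sketch of how a Potts model scores a sequence. It is an illustration only, not the project's implementation. It assumes the common convention E(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j), where lower energy means a more favorable sequence. The names `potts_energy`, `fields`, and `couplings`, the 21-letter alphabet, and the toy parameter values are all hypothetical.

```python
from itertools import combinations

# Assumed encoding: 20 amino acids plus a gap character.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"
INDEX = {aa: k for k, aa in enumerate(ALPHABET)}


def potts_energy(seq, fields, couplings):
    """Return E(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j).

    fields[i][a] is the single-site term h_i(a).
    couplings[(i, j)][a][b] is the pairwise term J_ij(a, b) for i < j.
    Pairs missing from `couplings` are treated as uncoupled (J = 0).
    """
    states = [INDEX[aa] for aa in seq]
    energy = 0.0
    # Single-site contributions.
    for i, a in enumerate(states):
        energy -= fields[i][a]
    # Pairwise contributions over all position pairs i < j.
    for i, j in combinations(range(len(states)), 2):
        J = couplings.get((i, j))
        if J is not None:
            energy -= J[states[i]][states[j]]
    return energy


# Toy parameters, made up for illustration: 3 positions, zero fields, and a
# single coupling that favors A at position 0 together with C at position 2.
q = len(ALPHABET)
fields = [[0.0] * q for _ in range(3)]
J02 = [[0.0] * q for _ in range(q)]
J02[INDEX["A"]][INDEX["C"]] = 1.5
couplings = {(0, 2): J02}

print(potts_energy("AGC", fields, couplings))  # favored pair present: lower energy
print(potts_energy("AGD", fields, couplings))  # favored pair absent: no coupling bonus
```

In practice the fields and couplings would be fit to an MSA rather than set by hand. The coupling terms are the part that encodes coevolution between positions.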
This project will provide substantial insight into how AF2 uses MSA information and whether its learned representations resemble well-established models such as Potts models. If successful, this work will shed light on the role of evolutionary data in protein structure prediction models and guide the development and interpretation of ML-based protein models in the future.
Message To Sponsor
Thank you for funding the SURF program! I really appreciate the opportunity to work on better understanding AlphaFold2's use of MSA information. You have given me the opportunity to work on a problem where I can apply interesting mathematical ideas, and this experience will be a great step towards my future research projects. I am grateful, and I will do my best to conduct some awesome research this summer!