CoEVFold suite: user-friendly pipelines to visually represent protein coevolution in bacteria

Chris Graham (University of Warwick, UK)

09:45 - 10:00 Thursday 16 April Morning


Abstract

Multiple sequence alignment (MSA) data underpin current approaches to protein folding and protein-protein interaction prediction; from these data, large language models (LLMs), in tandem with protein datasets, can predict protein structure. However, researchers lack user-friendly tools to predict and display coevolution, the principal signal that these MSAs capture. Here we present CoEVFold, a pipeline that identifies and visualizes coevolution using basic direct-coupling algorithms derived from GREMLIN, with MMseqs2 generating the multiple sequence alignments that the algorithm requires. The pipeline generates a visual representation of coevolution within a single protein, and can also represent coevolution in homomeric or heteromeric protein complexes and across protein networks. The input can be an amino acid sequence or a user-supplied protein structure from AlphaFold or the Protein Data Bank (PDB). To validate CoEVFold, we applied it to proteins from well-characterised prokaryotic and eukaryotic model organisms (Bacillus subtilis, Escherichia coli and Saccharomyces cerevisiae) and to phage proteins. CoEVFold predicts coevolution between proteins known to interact, within proteins known to oligomerise, and among proteins known to belong to the same complex. Collectively, our data suggest that this suite of tools, named ‘CoEVFold suite’, has broad applicability, making it a potentially essential toolkit for anyone studying protein interactions and networks.
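As a rough illustration of the signal such a pipeline looks for, namely positions whose substitutions are correlated across the sequences of an MSA, here is a minimal sketch. It is not CoEVFold and not GREMLIN: GREMLIN fits a Potts model by pseudolikelihood to separate direct couplings from indirect (transitive) correlations, whereas this sketch swaps in a simpler, older score, mutual information (MI) between alignment columns with the average product correction (APC). The function names are hypothetical, and a real pipeline would also down-weight redundant sequences and add pseudocounts.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(msa, i, j):
    """Mutual information (in nats) between alignment columns i and j."""
    n = len(msa)
    col_i = [seq[i] for seq in msa]
    col_j = [seq[j] for seq in msa]
    freq_i = Counter(col_i)
    freq_j = Counter(col_j)
    freq_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((freq_i[a] / n) * (freq_j[b] / n)))
    return mi


def apc_scores(msa):
    """MI for every column pair, minus the average product correction (APC).

    Simplified stand-in for a direct-coupling score: no sequence weighting,
    no pseudocounts, and gaps ('-') are treated as just another symbol.
    """
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")
    mi = {pair: mutual_information(msa, *pair)
          for pair in combinations(range(length), 2)}
    # Mean MI of each column with all of its (length - 1) partners.
    col_mean = [
        sum(v for pair, v in mi.items() if k in pair) / (length - 1)
        for k in range(length)
    ]
    overall = sum(mi.values()) / len(mi)
    if overall == 0:
        return mi  # no covariation anywhere, so nothing to correct
    # APC removes background signal shared by high-entropy columns.
    return {(i, j): v - col_mean[i] * col_mean[j] / overall
            for (i, j), v in mi.items()}


def ranked_pairs(msa):
    """Column pairs sorted from strongest to weakest coevolution score."""
    return sorted(apc_scores(msa).items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy alignment: column 0 (A/R) and column 3 (E/K) covary perfectly,
    # while columns 1 and 2 vary independently of everything else.
    toy_msa = ["AKLE", "AVME", "RKMK", "RVLK"]
    top_pair, score = ranked_pairs(toy_msa)[0]
    print(top_pair, round(score, 3))  # (0, 3) 0.231
```

In the general coevolution approach to complexes, the same kind of score is computed on a paired alignment in which each row concatenates homologues of both partners from the same genome, so that high-scoring column pairs spanning the two proteins suggest an interface; whether CoEVFold builds its complex alignments this way is not stated here.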
