A Machine Learning Tool Integrating Structural and Evolutionary Context into Protein Language Model Predictions of Norovirus Mutation

Sebastian Bowyer (London School of Hygiene and Tropical Medicine, UK)

12:10 - 12:20 Thursday 16 April Morning


Abstract

The complexity and rapid evolution of RNA viruses make it challenging to identify, in advance, variants with epidemic or pandemic potential. For noroviruses, a leading cause of acute gastroenteritis worldwide, this challenge is compounded by the historic lack of in vitro culture systems, limited genomic data, gaps in global surveillance, and substantial evolutionary diversity. To improve our capacity to study norovirus evolution, we have used protein language models and their predictive capabilities to better understand the evolutionary pathways followed by human noroviruses and to investigate the mechanisms that govern the emergence of new variants. Using sequence data and the ESM-2 protein language model, we identified biophysically plausible amino acid substitutions in the VP1 capsid protein of the GII.4 human norovirus genotype. Building on these predictions, we are developing a machine-learning metapredictor that integrates sequence-based likelihoods from ESM-2 with structural context derived from in silico deep mutational scanning and with evolutionary constraints inferred from comparative analyses. This framework is designed to estimate the context-dependent feasibility and functional impact of mutations within VP1. By combining these complementary perspectives, the model aims to highlight the substitutions most likely to influence viral fitness, antigenicity, and variant emergence, providing a new computational tool for anticipating the evolutionary trajectories of human noroviruses.
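As a rough illustration of how a metapredictor of this kind might combine its inputs, the sketch below scores a substitution with a masked-marginal log-likelihood ratio (log p(mutant) minus log p(wild type) at the masked position, a common way of scoring variants with ESM-style models) and then feeds that value, together with a structural stability term and a conservation term, into a simple logistic combination. This is a minimal sketch under stated assumptions, not the authors' model: the per-position probabilities, the feature names (`ddg`, `conservation`), the helper names, and the weights are all hypothetical placeholders, and a real implementation would obtain the probabilities from ESM-2 and learn the weights from data.

```python
import math
from dataclasses import dataclass


@dataclass
class Substitution:
    """A single amino acid change (hypothetical example: N at position 10 to D)."""
    position: int
    wt: str
    mut: str


def llr_score(position_probs: dict, sub: Substitution) -> float:
    """Masked-marginal log-likelihood ratio: log p(mut) - log p(wt).

    `position_probs` stands in for a language model's predicted amino acid
    distribution at the masked position; values used below are made up.
    """
    return math.log(position_probs[sub.mut]) - math.log(position_probs[sub.wt])


def metapredictor_score(llr: float, ddg: float, conservation: float,
                        weights=(1.0, -0.5, -2.0), bias: float = 0.0) -> float:
    """Logistic combination of three features into a 0-1 plausibility score.

    Hypothetical features:
      llr          -- sequence likelihood term (higher = more plausible)
      ddg          -- predicted destabilisation in kcal/mol (higher = worse)
      conservation -- fraction of aligned sequences sharing the wild type
    The weights and bias are illustrative placeholders, not trained values.
    """
    w_llr, w_ddg, w_cons = weights
    z = bias + w_llr * llr + w_ddg * ddg + w_cons * conservation
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # Made-up distribution at one masked position.
    probs = {"N": 0.40, "D": 0.20, "K": 0.05}
    sub = Substitution(position=10, wt="N", mut="D")
    llr = llr_score(probs, sub)  # equals -log(2), since p(D) is half of p(N)
    score = metapredictor_score(llr, ddg=0.3, conservation=0.2)
    print(f"LLR={llr:.3f}  combined score={score:.3f}")
```

A logistic combination is used here only because it keeps each feature's contribution readable; a tree-based or other learned model could equally serve as the metapredictor.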
