Offered talk: Benchmarking strain-level profiling of Escherichia coli in short-read gut metagenomes

Matthew Galbraith (University of Oxford, UK)

14:10 - 14:20 Tuesday 07 July Afternoon

Abstract

Introduction: Metagenomes offer the potential to characterise Escherichia coli strain-level diversity within the human gut microbiome, informing understanding of the genetic differences between infection and colonisation. However, this remains challenging because E. coli is present at low relative abundance and short-read sequencing has historically predominated. Despite the development of numerous reference-based tools, the best approach remains unclear, and an independent benchmark is lacking.

Methods: We benchmarked six published tools (StrainR2, Strainify, StrainScan, PanTax, PathoScope and StrainGE) for their ability to detect, and estimate the relative abundance of, co-existing E. coli strains. Metagenomes representing the healthy human gut microbiome were simulated using InSilicoSeq across multiple sequencing depths, with two strains (K12-MG1655 and Sakai) spiked in at varying abundances. Each tool was evaluated using multiple reference database compositions.

Results: PanTax, StrainGE and StrainScan showed the highest F1 scores (0.936, 0.978 and 0.909, respectively). PathoScope, StrainR2 and Strainify showed markedly lower F1 scores (0.131, 0.270 and 0.139, respectively), although performance improved when thresholds were applied to remove low-abundance detections. PanTax and StrainR2 had the most accurate relative abundance predictions (mean absolute proportional error = 0.06). When the true positive strains were removed from the reference database, relative abundance predictions diverged across tools: StrainGE collapsed accurately to phylogroup-level assignments, while PanTax showed the highest strain-level specificity (0.992).

Conclusions: Published metagenomic strain-level profilers vary in their ability to detect and quantify co-existing E. coli strains. Our benchmark identified PanTax as the most consistently high-performing tool across metrics. This work provides a framework for more accurate profiling and improved understanding of E. coli strain-level diversity in short-read gut metagenomes.
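For readers unfamiliar with the two headline metrics, the following is a minimal Python sketch of how an F1 score for strain detection and a mean absolute proportional error for relative abundance could be computed. It is illustrative only and is not taken from the benchmark itself. The function names, the strain label "IAI39" and the abundance values are hypothetical, and the abstract does not define "mean absolute proportional error", so the definition used here (the absolute difference between predicted and true abundance, divided by the true abundance, averaged over the spiked-in strains) is an assumption.

```python
def f1_score(true_strains, detected_strains):
    """F1 for strain detection: harmonic mean of precision and recall."""
    truth, detected = set(true_strains), set(detected_strains)
    tp = len(truth & detected)   # spiked-in strains that were reported
    fp = len(detected - truth)   # reported strains that were not spiked in
    fn = len(truth - detected)   # spiked-in strains that were missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mean_absolute_proportional_error(true_abund, pred_abund):
    """Assumed definition: mean of |predicted - true| / true over true strains.

    Strains missing from the predictions are treated as predicted at 0.
    """
    errors = [
        abs(pred_abund.get(strain, 0.0) - true) / true
        for strain, true in true_abund.items()
        if true > 0
    ]
    if not errors:
        raise ValueError("no strains with non-zero true abundance")
    return sum(errors) / len(errors)


# Hypothetical example: both spiked-in strains found, plus one false positive.
truth = {"K12-MG1655": 0.010, "Sakai": 0.020}
predicted = {"K12-MG1655": 0.011, "Sakai": 0.018, "IAI39": 0.001}

f1 = f1_score(truth.keys(), predicted.keys())
mape = mean_absolute_proportional_error(truth, predicted)
```

Under this definition a single false-positive call lowers F1 without affecting the abundance error, which is consistent with the abstract's observation that F1 improved once low-abundance detections were filtered out.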
