ANI old method won't do: objective benchmarking of ANI/OGRI tool performance

Leighton Pritchard (University of Strathclyde, UK)

14:45 - 15:00 Tuesday 14 April Afternoon

Abstract

Overall genome relatedness index (OGRI) methods such as digital DNA-DNA hybridisation (dDDH) and average nucleotide identity (ANI), and genome identity estimates from k-mer-based approaches such as Mash, FastANI, and sourmash, are central to the taxonomic assignment and classification of microbes in modern microbiology. Almost all OGRI methods claim, or imply, that they measure or estimate pairwise genome sequence identity. However, while the relative performance of these methods on real genomes has been compared many times, OGRI methods have not previously been comprehensively and objectively benchmarked against ground truth genome sequences for which pairwise distances are known exactly. To establish such a benchmark, we used pyani-plus to construct sets of synthetic genomes with known substitution and rearrangement histories, for which precise pairwise sequence identities could be calculated. Using pyani-plus, we analyse the performance of multiple ANI/OGRI methods against these synthetic datasets. We find that all methods tested display systematic biases that cause estimated sequence identity to depart from the true underlying pairwise identity. The methods can nevertheless be ranked by accuracy, taking into account sensitivity to structural rearrangement. This ranking enables principled recommendations of good practice when estimating ANI for microbial genomes, and a clear choice of ANI methodology that outperforms competitors.
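
As an illustration of what "ground truth" means here, the following minimal Python sketch builds a synthetic genome pair with a known number of substitutions and computes the exact pairwise identity. It is not the pyani-plus implementation, which the abstract does not describe. The function names (`random_genome`, `substitute`, `true_identity`), the genome length, and the substitution count are all hypothetical, and the sketch covers substitutions only, not rearrangements.

```python
import random

BASES = "ACGT"


def random_genome(length, rng):
    """Return a random nucleotide sequence (hypothetical helper)."""
    return "".join(rng.choice(BASES) for _ in range(length))


def substitute(genome, n_subs, rng):
    """Return a copy with exactly n_subs distinct positions changed.

    Each chosen position is replaced by a different base, so every
    substitution is guaranteed to produce a mismatch.
    """
    seq = list(genome)
    for pos in rng.sample(range(len(seq)), n_subs):
        seq[pos] = rng.choice([b for b in BASES if b != seq[pos]])
    return "".join(seq)


def true_identity(a, b):
    """Exact fraction of matching positions for equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Assumed parameters for illustration only.
rng = random.Random(42)
ancestor = random_genome(10_000, rng)
derived = substitute(ancestor, 500, rng)

identity = true_identity(ancestor, derived)
print(f"{identity:.4f}")  # 0.9500, since exactly 500 of 10,000 sites differ
```

Because the substitution history is recorded by construction, an ANI/OGRI tool's estimate for `ancestor` versus `derived` can be compared directly against the known value rather than against another tool's estimate.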
