AI-driven clustering of long-read 16S data reveals microbial dark-matter hotspots in aquaponics microbiomes

Jacques Olivier (University College Dublin, Ireland)

12:40 - 12:50 Thursday 16 April Morning


Abstract

Recirculating aquaponics systems comprise fish tanks, clarifiers, degassing units, run-off lines and filters, each harbouring a distinct microbiome shaped by gradients in nutrients, particles and oxygen. These communities are usually profiled with reference-based 16S workflows that assign every read to the closest taxon in a curated database, and the small residue of unclassified reads is often taken to be microbial “dark matter.” To reassess microbial dark matter in an operational aquaponics facility, we combined a conventional k-mer classifier (Kraken2/Bracken, NCBI 16S) with an AI-enabled, unsupervised learning framework (NaMeco; UMAP + HDBSCAN) that learns the natural structure of sequence space directly from long-read data. NaMeco builds error-corrected consensus sequences for each cluster and assigns taxonomy against GTDB, exposing cluster-level abundance and similarity to known taxa. Across all compartments, NaMeco assigned ~50–65% of reads to near-reference GTDB taxa (≥99% identity) and ~30–40% to intra-genus variants (95–99% identity). Clusters with <95% identity, which represent AI-discovered microbial dark matter, comprised only ~6% of reads in the tank, clarifier and degassing zones but rose to 14.5% in the particulate filter and 19.9% in run-off sludge, pinpointing solids- and biofilm-associated hotspots. Compared with Bracken, NaMeco reduced apparent genus richness by 126 genera per sample while maintaining or enhancing Shannon diversity. AI-driven unsupervised learning thus converts unclassified reads into explicit, ecologically structured guilds, enabling data-driven discovery of hidden diversity and function in engineered microbiomes.
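
The identity tiers and the Shannon index mentioned above can be illustrated with a minimal sketch. It is not NaMeco's code: the function names, the input layout (read count plus percent identity of each cluster consensus to its best GTDB hit) and the toy numbers are all assumptions made for illustration, and the handling of the exact 95% and 99% boundaries is a guess based on the thresholds quoted in the abstract.

```python
import math
from collections import defaultdict


def identity_tier(identity_pct):
    """Map a cluster's best-hit identity (%) to one of three tiers.

    Thresholds follow the abstract; boundary handling is an assumption.
    """
    if identity_pct >= 99.0:
        return "near_reference"   # >=99% identity
    if identity_pct >= 95.0:
        return "intra_genus"      # 95-99% identity
    return "dark_matter"          # <95% identity


def tier_fractions(clusters):
    """Fraction of reads per tier.

    clusters: iterable of (read_count, identity_pct) pairs (hypothetical layout).
    """
    totals = defaultdict(int)
    for reads, identity in clusters:
        totals[identity_tier(identity)] += reads
    n = sum(totals.values())
    return {tier: count / n for tier, count in totals.items()}


def shannon(counts):
    """Shannon diversity H' = -sum(p * ln p) over non-zero counts."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


# Toy sample, invented for illustration only (not data from the study).
toy_clusters = [(600, 99.6), (300, 97.2), (100, 91.0)]
fractions = tier_fractions(toy_clusters)
diversity = shannon([reads for reads, _ in toy_clusters])
```

Applied per compartment, a tally of this kind would yield the read fraction falling below 95% identity, which is the quantity the abstract reports as rising in the particulate filter and run-off sludge.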
