Behind the Scenes of machine learning and Pseudomonas aeruginosa

Posted on April 28, 2023 by Jesse Shapiro and Dao Nguyen

Jesse Shapiro and Dao Nguyen take us behind the scenes of their latest publication 'Single nucleotide variants in Pseudomonas aeruginosa populations from sputum correlate with baseline lung function and predict disease progression in individuals with cystic fibrosis', published in Microbial Genomics.

What is your name, job title and institution/company?

Jesse Shapiro is an associate professor of Microbiology and Immunology at McGill University, Canada, located at the McGill Genome Centre. Dao Nguyen is also an associate professor of Medicine at McGill University, Canada, located at the Research Institute of the McGill University Health Centre.

How did this collaboration come about?

As a clinician-scientist, Dao has been working for some time with cystic fibrosis (CF) patients fighting Pseudomonas aeruginosa and other lung infections. Her lab developed an AmpliSeq panel to measure genetic variation of Pseudomonas within each patient’s lung infection without the need to culture and sequence individual bacterial colonies. Jannik Donner, a postdoc in her group, had validated and benchmarked the method. With Pseudomonas sequences from a cystic fibrosis cohort in hand, she was looking for creative ways to analyse the data and struck up a conversation with Jesse. A postdoc in Jesse’s lab, Morteza (Masih) Saber, had been experimenting with using machine learning to predict patient outcomes from bacterial sequence data, and we decided to take this approach.

Tell us a little bit about your research and why it is important.

It is hard to predict the severity and outcome of Pseudomonas infections in CF patients. Clinical features like body mass index and the bacterial species present in the lung microbiome can provide some clues, but are currently not sufficient to make a good prognosis. In our paper, we show that genetic diversity within the infecting Pseudomonas population – including mutations that could affect virulence – provides a valuable indicator of lung function and patient outcomes.

Tell us about your most recent paper.

We used a small cohort of CF patients from the Calgary biobank and asked if we could infer their lung function at the time of sampling – and predict their decline in lung function five years later – based on the Pseudomonas AmpliSeq data. We trained different machine learning algorithms by showing them examples of AmpliSeq data from severe or mild cases of lung impairment, and then seeing how well the algorithms could make predictions on new data. The algorithms were quite good at separating mild from moderate/severe lung function impairment at baseline, but it was more difficult to predict the decline in lung function five years into the future. The future predictions were still better than random, but might be improved by training on more data or expanding the AmpliSeq panel to cover more of the genome. Although other patient factors like age, body mass index, and the relative abundance of Pseudomonas within the lung microbiome do contain some information, we found that most of the predictive value comes from the AmpliSeq data.
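
For readers curious about what "training on examples and testing on new data" can look like in practice, here is a minimal sketch in Python. It is not the pipeline used in the paper: the data are entirely synthetic, the nearest-centroid classifier is a deliberately simple stand-in for the algorithms we tried, and every name in it (`make_synthetic_cohort`, `leave_one_out_accuracy`, the choice of 40 patients and 20 variant sites) is a hypothetical chosen for illustration. Each patient is held out in turn, the classifier is trained on the others, and its prediction for the held-out patient is checked.

```python
import random


def make_synthetic_cohort(n_patients=40, n_sites=20, seed=0):
    """Build a purely synthetic cohort (an assumption, not real AmpliSeq data).

    Each row holds variant frequencies at n_sites sites for one patient.
    Label 1 stands for moderate/severe impairment, label 0 for mild.
    """
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_patients):
        label = i % 2
        row = []
        for site in range(n_sites):
            # Illustrative assumption: only the first 5 sites carry signal.
            mean = 0.6 if (label == 1 and site < 5) else 0.3
            freq = rng.gauss(mean, 0.15)
            row.append(min(1.0, max(0.0, freq)))  # keep frequencies in [0, 1]
        X.append(row)
        y.append(label)
    return X, y


def centroid(rows):
    """Mean variant-frequency profile of a group of patients."""
    return [sum(column) / len(rows) for column in zip(*rows)]


def squared_distance(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def predict(train_X, train_y, x):
    """Nearest-centroid rule: pick the class whose mean profile is closest to x."""
    centroids = {
        label: centroid([row for row, lab in zip(train_X, train_y) if lab == label])
        for label in set(train_y)
    }
    return min(centroids, key=lambda label: squared_distance(centroids[label], x))


def leave_one_out_accuracy(X, y):
    """Hold out each patient once, train on the rest, and score the prediction."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if predict(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


X, y = make_synthetic_cohort()
accuracy = leave_one_out_accuracy(X, y)
print(f"Leave-one-out accuracy on synthetic data: {accuracy:.2f}")
```

The important design choice is that the held-out patient never contributes to training, so the accuracy reflects performance on data the classifier has not seen. With a small cohort, that separation matters more than the choice of algorithm.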

What do you hope the future implications of this research will be?

The AmpliSeq method still needs more research and development, but we hope it could provide the basis for a new kind of prognostic tool for CF patients. The current standard of care is that clinicians can detect the presence or absence of Pseudomonas in the respiratory secretions of CF patients by culture. A prognostic tool that could predict the clinical outcome of patients with Pseudomonas infections would help identify those who are at greater risk of deterioration. It is then plausible that such individuals would benefit from specific interventions.

On a more fundamental level, our results suggest that much of the variation across patients in their lung disease progression can be explained by evolution or strain diversity within the infecting pathogen population. Some of the mutations we identified as important predictors of lung disease could be targeted for further experimentation to understand how they harm the patient and under what conditions they evolve.

What do you enjoy most about your work?

Jesse: For me, it’s working with a diverse set of students and collaborators. As someone who is focused more on genomic methods and evolutionary concepts, I don’t have a particular model system – but I love collaborating with people like Dao who have deep expertise in a particular species or disease. I learn so much, and it’s incredibly humbling. It’s also very rewarding to work both on very fundamental questions in ecology and evolution, and also on more translational projects like this one.

Dao: Although my lab has many projects specifically focused on Pseudomonas and respiratory infections, my interests are quite broad. I truly enjoy working across disciplines with collaborators like Jesse, who view biology through a very different lens and have expertise in completely different methods. This is where I can bring my knowledge on bacterial and infection model systems and clinical medicine to projects where we think about bacterial genetic variation with Jesse, or discover new diagnostic and antimicrobial material with engineers. I am also privileged to combine research with clinical medicine. Taking care of patients and working with clinical colleagues are constant reminders of the “real-world” problems and needs.

You can read Jesse and Dao's work 'Single nucleotide variants in Pseudomonas aeruginosa populations from sputum correlate with baseline lung function and predict disease progression in individuals with cystic fibrosis', published in Microbial Genomics.