Although it is well known that many bacterial genomes are highly variable, it is nonetheless traditional to refer to, analyse, and publish a single fixed genome for each bacterial strain. In the process, inherent natural variation is artificially reduced (“only sequence from a single colony”), ignored (“just publish the consensus”), or placed in the “too-hard” basket (“analysis of raw read data is more robust”). Studies of bacterial genome evolution rarely survey the diversity already present under normal laboratory conditions, instead focusing on changes that occur over a timecourse or in response to specific environmental pressures.
Helicobacter pylori is intensively studied because it can cause ulcers and stomach cancer, and key experimental strains are used worldwide. Although H. pylori is well known for having an extremely plastic genome with high mutation and recombination rates, researchers rely heavily on single fixed reference genomes for these strains, and little is known about the degree of variation to expect in typical working stocks.
Here, I will discuss the variability seen in typical laboratory cultures of H. pylori strain SS1 and its parent strain PMSS1, as revealed by a combination of next-generation sequencing and traditional laboratory techniques. Within SS1 alone, the variation includes large inversions, nearly 50 SNPs at over 5% prevalence, movement of the transposon IS607, and dynamic copy-number variation of the cagA gene.
This work reveals that reliance on a single-colony genome or consensus assembly may be misleading, even at the level of a typical laboratory working stock.