Abstract
The human bacterial pathogen Helicobacter pylori has a highly variable genome, with significant allelic and sequence diversity between isolates and even within well-characterised strains, which hampers comparative genomics of this species. In this study, pan-genome analysis was used to identify lineage-specific genes of H. pylori. A total of 346 H. pylori genomes spanning the hpAfrica1, hpAfrica2, hpAsia2, hpEurope, hspAmerind and hspEAsia multilocus sequence typing (MLST) lineages were searched for genes that were specifically overrepresented or underrepresented in MLST lineages or associated with the cag pathogenicity island (PAI). The only genes overrepresented in cag-positive genomes were the cag PAI genes themselves. In contrast, a total of 125 genes were either overrepresented or underrepresented in one or more MLST lineages. Of these 125 genes, those encoding alcohol/aldehyde-reducing enzymes linked with acid resistance and production of toxic aldehydes were overrepresented in African lineages. Conversely, the gene encoding the FecA2 ferric citrate receptor was missing from hspAmerind genomes but present in all other lineages. This work shows that pan-genome analysis can identify lineage-specific genes of H. pylori, facilitating further investigation to link the differential distribution of genes with disease outcome or virulence. The approach can also be applied to other microbial pathogens with highly variable genomes.