1. Introduction to pangenome graphs

What is a pangenome?

A pangenome is defined as the comprehensive collection of whole-genome sequences from multiple individuals within a clade, a population or a species. This collective genomic dataset can be further divided into two distinct components: the core genome, which includes regions present in all individuals at the time of analysis, and the accessory genome, consisting of regions only found in a subset of individuals.
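The core/accessory split can be illustrated with plain set operations. This is a minimal sketch, assuming each genome has already been reduced to a set of region identifiers; the isolate names and region IDs below are made up for illustration.

```python
def split_core_accessory(presence):
    """Split regions into core and accessory sets.

    presence maps a genome name to the set of region IDs found in it.
    """
    genomes = list(presence.values())
    pangenome = set().union(*genomes)    # every region seen in any genome
    core = set.intersection(*genomes)    # regions present in all genomes
    accessory = pangenome - core         # regions missing from at least one genome
    return core, accessory


# Hypothetical presence/absence data for three isolates.
presence = {
    "isolate_A": {"r1", "r2", "r3"},
    "isolate_B": {"r1", "r2", "r4"},
    "isolate_C": {"r1", "r2", "r3", "r5"},
}

core, accessory = split_core_accessory(presence)
print(sorted(core))       # ['r1', 'r2']
print(sorted(accessory))  # ['r3', 'r4', 'r5']
```

Note that the core set can only shrink as more genomes are added, which is why the definition above says "at the time of analysis".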

What is a pangenome graph?

Pangenome graphs represent pangenomes using graph models, effectively capturing the complete genetic variation across the input genomes. These graphs consist of three components: nodes, edges, and paths.

Nodes

  • DNA segments, which can be any length

Edges

  • Describe the possible ways of walking through the nodes
  • Connect pairs of node strands
  • Can represent inversions

Paths

  • Paths are routes through the nodes of the graph
  • A path can represent a genome, a haplotype, or an allele/variant
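The three components can be sketched as a small data structure. The example below is a toy model, not the format used by any particular tool: node sequences, edge list, and path names are all invented for illustration. Each path step is a (node ID, strand) pair, and a step on the "-" strand reads the node as its reverse complement, which is how an inversion is represented. As a simplification, edges are only checked in the direction they are written.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# Nodes: DNA segments of any length.
nodes = {1: "ACG", 2: "T", 3: "G", 4: "TTA"}

# Edges: connect pairs of node strands, written as (node ID, strand).
edges = {
    ((1, "+"), (2, "+")),
    ((1, "+"), (3, "+")),
    ((2, "+"), (4, "+")),
    ((3, "+"), (4, "+")),
    ((1, "+"), (4, "-")),  # enters node 4 on its reverse strand (inversion)
}

# Paths: routes through the nodes, here one per (hypothetical) genome.
paths = {
    "genome_A": [(1, "+"), (2, "+"), (4, "+")],
    "genome_B": [(1, "+"), (3, "+"), (4, "+")],
    "genome_C": [(1, "+"), (4, "-")],
}


def spell_path(steps):
    """Reconstruct the sequence spelled by a path, checking each edge exists."""
    for a, b in zip(steps, steps[1:]):
        if (a, b) not in edges:
            raise ValueError(f"no edge from {a} to {b}")
    return "".join(
        nodes[node_id] if strand == "+" else reverse_complement(nodes[node_id])
        for node_id, strand in steps
    )


for name, steps in paths.items():
    print(name, spell_path(steps))
# genome_A ACGTTTA
# genome_B ACGGTTA
# genome_C ACGTAA
```

Here genome_A and genome_B differ only at the middle node (T versus G), so that pair of nodes behaves like a SNP, while genome_C traverses node 4 in the opposite orientation.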

Overview of a pangenome graph construction pipeline

Pangenome construction by Pangenome Graph Builder (PGGB)

PGGB is a reference-free pipeline. It aligns all input genomes against each other with wfmash (all-to-all whole-genome alignment), induces the graph from those alignments with seqwish, and then progressively normalizes the graph with smoothxg and gfaffix.
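A PGGB run is a single command that drives all of these steps. The sketch below only assembles and prints such a command from Python; it does not run it. The file name, output directory, and parameter values are placeholders, and the flags shown (-i input FASTA, -o output directory, -n number of haplotypes, -t threads, -p mapping percent identity, -s segment length) should be checked against `pggb --help` for the installed version.

```python
import shlex


def build_pggb_command(fasta, out_dir, n_haplotypes,
                       threads=4, pct_identity=90, segment_length="5k"):
    """Assemble a pggb command line as an argument list (values are placeholders)."""
    return [
        "pggb",
        "-i", fasta,                 # input FASTA containing all genomes
        "-o", out_dir,               # output directory
        "-n", str(n_haplotypes),     # number of haplotypes in the input
        "-t", str(threads),          # number of threads
        "-p", str(pct_identity),     # minimum mapping identity (percent)
        "-s", str(segment_length),   # segment length used for mapping
    ]


# Hypothetical input file and output directory.
cmd = build_pggb_command("genomes.fa.gz", "pggb_out", n_haplotypes=4)
print(shlex.join(cmd))
# pggb -i genomes.fa.gz -o pggb_out -n 4 -t 4 -p 90 -s 5k

# To actually run it (requires pggb to be installed):
# import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when file names contain spaces.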

Graph manipulation using ODGI and MultiQC report

  • The Optimized Dynamic Genome/graph Implementation (ODGI) is used for various graph manipulation tasks, including visualization.
  • MultiQC is used to generate a report, which includes statistics of the seqwish-induced graph, the final graph, and various visualizations of the final graph.

Obtaining distances for phylogenetic analysis

We use ODGI to extract pairwise distances between the paths in the graph. These distances can then be used for phylogenetic analysis.
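To give an intuition for what a distance between two paths means, the sketch below computes a Jaccard distance weighted by node length: paths that share most of their sequence are close, and paths that share little are far apart. This illustrates the idea only and is not claimed to reproduce ODGI's exact output; the node lengths and path contents are invented.

```python
from itertools import combinations

# Hypothetical graph: node ID -> node length in bp.
node_len = {1: 3, 2: 1, 3: 1, 4: 3}

# Hypothetical paths, reduced to the set of nodes each one visits.
path_nodes = {
    "genome_A": {1, 2, 4},
    "genome_B": {1, 3, 4},
    "genome_C": {1, 4},
}


def covered_bp(node_ids):
    """Total sequence length of a set of nodes."""
    return sum(node_len[n] for n in node_ids)


def jaccard_distance(a, b):
    """1 - (bp in nodes shared by both paths) / (bp in nodes used by either path)."""
    shared = covered_bp(a & b)
    total = covered_bp(a | b)
    return 1.0 - shared / total


for p, q in combinations(sorted(path_nodes), 2):
    print(p, q, round(jaccard_distance(path_nodes[p], path_nodes[q]), 3))
```

A full pairwise matrix of such distances is the usual input to tree-building methods such as neighbor joining.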

Variant calling

A pangenome graph built with PGGB captures many types of genetic variation at once: structural variants (SVs), rearrangements, and smaller variants such as single nucleotide polymorphisms (SNPs) and insertions/deletions (indels). These variants can be identified with vg deconstruct.
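Once variants have been exported as reference/alternate allele pairs, they can be sorted into the classes named above by allele length. The sketch below is a simplified illustration: the 50 bp cutoff for calling something a structural variant is a commonly used convention assumed here, not something defined by this pipeline, and rearrangements cannot be recognized from allele lengths alone.

```python
SV_MIN_LENGTH = 50  # assumption: common convention for the SV size cutoff


def classify_variant(ref, alt):
    """Classify a variant from its reference and alternate allele strings."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if max(len(ref), len(alt)) >= SV_MIN_LENGTH:
        return "SV"
    if len(ref) != len(alt):
        return "indel"
    return "other"  # e.g. a multi-nucleotide substitution of equal length


# Made-up example alleles.
examples = [
    ("A", "G"),             # single base change
    ("A", "ATT"),           # small insertion
    ("A", "A" + "C" * 60),  # large insertion
    ("AC", "GT"),           # equal-length substitution
]
for ref, alt in examples:
    print(len(ref), len(alt), classify_variant(ref, alt))
```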

NGS data analysis against the graph

The VG toolkit is used to analyze NGS data against the graph, including read mapping and variant calling.

Key components of this pipeline

  • Graph construction using the PGGB
  • Graph manipulation using ODGI
  • Variant calling for NGS data using the VG toolkit

Together, these components provide an efficient and integrated approach to pangenome analysis.