Analyses Visualisation

Objectives

Generating a genome map

Finding secondary metabolites related to toxin production

Building a phylogenetic tree

1. Genome map

GenoVi generates customizable circular genome maps from draft or complete prokaryotic genomes. It is worth noting that, although we will not use this functionality in the workshop, GenoVi can also annotate and parse the functional categories present in the genome and plot them onto the map.

1. Create a new directory and fetch data

We will use GenoVi on a single genome (genome_07), using the corresponding GenBank-formatted file that Prokka generated.

mkdir -p 8.analyses/genome_map

cp 7.genome_annotation/gene_prediction/genome_07.gbk 8.analyses/genome_map/.
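
Before running GenoVi, it can help to confirm that the copied GenBank file is present and to see how many sequence records (contigs) it holds. The following is a minimal sketch, not part of the original workflow: the helper name count_gbk_records is our own, and it assumes that each record begins with a LOCUS line, as in standard GenBank format.

```shell
# Hypothetical helper: count the sequence records in a GenBank file.
# Assumption: every record starts with a line beginning with "LOCUS".
count_gbk_records() {
  grep -c '^LOCUS' "$1"
}

gbk=8.analyses/genome_map/genome_07.gbk
if [ -s "$gbk" ]; then
  echo "$gbk: $(count_gbk_records "$gbk") record(s)"
else
  echo "Missing or empty: $gbk" >&2
fi
```

For a draft genome, the count should correspond to the number of contigs in the annotated assembly.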

2. Run GenoVi in the terminal

cd 8.analyses/genome_map/

module purge
module load GenoVi/0.2.16-Miniconda3

genovi -i genome_07.gbk -s draft -w 1000 -cu -cs paradise -bc white --size -te -t 'genome_07'

Explanation of GenoVi parameters
-i	input file (.gbk or .gff format)
-s	genome status, i.e. “draft” or “complete”
-w	minimum sequence length, in base pairs, required for a sequence to be included in the GC analysis
-cu	do not classify each coding sequence into Clusters of Orthologous Groups of proteins (COGs)
-cs	colour scheme
-bc	background color
--size	displays the genome size value on each map
-te	adds text legend to the different tracks
-t	figure title

Output files

Inside the genovi directory created, we can find:

- PNG and SVG files of the genome map.
- Gral_Stats.csv, containing statistics on the number of CDSs, tRNAs, etc., per contig and in total.

2. Biosynthetic Gene Cluster

Some of the genomes we are working with are from toxic strains of the cyanobacterial genus Microcoleus. They produce anatoxins, including anatoxin-a (otherwise known as Very Fast Death Factor), as secondary metabolites. We can find secondary metabolite Biosynthetic Gene Clusters (BGCs) using the annotation tool antiSMASH, either via the online server or via the command line.

1. Create an output directory and copy in the predicted gene sequence files

mkdir -p 8.analyses/BGC

cp 7.genome_annotation/gene_prediction/*.genes.fna 8.analyses/BGC/.

2. Run antiSMASH

Open a new Slurm script.

nano secondary_metabolites.sh

And copy in the following:

#!/bin/bash -e

#SBATCH --account       nesi02659
#SBATCH --job-name      antismash
#SBATCH --time          0:30:00
#SBATCH --mem           5GB
#SBATCH --array         0-9
#SBATCH --cpus-per-task 12
#SBATCH --error         slurm_antismash_%A-%a.err
#SBATCH --output        slurm_antismash_%A-%a.out
#SBATCH --partition     milan

module purge >/dev/null 2>&1  
module load antiSMASH/6.0.1-gimkl-2020a-Python-3.8.2

declare -a array=("01" "02" "03" "04" "05" "06" "07" "08" "09" "10") 

cd ~/mgsr

srun antismash --cb-subclusters --cb-knownclusters --cb-general --smcog-trees -c 12 --taxon bacteria --asf \
  --genefinding-tool prodigal \
  5.assembly_evaluation/all_assembled_genomes/scaffolds_${array[$SLURM_ARRAY_TASK_ID]}.m1000.fasta  \
  --output-dir 8.analyses/BGC/${array[$SLURM_ARRAY_TASK_ID]}
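
The array directive launches ten tasks, and each task uses SLURM_ARRAY_TASK_ID (0 to 9) to pick one genome from the list. The mapping can be previewed in the terminal without submitting anything. This is a rough sketch only: the helper name preview_task is our own, and it assumes the sample IDs are the consecutive, zero-padded numbers 01 to 10 listed in the script, so that task N corresponds to ID N+1.

```shell
# Hypothetical helper: show which input and output a given array task would use.
# Assumption: IDs are consecutive, so task N corresponds to zero-padded ID N+1,
# which gives the same value as ${array[$SLURM_ARRAY_TASK_ID]} in the script.
preview_task() {
  id=$(printf '%02d' $(( $1 + 1 )))
  echo "task $1: 5.assembly_evaluation/all_assembled_genomes/scaffolds_${id}.m1000.fasta -> 8.analyses/BGC/${id}"
}

# Walk through the same range as "--array 0-9".
task=0
while [ "$task" -le 9 ]; do
  preview_task "$task"
  task=$((task + 1))
done
```

Because array indices start at 0, the first task processes genome 01 and the last task (9) processes genome 10.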

Explanation of antiSMASH parameters
--cb-subclusters	compare identified clusters against known subclusters responsible for synthesising precursors
--cb-knownclusters	compare identified clusters against known gene clusters from the MIBiG database
--cb-general	compare identified clusters against a database of antiSMASH-predicted clusters
--smcog-trees	generate phylogenetic trees of secondary metabolite cluster orthologous groups
-c	number of CPUs
--taxon	taxonomic classification of input sequence
--asf	run active site finder analysis
--genefinding-tool	specify algorithm used for gene finding

Output files

- index.html: web-browser-based summary
- .gbk files: one is created for each biosynthetic gene cluster identified
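
Once the jobs have finished, a quick tally of the per-cluster GenBank files gives a first impression of how many BGCs were found in each genome. This is a sketch under stated assumptions: the helper name count_bgc_regions is our own, and it assumes the per-cluster files follow a naming pattern like "<record>.region001.gbk", which should be checked against the actual output.

```shell
# Hypothetical helper: count per-cluster GenBank files in one antiSMASH output directory.
# Assumption: these files are named like "<record>.region001.gbk".
count_bgc_regions() {
  find "$1" -maxdepth 1 -name '*.region*.gbk' | wc -l | tr -d ' '
}

# Run from the directory that contains 8.analyses/.
for dir in 8.analyses/BGC/*/; do
  [ -d "$dir" ] || continue   # skip if no output directories exist yet
  echo "$dir $(count_bgc_regions "$dir")"
done
```

The index.html file in each directory remains the place to look for the cluster types and their comparison against known clusters.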

3. Phylogenetic genome-wide tree

Let’s make a quick and easy phylogenetic tree of the 10 isolates by running FastTree on the core gene alignment generated by GTDB-Tk. FastTree infers approximately-maximum-likelihood phylogenetic trees from alignments of nucleotide or protein sequences.

1. Make a new directory

mkdir -p 8.analyses/phylogenetic_tree

2. Fetch the core gene alignment from GTDB-Tk

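Decompress gtdbtk.bac120.user_msa.fasta.gz into the new directory. The -c flag writes the decompressed data to standard output so that it can be redirected, leaving the original .gz file in place.

gzip -dc 7.genome_annotation/taxonomy/align/gtdbtk.bac120.user_msa.fasta.gz > 8.analyses/phylogenetic_tree/gtdbtk.bac120.user_msa.fasta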

cd 8.analyses/phylogenetic_tree/
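
Before building the tree, it is worth checking that the alignment was decompressed properly. The sketch below counts the sequences and reports the distinct sequence lengths. The helper names are our own, and the expectation of 10 sequences assumes that all ten isolates were placed in the bacterial alignment by GTDB-Tk.

```shell
# Hypothetical helper: count the sequences in a FASTA file.
count_fasta_records() {
  grep -c '^>' "$1"
}

# Hypothetical helper: print the distinct sequence lengths found in the file.
# A valid alignment should yield a single value.
alignment_widths() {
  awk '/^>/ { if (seq != "") print length(seq); seq = ""; next }
       { seq = seq $0 }
       END { if (seq != "") print length(seq) }' "$1" | sort -u
}

msa=gtdbtk.bac120.user_msa.fasta
if [ -s "$msa" ]; then
  echo "$msa: $(count_fasta_records "$msa") sequences (10 expected)"
  echo "distinct sequence lengths: $(alignment_widths "$msa")"
else
  echo "Missing or empty: $msa" >&2
fi
```

An empty file here usually means the decompression step wrote nothing to the redirected output.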

3. Build the phylogenetic tree

In the terminal, load the FastTree module then run it.

module load FastTree/2.1.11-GCCcore-9.2.0

FastTree gtdbtk.bac120.user_msa.fasta > tree_file.txt
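
FastTree writes the tree in Newick format. A simple sanity check is to count the tips: in a Newick string, the number of tips equals the number of commas plus one. The helper below is a minimal sketch with a name of our own choosing, and it assumes that no tip label contains a comma.

```shell
# Hypothetical helper: count tips in a Newick tree as (number of commas + 1).
# Assumption: tip labels contain no commas.
count_newick_tips() {
  commas=$(tr -cd ',' < "$1" | wc -c | tr -d ' ')
  echo $((commas + 1))
}

tree=tree_file.txt
if [ -s "$tree" ]; then
  echo "$tree: $(count_newick_tips "$tree") tips"
else
  echo "Missing or empty: $tree" >&2
fi
```

The tip count should match the number of sequences in the alignment.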

We will move to RStudio now to visualise the tree.