Analyses Visualisation
Objectives
- Generating a genome map
- Finding secondary metabolites related to toxin production
- Building a phylogenetic tree
1. Genome map
GenoVi is a tool that generates customizable circular genome maps from draft or complete prokaryotic genomes. It is worth noting that, although we will not use that functionality in this workshop, GenoVi can also annotate and parse the functional categories present in the genome and plot them onto the map.
1. Create a new directory and fetch data
We will use GenoVi on a single genome (genome_07), using the corresponding GenBank-formatted file that Prokka generated.
mkdir -p 8.analyses/genome_map
cp 7.genome_annotation/gene_prediction/genome_07.gbk 8.analyses/genome_map/.
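Before running GenoVi, it can help to confirm that the copied GenBank file is present and holds the expected number of contigs. The sketch below is one way to do that; `count_gbk_records` is a hypothetical helper (not part of GenoVi or Prokka), and it relies only on the fact that each GenBank record starts with a `LOCUS` line. It assumes you are still in the project directory.

```shell
#!/bin/bash
# Hypothetical helper (not part of GenoVi or Prokka): count the records
# in a GenBank file. Each record (contig) starts with a "LOCUS" line.
count_gbk_records() {
    grep -c '^LOCUS' "$1"
}

# Path taken from the cp command above, relative to the project directory.
gbk=8.analyses/genome_map/genome_07.gbk
if [ -s "$gbk" ]; then
    echo "$gbk contains $(count_gbk_records "$gbk") record(s)"
else
    echo "$gbk is missing or empty" >&2
fi
```

For a draft genome the count should match the number of contigs in the assembly.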
2. Run GenoVi in the terminal
cd 8.analyses/genome_map/
module purge
module load GenoVi/0.2.16-Miniconda3
genovi -i genome_07.gbk -s draft -w 1000 -cu -cs paradise -bc white --size -te -t 'genome_07'
Explanation of GenoVi parameters
Parameter | Description |
---|---|
-i | input file (.gbk or .gff format) |
-s | genome status, i.e. "draft" or "complete" |
-w | minimum sequence length, in base pairs, for a GC analysis to be assigned |
-cu | do not classify each coding sequence into Clusters of Orthologous Groups of proteins (COGs) |
-cs | colour scheme |
-bc | background colour |
--size | displays the genome size value on each map |
-te | adds a text legend to the different tracks |
-t | figure title |
Output files
Inside the genovi directory created, we can find: - PNG and SVG files of the genome map. - Gral_Stats.csv containing stats about number of CDSs, tRNA, etc per contig and total.
2. Biosynthetic Gene Cluster
Some of the genomes we are working with are from toxic strains of the cyanobacterial genus Microcoleus. They produce anatoxins, including anatoxin-a (otherwise known as Very Fast Death Factor), as secondary metabolites. We can find secondary metabolite Biosynthetic Gene Clusters (BGCs) using the annotation tool antiSMASH, either via the online server or from the command line.
1. Create output directory and copy in predicted gene sequences files
mkdir 8.analyses/BGC
cp 7.genome_annotation/gene_prediction/*.genes.fna 8.analyses/BGC/.
2. Run antiSMASH
Open a new Slurm script.
nano secondary_metabolites.sh
And copy in the following:
#!/bin/bash -e
#SBATCH --account nesi02659
#SBATCH --job-name antismash
#SBATCH --time 0:30:00
#SBATCH --mem 5GB
#SBATCH --array 0-9
#SBATCH --cpus-per-task 12
#SBATCH --error slurm_antismash_%A-%a.err
#SBATCH --output slurm_antismash_%A-%a.out
#SBATCH --partition milan
module purge >/dev/null 2>&1
module load antiSMASH/6.0.1-gimkl-2020a-Python-3.8.2
declare -a array=("01" "02" "03" "04" "05" "06" "07" "08" "09" "10")
cd ~/mgsr
srun antismash --cb-subclusters --cb-knownclusters --cb-general --smcog-trees -c 12 --taxon bacteria --asf \
--genefinding-tool prodigal \
--output-dir 8.analyses/BGC/${array[$SLURM_ARRAY_TASK_ID]} \
5.assembly_evaluation/all_assembled_genomes/scaffolds_${array[$SLURM_ARRAY_TASK_ID]}.m1000.fasta
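Because the job is a Slurm array, each task picks its genome label from `array` using `$SLURM_ARRAY_TASK_ID`. The dry run below prints the input file and output directory each task would use, without calling antiSMASH. It is a sketch under one assumption: that the inputs are named `scaffolds_<ID>.m1000.fasta` inside `5.assembly_evaluation/all_assembled_genomes/`. `paths_for_task` is a hypothetical helper.

```shell
#!/bin/bash
# Dry run: show which input and output paths each array index maps to.
# Same labels as in the Slurm script; indices 0-9 match "--array 0-9".
declare -a array=("01" "02" "03" "04" "05" "06" "07" "08" "09" "10")

# Hypothetical helper. The file-name pattern is an assumption, not
# something antiSMASH requires.
paths_for_task() {
    local id=${array[$1]}
    echo "5.assembly_evaluation/all_assembled_genomes/scaffolds_${id}.m1000.fasta 8.analyses/BGC/${id}"
}

for task in "${!array[@]}"; do
    paths_for_task "$task"
done
```

If the printed paths look right, the script would normally be submitted with `sbatch secondary_metabolites.sh`.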
Explanation of antiSMASH parameters
Parameter | Description |
---|---|
--cb-subclusters | compare identified clusters against known subclusters responsible for synthesising precursors |
--cb-knownclusters | compare identified clusters against known gene clusters from the MIBiG database |
--cb-general | compare identified clusters against a database of antiSMASH-predicted clusters |
--smcog-trees | generate phylogenetic trees of secondary metabolite cluster orthologous groups |
-c | number of CPUs |
--taxon | taxonomic classification of input sequence |
--asf | run active site finder analysis |
--genefinding-tool | specify algorithm used for gene finding |
Output files
- index.html: web browser based summary
- .gbk files are created for each biosynthetic gene cluster identified
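To get a quick overview across all ten genomes, the per-cluster GenBank files can be counted in each output directory. The sketch below assumes that these files contain `.region` in their names (for example `*.region001.gbk`), that each genome has its own directory under `8.analyses/BGC/`, and that it is run from the project directory. `count_regions` is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: count per-cluster GenBank files in one antiSMASH
# output directory. Assumes their names contain ".region".
count_regions() {
    find "$1" -maxdepth 1 -type f -name '*.region*.gbk' | wc -l | tr -d ' '
}

for dir in 8.analyses/BGC/*/; do
    [ -d "$dir" ] || continue   # the glob matched nothing
    echo "$(basename "$dir"): $(count_regions "$dir") region(s)"
done
```

A count of zero for a genome means either that no cluster was predicted or that the job for that genome did not finish; the Slurm `.err` file for that task should say which.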
3. Phylogenetic genome-wide tree
Let’s make a quick and easy phylogenetic tree of the 10 isolates with FastTree, using the core gene alignment generated by GTDB-Tk. FastTree infers approximately-maximum-likelihood phylogenetic trees from alignments of nucleotide or protein sequences.
1. Make a new directory
mkdir -p 8.analyses/phylogenetic_tree
2. Fetch the core gene alignment from GTDB-Tk
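Decompress gtdbtk.bac120.user_msa.fasta.gz into the new directory. The -c flag makes gzip write the decompressed data to standard output (leaving the original .gz file in place), so it can be redirected to the new location.
gzip -dc 7.genome_annotation/taxonomy/align/gtdbtk.bac120.user_msa.fasta.gz > 8.analyses/phylogenetic_tree/gtdbtk.bac120.user_msa.fasta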
cd 8.analyses/phylogenetic_tree/
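It is worth checking that the alignment arrived intact before building the tree. The sketch below counts the FASTA records in the file; `count_fasta_records` is a hypothetical helper, and the expectation of one sequence per isolate (10 in total) is an assumption based on the number of genomes in this workshop.

```shell
#!/bin/bash
# Hypothetical helper: count FASTA records (header lines start with ">").
count_fasta_records() {
    grep -c '^>' "$1"
}

# Assumes the current directory is 8.analyses/phylogenetic_tree/.
msa=gtdbtk.bac120.user_msa.fasta
if [ -s "$msa" ]; then
    echo "$msa contains $(count_fasta_records "$msa") sequence(s)"
else
    echo "$msa is missing or empty; check the decompression step" >&2
fi
```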
3. Build the phylogenetic tree
In the terminal, load the FastTree module then run it.
module load FastTree/2.1.11-GCCcore-9.2.0
FastTree gtdbtk.bac120.user_msa.fasta > tree_file.txt
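A short sanity check on the tree file can catch an empty or truncated result before it is loaded elsewhere. The sketch below assumes the output is a single Newick tree whose tip labels contain no commas, in which case the number of tips equals the number of commas plus one. `count_newick_tips` is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: count tips in a Newick tree as (commas + 1).
# Assumes tip labels contain no commas.
count_newick_tips() {
    local commas
    commas=$(tr -cd ',' < "$1" | wc -c)
    echo $((commas + 1))
}

tree=tree_file.txt
if [ -s "$tree" ]; then
    echo "$tree has $(count_newick_tips "$tree") tip(s)"
    # A complete Newick tree ends with a semicolon.
    if tail -c 2 "$tree" | grep -q ';'; then
        echo "Newick terminator found"
    fi
else
    echo "$tree is missing or empty" >&2
fi
```

With the 10 isolates used here, the tip count would be expected to be 10.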
We will move to RStudio now to visualise the tree.