Skip to content

Automating a Variant Calling Workflow


  • Put all the steps from the previous lesson into a script.

Variant calling workflow

Remember our variant calling workflow has the following steps:

  • Index the reference genome for use by bwa and samtools.
  • Align reads to reference genome.
  • Convert the format of the alignment to sorted BAM, with some intermediate steps.
  • Calculate the read coverage of positions in the genome.
  • Detect the single nucleotide variants (SNVs).
  • Filter and report the SNVs in VCF (variant calling format).

Let's start with creating a new directory as our script working space and copy all the required resources.

Output - /home/$USER/scripting_workshop
mkdir script_workspace
cd script_workspace
#Don't forget the  . at the end of the line
cp -r /nesi/project/nesi02659/scripting_workshop/variant_calling/* .

Output - ref_genome trimmed_reads

Now we are ready to start building the script.


Prefer Jupyter file over nano ?

You are welcome to choose Jupyter interactive file option over nano. If this is your preferred option, below are the instructions on how to create a file

  1. Make sure to navigate the left file explorer to correct path
  2. Right click on the explorer to open the menu and choose New file
  3. Rename the file as


In the text editor, type the commands


# Jane Doe
# 13 April 2023

# This script runs the variant calling pipeline from mapping to vcf.

set -e
# Load all the required modules
module purge
module load BWA/0.7.17-GCC-9.2.0
module load SAMtools/1.13-GCC-9.2.0
module load BCFtools/1.13-GCC-9.2.0

# create the results directories
mkdir -p results/sam results/bam results/bcf results/vcf

# indexing the genome
bwa index $genome

# create a loop that map reads to the genome, sort the bam files and call variants
for fq1 in trimmed_reads/*_1.trim.sub.fastq
    echo "working with file $fq1"

    base=$(basename $fq1 _1.trim.sub.fastq)
    echo "base name is $base"

   # setting the variables

  # running the analysis steps
  bwa mem $genome $fq1 $fq2 > $sam
  samtools view -S -b $sam > $bam
  samtools sort -o $sorted_bam $bam
  samtools index $sorted_bam
  bcftools mpileup -O b -o $raw_bcf -f $genome $sorted_bam
  bcftools call --ploidy 1 -m -v -o $variants $raw_bcf varFilter $variants > $final_variants


Shell variables

A variable is a character string to which we assign a value. The value assigned could be a number, text, filename, device, or any other type of data. A variable is nothing more than a pointer to the actual data. The shell enables you to create, assign, and delete variables.

Running the script (Running the script)

bash ./

Adding executable permissions

The way the script is written means we have to indicate which program to use whenever we are running it. To be able to run without calling bash, we need to change the script permissions.


ls -l 

Output -rw-rw-r-- 1 fayfa80p fayfa80p 1401 Mar 5 22:29

chmod u+x
ls -l 

Output - -rwxrw-r-- 1 fayfa80p fayfa80p 1401 Mar 5 22:29

  • note colour change on the script filename

Now we can execute the script without calling bash



In the Next Lesson, we will prepare the script to run on the HPC environment.