3.2 - Mapping Illumina sequences to a reference¶
time
- Teaching: 10 minutes
- Exercises: 20 minutes
Objectives and Key points
Objectives¶
- Use
bowtie2
to index a reference genome and map DNA paired-end reads against a reference genome.
Keypoints¶
- Understand how to index a reference sequence for mapping.
- Understand how to apply
bowtie2
to map a set of DNA paired-end reads to the reference.
Indexing the reference sequence¶
Before we can map our sequence data to the reference genome (or gene sequence) obtained in the previous exercise we need to perform a step known as indexing. How this process works is well beyond the scope of this tutorial, but it is a process of performing a scan of the reference sequence and transforming it into an organised format amenable to the bowtie2
rapid mapping algorithm.
Navigate to the /nesi/project/nesi03181/phel/USERNAME/level2/mapping/
directory and perform the following commands:
code
This will create a series of files with the prefix Mbovis_87900.16S_rRNA
and extensions .bt2
in the references/
folder. The number of files depends on the size of the reference sequence which was indexed, but collectively these files comprise the index for the reference. When we perform mapping we specify the target as the file path reference/Mbovis_87900.16S_rRNA
and bowtie2
will automatically find and make sense of the index files.
Exercise
Once this has completed, also index the full genome file (Mbovis_87900.genome.fna
) to practice mapping against a more realistic reference sequence.
Mapping reads with bowtie2
¶
Once you have an index produced, it is now time to map the short sequences against the reference. The nature of the bowtie2
mapping tool is that it can be run with one of several preset configurations depending on your requirements, or you can devote quite a bit of time ot fine-tuning the parameters to optimise your output. As with most tools, you can view these options by running the command with the -h
(help) parameter, or read the online manual for the tool.
For today, we will only discuss the parameters which we are going to use:
Parameter | Value | Purpose |
---|---|---|
--sensitive |
Use the 'sensitive' mapping parameters for end-to-end read mapping. Mapping can be performed on a sliding scale changing sensitivity (thoroughness) for speed. |
|
-x |
references/Mbovis_87900.16S_rRNA |
The path to the index file(s) corresponding to the reference sequence |
-1 |
reads/Mbovis_87900.miseq_R1.fq.gz |
The forward reads file to be mapped |
-2 |
reads/Mbovis_87900.miseq_R2.fq.gz |
The reverse reads file to be mapped |
-S |
Mbovis_87900.16S_rRNA.bowtie2.sam |
The output sam file to which the results are to be written |
Assemble these values into a command and run:
code
Exercise
Repeat the mapping command, this time against the full reference genome.
You will need to increase the number of computing threads to the maximum available in your JupyterHub
session to get a reasonable run time. Use the help manual to find the parameter for adjusting the thread number.