2023 Level 2 proficiency testing (version 1)¶

time

Exercises: 150 minutes

Key information

The aim is to complete each of the exercises before the end of the session.
This testing is open book - you may use any materials from the level 1 or level 2 training and anything online.
This testing is individual - please do not discuss your solutions with other trainees.
The tutors are here to help with significant technical issues such as loss of connection. We are not here to help you identify tools or how to solve specific questions.

All test folders are located in the following location:

code

/nesi/project/nesi03181/phel/proficiency/USERNAME/

Output files

For each exercise, you are free to name your output folders/files whatever you wish.

Please ensure that your results are in the appropriate folder for each exercise, however.

Reporting findings

Whenever an exercise asks your to in some way interpret an output file and report your results, just create a small text file to store your notes.

Name the file according to the exercise that asked for the information.

Exercise 1¶

Navigate to your instance of the 01_illumina_assembly/ folder. You have been provided with the following files:

code

reads/illumina_R1.fq.gz
reads/illumina_R2.fq.gz
reference/reference.fna
results/provided_assembly.fna

Part 1

You are to produce a slurm script that uses an approriate assembly tool to assemble the Illumina MiSeq sequences in the reads/ folder into a draft genome.

Your script will only need about 10 minutes to complete with 16 CPUs.

Part 2

Use an appropriate tool to determine assembly statistics for your assembly file and use the file reference/reference.fna as your reference genome.

If your job does not complete in a reasonable amount of time, you can use the provided results/provided_assembly.fna file instead of your own output.

Report the folllowing metrics from your assembly:

Number of contigs with length greater than or equal than 50 kbp.
The N50 value for the assembly.
Number of indels relative to the reference genome.

Exercise 2¶

Navigate to your instance of the 02_mapping/ folder. You have been provided with the following files:

code

reads/nanopore.fq.gz
reference/reference.fna
results/provided_mapping.sam

Part 1

Use whatever tool is appropriate to map the sequences in the reads/ folder against the reference file, creating a sam file with the mapping results.

Part 2

Once you have completed mapping your reads, sort and compress the mapping information from your output sam file.

If you are unable to complete your mapping job for any reason, use the file supplied at `results/provided_mapping.sam.

Produce a depth report from the compressed mapping file.
When you have produced your compressed file, filter the results to keep only the mapped reads in the mapping output.

Exercise 3¶

Navigate to your instance of the 03_gene_calling/ folder. You have been provided with the following file:

code

input_sequences.fna

Question

You have been provided with a draft assembly of a prokaryotic genome. Create predictions of the protein coding regions of the assembly using whichever tool you believe is appropriate.

Exercise 4¶

Navigate to your instance of the 04_classification_kraken2/ folder. You have been provided with the following files:

code

inputs/input_sequences.fna
results/provided_kraken2.txt

Part 1

Write a slurm script to execute a kraken2 classification of the sequences provided in inputs/input_sequences.fna.

When selecting a database, use the PlusPFP database located at:

code

/nesi/project/nesi03181/phel/databases/k2_pluspfp_16gb_20231009/

Check your notes carefully to make sure you understand how to direct kraken2 towards this database.

Part 2

You have been provided with an output file of the above classification job - results/provided_kraken2.txt. Use your knowledge of the command line to search through the output file and identify the most likely species to be in this sample.

Create a text file containing your findings. You do not need to report every species hit in the provided_kraken2.txt file - just pick the candidates you believe are most likely and give a quick 1-sentence justification for each choice.

Exercise 5¶

Navigate to your instance of the 05_classification_diamond/ folder. You have been provided with the following file:

code

inputs/input_sequences.faa
results/provided_diamond.txt

Part 1

Write a slurm script to execute a diamond classification of the sequences provided in inputs/input_sequences.faa.

When selecting a database, use the uniprot_sprot.dmnd database located at:

code

/nesi/project/nesi03181/phel/databases/swissprot_dmnd/uniprot_sprot.dmnd

Part 2

Examine the outputs of the file results/provided_diamond.txt, remembering the BLAST6 format for column meaning.

Based on the top hit results, determine the most likely genus or species of the organism from which these sequences were obtained. Create a text file containing your conclusion.

Closing out the test¶

When you have completed all of the exercises, and you are happy with your results, please run the following command to log the history of your command line to a file.

code

history > /nesi/project/nesi03181/phel/proficiency/USERNAME/history_log.txt