WORK IN PROGRESS

Genomic Data Carpentry (Aotearoa edition)

Getting started with genomics

This is a beginner-friendly workshop, designed to get you started in the world of genomics. Whatever you’re doing (transcriptomics, genome assembly, variant calling, metagenomics, or something else), if you will be using genomic data, this workshop is for you!

Prerequisites

Learners are expected to have a basic (undergraduate-level) understanding of biological and genetic concepts, but no familiarity with genomics and no bioinformatics or computational skills are required.

What’s covered in this workshop

  • Ethical data collection from a New Zealand perspective
  • Organisation: from messy lab books and Excel spreadsheets to tidy, computer-friendly data
  • Working with sequencing facilities and understanding genomic data types
  • Data storage repositories and public services and facilities
  • Quality control, wrangling of raw reads and an introduction to genomic terminology

What’s NOT covered in this workshop

  • Basic descriptions of biological and genetic concepts.
  • Traditional sequencing and services (e.g., Sanger sequencing, qPCR, genotyping, probe-based applications such as microarrays and NanoString nCounter).
  • Genomic analysis workflows. We have multiple dedicated workshops for genomic pipelines; see our portfolio here.
  • Using the shell or other bioinformatics tools. See our workshops on Introduction to shell and Introduction to R to get you started on this.
  • Understanding the cluster, HPC resourcing and specialised software (e.g., we do not cover schedulers such as SLURM, partitions/CPUs/GPUs, or choosing a compute allocation). See our workshop on Introduction to Bash Scripting and HPC Job Scheduler for this.

Glossary

| Term | Definition |
|---|---|
| Adapter | Short synthetic DNA sequence ligated to the DNA molecule during library prep. It allows the molecule to bind to the flow cell during sequencing and also provides a primer binding site. |
| bp | Base pair |
| DGE | Differential Gene Expression (analysis) |
| HCS | High Capacity Storage |
| HPC | High Performance Computing |
| HiFi | High fidelity (PacBio) |
| Index | Also known as a barcode. Short unique sequence added to each DNA molecule in one library, allowing the identification of that library/sample and thereby enabling pooling of the libraries for sequencing (a single pooled run is cheaper). |
| Gb | Gigabase pair (1,000,000,000 bp) |
| GB | Gigabyte (file size / storage size) |
| Mb | Megabase pair (1,000,000 bp) |
| MB | Megabyte (file size / storage size) |
| Multiplexing | Sequencing multiple samples simultaneously in one run by combining libraries into one pool. Samples (i.e., libraries) are de-multiplexed (separated) in silico, usually by the technician, based on their unique indices. |
| NGS | Next-generation sequencing |
| ONT | Oxford Nanopore Technologies (sequencing company). Often referred to as “Nanopore”. |
| PacBio | Pacific Biosciences (sequencing company) |
| Resequencing | Sequencing part of an individual’s genome in order to detect sequence differences between the individual and the reference genome of the species. Often performed to detect SNPs, genotypes, variants. |
| SE / PE | Single-end / Paired-end |
| SMRT | Single molecule real time (PacBio) |
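
To make the Index and Multiplexing entries more concrete, here is a minimal Python sketch of de-multiplexing. It is a toy illustration, not how any particular sequencing facility does it: the index sequences, the sample names, the 4 bp index length and the `demultiplex` function are all hypothetical, and it assumes the index sits at the start of each read. In practice, indices are often read separately by the instrument and de-multiplexing is handled by dedicated software.

```python
from collections import defaultdict

# Hypothetical index-to-sample mapping (illustration only); real index
# sequences come from the library prep kit and are usually longer.
SAMPLE_INDEX = {
    "ACGT": "sample_A",
    "TGCA": "sample_B",
}
INDEX_LENGTH = 4  # toy value, assumed for this sketch


def demultiplex(reads, sample_index=SAMPLE_INDEX, index_length=INDEX_LENGTH):
    """Group pooled reads by sample using the index at the start of each read.

    Assumption: the index is the first `index_length` bases of the read.
    Reads whose index is not in `sample_index` go to "undetermined".
    """
    by_sample = defaultdict(list)
    for read in reads:
        index, insert = read[:index_length], read[index_length:]
        sample = sample_index.get(index, "undetermined")
        by_sample[sample].append(insert)
    return dict(by_sample)


# One pooled "run" containing reads from several libraries.
pooled_reads = ["ACGTTTGACC", "TGCAGGATCA", "ACGTCCATGA", "GGGGAACCTT"]
result = demultiplex(pooled_reads)

for sample, inserts in sorted(result.items()):
    print(sample, len(inserts))
```

Reads with an index that matches no known sample are kept under "undetermined" rather than discarded, so they can still be inspected afterwards.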

Attribution

This workshop was developed by Dr Chloé van der Burg for the Genomics Aotearoa Bioinformatics Training Programme.

Parts of this workshop were re-used or adapted from The Carpentries Data Carpentry lessons on Genomics.

All Carpentries instructional material is made available under the Creative Commons Attribution license CC BY 4.0. The material in this workshop is not endorsed by The Carpentries and has been adapted by Genomics Aotearoa for our own teaching purposes.

In this workshop, the following lessons were adapted from The Carpentries Data Carpentry in the manner stated below:

Material in this workshop was also re-used from our other Genomics Aotearoa workshops, which include:

NOTE: Some of these workshops include attribution to other source materials; see the attribution notices enclosed within.

Diagrams and images were also re-used in this workshop from online reference material, as follows:

Definitions:

  • Re-used material: Almost word-for-word, including images, with minor wording or styling modifications.
  • Minimally-adapted material: Inspired by stylistic choices and general workflow, but material is primarily developed by Genomics Aotearoa.
