WORK IN PROGRESS

Genomic Data Carpentry (Aotearoa edition)

Getting started with genomics

This is a beginner-friendly workshop designed to get you started in the world of genomics. Whatever you are working on (transcriptomics, genome assembly, variant calling, metagenomics, or something else), if you will be using genomic data, this workshop is for you!

What’s covered in this workshop

  • Organisation: from messy lab books and Excel spreadsheets to tidy, computer-friendly data
  • Working with sequencing facilities and understanding genomic data types
  • Data storage repositories, public services and facilities, and principles of FAIR and CARE
  • Quality control, wrangling of raw reads and an introduction to genomic terminology

What’s NOT covered in this workshop

  • Basic descriptions of biological and genetic concepts (i.e., we assume the learner is already familiar with DNA/RNA, PCR, transcription, etc. to an undergraduate level).
  • Non-NGS sequencing and services (e.g., Sanger sequencing, qPCR, genotyping, probe-based applications such as microarrays and NanoString nCounter).
  • Genomic analysis workflows (beyond the basics of initial quality checks of raw reads)
  • The basics of cluster or HPC resourcing and specialised software (e.g., we do not cover schedulers such as SLURM, partitions/CPUs/GPUs, or choosing a compute allocation). See our workshop on Introduction to Bash Scripting and HPC Job Scheduler for this.
  • Using shell or other bioinformatic tools, beyond the very basics (e.g., we do not cover writing/submitting bash scripts, modules, accessing the cluster using ssh).

Glossary

| Term | Definition |
|------|------------|
| Adapter | Short synthetic DNA sequence ligated to the DNA molecule during library prep, which allows the molecule to bind to the flow cell during sequencing and also provides a primer binding site |
| bp | Base pair |
| Gb | Gigabase pair (1,000,000,000 bp) |
| GB | Gigabyte |
| HCS | High Capacity Storage |
| HPC | High Performance Computing |
| Index | Also known as a barcode. Short unique sequence added to each DNA molecule in one library, allowing the identification of that library/sample and thereby enabling libraries to be pooled (sharing one run between samples lowers the cost per sample) |
| Mb | Megabase pair (1,000,000 bp) |
| MB | Megabyte |
| Multiplexing | Sequencing multiple samples simultaneously in one run by combining libraries into one pool. Samples (i.e., libraries) are de-multiplexed (separated) in silico, usually by the technician, based on their unique indices |
| NGS | Next-generation sequencing |
| SE / PE | Single-end / Paired-end |
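
The Index and Multiplexing entries describe separating pooled reads by their unique indices. As a rough illustration only, the idea can be sketched in a few lines of Python. The index sequences, sample names, and the `demultiplex` helper below are all hypothetical; in practice, de-multiplexing is normally done by the sequencing facility with dedicated software, and real tools also handle details such as index mismatches that this sketch ignores.

```python
from collections import defaultdict

# Hypothetical sample sheet: index (barcode) -> sample name.
# Real indices are longer and come from the library prep kit.
SAMPLE_INDICES = {
    "ACGT": "sample_A",
    "TGCA": "sample_B",
}


def demultiplex(reads, sample_indices):
    """Group (index, sequence) pairs by the sample that owns the index."""
    by_sample = defaultdict(list)
    for index, sequence in reads:
        # Reads whose index is not on the sample sheet are set aside.
        sample = sample_indices.get(index, "undetermined")
        by_sample[sample].append(sequence)
    return dict(by_sample)


# A toy pool of reads from one multiplexed run (made-up sequences).
pooled_reads = [
    ("ACGT", "GGATTC"),
    ("TGCA", "CCTAAG"),
    ("ACGT", "TTGACA"),
    ("NNNN", "AAAAAA"),
]

demultiplexed = demultiplex(pooled_reads, SAMPLE_INDICES)
for sample, sequences in sorted(demultiplexed.items()):
    print(sample, len(sequences))
```

Exact matching on the index keeps the sketch short; it is an assumption made for illustration, not a description of how any particular facility or tool works.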

Attribution

Parts of this workshop were adapted from and inspired by content from The Carpentries Data Carpentry lessons on Genomics.

All Carpentries instructional material is made available under the Creative Commons Attribution license (CC BY 4.0). The material in this workshop is not endorsed by The Carpentries and has been adapted by Genomics Aotearoa for our own teaching purposes.

In this workshop, the following lessons were adapted from The Carpentries Data Carpentry in the manner stated below:

Material is used in this workshop from our other Genomics Aotearoa workshops, as below:

Other material used in this workshop:

Definitions:

  • Re-used material: Almost word-for-word, including images, with minor wording or styling modifications.
  • Minimally-adapted material: Inspired by stylistic choices and general workflow, but material is primarily developed by Genomics Aotearoa.
