An introduction to the data

The data we will be working with today comes from sequences generated for this study. It is a subset of the sequence data, reanalysed using QIIME2, a popular amplicon sequence processing pipeline. You can explore how the data was generated in the supplementary materials.

The data consists of 21 sediment samples taken from the Waiwera Estuary (north of Auckland). The samples were collected for a study of how prokaryotic communities and their nitrogen cycling fractions change along a gradient of mud content. Alongside the sequence data, the study measured environmental variables such as mud content (percentage dry weight of clay and silt) and chemical data such as carbon, nitrogen, sulfur and phosphorus.

This dataset follows a typical microbial ecology study structure, and has three distinct data files:

  • asv: counts of organisms (here, amplicon sequence variants, or ASVs) per sample
  • tax: describes the taxonomic lineage of ASVs
  • env: sample metadata describing various non-biological measurements obtained from the samples
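To make the relationship between the three tables concrete, here is a minimal sketch using toy stand-ins built by hand. Every name and value in it (`asv_toy`, `tax_toy`, `env_toy`, the `ASVID`, `Phylum`, `sample` and `mud_content` columns) is invented for illustration and may not match the columns in the real files. The idea is only that ASV identifiers link the count table to the taxonomy table, and sample identifiers link the count table to the metadata.

```r
# Toy stand-ins for the three tables. All names and values are hypothetical.
asv_toy <- data.frame(
  ASVID    = c("asv_1", "asv_2", "asv_3"),
  sample_A = c(10, 0, 5),
  sample_B = c(3, 7, 0)
)
tax_toy <- data.frame(
  ASVID  = c("asv_1", "asv_2", "asv_3"),
  Phylum = c("Proteobacteria", "Bacteroidota", "Proteobacteria")
)
env_toy <- data.frame(
  sample      = c("sample_A", "sample_B"),
  mud_content = c(12.5, 40.1)
)

# Every ASV in the count table should have a taxonomy entry
all_asvs_classified <- all(asv_toy$ASVID %in% tax_toy$ASVID)

# Every sample column in the count table should have a metadata row
sample_cols <- setdiff(names(asv_toy), "ASVID")
all_samples_described <- all(sample_cols %in% env_toy$sample)

# Total counts per sample, summed over ASVs
reads_per_sample <- colSums(asv_toy[sample_cols])
reads_per_sample
```

Checks like these are cheap, and running the same kind of check on real tables can catch mismatched identifiers before any analysis starts.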

The dataset, and any inference drawn from it, is not our primary interest today. It serves as a convenient example for illustrating the concepts covered in the lessons.

Load packages and import data

library(vegan)
library(dplyr)
library(stringr)
library(purrr)
library(tidyr)
library(tibble)
library(ggplot2)

asv <- read.delim("https://raw.githubusercontent.com/GenomicsAotearoa/Intermediate-R/main/tables/asv_table.tsv")
env <- read.delim("https://raw.githubusercontent.com/GenomicsAotearoa/Intermediate-R/main/tables/env_table.tsv")
tax <- read.delim("https://raw.githubusercontent.com/GenomicsAotearoa/Intermediate-R/main/tables/taxonomy.tsv")
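It can help to know what `read.delim()` does to column names on import. The sketch below reads a small tab-separated string made up for this example (it is not taken from the workshop files, and the sample names `1-A` and `2-B` are hypothetical). By default `read.delim()` uses `check.names = TRUE`, which rewrites headers that are not syntactically valid R names. Whether this affects the real tables depends on how their samples are named.

```r
# A tiny tab-separated table; contents are invented for illustration
toy_tsv <- "ASVID\t1-A\t2-B\nasv_1\t10\t3\nasv_2\t0\t7\n"

# Default import: headers are passed through make.names()
toy_checked <- read.delim(text = toy_tsv)
names(toy_checked)

# Keep headers exactly as written in the file
toy_raw <- read.delim(text = toy_tsv, check.names = FALSE)
names(toy_raw)

# Quick structural checks that are worth running on any imported table
dim(toy_checked)
str(toy_checked)
```

If sample names in a count table need to match a metadata column exactly, comparing `names()` of one against the values of the other right after import is a simple safeguard.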