Skip to content

Data preparation

1. Prepare working environment

Load required libraries and data.

code

# Load libraries
library(dplyr)
library(stringr)
library(readr)
library(tidyr)

# Import data
asv <- read_tsv("filtered_feature_table.tsv")
env_data <- read_tsv("sample_metadata.tsv", name_repair = "universal")
duu <- read_tsv("unweighted_unifrac.tsv")
dwu <- read_tsv("weighted_unifrac.tsv")

# Create output directory
dir.create("./tables/")

2. Clean environmental metadata

There is a block of samples that were originally part of a pilot project. The amount of sample matter collected were insufficient for subsequent chemical analyses. It was retained in the original study as a reference point to determine if there were random effects that would affect the data. For the purposes of a workshop, these samples without chemical data should be removed to streamline the curriculum.

Cleaning tasks:

  • Remove samples with NA in chemical data
  • Arrange rows based on sample ID
  • Convert column names to small letters

code

env_clean <- env_data %>%
  drop_na() %>%
  arrange(Sample) %>%
  rename_with(str_to_lower)

3. Clean ASV table

Cleaning tasks:

  • Retain samples based on env_clean$sample
  • Remove ASVs with row sums of zeroes

code

asv_clean <- asv %>%
  select(ASVID, all_of(env_clean$sample)) %>%
  filter(
    rowSums(
        select(., where(is.numeric))
    ) > 0
  )

4. Clean distance matrices

Cleaning tasks:

  • Retain rows and columns based on env_clean$sample
  • Rearrange rows and columns based on env_clean$sample

code

duu_clean <- duu %>%
  filter(sample %in% env_clean$sample) %>%
  select(sample, all_of(env_clean$sample)) %>%
  arrange(sample)

all(diag(as.matrix(duu_clean[, -1])) == 0)

dwu_clean <- dwu %>%
  filter(sample %in% env_clean$sample) %>%
  select(sample, all_of(env_clean$sample)) %>%
  arrange(sample)

all(diag(as.matrix(dwu_clean[, -1])) == 0)

Write out cleaned data

code

write_tsv(asv_clean, "tables/asv_table.tsv")
write_tsv(env_clean, "tables/env_table.tsv")
write_tsv(duu_clean, "tables/unweighted_unifrac.dist")
write_tsv(dwu_clean, "tables/weighted_unifrac.dist")

Also, manually copy taxonomy.tsv into tables/.

From now on, tables/ is the source of data for lessons.