Data preparation¶
1. Prepare working environment¶
Load required libraries and data.
code
# Load libraries
library(dplyr)
library(stringr)
library(readr)
library(tidyr)
# Import data
asv <- read_tsv("filtered_feature_table.tsv")
env_data <- read_tsv("sample_metadata.tsv", name_repair = "universal")
duu <- read_tsv("unweighted_unifrac.tsv")
dwu <- read_tsv("weighted_unifrac.tsv")
# Create output directory
dir.create("./tables/")
2. Clean environmental metadata¶
There is a block of samples that were originally part of a pilot project. The amount of sample matter collected were insufficient for subsequent chemical analyses. It was retained in the original study as a reference point to determine if there were random effects that would affect the data. For the purposes of a workshop, these samples without chemical data should be removed to streamline the curriculum.
Cleaning tasks:
- Remove samples with NA in chemical data
- Arrange rows based on sample ID
- Convert column names to small letters
3. Clean ASV table¶
Cleaning tasks:
- Retain samples based on
env_clean$sample
- Remove ASVs with row sums of zeroes
code
4. Clean distance matrices¶
Cleaning tasks:
- Retain rows and columns based on
env_clean$sample
- Rearrange rows and columns based on
env_clean$sample
code
duu_clean <- duu %>%
filter(sample %in% env_clean$sample) %>%
select(sample, all_of(env_clean$sample)) %>%
arrange(sample)
all(diag(as.matrix(duu_clean[, -1])) == 0)
dwu_clean <- dwu %>%
filter(sample %in% env_clean$sample) %>%
select(sample, all_of(env_clean$sample)) %>%
arrange(sample)
all(diag(as.matrix(dwu_clean[, -1])) == 0)
Write out cleaned data¶
code
Also, manually copy taxonomy.tsv
into tables/
.
From now on, tables/
is the source of data for lessons.