Etherpad template
Last updated on 2023-09-27
Introduction to the Command Line for Genomics
Date
Instructor
Helper
Website https://datacarpentry.org/shell-genomics
Timings - EDIT FOR YOUR WORKSHOP
Here’s a tentative schedule for the workshop:
- Introducing the Shell 09:30-10:00 (30 mins)
- Navigating Files and Directories 10:00-10:50 (50 mins)
- Break 10:50-11:05 (15 mins)
- Working with Files and Directories 11:05-11:50 (45 mins)
- Redirection 11:50-12:35 (45 mins)
- Break 12:35-13:30 (55 mins)
- Writing Scripts and Working with Data 13:30-14:10 (40 mins)
- Project Organization 14:10-14:40 (30 mins)
- Wrap up and feedback 14:40-14:55 (15 mins)
Attendees
Please write your name below to confirm your attendance and mention one thing you hope to learn from this workshop.
Navigating Files and Directories
Exercise: FINDING HIDDEN DIRECTORIES
First navigate to the `shell_data` directory. There is a hidden directory within this directory. Explore the options for `ls` to find out how to see hidden directories. List the contents of the directory and identify the name of the text file in that directory.
Hint: hidden files and folders in Unix start with `.`, for example `.my_hidden_directory`
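To make the hint concrete, here is a minimal sketch that runs in a throwaway directory rather than in `shell_data`. The name `visible_directory` and the scratch location created by `mktemp -d` are assumptions for illustration; only `.my_hidden_directory` comes from the hint. It shows that a plain `ls` leaves out the dot-prefixed name even though the directory exists. Finding the `ls` option that reveals it is still the exercise.

```shell
# Create a scratch directory so the workshop data is not touched.
scratch=$(mktemp -d)

# One ordinary directory and one whose name starts with a dot.
mkdir "$scratch/visible_directory" "$scratch/.my_hidden_directory"

# Plain ls (no options) skips names that start with a dot.
plain_listing=$(ls "$scratch")
echo "plain ls shows: $plain_listing"

# The hidden directory is still on disk; ls simply did not list it.
if [ -d "$scratch/.my_hidden_directory" ]; then
    hidden_exists=yes
else
    hidden_exists=no
fi
echo "hidden directory exists: $hidden_exists"

# Clean up the scratch directory.
rm -r "$scratch"
```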
Exercise: NAVIGATING PRACTICE
Navigate to your home directory. From there, list the contents of the `untrimmed_fastq` directory.
Exercise: RELATIVE PATH RESOLUTION
Using the filesystem diagram on the lesson page below, if `pwd` displays `/Users/thing`, what will `ls ../backup` display?
Put a + next to the answer you think is correct.
- ../backup: No such file or directory
- 2012-12-01 2013-01-08 2013-01-27
- 2012-12-01/ 2013-01-08/ 2013-01-27/
- original pnas_final pnas_sub
Working with Files and Directories
Exercise:
Do each of the following tasks from your current directory using a single `ls` command for each:
- List all of the files in `/usr/bin` that start with the letter ‘c’.
- List all of the files in `/usr/bin` that contain the letter ‘a’.
- List all of the files in `/usr/bin` that end with the letter ‘o’.
Bonus: List all of the files in `/usr/bin` that contain the letter ‘a’ or the letter ‘c’.
Hint: The bonus question requires a Unix wildcard that we haven’t talked about yet. Try searching the internet for information about Unix wildcards to find what you need to solve the bonus problem.
Exercise: echo and wildcards
`echo` is a built-in shell command that writes its arguments, like a line of text, to standard output. The `echo` command can also be used with pattern matching characters, such as wildcard characters. Here we will use the `echo` command to see how the wildcard character is interpreted by the shell.
BASH
$ echo *.fastq
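The point of this exercise is that the shell, not `echo`, interprets the wildcard. The following is a minimal sketch of that idea, assuming a scratch directory with three hypothetical empty files (`sample_a.fastq`, `sample_b.fastq`, and `notes.txt`) that are not part of the workshop data. It compares an unquoted pattern, which the shell expands before `echo` runs, with a quoted pattern, which `echo` receives as literal text.

```shell
# Create a scratch directory with a few hypothetical, empty files.
scratch=$(mktemp -d)
touch "$scratch/sample_a.fastq" "$scratch/sample_b.fastq" "$scratch/notes.txt"

# Unquoted pattern: the shell replaces *.fastq with the matching
# file names, so echo only ever sees the expanded list.
expanded=$(cd "$scratch" && echo *.fastq)
echo "unquoted: $expanded"

# Quoted pattern: the shell does not expand it, so echo prints
# the pattern itself.
literal=$(cd "$scratch" && echo "*.fastq")
echo "quoted:   $literal"

# Clean up the scratch directory.
rm -r "$scratch"
```

Because `echo` just prints whatever arguments it is handed, it is a convenient way to preview what a wildcard will expand to before using the same pattern with another command.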
What would the output look like if the wildcard could not be matched? Compare the outputs of `echo *.missing` and `ls *.missing`.
Exercise: command history
Find the line number in your history for the command that listed all the `.sh` files in `/usr/bin`. Rerun that command.
Exercise: Examining Files
- Print out the contents of the `~/shell_data/untrimmed_fastq/SRR097977.fastq` file. What is the last line of the file?
- From your home directory, and without changing directories, use one short command to print the contents of all of the files in the `~/shell_data/untrimmed_fastq` directory.
Exercise: Examining Files
Use `less` on the file `SRR097977.fastq` to find the next three nucleotides (characters) after the first instance of the sequence `TTTTT`.
Exercise:
Starting in the `shell_data/untrimmed_fastq/` directory, do the following:
- Make sure that you have deleted your backup directory and all files it contains.
- Create a backup of each of your FASTQ files using cp. (Note: You’ll need to do this individually for each of the two FASTQ files. We haven’t learned yet how to do this with a wildcard.)
- Use a wildcard to move all of your backup files to a new backup directory.
- Change the permissions on all of your backup files to be write-protected.
Redirection
Exercise:
- Search for the sequence `GNATNACCACTTCC` in the `SRR098026.fastq` file. Have your search return all matching lines and the name (or identifier) for each sequence that contains a match.
- Search for the sequence `AAGTT` in both FASTQ files. Have your search return all matching lines and the name (or identifier) for each sequence that contains a match.