3. Introduction to shell Commands (2)
Examining file content
Terminal
There are a number of ways to examine the content of a file. cat and less are two commonly used programs for a quick look. Check the content of SRR097977.fastq by using these commands. Take note of the differences.
- Take a look at the content of gen360_1.tsv with the cat command.
The cat command prints the entire content of the file to the display. This is inconvenient for files with many rows, as terminals in their default settings will not let us scroll all the way back to the top. The less command is slightly better because it shows the file one screen at a time.
- Use the Up and Down arrow keys to navigate in the less output, and press q to quit.
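The difference between the two viewers can be tried on a small file first. The sketch below is only an illustration: `workdir` and `demo.tsv` are hypothetical stand-ins with made-up content (they are not the workshop's gen360_1.tsv), created so that the commands run anywhere.

```shell
# Create a tiny stand-in file (hypothetical content) in a scratch directory.
workdir=$(mktemp -d)
printf 'sample\tpopulation\nS1\tEUR\nS2\tAFR\n' > "${workdir}/demo.tsv"

# cat prints the whole file to the terminal in one go.
cat "${workdir}/demo.tsv"

# less opens the same file one screen at a time. It is interactive,
# so it is shown here as a comment rather than executed:
#   less "${workdir}/demo.tsv"
```

With the real data, the same two commands would take gen360_1.tsv or SRR097977.fastq as the argument instead of the stand-in file.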
- What if we want to take a look at just the "beginning" (head) or just the "end" (tail) of the file?
The head and tail commands print the first and last 10 lines, respectively.
- What if we want to take a look at the top 15 lines? Both head and tail have a -n (number of lines) option which allows us to override the default.
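As a rough sketch of the default and the -n override, the example below uses a hypothetical 30-line file, `numbers.txt`, generated with `seq` so the result is easy to check by eye. The file name and the `workdir`, `first15` and `last3` variables are assumptions made for this illustration.

```shell
# Build a 30-line stand-in file: one number per line.
workdir=$(mktemp -d)
seq 1 30 > "${workdir}/numbers.txt"

# Default behaviour: first 10 and last 10 lines.
head "${workdir}/numbers.txt"
tail "${workdir}/numbers.txt"

# Override the default with -n.
first15=$(head -n 15 "${workdir}/numbers.txt")
last3=$(tail -n 3 "${workdir}/numbers.txt")
printf '%s\n' "${first15}"
printf '%s\n' "${last3}"
```

On the workshop data the same pattern would be, for instance, head -n 15 SRR097977.fastq.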
Don't need a full "view" and just want to count the number of lines?
Terminal
wc (short for word count) is a command-line tool in Unix/Linux operating systems that reports the number of lines (newline count), words, bytes and characters in the specified files.
output
- 996: Number of lines
- 1992: Number of words
- 47552: Number of bytes
Run the wc command with the -l, -w and -m options against the SRR097977.fastq file and review the outputs.
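To see what each option reports before trying it on the FASTQ file, here is a minimal sketch on a two-line stand-in file. `words.txt`, its content, and the variable names are hypothetical and chosen only so the counts can be verified by hand.

```shell
# Two lines, five words, plain ASCII text.
workdir=$(mktemp -d)
printf 'one two\nthree four five\n' > "${workdir}/words.txt"

# With no options, wc prints lines, words and bytes, then the file name.
wc "${workdir}/words.txt"

# Each option selects a single count. Reading from standard input (<)
# keeps the file name out of the output.
line_count=$(wc -l < "${workdir}/words.txt")   # -l: lines
word_count=$(wc -w < "${workdir}/words.txt")   # -w: words
char_count=$(wc -m < "${workdir}/words.txt")   # -m: characters
echo "lines: ${line_count}, words: ${word_count}, characters: ${char_count}"
```

For plain ASCII text the character count (-m) and the byte count are the same; they can differ when a file contains multi-byte characters.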
Redirection and extraction
Terminal
- Although the cat and less commands allow us to view the content of the whole file, most of the time we are searching for particular characters (strings) of interest rather than the full content of the file. One of the most commonly used command-line utilities for searching for strings is grep. Let's use this command to search for the string EUR in the gen360_1.tsv file.
- We can think of grep as an extremely powerful "search" command.
- Running grep NNNNNNNNNN SRR098026.fastq prints its output to the terminal, where it is lost as soon as we need to revise or reuse it. For the matches to be used in other operations, the output has to be "redirected" (captured and written into a file). The operator for redirecting output to a file is >. For example, the lines matching EUR found by grep can be redirected to a file named eur.txt.
- In other words, > operates as a Save command.
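The search-then-save pattern described above can be sketched as follows. The stand-in file `demo.tsv` and its rows are hypothetical (the real gen360_1.tsv may be laid out differently); only the string EUR and the output name eur.txt come from the text.

```shell
# Hypothetical three-row table: two rows mention EUR, one does not.
workdir=$(mktemp -d)
printf 'S1\tEUR\nS2\tAFR\nS3\tEUR\n' > "${workdir}/demo.tsv"

# Search only: matching lines are printed to the terminal.
grep EUR "${workdir}/demo.tsv"

# Search and save: > writes the matching lines into eur.txt instead.
grep EUR "${workdir}/demo.tsv" > "${workdir}/eur.txt"

# Confirm what was captured.
cat "${workdir}/eur.txt"
```

Note that > replaces the content of the target file if it already exists, so reusing the same output name overwrites the earlier result.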
Loops
Loops are a common concept in most programming languages which allow us to execute commands repeatedly with ease. Using loops also reduces the amount of typing (and typing mistakes). Loops are helpful when performing operations on groups of files.
There are three basic loop constructs in bash scripting: for, while and until.
Let's take a quick look at the for loop.
The shell-data/untrimmed_fastq directory has two .fastq files.
- Let's say we want to take a look at the top four lines of both files.
- We can use the head command with the -n 4 option and run it on the two files separately.
- This is not much of an issue with one or two files, but it becomes inconvenient when we have to deal with tens, hundreds or thousands. We can use a for loop to run this repetitive task by using a "common" factor in the filenames (or other attributes), i.e.
    - Identify and isolate the files by using a common factor. In this instance, we will use the .fastq file extension.
    - Then assign the values (filenames) of those files to what is known as a control variable.
    - Apply the command to the control variable, which is holding the values (in this instance, filenames).
- Constructing the for loop
    - Always starts with for (when the shell sees the keyword for, it knows to repeat a command, or group of commands, once for each item in a list).
    - The control variable holding the filenames will be filename (this can be anything we want).
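Putting these pieces together, a for loop over the .fastq files might look like the sketch below. To keep it runnable anywhere, it first writes two small stand-in files into a hypothetical scratch directory, `workdir`. Each stand-in holds only the first record of the corresponding file, copied from the output shown in this section. In the workshop you would instead run the loop from inside shell-data/untrimmed_fastq.

```shell
# Stand-in files holding only the first four lines of each FASTQ file.
workdir=$(mktemp -d)
cat > "${workdir}/SRR097977.fastq" <<'EOF'
@SRR097977.1 209DTAAXX_Lenski2_1_7:8:3:710:178 length=36
TATTCTGCCATAATGAAATTCGCCACTTGTTAGTGT
+SRR097977.1 209DTAAXX_Lenski2_1_7:8:3:710:178 length=36
CCCCCCCCCCCCCCC>CCCCC7CCCCCCACA?5A5<
EOF
cat > "${workdir}/SRR098026.fastq" <<'EOF'
@SRR098026.1 HWUSI-EAS1599_1:2:1:0:968 length=35
NNNNNNNNNNNNNNNNCNNNNNNNNNNNNNNNNNN
+SRR098026.1 HWUSI-EAS1599_1:2:1:0:968 length=35
!!!!!!!!!!!!!!!!#!!!!!!!!!!!!!!!!!!
EOF

# Without a loop we would need one command per file:
#   head -n 4 SRR097977.fastq
#   head -n 4 SRR098026.fastq

# With a loop: the pattern *.fastq selects the files, and the control
# variable filename holds one file name per pass through the loop.
for filename in "${workdir}"/*.fastq
do
    head -n 4 "${filename}"
done
```

The body between do and done runs once per matching file, so the same loop works unchanged whether the directory holds two files or two thousand.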
output
@SRR097977.1 209DTAAXX_Lenski2_1_7:8:3:710:178 length=36
TATTCTGCCATAATGAAATTCGCCACTTGTTAGTGT
+SRR097977.1 209DTAAXX_Lenski2_1_7:8:3:710:178 length=36
CCCCCCCCCCCCCCC>CCCCC7CCCCCCACA?5A5<
@SRR098026.1 HWUSI-EAS1599_1:2:1:0:968 length=35
NNNNNNNNNNNNNNNNCNNNNNNNNNNNNNNNNNN
+SRR098026.1 HWUSI-EAS1599_1:2:1:0:968 length=35
!!!!!!!!!!!!!!!!#!!!!!!!!!!!!!!!!!!