1.3 - Manipulating files in the shell
Time
- Teaching: 20 minutes
- Exercises: 20 minutes
Learning objectives
- View the contents of basic text files.
- Copy, move, and rename files and create/remove directories.
- Make a file read only.
- Use the history command to view and repeat recently used commands.
Key points
- You can view file contents using less, cat, head, or tail.
- The commands cp, mv, and mkdir are useful for manipulating existing files and creating new directories.
- The history command and the up arrow on your keyboard can be used to repeat recently used commands.
Viewing the contents of files
From the previous exercises we know how to move around the file system of NeSI, but how do we look at the contents of files? One way to examine a file is to print out all of its contents using the program cat.
To look at a text file, navigate to the shell_data/ directory in your training folder and try to run the following command:
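The command itself is missing from this copy of the page. Judging from the file name in the head and tail output further down, it was most likely cat run on the small FASTQ file SRR098026.fastq; the untrimmed_fastq/ location in the comment below is an assumption about how shell_data/ is laid out. So that the example runs anywhere, this sketch writes a made-up two-read file, demo.fastq (hypothetical), into a scratch directory and prints it with cat.

```shell
# With the training data this would be something like (path assumed):
#   cat untrimmed_fastq/SRR098026.fastq

# Work in a throwaway directory so nothing in shell_data/ is touched
cd "$(mktemp -d)"

# Write a tiny, made-up FASTQ file with two reads (hypothetical demo data)
printf '%s\n' \
  '@read1 length=8' 'GATTACAG' '+read1 length=8' 'IIIIIIII' \
  '@read2 length=8' 'CCGGTTAA' '+read2 length=8' 'IIIIIIII' > demo.fastq

# Print every line of the file to the screen in one go
cat demo.fastq
```

All eight lines appear at once; with a real sequencing file that would be thousands of lines scrolling past.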
That wasn't very helpful. It printed the full file content to the screen. It's a very small file, as bioinformatic files go, but still far too much to scan by eye. For smaller files, cat is a terrific tool, but when the file is really big it can be annoying to use.
Fortunately, there is another handy tool to read large files in a more manageable way. The command less can be used to open a file and navigate through it line by line. Enter the following command:
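Again the command was lost here. With the training data it was probably less run on one of the FASTQ files; the file name in the comment below (SRR097977.fastq) is an assumption. The runnable part of this sketch writes a made-up demo.fastq (hypothetical) in a scratch directory and opens it with less; once it is open, use the keys in the table below and press q to quit.

```shell
# With the training data this would be something like (file name assumed):
#   less untrimmed_fastq/SRR097977.fastq

# Throwaway directory and a made-up two-read FASTQ file (hypothetical demo data)
cd "$(mktemp -d)"
printf '%s\n' \
  '@read1 length=8' 'GATTACAG' '+read1 length=8' 'IIIIIIII' \
  '@read2 length=8' 'CCGGTTAA' '+read2 length=8' 'IIIIIIII' > demo.fastq

# Open the file one screenful at a time; press q to return to the prompt
less demo.fastq
```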
This loads the content of the file into your terminal, but rather than printing every line at once, it only shows as many lines as fit on one screen. Because less runs entirely inside the terminal, we generally navigate it with the keyboard. Some of the commonly used navigation keys are:
Key | Action |
---|---|
↓ | Go forward one line |
↑ | Go back one line |
Space | Go forward one page |
b | Go back one page |
g | Return to the beginning of the file |
G | Jump to the end of the file |
q | Quit |
That said, you can also scroll forward and back through the file using the mouse scroll wheel.
less can also be used to search through files. Press the / key to begin a search, type the word you would like to search for, and press Enter. The screen will jump to the next location where that word is found. Press n to jump to the following match; each time, less searches forward from the current location. To go back to the previous match, press N, which repeats the search in the opposite direction. To start a new search backward instead of forward, begin it with ? rather than /.
Exercise
As an example, let's search forward for the sequence TTTTT in our file. What are the next three nucleotides (characters) after the first instance of this sequence?
Solution
CAC
Sometimes we want to strike a balance between cat and less: we need to see a bit of the file, but we don't want to page through it line by line. This is usually the case when we need to process a file in some way and just need to remind ourselves how it's formatted.
The commands head and tail are designed for this task. They let you look at the beginning and end of a file, respectively.
Output
@SRR098026.1 HWUSI-EAS1599_1:2:1:0:968 length=35
NNNNNNNNNNNNNNNNCNNNNNNNNNNNNNNNNNN
+SRR098026.1 HWUSI-EAS1599_1:2:1:0:968 length=35
!!!!!!!!!!!!!!!!#!!!!!!!!!!!!!!!!!!
@SRR098026.2 HWUSI-EAS1599_1:2:1:0:312 length=35
NNNNNNNNNNNNNNNNANNNNNNNNNNNNNNNNNN
+SRR098026.2 HWUSI-EAS1599_1:2:1:0:312 length=35
!!!!!!!!!!!!!!!!#!!!!!!!!!!!!!!!!!!
@SRR098026.3 HWUSI-EAS1599_1:2:1:0:570 length=35
NNNNNNNNNNNNNNNNANNNNNNNNNNNNNNNNNN
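Likewise, the tail command behind the next output is missing; it was probably tail on the same SRR098026.fastq file (path assumed in the comment). This sketch writes a hypothetical file, numbers.txt, holding the numbers 1 to 30, so the lines tail keeps are easy to check.

```shell
# With the training data this would be something like (path assumed):
#   tail untrimmed_fastq/SRR098026.fastq

# Throwaway directory and a 30-line demo file (hypothetical)
cd "$(mktemp -d)"
seq 1 30 > numbers.txt

# Show the last 10 lines: the numbers 21 through 30
tail numbers.txt
```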
Output
+SRR098026.247 HWUSI-EAS1599_1:2:1:2:1311 length=35
#!##!#################!!!!!!!######
@SRR098026.248 HWUSI-EAS1599_1:2:1:2:118 length=35
GNTGNGGTCATCATACGCGCCCNNNNNNNGGCATG
+SRR098026.248 HWUSI-EAS1599_1:2:1:2:118 length=35
B!;?!A=5922:##########!!!!!!!######
@SRR098026.249 HWUSI-EAS1599_1:2:1:2:1057 length=35
CNCTNTATGCGTACGGCAGTGANNNNNNNGGAGAT
+SRR098026.249 HWUSI-EAS1599_1:2:1:2:1057 length=35
A!@B!BBB@ABAB#########!!!!!!!######
By default, head and tail print the first or last 10 lines. To change this, add the -n option followed by the number of lines you want.
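For example, -n 1 keeps just the first line, and since each read in the FASTQ output above takes four lines, tail -n 4 would show exactly the last read of such a file. The sketch uses a hypothetical 30-line numbers.txt so the result is easy to check; with the training data you would put the FASTQ file name in its place.

```shell
# Throwaway directory and a 30-line demo file (hypothetical)
cd "$(mktemp -d)"
seq 1 30 > numbers.txt

# Only the first line of the file
head -n 1 numbers.txt

# Only the last four lines (one FASTQ record's worth)
tail -n 4 numbers.txt
```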
Learning what these symbols all mean
If you want to learn more about the FASTQ file format and what these symbols mean, see the brief description document here.
Basic file manipulation
We now know how to read files on the command line. Before we worry about any complex bioinformatic work, though, we need to be able to move, copy, and delete files and directories; apart from navigating directories, these are the most basic things we do on the command line. You probably perform these operations daily on your desktop computer, and they all exist on the command line as well.
There are commands available on the command line for all of these operations. They typically have short names derived from the words they represent:
Command | Action |
---|---|
mkdir | Make a new directory. |
rmdir | Remove (delete) an empty directory. |
cp | Copy a file, either to a new location or into a new file. |
mv | Move a file from one location to another. Can also be used to rename files by 'moving' them to a file with a different name. |
rm | Remove a file. This is how you permanently delete files. |
Creating and removing directories
The mkdir ("make directory") command is used to make a new directory. Enter mkdir followed by a space, then the name of the directory you want to create:
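A minimal sketch; the directory name backup is only an illustration (hypothetical), and the commands run in a scratch directory created with mktemp -d so they do not touch your training folder.

```shell
# Throwaway working directory
cd "$(mktemp -d)"

# Create one new, empty directory called backup (illustrative name)
mkdir backup

# List the current directory to confirm it exists
ls
```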
If you want to create multiple directories at once, you can specify multiple names:
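For instance, with three hypothetical directory names, one mkdir call creates all of them:

```shell
# Throwaway working directory
cd "$(mktemp -d)"

# Create three directories in one command (names are illustrative)
mkdir results figures scripts

# All three now appear in the listing
ls
```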
As long as these directories are empty, they can be removed with the rmdir ("remove directory") command:
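A minimal sketch with hypothetical directory names: create two empty directories, then remove both with a single rmdir.

```shell
# Throwaway working directory with two empty directories (illustrative names)
cd "$(mktemp -d)"
mkdir results figures

# Remove both; this only works because neither contains anything
rmdir results figures

# The listing is now empty
ls
```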
If you try to remove a directory with files in it, you will receive an error and the directory will remain intact.
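To see this, the sketch below puts an empty file (created with touch) into a hypothetical directory called notes and then tries rmdir. The exact wording of the error differs between systems, so it is not reproduced here; the || echo part just prints a note when rmdir refuses.

```shell
# Throwaway working directory
cd "$(mktemp -d)"
mkdir notes

# touch creates an empty file, so notes/ is no longer empty
touch notes/todo.txt

# rmdir refuses with an error message; notes/ and its file stay in place
rmdir notes || echo "rmdir refused: notes/ is not empty, so nothing was removed"

# The file is still there
ls notes
```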
Copying and moving files
The cp ("copy") and mv ("move") commands work in much the same way. Each can be used in one of two ways: we can copy/move a file from one name to another, or we can copy/move files into a different directory without changing their names.
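The pair of commands described next is missing from this copy of the page. Based on the file names in the text, it was most likely the cp and mv lines in the sketch below. So the sketch runs outside the training folder, it first writes a one-read stand-in for SRR097977.fastq (made-up content) in a scratch directory; in shell_data/ you would run only the cp, mv, and ls lines, from the directory that holds the real file.

```shell
# Throwaway directory with a stand-in for the training file (made-up content)
cd "$(mktemp -d)"
printf '%s\n' '@read1' 'GATTACAG' '+' 'IIIIIIII' > SRR097977.fastq

# 1. Copy: create a second file with identical content
cp SRR097977.fastq SRR097977.fq_backup

# 2. Move/rename: the backup now lives under a different name
mv SRR097977.fq_backup SRR097977.fq_bkup

# The original is still here and the intermediate name is gone
ls
```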
In this pair of commands, we first create a copy of the file SRR097977.fastq with the name SRR097977.fq_backup. The two files have identical content.
We then move (rename) SRR097977.fq_backup to a different file named SRR097977.fq_bkup; in effect, the same content now lives under the new name.
If you run the ls command, you will see that SRR097977.fastq is still present, as it was only copied, and that SRR097977.fq_bkup now exists, but SRR097977.fq_backup no longer does.
Note
We can also use the mv command to rename directories. However, the cp command does not work on directories by default; it must be given a specific option to copy a directory and its contents.
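With GNU and BSD (macOS) cp, that option is -r (recursive), which copies a directory together with everything inside it; mv needs no extra option to rename a directory. The directory and file names in this sketch are hypothetical.

```shell
# Throwaway directory containing a small directory with one (empty) file
cd "$(mktemp -d)"
mkdir reads
touch reads/sample1.fastq

# Copy the directory and everything in it to a new directory name
cp -r reads reads_copy

# Rename a directory; mv needs no extra option for this
mv reads_copy reads_backup

# The copied file is inside the renamed directory
ls reads_backup
```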
The other way we can use these commands is to copy/move one or more files into a different directory. This can be handy when creating backups of data we do not wish to risk losing (copy), or when we want to organise data into different folders to make navigation easier (move).
In these cases, we either create duplicates of the target files in a new location, or move an existing file into a new location.
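When the last argument is an existing directory, cp and mv place the listed files inside it under their current names. A minimal sketch with hypothetical file and directory names:

```shell
# Throwaway directory with two empty files and two target directories
cd "$(mktemp -d)"
touch sample1.fastq sample2.fastq
mkdir backup raw

# Copy both files into backup/; the originals stay where they are
cp sample1.fastq sample2.fastq backup/

# Move both files into raw/; they disappear from the current directory
mv sample1.fastq sample2.fastq raw/

# Show the contents of both directories
ls backup raw
```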
Removing files
Removing a file is super easy. Just use the rm ("remove") command:
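A minimal sketch on a throwaway file (the name old_notes.txt is hypothetical), run in a scratch directory so nothing you care about is at risk:

```shell
# Throwaway directory and a file to delete
cd "$(mktemp -d)"
touch old_notes.txt

# Delete it permanently; there is no Recycle Bin to restore it from
rm old_notes.txt

# The listing no longer shows the file
ls
```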
Boom. It's gone. There is no Recycle Bin on the command line and there is no way to get that file back...
Note
We need to be really careful with the rm command and be very sure which files we are removing. To help avoid unwanted loss of data, the rmdir command only removes empty directories, and by default the rm command will not delete directories either. At this level, we will not expand upon this further.
Command history
We've done a lot on the command line, but how do we keep track of everything so far? What if we forget something we've done recently, and want to check exactly what we did?
You can view previous commands by pressing the up arrow on your keyboard to step back through your recent commands. Likewise, the down arrow takes you forward in the command history. If what you're looking for is only a few commands back, this is the quickest way to find it.
If you are looking for something from longer ago, you can list your most recent commands (typically around 1,000 of them; the exact number is set by the shell's HISTSIZE variable) with the history command.
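The example that stood here is missing. In an interactive bash session, history prints one numbered entry per line, and the bash builtin also accepts a count to limit the list; the numbers you see will be your own. These commands are meant to be typed at the prompt: in a non-interactive script, history recording is normally switched off, so the list there is usually empty.

```shell
# Show your saved command history, one numbered entry per line
history

# Show only the 5 most recent entries (bash's history builtin accepts a count)
history 5
```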
You can reuse one of these commands directly by referring to its number in the list. For example, to repeat the command numbered 1055, type an exclamation mark followed by that number (!1055) and press Enter.
This can be really useful when you are taking notes on a complicated set of commands you have run, or if you are trying to remember what you did last time you were logged into NeSI.