Intermediate Shell for Bioinformatics¶

Lesson | Overview |
---|---|
1. UNIX, Linux & UNIX Shell | Introduction to the UNIX operating system, Linux and the UNIX shell |
2. UNIX Shell Basics & Recap | Navigating Files & Directories and a review of commands used in routine tasks |
3. Download and verify data | Downloading data with wget / curl and checking the transferred data’s integrity with checksums |
4. Streams, Redirection and Pipe | Combining pipes and redirection, using exit statuses |
5. Inspecting and Manipulating Text Data with UNIX Tools - Part 1 | Inspecting files with utilities such as head and less. Extracting and formatting tabular data. Magical grep. |
6. Inspecting and Manipulating Text Data with UNIX Tools - Part 2 | Substituting matching patterns with sed. Text processing with awk and bioawk |
7. Automating File-Processing with find and xargs | Search files by pattern with find and use xargs to execute a command for those objects matching the pattern |
8. Puzzles 🧩 | Can you use shell scripts to solve these "real life" challenges in molecular biology? |
9. Supplementary_1 | Escaping, Special Characters |
Attribution Notice¶
- This workshop material is heavily inspired by:
  - Buffalo, V. (2015). Bioinformatics Data Skills. O'Reilly Media, Inc.
  - The Carpentries. The Unix Shell. https://swcarpentry.github.io/shell-novice/
  - The Carpentries. Introduction to Command Line for Genomics. https://datacarpentry.org/shell-genomics/
  - Rosalind Project. https://rosalind.info/about/
License¶
Genomics Aotearoa / New Zealand eScience Infrastructure "Intermediate-Advanced Shell for Bioinformatics" is licensed under the GNU General Public License v3.0, 29 June 2007.
Setup¶
If possible, we recommend using the Remote option over Local (especially for Windows hosts), as it eliminates the need to install any additional applications.
- The Remote option requires an existing NeSI account
Remote¶
Log into NeSI Mahuika Jupyter Service
- Follow https://jupyter.nesi.org.nz/hub/login
Enter NeSI username, HPC password and 6-digit second factor token
Choose server options as below
- Make sure to choose the correct project code (nesi02659), number of CPUs (CPUs=4) and memory (8 GB) prior to pressing the button.
Local¶
Local host setup - Windows, MacOS & Linux
- Windows: install either
  - Git for Windows from https://git-scm.com/download/win OR
  - MobaXterm Home (Portable or Installer edition) from https://mobaxterm.mobatek.net/download-home-edition.html
    - Portable edition does not require administrative privileges
- MacOS: native terminal client is sufficient.
  - It might not come with wget for downloading data via the command line (can be installed with $ brew install wget)
  - However, wget is not required as we provide a direct link to download the data in .zip format
- Linux: native terminal client is sufficient.
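Before the workshop, it can be useful to confirm which download tool a local terminal actually provides. The following is a minimal sketch, assuming a bash-compatible shell; the helper names have_cmd and pick_downloader are hypothetical and not part of the workshop material.

```shell
#!/usr/bin/env bash
# Return success if the named command is available on PATH.
# (have_cmd is a hypothetical helper name used only for this sketch.)
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Print the first available download tool: wget is preferred,
# curl is the fallback, and "none" means neither was found.
pick_downloader() {
    if have_cmd wget; then
        echo "wget"
    elif have_cmd curl; then
        echo "curl"
    else
        echo "none"
    fi
}

echo "Download tool found: $(pick_downloader)"
```

If the result is "none", the data can still be obtained through the direct .zip download link mentioned above.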
bioawk install on all hosts
One of the tools used in this workshop is bioawk, which is not a native Linux/UNIX utility. It can be installed on MacOS with $ brew install bioawk and on Linux with $ sudo apt install bioawk. Windows hosts might have to install it via conda according to these instructions. However, this will require a prior install of Anaconda or Miniconda.
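To check whether bioawk is already present and, if not, which of the install routes described above applies, something like the following sketch may help. It assumes a bash-compatible shell and that the Linux host is Debian/Ubuntu-based (so that apt is available); the function name bioawk_install_hint is hypothetical.

```shell
#!/usr/bin/env bash
# Map a kernel name (as printed by `uname -s`) to the install route
# described in the text. bioawk_install_hint is a hypothetical name.
# Assumption: Linux hosts use apt (Debian/Ubuntu family).
bioawk_install_hint() {
    case "$1" in
        Darwin) echo "brew install bioawk" ;;
        Linux)  echo "sudo apt install bioawk" ;;
        *)      echo "install via conda (requires Anaconda or Miniconda)" ;;
    esac
}

# Report where bioawk lives, or suggest how to install it.
if command -v bioawk >/dev/null 2>&1; then
    echo "bioawk found at: $(command -v bioawk)"
else
    echo "bioawk not found; suggested route: $(bioawk_install_hint "$(uname -s)")"
fi
```

The script only prints a suggestion and does not install anything, so it is safe to run on any host.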