Getting started with Nextflow
Objectives
- Describe the core features of Nextflow.
- Define Nextflow terminology.
- Use the fundamental commands and options for executing pipelines.
What is Nextflow?
Nextflow is a workflow orchestration engine that makes it easy to write data-intensive computational pipelines.
It is designed around the idea that the Linux platform is the lingua franca of data science. Linux provides many simple but powerful command-line and scripting tools that, when chained together, facilitate complex data manipulations.
Nextflow extends this approach, adding the ability to define complex program interactions and a high-level parallel computational environment based on the dataflow programming model.
Nextflow’s core features are:
- Pipeline portability and reproducibility
- Scalability of parallelization and deployment
- Integration of existing tools, systems, and industry standards
Whether you are working with genomics data or other large and complex data sets, Nextflow can help you to streamline your pipeline and improve your productivity.
Processes and dataflow
In Nextflow, workflows, processes, and dataflow logic are the fundamental building blocks of a pipeline.
A workflow is a specialized function for composing processes and dataflow logic. Workflows connect process inputs and outputs through dataflow logic, defining how data moves through the pipeline. An entry workflow serves as the pipeline’s entry point, while named workflows can be reused and called by other workflows, enabling modular pipeline design.
A process is a unit of execution that represents a single computational step in a pipeline. Each process specifies its inputs and outputs, as well as any directives and conditional statements required for its execution. Processes can be written in any scripting language that can be executed by the Linux platform, such as Bash, Python, Perl, Ruby, or R.
At runtime, each process is invoked as one or more tasks that execute independently. They do not share writable state. Each task consumes an input value, runs the process script, and emits an output value that downstream processes can use. Tasks can run in parallel, making efficient use of available compute resources.
Processes can be parameterised to allow for flexibility and reuse within and across pipelines. Pipeline-level parameters (params) can be passed into processes at runtime to control behaviour, such as specifying input files, output paths, or tool-specific settings.
Dataflow logic defines how data flows between processes through two types of asynchronous dataflow structures:
- A dataflow channel (or simply channel) is an asynchronous sequence of values used to pass data between processes.
- A dataflow value is a single asynchronous value, typically used for inputs shared across all tasks (e.g., a reference genome).
The data dependencies between processes implicitly determine the order of execution, meaning processes run based on their input-output relationships rather than the order they appear in the pipeline script.
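To make these building blocks concrete, here is a minimal sketch that writes a tiny pipeline to a file. The file name hello_sketch.nf, the process name sayHello, and the greetings are illustrative choices (loosely modelled on the hello pipeline used later in this section), not part of any official pipeline.

```shell
# Write a tiny illustrative pipeline to hello_sketch.nf (hypothetical file name).
cat > hello_sketch.nf <<'EOF'
// A process: one computational step with declared inputs and outputs.
process sayHello {
    input:
    val greeting

    output:
    stdout

    script:
    """
    echo '${greeting} world!'
    """
}

// The entry workflow: dataflow logic connecting a channel to the process.
workflow {
    // Each value in the channel becomes an independent task.
    channel.of('Hello', 'Hola', 'Ciao', 'Bonjour') | sayHello | view
}
EOF

# Confirm the file was written by counting the matching process line.
grep -c 'process sayHello' hello_sketch.nf
```

If Nextflow is installed, the sketch could be launched with nextflow run hello_sketch.nf. One task per greeting would be expected, and because tasks run independently the order of the printed lines is not guaranteed.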
Execution abstraction
While a process defines what command or script is executed, the executor determines how and where the script is executed.
Nextflow provides an abstraction between the pipeline’s functional logic and the underlying execution system. This means a pipeline can be written once and run on your local machine, an HPC cluster, or a cloud platform without any modification. Only the target executor needs to be defined in the configuration file.
By default, Nextflow executes processes on the local machine, which is useful for development and testing. For production workloads, Nextflow supports major HPC batch schedulers (e.g., SLURM, PBS, Open Grid Engine) and cloud platforms (e.g., AWS, Google Cloud, Azure, Kubernetes).
See Executors for a full list of Nextflow executors.
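As a hedged illustration of switching executors through configuration alone, the sketch below writes a small configuration file. The file name slurm_sketch.config and the queue name my_partition are hypothetical; process.executor and process.queue are standard Nextflow configuration settings.

```shell
# Write an illustrative config file (hypothetical file name).
cat > slurm_sketch.config <<'EOF'
// Run every process through the SLURM scheduler instead of the local machine.
process.executor = 'slurm'
// Hypothetical queue (partition) name; replace with one that exists on your cluster.
process.queue = 'my_partition'
EOF

# Show what was written.
cat slurm_sketch.config
```

The same pipeline could then be launched with the -c option described in the next section, for example nextflow run <pipeline> -c slurm_sketch.config, leaving the pipeline script itself untouched.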
Installing Nextflow
Nextflow is a Groovy-based workflow language (Groovy is a superset of Java) that runs on any POSIX-compatible system (Linux, macOS, WSL on Windows). It requires Bash 3.2 or later and Java 17 or later, and is distributed as a self-installing package. No special installation procedure is required.
Process scripts can be written in any Linux-compatible language (Bash, Python, Perl, Ruby, R, etc.), so you can reuse existing programming knowledge without a steep learning curve.
For today's workshop, Nextflow is already installed on the system we will be using, so no additional steps are needed.
How to install Nextflow locally
- Download the executable package using either of:
  wget -qO- https://get.nextflow.io | bash
  curl -s https://get.nextflow.io | bash
- Make the binary executable on your system by running:
  chmod +x nextflow
- Move the nextflow file to a directory accessible by your $PATH variable, e.g.:
  mv nextflow ~/bin/
How to load Nextflow on Mahuika
- Check available Nextflow versions:
  module avail Nextflow
- Load the Nextflow version of your choice:
  module load Nextflow/<version>
Nextflow options and commands
The Nextflow CLI is structured as a set of top-level options and commands.
List them with the -h flag:
Usage: nextflow [options] COMMAND [arg...]
Options:
-C
Use the specified configuration file(s) overriding any defaults
-D
Set JVM properties
-bg
Execute nextflow in background
-c, -config
Add the specified file to configuration set
-config-ignore-includes
Disable the parsing of config includes
-h
Print this help
-log
Set nextflow log file path
-q, -quiet
Do not print information messages
-remote-debug
Enable JVM interactive remote debugging (experimental)
-syslog
Send logs to syslog server (eg. localhost:514)
-trace
Enable trace level logging for the specified package name - multiple packages can be provided separating them with a comma e.g. '-trace nextflow,io.seqera'
-v, -version
Print the program version
Commands:
auth Manage Seqera Platform authentication
clean Clean up project cache and work directories
clone Clone a project into a folder
config Print a project configuration
console Launch Nextflow interactive console
drop Delete the local copy of a project
fs Perform filesystem operations
help Print the usage help for a command
info Print project and system runtime information
inspect Inspect process settings in a pipeline project
kuberun Execute a workflow in a Kubernetes cluster (experimental)
launch Launch a workflow in Seqera Platform
lineage Explore workflows lineage metadata
lint Lint Nextflow scripts and config files
list List all downloaded projects
log Print executions log and runtime info
plugin Execute plugin-specific commands
pull Download or update a project
run Execute a pipeline project
secrets Manage pipeline secrets
self-update Update nextflow runtime to the latest available version
view View project script file(s)
Options for a command can also be viewed by appending the -help option to a Nextflow command.
For example, you can view options for the run command:
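A minimal sketch of both help forms, assuming nextflow is on your PATH. The command -v guard and the subcommand variable are additions for this example, so the snippet also exits cleanly where Nextflow is not installed. The listing that follows is what the second command prints.

```shell
# Any Nextflow command can be substituted here; "run" is the one discussed in the text.
subcommand="run"

# Guard added for this sketch: only call Nextflow when it is installed.
if command -v nextflow >/dev/null 2>&1; then
    # Top-level options and commands.
    nextflow -h
    # Options for a single command.
    nextflow "${subcommand}" -help
else
    echo "nextflow not found on PATH; skipping" >&2
fi
```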
Execute a pipeline project
Usage: run [options] Project name or repository url
Options:
-E
Exports all current system environment
Default: false
-ansi-log
Enable/disable ANSI console logging
-bucket-dir
Remote bucket where intermediate result files are stored
-cache
Enable/disable processes caching
-d, -deep
Create a shallow clone of the specified depth
-disable-jobs-cancellation
Prevent the cancellation of child jobs on execution termination
-dump-channels
Dump channels for debugging purpose
-dump-hashes
Dump task hash keys for debugging purpose
-e.
Add the specified variable to execution environment
Syntax: -e.key=value
Default: {}
-entry
Entry workflow name to be executed
-h, -help
Print the command usage
Default: false
-hub
Service hub where the project is hosted
-latest
Pull latest changes before run
Default: false
-lib
Library extension path
-main-script
The script file to be executed when launching a project directory or
repository
-name
Assign a mnemonic name to the a pipeline run
-offline
Do not check for remote project updates
Default: false
-o, -output-dir
Directory where workflow outputs are stored
-params-file
Load script parameters from a JSON/YAML file
-plugins
Specify the plugins to be applied for this run e.g. nf-amazon,nf-tower
-preview
Run the workflow script skipping the execution of all processes
Default: false
-process.
Set process options
Syntax: -process.key=value
Default: {}
-profile
Choose a configuration profile
-qs, -queue-size
Max number of processes that can be executed in parallel by each executor
-resume
Execute the script using the cached results, useful to continue
executions that was stopped by an error
-r, -revision
Revision of the project to run (either a git branch, tag or commit SHA
number)
-stub-run, -stub
Execute the workflow replacing process scripts with command stubs
Default: false
-test
Test a script function with the name specified
-user
Private repository user name
-with-apptainer
Enable process execution in a Apptainer container
-with-charliecloud
Enable process execution in a Charliecloud container runtime
-with-cloudcache
Enable the use of object storage bucket as storage for cache meta-data
-with-conda
Use the specified Conda environment package or file (must end with
.yml|.yaml suffix)
-with-dag
Create pipeline DAG file
-with-docker
Enable process execution in a Docker container
-N, -with-notification
Send a notification email on workflow completion to the specified
recipients
-with-podman
Enable process execution in a Podman container
-with-report
Create processes execution html report
-with-singularity
Enable process execution in a Singularity container
-with-spack
Use the specified Spack environment package or file (must end with .yaml
suffix)
-with-timeline
Create processes execution timeline file
-with-tower
Monitor workflow execution with Seqera Platform (formerly Tower Cloud)
-with-trace
Create processes execution tracing file
-with-weblog
Send workflow status messages via HTTP to target URL
-without-conda
Disable the use of Conda environments
-without-docker
Disable process execution with Docker
Default: false
-without-podman
Disable process execution in a Podman container
-without-spack
Disable the use of Spack environments
-w, -work-dir
Directory where intermediate result files are stored
Exercise
Use the -h option to find the option that prints the Nextflow version. Then use that option to find out which version of Nextflow you are using.
Managing your environment
You can use environment variables to control the Nextflow runtime and the underlying Java virtual machine. These variables can be exported before running a pipeline and will be interpreted by Nextflow.
For most users, Nextflow will work without setting any environment variables. However, to improve reproducibility and to optimise your resources, you will benefit from setting some of these variables.
For example, for consistency, it is good practice to pin the version of Nextflow you are using with the NXF_VER variable:
Exercise
Pin the version of Nextflow you are using to 25.04.4 by exporting an environment variable:
Solution
Export the Nextflow version using the NXF_VER environment variable:
Check that the NXF_VER has been applied:
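The two steps of this solution can be sketched as follows, assuming a Bash-like shell. The command -v guard is an addition for this example so that the snippet also runs where Nextflow is not installed.

```shell
# Pin the Nextflow version for this shell session and its child processes.
export NXF_VER=25.04.4

# Confirm the variable is set.
echo "NXF_VER is set to: ${NXF_VER}"

# Guard added for this sketch: only call Nextflow when it is installed.
if command -v nextflow >/dev/null 2>&1; then
    # Ask Nextflow to report the version it is running.
    nextflow -version
fi
```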
You should see Nextflow update to the pinned version and print a version banner reporting 25.04.4.
In addition to changing the version at the system level, you can set the Nextflow version for a single command:
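A minimal sketch of the single-command form, assuming a Bash-like shell. Prefixing a command with VAR=value sets the variable for that command only. The guard is again an addition for this example. The banner that follows shows the kind of output to expect.

```shell
# A session-wide pin, as in the previous exercise.
export NXF_VER=25.04.4

# Guard added for this sketch: only call Nextflow when it is installed.
if command -v nextflow >/dev/null 2>&1; then
    # The prefix overrides NXF_VER for this one command only.
    NXF_VER=24.10.5 nextflow -version
fi

# The session-wide value is unchanged afterwards.
echo "${NXF_VER}"
```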
N E X T F L O W
version 24.10.5 build 5935
created 04-03-2025 17:55 UTC (05-03-2025 06:55 NZDT)
cite doi:10.1038/nbt.3820
http://nextflow.io
Environment variables on Mahuika
Nextflow environment variables won't behave as expected if you are using a Mahuika Nextflow module. To use a different Nextflow version on Mahuika, reload the Nextflow module instead. To change to version 25.10.0 you could run:
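One plausible form of the module reload is sketched below. It assumes the module name Nextflow shown in the loading instructions earlier, and that unloading before loading is acceptable on your system. The guard is an addition for this example, since the module command only exists on systems that provide environment modules.

```shell
# Version to switch to, taken from the text above.
target_version="25.10.0"

# Guard added for this sketch: skip when the module command is unavailable.
if command -v module >/dev/null 2>&1; then
    # Drop the currently loaded Nextflow module, then load the requested one.
    module unload Nextflow
    module load "Nextflow/${target_version}"
else
    echo "module command not available; skipping" >&2
fi
```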
Similarly, if you are using a shared resource, consider setting the paths where software is stored and accessed using the NXF_APPTAINER_CACHEDIR or NXF_CONDA_CACHEDIR variables:
Exercise
Export the folder ~/.apptainer_cache as the folder where remote Apptainer images are stored:
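A minimal sketch of this export, assuming a Bash-like shell. Creating the folder first is an addition for this example, so that the path is known to exist.

```shell
# Folder named in the exercise; created up front so the path exists.
mkdir -p "${HOME}/.apptainer_cache"

# Tell Nextflow where remote Apptainer images should be stored.
export NXF_APPTAINER_CACHEDIR="${HOME}/.apptainer_cache"

# Confirm the value.
echo "${NXF_APPTAINER_CACHEDIR}"
```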
See Environment variables for a complete list of environment variables.
How to manage environment variables
You may want to include these, or other environment variables, in your .bashrc file (or the equivalent for your shell), which is loaded when you log in, so you don't need to export variables every session.
Executing a pipeline
Nextflow seamlessly integrates with code repositories such as GitHub. This feature allows you to manage your project code and use public Nextflow pipelines quickly, consistently, and transparently.
The Nextflow pull command will download a pipeline from a hosting platform into your global cache folder ($HOME/.nextflow/assets).
If you are pulling a project hosted in a remote code repository, you can specify its qualified name or the repository URL.
The qualified name is formed of two parts, the owner name and the repository name, separated by a / character. For example, if a Nextflow project is hosted in the GitHub repository bar under the owner foo, at the address http://github.com/foo/bar, it could be pulled using:
Or by using the complete URL:
Alternatively, the Nextflow clone command can be used to download a pipeline into a local directory of your choice:
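The pull and clone forms described above can be sketched as follows. Because foo/bar is only a placeholder, this example substitutes the nextflow-io/hello project used later in this section. The directory name hello_clone is hypothetical, and the guard is an addition so the snippet runs where Nextflow is not installed.

```shell
# Qualified name = owner + "/" + repository.
owner="nextflow-io"
repo="hello"
qualified_name="${owner}/${repo}"
repo_url="https://github.com/${qualified_name}"

echo "${qualified_name}"
echo "${repo_url}"

# Guard added for this sketch: only call Nextflow when it is installed.
if command -v nextflow >/dev/null 2>&1; then
    # Pull into the global cache by qualified name, or by complete URL.
    nextflow pull "${qualified_name}"
    nextflow pull "${repo_url}"
    # Clone into a local directory of your choice (hypothetical name).
    nextflow clone "${qualified_name}" hello_clone
fi
```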
The Nextflow run command is used to initiate the execution of a pipeline:
If you run a pipeline, it will look for a local file with the pipeline name you’ve specified. If that file does not exist, it will look for a public repository with the same name on GitHub (unless otherwise specified). If found, Nextflow will automatically pull the pipeline to your global cache and execute it.
Warning
Be aware of what is already in your current working directory where you launch your pipeline. If your current working directory contains Nextflow configuration files you may encounter unexpected results.
Exercise
Execute the hello pipeline directly from the nextflow-io GitHub repository.
Solution
Use the run command to execute the nextflow-io/hello pipeline:
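A minimal sketch of the command, assuming nextflow is on your PATH and GitHub is reachable. The guard is an addition for this example. The output that follows is what a successful run looks like.

```shell
# Pipeline to execute, given as a qualified name (owner/repository).
pipeline="nextflow-io/hello"

# Guard added for this sketch: only call Nextflow when it is installed.
if command -v nextflow >/dev/null 2>&1; then
    # Pulls the project into the global cache if needed, then runs it.
    nextflow run "${pipeline}"
fi
```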
N E X T F L O W ~ version 25.10.4
Pulling nextflow-io/hello ...
downloaded from https://github.com/nextflow-io/hello.git
Launching `https://github.com/nextflow-io/hello` [silly_sax] DSL2 - revision: 1d71f857bb [master]
executor > local (4)
[e6/2132d2] process > sayHello (3) [100%] 4 of 4 ✔
Hola world!
Bonjour world!
Ciao world!
Hello world!
See run for more information about the Nextflow run command.
Understanding console outputs
When you run a Nextflow pipeline, a series of messages are printed to the terminal. The typical Nextflow output structure is:
- Runtime header:
  N E X T F L O W ~ version 25.10.4
- Pipeline retrieval, if the pipeline was fetched from a remote repository:
  Pulling nextflow-io/hello ...
- Launch summary:
  Launching `https://github.com/nextflow-io/hello` [silly_sax] DSL2 - revision: 1d71f857bb [master]
- Executor summary:
  executor > local (4)
- Process execution table:
  [e6/2132d2] process > sayHello (3) [100%] 4 of 4 ✔
- Process outputs:
  Hola world!
This output summarises a lot of key information about a Nextflow run, including what and how many processes were run.
Executing a revision
When a Nextflow pipeline is created or updated using GitHub (or another code repository), a new revision is created. Each revision is identified by a unique git reference (branch, tag, or commit SHA), which can be used to track changes made to the pipeline and to ensure that the same version of the pipeline is used consistently across different runs.
The Nextflow info command can be used to view pipeline properties, such as the project name, repository, local path, main script, and revisions. The * indicates which revision of the pipeline is pinned and will be executed when using the run command.
It is recommended that you use the revision flag every time you execute a pipeline to ensure that the version is correct.
To use a specific revision, you simply need to add it to the command line with the -revision or -r flag. For example, to run a pipeline with the v1.0 revision, you would use the following:
Nextflow has built-in support for version control using Git, so you can easily manage and track changes made to a pipeline over time. A revision can be a git branch, tag, or commit SHA, and these can be used interchangeably.
Exercise
Execute the hello pipeline directly from the nextflow-io GitHub repository using the v1.1 revision tag.
Solution
Use the nextflow run command to execute the nextflow-io/hello pipeline with the v1.1 revision tag:
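A minimal sketch of the command, assuming nextflow is on your PATH. The guard is an addition for this example. On recent Nextflow versions this run is expected to stop with the DSL1 message shown next.

```shell
pipeline="nextflow-io/hello"
revision="v1.1"

# Guard added for this sketch: only call Nextflow when it is installed.
if command -v nextflow >/dev/null 2>&1; then
    # -r selects the git branch, tag, or commit SHA to run.
    nextflow run "${pipeline}" -r "${revision}"
fi
```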
N E X T F L O W ~ version 25.10.4
NOTE: Your local project version looks outdated - a different revision is available in the remote repository [3b355db864]
Nextflow DSL1 is no longer supported — Update your script to DSL2, or use Nextflow 22.10.x or earlier
This failure was expected!
As the error message reads, v1.1 of the hello pipeline was built on an older version of Nextflow.
But we can still run it by telling Nextflow what version to use.
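One way to do this is to combine the -r flag with a per-command NXF_VER override. The sketch below uses version 22.10.0, matching the output that follows. It assumes that version can be downloaded and that the installed Java is compatible with it. The guard is an addition for this example.

```shell
pipeline="nextflow-io/hello"
revision="v1.1"
older_version="22.10.0"

# Guard added for this sketch: only call Nextflow when it is installed.
if command -v nextflow >/dev/null 2>&1; then
    # Run the old revision with an older Nextflow, for this command only.
    NXF_VER="${older_version}" nextflow run "${pipeline}" -r "${revision}"
fi
```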
Nextflow 25.10.4 is available - Please consider updating your version to it
N E X T F L O W ~ version 22.10.0
NOTE: Your local project version looks outdated - a different revision is available in the remote repository [3b355db864]
Launching `https://github.com/nextflow-io/hello` [nice_poitras] DSL1 - revision: baba3959d7 [v1.1]
WARN: The use of `echo` method has been deprecated
executor > local (4)
[16/d99458] process > sayHello (3) [100%] 4 of 4 ✔
Ciao world! (version 1.1)
Bojour world! (version 1.1)
Hola world! (version 1.1)
Hello world! (version 1.1)
Nextflow log
It is important to keep a record of the commands you have run to generate your results. Nextflow helps with this by creating and storing metadata and logs about the run in hidden files and folders in your current directory (unless otherwise specified). This data can be used by Nextflow to generate reports. It can also be queried using the Nextflow log command:
The log command has multiple options to facilitate queries and is especially useful when debugging a pipeline and inspecting execution metadata. You can view all of the possible log options with the -h flag:
To query a specific execution you can use the RUN NAME or a SESSION ID:
To get more information, you can use the -f option with named fields. For example:
There are many other fields you can query. You can view a full list of fields with the -l option:
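The log queries described above can be sketched as follows. The run name nice_poitras is a placeholder taken from the example output in this section, so substitute a RUN NAME or SESSION ID from your own listing. The guard is an addition for this example.

```shell
# Placeholder run name; replace with one from your own `nextflow log` output.
run_name="nice_poitras"

# Guard added for this sketch: only call Nextflow when it is installed.
if command -v nextflow >/dev/null 2>&1; then
    # List previous executions launched from the current directory.
    nextflow log
    # Show the options of the log command.
    nextflow log -h
    # Query a single execution by run name.
    nextflow log "${run_name}"
    # Print selected fields for each task of that execution.
    nextflow log "${run_name}" -f process,hash,script
    # List every field that can be queried.
    nextflow log -l
fi
```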
Exercise
Use the log command to view the process, hash, and script fields for your tasks from your most recent Nextflow execution.
Solution
Use the log command to get a list of your recent executions:
TIMESTAMP DURATION RUN NAME STATUS REVISION ID SESSION ID COMMAND
2026-03-13 14:50:00 2.8s nice_poitras OK baba3959d7 cc3baf9c-4f20-4edf-bcd4-8b9704a39878 nextflow run nextflow-io/hello -r v1.1
Query the process, hash, and script using the -f option for the most recent run:
Execution cache and resume
Task execution caching is an essential feature of modern pipeline managers, and Nextflow provides an automated caching mechanism for every execution.
When using the Nextflow -resume option, successfully completed tasks from previous executions are skipped and the previously cached results are used in downstream tasks.
Nextflow's caching mechanism works by assigning a unique ID to each task. The ID is generated as a 128-bit hash value computed from the task's inputs; for input files this includes the complete file path, file size, and last modified timestamp. These IDs are used to create a separate execution directory for each task, where the task is executed and its outputs are stored. Nextflow manages the inputs and outputs in these directories for you.
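To give a feel for why a changed input invalidates a cached task, the sketch below fingerprints a file from its path, size, and modification time. This is not Nextflow's implementation; it only mimics the idea, using MD5 because it also produces a 128-bit value. The file name sample_sketch.txt and the fingerprint function are hypothetical, and the commands assume GNU coreutils (realpath, stat -c, md5sum), as found on most Linux systems.

```shell
# Create a small input file (hypothetical name) to fingerprint.
printf 'ACGT\n' > sample_sketch.txt

# Fingerprint = MD5 of "absolute path, size in bytes, modification time".
fingerprint() {
    printf '%s %s' "$(realpath "$1")" "$(stat -c '%s %Y' "$1")" | md5sum | cut -d' ' -f1
}

hash_before=$(fingerprint sample_sketch.txt)

# Changing the contents changes the file size, so the fingerprint changes too.
printf 'ACGTACGT\n' > sample_sketch.txt
hash_after=$(fingerprint sample_sketch.txt)

echo "before: ${hash_before}"
echo "after:  ${hash_after}"
```

Because the two fingerprints differ, a cache keyed on them would treat the modified file as a new input and rerun the task rather than reuse the stored result.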
A multi-step pipeline is required to demonstrate cache and resume.
These concepts will be demonstrated using the nf-core demo pipeline as a part of the Getting started with nf-core section.
Listing and dropping cached pipelines
Over time, you might want to remove stored pipelines. Nextflow also has functionality to help you to view and remove pipelines that have been pulled locally.
The Nextflow list command prints the projects stored in your global cache folder ($HOME/.nextflow/assets). These are the pipelines that were pulled when you executed either of the Nextflow pull or run commands:
If you want to remove a pipeline from your cache you can remove it using the Nextflow drop command:
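A minimal sketch of both commands, assuming nextflow is on your PATH and that nextflow-io/hello was pulled earlier in this section. The guard is an addition for this example.

```shell
pipeline="nextflow-io/hello"

# Guard added for this sketch: only call Nextflow when it is installed.
if command -v nextflow >/dev/null 2>&1; then
    # Show the projects stored in the global cache.
    nextflow list
    # Remove the local copy of one project from the cache.
    nextflow drop "${pipeline}"
fi
```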
Exercise
View your cached pipelines with the Nextflow list command and remove the nextflow-io/hello pipeline with the drop command.
Key points
- Nextflow is a workflow orchestration engine that makes it easy to write data-intensive computational pipelines
- Environment variables can be used to control your Nextflow runtime and the underlying Java virtual machine
- Nextflow supports version control and has automatic integrations with online code repositories
- Nextflow will cache your runs and they can be resumed with the -resume option
- You can manage pipelines with Nextflow commands (e.g., pull, clone, list, and drop)

