Getting started with Nextflow

Objectives

  • Describe the core features of Nextflow.
  • Define Nextflow terminology.
  • Use the fundamental commands and options for executing pipelines.

What is Nextflow?

Nextflow is a workflow orchestration engine that makes it easy to write data-intensive computational pipelines.

It is designed around the idea that the Linux platform is the lingua franca of data science. Linux provides many simple but powerful command-line and scripting tools that, when chained together, facilitate complex data manipulations.

Nextflow extends this approach, adding the ability to define complex program interactions and a high-level parallel computational environment based on the dataflow programming model.

Nextflow’s core features are:

  • Pipeline portability and reproducibility
  • Scalability of parallelization and deployment
  • Integration of existing tools, systems, and industry standards

Whether you are working with genomics data or other large and complex data sets, Nextflow can help you to streamline your pipeline and improve your productivity.

Processes and dataflow

In Nextflow, workflows, processes, and dataflow logic are the fundamental building blocks of a pipeline.

A workflow is a specialized function for composing processes and dataflow logic. Workflows connect process inputs and outputs through dataflow logic, defining how data moves through the pipeline. An entry workflow serves as the pipeline’s entry point, while named workflows can be reused and called by other workflows, enabling modular pipeline design.

A process is a unit of execution that represents a single computational step in a pipeline. Each process specifies its inputs and outputs, as well as any directives and conditional statements required for its execution. Processes can be written in any scripting language that can be executed by the Linux platform, such as Bash, Python, Perl, Ruby, or R.

At runtime, each process is invoked as one or more tasks that execute independently. They do not share writable state. Each task consumes an input value, runs the process script, and emits an output value that downstream processes can use. Tasks can run in parallel, making efficient use of available compute resources.

Processes can be parameterised to allow for flexibility and reuse within and across pipelines. Pipeline-level parameters (params) can be passed into processes at runtime to control behaviour, such as specifying input files, output paths, or tool-specific settings.

Dataflow logic defines how data flows between processes through two types of asynchronous dataflow structures:

  • A dataflow channel (or simply channel) is an asynchronous sequence of values used to pass data between processes.
  • A dataflow value is a single asynchronous value, typically used for inputs shared across all tasks (e.g., a reference genome).

The data dependencies between processes implicitly determine the order of execution, meaning processes run based on their input-output relationships rather than the order they appear in the pipeline script.
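
To make these terms concrete, here is a minimal sketch of a pipeline, written to disk from the shell. It is an illustration only and not part of the workshop material: the nf-demo folder, the SAY_HELLO process, and the target parameter are names invented for this example. The script is run only if nextflow is found on your PATH.

```shell
# Write a minimal, hypothetical pipeline to nf-demo/main.nf.
mkdir -p nf-demo
cat > nf-demo/main.nf <<'EOF'
// Pipeline-level parameter with a default value (hypothetical name)
params.target = 'world'

// A process: one computational step with declared inputs and outputs
process SAY_HELLO {
    input:
    val greeting   // one value per task, taken from a channel
    val target     // the same value for every task

    output:
    stdout

    script:
    """
    echo '${greeting} ${target}!'
    """
}

// The entry workflow connects the process to its data
workflow {
    greetings = channel.of('Hello', 'Hola', 'Ciao')   // dataflow channel
    target    = channel.value(params.target)          // dataflow value
    SAY_HELLO(greetings, target)
    SAY_HELLO.out.view()
}
EOF

# Run the sketch only when Nextflow is available.
if command -v nextflow >/dev/null 2>&1; then
    nextflow run nf-demo/main.nf
fi
```

Because target is a pipeline-level parameter, it could be overridden at runtime with nextflow run nf-demo/main.nf --target everyone. The greetings may print in any order, since each task runs independently.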

Execution abstraction

While a process defines what command or script is executed, the executor determines how and where the script is executed.

Nextflow provides an abstraction between the pipeline’s functional logic and the underlying execution system. This means a pipeline can be written once and run on your local machine, an HPC cluster, or a cloud platform without any modification. Only the target executor needs to be defined in the configuration file.

By default, Nextflow executes processes on the local machine, which is useful for development and testing. For production workloads, Nextflow supports major HPC batch schedulers (e.g., SLURM, PBS, Open Grid Engine) and cloud platforms (e.g., AWS, Google Cloud, Azure, Kubernetes).

See Executors for a full list of Nextflow executors.
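
As a rough sketch of what defining the executor in a configuration file can look like, the snippet below writes a small config with two profiles. The folder nf-demo, the profile names local_test and cluster, and the queue name my_partition are placeholders invented for this example; check the Executors documentation and your own system for the real values.

```shell
# Write a hypothetical config file with one profile per target system.
mkdir -p nf-demo
cat > nf-demo/nextflow.config <<'EOF'
profiles {
    // Default behaviour made explicit: run tasks on this machine
    local_test {
        process.executor = 'local'
    }
    // Submit each task as a SLURM job; the queue name is a placeholder
    cluster {
        process.executor = 'slurm'
        process.queue    = 'my_partition'
    }
}
EOF

# The pipeline code stays the same; only the selected profile changes:
#   nextflow -c nf-demo/nextflow.config run <pipeline> -profile local_test
#   nextflow -c nf-demo/nextflow.config run <pipeline> -profile cluster
echo "wrote nf-demo/nextflow.config"
```

Keeping the executor settings in profiles means the choice of target system is made on the command line, not in the pipeline script.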

Installing Nextflow

Nextflow is a Groovy-based workflow language (Groovy is a superset of Java) that runs on any POSIX-compatible system (Linux, macOS, WSL on Windows). It requires Bash 3.2 or later and Java 17 or later, and is distributed as a self-installing package. No special installation procedure is required.

Process scripts can be written in any Linux-compatible language (Bash, Python, Perl, Ruby, R, etc.), so you can reuse existing programming knowledge without a steep learning curve.

For today's workshop, Nextflow is already installed on the system we will be using, so no additional steps are needed.

How to install Nextflow locally

  1. Download the executable package using either wget -qO- https://get.nextflow.io | bash or curl -s https://get.nextflow.io | bash
  2. Make the binary executable on your system by running chmod +x nextflow
  3. Move the nextflow file to a directory accessible by your $PATH variable, e.g., mv nextflow ~/bin/
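
Step 3 only works if the target directory is on your PATH. The sketch below, which assumes ~/bin as in the example above, creates the directory if needed, adds it to PATH for the current session when it is missing, and then reports whether the shell can find nextflow.

```shell
# Assumption: ~/bin is the install directory, as in step 3.
install_dir="$HOME/bin"
mkdir -p "$install_dir"

# Add the directory to PATH for this session if it is not already there.
case ":$PATH:" in
    *":$install_dir:"*) echo "$install_dir is already on PATH" ;;
    *) export PATH="$install_dir:$PATH"; echo "added $install_dir to PATH" ;;
esac

# Report which nextflow executable, if any, the shell will now find.
command -v nextflow || echo "nextflow not found on PATH yet"
```

The PATH change lasts only for the current session; to keep it, the export line would need to go in your shell startup file.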

How to load Nextflow on Mahuika

  1. Check available Nextflow versions: module avail Nextflow
  2. Load Nextflow version of your choice: module load Nextflow/<version>

Nextflow options and commands

The Nextflow CLI is structured as a set of top-level options and commands.

List them with the -h flag:

nextflow -h
Output
Usage: nextflow [options] COMMAND [arg...]

Options:
  -C
     Use the specified configuration file(s) overriding any defaults
  -D
     Set JVM properties
  -bg
     Execute nextflow in background
  -c, -config
     Add the specified file to configuration set
  -config-ignore-includes
     Disable the parsing of config includes
  -h
     Print this help
  -log
     Set nextflow log file path
  -q, -quiet
     Do not print information messages
  -remote-debug
     Enable JVM interactive remote debugging (experimental)
  -syslog
     Send logs to syslog server (eg. localhost:514)
  -trace
     Enable trace level logging for the specified package name - multiple packages can be provided separating them with a comma e.g. '-trace nextflow,io.seqera'
  -v, -version
     Print the program version

Commands:
  auth          Manage Seqera Platform authentication
  clean         Clean up project cache and work directories
  clone         Clone a project into a folder
  config        Print a project configuration
  console       Launch Nextflow interactive console
  drop          Delete the local copy of a project
  fs            Perform filesystem operations
  help          Print the usage help for a command
  info          Print project and system runtime information
  inspect       Inspect process settings in a pipeline project
  kuberun       Execute a workflow in a Kubernetes cluster (experimental)
  launch        Launch a workflow in Seqera Platform
  lineage       Explore workflows lineage metadata
  lint          Lint Nextflow scripts and config files
  list          List all downloaded projects
  log           Print executions log and runtime info
  plugin        Execute plugin-specific commands
  pull          Download or update a project
  run           Execute a pipeline project
  secrets       Manage pipeline secrets
  self-update   Update nextflow runtime to the latest available version
  view          View project script file(s)

Options for a command can also be viewed by appending the -help option to a Nextflow command.

For example, you can view options for the run command:

nextflow run -help
Output
Execute a pipeline project
Usage: run [options] Project name or repository url
  Options:
    -E
       Exports all current system environment
       Default: false
    -ansi-log
       Enable/disable ANSI console logging
    -bucket-dir
       Remote bucket where intermediate result files are stored
    -cache
       Enable/disable processes caching
    -d, -deep
       Create a shallow clone of the specified depth
    -disable-jobs-cancellation
       Prevent the cancellation of child jobs on execution termination
    -dump-channels
       Dump channels for debugging purpose
    -dump-hashes
       Dump task hash keys for debugging purpose
    -e.
       Add the specified variable to execution environment
       Syntax: -e.key=value
       Default: {}
    -entry
       Entry workflow name to be executed
    -h, -help
       Print the command usage
       Default: false
    -hub
       Service hub where the project is hosted
    -latest
       Pull latest changes before run
       Default: false
    -lib
       Library extension path
    -main-script
       The script file to be executed when launching a project directory or
       repository
    -name
       Assign a mnemonic name to the a pipeline run
    -offline
       Do not check for remote project updates
       Default: false
    -o, -output-dir
       Directory where workflow outputs are stored
    -params-file
       Load script parameters from a JSON/YAML file
    -plugins
       Specify the plugins to be applied for this run e.g. nf-amazon,nf-tower
    -preview
       Run the workflow script skipping the execution of all processes
       Default: false
    -process.
       Set process options
       Syntax: -process.key=value
       Default: {}
    -profile
       Choose a configuration profile
    -qs, -queue-size
       Max number of processes that can be executed in parallel by each executor
    -resume
       Execute the script using the cached results, useful to continue
       executions that was stopped by an error
    -r, -revision
       Revision of the project to run (either a git branch, tag or commit SHA
       number)
    -stub-run, -stub
       Execute the workflow replacing process scripts with command stubs
       Default: false
    -test
       Test a script function with the name specified
    -user
       Private repository user name
    -with-apptainer
       Enable process execution in a Apptainer container
    -with-charliecloud
       Enable process execution in a Charliecloud container runtime
    -with-cloudcache
       Enable the use of object storage bucket as storage for cache meta-data
    -with-conda
       Use the specified Conda environment package or file (must end with
       .yml|.yaml suffix)
    -with-dag
       Create pipeline DAG file
    -with-docker
       Enable process execution in a Docker container
    -N, -with-notification
       Send a notification email on workflow completion to the specified
       recipients
    -with-podman
       Enable process execution in a Podman container
    -with-report
       Create processes execution html report
    -with-singularity
       Enable process execution in a Singularity container
    -with-spack
       Use the specified Spack environment package or file (must end with .yaml
       suffix)
    -with-timeline
       Create processes execution timeline file
    -with-tower
       Monitor workflow execution with Seqera Platform (formerly Tower Cloud)
    -with-trace
       Create processes execution tracing file
    -with-weblog
       Send workflow status messages via HTTP to target URL
    -without-conda
       Disable the use of Conda environments
    -without-docker
       Disable process execution with Docker
       Default: false
    -without-podman
       Disable process execution in a Podman container
    -without-spack
       Disable the use of Spack environments
    -w, -work-dir
       Directory where intermediate result files are stored

Exercise

Use the -h option to find the option that prints the version. Then, use that option to find out which version of Nextflow you are using.

Solution

Find out which version of Nextflow you are using by executing:

nextflow -version

Your output should look similar to the following:

Output
N E X T F L O W
version 25.10.4 build 11173
created 10-02-2026 15:17 UTC (11-02-2026 04:17 NZDT)
cite doi:10.1038/nbt.3820
http://nextflow.io

Managing your environment

You can use environment variables to control the Nextflow runtime and the underlying Java virtual machine. These variables can be exported before running a pipeline and will be interpreted by Nextflow.

For most users, Nextflow will work without setting any environment variables. However, to improve reproducibility and to optimise your resources, you will benefit from setting some of these variables.

For example, for consistency, it is good practice to pin the version of Nextflow you are using with the NXF_VER variable:

export NXF_VER=<version number>

Exercise

Pin the version of Nextflow you are using to 25.04.4 by exporting an environment variable:

Solution

Export the Nextflow version using the NXF_VER environment variable:

export NXF_VER=25.04.4

Check that the NXF_VER has been applied:

nextflow -version

You should see Nextflow update and print the following:

Output
N E X T F L O W
version 25.04.4 build 5957
created 01-04-2025 21:09 UTC (02-04-2025 09:09 NZDT)
cite doi:10.1038/nbt.3820
http://nextflow.io

In addition to exporting the variable for your whole shell session, you can set the Nextflow version for a single command:

NXF_VER=24.10.5 nextflow -version
Output
N E X T F L O W
version 24.10.5 build 5935
created 04-03-2025 17:55 UTC (05-03-2025 06:55 NZDT)
cite doi:10.1038/nbt.3820
http://nextflow.io

Environment variables on Mahuika

Nextflow environment variables won't behave as expected if you are using a Nextflow module on Mahuika. To use a different Nextflow version on Mahuika, you will need to reload the Nextflow module. For example, to change to version 25.10.0 you could run:

module purge
module load Nextflow/25.10.0

Similarly, if you are using a shared resource, consider setting the NXF_APPTAINER_CACHEDIR or NXF_CONDA_CACHEDIR variables to point to where software is stored and can be accessed:

export NXF_APPTAINER_CACHEDIR=<custom/path/to/apptainer/cache>

Exercise

Export the folder ~/.apptainer_cache as the folder where remote Apptainer images are stored:

Solution

Export the Apptainer cache using the NXF_APPTAINER_CACHEDIR environment variable:

export NXF_APPTAINER_CACHEDIR=~/.apptainer_cache

Check that the NXF_APPTAINER_CACHEDIR has been exported:

echo $NXF_APPTAINER_CACHEDIR

See Environment variables for a complete list of environment variables.

How to manage environment variables

You may want to add these, or other environment variables, to your .bashrc file (or the equivalent startup file for your shell), which is loaded when you log in, so you don’t need to export them every session.
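
One possible way to do this is sketched below. To keep the example safe to run, it writes to a temporary file that stands in for ~/.bashrc. The helper add_line is a name made up for this sketch, and the two variables are the ones used in the exercises above.

```shell
# Assumption: a scratch file stands in for ~/.bashrc so the sketch is safe to try.
rc_file="$(mktemp)"

# Append a line only if it is not already present, so re-running is harmless.
add_line() {
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

add_line "$rc_file" 'export NXF_VER=25.04.4'
add_line "$rc_file" 'export NXF_APPTAINER_CACHEDIR="$HOME/.apptainer_cache"'
add_line "$rc_file" 'export NXF_VER=25.04.4'   # duplicate, so it is skipped

# Show what would be loaded at login.
cat "$rc_file"
```

Pointing rc_file at ~/.bashrc instead would make the exports permanent, and new login sessions would then pick them up.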

Executing a pipeline

Nextflow seamlessly integrates with code repositories such as GitHub. This feature allows you to manage your project code and use public Nextflow pipelines quickly, consistently, and transparently.

The Nextflow pull command will download a pipeline from a hosting platform into your global cache folder ($HOME/.nextflow/assets).

If you are pulling a project hosted in a remote code repository, you can specify its qualified name or the repository URL.

The qualified name has two parts, the owner name and the repository name, separated by a / character. For example, if a Nextflow project is hosted in the GitHub repository bar owned by foo, at the address http://github.com/foo/bar, it could be pulled using:

nextflow pull foo/bar

Or by using the complete URL:

nextflow pull http://github.com/foo/bar

Alternatively, the Nextflow clone command can be used to download a pipeline into a local directory of your choice:

nextflow clone foo/bar <your/path>

The Nextflow run command is used to initiate the execution of a pipeline:

nextflow run foo/bar

When you run a pipeline, Nextflow first looks for a local file with the pipeline name you’ve specified. If that file does not exist, it looks for a public repository with the same name on GitHub (unless otherwise specified). If found, Nextflow automatically pulls the pipeline to your global cache and executes it.

Warning

Be aware of what is already in your current working directory where you launch your pipeline. If your current working directory contains Nextflow configuration files you may encounter unexpected results.
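
A quick check before launching can help here. The sketch below looks for two configuration files that Nextflow reads by default outside the pipeline itself: nextflow.config in the launch directory and $HOME/.nextflow/config. It only reports what it finds and does not change anything.

```shell
# Count and report configuration files that would be picked up
# from the launch directory and the user's home directory.
found=0
for cfg in "$PWD/nextflow.config" "$HOME/.nextflow/config"; do
    if [ -f "$cfg" ]; then
        echo "config file found: $cfg"
        found=$((found + 1))
    fi
done
echo "$found configuration file(s) found"
```

If a file turns up unexpectedly, the config command listed earlier (nextflow config) can be used to print the configuration that would apply.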

Exercise

Execute the hello pipeline directly from the nextflow-io GitHub repository.

Solution

Use the run command to execute the nextflow-io/hello pipeline:

nextflow run nextflow-io/hello
Output
N E X T F L O W  ~  version 25.10.4
Pulling nextflow-io/hello ...
downloaded from https://github.com/nextflow-io/hello.git
Launching `https://github.com/nextflow-io/hello` [silly_sax] DSL2 - revision: 1d71f857bb [master]
executor >  local (4)
[e6/2132d2] process > sayHello (3) [100%] 4 of 4 ✔
Hola world!

Bonjour world!

Ciao world!

Hello world!

See run for more information about the Nextflow run command.

Understanding console outputs

When you run a Nextflow pipeline, a series of messages are printed to the terminal. The typical Nextflow output structure is:

  • Runtime header: N E X T F L O W ~ version 25.10.4
  • Pipeline retrieval if the pipeline was fetched from a remote repository: Pulling nextflow-io/hello ...
  • Launch summary: Launching `https://github.com/nextflow-io/hello` [silly_sax] DSL2 - revision: 1d71f857bb [master]
  • Executor summary: executor > local (4)
  • Process execution table: [e6/2132d2] process > sayHello (3) [100%] 4 of 4 ✔
  • Process outputs: Hola world!

This output summarises a lot of key information about a Nextflow run, including what and how many processes were run.

Executing a revision

When a Nextflow pipeline is created or updated using GitHub (or another code repository), a new revision is created. Each revision is identified by a unique git reference (branch, tag, or commit SHA), which can be used to track changes made to the pipeline and to ensure that the same version of the pipeline is used consistently across different runs.

The Nextflow info command can be used to view pipeline properties, such as the project name, repository, local path, main script, and revisions. The * indicates which revision of the pipeline is pinned and will be executed when using the run command.

nextflow info <pipeline>

It is recommended that you use the revision flag every time you execute a pipeline to ensure that the version is correct.

To use a specific revision, you simply need to add it to the command line with the -revision or -r flag. For example, to run a pipeline with the v1.0 revision, you would use the following:

nextflow run <pipeline> -r v1.0

Because Nextflow has built-in support for Git, you can manage and track changes made to a pipeline over time. A revision can be a git branch, tag, or commit SHA, and any of these can be passed to the -r flag.

Exercise

Execute the hello pipeline directly from the nextflow-io GitHub repository using the v1.1 revision tag.

Solution

Use the nextflow run command to execute the nextflow-io/hello pipeline with the v1.1 revision tag:

nextflow run nextflow-io/hello -r v1.1
Output
N E X T F L O W   ~  version 25.10.4

NOTE: Your local project version looks outdated - a different revision is available in the remote repository [3b355db864]
Nextflow DSL1 is no longer supported — Update your script to DSL2, or use Nextflow 22.10.x or earlier

This failure was expected! As the error message reads, v1.1 of the hello pipeline was written in DSL1, which current versions of Nextflow no longer support. But we can still run it by telling Nextflow to use an older version.

NXF_VER=22.10.0 nextflow run nextflow-io/hello -r v1.1
Output
Nextflow 25.10.4 is available - Please consider updating your version to it
N E X T F L O W  ~  version 22.10.0
NOTE: Your local project version looks outdated - a different revision is available in the remote repository [3b355db864]
Launching `https://github.com/nextflow-io/hello` [nice_poitras] DSL1 - revision: baba3959d7 [v1.1]
WARN: The use of `echo` method has been deprecated
executor >  local (4)
[16/d99458] process > sayHello (3) [100%] 4 of 4 ✔
Ciao world! (version 1.1)

Bojour world! (version 1.1)

Hola world! (version 1.1)

Hello world! (version 1.1)

Nextflow log

It is important to keep a record of the commands you have run to generate your results. Nextflow helps with this by creating and storing metadata and logs about the run in hidden files and folders in your current directory (unless otherwise specified). This data can be used by Nextflow to generate reports. It can also be queried using the Nextflow log command:

nextflow log

The log command has multiple options to facilitate queries and is especially useful when debugging a pipeline and inspecting execution metadata. You can view all of the possible log options with the -h flag:

nextflow log -h

To query a specific execution you can use the RUN NAME or a SESSION ID:

nextflow log <run_name>

To get more information, you can use the -f option with named fields. For example:

nextflow log <run_name> -f process,hash,duration

There are many other fields you can query. You can view a full list of fields with the -l option:

nextflow log -l
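
The commands above can be combined into a small helper, sketched below under a few assumptions. It first reports whether the hidden log file (.nextflow.log), the hidden cache folder (.nextflow), and the work folder exist in the current directory, then queries the most recent run. It assumes that the -q option of the log command prints run names only, with the newest last, and that name, status, exit, and workdir are among the fields listed by nextflow log -l.

```shell
# Files and folders Nextflow leaves in the launch directory.
for item in .nextflow.log .nextflow work; do
    if [ -e "$item" ]; then
        echo "present: $item"
    else
        echo "absent:  $item"
    fi
done

# Fields to report for each task (assumed to appear in `nextflow log -l`).
fields="name,status,exit,workdir"

if command -v nextflow >/dev/null 2>&1; then
    # Assumption: -q prints run names only, most recent last.
    last_run="$(nextflow log -q | tail -n 1)"
    if [ -n "$last_run" ]; then
        nextflow log "$last_run" -f "$fields"
    fi
fi
```

Run it from the directory where you launched the pipeline, since that is where Nextflow stores this metadata by default.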

Exercise

Use the log command to view the process, hash, and script fields for your tasks from your most recent Nextflow execution.

Solution

Use the log command to get a list of your recent executions:

nextflow log
Output
TIMESTAMP               DURATION        RUN NAME                STATUS  REVISION ID     SESSION ID                              COMMAND
2026-03-13 14:50:00     2.8s            nice_poitras            OK      baba3959d7      cc3baf9c-4f20-4edf-bcd4-8b9704a39878    nextflow run nextflow-io/hello -r v1.1

Query the process, hash, and script using the -f option for the most recent run:

nextflow log nice_poitras -f process,hash,script
Output
sayHello        e6/54b10e
    echo 'Ciao world! (version 1.1)'

sayHello        28/384df3
    echo 'Bojour world! (version 1.1)'

sayHello        6a/47b87f
    echo 'Hola world! (version 1.1)'

sayHello        16/d99458
    echo 'Hello world! (version 1.1)'

Execution cache and resume

Task execution caching is an essential feature of modern pipeline managers, and Nextflow provides an automated caching mechanism for every execution.

When using the Nextflow -resume option, successfully completed tasks from previous executions are skipped and the previously cached results are used in downstream tasks.

Nextflow's caching mechanism works by assigning a unique ID to each task. This ID is a 128-bit hash computed from the task's inputs and script; for each input file, the hash takes into account the complete file path, the file size, and the last modified timestamp. The ID is also used to name a separate execution directory where the task is run and its outputs are stored. Nextflow manages the inputs and outputs in these directories for you.
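
The short hash shown in the process table (for example e6/2132d2) is the start of the task's directory path under the work directory. The helper below is a small sketch, with task_dir being a name invented here, that expands a short hash into the full directory path. It assumes the default work directory, work, in the folder you launched from.

```shell
# Print the task directory whose path starts with the given short hash.
#   $1: short hash as printed in the process table, e.g. e6/2132d2
#   $2: work directory (defaults to ./work)
task_dir() {
    for dir in "${2:-work}/$1"*; do
        if [ -d "$dir" ]; then
            printf '%s\n' "$dir"
        fi
    done
}

# Example using the hash from the hello run shown earlier; prints nothing
# if that task's directory is not under ./work.
task_dir e6/2132d2
```

Listing that directory with ls -a shows the files Nextflow keeps for the task, such as the script that was run (.command.sh) and its log (.command.log).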

A multi-step pipeline is required to demonstrate cache and resume.

These concepts will be demonstrated using the nf-core demo pipeline as a part of the Getting started with nf-core section.

Listing and dropping cached pipelines

Over time, you might want to remove stored pipelines. Nextflow provides commands to view and remove pipelines that have been pulled locally.

The Nextflow list command prints the projects stored in your global cache folder ($HOME/.nextflow/assets). These are the pipelines that were pulled when you executed either of the Nextflow pull or run commands:

nextflow list

If you want to remove a pipeline from your cache you can remove it using the Nextflow drop command:

nextflow drop <pipeline>

Exercise

View your cached pipelines with the Nextflow list command and remove the nextflow-io/hello pipeline with the drop command.

Solution

List your pipeline assets:

nextflow list

Drop the nextflow-io/hello pipeline:

nextflow drop nextflow-io/hello

Check it has been removed:

nextflow list

Key points

  • Nextflow is a workflow orchestration engine that makes it easy to write data-intensive computational pipelines
  • Environment variables can be used to control your Nextflow runtime and the underlying Java virtual machine
  • Nextflow supports version control and has automatic integrations with online code repositories
  • Nextflow will cache your runs and they can be resumed with the -resume option
  • You can manage pipelines with Nextflow commands (e.g., pull, clone, list, and drop)