Appendix¶
Appendix 1 Shared configuration files¶
An includeConfig
statement in the nextflow.config
file is also used to include custom institutional profiles that have been submitted to the nf-core config repository. At run time, nf-core pipelines will fetch these configuration profiles from the nf-core config repository and make them available.
For shared resources such as an HPC cluster, you may consider developing a shared institutional profile. You can follow this tutorial for more help.
Appendix 2 Custom config file exercise¶
Exercise
Give the MultiQC report for the christopher-hakkaart/nf-core-demo
pipeline the name of your favorite food using the multiqc_title
parameter in a parameters file:
Solution
Create a custom .json
file that contains your favourite food, e.g., cheese:
Include the custom .json
file in your execution command with the -params-file
option:
nextflow run christopher-hakkaart/nf-core-demo -profile test,singularity -r main -params-file my_custom_params.json
Check that it has been applied:
Appendix 3 Configuration files¶
Configuration files are .config
files that can contain various pipeline properties. Custom paths passed in the command-line using the -c
option:
Multiple custom .config
files can be included at execution by separating them with a comma (,
).
Custom configuration files follow the same structure as the configuration file included in the pipeline directory.
Configuration properties are organised into scopes by dot prefixing the property names with a scope identifier or grouping the properties in the same scope using the curly brackets notation. For example:
Is equivalent to:
Scopes allow you to quickly configure settings required to deploy a pipeline on different infrastructure using different software management. For example, the executor
scope can be used to provide settings for the deployment of a pipeline on a HPC cluster. Similarly, the singularity
scope controls how Singularity containers are executed by Nextflow.
Multiple scopes can be included in the same .config
file using a mix of dot prefixes and curly brackets. A full list of scopes is described in detail here.
Exercise
Give the MultiQC report for the christopher-hakkaart/nf-core-demo
pipeline the name of your favorite color using the multiqc_title
parameter in a custom .config
file:
Solution
Create a custom .config
file that contains your favourite colour, e.g., blue:
Include the custom .config
file in your execution command with the -c
option:
nextflow run christopher-hakkaart/nf-core-demo -profile test,singularity -r main -resume -c custom.config
Check that it has been applied:
Why did this fail?
You can not use the params
scope in custom configuration files. Parameters can only be configured using the -params-file
option and the command line. While it parameter is listed as a parameter on the STDOUT
, it was not applied to the executed command:
The process
scope allows you to configure pipeline processes and is used extensively to define resources and additional arguments for modules.
By default, process resources are allocated in the conf/base.config
file using the withLabel
selector:
Similarly, the withName
selector enables the configuration of a process by name. By default, module parameters are defined in the conf/modules.config
file:
While some tool arguments are included as a part of a module. To make modules sharable across pipelines, most tool arguments are defined in the conf/modules.conf
file in the pipeline code under the ext.args
entry.
Importantly, having these arguments outside of the module also allows them to be customised at runtime.
For example, if you were trying to add arguments in the MULTIQC
process in the christopher-hakkaart/nf-core-demo
pipeline, you could use the process scope:
However, if a process is used multiple times in the same pipeline, an extended execution path of the module may be required to make it more specific:
process {
withName: "NFCORE_DEMO:DEMO:MULTIQC" {
ext.args = "<your custom parameter>"
}
}
The extended execution path is built from the pipelines, subworkflows, and module used to execute the process.
In the example above, the nf-core MULTIQC
module, was called by the DEMO
pipeline, which was called by the NFCORE_DEMO
pipeline in the main.nf
file.
How to build an extended execution path
It can be tricky to evaluate the path used to execute a module. If you are unsure of how to build the path you can copy it from the conf/modules.conf
file. How arguments are added to a process can also vary. Be vigilant.
Exercise
Create a new .config
file that uses the process
scope to overwrite the args
for the MULTIQC
process. Change the args
to your favourite month of the year, e.g, "--title \"october\""
.
In this example, the \
is used to escape the "
in the string. This is required to ensure the string is passed correctly to the MULTIQC
module.
Solution
Make a custom config file that uses the process
scope to replace the args
for the MULTIQC
process:
Execute your run command again with the custom configuration file:
nextflow run christopher-hakkaart/nf-core-demo -r main -profile test,singularity -resume -c custom.config
Check that it has been applied:
Exercise
Demonstrate the configuration hierarchy using the christopher-hakkaart/nf-core-demo
pipeline by adding a params file (-params-file
), and a command line flag (--multiqc_title
) to your execution. You can use the files you have already created. Make sure that the --multiqc_title
is different to the multiqc_title
in your params file and different to the title you have used in the past.
Solution
Use the .json
file you created previously:
Execute your command with your params file (-params-file
) and a command line flag (--multiqc_title
):
nextflow run christopher-hakkaart/nf-core-demo -r main -profile test,singularity -resume -params-file my_custom_params.json --multiqc_title "cake"
In this example, as the command line is at the top of the hierarchy, the multiqc_title
will be "cake".
Configuration files¶
Sometimes Sarek won't have a parameter that you need to customize the execution of a tool. In this situation, you will need to apply a configuration file to the pipeline.
As shown in the example from Session 1, you can selectively apply a configuration file to a process using the withName
directive.
Be specific with your selector
Remember to make your selector specific to the process you are trying to customise. It can be helpful to use the same selectors that are already included in the configuration file.
Sarek is a little different from other pipelines because it has multiple different config files that are used for different tools or groups of tools. This is stylistic and helps to keep the config files organised.
For example, there are a number of options that are applied when calling variants with FREEBAYES
and are all stored in a freebayes.config file together:
process {
withName: 'MERGE_FREEBAYES' {
ext.prefix = { "${meta.id}.freebayes" }
publishDir = [
mode: params.publish_dir_mode,
path: { "${params.outdir}/variant_calling/freebayes/${meta.id}/" },
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}
withName: 'FREEBAYES' {
ext.args = '--min-alternate-fraction 0.1 --min-mapping-quality 1'
//To make sure no naming conflicts ensure with module BCFTOOLS_SORT & the naming being correct in the output folder
ext.prefix = { meta.num_intervals <= 1 ? "${meta.id}" : "${meta.id}.${target_bed.simpleName}" }
ext.when = { params.tools && params.tools.split(',').contains('freebayes') }
publishDir = [
enabled: false
]
}
withName: 'BCFTOOLS_SORT' {
ext.prefix = { meta.num_intervals <= 1 ? meta.id + ".freebayes" : vcf.name - ".vcf" + ".sort" }
publishDir = [
mode: params.publish_dir_mode,
path: { "${params.outdir}/variant_calling/" },
pattern: "*vcf.gz",
saveAs: { meta.num_intervals > 1 ? null : "freebayes/${meta.id}/${it}" }
]
}
withName : 'TABIX_VC_FREEBAYES' {
publishDir = [
mode: params.publish_dir_mode,
path: { "${params.outdir}/variant_calling/freebayes/${meta.id}/" },
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}
// PAIR_VARIANT_CALLING
if (params.tools && params.tools.split(',').contains('freebayes')) {
withName: '.*:BAM_VARIANT_CALLING_SOMATIC_ALL:BAM_VARIANT_CALLING_FREEBAYES:FREEBAYES' {
ext.args = "--pooled-continuous \
--pooled-discrete \
--genotype-qualities \
--report-genotype-likelihood-max \
--allele-balance-priors-off \
--min-alternate-fraction 0.03 \
--min-repeat-entropy 1 \
--min-alternate-count 2 "
}
}
}
Each of these can be modified independently of the others and be applied using a custom configuration file.
Exercise
Create a custom configuration file that will modify the --min-alternate-fraction
parameter for FREEBAYES
to 0.05
and apply it to the pipeline.