Nextflow Directives
Directives are optional settings that affect the execution of the process.
Example:
directives:
container: rocker/r-ver:4.1
label: highcpu
cpus: 4
memory: 16 GB
accelerator
Type: Map of String to String
Default: Empty
The accelerator
directive allows you to specify the hardware accelerator requirement for the task execution e.g. GPU processor.
Viash implements this directive as a map with accepted keywords: type
, limit
, request
, and runtime
.
See accelerator
.
Example:
[ limit: 4, type: "nvidia-tesla-k80" ]
afterScript
Type: String
Default: Empty
The afterScript
directive allows you to execute a custom (Bash) snippet immediately after the main process has run. This may be useful to clean up your staging area.
See afterScript
.
Example:
source /cluster/bin/cleanup
beforeScript
Type: String
Default: Empty
The beforeScript
directive allows you to execute a custom (Bash) snippet before the main process script is run. This may be useful to initialise the underlying cluster environment or for other custom initialisation.
See beforeScript
.
Example:
source /cluster/bin/setup
cache
Type: Either Boolean or String
Default: Empty
The cache
directive allows you to store the process results to a local cache. When the cache is enabled and the pipeline is launched with the resume option, any following attempt to execute the process, along with the same inputs, will cause the process execution to be skipped, producing the stored data as the actual results.
The caching feature generates a unique key by indexing the process script and inputs. This key is used to identify univocally the outputs produced by the process execution.
The cache
is enabled by default, you can disable it for a specific process by setting the cache directive to false
.
Accepted values are: true
, false
, "deep"
, and "lenient"
.
See cache
.
Examples:
true
false
"deep"
"lenient"
conda
Type: String
/ List of String
Default: Empty
The conda
directive allows for the definition of the process dependencies using the Conda package manager.
Nextflow automatically sets up an environment for the given package names listed by in the conda
directive.
See conda
.
Examples:
"bwa=0.7.15"
"bwa=0.7.15 fastqc=0.11.5"
["bwa=0.7.15", "fastqc=0.11.5"]
container
Type: Either Map of String to String or String
Default: Empty
The container
directive allows you to execute the process script in a Docker container.
It requires the Docker daemon to be running in machine where the pipeline is executed, i.e. the local machine when using the local executor or the cluster nodes when the pipeline is deployed through a grid executor.
Viash implements allows either a string value or a map. In case a map is used, the allowed keys are: registry
, image
, and tag
. The image
value must be specified.
See container
.
Examples:
"foo/bar:tag"
This is transformed to "reg/im:ta"
:
[ registry: "reg", image: "im", tag: "ta" ]
This is transformed to "im:latest"
:
[ image: "im" ]
containerOptions
Type: String
/ List of String
Default: Empty
The containerOptions
directive allows you to specify any container execution option supported by the underlying container engine (ie. Docker, Singularity, etc). This can be useful to provide container settings only for a specific process e.g. mount a custom path.
See containerOptions
.
Examples:
"--foo bar"
["--foo bar", "-f b"]
cpus
Type: Either Int or String
Default: Empty
The cpus
directive allows you to define the number of (logical) CPU required by the process’ task.
See cpus
.
Examples:
1
10
disk
Type: String
Default: Empty
The disk
directive allows you to define how much local disk storage the process is allowed to use.
See disk
.
Examples:
"1 GB"
"2TB"
"3.2KB"
"10.B"
echo
Type: Either Boolean or String
Default: Empty
By default the stdout produced by the commands executed in all processes is ignored. By setting the echo
directive to true, you can forward the process stdout to the current top running process stdout file, showing it in the shell terminal.
See echo
.
Examples:
true
false
errorStrategy
Type: String
Default: Empty
The errorStrategy
directive allows you to define how an error condition is managed by the process. By default when an error status is returned by the executed script, the process stops immediately. This in turn forces the entire pipeline to terminate.
Table of available error strategies: | Name | Executor | |——|———-| | terminate
| Terminates the execution as soon as an error condition is reported. Pending jobs are killed (default) | | finish
| Initiates an orderly pipeline shutdown when an error condition is raised, waiting the completion of any submitted job. | | ignore
| Ignores processes execution errors. | | retry
| Re-submit for execution a process returning an error condition. |
See errorStrategy
.
Examples:
"terminate"
"finish"
executor
Type: String
Default: Empty
The executor
defines the underlying system where processes are executed. By default a process uses the executor defined globally in the nextflow.config file.
The executor
directive allows you to configure what executor has to be used by the process, overriding the default configuration. The following values can be used:
Name | Executor |
---|---|
awsbatch | The process is executed using the AWS Batch service. |
azurebatch | The process is executed using the Azure Batch service. |
condor | The process is executed using the HTCondor job scheduler. |
google-lifesciences | The process is executed using the Google Genomics Pipelines service. |
ignite | The process is executed using the Apache Ignite cluster. |
k8s | The process is executed using the Kubernetes cluster. |
local | The process is executed in the computer where Nextflow is launched. |
lsf | The process is executed using the Platform LSF job scheduler. |
moab | The process is executed using the Moab job scheduler. |
nqsii | The process is executed using the NQSII job scheduler. |
oge | Alias for the sge executor. |
pbs | The process is executed using the PBS/Torque job scheduler. |
pbspro | The process is executed using the PBS Pro job scheduler. |
sge | The process is executed using the Sun Grid Engine / Open Grid Engine. |
slurm | The process is executed using the SLURM job scheduler. |
tes | The process is executed using the GA4GH TES service. |
uge | Alias for the sge executor. |
See executor
.
Examples:
"local"
"sge"
label
Type: String
/ List of String
Default: Empty
The label
directive allows the annotation of processes with mnemonic identifier of your choice.
See label
.
Examples:
"big_mem"
"big_cpu"
["big_mem", "big_cpu"]
machineType
Type: String
Default: Empty
The machineType
can be used to specify a predefined Google Compute Platform machine type when running using the Google Life Sciences executor.
See machineType
.
Example:
"n1-highmem-8"
maxErrors
Type: Either String or Int
Default: Empty
The maxErrors
directive allows you to specify the maximum number of times a process can fail when using the retry
error strategy. By default this directive is disabled.
See maxErrors
.
Examples:
1
3
maxForks
Type: Either String or Int
Default: Empty
The maxForks
directive allows you to define the maximum number of process instances that can be executed in parallel. By default this value is equals to the number of CPU cores available minus 1.
If you want to execute a process in a sequential manner, set this directive to one.
See maxForks
.
Examples:
1
3
maxRetries
Type: Either String or Int
Default: Empty
The maxRetries
directive allows you to define the maximum number of times a process instance can be re-submitted in case of failure. This value is applied only when using the retry error strategy. By default only one retry is allowed.
See maxRetries
.
Examples:
1
3
memory
Type: String
Default: Empty
The memory
directive allows you to define how much memory the process is allowed to use.
See memory
.
Examples:
"1 GB"
"2TB"
"3.2KB"
"10.B"
module
Type: String
/ List of String
Default: Empty
Environment Modules is a package manager that allows you to dynamically configure your execution environment and easily switch between multiple versions of the same software tool.
If it is available in your system you can use it with Nextflow in order to configure the processes execution environment in your pipeline.
In a process definition you can use the module
directive to load a specific module version to be used in the process execution environment.
See module
.
Examples:
"ncbi-blast/2.2.27"
"ncbi-blast/2.2.27:t_coffee/10.0"
["ncbi-blast/2.2.27", "t_coffee/10.0"]
penv
Type: String
Default: Empty
The penv
directive allows you to define the parallel environment to be used when submitting a parallel task to the SGE resource manager.
See penv
.
Example:
"smp"
pod
Type: Map of String to String
/ List of Map of String to String
Default: Empty
The pod
directive allows the definition of pods specific settings, such as environment variables, secrets and config maps when using the Kubernetes executor.
See pod
.
Examples:
[ label: "key", value: "val" ]
[ annotation: "key", value: "val" ]
[ env: "key", value: "val" ]
[ [label: "l", value: "v"], [env: "e", value: "v"]]
publishDir
Type: Either String or Map of String to String
/ List of Either String or Map of String to String
Default: Empty
The publishDir
directive allows you to publish the process output files to a specified folder.
Viash implements this directive as a plain string or a map. The allowed keywords for the map are: path
, mode
, overwrite
, pattern
, saveAs
, enabled
. The path
key and value are required. The allowed values for mode
are: symlink
, rellink
, link
, copy
, copyNoFollow
, move
.
See publishDir
.
Examples:
[]
[ [ path: "foo", enabled: true ], [ path: "bar", enabled: false ] ]
This is transformed to [[ path: "/path/to/dir" ]]
:
"/path/to/dir"
This is transformed to [[ path: "/path/to/dir", mode: "cache" ]]
:
[ path: "/path/to/dir", mode: "cache" ]
queue
Type: String
/ List of String
Default: Empty
The queue
directory allows you to set the queue where jobs are scheduled when using a grid based executor in your pipeline.
See queue
.
Examples:
"long"
"short,long"
["short", "long"]
scratch
Type: Either Boolean or String
Default: Empty
The scratch
directive allows you to execute the process in a temporary folder that is local to the execution node.
See scratch
.
Examples:
true
"/path/to/scratch"
'$MY_PATH_TO_SCRATCH'
"ram-disk"
stageInMode
Type: String
Default: Empty
The stageInMode
directive defines how input files are staged-in to the process work directory. The following values are allowed:
Value | Description |
---|---|
copy | Input files are staged in the process work directory by creating a copy. |
link | Input files are staged in the process work directory by creating an (hard) link for each of them. |
symlink | Input files are staged in the process work directory by creating a symbolic link with an absolute path for each of them (default). |
rellink | Input files are staged in the process work directory by creating a symbolic link with a relative path for each of them. |
See stageInMode
.
Examples:
"copy"
"link"
stageOutMode
Type: String
Default: Empty
The stageOutMode
directive defines how output files are staged-out from the scratch directory to the process work directory. The following values are allowed:
Value | Description |
---|---|
copy | Output files are copied from the scratch directory to the work directory. |
move | Output files are moved from the scratch directory to the work directory. |
rsync | Output files are copied from the scratch directory to the work directory by using the rsync utility. |
See stageOutMode
.
Examples:
"copy"
"link"
storeDir
Type: String
Default: Empty
The storeDir
directive allows you to define a directory that is used as a permanent cache for your process results.
See storeDir
.
Example:
"/path/to/storeDir"
tag
Type: String
Default: '$id'
The tag
directive allows you to associate each process execution with a custom label, so that it will be easier to identify them in the log file or in the trace execution report.
For ease of use, the default tag is set to "$id"
, which allows tracking the progression of the channel events through the workflow more easily.
See tag
.
Example:
"foo"
time
Type: String
Default: Empty
The time
directive allows you to define how long a process is allowed to run.
See time
.
Examples:
"1h"
"2days"
"1day 6hours 3minutes 30seconds"