Create and use a module

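Creating a VDSL3 module is as simple as adding { type: nextflow } to the runners section in the Viash config. Luckily, our previous example already contained such an entry. The configs below show it for each supported scripting language (Bash, C#, JavaScript, Python, R and Scala):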

name: example_bash
description: A minimal example component.
arguments:
  - type: file
    name: --input
    example: file.txt
    required: true
  - type: file
    name: --output
    direction: output
    example: output.txt
    required: true
resources:
  - type: bash_script
    path: script.sh
engines:
  - type: docker
    image: "bash:4.0"
  - type: native
runners:
  - type: executable
  - type: nextflow

name: example_csharp
description: A minimal example component.
arguments:
  - type: file
    name: --input
    example: file.txt
    required: true
  - type: file
    name: --output
    direction: output
    example: output.txt
    required: true
resources:
  - type: csharp_script
    path: script.csx
engines:
  - type: docker
    image: "ghcr.io/data-intuitive/dotnet-script:1.3.1"
  - type: native
runners:
  - type: executable
  - type: nextflow

name: example_js
description: A minimal example component.
arguments:
  - type: file
    name: --input
    example: file.txt
    required: true
  - type: file
    name: --output
    direction: output
    example: output.txt
    required: true
resources:
  - type: javascript_script
    path: script.js
engines:
  - type: docker
    image: "node:19-bullseye-slim"
  - type: native
runners:
  - type: executable
  - type: nextflow

name: example_python
description: A minimal example component.
arguments:
  - type: file
    name: --input
    example: file.txt
    required: true
  - type: file
    name: --output
    direction: output
    example: output.txt
    required: true
resources:
  - type: python_script
    path: script.py
engines:
  - type: docker
    image: "python:3.10-slim"
  - type: native
runners:
  - type: executable
  - type: nextflow

name: example_r
description: A minimal example component.
arguments:
  - type: file
    name: --input
    example: file.txt
    required: true
  - type: file
    name: --output
    direction: output
    example: output.txt
    required: true
resources:
  - type: r_script
    path: script.R
engines:
  - type: docker
    image: "eddelbuettel/r2u:22.04"
  - type: native
runners:
  - type: executable
  - type: nextflow

name: example_scala
description: A minimal example component.
arguments:
  - type: file
    name: --input
    example: file.txt
    required: true
  - type: file
    name: --output
    direction: output
    example: output.txt
    required: true
resources:
  - type: scala_script
    path: script.scala
engines:
  - type: docker
    image: "sbtscala/scala-sbt:eclipse-temurin-19_36_1.7.2_2.13.10"
  - type: native
runners:
  - type: executable
  - type: nextflow

Generating a VDSL3 module

We will now turn the Viash component into a VDSL3 module. By default, the viash build command will select the first runner (executable) in the list of runners. To select the nextflow runner, use the --runner nextflow argument, or -r nextflow for short.

viash build config.vsh.yaml -o target --runner nextflow

This will generate a Nextflow module in the target/ directory:

tree target

target
├── main.nf
└── nextflow.config

1 directory, 2 files

This main.nf file is both a standalone Nextflow pipeline and a module which can be imported as part of another pipeline.

Tip

In larger projects it’s recommended to use the viash ns build command to build all of the components in one go. Give it a try!

Running a module as a standalone pipeline

Unlike typical Nextflow modules, VDSL3 modules can actually be used as a standalone pipeline.

To run a VDSL3 module as a standalone pipeline, you need to specify the input parameters and a --publish_dir parameter. You do not need to pass --output: the names of the output files are chosen automatically, and the files are written to the publish directory.

You can run the pipeline by providing a value for --input and --publish_dir:

nextflow run target/main.nf --input config.vsh.yaml --publish_dir output/

 N E X T F L O W   ~  version 24.10.6

Launching `target/main.nf` [prickly_bernard] DSL2 - revision: b0e719ee8a

[-        ] exa…essWf:example_bash_process -
[-        ] exa…sSimpleWf:publishFilesProc -
[-        ] exa…SimpleWf:publishStatesProc -

executor >  local (1)
[17/b96316] exa…example_bash_process (run) | 0 of 1
[-        ] exa…sSimpleWf:publishFilesProc -
[-        ] exa…SimpleWf:publishStatesProc -

executor >  local (2)
[17/b96316] exa…example_bash_process (run) | 1 of 1 ✔
[9b/67473d] exa…eWf:publishFilesProc (run) | 0 of 1
[-        ] exa…SimpleWf:publishStatesProc -

executor >  local (3)
[17/b96316] exa…example_bash_process (run) | 1 of 1 ✔
[9b/67473d] exa…eWf:publishFilesProc (run) | 1 of 1 ✔
[46/cb923a] exa…Wf:publishStatesProc (run) | 1 of 1 ✔

This results in the following output:

tree output

output
├── run.example_bash.output.txt
└── run.example_bash.state.yaml

1 directory, 2 files

To show the pipeline help, pass the --help parameter (output not shown).

nextflow run target/main.nf --help

Passing a parameter list

Every VDSL3 module can accept a list of parameter sets, which it uses to populate a Nextflow channel.

For example, we create a set of input files which we want to process in parallel.

touch sample1.txt sample2.txt sample3.txt sample4.txt

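Next, we create a YAML file param_list.yaml containing an id and an input value for each parameter set. The absolute paths shown here are specific to the machine on which this example was run; replace them with the paths to your own files.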

- id: sample1
  input: /tmp/RtmpicqNrl/create-a-moduled86c6877360b/bash/sample1.txt
- id: sample2
  input: /tmp/RtmpicqNrl/create-a-moduled86c6877360b/bash/sample2.txt
- id: sample3
  input: /tmp/RtmpicqNrl/create-a-moduled86c6877360b/bash/sample3.txt
- id: sample4
  input: /tmp/RtmpicqNrl/create-a-moduled86c6877360b/bash/sample4.txt
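Writing this file by hand becomes tedious as the number of samples grows. The following is a minimal sketch of how the same structure could be generated with a shell loop. It assumes the sample files live in the current directory and that each id should be the file name without its .txt extension; both are choices made for this illustration. The listing above uses absolute paths, so the sketch does the same via $PWD.

```shell
# Create the four (empty) sample files used in this example.
touch sample1.txt sample2.txt sample3.txt sample4.txt

# Start from an empty parameter list.
: > param_list.yaml

# Append one entry per sample file.
for f in sample*.txt; do
  id="${f%.txt}"   # assumption: the id is the file name without ".txt"
  printf '%s\n' "- id: $id" "  input: $PWD/$f" >> param_list.yaml
done

# Show the result.
cat param_list.yaml
```

The resulting file has the same shape as the listing above, with the paths pointing to your own working directory.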

You can run the pipeline on the list of parameters using the --param_list parameter.

nextflow run target/main.nf --param_list param_list.yaml --publish_dir output2

 N E X T F L O W   ~  version 24.10.6

Launching `target/main.nf` [angry_newton] DSL2 - revision: b0e719ee8a

[-        ] exa…essWf:example_bash_process -
[-        ] exa…sSimpleWf:publishFilesProc -
[-        ] exa…SimpleWf:publishStatesProc -

executor >  local (4)
[92/52895a] exa…ple_bash_process (sample2) | 0 of 4
[-        ] exa…sSimpleWf:publishFilesProc -
[-        ] exa…SimpleWf:publishStatesProc -

executor >  local (9)
[4a/f5bb8c] exa…ple_bash_process (sample3) | 4 of 4 ✔
[29/dfdc56] exa…publishFilesProc (sample1) | 0 of 4
[32/7e0e4d] exa…ublishStatesProc (sample2) | 0 of 4

executor >  local (12)
[4a/f5bb8c] exa…ple_bash_process (sample3) | 4 of 4 ✔
[12/554813] exa…publishFilesProc (sample3) | 4 of 4 ✔
[0b/43e084] exa…ublishStatesProc (sample4) | 4 of 4 ✔

This results in the following outputs:

tree output2

output2
├── sample1.example_bash.output.txt
├── sample1.example_bash.state.yaml
├── sample2.example_bash.output.txt
├── sample2.example_bash.state.yaml
├── sample3.example_bash.output.txt
├── sample3.example_bash.state.yaml
├── sample4.example_bash.output.txt
└── sample4.example_bash.state.yaml

1 directory, 8 files
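Judging by the listing above, each parameter set yields two published files, both prefixed with its id: the output file and a state file. As a rough sketch, a small shell function could confirm that every expected file is present after a run. The function name check_outputs is hypothetical, and the file name pattern is inferred from this example only; it is not a documented guarantee.

```shell
# Hypothetical helper: report published files that are missing for the given ids.
# Usage: check_outputs <publish_dir> <id>...
check_outputs() {
  dir="$1"
  shift
  missing=0
  for id in "$@"; do
    # File name pattern inferred from the listing above (assumption).
    for suffix in output.txt state.yaml; do
      if [ ! -e "$dir/$id.example_bash.$suffix" ]; then
        echo "missing: $dir/$id.example_bash.$suffix"
        missing=1
      fi
    done
  done
  return "$missing"
}

if check_outputs output2 sample1 sample2 sample3 sample4; then
  echo "all outputs present"
else
  echo "some outputs are missing"
fi
```

The function returns a non-zero exit status when at least one file is missing, so it can also be used as a guard in a larger script.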

Tip

Instead of a YAML file, you can also pass a JSON or CSV file to the --param_list parameter.
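As an illustration of the CSV variant, the sketch below writes the same four parameter sets to a file named param_list.csv (a name chosen for this example). It assumes that the CSV column names correspond to the argument names, here id and input, with one row per parameter set, and that none of the paths contains a comma.

```shell
# Create the four (empty) sample files used in this example.
touch sample1.txt sample2.txt sample3.txt sample4.txt

# Header row: one column per argument (assumption: names match the YAML keys).
printf '%s\n' 'id,input' > param_list.csv

# One row per sample file, using an absolute path as in the YAML example.
for f in sample*.txt; do
  printf '%s,%s\n' "${f%.txt}" "$PWD/$f" >> param_list.csv
done

# Show the result.
cat param_list.csv
```

The file can then be passed in the same way as the YAML version, for example with --param_list param_list.csv.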

Module as part of a pipeline

This module can also be used as part of a Nextflow pipeline. Below is a short preview of what this looks like.

include { mymodule1 } from 'target/nextflow/mymodule1/main.nf'
include { mymodule2 } from 'target/nextflow/mymodule2/main.nf'

workflow {
  Channel.fromList([
    [
      // a unique identifier for this tuple
      "myid", 
      // the state for this tuple
      [
        input: file("in.txt"),
        module1_k: 10,
        module2_k: 4
      ]
    ]
  ])
    | mymodule1.run(
      // use a hashmap to define which part of the state is used to run mymodule1
      fromState: [
        input: "input",
        k: "module1_k"
      ],
      // use a hashmap to define how the output of mymodule1 is stored back into the state
      toState: [
        module1_output: "output"
      ]
    )
    | mymodule2.run(
      // use a closure to define which data is used to run mymodule2
      fromState: { id, state -> 
        [
          input: state.module1_output,
          k: state.module2_k
        ]
      },
      // use a closure to return only the output of module2 as a new state
      toState: { id, output, state ->
        output
      },
      auto: [
        publish: true
      ]
    )
}

We will discuss building pipelines with VDSL3 modules in more detail in Create a pipeline.