
WDL 1.2.0: Enhancing Workflow Description Language for Bioinformatics

The Workflow Description Language (WDL) team has announced the release of WDL 1.2.0, a significant update that improves the flexibility and usability of workflow descriptions in bioinformatics. The new version introduces several features and enhancements intended to streamline workflow management and execution, making it easier for developers and researchers to implement and manage complex bioinformatics workflows.

The Workflow Description Language (WDL) is an open standard specification for describing data processing workflows with a human-readable and writeable syntax. WDL makes it straightforward to define analysis tasks, connect them in workflows, and parallelize their execution. The language strives to be accessible and understandable to all users, including programmers, analysts, and production system operators.

One of the key improvements in WDL 1.2.0 is the introduction of the Directory type. This new type lets a workflow treat a directory as a single value, so users can pass a whole directory between tasks instead of listing its files one by one, which simplifies the management of grouped data files. Consider the following example:

version 1.2

task directory_example {

    input {
        Directory dir
    }

    command <<<
    # Iterate over the files inside the directory, not the directory path itself
    for file in "~{dir}"/*; do
      cat "$file"
    done
    >>>

    output {
        File concatenated = stdout()
    }
}
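
Passing a directory between tasks can be sketched as follows. This is a minimal illustration rather than an example from the release notes: the task names make_dir and count_files, the workflow name pass_directory, and the directory name parts are all hypothetical, and it assumes an execution engine that supports Directory outputs as described in the 1.2 specification.

```wdl
version 1.2

# Hypothetical producer: writes two files into a directory and returns it.
task make_dir {
  command <<<
    mkdir parts
    echo "alpha" > parts/a.txt
    echo "beta" > parts/b.txt
  >>>

  output {
    Directory parts = "parts"
  }
}

# Hypothetical consumer: receives the whole directory as one input.
task count_files {
  input {
    Directory dir
  }

  command <<<
    # tr strips any padding that some wc implementations add
    ls "~{dir}" | wc -l | tr -d ' '
  >>>

  output {
    Int n = read_int(stdout())
  }
}

workflow pass_directory {
  call make_dir
  call count_files { input: dir = make_dir.parts }

  output {
    Int n = count_files.n
  }
}
```

The consumer never needs to know the individual file names, which is the main convenience over passing an Array[File].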

Another noteworthy enhancement is the ability to declare inputs as environment variables. A declaration marked with the env modifier is exposed to the task's command as an environment variable of the same name instead of being interpolated into the script text, which makes it easier to pass configuration settings and sensitive values without hardcoding them into the workflow scripts.
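
A minimal sketch of an environment-variable input is shown below. The task name env_input_sketch and the input name greeting are hypothetical; the sketch assumes the env modifier behaves as described in the 1.2 specification, where the value is available to the command under the declaration's name.

```wdl
version 1.2

task env_input_sketch {
  input {
    # `env` makes this value available to the command as $greeting
    env String greeting
  }

  command <<<
    # Read from the environment; no ~{...} placeholder is needed,
    # so shell metacharacters in the value are not re-parsed as script text.
    printf '%s\n' "$greeting"
  >>>

  output {
    String echoed = read_string(stdout())
  }
}
```

Because the value is never pasted into the script, quoting problems and accidental exposure of secrets in the rendered command are less likely.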

Additionally, WDL 1.2.0 introduces new requirements and hints sections. Together they provide a standardized way to specify a task's mandatory computational requirements and optional hints for the execution engine. This helps engines optimize workflow performance and resource allocation, so that tasks run efficiently across different environments. Consider the following example:

version 1.2

task dynamic_container {
  input {
    String image_tag = "latest"
    String ubuntu_release
  }

  command <<<
    cat /etc/*-release | grep DISTRIB_CODENAME | cut -f 2 -d '=' > ubuntu_release
    nvcc -V > cuda_version
  >>>

  output {
    Boolean is_expected_release = ubuntu_release == read_string("ubuntu_release")
    String cuda_version = read_string("cuda_version")
  }

  requirements {
    container: "nvidia/cuda:~{image_tag}"
    gpu: true
  }

  hints {
    gpu: 2
  }
}
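
The same split applies to CPU and memory settings. The sketch below is an assumption-laden illustration, not part of the announcement: the task name resource_sketch and its input are hypothetical, and the attribute names (cpu, memory, max_retries, max_cpu, short_task) are taken from the 1.2 specification as I understand it, so check them against the specification before relying on them.

```wdl
version 1.2

task resource_sketch {
  input {
    File reads
  }

  command <<<
    wc -l < "~{reads}"
  >>>

  output {
    Int line_count = read_int(stdout())
  }

  # Hard constraints: the engine must satisfy these or fail the task.
  requirements {
    container: "ubuntu:latest"
    cpu: 2
    memory: "4 GiB"
    max_retries: 1
  }

  # Advisory settings: an engine may use or ignore them.
  hints {
    max_cpu: 4
    short_task: true
  }
}
```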

WDL 1.2.0 also brings several new and improved standard library functions. The join_paths function is now the preferred way to concatenate paths, the matches and find functions perform pattern matching on strings, the contains function checks whether an array contains a value, and the chunk function splits an array into chunks of a given size (the last chunk may be shorter). In addition, the keys function can now return the member names of an Object or Struct, the contains_key function checks for the existence of keys in a Map, Object, or Struct, and the select_first function accepts an optional default value. The size function now handles all compound value inputs, and the length function accepts more argument types. Finally, the read_tsv function can take field names from a header row or an array of strings and return an array of objects.
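
Several of these functions can be sketched together in one small workflow. The workflow name stdlib_sketch and all input names and values are hypothetical, and the argument orders shown are assumptions based on the 1.2 specification (input string first, then pattern; array first, then value or chunk size).

```wdl
version 1.2

workflow stdlib_sketch {
  input {
    Array[String] samples = ["s1", "s2", "s3", "s4", "s5"]
    String fastq_name = "sample_R1.fastq.gz"
    Int? user_threads
  }

  output {
    # Pattern matching: `matches` returns a Boolean,
    # `find` returns the first match or None.
    Boolean is_read1 = matches(fastq_name, "_R1")
    String? extension = find(fastq_name, "\\.fastq(\\.gz)?$")

    # Membership test on an array.
    Boolean has_s3 = contains(samples, "s3")

    # Chunks of 2; with five samples the final chunk holds one element.
    Array[Array[String]] batches = chunk(samples, 2)

    # Falls back to 4 when `user_threads` is not supplied.
    Int threads = select_first([user_threads], 4)

    # Key lookup on a Map.
    Boolean has_b = contains_key({"a": 1, "b": 2}, "b")
  }
}
```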

The release of WDL 1.2.0 marks an important milestone in the evolution of workflow management for bioinformatics. The new features and enhancements are designed to address common challenges faced by bioinformatics researchers and developers, including the need for greater flexibility, improved documentation, and better error handling.

Regarding plans for WDL development, Patrick Magee, a member of WDL governance and senior software developer at DNAstack, shared insights into the team's direction. Magee stated:

"We want to establish a more regular release cadence, focusing on delivering a smaller set of impactful features and clarifications. This will allow us to respond rapidly to community feedback and improve the developer and end-user experience."

He also added:

"Keep an eye out for proposals of long-awaited features such as enumerations, list comprehensions, and input validation. Additionally, we also want to build on the improvements made in 1.2 and further hone workflow portability and reproducibility, giving the community the tools needed to grow the usage of WDL in all environments."

For more information on WDL 1.2.0 and to access the complete documentation, refer to the official WDL website and GitHub repository.
