Coercion of String to File outside a task: where does it happen? #581

adamnovak opened this issue Jul 28, 2023 · 1 comment
@adamnovak

In WDL 1.1, String has an "obvious and unambiguous" coercion to File:

https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md#type-coercion

What is this conversion, exactly?

For example, a natural way to implement a scatter is to have each copy of the body execute in parallel on a different machine. So if you do something like:

workflow Example {

    input {
        String filename = "/etc/hostname"
    }

    File firstFile = filename
    String firstContent = read_string(firstFile)
    scatter(iteration in range(10000)) {
        File file = filename
        String content = read_string(file)
    }

    output {
        String first = firstContent
        Array[String] others = content
    }

}

What is supposed to be true about the others array? Are the values all the same? Are they all the same as first? For that matter, is first always the same thing you would see if you ran cat /etc/hostname instead of executing the workflow?

When working with functions that take either File or String, the spec says that Strings are interpreted "relative to the current working directory of the task"; presumably that would apply to coercions as well. But here we're not in a task, and a workflow as a distributed thing doesn't seem to me to have a natural working directory.
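
For contrast, inside a task that rule is at least well defined. Here is a minimal sketch of what I understand it to mean (the file name is invented for illustration):

task ReadRelative {

    command <<<
        # Create a file in the task's working directory.
        echo "hello" > produced.txt
    >>>

    output {
        # Per the coercion rule, the String "produced.txt" is resolved
        # relative to the task's working directory when coerced to File.
        File producedFile = "produced.txt"
        String producedContent = read_string(producedFile)
    }

}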

In the Toil workflow runner's WDL interpreter, I really want to be allowed to move workflow code to any machine at any line, or even between lines, so I can ship fragments of it to the cluster when they're ready to run, just like the tasks. But that seems to be breaking some of the GATK workflows which assume that, at least at top-level scope, a workflow can always read the same filesystem its caller sees. So I'm trying to work out how legitimate that assumption is, and whether there are guarantees that go further than that.

@patmagee

@adamnovak great question, and this is only a half answer. I would really like to tease out your use case a bit more, because federation is something I am personally interested in as well...

For the time being, though, you are correct in stating that there seems to be an assumption that the workflow itself has access to the same file system as the individual tasks. That is, in general, how engines have been implemented and what the expected behaviour is.

Whether this is the correct behaviour, or the logical behaviour, is a totally different matter. It has led to a lot of complexity within engines and the community.

The path representation of a file is likely different within the context of a task versus a workflow. In a workflow, the path is often the absolute reference to the file (think an object in an object store, or a file on a shared file system). However, within a task, the engine has a bit of liberty to change the path and make it something relative to within the container.
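
As a rough sketch of what I mean (the bucket URL and the localization behaviour are made up for illustration; exactly how localization happens is engine specific):

workflow PathViews {

    input {
        # At workflow scope, the File is typically represented by its
        # absolute reference: an object in an object store, or a path
        # on a shared file system.
        File data = "gs://example-bucket/inputs/data.txt"
    }

    call ShowPath { input: f = data }

}

task ShowPath {

    input {
        File f
    }

    command <<<
        # Inside the task, the engine may have localized the file, so
        # ~{f} usually expands to a container-local path rather than
        # the original gs:// URL.
        echo "~{f}"
    >>>

    output {
        String pathSeen = read_string(stdout())
    }

}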
