Python Workflow API

The Python Workflow API is the primary way to author Sophios workflows. It lets you describe a workflow in the same place where you already have names, functions, imports, tests, and reusable Python helpers, while still compiling to portable CWL artifacts.

The API is intentionally centered on a focused set of concepts:

  • a CommandLineTool describes one executable tool,

  • a Step places that tool inside a workflow graph,

  • a Workflow owns the graph, named outputs, and compiled artifacts,

  • assignments such as cat.inputs.file = touch.outputs.file record workflow bindings rather than executing commands immediately.

If you remember one rule, make it this: Python describes the graph; compilation turns that graph into CWL and job inputs; execution happens after that.

Scope

This page covers:

  • what a Step represents,

  • what a Workflow represents,

  • how literal values bind to step inputs,

  • how a step output becomes a later step input,

  • what compilation produces,

  • how to compile and inspect workflow artifacts,

  • when to run locally,

  • when to keep compiled CWL in memory for submission payload construction.

It does not cover every CWL feature. Advanced YAML controls, static dispatch, and program synthesis are documented separately in Advanced YAML and Operations.

Working Directory Assumption

Most examples in this guide assume you are running from the repository root:

cd /path/to/sophios

That makes checked-in adapter paths such as cwl_adapters/echo.cwl resolve cleanly.

When writing reusable scripts, prefer Path objects over fragile string paths:

from pathlib import Path

echo_tool = Path("cwl_adapters") / "echo.cwl"

Step 1: Load a Tool as a Step

A Step is a CWL CommandLineTool placed inside a workflow context.

from pathlib import Path

from sophios.apis.python.workflow import Step


echo = Step(Path("cwl_adapters") / "echo.cwl")

At this point, Sophios has loaded the tool contract. It knows the tool has an input named message and an output named stdout.

You can inspect the ports:

for input_port in echo.inputs:
    print(input_port.name, input_port.parameter_type)

for output_port in echo.outputs:
    print(output_port.name, output_port.parameter_type)

This interface is the contract Sophios uses for validation, binding, edge inference, and generated CWL.

Step 2: Bind a Literal Input

Bind a value to a step input:

echo.inputs.message = "hello from Sophios"

That assignment does not run the tool. It records a workflow binding:

echo.message <- "hello from Sophios"

Sophios will later turn that binding into the appropriate generated CWL job input.

The older shorthand still works:

echo.message = "hello from Sophios"

The docs use the explicit form because it is clearer:

  • step.inputs.<name> is where values enter a step,

  • step.outputs.<name> is where values leave a step.

Step 3: Put Steps in a Workflow

Wrap the step in a workflow:

from sophios.apis.python.workflow import Workflow


workflow = Workflow([echo], "hello_python")

A workflow has a name and an ordered list of children. Children may be concrete steps or nested workflows.

Compile without running:

workflow.write_artifacts()

Run locally:

workflow.run()

The minimal complete example is therefore:

from pathlib import Path

from sophios.apis.python.workflow import Step, Workflow


def build_workflow() -> Workflow:
    echo = Step(Path("cwl_adapters") / "echo.cwl")
    echo.inputs.message = "hello from Sophios"
    return Workflow([echo], "hello_python")


if __name__ == "__main__":
    build_workflow().run()

This example keeps the moving parts visible. A runnable workflow needs a tool, an input binding, and a Workflow that can compile and run.

Linear Edge Inference

The Python API can also use the same edge inference mechanism as .wic workflows. In a linear workflow, if a required step input is not bound, Sophios passes that missing input to the compiler. The compiler then looks backward through earlier steps and connects the most recent compatible output.

That means this also works:

touch = Step(Path("cwl_adapters") / "touch.cwl")
touch.inputs.filename = "empty.txt"

append = Step(Path("cwl_adapters") / "append.cwl")
append.inputs.str = "Hello"

cat = Step(Path("cwl_adapters") / "cat.cwl")

workflow = Workflow([touch, append, cat], "multistep_python")
workflow.write_artifacts()

Here Sophios infers:

touch.outputs.file -> append.inputs.file
append.outputs.file -> cat.inputs.file

Use inference when the workflow is a clear linear chain and the generated graph is easy to inspect. Prefer explicit bindings when multiple prior outputs could match, when names matter, or when the workflow definition must make data flow unambiguous without relying on generated artifacts.

Binding Types

There are two common input binding patterns.

Binding

Python shape

Meaning

Literal value

step.inputs.message = "hello"

The value is known now.

Step output

cat.inputs.file = append.outputs.file

The value comes from an earlier step.

This table is one of the most important concepts in the Python API. Most workflow code is a readable sequence of these bindings.

What Bindings Become

Python assignments in a Sophios workflow are declarative. They do not execute a command when the assignment runs. They record enough information for the compiler to generate a CWL workflow and a CWL job input object.

For example, this literal binding:

echo.inputs.message = "hello"

becomes part of the generated job input data. Sophios keeps the value out of the CWL tool definition so the compiled workflow and runtime inputs remain separate.

This edge binding:

cat.inputs.file = append.outputs.file

becomes a CWL workflow source relationship. The compiled workflow connects the concrete output port from the append step to the concrete input port on the cat step.

The practical rule is: write Python that describes the graph, then inspect the generated CWL when exact execution behavior matters.

Named Workflow Outputs

Outputs should also be deliberate.

workflow.outputs.captured_stdout = echo.outputs.stdout

That line exposes the step output as a stable workflow result.

Use named workflow outputs when a downstream user, test, or execution service needs a stable result name.

Export .wic From Python

Every Workflow can export a .wic source representation. For a quick in-memory view, inspect workflow.yaml:

print(workflow.yaml)

This shows the file representation Sophios can derive from the current Python workflow. It is useful when debugging surprising bindings before compilation or when you want to compare Python-authored workflows with .wic workflows.

When you want a real .wic file, use write_wic():

workflow.write_wic("hello_python.wic")

This writes a source .wic workflow from the Python object. It does not compile the workflow and it does not write generated CWL. Literal bindings, named outputs, explicit edges, and intentionally unbound linear inputs are preserved in the .wic representation so the normal Sophios compiler can still apply edge inference later.

If you need the text instead of a file:

wic_text = workflow.to_wic()

For nested workflows, write_wic() embeds subworkflows in the root document by default. If you want a sibling-file tree instead, pass inline_subworkflows=False:

workflow.write_wic("workflows", inline_subworkflows=False)

Compile Paths

Sophios gives you two common compile paths.

Compilation is the point where the high-level Python workflow becomes concrete CWL. During compilation, Sophios validates the workflow structure, resolves explicit bindings, applies edge inference where inputs were intentionally left unbound, emits runtime input data for literal values, and produces a CWL workflow document that can be inspected or executed by a CWL runner.

Write Artifacts to Disk

workflow.write_artifacts()

Use this when you want to inspect generated files. Typical artifacts include:

  • autogenerated/<workflow>.cwl,

  • autogenerated/<workflow>_inputs.yml,

  • Graphviz files when graph output is enabled.

This is the best path when generated artifacts need to be reviewed, committed to a test fixture, or inspected during debugging.

workflow.compile(write_to_disk=True) is still available. write_artifacts() is the clearer public method for the common “compile and inspect files” path. Neither method writes intermediate .wic compiler trees by default. Use workflow.write_wic(...) when you want a source .wic file.

Keep Compiled CWL in Memory

compiled = workflow.get_cwl_workflow()

Use this when the next step is another Python operation, such as packaging a submission payload.

The returned object contains:

  • the compiled CWL workflow fields,

  • the workflow name,

  • generated yaml_inputs.

Submission payloads commonly expect the CWL workflow document and job inputs as separate pieces. The compute payload guide shows one concrete service-oriented payload shape in detail.

Running Locally

Run locally with:

workflow.run()

Sophios supports two local CWL runners through Workflow.run():

  • cwltool, the default local runner,

  • toil-cwl-runner, for local execution through Toil.

Runtime options can be passed through run_args_dict:

workflow.run(
    run_args_dict={
        "cwl_runner": "cwltool",
        "container_engine": "docker",
        "copy_output_files": "yes",
    }
)

To run the same workflow with Toil, switch the runner:

workflow.run(
    run_args_dict={
        "cwl_runner": "toil-cwl-runner",
        "container_engine": "docker",
        "logLevel": "INFO",
    }
)

Sophios uses recognized local-run options for setup and passes runner-specific key/value options to the selected CWL runner. See Running a Multistep Workflow Locally for a complete example.

Local execution is best for:

  • verifying development-scale workflows,

  • verifying tool contracts,

  • checking generated outputs before submission,

  • verifying end-to-end behavior in a controlled environment.

If the workflow needs a large container image or special hardware, compile first and inspect the generated artifacts before attempting a full run.

Nested Workflows

A workflow can contain another workflow:

preprocess = Workflow([touch, append], "preprocess")
report = Workflow([preprocess, cat], "report")

Nested workflows are how large pipelines are split into named components. A subworkflow can have its own inputs, outputs, tests, and documentation.

Use nesting when a group of steps has a meaningful name:

  • “preprocess”,

  • “segment”,

  • “measure”,

  • “summarize”,

  • “submit”.

Scatter and Conditionals

CWL supports scatter and conditional execution. Sophios exposes those controls through step-level settings:

echo.scatter_on(echo.inputs.message, method="dotproduct")
echo.when = "$(inputs.message != '')"

These features are powerful and should be used with explicit review of the generated CWL. Scatter changes the shape of values. Conditionals can make outputs nullable. Both affect downstream type compatibility and runtime behavior.

The current lightweight when_pyapi.py example runs locally, but CWL correctly warns that outputs from conditional steps may be null. That warning is expected: conditional execution changes the output type contract.

The lightweight scatter_pyapi.py example uses a CWL Any output because the array-producing tool is intentionally generic. Sophios treats Any as permissive for explicit Python bindings while keeping compiler edge inference conservative.

Generated Files

When you compile or run with disk output enabled, Sophios writes useful artifacts:

  • autogenerated/<workflow>.cwl: compiled root CWL workflow.

  • autogenerated/<workflow>_inputs.yml: generated job inputs.

  • autogenerated/schemas/: generated schemas for .wic validation.

  • cachedir*: CWL runner caches and intermediate files.

  • provenance/: CWL provenance data unless disabled.

  • outdir/: copied primary outputs when output copying is enabled.

  • error_<workflow>.txt: tracebacks from compilation or runner startup failures.

Intermediate compiler .wic trees are not written under autogenerated/ during normal compilation. The CLI writes them only when explicitly invoked with --write_intermediate_wic.

These files are the main debugging surface. A production workflow should be reviewed through its generated artifacts, not only through the Python code.

Troubleshooting

When a workflow fails:

  • Compile before running. If compilation fails, the issue is in the workflow structure or tool contract.

  • Inspect workflow.yaml to check bindings before compilation.

  • Inspect autogenerated/<workflow>.cwl and the generated inputs after compilation.

  • Re-run without quiet settings if you need full CWL runner output.

  • Check whether a container image is being pulled.

  • Delete autogenerated/, cachedir*, outdir/, and provenance/ when you need a clean local run.

  • Replace advanced controls with simpler bindings when isolating a failure.

Maintained Examples

The following examples are intended to be quick to read and quick to run:

  • examples/scripts/helloworld_pyapi.py: minimal Python workflow.

  • examples/scripts/multistep1_pyapi.py: step-to-step file flow.

  • examples/scripts/multistep1_toJson_pyapi.py: in-memory compiled workflow JSON.

  • examples/scripts/multistep_runner_pyapi.py: local runner selection for a multistep workflow.

  • examples/scripts/reusable_interface_pyapi.py: reusable workflow interface example.

  • examples/scripts/scatter_pyapi.py: scatter over array-valued bindings.

  • examples/scripts/when_pyapi.py: conditional execution.

  • examples/scripts/tool_builder_workflow.py: generated CLTs composed in memory.

  • examples/scripts/compute_payload_workflow.py: Python workflow to validated compute payload.

The Ichnaea and SAM3 walkthroughs are larger, production-oriented examples with heavier runtime assumptions.

Next Steps

Continue with: