Python Workflow API¶
The Python Workflow API is the primary way to author Sophios workflows. It lets you describe a workflow in the same place where you already have names, functions, imports, tests, and reusable Python helpers, while still compiling to portable CWL artifacts.
The API is intentionally centered on a focused set of concepts:
a
CommandLineTooldescribes one executable tool,a
Stepplaces that tool inside a workflow graph,a
Workflowowns the graph, named outputs, and compiled artifacts,assignments such as
cat.inputs.file = touch.outputs.filerecord workflow bindings rather than executing commands immediately.
If you remember one rule, make it this: Python describes the graph; compilation turns that graph into CWL and job inputs; execution happens after that.
Scope¶
This page covers:
what a
Steprepresents,what a
Workflowrepresents,how literal values bind to step inputs,
how a step output becomes a later step input,
what compilation produces,
how to compile and inspect workflow artifacts,
when to run locally,
when to keep compiled CWL in memory for submission payload construction.
It does not cover every CWL feature. Advanced YAML controls, static dispatch, and program synthesis are documented separately in Advanced YAML and Operations.
Working Directory Assumption¶
Most examples in this guide assume you are running from the repository root:
cd /path/to/sophios
That makes checked-in adapter paths such as cwl_adapters/echo.cwl resolve
cleanly.
When writing reusable scripts, prefer Path objects over fragile string paths:
from pathlib import Path
echo_tool = Path("cwl_adapters") / "echo.cwl"
Step 1: Load a Tool as a Step¶
A Step is a CWL CommandLineTool placed inside a workflow context.
from pathlib import Path
from sophios.apis.python.workflow import Step
echo = Step(Path("cwl_adapters") / "echo.cwl")
At this point, Sophios has loaded the tool contract. It knows the tool has an
input named message and an output named stdout.
You can inspect the ports:
for input_port in echo.inputs:
print(input_port.name, input_port.parameter_type)
for output_port in echo.outputs:
print(output_port.name, output_port.parameter_type)
This interface is the contract Sophios uses for validation, binding, edge inference, and generated CWL.
Step 2: Bind a Literal Input¶
Bind a value to a step input:
echo.inputs.message = "hello from Sophios"
That assignment does not run the tool. It records a workflow binding:
echo.message <- "hello from Sophios"
Sophios will later turn that binding into the appropriate generated CWL job input.
The older shorthand still works:
echo.message = "hello from Sophios"
The docs use the explicit form because it is clearer:
step.inputs.<name>is where values enter a step,step.outputs.<name>is where values leave a step.
Step 3: Put Steps in a Workflow¶
Wrap the step in a workflow:
from sophios.apis.python.workflow import Workflow
workflow = Workflow([echo], "hello_python")
A workflow has a name and an ordered list of children. Children may be concrete steps or nested workflows.
Compile without running:
workflow.write_artifacts()
Run locally:
workflow.run()
The minimal complete example is therefore:
from pathlib import Path
from sophios.apis.python.workflow import Step, Workflow
def build_workflow() -> Workflow:
echo = Step(Path("cwl_adapters") / "echo.cwl")
echo.inputs.message = "hello from Sophios"
return Workflow([echo], "hello_python")
if __name__ == "__main__":
build_workflow().run()
This example keeps the moving parts visible. A runnable workflow needs a tool,
an input binding, and a Workflow that can compile and run.
Step 4: Link Two Steps¶
Most workflows become useful when one step consumes another step’s output.
from pathlib import Path
from sophios.apis.python.workflow import Step, Workflow
touch = Step(Path("cwl_adapters") / "touch.cwl")
touch.inputs.filename = "empty.txt"
append = Step(Path("cwl_adapters") / "append.cwl")
append.inputs.file = touch.outputs.file
append.inputs.str = "Hello"
cat = Step(Path("cwl_adapters") / "cat.cwl")
cat.inputs.file = append.outputs.file
workflow = Workflow([touch, append, cat], "multistep_python")
workflow.write_artifacts()
Read the bindings as arrows:
touch.outputs.file -> append.inputs.file
append.outputs.file -> cat.inputs.file
The Python assignment creates the edge. You do not need to hand-write a CWL
source field.
Linear Edge Inference¶
The Python API can also use the same edge inference mechanism as .wic
workflows. In a linear workflow, if a required step input is not bound, Sophios
passes that missing input to the compiler. The compiler then looks backward
through earlier steps and connects the most recent compatible output.
That means this also works:
touch = Step(Path("cwl_adapters") / "touch.cwl")
touch.inputs.filename = "empty.txt"
append = Step(Path("cwl_adapters") / "append.cwl")
append.inputs.str = "Hello"
cat = Step(Path("cwl_adapters") / "cat.cwl")
workflow = Workflow([touch, append, cat], "multistep_python")
workflow.write_artifacts()
Here Sophios infers:
touch.outputs.file -> append.inputs.file
append.outputs.file -> cat.inputs.file
Use inference when the workflow is a clear linear chain and the generated graph is easy to inspect. Prefer explicit bindings when multiple prior outputs could match, when names matter, or when the workflow definition must make data flow unambiguous without relying on generated artifacts.
Binding Types¶
There are two common input binding patterns.
Binding |
Python shape |
Meaning |
|---|---|---|
Literal value |
|
The value is known now. |
Step output |
|
The value comes from an earlier step. |
This table is one of the most important concepts in the Python API. Most workflow code is a readable sequence of these bindings.
What Bindings Become¶
Python assignments in a Sophios workflow are declarative. They do not execute a command when the assignment runs. They record enough information for the compiler to generate a CWL workflow and a CWL job input object.
For example, this literal binding:
echo.inputs.message = "hello"
becomes part of the generated job input data. Sophios keeps the value out of the CWL tool definition so the compiled workflow and runtime inputs remain separate.
This edge binding:
cat.inputs.file = append.outputs.file
becomes a CWL workflow source relationship. The compiled workflow connects the
concrete output port from the append step to the concrete input port on the
cat step.
The practical rule is: write Python that describes the graph, then inspect the generated CWL when exact execution behavior matters.
Named Workflow Outputs¶
Outputs should also be deliberate.
workflow.outputs.captured_stdout = echo.outputs.stdout
That line exposes the step output as a stable workflow result.
Use named workflow outputs when a downstream user, test, or execution service needs a stable result name.
Export .wic From Python¶
Every Workflow can export a .wic source representation. For a quick
in-memory view, inspect workflow.yaml:
print(workflow.yaml)
This shows the file representation Sophios can derive from the current Python
workflow. It is useful when debugging surprising bindings before compilation or
when you want to compare Python-authored workflows with .wic workflows.
When you want a real .wic file, use write_wic():
workflow.write_wic("hello_python.wic")
This writes a source .wic workflow from the Python object. It does not compile
the workflow and it does not write generated CWL. Literal bindings, named
outputs, explicit edges, and intentionally unbound linear inputs are preserved
in the .wic representation so the normal Sophios compiler can still apply
edge inference later.
If you need the text instead of a file:
wic_text = workflow.to_wic()
For nested workflows, write_wic() embeds subworkflows in the root document by
default. If you want a sibling-file tree instead, pass
inline_subworkflows=False:
workflow.write_wic("workflows", inline_subworkflows=False)
Compile Paths¶
Sophios gives you two common compile paths.
Compilation is the point where the high-level Python workflow becomes concrete CWL. During compilation, Sophios validates the workflow structure, resolves explicit bindings, applies edge inference where inputs were intentionally left unbound, emits runtime input data for literal values, and produces a CWL workflow document that can be inspected or executed by a CWL runner.
Write Artifacts to Disk¶
workflow.write_artifacts()
Use this when you want to inspect generated files. Typical artifacts include:
autogenerated/<workflow>.cwl,autogenerated/<workflow>_inputs.yml,Graphviz files when graph output is enabled.
This is the best path when generated artifacts need to be reviewed, committed to a test fixture, or inspected during debugging.
workflow.compile(write_to_disk=True) is still available. write_artifacts()
is the clearer public method for the common “compile and inspect files” path.
Neither method writes intermediate .wic compiler trees by default. Use
workflow.write_wic(...) when you want a source .wic file.
Keep Compiled CWL in Memory¶
compiled = workflow.get_cwl_workflow()
Use this when the next step is another Python operation, such as packaging a submission payload.
The returned object contains:
the compiled CWL workflow fields,
the workflow name,
generated
yaml_inputs.
Submission payloads commonly expect the CWL workflow document and job inputs as separate pieces. The compute payload guide shows one concrete service-oriented payload shape in detail.
Running Locally¶
Run locally with:
workflow.run()
Sophios supports two local CWL runners through Workflow.run():
cwltool, the default local runner,toil-cwl-runner, for local execution through Toil.
Runtime options can be passed through run_args_dict:
workflow.run(
run_args_dict={
"cwl_runner": "cwltool",
"container_engine": "docker",
"copy_output_files": "yes",
}
)
To run the same workflow with Toil, switch the runner:
workflow.run(
run_args_dict={
"cwl_runner": "toil-cwl-runner",
"container_engine": "docker",
"logLevel": "INFO",
}
)
Sophios uses recognized local-run options for setup and passes runner-specific key/value options to the selected CWL runner. See Running a Multistep Workflow Locally for a complete example.
Local execution is best for:
verifying development-scale workflows,
verifying tool contracts,
checking generated outputs before submission,
verifying end-to-end behavior in a controlled environment.
If the workflow needs a large container image or special hardware, compile first and inspect the generated artifacts before attempting a full run.
Nested Workflows¶
A workflow can contain another workflow:
preprocess = Workflow([touch, append], "preprocess")
report = Workflow([preprocess, cat], "report")
Nested workflows are how large pipelines are split into named components. A subworkflow can have its own inputs, outputs, tests, and documentation.
Use nesting when a group of steps has a meaningful name:
“preprocess”,
“segment”,
“measure”,
“summarize”,
“submit”.
Scatter and Conditionals¶
CWL supports scatter and conditional execution. Sophios exposes those controls through step-level settings:
echo.scatter_on(echo.inputs.message, method="dotproduct")
echo.when = "$(inputs.message != '')"
These features are powerful and should be used with explicit review of the generated CWL. Scatter changes the shape of values. Conditionals can make outputs nullable. Both affect downstream type compatibility and runtime behavior.
The current lightweight when_pyapi.py example runs locally, but CWL correctly
warns that outputs from conditional steps may be null. That warning is
expected: conditional execution changes the output type contract.
The lightweight scatter_pyapi.py example uses a CWL Any output because the
array-producing tool is intentionally generic. Sophios treats Any as
permissive for explicit Python bindings while keeping compiler edge inference
conservative.
Generated Files¶
When you compile or run with disk output enabled, Sophios writes useful artifacts:
autogenerated/<workflow>.cwl: compiled root CWL workflow.autogenerated/<workflow>_inputs.yml: generated job inputs.autogenerated/schemas/: generated schemas for.wicvalidation.cachedir*: CWL runner caches and intermediate files.provenance/: CWL provenance data unless disabled.outdir/: copied primary outputs when output copying is enabled.error_<workflow>.txt: tracebacks from compilation or runner startup failures.
Intermediate compiler .wic trees are not written under autogenerated/ during
normal compilation. The CLI writes them only when explicitly invoked with
--write_intermediate_wic.
These files are the main debugging surface. A production workflow should be reviewed through its generated artifacts, not only through the Python code.
Troubleshooting¶
When a workflow fails:
Compile before running. If compilation fails, the issue is in the workflow structure or tool contract.
Inspect
workflow.yamlto check bindings before compilation.Inspect
autogenerated/<workflow>.cwland the generated inputs after compilation.Re-run without quiet settings if you need full CWL runner output.
Check whether a container image is being pulled.
Delete
autogenerated/,cachedir*,outdir/, andprovenance/when you need a clean local run.Replace advanced controls with simpler bindings when isolating a failure.
Maintained Examples¶
The following examples are intended to be quick to read and quick to run:
examples/scripts/helloworld_pyapi.py: minimal Python workflow.examples/scripts/multistep1_pyapi.py: step-to-step file flow.examples/scripts/multistep1_toJson_pyapi.py: in-memory compiled workflow JSON.examples/scripts/multistep_runner_pyapi.py: local runner selection for a multistep workflow.examples/scripts/reusable_interface_pyapi.py: reusable workflow interface example.examples/scripts/scatter_pyapi.py: scatter over array-valued bindings.examples/scripts/when_pyapi.py: conditional execution.examples/scripts/tool_builder_workflow.py: generated CLTs composed in memory.examples/scripts/compute_payload_workflow.py: Python workflow to validated compute payload.
The Ichnaea and SAM3 walkthroughs are larger, production-oriented examples with heavier runtime assumptions.
Next Steps¶
Continue with:
Building Tool Contracts in Python to author tools.
Using Tool Builder and the Workflow Python API Together to build tools in memory and compose them immediately.
From Python Workflow to Compute Payload to prepare validated submission payloads from compiled workflows.
Advanced YAML and Operations when you need
.wicfiles, schema validation, inference controls, or audit-friendly artifacts.