Canonical Python-to-Compute Flow with ichnaea_compact.py¶
This document describes the recommended Python path in Sophios for taking a tool definition all the way to a validated compute submission payload.
The canonical reference implementation is examples/scripts/ichnaea_compact.py.
The goal of the example is precise:
define a CWL CommandLineTool in Python,
convert that tool into a Sophios Step,
wrap the step in a Workflow,
compile the workflow fully in memory,
package the compiled workflow and job inputs as a schema-valid compute payload,
submit that payload to the compute service chosen by the user.
This guide is intended to be read after:
tool_builder_sam3
tool_builder_workflow
compute_payload_workflow
Those documents explain the individual APIs. This one explains how they fit together in the current end-to-end path.
Scope¶
This document is specifically about the compute submission path currently implemented by Sophios.
That distinction matters for two reasons:
the payload schema is checked into Sophios,
the submission helper expects the HTTP API shape used by that compute service.
This is not a generic remote-execution tutorial, and it does not describe every possible third-party compute backend.
What this example demonstrates¶
ichnaea_compact.py is the canonical example because it captures the intended
division of responsibilities across the Python surface:
tool_builder defines the tool contract
the workflow Python API defines orchestration
ComputeWorkflowPayload defines the submission payload
submit_compute_payload(...) performs submission and status polling
That separation is the architectural point of the example.
Sophios is not asking one object to behave simultaneously as:
a CLT authoring API,
a workflow API,
a JSON payload builder,
and a network client.
Instead, each layer contributes one well-scoped transformation.
The conceptual pipeline¶
The complete flow is:
Python CLT definition
-> Sophios Step
-> Sophios Workflow
-> compiled CWL workflow + job inputs
-> compute payload
-> compute submission
This is the simplest useful mental model for the example.
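The chain can be pictured as a sequence of functions, each consuming the previous stage's output. The sketch below is only an illustration of that shape: the stage functions and the nested dictionaries are hypothetical stand-ins, not Sophios APIs, and the real objects carry far more information.

```python
from functools import reduce

# Hypothetical stand-ins for each stage; none of these are Sophios APIs.
def to_step(clt):
    return {"stage": "step", "from": clt}

def to_workflow(step):
    return {"stage": "workflow", "from": step}

def compile_workflow(workflow):
    return {"stage": "compiled", "from": workflow}

def to_payload(compiled):
    return {"stage": "payload", "from": compiled}

PIPELINE = [to_step, to_workflow, compile_workflow, to_payload]

def run_pipeline(clt):
    """Apply each stage to the previous stage's output, in order."""
    return reduce(lambda value, stage: stage(value), PIPELINE, clt)

def lineage(result):
    """Walk back through the nested records to list the stages crossed."""
    stages = []
    while isinstance(result, dict) and "stage" in result:
        stages.append(result["stage"])
        result = result["from"]
    return list(reversed(stages))

clt = {"stage": "clt", "from": None}
payload = run_pipeline(clt)
print(lineage(payload))
```

Each function accepts exactly one kind of input and produces exactly one kind of output, which is the property the real layers are meant to share.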
Where this document fits¶
The Python documentation now forms a sequence:
tool_builder_sam3 explains how to author one CLT in Python
tool_builder_workflow explains how a built CLT becomes a workflow step
compute_payload_workflow explains the generic compute payload API
this document explains the recommended end-to-end compute submission path
For most users, that means:
learn the CLT builder first
learn the CLT-to-workflow bridge second
use this document when moving to real compute submission
What ichnaea_compact.py is responsible for¶
The compact example is intentionally narrow. It does not attempt to demonstrate every capability of the Python APIs. Instead, it demonstrates one coherent workflow:
define the Ichnaea autosegmentation CLT
turn it into a one-step Sophios workflow
compile that workflow
package the compiled result for compute submission
optionally submit it
That narrow scope is deliberate. It makes the example suitable both as documentation and as a reference client.
Layer 1: the CLT definition¶
The first major function in the example is build_autoseg_CLT().
This function belongs entirely to the tool_builder layer.
It is responsible for the CLT itself:
inputs
outputs
labels and docs
base command
Docker image
GPU hint
staging requirements
resource requests
That boundary is important. The workflow layer should not have to reconstruct tool-level concerns later.
What to look for in the CLT definition¶
When reading build_autoseg_CLT(), focus on three questions:
What is the external command contract?
What runtime assumptions are encoded in the CLT?
Which details are intrinsic to the tool, rather than to any particular workflow?
For example, the following all belong in the CLT:
the input and output zarr directories
the optional model override
the optional tiling and LoRA parameters
the Ichnaea container image
the GPU hint
the InitialWorkDirRequirement
the resource request
Those are properties of the tool itself.
If you need a slower introduction to this style of CLT construction, return to tool_builder_sam3.
Layer 2: the workflow wrapper¶
The second major function is workflow(...).
This function is narrow by design. Its purpose is not to redescribe the tool. Its purpose is to place the tool in a Sophios workflow context.
It does two things:
builds a Step from the generated CLT
binds concrete input values and wraps that step in a Workflow
Build a step from the CLT¶
The boundary crossing is:
autoseg_clt = build_autoseg_CLT()
autoseg = Step(autoseg_clt, step_name="autoseg")
This is the intended handoff from tool_builder to the workflow API.
No intermediate .cwl file is required.
The CLT remains in memory and becomes a normal Sophios Step.
That is a key part of the current design. It keeps the authoring API and the workflow API loosely coupled while still allowing them to work together directly.
Value binding and workflow construction¶
The example then binds the concrete values:
autoseg.output = input_dicts["output_dir"]
autoseg.input = input_dicts["input_dir"]
autoseg.model = input_dicts["model_file"]
and wraps the step in a workflow:
wkflw = Workflow([autoseg], workflow_name)
This workflow layer is intentionally thin. In this example the workflow is mainly an orchestration wrapper around one already well-specified tool.
That is an acceptable and useful use of the workflow API.
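Because the binding lines index input_dicts directly, a missing key surfaces as a bare KeyError partway through binding. A small pre-check can report every missing key at once. This is a minimal sketch, not part of the example script: check_inputs and REQUIRED_KEYS are hypothetical names, and it assumes the three keys used above are all expected to be present.

```python
# Keys taken from the binding lines above; treating all three as
# required is an assumption made for this sketch.
REQUIRED_KEYS = ("input_dir", "output_dir", "model_file")

def check_inputs(input_dicts, required=REQUIRED_KEYS):
    """Raise a KeyError naming every missing key before any binding happens."""
    missing = [key for key in required if key not in input_dicts]
    if missing:
        raise KeyError(f"missing workflow inputs: {', '.join(missing)}")
    return input_dicts

# Usage: validate first, then bind as in the example.
inputs = check_inputs({
    "input_dir": "in.zarr",
    "output_dir": "out.zarr",
    "model_file": "model.pt",
})
```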
Layer 3: compiled workflow output¶
The next boundary is the compiled workflow object:
workflow_json = autoseg_workflow.get_cwl_workflow()
This object contains:
the workflow name
the generated yaml_inputs
the compiled CWL workflow document
The example then separates those pieces explicitly:
workflow_name = workflow_json["name"]
workflow_inputs = copy.deepcopy(workflow_json["yaml_inputs"])
workflow_json.pop("name")
workflow_json.pop("yaml_inputs")
compiled_cwl_workflow = copy.deepcopy(workflow_json)
This split is not incidental. It is the exact boundary between:
the result of workflow compilation
the input expected by the compute payload layer
After the split:
compiled_cwl_workflow is the CWL workflow document
workflow_inputs is the compute job input object
workflow_name is the submission identifier
That explicit separation keeps the transition to the compute layer transparent.
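The split shown above removes keys from workflow_json in place with pop. If the original object is needed afterwards, the same split can be done on a copy. The following is a minimal sketch under the assumption that the compiled object behaves like a plain dictionary with "name" and "yaml_inputs" keys, as the snippet implies; split_compiled is a hypothetical helper and the sample values are illustrative only.

```python
import copy

def split_compiled(workflow_json):
    """Return (name, job_inputs, cwl_document) without mutating the input."""
    doc = copy.deepcopy(workflow_json)
    name = doc.pop("name")
    job_inputs = doc.pop("yaml_inputs")
    return name, job_inputs, doc

# Stand-in for the compiled object; the field values are illustrative only.
compiled = {
    "name": "autoseg_demo",
    "yaml_inputs": {"input_dir": "in.zarr"},
    "cwlVersion": "v1.2",
    "class": "Workflow",
    "steps": {},
}

workflow_name, workflow_inputs, compiled_cwl_workflow = split_compiled(compiled)
```

After the call, compiled still holds all of its original keys, while compiled_cwl_workflow contains only the CWL document.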
Layer 4: compute payload construction¶
The next function, create_compute_payload(...), packages those pieces into a schema-backed ComputeWorkflowPayload.
The construction is intentionally direct:
compute_object = ComputeWorkflowPayload(
    workflow_id=workflow_id,
    cwl_workflow=cwl_workflow,
    cwl_job_inputs=cwl_job_inputs,
    compute_config=ComputeConfig(
        toil=ToilConfig(log_level="INFO"),
        output=OutputConfig.workflow_declared(),
        slurm=SlurmConfig(partition="normal_gpu", cpus_per_task=4),
    ),
)
This is where compute-specific concerns are meant to live:
Toil configuration
output handling
Slurm scheduler settings
The workflow layer should not encode those concerns directly. Likewise, the compute payload layer should not need to know how the workflow was authored.
That is why the payload layer stays focused and declarative.
If you want the lower-level payload API in isolation, see compute_payload_workflow.
Submission behavior¶
The final step is submission:
submit_compute_payload(compute_object, submit_url)
The compute service URL is supplied by the user in Python:
SUBMIT_URL = "http://127.0.0.1:7998/compute/"
This is the correct contract for an example client:
the script does not assume a fixed deployment endpoint
the user decides whether a real submission should occur
leaving SUBMIT_URL = None keeps the script in build-only mode
That makes the script useful both as documentation and as a real client entry point.
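The build-only behavior amounts to a guard around the submission call. The sketch below shows one way such a guard could look. It is an illustration, not the script's actual code: maybe_submit and fake_submit are hypothetical, and in a real script the submit argument would be submit_compute_payload.

```python
def maybe_submit(payload, submit_url, submit):
    """Call submit(payload, submit_url) only when a URL was supplied."""
    if submit_url is None:
        return None  # build-only mode: nothing leaves the process
    return submit(payload, submit_url)

# A recording stand-in so the sketch runs without a compute service.
calls = []

def fake_submit(payload, url):
    calls.append(url)
    return "submitted"

build_only = maybe_submit({"demo": True}, None, fake_submit)
submitted = maybe_submit({"demo": True}, "http://127.0.0.1:7998/compute/", fake_submit)
```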
Why this path is reliable¶
The value of this design is that each boundary can be checked before the next one is crossed.
Tool boundary¶
The CLT can be validated as a real CWL CommandLineTool.
Workflow boundary¶
The workflow can be compiled fully in memory before any submission occurs.
Compute boundary¶
The payload is constructed through ComputeWorkflowPayload, which validates the
result against the checked-in compute schema.
This means validation is incremental:
first confirm the tool
then confirm the workflow
then confirm the payload
only then submit
That is more reliable than assembling one large opaque object at the end.
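The incremental order can be expressed as a list of checks that stops at the first failure, so the error names the boundary that broke. This is a generic sketch: run_checks and the placeholder check functions are hypothetical, and in practice each check would wrap the corresponding Sophios call.

```python
def run_checks(checks):
    """Run (label, check) pairs in order; stop at the first failure."""
    passed = []
    for label, check in checks:
        try:
            check()
        except Exception as exc:
            return passed, f"{label}: {exc}"
        passed.append(label)
    return passed, None

# Placeholder checks standing in for real validation calls.
def ok():
    pass

def bad_payload():
    raise ValueError("payload does not match schema")

passed, failure = run_checks([
    ("tool", ok),
    ("workflow", ok),
    ("payload", bad_payload),
    ("submit", ok),
])
print(passed, failure)
```

Here the submit check never runs, because the payload check failed first.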
The verification-oriented sibling: ichnaea_integrated.py¶
The compact example is the canonical path because it stays in memory as long as possible.
However, Sophios also provides examples/scripts/ichnaea_integrated.py for cases where explicit artifacts are desirable.
It follows the same overall logic, but writes outputs at each major boundary.
Generated CLT¶
The CLT is written to disk with validation:
autoseg_clt.save(
    Path(__file__).with_name("built-ichnaea-autosegmentation.cwl"),
    validate=True,
)
Compiled workflow artifacts¶
The workflow is compiled with disk output enabled:
autoseg_workflow.compile(write_to_disk=True)
This writes the compiled workflow artifacts under autogenerated/.
Compute JSON¶
The exact compute payload is written before submission:
with open(f"compute_{workflow_name}_integrated.json", "w", encoding="utf-8") as f:
    json.dump(compute_json, f, indent=4, sort_keys=True)
That makes ichnaea_integrated.py the appropriate choice when:
the generated CLT must be reviewed directly
the compiled workflow artifacts must be preserved
the exact submission body must be inspected or archived
In other words:
use ichnaea_compact.py as the example to follow when creating your own end-to-end Python workflow submission scripts
use ichnaea_integrated.py when the same structure is needed but the CLT, compiled workflow artifacts, and final payload must also be written to disk
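When the payload is archived for inspection, reading the file back is a cheap way to confirm that what was written matches what was built. The sketch below reuses the file-naming pattern from the integrated script. write_payload is a hypothetical helper, and the payload dictionary is an illustrative stand-in rather than a real compute payload.

```python
import json
import tempfile
from pathlib import Path

def write_payload(compute_json, workflow_name, out_dir):
    """Write the payload with stable key order and return the file path."""
    path = Path(out_dir) / f"compute_{workflow_name}_integrated.json"
    with open(path, "w", encoding="utf-8") as f:
        json.dump(compute_json, f, indent=4, sort_keys=True)
    return path

# Illustrative stand-in; a real payload follows the checked-in schema.
compute_json = {"example_key": "example_value", "nested": {"b": 2, "a": 1}}

with tempfile.TemporaryDirectory() as tmp:
    path = write_payload(compute_json, "demo", tmp)
    written_name = path.name
    round_trip = json.loads(path.read_text(encoding="utf-8"))
```

Writing with sort_keys=True keeps the file stable across runs, which makes archived payloads easier to diff.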
Recommended reading order¶
For a first reading of the example, the most useful order is:
build_autoseg_CLT()
workflow(...)
create_compute_payload(...)
main()
That order follows the actual transformation pipeline:
tool definition
workflow construction
payload construction
orchestration and optional submission
Practical guidance¶
Use ichnaea_compact.py as the example to follow when creating or adapting
your own end-to-end compute submission scripts. Its structure is the
recommended baseline:
define the CLT in Python
convert it to a Sophios workflow
compile the workflow in memory
construct the compute payload from the compiled result
submit only when a concrete compute service URL is supplied
Use ichnaea_integrated.py when the same overall structure is required but the
workflow must also produce explicit artifacts:
the generated CLT on disk
the compiled workflow artifacts on disk
the exact compute payload JSON on disk
When diagnosing problems, the most effective order is:
validate the CLT
inspect the compiled workflow
inspect the compute payload
then investigate submission or runtime behavior
That keeps the investigation aligned with the actual system boundaries.
Commands¶
Compact path:
python examples/scripts/ichnaea_compact.py
Integrated path:
python examples/scripts/ichnaea_integrated.py
The integrated command writes the generated CLT, compiled workflow artifacts,
and compute JSON without submission.
To submit from either script, set SUBMIT_URL near the top of the file before
running it.
Summary¶
ichnaea_compact.py is the canonical Sophios Python example for
compute submission because it keeps the four layers of the system clear:
CLT authoring
workflow composition
payload construction
submission
That clarity is the main value of the example. It makes the path from Python-authored tool to compute payload direct, verifiable, and suitable for both documentation and real client use.