Canonical Python-to-Compute Flow with ichnaea_compact.py

This document describes the recommended Python path in Sophios for taking a tool definition all the way to a validated compute submission payload.

The canonical reference implementation is examples/scripts/ichnaea_compact.py.

The goal of the example is precise:

  1. define a CWL CommandLineTool in Python,

  2. convert that tool into a Sophios Step,

  3. wrap the step in a Workflow,

  4. compile the workflow fully in memory,

  5. package the compiled workflow and job inputs as a schema-valid compute payload,

  6. submit that payload to the compute service chosen by the user.

This guide is intended to be read after:

Those documents explain the individual APIs. This one explains how they fit together in the current end-to-end path.

Scope

This document is specifically about the compute submission path currently implemented by Sophios.

That distinction matters for two reasons:

  • the payload schema is checked into Sophios,

  • the submission helper expects the HTTP API shape used by that compute service.

This is not a generic remote-execution tutorial, and it does not describe every possible third-party compute backend.

What this example demonstrates

ichnaea_compact.py is the canonical example because it captures the intended division of responsibilities across the Python surface:

  • tool_builder defines the tool contract

  • the workflow Python API defines orchestration

  • ComputeWorkflowPayload defines the submission payload

  • submit_compute_payload(...) performs submission and status polling

That separation is the architectural point of the example.

Sophios is not asking one object to behave simultaneously as:

  • a CLT authoring API,

  • a workflow API,

  • a JSON payload builder,

  • and a network client.

Instead, each layer contributes one well-scoped transformation.

The conceptual pipeline

The complete flow is:

Python CLT definition
    -> Sophios Step
    -> Sophios Workflow
    -> compiled CWL workflow + job inputs
    -> compute payload
    -> compute submission

This is the simplest useful mental model for the example.

Where this document fits

The Python documentation now forms a sequence:

  1. tool_builder_sam3 explains how to author one CLT in Python

  2. tool_builder_workflow explains how a built CLT becomes a workflow step

  3. compute_payload_workflow explains the generic compute payload API

  4. this document explains the recommended end-to-end compute submission path

For most users, that means:

  • learn the CLT builder first

  • learn the CLT-to-workflow bridge second

  • use this document when moving to real compute submission

What ichnaea_compact.py is responsible for

The compact example is intentionally narrow. It does not attempt to demonstrate every capability of the Python APIs. Instead, it demonstrates one coherent workflow:

  • define the Ichnaea autosegmentation CLT

  • turn it into a one-step Sophios workflow

  • compile that workflow

  • package the compiled result for compute submission

  • optionally submit it

That narrow scope is deliberate. It makes the example suitable both as documentation and as a reference client.

Layer 1: the CLT definition

The first major function in the example is build_autoseg_CLT().

This function belongs entirely to the tool_builder layer. It is responsible for the CLT itself:

  • inputs

  • outputs

  • labels and docs

  • base command

  • Docker image

  • GPU hint

  • staging requirements

  • resource requests

That boundary is important. The workflow layer should not have to reconstruct tool-level concerns later.

What to look for in the CLT definition

When reading build_autoseg_CLT(), focus on three questions:

  1. What is the external command contract?

  2. What runtime assumptions are encoded in the CLT?

  3. Which details are intrinsic to the tool, rather than to any particular workflow?

For example, the following all belong in the CLT:

  • the input and output zarr directories

  • the optional model override

  • the optional tiling and LoRA parameters

  • the Ichnaea container image

  • the GPU hint

  • the InitialWorkDirRequirement

  • the resource request

Those are properties of the tool itself.

If you need a slower introduction to this style of CLT construction, return to tool_builder_sam3.

Layer 2: the workflow wrapper

The second major function is workflow(...).

This function is narrow by design. Its purpose is not to redescribe the tool. Its purpose is to place the tool in a Sophios workflow context.

It does two things:

  1. builds a Step from the generated CLT

  2. binds concrete input values and wraps that step in a Workflow

Build a step from the CLT

The boundary crossing is:

autoseg_clt = build_autoseg_CLT()
autoseg = Step(autoseg_clt, step_name="autoseg")

This is the intended handoff from tool_builder to the workflow API.

No intermediate .cwl file is required. The CLT remains in memory and becomes a normal Sophios Step.

That is a key part of the current design. It keeps the authoring API and the workflow API loosely coupled while still allowing them to work together directly.

Value binding and workflow construction

The example then binds the concrete values:

autoseg.output = input_dicts["output_dir"]
autoseg.input = input_dicts["input_dir"]
autoseg.model = input_dicts["model_file"]

and wraps the step in a workflow:

wkflw = Workflow([autoseg], workflow_name)

This workflow layer is intentionally thin. In this example the workflow is mainly an orchestration wrapper around one already well-specified tool.

That is an acceptable and useful use of the workflow API.

Layer 3: compiled workflow output

The next boundary is the compiled workflow object:

workflow_json = autoseg_workflow.get_cwl_workflow()

This object contains:

  • the workflow name

  • the generated yaml_inputs

  • the compiled CWL workflow document

The example then separates those pieces explicitly:

workflow_name = workflow_json["name"]
workflow_inputs = copy.deepcopy(workflow_json["yaml_inputs"])
workflow_json.pop("name")
workflow_json.pop("yaml_inputs")
compiled_cwl_workflow = copy.deepcopy(workflow_json)

This split is not incidental. It is the exact boundary between:

  • the result of workflow compilation

  • the input expected by the compute payload layer

After the split:

  • compiled_cwl_workflow is the CWL workflow document

  • workflow_inputs is the compute job input object

  • workflow_name is the submission identifier

That explicit separation keeps the transition to the compute layer transparent.

Layer 4: compute payload construction

The next function, create_compute_payload(...), packages those pieces into a schema-backed ComputeWorkflowPayload.

The construction is intentionally direct:

compute_object = ComputeWorkflowPayload(
    workflow_id=workflow_id,
    cwl_workflow=cwl_workflow,
    cwl_job_inputs=cwl_job_inputs,
    compute_config=ComputeConfig(
        toil=ToilConfig(log_level="INFO"),
        output=OutputConfig.workflow_declared(),
        slurm=SlurmConfig(partition="normal_gpu", cpus_per_task=4),
    ),
)

This is where compute-specific concerns are meant to live:

  • Toil configuration

  • output handling

  • Slurm scheduler settings

The workflow layer should not encode those concerns directly. Likewise, the compute payload layer should not need to know how the workflow was authored.

That is why the payload layer stays focused and declarative.

If you want the lower-level payload API in isolation, see compute_payload_workflow.

Submission behavior

The final step is submission:

submit_compute_payload(compute_object, submit_url)

The compute service URL is supplied by the user in Python:

SUBMIT_URL = "http://127.0.0.1:7998/compute/"

This is the correct contract for an example client:

  • the script does not assume a fixed deployment endpoint

  • the user decides whether a real submission should occur

  • leaving SUBMIT_URL = None keeps the script in build-only mode

That makes the script useful both as documentation and as a real client entry point.

Why this path is reliable

The value of this design is that each boundary can be checked before the next one is crossed.

Tool boundary

The CLT can be validated as a real CWL CommandLineTool.

Workflow boundary

The workflow can be compiled fully in memory before any submission occurs.

Compute boundary

The payload is constructed through ComputeWorkflowPayload, which validates the result against the checked-in compute schema.

This means validation is incremental:

  • first confirm the tool

  • then confirm the workflow

  • then confirm the payload

  • only then submit

That is more reliable than assembling one large opaque object at the end.

The verification-oriented sibling: ichnaea_integrated.py

The compact example is the canonical path because it stays in memory as long as possible.

However, Sophios also provides examples/scripts/ichnaea_integrated.py for cases where explicit artifacts are desirable.

It follows the same overall logic, but writes outputs at each major boundary.

Generated CLT

The CLT is written to disk with validation:

autoseg_clt.save(
    Path(__file__).with_name("built-ichnaea-autosegmentation.cwl"),
    validate=True,
)

Compiled workflow artifacts

The workflow is compiled with disk output enabled:

autoseg_workflow.compile(write_to_disk=True)

This writes the compiled workflow artifacts under autogenerated/.

Compute JSON

The exact compute payload is written before submission:

with open(f"compute_{workflow_name}_integrated.json", "w", encoding="utf-8") as f:
    json.dump(compute_json, f, indent=4, sort_keys=True)

That makes ichnaea_integrated.py the appropriate choice when:

  • the generated CLT must be reviewed directly

  • the compiled workflow artifacts must be preserved

  • the exact submission body must be inspected or archived

In other words:

  • use ichnaea_compact.py as the example to follow when creating your own end-to-end Python workflow submission scripts

  • use ichnaea_integrated.py when the same structure is needed but the CLT, compiled workflow artifacts, and final payload must also be written to disk

Practical guidance

Use ichnaea_compact.py as the example to follow when creating or adapting your own end-to-end compute submission scripts. Its structure is the recommended baseline:

  • define the CLT in Python

  • convert it to a Sophios workflow

  • compile the workflow in memory

  • construct the compute payload from the compiled result

  • submit only when a concrete compute service URL is supplied

Use ichnaea_integrated.py when the same overall structure is required but the workflow must also produce explicit artifacts:

  • the generated CLT on disk

  • the compiled workflow artifacts on disk

  • the exact compute payload JSON on disk

When diagnosing problems, the most effective order is:

  1. validate the CLT

  2. inspect the compiled workflow

  3. inspect the compute payload

  4. then investigate submission or runtime behavior

That keeps the investigation aligned with the actual system boundaries.

Commands

Compact path:

python examples/scripts/ichnaea_compact.py

Integrated path:

python examples/scripts/ichnaea_integrated.py

The integrated command writes the generated CLT, compiled workflow artifacts, and compute JSON without submission. To submit from either script, set SUBMIT_URL near the top of the file before running it.

Summary

ichnaea_compact.py is the canonical Sophios Python example for compute submission because it keeps the four layers of the system clear:

  • CLT authoring

  • workflow composition

  • payload construction

  • submission

That clarity is the main value of the example. It makes the path from Python-authored tool to compute payload direct, verifiable, and suitable for both documentation and real client use.