# Building Tool Contracts in Python This walkthrough shows how to build a real CWL `CommandLineTool` using `sophios.apis.python.tool_builder`. The design goal is simple: - the required structure of the tool should be obvious at a glance, - input and output names should come from Python names rather than raw string keys, - and optional CWL details should feel like optional add-ons, not required boilerplate. The full working example lives in [examples/scripts/sam3_tool_builder.py](https://github.com/PolusAI/sophios/blob/main/examples/scripts/sam3_tool_builder.py). ## The core idea There are only three required pieces: 1. a tool name, 2. an `Inputs(...)` collection, 3. an `Outputs(...)` collection. That means the basic shape always looks like this: ```python inputs = Inputs( input=Input(cwl.directory, position=1), output=Input(cwl.directory, position=2), ) outputs = Outputs( output=Output(cwl.directory, from_input=inputs.output), ) tool = CommandLineTool("example", inputs, outputs) ``` Everything else is optional and chainable: ```python tool = ( CommandLineTool("example", inputs, outputs) .base_command("python", "main.py") .docker("python:3.12") .resources(cores=2, ram=4096) ) ``` That split is intentional. The constructor shows the tool contract. The chained calls describe the runtime and metadata details around that contract. ## Why this is easier to read Less structured builder code asks you to mentally assemble the CLT while reading a long chain. The new style makes the shape visible immediately: - `Inputs(...)` gives names to inputs using Python keywords, - `Outputs(...)` gives names to outputs the same way, - `CommandLineTool(...)` requires those named collections up front. That helps non-experts because the code now reads more like: "this tool has these inputs and these outputs" and less like: "start a builder, keep chaining methods, and hope the required bits showed up somewhere in the middle." ## Named inputs without raw string keys One of the important design constraints is that users should not have to write raw string names for input and output definitions. So instead of: ```python inputs = { "input": ..., "output": ..., } ``` you write: ```python inputs = Inputs( input=Input(cwl.directory, position=1), output=Input(cwl.directory, position=2), ) ``` Those Python keyword names become the CWL parameter names. The same thing applies to outputs: ```python outputs = Outputs( output=Output(cwl.directory, from_input=inputs.output), ) ``` Notice that `from_input=inputs.output` uses a real named input reference, not a raw string like `"output"`. The other important convention is that CWL types live under the `cwl` namespace: ```python cwl.int cwl.float cwl.file cwl.directory ``` That keeps CWL vocabulary visually separate from Python builtins and makes intent easier to scan. ## How to think about inputs Each input answers two questions: 1. what type of thing is this? 2. how does the underlying application receive it? Examples: ```python Input(cwl.directory, position=1) Input(cwl.file, flag="--model", required=False) Input(cwl.int, flag="--tile-size", required=False) ``` These read very close to application intent: - positional directory argument, - optional file passed with `--model`, - optional integer passed with `--tile-size`. Optional metadata can then be chained: ```python Input(cwl.file, flag="--model", required=False).label("Model override file").doc("Path to sam3.pt") ``` That is the intended use of chaining in this API: optional polish on top of a complete required core. ## How to think about outputs Outputs follow the same pattern: ```python Output(cwl.directory, from_input=inputs.output) Output(cwl.file, glob="results.json") Output.stdout() ``` Again, the goal is to describe what the output means, not to hand-assemble `outputBinding` YAML. ## The SAM3 example ```python from pathlib import Path from sophios.apis.python.tool_builder import CommandLineTool, Input, Inputs, Output, Outputs, cwl inputs = Inputs( input=Input(cwl.directory, position=1).label("Input Zarr dataset").doc("Path to input zarr dataset"), output=Input(cwl.directory, position=2).label("Output segmentation Zarr").doc( "Path for output segmentation zarr" ), model=Input(cwl.file, flag="--model", required=False) .label("Model override file") .doc("Path containing sam3.pt to override baked-in models/sam3"), tile_size=Input(cwl.int, flag="--tile-size", required=False) .label("Tile size") .doc("Tile size for large slices (default 1024)"), overlap=Input(cwl.int, flag="--overlap", required=False) .label("Tile overlap") .doc("Overlap between adjacent tiles in pixels (default 128)"), iou_threshold=Input(cwl.float, flag="--iou-threshold", required=False) .label("IoU threshold") .doc("IoU threshold for matching labels across tiles (default 0.5)"), batch_size=Input(cwl.int, flag="--batch-size", required=False) .label("Batch size") .doc("Number of tiles per GPU forward pass (default 8)"), lora_weights=Input(cwl.file, flag="--lora-weights", required=False) .label("LoRA weights") .doc("Path to LoRA adapter weights (.pt file) - optional"), lora_rank=Input(cwl.int, flag="--lora-rank", required=False) .label("LoRA rank") .doc("LoRA rank used when lora_weights is set (default 16)"), lora_alpha=Input(cwl.int, flag="--lora-alpha", required=False) .label("LoRA alpha") .doc("LoRA alpha scaling factor used when lora_weights is set (default 32)"), ) outputs = Outputs( output=Output(cwl.directory, from_input=inputs.output).label("Output segmentation Zarr"), ) tool = ( CommandLineTool("sam3_ome_zarr_autosegmentation", inputs, outputs) .describe( "SAM3 OME Zarr autosegmentation", "Run SAM3 autosegmentation on a zarr volume.\n" "Models are baked into the container image at models/sam3, " "so no model staging is required.", ) .edam() .gpu(cuda_version_min="11.7", compute_capability="3.0", device_count_min=2) .docker("polusai/ichnaea-api:latest") .stage(inputs.output, writable=True) .stage(inputs.input) .resources(cores=4, ram=64000) .base_command( "/backend/.venv/bin/python", "/backend/dagster_pipelines/jobs/autosegmentation/logic.py", ) ) output_path = Path("sam3_ome_zarr_autosegmentation.cwl") tool.save(output_path, validate=True) ``` ## What the builder is hiding for you This API is supposed to absorb the repetitive CWL details that regularly trip people up: - nullable unions for optional values, - `inputBinding.prefix` vs `inputBinding.position`, - `outputBinding.glob` expressions derived from input names, - `InitialWorkDirRequirement` entries for staged inputs, - `InlineJavascriptRequirement` when helper-generated expressions are present, - namespaced hints such as `cwltool:CUDARequirement`. That means most users only need to think about: - what command runs, - what the inputs are, - what the outputs are, - and which optional runtime constraints apply. ## Sane defaults For most tools, the happy path is: ```python CommandLineTool(name, inputs, outputs).base_command(...).docker(...).resources(...) ``` Everything else is optional. In particular: - `label` is optional, - `doc` is optional, - namespaces and schemas are optional, - `InlineJavascriptRequirement` is added automatically when helper-generated expressions are present, - resource requirements are optional, - EDAM metadata is optional and available through `.edam()`. ## Validation And Error Prevention There are two ways the builder helps users catch mistakes early. ### 1. The API narrows the common error surface The builder gives you named operations rather than raw nested dictionaries, which means fewer routine mistakes: - malformed optional types, - incorrect binding placement, - incorrect output glob expressions, - missing requirement wrappers, - missing namespace setup for common hints. ### 2. Validation is built in When you call: ```python tool.save(output_path, validate=True) ``` or: ```python tool.validate() ``` Sophios validates the generated CLT as a real CWL `CommandLineTool`. Sophios checks the concrete tool document it will save or hand to the workflow API, so mistakes show up at the tool boundary instead of later inside a larger workflow. ## Escape hatches The main API is intentionally structured, but escape hatches still exist for advanced cases: - `requirement(...)` - `hint(...)` - `argument(...)` - `extra(...)` Those are for the unusual edges of CWL. They should be the exception, not the starting point. ## Using a built CLT in the workflow API The CLT builder can also hand off directly to the workflow Python API without writing a `.cwl` file: ```python tool = CommandLineTool( "echo_tool", Inputs(message=Input(cwl.string, position=1)), Outputs(out=Output.stdout()), ).stdout("stdout.txt") step = Step(tool, step_name="say_hello") step.inputs.message = "hello" workflow = Workflow([step], "wf") ``` The handoff is direct: the same built tool can be validated, written to disk, or wrapped in `Step(tool, step_name=...)` and composed into a workflow without creating an intermediate `.cwl` file first. ## Run the example From the repository root: ```bash python examples/scripts/sam3_tool_builder.py ``` The script writes and validates the generated CLT by default. To change the output path or skip validation, edit the constants near the top of the script.