Advanced YAML and Operations
The .wic YAML format is an advanced Sophios usage mode.
The recommended authoring path is Python. YAML becomes valuable when you need a workflow to be a plain file: easy to archive, diff, validate, run headlessly, or inspect without importing project-specific Python code.
Use this guide when you need:
standalone .wic workflows,
CI or batch execution from files,
audit-friendly workflow artifacts,
explicit inspection of inference and generated CWL,
schema-backed editor validation,
low-level Sophios features such as namespaces, static dispatch, and metadata annotations.
Minimal .wic Workflow
steps:
- echo:
    in:
      message: !ii Hello World
Run it from the command line:
sophios --yaml docs/tutorials/helloworld.wic --run_local --copy_output_files
Compile without running:
sophios --yaml docs/tutorials/helloworld.wic --generate_cwl_workflow
The generated CWL and input artifacts are written under autogenerated/.
Why YAML Is Still Useful
YAML is the file-native Sophios workflow representation. It is excellent for operational workflows where the workflow definition should be reviewed, stored, validated, or executed without importing project-specific Python modules.
YAML gives you:
a single workflow file that can be reviewed without executing Python,
reproducible headless commands for CI and batch systems,
stable artifacts for audits and papers,
direct access to inference tags, anchors, and metadata,
schema validation in editors,
optional compiler-internal .wic trees when debugging the compiler.
CLI Modes
Common commands:
sophios --yaml workflow.wic --generate_cwl_workflow
sophios --yaml workflow.wic --run_local
sophios --yaml workflow.wic --generate_run_script
sophios --generate_schemas
sophios --generate_config
Intermediate compiler .wic trees are not written by default. If you need them
while debugging the compiler, opt in explicitly:
sophios --yaml workflow.wic --generate_cwl_workflow --write_intermediate_wic
Useful flags:
--graphviz: write Graphviz sources and rendered diagrams when dot is available.
--inputs_file <file>: merge extra job inputs into generated CWL inputs.
--copy_output_files: copy primary outputs into outdir/.
--cwl_runner toil-cwl-runner: run locally with Toil instead of cwltool.
--container_engine podman: use Podman instead of Docker.
--inference_use_naming_conventions: refine edge inference with naming rules.
--insert_steps_automatically: attempt limited automatic insertion when inference fails.
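For CI or batch scripts, the commands and flags above can be assembled programmatically. The following is a minimal Python sketch and not part of Sophios: build_sophios_command is a hypothetical helper that only composes flags named in this guide and assumes the sophios executable is on PATH.

```python
from typing import List, Optional


def build_sophios_command(
    yaml_path: str,
    run_local: bool = False,
    graphviz: bool = False,
    copy_output_files: bool = False,
    inputs_file: Optional[str] = None,
    cwl_runner: Optional[str] = None,
    container_engine: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: compose an argv list from flags listed in this guide."""
    # Compile-only by default; pass run_local=True to execute the workflow.
    mode = "--run_local" if run_local else "--generate_cwl_workflow"
    cmd = ["sophios", "--yaml", yaml_path, mode]
    if graphviz:
        cmd.append("--graphviz")
    if copy_output_files:
        cmd.append("--copy_output_files")
    if inputs_file is not None:
        cmd += ["--inputs_file", inputs_file]
    if cwl_runner is not None:
        cmd += ["--cwl_runner", cwl_runner]
    if container_engine is not None:
        cmd += ["--container_engine", container_engine]
    return cmd
```

The returned list can be passed to subprocess.run(cmd, check=True) so that a failed compile or run fails the CI job.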
Configuration and Discovery
Sophios discovers CWL tools and .wic workflows from a JSON config file.
The main keys are:
{
  "search_paths_cwl": {
    "global": ["/path/to/cwl_adapters"]
  },
  "search_paths_wic": {
    "global": ["/path/to/workflows"]
  }
}
If you do not pass --config_file, Sophios uses ~/wic/global_config.json.
Generate a starter config with:
sophios --generate_config
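A config containing the two keys shown above can also be written from a script. This is a minimal Python sketch under stated assumptions: make_config and write_config are hypothetical helpers, only the "global" namespace is populated, and a config produced by sophios --generate_config may contain additional keys that this sketch does not write.

```python
import json
from pathlib import Path
from typing import Dict, List


def make_config(cwl_dirs: List[str], wic_dirs: List[str]) -> Dict[str, Dict[str, List[str]]]:
    """Build the two search-path keys, resolving every entry to an absolute path."""

    def absolute(paths: List[str]) -> List[str]:
        # Resolving here pins the paths to the directory the script ran in,
        # not to whatever directory the workflow is later compiled from.
        return [str(Path(p).expanduser().resolve()) for p in paths]

    return {
        "search_paths_cwl": {"global": absolute(cwl_dirs)},
        "search_paths_wic": {"global": absolute(wic_dirs)},
    }


def write_config(config: Dict[str, Dict[str, List[str]]], destination: str) -> Path:
    """Write the config as indented JSON, creating parent directories if needed."""
    path = Path(destination)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return path
```

Pass the resulting file to Sophios with --config_file.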
Practical discovery rules:
Discovery is recursive.
CWL tools are keyed by filename stem inside a namespace.
.wic workflows are keyed by filename stem inside a namespace.
Duplicate stems in the same namespace overwrite earlier discoveries in memory.
Absolute paths are safer than relative paths because relative paths are resolved from the current working directory.
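The discovery rules above can be illustrated with a small Python sketch. It is not the Sophios implementation: discover is a hypothetical function, and the sorted scan order is an assumption made only so the example is deterministic.

```python
from pathlib import Path
from typing import Dict, List


def discover(search_paths: Dict[str, List[str]], suffix: str) -> Dict[str, Dict[str, Path]]:
    """Map namespace -> filename stem -> path, scanning each directory recursively."""
    registry: Dict[str, Dict[str, Path]] = {}
    for namespace, directories in search_paths.items():
        found = registry.setdefault(namespace, {})
        for directory in directories:
            # sorted() is an assumption of this sketch; the real scan order
            # is not specified in this guide.
            for path in sorted(Path(directory).rglob(f"*{suffix}")):
                # A later file with the same stem replaces the earlier entry.
                found[path.stem] = path
    return registry
```

Because a duplicate stem silently replaces the earlier entry, printing the registry for one namespace is a quick way to check which file a name actually resolves to.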
Inline Inputs
Use !ii when a value is known directly in the workflow file:
steps:
- echo:
    in:
      message: !ii Hello World
Sophios extracts inline values into the generated CWL job inputs document during compilation. This keeps simple workflows compact while still producing normal CWL execution artifacts.
Explicit Edges
Use anchors when you need to state an edge explicitly instead of relying on inference.
steps:
- touch:
    in:
      filename: !ii empty.txt
    out:
    - file: !& created_file
- cat:
    in:
      file: !* created_file
The output anchor !& created_file is consumed later with !* created_file.
This notation is intentionally similar to YAML anchors, but it is specific to
Sophios workflow edges.
Edge Inference
Sophios can infer many edges by comparing input and output types and formats. The compiler only connects a step input to outputs that already exist from earlier steps.
This is the same compiler mechanism used by .wic workflows and Python
workflows. In Python, leaving a required step input unbound allows the compiler
to infer that edge during compilation.
At a high level:
Look backward through previous step outputs.
Compare CWL type and format.
Prefer the most recent compatible output.
Use the first compatible match when multiple candidates remain.
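The steps above can be sketched in Python as a backward search. This is an illustration, not the compiler's code: Port, Step, is_compatible, and infer_edge are hypothetical names, and compatibility is simplified to exact equality of CWL type and format.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Port:
    name: str
    cwl_type: str
    cwl_format: Optional[str] = None


@dataclass
class Step:
    name: str
    outputs: List[Port]


def is_compatible(step_input: Port, output: Port) -> bool:
    # Simplifying assumption: both type and format must match exactly.
    return (step_input.cwl_type == output.cwl_type
            and step_input.cwl_format == output.cwl_format)


def infer_edge(step_input: Port, earlier_steps: List[Step]) -> Optional[Tuple[str, str]]:
    """Return (step name, output name) of the chosen source, or None if nothing matches."""
    for step in reversed(earlier_steps):   # most recent step first
        for output in step.outputs:        # declaration order within a step
            if is_compatible(step_input, output):
                return step.name, output.name
    return None
```

Only steps that come before the consuming step are passed in, which mirrors the rule that an input is never connected to an output that does not exist yet.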
Inference reduces boilerplate, but it is not a substitute for review. Generated DAGs and generated CWL should be inspected when correctness matters.
Naming Conventions
When --inference_use_naming_conventions is enabled, Sophios can refine matches
with rename rules from the config file:
"renaming_conventions": [
  ["energy_", "edr_"],
  ["structure_", "tpr_"],
  ["traj_", "trr_"]
]
This can make inference more precise in domains where input and output names follow predictable conventions. It can also be misleading when file conversions or repeated formats make the “nearest” value different from the intended value. Use generated diagrams and explicit anchors when ambiguity matters.
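One way to picture name-based refinement is the Python sketch below. It rests on assumptions that this guide does not confirm: refine_by_name is a hypothetical function, the first element of each pair is assumed to appear in input names and the second in output names, and the port names in the usage note are invented for illustration.

```python
from typing import List, Optional, Sequence, Tuple


def refine_by_name(
    input_name: str,
    candidates: List[str],
    renaming_conventions: Sequence[Tuple[str, str]],
) -> Optional[str]:
    """Pick the candidate output whose name agrees with the input name.

    candidates are names of outputs that already passed the type and format
    check, ordered from most to least preferred. Falls back to the first
    candidate when no rename rule applies.
    """
    for input_token, output_token in renaming_conventions:
        # Assumed direction: input-side token first, output-side token second.
        if input_token in input_name:
            for candidate in candidates:
                if output_token in candidate:
                    return candidate
    return candidates[0] if candidates else None
```

With the rules shown above, a hypothetical input named input_energy_path would select output_edr_path over output_log_path even if the latter came first in the candidate list.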
Inference Rules
inference_rules can customize matching for specific formats:
"inference_rules": {
  "edam:format_3881": "continue",
  "edam:format_3987": "continue",
  "edam:format_3878": "break",
  "edam:format_2033": "break"
}
The implemented break rule stops the search at the current compatible output.
This is useful when older outputs are technically compatible but should not be
considered.
Namespaces
Namespaces distinguish tools or workflows that share the same filename stem.
Example config:
{
  "search_paths_wic": {
    "global": ["workflows/default"],
    "alternate": ["workflows/collaborator"]
  }
}
Use a namespaced workflow at the call site:
wic:
  steps:
    (1, min.wic):
      wic:
        namespace: alternate
Within one namespace, names should still be unique.
Metadata Annotations
YAML metadata lives under a top-level wic: key. This keeps Sophios-specific
metadata in one place so it can be merged, inspected, and removed during
compilation.
Graphviz metadata example:
wic:
  graphviz:
    label: Descriptive Subworkflow Name
    ranksame:
    - (1, short_step_name_1)
    - (5, short_step_name_5)
  steps:
    (1, short_step_name_1):
      wic:
        graphviz:
          label: Descriptive Step Name 1
Nested metadata can override child workflow metadata. This is useful for parameter passing and customization without editing the child workflow file.
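The override behavior can be pictured as a recursive dictionary merge in which the calling workflow's values win. The Python sketch below is an assumption about the general shape of such a merge, not the Sophios merge routine; merge_metadata is a hypothetical name, and the style key in the usage note is invented for illustration.

```python
from typing import Any, Dict


def merge_metadata(child: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict where values from override win.

    Nested dicts are merged key by key; any other value is replaced outright.
    Neither argument is modified.
    """
    merged = dict(child)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_metadata(merged[key], value)
        else:
            merged[key] = value
    return merged
```

For example, merging {"graphviz": {"label": "Descriptive Step Name 1"}} over a child's {"graphviz": {"label": "Child", "style": "dashed"}} replaces the label and keeps the style.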
Static Dispatch
Static dispatch lets one workflow call a logical step while choosing a concrete implementation at compile time.
Aggregator workflow:
wic:
  default_implementation: implementation1
  implementations:
    implementation1:
      steps:
      - implementation1.wic:
    implementation2:
      steps:
      - implementation2.wic:
Call-site override:
steps:
- static_dispatch.wic:
wic:
  steps:
    (1, static_dispatch.wic):
      wic:
        implementation: implementation2
Use this when several workflows provide the same conceptual operation but differ in algorithm, container, hardware assumptions, or performance profile.
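The selection logic amounts to a lookup with a default. The Python sketch below is illustrative only: select_implementation is a hypothetical function, and its first argument is assumed to be the parsed contents of the aggregator's wic: block.

```python
from typing import Any, Dict, Optional


def select_implementation(
    aggregator_wic: Dict[str, Any],
    call_site_choice: Optional[str] = None,
) -> Dict[str, Any]:
    """Return the chosen implementation body.

    A call-site implementation takes precedence over default_implementation.
    """
    implementations = aggregator_wic["implementations"]
    name = call_site_choice or aggregator_wic["default_implementation"]
    if name not in implementations:
        raise KeyError(
            f"unknown implementation {name!r}; choose from {sorted(implementations)}"
        )
    return implementations[name]
```

Failing loudly on an unknown name matters here because the choice is made at compile time; a typo at the call site should stop compilation rather than fall back to the default.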
Program Synthesis
--insert_steps_automatically enables a limited form of automatic step
insertion when edge inference initially fails.
This is useful for constrained cases such as known file-format conversions. It is not a general AI planner. Treat it as an advanced compiler feature and review the generated DAG carefully.
Subinterpreters
Subinterpreters support realtime or repeated auxiliary workflows while the main
workflow is running. The current cwl_subinterpreter path repeatedly runs an
independent auxiliary workflow for a fixed number of iterations.
This is an advanced CWL/Sophios integration feature. Reach for it only after the main workflow is stable and you need monitoring or auxiliary execution behavior that cannot be expressed cleanly as ordinary workflow steps.
Debugging Checklist
When a YAML workflow does something unexpected:
Compile without running first.
Inspect autogenerated/<workflow>.cwl.
Inspect generated job inputs.
Generate Graphviz output with --graphviz.
Replace inferred edges with explicit anchors where ambiguity matters.
Regenerate schemas if editor validation looks stale.
Delete autogenerated/, cachedir*, outdir/, and provenance/ before a clean run.
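The final cleanup item can be scripted. This is a minimal Python sketch, assuming the four names are directories located directly under the working directory; clean_generated is a hypothetical helper, not a Sophios command.

```python
import shutil
from pathlib import Path
from typing import List

# Directory name patterns taken from the checklist above.
PATTERNS = ("autogenerated", "cachedir*", "outdir", "provenance")


def clean_generated(workdir: str = ".") -> List[str]:
    """Delete generated directories under workdir and return the names removed."""
    removed: List[str] = []
    root = Path(workdir)
    for pattern in PATTERNS:
        for path in sorted(root.glob(pattern)):
            # Only real directories are removed; symlinks and files are left alone.
            if path.is_dir() and not path.is_symlink():
                shutil.rmtree(path)
                removed.append(path.name)
    return removed
```

The function returns the names it removed so a CI log can show exactly what was deleted before the clean run.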
YAML is strongest when it gives you evidence. Use the artifacts.