Advanced YAML and Operations

The .wic YAML format is an advanced Sophios usage mode.

The recommended authoring path is Python. YAML becomes valuable when you need a workflow to be a plain file: easy to archive, diff, validate, run headlessly, or inspect without importing project-specific Python code.

Use this guide when you need:

  • standalone .wic workflows,

  • CI or batch execution from files,

  • audit-friendly workflow artifacts,

  • explicit inspection of inference and generated CWL,

  • schema-backed editor validation,

  • low-level Sophios features such as namespaces, static dispatch, and metadata annotations.

Minimal .wic Workflow

steps:
- echo:
    in:
      message: !ii Hello World

Run it from the command line:

sophios --yaml docs/tutorials/helloworld.wic --run_local --copy_output_files

Compile without running:

sophios --yaml docs/tutorials/helloworld.wic --generate_cwl_workflow

The generated CWL and input artifacts are written under autogenerated/.
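
For CI, the compile-only command above can be wrapped in a small script. The following is a minimal sketch, not part of Sophios: the helper names compile_command, generated_cwl_files, and compile_workflow are hypothetical, and it assumes that sophios is on PATH and writes into autogenerated/ relative to the working directory.

```python
import subprocess
from pathlib import Path


def compile_command(workflow: str) -> list[str]:
    """Build the compile-only command shown above (nothing is executed here)."""
    return ["sophios", "--yaml", workflow, "--generate_cwl_workflow"]


def generated_cwl_files(root: Path = Path("autogenerated")) -> list[Path]:
    """List generated CWL files so a CI job can fail when none appear."""
    return sorted(root.rglob("*.cwl")) if root.is_dir() else []


def compile_workflow(workflow: str) -> list[Path]:
    """Run the compiler and return the CWL files found afterwards."""
    # check=True raises CalledProcessError when the compiler exits non-zero.
    subprocess.run(compile_command(workflow), check=True)
    return generated_cwl_files()
```

A CI job could call compile_workflow("docs/tutorials/helloworld.wic") and fail the build when the returned list is empty.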

Why YAML Is Still Useful

YAML is the file-native Sophios workflow representation. It is excellent for operational workflows where the workflow definition should be reviewed, stored, validated, or executed without importing project-specific Python modules.

YAML gives you:

  • a single workflow file that can be reviewed without executing Python,

  • reproducible headless commands for CI and batch systems,

  • stable artifacts for audits and papers,

  • direct access to inference tags, anchors, and metadata,

  • schema validation in editors,

  • optional compiler-internal .wic trees when debugging the compiler.

CLI Modes

Common commands:

sophios --yaml workflow.wic --generate_cwl_workflow
sophios --yaml workflow.wic --run_local
sophios --yaml workflow.wic --generate_run_script
sophios --generate_schemas
sophios --generate_config

Intermediate compiler .wic trees are not written by default. If you need them while debugging the compiler, opt in explicitly:

sophios --yaml workflow.wic --generate_cwl_workflow --write_intermediate_wic

Useful flags:

  • --graphviz: write Graphviz sources and rendered diagrams when dot is available.

  • --inputs_file <file>: merge extra job inputs into generated CWL inputs.

  • --copy_output_files: copy primary outputs into outdir/.

  • --cwl_runner toil-cwl-runner: run locally with Toil instead of cwltool.

  • --container_engine podman: use Podman instead of Docker.

  • --inference_use_naming_conventions: refine edge inference with naming rules.

  • --insert_steps_automatically: attempt limited automatic insertion when inference fails.

Configuration and Discovery

Sophios discovers CWL tools and .wic workflows from a JSON config file.

The main keys are:

{
  "search_paths_cwl": {
    "global": ["/path/to/cwl_adapters"]
  },
  "search_paths_wic": {
    "global": ["/path/to/workflows"]
  }
}

If you do not pass --config_file, Sophios uses ~/wic/global_config.json. Generate a starter config with:

sophios --generate_config

Practical discovery rules:

  • Discovery is recursive.

  • CWL tools are keyed by filename stem inside a namespace.

  • .wic workflows are keyed by filename stem inside a namespace.

  • Duplicate stems in the same namespace overwrite earlier discoveries in memory.

  • Absolute paths are safer than relative paths because relative paths are resolved from the current working directory.
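
The rules above can be illustrated with a short sketch. This is not the Sophios implementation: the function name discover is hypothetical, and scanning files in sorted order is an assumption made only so that the "later discovery wins" behavior is deterministic in the example.

```python
from pathlib import Path


def discover(search_paths: dict[str, list[str]], suffix: str) -> dict[str, dict[str, Path]]:
    """Map namespace -> filename stem -> path, scanning each directory recursively."""
    found: dict[str, dict[str, Path]] = {}
    for namespace, directories in search_paths.items():
        by_stem = found.setdefault(namespace, {})
        for directory in directories:
            # A relative directory is resolved against the current working directory.
            for path in sorted(Path(directory).rglob(f"*{suffix}")):
                # A repeated stem silently replaces the earlier entry.
                by_stem[path.stem] = path
    return found
```

With two files named min.wic under the same namespace, only one survives in the mapping, which is why unique stems per namespace matter.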

Inline Inputs

Use !ii when a value is known directly in the workflow file:

steps:
- echo:
    in:
      message: !ii Hello World

Sophios extracts inline values into the generated CWL job inputs document during compilation. This keeps simple workflows compact while still producing normal CWL execution artifacts.
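
The extraction step can be pictured with a small model. Everything here is illustrative: the Inline class stands in for a value tagged with !ii, the function name is hypothetical, and the generated input identifiers use a made-up scheme that will not match the names Sophios actually writes.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Inline:
    """Stand-in for a value tagged with !ii in a .wic file."""
    value: Any


def extract_inline_inputs(steps: list[dict]) -> tuple[list[dict], dict[str, Any]]:
    """Replace inline values with input references and collect them as job inputs."""
    job_inputs: dict[str, Any] = {}
    rewritten = []
    for number, step in enumerate(steps, start=1):
        name, body = next(iter(step.items()))
        body = dict(body or {})
        new_inputs = {}
        for key, value in body.get("in", {}).items():
            if isinstance(value, Inline):
                # Hypothetical identifier scheme, chosen only for readability.
                input_id = f"step{number}_{name}_{key}"
                job_inputs[input_id] = value.value
                new_inputs[key] = input_id
            else:
                new_inputs[key] = value
        if new_inputs:
            body["in"] = new_inputs
        rewritten.append({name: body})
    return rewritten, job_inputs
```

The point of the sketch is the split: the workflow keeps a reference, and the literal value moves into a separate job inputs mapping.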

Explicit Edges

Use explicit anchors when inference would pick the wrong output, or when a connection should be stated in the workflow file rather than left to the compiler.

steps:
- touch:
    in:
      filename: !ii empty.txt
    out:
    - file: !& created_file
- cat:
    in:
      file: !* created_file

The output anchor !& created_file is consumed later with !* created_file. The notation deliberately resembles native YAML anchors and aliases (&name and *name), but !& and !* are Sophios-specific tags that define workflow edges rather than reuse YAML nodes.
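
One way to think about anchors is as a name table built while walking the steps in order. The sketch below assumes a simplified step model with hypothetical keys ("name", "defines", "uses"); it is not how Sophios represents workflows internally.

```python
def resolve_anchor_edges(steps: list[dict]) -> list[tuple[str, str, str, str]]:
    """Return (producer, output, consumer, input) edges from anchor definitions and uses."""
    defined: dict[str, tuple[str, str]] = {}
    edges = []
    for step in steps:
        # Resolve uses first, so a step cannot consume an anchor it defines itself.
        for input_name, anchor in step.get("uses", {}).items():
            if anchor not in defined:
                raise KeyError(f"anchor {anchor!r} used in {step['name']!r} before it is defined")
            producer, output_name = defined[anchor]
            edges.append((producer, output_name, step["name"], input_name))
        for output_name, anchor in step.get("defines", {}).items():
            defined[anchor] = (step["name"], output_name)
    return edges
```

For the touch and cat example, this yields a single edge from the file output of touch to the file input of cat.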

Edge Inference

Sophios can infer many edges by comparing input and output types and formats. The compiler connects a step input only to outputs produced by earlier steps.

This is the same compiler mechanism used by .wic workflows and Python workflows. In Python, leaving a required step input unbound allows the compiler to infer that edge during compilation.

At a high level:

  1. Look backward through previous step outputs.

  2. Compare CWL type and format.

  3. Prefer the most recent compatible output.

  4. Use the first compatible match when multiple candidates remain.

Inference reduces boilerplate, but it is not a substitute for review. Generated DAGs and generated CWL should be inspected when correctness matters.
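
The backward search described above can be sketched in a few lines. This is a simplified model, not the compiler's code: the Port class and function names are hypothetical, and treating a missing format as compatible with any format is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Port:
    step: str
    name: str
    cwl_type: str
    cwl_format: Optional[str] = None


def compatible(inp: Port, out: Port) -> bool:
    """Types must match; formats are compared only when both are declared."""
    if inp.cwl_type != out.cwl_type:
        return False
    if inp.cwl_format and out.cwl_format:
        return inp.cwl_format == out.cwl_format
    return True


def infer_source(inp: Port, earlier_outputs: list[Port]) -> Optional[Port]:
    """Scan earlier outputs from newest to oldest and return the first compatible one."""
    for out in reversed(earlier_outputs):
        if compatible(inp, out):
            return out
    return None
```

Because the scan starts at the newest output, an older output is chosen only when every newer one is incompatible. That is exactly the situation in which an explicit anchor is worth writing.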

Naming Conventions

When --inference_use_naming_conventions is enabled, Sophios can refine matches with rename rules from the config file:

"renaming_conventions": [
  ["energy_", "edr_"],
  ["structure_", "tpr_"],
  ["traj_", "trr_"]
]

This can make inference more precise in domains where input and output names follow predictable conventions. It can also be misleading when file conversions or repeated formats make the “nearest” value different from the intended value. Use generated diagrams and explicit anchors when ambiguity matters.
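
How Sophios applies these pairs internally is not described here. As a rough illustration only, the sketch below assumes each pair is a plain substring replacement and that two port names agree when they are equal after the replacements. The function names and the port names in the usage note are hypothetical.

```python
def apply_conventions(name: str, conventions: list[list[str]]) -> str:
    """Rewrite a port name using each [old, new] pair in order."""
    for old, new in conventions:
        name = name.replace(old, new)
    return name


def names_agree(input_name: str, output_name: str, conventions: list[list[str]]) -> bool:
    """Two names agree when they are equal after applying the conventions to both."""
    return apply_conventions(input_name, conventions) == apply_conventions(output_name, conventions)
```

Under these assumptions, energy_file and edr_file agree, while traj_file and edr_file do not.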

Inference Rules

inference_rules can customize matching for specific formats:

"inference_rules": {
  "edam:format_3881": "continue",
  "edam:format_3987": "continue",
  "edam:format_3878": "break",
  "edam:format_2033": "break"
}

The implemented break rule stops the search at the current compatible output. This is useful when older outputs are technically compatible but should not be considered.
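
The effect of a break rule on the backward search can be sketched as follows. The function name is hypothetical, the input is assumed to be the formats of already-compatible outputs ordered newest first, and treating every value other than "break" as "keep scanning" is an assumption of this sketch.

```python
def candidate_window(formats_newest_first: list[str], rules: dict[str, str]) -> list[str]:
    """Return the compatible outputs still under consideration after applying rules."""
    kept = []
    for fmt in formats_newest_first:
        kept.append(fmt)
        if rules.get(fmt) == "break":
            # Stop here: older outputs are excluded even if they are compatible.
            break
    return kept
```

With the rules shown above, a scan that meets edam:format_3878 stops there and never considers anything older.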

Namespaces

Namespaces distinguish tools or workflows that share the same filename stem.

Example config:

{
  "search_paths_wic": {
    "global": ["workflows/default"],
    "alternate": ["workflows/collaborator"]
  }
}

Use a namespaced workflow at the call site:

wic:
  steps:
    (1, min.wic):
      wic:
        namespace: alternate

Within one namespace, filename stems should still be unique, because a duplicate stem overwrites the earlier discovery.

Metadata Annotations

YAML metadata lives under a top-level wic: key. This keeps Sophios-specific metadata in one place so it can be merged, inspected, and removed during compilation.

Graphviz metadata example:

wic:
  graphviz:
    label: Descriptive Subworkflow Name
    ranksame:
    - (1, short_step_name_1)
    - (5, short_step_name_5)
  steps:
    (1, short_step_name_1):
      wic:
        graphviz:
          label: Descriptive Step Name 1

Nested metadata can override child workflow metadata. This is useful for parameter passing and customization without editing the child workflow file.
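
Overriding child metadata amounts to a recursive merge in which the parent's values win. The sketch below shows that idea on plain dictionaries; the function name and the "style" key in the usage note are hypothetical, and the exact merge rules Sophios applies may differ.

```python
from typing import Any


def merge_metadata(child: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of child with override applied; nested dicts merge key by key."""
    merged = dict(child)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_metadata(merged[key], value)
        else:
            # Scalars and lists from the parent replace the child's value outright.
            merged[key] = value
    return merged
```

For example, merging a parent's graphviz label into a child that also sets a style keeps the child's style and replaces only the label. The child dictionary itself is left unmodified.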

Static Dispatch

Static dispatch lets one workflow call a logical step while choosing a concrete implementation at compile time.

Aggregator workflow:

wic:
  default_implementation: implementation1
  implementations:
    implementation1:
      steps:
      - implementation1.wic:
    implementation2:
      steps:
      - implementation2.wic:

Call-site override:

steps:
- static_dispatch.wic:

wic:
  steps:
    (1, static_dispatch.wic):
      wic:
        implementation: implementation2

Use this when several workflows provide the same conceptual operation but differ in algorithm, container, hardware assumptions, or performance profile.
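
The selection logic is small enough to sketch directly. The function below is hypothetical and operates on the wic: mappings from the two examples above as plain dictionaries. It assumes a call-site implementation key, when present, takes precedence over default_implementation.

```python
from typing import Optional


def select_implementation(aggregator_wic: dict, call_site_wic: Optional[dict] = None) -> dict:
    """Pick an implementation body: the call site wins over the aggregator default."""
    default = aggregator_wic["default_implementation"]
    choice = (call_site_wic or {}).get("implementation", default)
    implementations = aggregator_wic["implementations"]
    if choice not in implementations:
        raise KeyError(f"unknown implementation {choice!r}; expected one of {sorted(implementations)}")
    return implementations[choice]
```

Because the choice is made before any step runs, a misspelled implementation name fails at compile time rather than partway through execution.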

Program Synthesis

--insert_steps_automatically enables a limited form of automatic step insertion when edge inference initially fails.

This is useful for constrained cases such as known file-format conversions. It is not a general AI planner. Treat it as an advanced compiler feature and review the generated DAG carefully.

Subinterpreters

Subinterpreters support real-time or repeated auxiliary workflows while the main workflow is running. The current cwl_subinterpreter path repeatedly runs an independent auxiliary workflow for a fixed number of iterations.

This is an advanced CWL/Sophios integration feature. Reach for it only after the main workflow is stable and you need monitoring or auxiliary execution behavior that cannot be expressed cleanly as ordinary workflow steps.

Debugging Checklist

When a YAML workflow does something unexpected:

  • Compile without running first.

  • Inspect autogenerated/<workflow>.cwl.

  • Inspect generated job inputs.

  • Generate Graphviz output with --graphviz.

  • Replace inferred edges with explicit anchors where ambiguity matters.

  • Regenerate schemas if editor validation looks stale.

  • Delete autogenerated/, cachedir*, outdir/, and provenance/ before a clean run.
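
The clean-up item in the checklist above is easy to script. The helper below is a hypothetical convenience, not a Sophios command. It assumes the generated directories sit directly under the directory you pass in, and it removes them without asking for confirmation.

```python
import shutil
from pathlib import Path


def clean_generated(root: Path = Path(".")) -> list[Path]:
    """Delete generated directories under root and return the ones removed."""
    patterns = ["autogenerated", "cachedir*", "outdir", "provenance"]
    removed = []
    for pattern in patterns:
        for path in sorted(root.glob(pattern)):
            # Only directories are removed; a stray file with a matching name is left alone.
            if path.is_dir():
                shutil.rmtree(path)
                removed.append(path)
    return removed
```

Calling clean_generated(Path("/path/to/project")) before a run gives the same starting state as deleting the four locations by hand.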

YAML is strongest when it gives you evidence. Use the artifacts.