Developer API

wic.ast

wic.ast.read_ast_from_disk(homedir, yaml_tree_tuple, yml_paths, tools, validator, ignore_validation_errors)

Reads the yml workflow definition files from disk (recursively) and inlines them into an AST

Parameters:
  • homedir (str) – The user’s home directory

  • yaml_tree_tuple (YamlTree) – A tuple of a filepath and its Yaml file contents.

  • yml_paths (Dict[str, Dict[str, Path]]) – The yml workflow definitions found using get_yml_paths()

  • tools (Tools) – The CWL CommandLineTool definitions found using get_tools_cwl()

  • validator (Draft202012Validator) – Used to validate the yml files against the autogenerated schema.

  • ignore_validation_errors (bool) – Temporarily ignore validation errors. Do not use this permanently!

Raises:

Exception – If the yml file(s) do not exist

Returns:

A tuple of the root filepath and the associated yml AST

Return type:

YamlTree

wic.ast.merge_yml_trees(yaml_tree_tuple, wic_parent, tools)

Implements ‘parameter passing’ by recursively merging wic: yml tags. Values from the parent workflow will overwrite / override subworkflows. See https://github.com/PolusAI/mm-workflows/blob/main/examples/gromacs/basic.wic for details

Parameters:
  • yaml_tree_tuple (YamlTree) – A tuple of a name and a yml AST

  • wic_parent (Yaml) – The wic: yml dict from the parent workflow

  • tools (Tools) – The CWL CommandLineTool definitions found using get_tools_cwl()

Raises:

Exception – If a wic: tag is found as an argument to a CWL CommandLineTool

Returns:

The yml AST with all wic: tags recursively merged.

Return type:

YamlTree
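
The override rule described for merge_yml_trees (values from the parent win over values from the subworkflow) can be illustrated with a small standalone sketch. This is not the wic implementation: merge_wic_dicts is a hypothetical helper, plain nested dicts stand in for the wic: tags, and the keys are made up for illustration.

```python
from typing import Any, Dict


def merge_wic_dicts(parent: Dict[str, Any], child: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: recursively merge two dicts; parent values win."""
    merged = dict(child)  # start from the subworkflow's values
    for key, parent_val in parent.items():
        child_val = merged.get(key)
        if isinstance(parent_val, dict) and isinstance(child_val, dict):
            # Both sides are dicts, so merge them one level down.
            merged[key] = merge_wic_dicts(parent_val, child_val)
        else:
            merged[key] = parent_val  # the parent overrides the child
    return merged


# Illustrative keys only; they are not real wic: tags.
child_tag = {'outer': {'x': 'child', 'y': 'child'}, 'flag': True}
parent_tag = {'outer': {'x': 'parent'}}
merged_tag = merge_wic_dicts(parent_tag, child_tag)
print(merged_tag)
```

Keys that only the subworkflow defines survive the merge, while any key the parent also defines takes the parent’s value.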

wic.ast.tree_to_forest(yaml_tree_tuple, tools)

The purpose of this function is to abstract away the process of traversing an AST.

Parameters:
  • yaml_tree_tuple (YamlTree) – A tuple of name and yml AST

  • tools (Tools) – The CWL CommandLineTool definitions found using get_tools_cwl()

Returns:

A recursive data structure containing all sub-trees encountered while traversing the yml AST.

Return type:

YamlForest

wic.ast.python_script_generate_cwl(yaml_tree_tuple, root_yml_dir_abs, tools)

Generates a CWL CommandLineTool for each python_script: tag, mutably adds them to tools, and updates the call sites in yaml_tree.

Parameters:
  • yaml_tree_tuple (YamlTree) – A tuple of a name and a yml AST

  • root_yml_dir_abs (Path) – The absolute path to the directory containing the root workflow yml file

  • tools (Tools) – The CWL CommandLineTool definitions found using get_tools_cwl()

Returns:

The yml AST with all python_script tags replaced with references to the auto-generated CWL.

Return type:

YamlTree

wic.cli

wic.cli.get_args(yaml_path='', suppliedargs=[])

This is used to get mock command line arguments: the defaults plus any supplied args.

Returns:

The mocked command line arguments

Return type:

argparse.Namespace

wic.compiler

wic.compiler.compile_workflow(yaml_tree_ast, args, namespaces, subgraphs_, explicit_edge_defs, explicit_edge_calls, input_mapping, output_mapping, tools, is_root, relative_run_path, testing)

Fixed-point wrapper around compile_workflow_once.

See https://en.wikipedia.org/wiki/Fixed_point_(mathematics)

Parameters:
  • yaml_tree_ast (YamlTree) – A tuple of name and yml AST

  • args (Any) – all of the other positional arguments for compile_workflow_once

  • kwargs (Any) – all of the other keyword arguments for compile_workflow_once

Returns:

Contains the data associated with compiled subworkflows (in the Rose Tree) together with mutable cumulative environment information which needs to be passed through the recursion.

Return type:

CompilerInfo
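
The general idea of a fixed-point wrapper is to call a function repeatedly on its own output until the output stops changing. The sketch below shows that pattern in isolation; fixed_point and max_iters are hypothetical names, and how compile_workflow actually detects convergence is not stated in this reference.

```python
from typing import Callable, TypeVar

T = TypeVar('T')


def fixed_point(func: Callable[[T], T], start: T, max_iters: int = 100) -> T:
    """Apply func repeatedly until the result stops changing."""
    current = start
    for _ in range(max_iters):
        following = func(current)
        if following == current:
            return current  # func(current) == current, so we are done
        current = following
    raise RuntimeError('no fixed point reached within max_iters')


# Integer halving eventually reaches 0, and 0 // 2 == 0.
print(fixed_point(lambda x: x // 2, 37))
```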

wic.compiler.compile_workflow_once(yaml_tree_ast, args, namespaces, subgraphs, explicit_edge_defs, explicit_edge_calls, input_mapping, output_mapping, tools, is_root, relative_run_path, testing)

STOP: Have you read the Developer’s Guide?? docs/devguide.md

Recursively compiles yml workflow definition ASTs to CWL file contents

Parameters:
  • yaml_tree_ast (YamlTree) – A tuple of name and yml AST

  • args (argparse.Namespace) – The command line arguments

  • namespaces (Namespaces) – Specifies the path in the yml AST to the current subworkflow

  • subgraphs (List[Graph]) – The graphs associated with the parent workflows of the current subworkflow

  • explicit_edge_defs (ExplicitEdgeDefs) – Stores the (path, value) of the explicit edge definition sites

  • explicit_edge_calls (ExplicitEdgeCalls) – Stores the (path, value) of the explicit edge call sites

  • tools (Tools) – The CWL CommandLineTool definitions found using get_tools_cwl(). yml files that have been compiled to CWL SubWorkflows are also added during compilation.

  • is_root (bool) – True if this is the root workflow

  • relative_run_path (bool) – Controls whether to use subdirectories or just one directory when writing the compiled CWL files to disk

  • testing (bool) – Used to disable some optional features which are unnecessary for testing.

Raises:

Exception – If any errors occur

Returns:

Contains the data associated with compiled subworkflows (in the Rose Tree) together with mutable cumulative environment information which needs to be passed through the recursion.

Return type:

CompilerInfo

wic.compiler.insert_step_into_workflow(yaml_tree_orig, stepid, tools, i)

Inserts the step with given stepid into a workflow at the given index.

Parameters:
  • yaml_tree_orig (Yaml) – The original Yaml tree

  • stepid (StepId) – The name of the workflow step to be inserted.

  • tools (Tools) – The CWL CommandLineTool definitions found using get_tools_cwl(). yml files that have been compiled to CWL SubWorkflows are also added during compilation.

  • i (int) – The index to insert the new workflow step

Returns:

A modified Yaml tree with the given stepid inserted at index i

Return type:

Yaml

wic.cwl_subinterpreter

wic.inference

wic.inference.perform_edge_inference(args, tools, tools_lst, steps_keys, yaml_stem, i, steps, arg_key, graph, is_root, namespaces, vars_workflow_output_internal, input_mapping, output_mapping, inputs_workflow, in_name, in_name_in_inputs_file_workflow, arg_key_in_yaml_tree_inputs, insertions, wic_steps, testing)

This function implements the core edge inference feature. NOTE: steps[i], vars_workflow_output_internal, inputs_workflow are mutably updated.

Parameters:
  • args (argparse.Namespace) – The command line arguments

  • tools (Tools) – The CWL CommandLineTool definitions found using get_tools_cwl()

  • tools_lst (List[Tool]) – A list of the CWL CommandLineTools or compiled subworkflows for the current workflow.

  • steps_keys (List[str]) – The name of each step in the current CWL workflow

  • yaml_stem (str) – The name (filename without extension) of the current CWL workflow

  • i (int) – The (zero-based) step number w.r.t. the current subworkflow. Since we are trying to infer inputs from previous outputs, this will not perform any inference

  • steps (List[Yaml]) – The steps: tag of the current CWL workflow

  • arg_key (str) – The name of the CWL input tag that needs a concrete input value inferred

  • graph (GraphReps) – A tuple of a GraphViz DiGraph and a networkx DiGraph

  • is_root (bool) – True if this is the root workflow (for debugging only)

  • namespaces (Namespaces) – Specifies the path in the AST of the current subworkflow

  • vars_workflow_output_internal (InternalOutputs) – Keeps track of output variables which are internal to the root workflow, but not necessarily to subworkflows.

  • input_mapping (Dict[str, List[str]]) – Maps workflow inputs to workflow step inputs, recursively namespaced.

  • output_mapping (Dict[str, str]) – Maps workflow outputs to workflow step outputs, recursively namespaced.

  • inputs_workflow (WorkflowInputs) – Keeps track of CWL inputs: variables for the current workflow.

  • in_name (str) – The input name

  • in_name_in_inputs_file_workflow (bool) – Used to determine whether failure to find a match should be considered an error.

  • arg_key_in_yaml_tree_inputs (bool) – Determines whether at least one level of recursion has been performed.

  • insertions (List[StepId]) – If exact inference fails, a list of possible steps to automatically insert is stored here.

  • wic_steps (Yaml) – The metadata associated with the given workflow.

  • testing (bool) – Used to disable some optional features which are unnecessary for testing.

Returns:

steps[i] with the input tag arg_key updated with an inferred input value.

Return type:

Yaml

wic.inference.get_inference_rules(wic, step_key_parent)

Recursively traverses the wic: metadata annotation AST and extracts any inference rules.

See docs/userguide.md for more information.

Parameters:
  • wic (Yaml) – The contents of the wic: metadata annotations tag (if any)

  • step_key_parent (str) – The name of one of the steps in the current workflow.

Returns:

A dictionary of the inference rules for the workflow step named step_key_parent.

Return type:

Dict[str, str]

wic.inlineing

wic.inlineing.get_inlineable_subworkflows(yaml_tree_tuple, tools, implementation=False, namespaces_init=[])

Traverses a yml AST and finds all subworkflows which can be inlined into their parent workflow.

Parameters:
  • yaml_tree_tuple (YamlTree) – A tuple of name and yml AST

  • tools (Tools) – The CWL CommandLineTool definitions found using get_tools_cwl()

  • implementation (bool) – True if the immediate parent workflow is an implementation.

  • namespaces_init (Namespaces) – The initial subworkflow to start the traversal ([] == root)

Returns:

The subworkflows which can be inlined into their parent workflows.

Return type:

List[Namespaces]

wic.inlineing.inline_subworkflow(yaml_tree_tuple, namespaces)

Inlines the given subworkflow into its immediate parent workflow.

Parameters:
  • yaml_tree_tuple (YamlTree) – A tuple of name and yml AST

  • namespaces (Namespaces) – Specifies the path in the yml AST to the subworkflow to be inlined.

Returns:

The updated root workflow with the given subworkflow inlined into its immediate parent workflow.

Return type:

YamlTree

wic.inlineing.apply_args(sub_yml_tree, sub_parentargs)
Return type:

Dict[str, Any]

wic.inlineing.inline_subworkflow_wic_tag(wic_tag, namespaces, len_substeps)

Inlines the wic metadata tags associated with the given subworkflow into its immediate parent wic.

Parameters:
  • wic_tag (Yaml) – The wic metadata tag associated with the given workflow

  • namespaces (Namespaces) – Specifies the path in the yml AST to the subworkflow to be inlined.

  • len_substeps (int) – The number of steps in the subworkflow to be inlined.

Returns:

The updated wic metadata tag with the wic metadata tag associated with the given subworkflow inlined.

Return type:

Yaml

wic.inlineing.move_slash_last(source_new)

Move / to the last ___ position

(Moving to the last position works because we are inlining recursively.)

Parameters:

source_new (str) – A string representing a CWL dependency, i.e. containing /

Returns:

source_new with / moved to the last ___ position

Return type:

str
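
One way to read “move / to the last ___ position” is: turn the existing / into a ___ separator, then turn the final ___ into /. The sketch below follows that reading. It is an assumption about the intended behaviour rather than the wic source, and move_slash_last_sketch is a hypothetical name.

```python
def move_slash_last_sketch(source_new: str) -> str:
    """Sketch: replace '/' with '___', then make the last '___' a '/'."""
    if '/' not in source_new:
        return source_new  # nothing to move
    joined = source_new.replace('/', '___')
    head, sep, tail = joined.rpartition('___')
    # sep is always '___' here, because the replace above guarantees one.
    return head + '/' + tail


print(move_slash_last_sketch('a/b___c'))
```

Under this reading the string a/b___c becomes a___b/c, and a string without / is returned unchanged.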

wic.inlineing.inline_subworkflow_cwl(rose_tree)

Inlines all compiled CWL subworkflows into the root workflow.

Parameters:

rose_tree (RoseTree) – The data associated with compiled subworkflows

Returns:

The updated root workflow with all compiled CWL subworkflows recursively inlined.

Return type:

RoseTree

wic.input_output

wic.input_output.read_lines_pairs(filename)

Reads a whitespace-delimited file containing two paired entries per line (i.e. a serialized Dict).

Parameters:

filename (Path) – The full path of the file to be read.

Raises:

Exception – If any non-blank, non-comment lines do not contain exactly two entries.

Returns:

The file contents, with blank lines and comments removed.

Return type:

List[Tuple[str, str]]
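
The parsing rules described for read_lines_pairs can be sketched as follows. The comment marker is not specified in this reference, so the sketch assumes comment lines start with #; read_pairs_sketch is a hypothetical name, and it raises ValueError (a subclass of Exception) on malformed lines.

```python
from pathlib import Path
from typing import List, Tuple


def read_pairs_sketch(filename: Path) -> List[Tuple[str, str]]:
    """Sketch: read whitespace-delimited pairs, skipping blanks and comments."""
    pairs: List[Tuple[str, str]] = []
    with open(filename, encoding='utf-8') as handle:
        for line in handle:
            stripped = line.strip()
            # Assumption: comment lines start with '#'.
            if not stripped or stripped.startswith('#'):
                continue
            entries = stripped.split()
            if len(entries) != 2:
                raise ValueError(f'expected exactly two entries, got: {line!r}')
            pairs.append((entries[0], entries[1]))
    return pairs
```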

class wic.input_output.NoAliasDumper(stream, default_style=None, default_flow_style=False, canonical=None, indent=None, width=None, allow_unicode=None, line_break=None, encoding=None, explicit_start=None, explicit_end=None, version=None, tags=None, sort_keys=True)
ignore_aliases(data)
Return type:

bool

wic.input_output.write_to_disk(rose_tree, path, relative_run_path)

Writes the compiled CWL files and their associated yml inputs files to disk.

NOTE: Only the yml input file associated with the root workflow is guaranteed to have all inputs. In other words, subworkflows will all have valid CWL files, but may not be executable due to ‘missing’ inputs.

Parameters:
  • rose_tree (RoseTree) – The data associated with compiled subworkflows

  • path (Path) – The directory in which to write the files

  • relative_run_path (bool) – Controls whether to use subdirectories or just one directory.

Return type:

None

wic.input_output.write_config_to_disk(config, config_file)

Writes config json object to config_file

Parameters:
  • config (Json) – The json object that is to be written to disk

  • config_file (Path) – The file path where it is to be written

Return type:

None

wic.input_output.get_config(config_file, default_config_file)

Returns the config json object from config_file with absolute paths

Parameters:
  • config_file (Path) – The path of the user specified config file

  • default_config_file (Path) – The default path of the config file if the user hasn’t specified one

Returns:

The config json object with absolute filepaths

Return type:

Json

wic.input_output.read_config_from_disk(config_file)

Returns the config json object from config_file with absolute paths

Parameters:

config_file (Path) – The path of the json file to be read

Returns:

The config json object with absolute filepaths

Return type:

Json

wic.input_output.get_default_config()

Returns the default config with absolute paths

Returns:

The config json object with absolute filepaths

Return type:

Json

wic.input_output.get_absolute_paths(sub_config)

Makes the paths within the dirs_file file absolute and writes them into the sub_config object.

Parameters:

sub_config (dict) – The json (sub)object where filepaths are stored

Returns:

The json (sub)object with absolute filepaths

Return type:

Json

wic.input_output.write_absolute_yaml_tags(args, in_dict_in, namespaces, step_name_i, explicit_edge_calls_copy)

cwl_subinterpreter requires all paths to be absolute.

Parameters:
  • args (argparse.Namespace) – The command line arguments

  • in_dict_in (Yaml) – The in: subtag of a cwl_subinterpreter: tag. (Mutates in_dict_in)

  • namespaces (Namespaces) – Specifies the path in the yml AST to the current subworkflow

  • step_name_i (str) – The name of the current workflow step

  • explicit_edge_calls_copy (ExplicitEdgeCalls) – Stores the (path, value) of the explicit edge call sites

Return type:

None

wic.labshare

wic.labshare.delete_previously_uploaded(args, plugins_or_pipelines, name)

Delete plugins/pipelines previously uploaded to labshare.

Parameters:
  • args (argparse.Namespace) – The command line arguments

  • plugins_or_pipelines (str) – ‘plugins’ or ‘pipelines’

Return type:

None

wic.labshare.remove_dot_dollar(tree)

Removes . and $ from dictionary keys, e.g. $namespaces and $schemas. Otherwise, you will get {‘error’: {‘statusCode’: 500, ‘message’: ‘Internal Server Error’}}. This is due to MongoDB restrictions on field names; see https://www.mongodb.com/docs/manual/reference/limits/#Restrictions-on-Field-Names

Parameters:

tree (Cwl) – A Cwl document

Returns:

A Cwl document with . and $ removed from $namespaces and $schemas

Return type:

Cwl
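
A minimal sketch of stripping . and $ from dictionary keys is shown below. It assumes the document is a plain nested structure of dicts and lists and that every key at every depth is cleaned; whether remove_dot_dollar recurses this way is not stated here, and strip_dot_dollar_keys is a hypothetical name.

```python
from typing import Any


def strip_dot_dollar_keys(obj: Any) -> Any:
    """Sketch: return a copy of obj with '.' and '$' removed from dict keys."""
    if isinstance(obj, dict):
        return {
            str(key).replace('.', '').replace('$', ''): strip_dot_dollar_keys(val)
            for key, val in obj.items()
        }
    if isinstance(obj, list):
        return [strip_dot_dollar_keys(val) for val in obj]
    return obj  # values (including strings containing '.') are left alone


doc = {'$namespaces': {'ex': 'http://example.org/'}, 'cwlVersion': 'v1.0'}
cleaned = strip_dot_dollar_keys(doc)
print(cleaned)
```

Only keys are rewritten; the value v1.0 keeps its dot.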

wic.labshare.pretty_print_request(request)

Pretty prints a requests.PreparedRequest

Parameters:

request (requests.PreparedRequest) – The request to be printed

Return type:

None

wic.labshare.upload_plugin(compute_url, access_token, tool, name)

Uploads CWL CommandLineTools to Polus Compute

Parameters:
  • compute_url (str) – The url to the Compute API

  • access_token (str) – The access token used for authentication

  • tool (Cwl) – The CWL CommandLineTool

  • name (str) – The name of the CWL CommandLineTool

Raises:

Exception – If the upload failed for any reason

Returns:

The unique id of the plugin

Return type:

str

wic.labshare.print_plugins(compute_url)

Prints information on all currently available Compute plugins

Parameters:

compute_url (str) – The url to the Compute API

Return type:

None

wic.labshare.upload_all(rose_tree, tools, args, is_root)

Uploads all Plugins, Pipelines, and the root Workflow to the Compute platform

Parameters:
  • rose_tree (RoseTree) – The data associated with compiled subworkflows

  • tools (Tools) – The CWL CommandLineTool definitions found using get_tools_cwl()

  • args (argparse.Namespace) – The command line arguments

  • is_root (bool) – True if this is the root workflow

Raises:

Exception – If any of the uploads fails for any reason

Returns:

The unique id of the workflow

Return type:

str

wic.main

wic.plugins

wic.python_cwl_adapter

wic.python_cwl_adapter.import_python_file(python_module_name, python_file_path)

This function imports a python file directly, as per the documentation

https://docs.python.org/3/library/importlib.html#importing-a-source-file-directly

Parameters:
  • python_module_name (str) – The name of the python module

  • python_file_path (Path) – The path to the python file.

Returns:

The module that was loaded.

Return type:

ModuleType
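
The importlib recipe referenced above looks roughly like the following. This is a sketch of the standard-library pattern, not a copy of the wic function; import_file_sketch is a hypothetical name.

```python
import importlib.util
import sys
from pathlib import Path
from types import ModuleType


def import_file_sketch(module_name: str, file_path: Path) -> ModuleType:
    """Sketch: import a python source file directly by path."""
    spec = importlib.util.spec_from_file_location(module_name, file_path)
    if spec is None or spec.loader is None:
        raise ImportError(f'cannot load {file_path}')
    module = importlib.util.module_from_spec(spec)
    # Register the module before executing it, as in the importlib recipe.
    sys.modules[module_name] = module
    spec.loader.exec_module(module)
    return module
```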

wic.python_cwl_adapter.get_main_args(module_)

Uses inspect to get the arguments to the main() function of the given module.

Parameters:

module_ (ModuleType) – A ModuleType object returned from import_python_file

Returns:

A dictionary of key-value pairs

Return type:

Dict[str, Any]
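
Inspecting the arguments of a module’s main() function with inspect can be sketched as below. What get_main_args stores as the dictionary values is not documented here; this sketch assumes the type annotations, and main_args_sketch is a hypothetical name.

```python
import inspect
from types import ModuleType
from typing import Any, Dict


def main_args_sketch(module_: ModuleType) -> Dict[str, Any]:
    """Sketch: map each parameter name of module_.main to its annotation."""
    signature = inspect.signature(module_.main)
    return {name: param.annotation for name, param in signature.parameters.items()}


def main(input_path: str, cutoff: float = 1.0) -> None:
    """Stand-in for a script's main() function."""


# Build a throwaway module object so the example needs no files on disk.
demo = ModuleType('demo')
setattr(demo, 'main', main)
demo_args = main_args_sketch(demo)
print(demo_args)
```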

wic.python_cwl_adapter.check_args_match_inputs(module_, args, check=False)

Checks that the keys (only) of the args dict match the keys of the top-level inputs attribute.

Parameters:
  • module_ (ModuleType) – A ModuleType object returned from import_python_file

  • args (Dict[str, Any]) – A dictionary of key-value pairs

Return type:

None

wic.python_cwl_adapter.generate_CWL_CommandLineTool(module_inputs, module_outputs, python_script_docker_pull='')

Generates a CWL CommandLineTool for an arbitrary (annotated) python script.

Parameters:
  • module_inputs (Dict[str, Any]) – The top-level inputs attribute of the python module.

  • module_outputs (Dict[str, Any]) – The top-level outputs attribute of the python module.

  • python_script_docker_pull (str) – The username/image to use with docker pull …

Returns:

A CWL CommandLineTool with the given inputs and outputs.

Return type:

Dict[str, Any]

wic.python_cwl_adapter.get_module(python_script_mod, python_script_path, yml_args)

Imports the given python script and validates its top-level annotations.

Parameters:
  • python_script_mod (str) – The module name of the given python script.

  • python_script_path (Path) – The path to the given python script.

  • yml_args (Dict[str, Any]) – The contents of the python_script in: yml tag.

Returns:

The Module object associated with the given python script.

Return type:

ModuleType

wic.python_cwl_adapter.get_inputs_workflow(module_inputs, python_script_path, yml_args)

This generates the contents of the inputs file associated with generate_CWL_CommandLineTool.

Note that this is already taken care of in the compiler, but this function is useful for standalone purposes. (Alternatively, just make a single-step workflow.)

Parameters:
  • module_inputs (Dict[str, Any]) – The top-level inputs attribute of the python module.

  • python_script_path (str) – The path to the given python script.

  • yml_args (Dict[str, Any]) – The contents of the python_script in: yml tag.

Returns:

The contents of the CWL inputs file.

Return type:

Dict[str, Any]

wic.run_local

wic.schemas.wic_schema

wic.schemas.wic_schema.default_schema(url=False)

A basic default schema (to avoid copy & paste).

Parameters:

url (bool, optional) – Determines whether to include the $schema url. Defaults to False.

Returns:

A basic default schema

Return type:

Json

wic.schemas.wic_schema.named_empty_schema(name)

Creates a schema which starts with name, but is otherwise an empty wildcard

Parameters:

name (str) – The identifier of the string

Returns:

A schema which matches anything starting with name

Return type:

Json

wic.schemas.wic_schema.named_null_schema(name)

Creates a schema which starts with name and contains nothing else

Parameters:

name (str) – The identifier of the string

Returns:

A schema which matches name and nothing else

Return type:

Json

wic.schemas.wic_schema.cwl_type_to_jsonschema_type_schema(type_obj)

Converts a canonicalized CWL type into the equivalent jsonschema type schema, if possible.

Parameters:

type_obj (Json) – A canonical CWL type object

Returns:

A JSON type schema corresponding to type_obj if valid else None

Return type:

Json

wic.schemas.wic_schema.cwl_type_to_jsonschema_type(type_obj)

Converts a canonicalized CWL type into the equivalent jsonschema type schema, if possible.

Parameters:

type_obj (Json) – A canonical CWL type object

Returns:

A JSON type schema corresponding to type_obj if valid else None

Return type:

Json

wic.schemas.wic_schema.cwl_schema(name, cwl, id_prefix)

Generates a schema (including documentation) based on the inputs of a CWL CommandLineTool or Workflow.

Parameters:
  • name (str) – The name of the CWL CommandLineTool or Workflow

  • cwl (Json) – The CWL CommandLineTool or Workflow

  • id_prefix (str) – Either the string ‘tools’ or ‘workflows’

Returns:

An autogenerated, documented schema based on the inputs and outputs of a CWL CommandLineTool or Workflow.

Return type:

Json

wic.schemas.wic_schema.wic_tag_schema(hypothesis=False)

The schema of the (recursive) wic: metadata annotation tag.

Parameters:

hypothesis (bool) – Determines whether we should restrict the search space.

Returns:

The schema of the (recursive) wic: metadata annotation tag.

Return type:

Json

wic.schemas.wic_schema.wic_main_schema(tools_cwl, yml_stems, schema_store, hypothesis=False)

The main schema which is used to validate yml files.

Parameters:
  • tools_cwl (Tools) – The CWL CommandLineTool definitions found using get_tools_cwl()

  • yml_stems (List[str]) – The names of the yml workflow definitions found using get_yml_paths()

  • schema_store (Dict[str, Json]) – A global mapping between ids and schemas

  • hypothesis (bool) – Determines whether we should restrict the search space.

Returns:

The main schema which is used to validate yml files.

Return type:

Json

wic.schemas.wic_schema.compile_workflow_generate_schema(homedir, yml_path_str, yml_path, tools_cwl, yml_paths, validator, ignore_validation_errors)

Compiles a workflow and generates a schema which (recursively) includes the inputs/outputs from subworkflows.

Parameters:
  • homedir (str) – The user’s home directory

  • yml_path_str (str) – The stem of the path to the yml file

  • yml_path (Path) – The path to the yml file

  • tools_cwl (Tools) – The CWL CommandLineTool definitions found using get_tools_cwl()

  • yml_paths (Dict[str, Dict[str, Path]]) – The yml workflow definitions found using get_yml_paths()

  • validator (Draft202012Validator) – Used to validate the yml files against the autogenerated schema.

  • ignore_validation_errors (bool) – Temporarily ignore validation errors. Do not use this permanently!

Returns:

An autogenerated, documented schema based on the inputs and outputs of the Workflow.

Return type:

Json

wic.schemas.wic_schema.get_validator(tools_cwl, yml_stems, schema_store={}, write_to_disk=False, hypothesis=False)

Generates the main schema used to check the yml files for correctness and returns a validator.

Parameters:
  • tools_cwl (Tools) – The CWL CommandLineTool definitions found using get_tools_cwl()

  • yml_stems (List[str]) – The names of the yml workflow definitions found using get_yml_paths()

  • schema_store (Dict[str, Json]) – A global mapping between ids and schemas

  • write_to_disk (bool) – Controls whether to write the schemas to disk.

  • hypothesis (bool) – Determines whether we should restrict the search space.

Returns:

A validator which is used to check the yml files for correctness.

Return type:

Draft202012Validator

wic.utils

wic.utils.step_name_str(yaml_stem, i, step_key)

Returns a string which uniquely and hierarchically identifies a step in a workflow

Parameters:
  • yaml_stem (str) – The name of the workflow (filepath stem)

  • i (int) – The (zero-based) step number

  • step_key (str) – The name of the step (used as a dict key)

Returns:

The parameters (and the word ‘step’) joined together with double underscores

Return type:

str

wic.utils.parse_step_name_str(step_name)

The inverse function to step_name_str()

Parameters:

step_name (str) – A string of the same form as returned by step_name_str()

Raises:

Exception – If the argument is not of the same form as returned by step_name_str()

Returns:

The parameters used to create step_name

Return type:

Tuple[str, int, str]
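
The relationship between step_name_str() and its inverse can be sketched as a round trip. The order of the fields is an assumption (stem, the word ‘step’, index, key); the function names are hypothetical, and the parser raises ValueError (a subclass of Exception).

```python
from typing import Tuple


def step_name_sketch(yaml_stem: str, i: int, step_key: str) -> str:
    # Assumption: the field order is stem, 'step', index, key.
    return '__'.join([yaml_stem, 'step', str(i), step_key])


def parse_step_name_sketch(step_name: str) -> Tuple[str, int, str]:
    """Sketch of the inverse: split on '__' and validate the pieces."""
    parts = step_name.split('__')
    if len(parts) != 4 or parts[1] != 'step' or not parts[2].isdigit():
        raise ValueError(f'{step_name!r} is not of the expected form')
    return parts[0], int(parts[2]), parts[3]


name = step_name_sketch('workflow', 0, 'align')
print(name)
print(parse_step_name_sketch(name))
```

This simple parser would reject stems or step keys that themselves contain a double underscore; a real implementation may handle that case differently.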

wic.utils.shorten_namespaced_output_name(namespaced_output_name, sep=' ')

Removes the intentionally redundant yaml_stem prefixes from the list of step_name_str’s embedded in namespaced_output_name. (The redundant prefixes are what allow each step_name_str to be context-free and unique.) This is potentially dangerous; its only purpose is to slightly shorten the output filenames.

Parameters:
  • namespaced_output_name (str) – A string of the form: '___'.join(namespaces + [step_name_i, out_key])

  • sep (str) – The separator used to construct the shortened step name strings.

Returns:

A tuple of the first yaml_stem (kept so that this function can be inverted) and namespaced_output_name with the embedded yaml_stem prefixes removed and double underscores replaced with a single space.

Return type:

Tuple[str, str]

wic.utils.restore_namespaced_output_name(yaml_stem_init, shortened_output_name, sep=None)

The inverse function to shorten_namespaced_output_name()

Parameters:
  • yaml_stem_init (str) – The initial yaml_stem prefix

  • shortened_output_name (str) – The shortened namespaced_output_name

  • sep (Optional[str], optional) – The separator used for shortening. Defaults to None.

Raises:

Exception – If the argument is not of the same form as returned by shorten_namespaced_output_name

Returns:

The original namespaced_output_name before shortening.

Return type:

str

wic.utils.partition_by_lowest_common_ancestor(nss1, nss2)

See https://en.wikipedia.org/wiki/Lowest_common_ancestor

Parameters:
  • nss1 (Namespaces) – The namespaces associated with the first node

  • nss2 (Namespaces) – The namespaces associated with the second node

Returns:

nss1, partitioned by lowest common ancestor

Return type:

Tuple[Namespaces, Namespaces]
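
One plausible reading of “nss1, partitioned by lowest common ancestor” is a split of nss1 into the prefix it shares with nss2 and the remainder. The sketch below assumes that reading and that Namespaces is a list of strings; partition_by_lca_sketch is a hypothetical name.

```python
from typing import List, Tuple

Namespaces = List[str]  # assumption: a path of names from the root


def partition_by_lca_sketch(nss1: Namespaces, nss2: Namespaces) -> Tuple[Namespaces, Namespaces]:
    """Sketch: split nss1 at the end of the prefix it shares with nss2."""
    common = 0
    for ns1, ns2 in zip(nss1, nss2):
        if ns1 != ns2:
            break
        common += 1
    # The shared prefix is the path to the lowest common ancestor.
    return nss1[:common], nss1[common:]


print(partition_by_lca_sketch(['root', 'a', 'b'], ['root', 'a', 'c']))
```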

wic.utils.get_steps_keys(steps)

Returns the name (dict key) of each step in the given CWL workflow

Parameters:

steps (List[Yaml]) – The steps: tag of a CWL workflow

Returns:

The name of each step in the given CWL workflow

Return type:

List[str]

wic.utils.get_subkeys(steps_keys, tools_stems)

This function determines which step keys are associated with subworkflows.

This is critical for the control flow in many areas of the compiler.

Parameters:
  • steps_keys (List[str]) – All of the step keys for the current workflow.

  • tools_stems (List[str]) – All of the step keys associated with CommandLineTools.

Returns:

The list of step keys associated with subworkflows of the current workflow.

Return type:

List[str]
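
The two helpers above can be sketched together. Both function names are hypothetical, and the sketch rests on two assumptions: each entry of steps is a dict with a single key (the step name), and any step key that is not a known CommandLineTool stem is a subworkflow.

```python
from typing import Any, Dict, List


def steps_keys_sketch(steps: List[Dict[str, Any]]) -> List[str]:
    # Assumption: each step is a dict with exactly one key, the step name.
    return [next(iter(step)) for step in steps]


def subkeys_sketch(steps_keys: List[str], tools_stems: List[str]) -> List[str]:
    # Assumption: a step that is not a known CommandLineTool is a subworkflow.
    return [key for key in steps_keys if key not in tools_stems]


# Illustrative step names only.
steps = [{'prepare': {'in': {}}}, {'analyze': None}, {'report': None}]
keys = steps_keys_sketch(steps)
print(keys)
print(subkeys_sketch(keys, tools_stems=['prepare', 'report']))
```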

wic.utils.extract_implementation(yaml_tree, wic, yaml_path)

Chooses a specific implementation for a given CWL workflow step.

The implementations should be thought of as either ‘exactly’ identical, or at least the same high-level protocol but implemented with a different algorithm.

Parameters:
  • yaml_tree (Yaml) – A Yaml AST dict with sub-dicts for each implementation.

  • yaml_path (Path) – The filepath of yaml_tree, only used for error reporting.

Raises:

Exception – If the steps: and/or implementation: tags are not present.

Returns:

The Yaml AST dict of the chosen implementation.

Return type:

Tuple[str, Yaml]

wic.utils.flatten(lists)

Concatenates a list of lists into a single list.

Parameters:

lists (List[List[Any]]) – A list of lists

Returns:

A single list

Return type:

List[Any]

wic.utils.flatten_rose_tree(rose_tree)

Flattens the data contained in the Rose Tree into a List

Parameters:

rose_tree (RoseTree) – A Rose Tree

Returns:

The list of data associated with each node in the RoseTree

Return type:

List[Any]
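
Flattening a Rose Tree (a node with data and an arbitrary number of sub-trees) can be sketched as a simple recursion. The RoseTreeSketch class, its field names, and the pre-order traversal are all assumptions made for illustration; they are not taken from the wic source.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class RoseTreeSketch:
    """Hypothetical stand-in for RoseTree: data plus any number of sub-trees."""
    data: Any
    sub_trees: List['RoseTreeSketch'] = field(default_factory=list)


def flatten_rose_tree_sketch(tree: RoseTreeSketch) -> List[Any]:
    result = [tree.data]  # assumption: pre-order (node before its sub-trees)
    for sub_tree in tree.sub_trees:
        result.extend(flatten_rose_tree_sketch(sub_tree))
    return result


tree = RoseTreeSketch('root', [
    RoseTreeSketch('a', [RoseTreeSketch('a1')]),
    RoseTreeSketch('b'),
])
print(flatten_rose_tree_sketch(tree))
```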

wic.utils.pretty_print_forest(forest)

Pretty prints a YamlForest

Parameters:

forest (YamlForest) – The forest to be printed

Return type:

None

wic.utils.flatten_forest(forest)

Flattens the sub-trees encountered while traversing an AST

Parameters:

forest (YamlForest) – The yaml AST forest to be flattened

Raises:

Exception – If implementation: tags are missing.

Returns:

The flattened forest

Return type:

List[YamlForest]

wic.utils.recursively_delete_dict_key(key, obj)

Recursively deletes any dict entries with the given key.

Parameters:
  • key (str) – The key to be deleted

  • obj (Any) – The object from which to delete key.

Returns:

The original dict with the given key recursively deleted.

Return type:

Any

wic.utils.recursively_contains_dict_key(key, obj)

Recursively checks whether obj contains entries with the given key.

Parameters:
  • key (str) – The key to be checked

  • obj (Any) – The object from which to check the key.

Returns:

True if key is found, else False.

Return type:

bool
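
The two recursive key helpers above can be sketched as follows, assuming obj is built from nested dicts and lists. The function names are hypothetical. Note one deliberate difference: the deletion sketch returns a new structure instead of modifying its argument, whereas the reference describes returning the original dict.

```python
from typing import Any


def delete_key_sketch(key: str, obj: Any) -> Any:
    """Sketch: return a copy of obj without any dict entries named key."""
    if isinstance(obj, dict):
        return {k: delete_key_sketch(key, v) for k, v in obj.items() if k != key}
    if isinstance(obj, list):
        return [delete_key_sketch(key, v) for v in obj]
    return obj


def contains_key_sketch(key: str, obj: Any) -> bool:
    """Sketch: True if any dict nested inside obj has an entry named key."""
    if isinstance(obj, dict):
        return key in obj or any(contains_key_sketch(key, v) for v in obj.values())
    if isinstance(obj, list):
        return any(contains_key_sketch(key, v) for v in obj)
    return False


data = {'wic': 1, 'steps': [{'a': {'wic': 2, 'in': {}}}]}
print(contains_key_sketch('wic', data))
print(delete_key_sketch('wic', data))
```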

wic.utils.parse_int_string_tuple(string)

Parses a string of the form ‘(int, string)’

Parameters:

string (str) – A string with the above encoding

Returns:

The parsed result

Return type:

Tuple[int, str]
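
Parsing a string of the form ‘(int, string)’ can be sketched with basic string operations. The sketch assumes the string part is unquoted and that the first comma separates the two fields; parse_int_string_sketch is a hypothetical name.

```python
from typing import Tuple


def parse_int_string_sketch(string: str) -> Tuple[int, str]:
    """Sketch: parse '(int, string)' into an (int, str) tuple."""
    text = string.strip()
    if not (text.startswith('(') and text.endswith(')')) or ',' not in text:
        raise ValueError(f'{string!r} is not of the form (int, string)')
    # Split on the first comma only, so the string part may contain commas.
    number, _, rest = text[1:-1].partition(',')
    return int(number.strip()), rest.strip()


print(parse_int_string_sketch('(3, align)'))
```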

wic.utils.reindex_wic_steps(wic_steps, index, num_steps=1)

Increments the 1-based step indices by num_steps, starting from the step with the given index.

This function can be used to reindex the steps in the wic: metadata annotations tag after inserting num_steps steps at the given index. Only steps whose index (before insertion) is >= the given index are affected.

Parameters:
  • wic_steps (Yaml) – The steps: subtag of the wic: metadata annotations tag.

  • index (int) – The (one-based) start index that needs to be reindexed.

  • num_steps (int) – The number of steps inserted.

Returns:

The updated wic: steps: tag, with the appropriate indices incremented.

Return type:

Yaml

wic.utils.get_step_name_1(step_1_names, yaml_stem, namespaces, steps_keys, subkeys)

Finds the name of the first step in the current subworkflow. If the first step is itself a subworkflow, the call site recurses until it finds a node. This is necessary because ranksame in GraphViz can only be applied to individual nodes, not cluster_subgraphs.

Parameters:
  • step_1_names (List[str]) – The list of potential first node names

  • yaml_stem (str) – The name of the current subworkflow (stem of the yaml filepath)

  • namespaces (Namespaces) – Specifies the path in the AST of the current subworkflow

  • steps_keys (List[str]) – The name of each step in the current CWL workflow

  • subkeys (List[str]) – The keys associated with subworkflows

Returns:

The name of the first step

Return type:

str

wic.utils.parse_provenance_output_files(output_json)

Parses the primary workflow provenance JSON object.

Parameters:

output_json (Json) – The JSON results object, containing the metadata for all output files.

Returns:

A List of (location, parentdirs, basename) for each output file.

Return type:

List[Tuple[str, str, str]]

wic.utils.parse_provenance_output_files_(obj, parentdirs)

Parses the primary workflow provenance JSON object.

Parameters:
  • obj (Any) – The provenance object or one of its recursive sub-objects.

  • parentdirs (str) – The directory associated with obj.

Returns:

A List of (location, parentdirs, basename) for each output file.

Return type:

List[Tuple[str, str, str]]
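
A recursive traversal of this kind might look like the sketch below. It assumes the JSON follows the usual CWL output-object shape, where each file is a dict with class: File, location, and basename. The rule for building parentdirs from the dict keys, and the name collect_files_sketch, are hypothetical.

```python
from typing import Any, List, Tuple


def collect_files_sketch(obj: Any, parentdirs: str) -> List[Tuple[str, str, str]]:
    """Recursively collect (location, parentdirs, basename) for File-like dicts."""
    if isinstance(obj, dict):
        if obj.get('class') == 'File':
            return [(obj['location'], parentdirs, obj['basename'])]
        found = []
        for key, val in obj.items():
            # Assumption: nest each output under a directory named after its key.
            found += collect_files_sketch(val, f'{parentdirs}/{key}')
        return found
    if isinstance(obj, list):
        found = []
        for item in obj:
            found += collect_files_sketch(item, parentdirs)
        return found
    return []  # Scalars (strings, numbers, None) carry no files.


output_json = {
    'energy': {'class': 'File', 'location': 'file:///tmp/energy.xvg', 'basename': 'energy.xvg'},
    'frames': [
        {'class': 'File', 'location': 'file:///tmp/f0.pdb', 'basename': 'f0.pdb'},
        {'class': 'File', 'location': 'file:///tmp/f1.pdb', 'basename': 'f1.pdb'},
    ],
    'count': 2,
}
files = collect_files_sketch(output_json, 'outdir')
for location, parentdirs, basename in files:
    print(parentdirs, basename, location)
```

The public function would presumably start the recursion with an initial parentdirs value, and the underscore-suffixed helper would carry it through the nested objects.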

wic.utils.get_input_mappings(input_mapping, arg_keys, arg_key_in_yaml_tree_inputs)

Gets all of the workflow step inputs / call sites that are mapped from the given workflow inputs.

Parameters:
  • input_mapping (Dict[str, List[str]]) – Maps workflow inputs to workflow step inputs, recursively namespaced.

  • arg_keys (List[str]) – A (singleton) list of root workflow inputs.

  • arg_key_in_yaml_tree_inputs (bool) – Determines whether at least one level of recursion has been performed.

Returns:

A list of the workflow step inputs / call sites, recursively namespaced.

Return type:

List[str]
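
As a rough illustration of following such a mapping recursively, the sketch below expands a workflow input until only unmapped names remain. It omits the arg_key_in_yaml_tree_inputs flag, assumes the mapping has no cycles, and uses made-up names joined with a hypothetical ___ separator.

```python
from typing import Dict, List


def follow_inputs_sketch(input_mapping: Dict[str, List[str]], arg_keys: List[str]) -> List[str]:
    """Expand each key through input_mapping until only unmapped (leaf) names remain."""
    leaves: List[str] = []
    for key in arg_keys:
        if key in input_mapping:
            # The key is forwarded further down; keep following it.
            leaves += follow_inputs_sketch(input_mapping, input_mapping[key])
        else:
            # Nothing maps this key any further, so it is a call site.
            leaves.append(key)
    return leaves


mapping = {
    'temperature': ['sub___temp'],
    'sub___temp': ['sub___step_1___t', 'sub___step_2___t'],
}
call_sites = follow_inputs_sketch(mapping, ['temperature'])
print(call_sites)  # ['sub___step_1___t', 'sub___step_2___t']
```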

wic.utils.get_output_mapping(output_mapping, out_key)

Gets the workflow step output / return location that is mapped to the given workflow output.

Parameters:
  • output_mapping (Dict[str, str]) – Maps workflow outputs to workflow step outputs, recursively namespaced.

  • out_key (str) – The root workflow output.

Returns:

The workflow step output / return location, recursively namespaced.

Return type:

str
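
Because each workflow output maps to a single location, following the chain can be sketched as a simple loop. The names below are hypothetical, and the sketch assumes the mapping contains no cycles.

```python
from typing import Dict


def follow_output_sketch(output_mapping: Dict[str, str], out_key: str) -> str:
    """Follow out_key through output_mapping until it is no longer a key."""
    while out_key in output_mapping:
        out_key = output_mapping[out_key]
    return out_key


out_map = {
    'final_energy': 'sub___energy',
    'sub___energy': 'sub___step_3___energy_out',
}
location = follow_output_sketch(out_map, 'final_energy')
print(location)  # sub___step_3___energy_out
```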

wic.utils_cwl

wic.utils_cwl.maybe_add_requirements(yaml_tree, tools, steps_keys, wic_steps, subkeys)

Adds any necessary CWL requirements

Parameters:
  • yaml_tree (Yaml) – A tuple of name and yml AST

  • tools (Tools) – The CWL CommandLineTool definitions found using get_tools_cwl()

  • steps_keys (List[str]) – The name of each step in the current CWL workflow

  • wic_steps (Yaml) – The metadata associated with the workflow steps

  • subkeys (List[str]) – The keys associated with subworkflows

Return type:

None

wic.utils_cwl.add_yamldict_keyval_in(steps_i, step_key, keyval)

Convenience function used to (mutably) merge two Yaml dicts.

Parameters:
  • steps_i (Yaml) – A partially-completed Yaml dict representing a step in a CWL workflow

  • step_key (str) – The name of the step in a CWL workflow

  • keyval (Yaml) – A Yaml dict with additional details to be merged into the first Yaml dict

Returns:

The first Yaml dict with the second Yaml dict merged into it.

Return type:

Yaml

wic.utils_cwl.add_yamldict_keyval_out(steps_i, step_key, strs)

Convenience function used to (mutably) merge two Yaml dicts.

Parameters:
  • steps_i (Yaml) – A partially-completed Yaml dict representing a step in a CWL workflow

  • step_key (str) – The name of the step in a CWL workflow

  • keyval (Yaml) – A Yaml dict with additional details to be merged into the first Yaml dict

Returns:

The first Yaml dict with the second Yaml dict merged into it.

Return type:

Yaml

wic.utils_cwl.get_workflow_outputs(args, namespaces, is_root, yaml_stem, steps, outputs_workflow, vars_workflow_output_internal, graph, tools_lst, step_node_name)

Chooses a subset of the CWL outputs: to actually output

Parameters:
  • args (argparse.Namespace) – The command line arguments

  • namespaces (Namespaces) – Specifies the path in the AST of the current subworkflow

  • is_root (bool) – True if this is the root workflow

  • yaml_stem (str) – The name of the current subworkflow (stem of the yaml filepath)

  • steps (List[Yaml]) – The steps: tag of a CWL workflow

  • outputs_workflow (WorkflowOutputs) – Contains the contents of the out: tags for each step.

  • vars_workflow_output_internal (InternalOutputs) – Keeps track of output variables which are internal to the root workflow, but not necessarily to subworkflows.

  • graph (GraphReps) – A tuple of a GraphViz DiGraph and a networkx DiGraph

  • tools_lst (List[Tool]) – A list of the CWL CommandLineTools or compiled subworkflows for the current workflow.

  • step_node_name (str) – The namespaced name of the current step

Returns:

The actual outputs to be specified in the generated CWL file

Return type:

Dict[str, Dict[str, str]]

wic.utils_cwl.canonicalize_type(type_obj)

Recursively desugars the CWL type: field into a canonical normal form.

In particular, CWL automatically desugars File[] into {‘type’: ‘array’, ‘items’: File}, but File[][] causes a syntax error.

Parameters:

type_obj (Any) – An object that is a syntactic hodgepodge of valid CWL types.

Returns:

The JSON canonical normal form associated with type_obj

Return type:

Any
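
The desugaring idea can be illustrated with a small recursive function. This is a sketch under stated assumptions, not the actual implementation: it handles only the T[] and T? string shorthands plus nested type and items entries, and canonicalize_sketch is a hypothetical name.

```python
from typing import Any


def canonicalize_sketch(type_obj: Any) -> Any:
    """Desugar 'T[]' and 'T?' string shorthands, recursing into dicts and lists."""
    if isinstance(type_obj, str):
        if type_obj.endswith('?'):
            # 'T?' is shorthand for the union ['null', T].
            return ['null', canonicalize_sketch(type_obj[:-1])]
        if type_obj.endswith('[]'):
            # 'T[]' is shorthand for an array of T; recursing handles 'T[][]'.
            return {'type': 'array', 'items': canonicalize_sketch(type_obj[:-2])}
        return type_obj
    if isinstance(type_obj, dict):
        # Only the 'type' and 'items' entries can contain further type syntax.
        return {key: canonicalize_sketch(val) if key in ('type', 'items') else val
                for key, val in type_obj.items()}
    if isinstance(type_obj, list):
        return [canonicalize_sketch(item) for item in type_obj]
    return type_obj


nested = canonicalize_sketch('File[][]')
optional = canonicalize_sketch('string?')
print(nested)
print(optional)
```

Stripping one suffix per call means nested shorthands such as File[][] expand from the outside in, without any special casing.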

wic.utils_cwl.copy_cwl_input_output_dict(io_dict, remove_qmark=False)

Copies the type, format, label, and doc entries. Does NOT copy inputBinding and outputBinding.

Parameters:
  • io_dict (Dict) – A dictionary

  • remove_qmark (bool) – Determines whether to remove question marks and thus make optional types required

Returns:

A copy of the dictionary.

Return type:

Dict
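
A minimal sketch of this selective copy is shown below. It assumes the question mark only ever appears as a suffix on a string-valued type, and it makes a shallow copy; copy_io_sketch and the example values are hypothetical.

```python
from typing import Any, Dict


def copy_io_sketch(io_dict: Dict[str, Any], remove_qmark: bool = False) -> Dict[str, Any]:
    """Copy only the type, format, label, and doc entries of a CWL input/output."""
    copied = {key: io_dict[key] for key in ('type', 'format', 'label', 'doc') if key in io_dict}
    if remove_qmark and isinstance(copied.get('type'), str):
        # Assumption: only the string shorthand 'T?' is handled here.
        copied['type'] = copied['type'].rstrip('?')
    return copied


tool_input = {
    'type': 'File?',
    'format': 'edam:format_1476',
    'label': 'Input structure',
    'inputBinding': {'prefix': '-f'},
}
required = copy_io_sketch(tool_input, remove_qmark=True)
print(required)
```

Dropping inputBinding matters when the copied entry is reused as a workflow-level input, where command line bindings do not apply.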

wic.utils_graphs

wic.utils_graphs.add_graph_edge(args, graph, nss1, nss2, label, color='')

Adds edges to (all of) our graph representations, with the ability to collapse all nodes below a given depth to a single node.

This function utilizes the fact that nodes have been carefully designed to have unique, hierarchical names. If we want to hide all of the details below a given depth, we can simply truncate each of the namespaces! (and do the same when creating the nodes)

Parameters:
  • args (argparse.Namespace) – The command line arguments

  • graph (GraphReps) – A tuple of a GraphViz DiGraph and a networkx DiGraph

  • nss1 (Namespaces) – The namespaces associated with the first node

  • nss2 (Namespaces) – The namespaces associated with the second node

  • label (str) – The edge label

  • color (str, optional) – The edge color

Return type:

None
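
The truncation trick can be sketched with plain lists. The ___ separator and the function name are hypothetical, and the real function also updates the graph representations, which is omitted here.

```python
from typing import List, Tuple


def truncated_edge_sketch(nss1: List[str], nss2: List[str], depth: int) -> Tuple[str, str]:
    """Truncate both namespace paths to `depth` and join them into node names."""
    # Hypothetical separator; the real node-naming scheme may differ.
    name1 = '___'.join(nss1[:depth])
    name2 = '___'.join(nss2[:depth])
    return (name1, name2)


src = ['root', 'setup', 'step_1']
dst = ['root', 'run', 'step_1']
full = truncated_edge_sketch(src, dst, depth=3)
collapsed = truncated_edge_sketch(src, dst, depth=2)
print(full)       # ('root___setup___step_1', 'root___run___step_1')
print(collapsed)  # ('root___setup', 'root___run')
```

At depth 2 every step inside setup collapses onto the single node root___setup, so the detailed edge becomes an edge between the two collapsed subworkflow nodes.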

wic.utils_graphs.flatten_graphdata(graphdata, parent='')

Flattens graphdata by recursively inlining all subgraphs.

Parameters:
  • graphdata (GraphData) – A data structure which contains recursive subgraphs and other metadata.

  • parent (str, optional) – The name of the parent graph is encoded into the node attributes so that the subgraph information can be preserved after flattening.

Returns:

A GraphData instance with all of the recursive instances inlined

Return type:

GraphData
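
The flattening can be illustrated on a simplified stand-in for GraphData. The GraphSketch dataclass below is hypothetical and only mirrors the name, nodes, edges, and subgraphs fields; the attribute key parent is likewise an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class GraphSketch:
    """Hypothetical, simplified stand-in for GraphData."""
    name: str
    nodes: List[Tuple[str, Dict[str, str]]] = field(default_factory=list)
    edges: List[Tuple[str, str]] = field(default_factory=list)
    subgraphs: List['GraphSketch'] = field(default_factory=list)


def flatten_sketch(graph: GraphSketch, parent: str = '') -> GraphSketch:
    """Return a copy of graph with all nested nodes and edges inlined."""
    # Record the enclosing graph on each node so the grouping survives flattening.
    nodes = [(name, {**attrs, 'parent': parent}) for name, attrs in graph.nodes]
    edges = list(graph.edges)
    for sub in graph.subgraphs:
        flat = flatten_sketch(sub, parent=sub.name)
        nodes += flat.nodes
        edges += flat.edges
    return GraphSketch(graph.name, nodes, edges)


inner = GraphSketch('inner', nodes=[('c', {})])
root = GraphSketch('root', nodes=[('a', {}), ('b', {})],
                   edges=[('a', 'b'), ('b', 'c')], subgraphs=[inner])
flat_root = flatten_sketch(root)
print(flat_root.nodes)
```

After flattening, the result has no subgraphs, but node c still records that it came from inner.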

wic.utils_graphs.graphdata_to_cytoscape(graphdata)

Converts a flattened graph into cytoscape json format.

Parameters:

graphdata (GraphData) – A flattened GraphData instance

Returns:

A Json object compatible with cytoscape.

Return type:

Json
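
For orientation, Cytoscape.js elements are objects with a data dict: nodes carry an id (and optionally a parent for compound nodes), and edges carry source and target. The sketch below builds that shape from plain lists. The input layout and the function name are hypothetical, and the exact JSON emitted by wic may differ.

```python
import json
from typing import Any, Dict, List, Tuple


def to_cytoscape_sketch(nodes: List[Tuple[str, Dict[str, str]]],
                        edges: List[Tuple[str, str]]) -> Dict[str, Any]:
    """Build a Cytoscape.js-style elements dict from flat node and edge lists."""
    cy_nodes = []
    for name, attrs in nodes:
        data = {'id': name}
        if attrs.get('parent'):
            # Compound nodes: 'parent' must be the id of another node.
            data['parent'] = attrs['parent']
        cy_nodes.append({'data': data})
    cy_edges = [{'data': {'id': f'{src}->{dst}', 'source': src, 'target': dst}}
                for src, dst in edges]
    return {'elements': {'nodes': cy_nodes, 'edges': cy_edges}}


cy_json = to_cytoscape_sketch(
    nodes=[('a', {}), ('inner', {}), ('c', {'parent': 'inner'})],
    edges=[('a', 'c')],
)
print(json.dumps(cy_json, indent=2))
```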

wic.utils_graphs.make_tool_dag(tool_stem, tool, graph_dark_theme)

Uses the dot executable from the graphviz package to make a Directed Acyclic Graph corresponding to the given CWL CommandLineTool

Parameters:
  • tool_stem (str) – The name of the Tool

  • tool (Tool) – The CWL CommandLineTool

  • graph_dark_theme (bool) – See args.graph_dark_theme

Return type:

None

wic.utils_graphs.make_plugins_dag(tools, graph_dark_theme)

Uses the neato executable from the graphviz package to make a Directed Acyclic Graph consisting of a node for each CWL CommandLineTool and no edges.

Parameters:
  • tools (Tools) – The CWL CommandLineTool definitions found using get_tools_cwl()

  • graph_dark_theme (bool) – See args.graph_dark_theme

Return type:

None

wic.utils_graphs.add_subgraphs(args, graph, sibling_subgraphs, namespaces, step_1_names, steps_ranksame)

Add all subgraphs to the current graph, except for GraphViz subgraphs below a given depth, which allows us to hide irrelevant details.

Parameters:
  • args (argparse.Namespace) – The command line arguments

  • graph (GraphReps) – A tuple of a GraphViz DiGraph and a networkx DiGraph

  • sibling_subgraphs (List[Graph]) – The subgraphs of the immediate children of the current workflow

  • namespaces (Namespaces) – Specifies the path in the AST of the current subworkflow

  • step_1_names (List[str]) – The names of the first step

  • steps_ranksame (List[str]) – Additional node names to be aligned using ranksame

Return type:

None

wic.utils_graphs.get_graph_reps(name)

Initialize graph representations

Parameters:

name (str) – The name of the graph

Returns:

A tuple of graph representations

Return type:

GraphReps

wic.wic_types

class wic.wic_types.Tool(run_path, cwl)
run_path: str

Alias for field number 0

cwl: Dict[str, Any]

Alias for field number 1

class wic.wic_types.StepId(stem, plugin_ns)
stem: str

Alias for field number 0

plugin_ns: str

Alias for field number 1

class wic.wic_types.GraphData(name, nodes=[], edges=[], subgraphs=[], ranksame=[])
__init__(name, nodes=[], edges=[], subgraphs=[], ranksame=[])

class wic.wic_types.GraphReps(graphviz, networkx, graphdata)
graphviz: Any

Alias for field number 0

networkx: DiGraph

Alias for field number 1

graphdata: GraphData

Alias for field number 2

class wic.wic_types.RoseTree(data, sub_trees)
data: Any

Alias for field number 0

sub_trees: List[Any]

Alias for field number 1

class wic.wic_types.NodeData(namespaces, name, yml, compiled_cwl, tool, workflow_inputs_file, explicit_edge_defs, explicit_edge_calls, graph, inputs_workflow, step_name_1)
namespaces: List[str]

Alias for field number 0

name: str

Alias for field number 1

yml: Dict[str, Any]

Alias for field number 2

compiled_cwl: Dict[str, Any]

Alias for field number 3

tool: Tool

Alias for field number 4

workflow_inputs_file: Dict[str, Dict[str, str]]

Alias for field number 5

explicit_edge_defs: Dict[str, Tuple[List[str], str]]

Alias for field number 6

explicit_edge_calls: Dict[str, Tuple[List[str], str]]

Alias for field number 7

graph: GraphReps

Alias for field number 8

inputs_workflow: Dict[str, Dict[str, str]]

Alias for field number 9

step_name_1: str

Alias for field number 10

class wic.wic_types.EnvData(input_mapping, output_mapping, inputs_file_workflow, vars_workflow_output_internal, explicit_edge_defs, explicit_edge_calls)
input_mapping: Dict[str, List[str]]

Alias for field number 0

output_mapping: Dict[str, str]

Alias for field number 1

inputs_file_workflow: Dict[str, Dict[str, str]]

Alias for field number 2

vars_workflow_output_internal: List[str]

Alias for field number 3

explicit_edge_defs: Dict[str, Tuple[List[str], str]]

Alias for field number 4

explicit_edge_calls: Dict[str, Tuple[List[str], str]]

Alias for field number 5

class wic.wic_types.CompilerInfo(rose, env)
rose: RoseTree

Alias for field number 0

env: EnvData

Alias for field number 1

class wic.wic_types.YamlTree(step_id, yml)
step_id: StepId

Alias for field number 0

yml: Dict[str, Any]

Alias for field number 1

class wic.wic_types.YamlForest(yaml_tree, sub_forests)
yaml_tree: YamlTree

Alias for field number 0

sub_forests: List[Tuple[StepId, Any]]

Alias for field number 1
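
The "Alias for field number N" descriptions are the default docstrings that Python generates for named tuple fields, which suggests that most of these classes are named tuples. The sketch below uses local stand-ins (StepIdSketch and YamlTreeSketch, both hypothetical, with made-up values) to show what that implies: fields can be read by name or by position, and instances are immutable.

```python
from typing import Any, Dict, NamedTuple


class StepIdSketch(NamedTuple):
    """Local stand-in mirroring the fields listed for StepId."""
    stem: str
    plugin_ns: str


class YamlTreeSketch(NamedTuple):
    """Local stand-in mirroring the fields listed for YamlTree."""
    step_id: StepIdSketch
    yml: Dict[str, Any]


tree = YamlTreeSketch(StepIdSketch('basic', 'global'), {'steps': []})

# Field 0 can be read by name or by position.
same_object = tree[0] is tree.step_id

# Named tuples unpack positionally, in field order.
step_id, yml = tree

# Instances are immutable; _replace returns a modified copy instead.
renamed = tree._replace(step_id=StepIdSketch('basic_copy', 'global'))
print(same_object, step_id.stem, renamed.step_id.stem)
```

Immutability applies only to the tuple itself: a dict stored in a field, such as yml, can still be mutated in place.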