Developer API ¶

wic.ast.merge_yml_trees(yaml_tree_tuple, wic_parent, tools)¶

Implements ‘parameter passing’ by recursively merging wic: yml tags. Values from the parent workflow override those from subworkflows. See https://github.com/PolusAI/mm-workflows/blob/main/examples/gromacs/basic.wic for details.
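The override rule can be illustrated with a minimal sketch. This is not the wic implementation: merge_parent_over_child, the step key, and the nsteps / emtol inputs are hypothetical names chosen for illustration, and the real function also walks the yml AST and consults tools.

```python
from typing import Any, Dict

Yaml = Dict[str, Any]  # assumed alias for a parsed yml dict


def merge_parent_over_child(parent: Yaml, child: Yaml) -> Yaml:
    """Recursively merge two dicts; values from parent win on conflicts."""
    merged = dict(child)  # shallow copy so that child is not mutated
    for key, parent_val in parent.items():
        child_val = merged.get(key)
        if isinstance(parent_val, dict) and isinstance(child_val, dict):
            # Both sides are dicts: descend and merge key by key.
            merged[key] = merge_parent_over_child(parent_val, child_val)
        else:
            # Leaf value (or type mismatch): the parent overrides the child.
            merged[key] = parent_val
    return merged


# Hypothetical wic: tags from a subworkflow (child) and its caller (parent).
child = {'wic': {'steps': {'(1, minimize)': {'in': {'nsteps': 100, 'emtol': 10.0}}}}}
parent = {'wic': {'steps': {'(1, minimize)': {'in': {'nsteps': 5000}}}}}
merged = merge_parent_over_child(parent, child)
```

Here the parent's nsteps replaces the child's value, while emtol, which only the child sets, is kept.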

Parameters:

yaml_tree_tuple (YamlTree) – A tuple of a name and a yml AST
wic_parent (Yaml) – The wic: yml dict from the parent workflow
tools (Tools) – The CWL CommandLineTool definitions found using get_tools_cwl()

Raises:

Exception – If a wic: tag is found as an argument to a CWL CommandLineTool

Returns:

The yml AST with all wic: tags recursively merged.

Return type:

wic.ast.tree_to_forest(yaml_tree_tuple, tools)¶

The purpose of this function is to abstract away the process of traversing an AST.

Parameters:

yaml_tree_tuple (YamlTree) – A tuple of name and yml AST
tools (Tools) – The CWL CommandLineTool definitions found using get_tools_cwl()

Returns:

A recursive data structure containing all sub-trees encountered while traversing the yml AST.

Return type:

YamlForest

wic.ast.python_script_generate_cwl(yaml_tree_tuple, root_yml_dir_abs, tools)¶

Generates a CWL CommandLineTool for each python_script: tag, mutably adds them to tools, and updates the call sites in yaml_tree.

Parameters:

yaml_tree_tuple (YamlTree) – A tuple of a name and a yml AST
root_yml_dir_abs (Path) – The absolute path to the directory containing the root workflow yml file
tools (Tools) – The CWL CommandLineTool definitions found using get_tools_cwl()

Returns:

The yml AST with all python_script tags replaced with references to the auto-generated CWL.

Return type:

See https://en.wikipedia.org/wiki/Fixed_point_(mathematics)

wic.cli¶

wic.cli.get_args(yaml_path='', suppliedargs=[])¶

Gets mock command line arguments: the defaults plus any supplied args.

Returns:: The mocked command line arguments
Return type:: argparse.Namespace

wic.compiler¶

wic.compiler.compile_workflow(yaml_tree_ast, args, namespaces, subgraphs_, explicit_edge_defs, explicit_edge_calls, input_mapping, output_mapping, tools, is_root, relative_run_path, testing)¶

Fixed-point wrapper around compile_workflow_once.

Parameters:

yaml_tree_ast (YamlTree) – A tuple of name and yml AST
args (Any) – all of the other positional arguments for compile_workflow_once
kwargs (Any) – all of the other keyword arguments for compile_workflow_once

Returns:

Contains the data associated with compiled subworkflows (in the Rose Tree) together with mutable cumulative environment information which needs to be passed through the recursion.

Return type:

CompilerInfo
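A fixed-point wrapper can be sketched in general terms as follows. This only illustrates the pattern, presumably applied here to a single compilation pass: fixed_point, toy_pass, and the step names are hypothetical, and the real stopping criterion used by compile_workflow is not documented in this section.

```python
from typing import Callable, Tuple, TypeVar

T = TypeVar('T')


def fixed_point(step: Callable[[T], T], initial: T, max_iters: int = 100) -> T:
    """Apply step repeatedly until the value stops changing."""
    current = initial
    for _ in range(max_iters):
        updated = step(current)
        if updated == current:
            return current  # step(current) == current, i.e. a fixed point
        current = updated
    raise RuntimeError('no fixed point reached within max_iters iterations')


def toy_pass(steps: Tuple[str, ...]) -> Tuple[str, ...]:
    """Hypothetical single pass: insert at most one missing 'convert' step."""
    for idx, name in enumerate(steps):
        if name == 'analyze' and (idx == 0 or steps[idx - 1] != 'convert'):
            return steps[:idx] + ('convert',) + steps[idx:]
    return steps  # nothing left to insert


result = fixed_point(toy_pass, ('simulate', 'analyze', 'analyze'))
# result == ('simulate', 'convert', 'analyze', 'convert', 'analyze')
```

Each pass makes one change; the wrapper reruns the pass until it makes none.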

wic.compiler.compile_workflow_once(yaml_tree_ast, args, namespaces, subgraphs, explicit_edge_defs, explicit_edge_calls, input_mapping, output_mapping, tools, is_root, relative_run_path, testing)¶

STOP: Have you read the Developer’s Guide?? docs/devguide.md

Recursively compiles yml workflow definition ASTs to CWL file contents

Parameters:

yaml_tree_ast (YamlTree) – A tuple of name and yml AST
args (argparse.Namespace) – The command line arguments
namespaces (Namespaces) – Specifies the path in the yml AST to the current subworkflow
subgraphs (List[Graph]) – The graphs associated with the parent workflows of the current subworkflow
explicit_edge_defs (ExplicitEdgeDefs) – Stores the (path, value) of the explicit edge definition sites
explicit_edge_calls (ExplicitEdgeCalls) – Stores the (path, value) of the explicit edge call sites
tools (Tools) – The CWL CommandLineTool definitions found using get_tools_cwl(). yml files that have been compiled to CWL SubWorkflows are also added during compilation.
is_root (bool) – True if this is the root workflow
relative_run_path (bool) – Controls whether to use subdirectories or just one directory when writing the compiled CWL files to disk
testing (bool) – Used to disable some optional features which are unnecessary for testing.

Raises:

Exception – If any errors occur

Returns:

Contains the data associated with compiled subworkflows (in the Rose Tree) together with mutable cumulative environment information which needs to be passed through the recursion.

Return type:

CompilerInfo

wic.compiler.insert_step_into_workflow(yaml_tree_orig, stepid, tools, i)¶

Inserts the step with given stepid into a workflow at the given index.

Parameters:

yaml_tree_orig (Yaml) – The original Yaml tree
stepid (StepId) – The name of the workflow step to be inserted.
tools (Tools) – The CWL CommandLineTool definitions found using get_tools_cwl(). yml files that have been compiled to CWL SubWorkflows are also added during compilation.
i (int) – The index to insert the new workflow step

Returns:

A modified Yaml tree with the given stepid inserted at index i

Return type:

Yaml

wic.cwl_subinterpreter¶

wic.inference¶

wic.inference.perform_edge_inference(args, tools, tools_lst, steps_keys, yaml_stem, i, steps, arg_key, graph, is_root, namespaces, vars_workflow_output_internal, input_mapping, output_mapping, inputs_workflow, in_name, in_name_in_inputs_file_workflow, arg_key_in_yaml_tree_inputs, insertions, wic_steps, testing)¶

This function implements the core edge inference feature. NOTE: steps[i], vars_workflow_output_internal, inputs_workflow are mutably updated.

Parameters:

args (argparse.Namespace) – The command line arguments
tools (Tools) – The CWL CommandLineTool definitions found using get_tools_cwl()
tools_lst (List[Tool]) – A list of the CWL CommandLineTools or compiled subworkflows for the current workflow.
steps_keys (List[str]) – The name of each step in the current CWL workflow
yaml_stem (str) – The name (filename without extension) of the current CWL workflow
i (int) – The (zero-based) step number w.r.t. the current subworkflow. Since we are trying to infer inputs from previous outputs, this will not perform any inference
steps (List[Yaml]) – The steps: tag of the current CWL workflow
arg_key (str) – The name of the CWL input tag that needs a concrete input value inferred
graph (GraphReps) – A tuple of a GraphViz DiGraph and a networkx DiGraph
is_root (bool) – True if this is the root workflow (for debugging only)
namespaces (Namespaces) – Specifies the path in the AST of the current subworkflow
vars_workflow_output_internal (InternalOutputs) – Keeps track of output variables which are internal to the root workflow, but not necessarily to subworkflows.
input_mapping (Dict[str, List[str]]) – Maps workflow inputs to workflow step inputs, recursively namespaced.
output_mapping (Dict[str, str]) – Maps workflow outputs to workflow step outputs, recursively namespaced.
inputs_workflow (WorkflowInputs) – Keeps track of CWL inputs: variables for the current workflow.
in_name (str) – The input name
in_name_in_inputs_file_workflow (bool) – Used to determine whether failure to find a match should be considered an error.
arg_key_in_yaml_tree_inputs (bool) – Determines whether at least one level of recursion has been performed.
insertions (List[StepId]) – If exact inference fails, a list of possible steps to automatically insert is stored here.
wic_steps (Yaml) – The metadata associated with the given workflow.
testing (bool) – Used to disable some optional features which are unnecessary for testing.

Returns:

steps[i] with the input tag arg_key updated with an inferred input value.

Return type:

Yaml

wic.inference.get_inference_rules(wic, step_key_parent)¶

Recursively traverses the wic: metadata annotation AST and extracts any inference rules.

See docs/userguide.md for more information.

Parameters:

wic (Yaml) – The contents of the wic: metadata annotations tag (if any)
step_key_parent (str) – The name of one of the steps in the current workflow.

Returns:

A dictionary of the inference rules for the workflow step named step_key_parent.

Return type:

Dict[str, str]

wic.inlineing¶

wic.inlineing.get_inlineable_subworkflows(yaml_tree_tuple, tools, implementation=False, namespaces_init=[])¶

Traverses a yml AST and finds all subworkflows which can be inlined into their parent workflow.

Parameters:

yaml_tree_tuple (YamlTree) – A tuple of name and yml AST
tools (Tools) – The CWL CommandLineTool definitions found using get_tools_cwl()
implementation (bool) – True if the immediate parent workflow is an implementation.
namespaces_init (Namespaces) – The initial subworkflow to start the traversal ([] == root)

Returns:

The subworkflows which can be inlined into their parent workflows.

Return type:

List[Namespaces]

wic.inlineing.inline_subworkflow(yaml_tree_tuple, namespaces)¶

Inlines the given subworkflow into its immediate parent workflow.

Parameters:

yaml_tree_tuple (YamlTree) – A tuple of name and yml AST
namespaces (Namespaces) – Specifies the path in the yml AST to the subworkflow to be inlined.

Returns:

The updated root workflow with the given subworkflow inlined into its immediate parent workflow.

Return type:

wic.inlineing.apply_args(sub_yml_tree, sub_parentargs)¶

Return type:: Dict[str, Any]

wic.inlineing.inline_subworkflow_wic_tag(wic_tag, namespaces, len_substeps)¶

Inlines the wic metadata tags associated with the given subworkflow into its immediate parent wic.

Parameters:

wic_tag (Yaml) – The wic metadata tag associated with the given workflow
namespaces (Namespaces) – Specifies the path in the yml AST to the subworkflow to be inlined.
len_substeps (int) – The number of steps in the subworkflow to be inlined.

Returns:

The updated wic metadata tag with the wic metadata tag associated with the given subworkflow inlined.

Return type:

Yaml

wic.inlineing.move_slash_last(source_new)¶

Move / to the last ___ position

(Moving to the last position works because we are inlining recursively.)

Parameters:: source_new (str) – A string representing a CWL dependency, i.e. containing /
Returns:: source_new with / moved to the last ___ position
Return type:: str
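One possible reading of "move / to the last ___ position" is sketched below. It is an assumption, not the wic implementation: move_slash_last_sketch is a hypothetical name, and the sketch assumes the string contains at most one / that should end up where the final ___ separator was.

```python
def move_slash_last_sketch(source: str, sep: str = '___') -> str:
    """Replace the '/' with sep, then turn the last sep back into '/'."""
    if '/' not in source:
        return source  # not a CWL dependency; nothing to move
    flattened = source.replace('/', sep, 1)
    head, _, tail = flattened.rpartition(sep)
    return f'{head}/{tail}'


moved = move_slash_last_sketch('step_a/sub___step_b___output')
# moved == 'step_a___sub___step_b/output'
```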

wic.inlineing.inline_subworkflow_cwl(rose_tree)¶

Inlines all compiled CWL subworkflows into the root workflow.

Parameters:: rose_tree (RoseTree) – The data associated with compiled subworkflows
Returns:: The updated root workflow with all compiled CWL subworkflows recursively inlined.
Return type:: RoseTree

wic.input_output¶

wic.input_output.read_lines_pairs(filename)¶

Reads a whitespace-delimited file containing two paired entries per line (i.e. a serialized Dict).

Parameters:: filename (Path) – The full path of the file to be read.
Raises:: Exception – If any non-blank, non-comment lines do not contain exactly two entries.
Returns:: The file contents, with blank lines and comments removed.
Return type:: List[Tuple[str, str]]
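The documented behaviour can be sketched as follows. This is a minimal sketch, not the wic source: read_lines_pairs_sketch and the demo file contents are hypothetical, and treating lines that start with # as comments is an assumption, since the comment syntax is not stated here.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


def read_lines_pairs_sketch(filename: Path) -> List[Tuple[str, str]]:
    """Read whitespace-delimited pairs, skipping blank and comment lines."""
    pairs: List[Tuple[str, str]] = []
    with open(filename, encoding='utf-8') as handle:
        for line in handle:
            stripped = line.strip()
            if not stripped or stripped.startswith('#'):  # assumed comment marker
                continue
            entries = stripped.split()
            if len(entries) != 2:
                raise Exception(f'Expected exactly two entries in {filename}: {stripped!r}')
            pairs.append((entries[0], entries[1]))
    return pairs


# Demo with a hypothetical file written to a temporary directory.
with tempfile.TemporaryDirectory() as tmpdir:
    demo_file = Path(tmpdir) / 'pairs.txt'
    demo_file.write_text('# a comment\n\nenergy.edr  energy_file\ntopol.top\ttopology_file\n',
                         encoding='utf-8')
    pairs = read_lines_pairs_sketch(demo_file)
```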

class wic.input_output.NoAliasDumper(stream, default_style=None, default_flow_style=False, canonical=None, indent=None, width=None, allow_unicode=None, line_break=None, encoding=None, explicit_start=None, explicit_end=None, version=None, tags=None, sort_keys=True)¶

ignore_aliases(data)¶

Return type:: bool

wic.input_output.write_to_disk(rose_tree, path, relative_run_path)¶

Writes the compiled CWL files and their associated yml inputs files to disk.

NOTE: Only the yml input file associated with the root workflow is guaranteed to have all inputs. In other words, subworkflows will all have valid CWL files, but may not be executable due to ‘missing’ inputs.

Parameters:

rose_tree (RoseTree) – The data associated with compiled subworkflows
path (Path) – The directory in which to write the files
relative_run_path (bool) – Controls whether to use subdirectories or just one directory.

Return type:

wic.input_output.write_config_to_disk(config, config_file)¶

Writes config json object to config_file

Parameters:

config (Json) – The json object that is to be written to disk
config_file (Path) – The file path where it is to be written

Return type:

wic.input_output.get_config(config_file, default_config_file)¶

Returns the config json object from config_file with absolute paths

Parameters:

config_file (Path) – The path of the user specified config file
default_config_file (Path) – The default path of the config file if user hasn’t specified one

Returns:

The config json object with absolute filepaths

Return type:

Json

wic.input_output.read_config_from_disk(config_file)¶

Returns the config json object from config_file with absolute paths

Parameters:: config_file (Path) – The path of json file where it is to be read from
Returns:: The config json object with absolute filepaths
Return type:: Json

wic.input_output.get_default_config()¶

Returns the default config with absolute paths

Returns:: The config json object with absolute filepaths
Return type:: Json

wic.input_output.get_absolute_paths(sub_config)¶

Makes the paths within the dirs_file file absolute and writes them into the sub_config object.

Parameters:: sub_config (dict) – The json (sub)object where filepaths are stored
Returns:: The json (sub)object with absolute filepaths
Return type:: Json

wic.input_output.write_absolute_yaml_tags(args, in_dict_in, namespaces, step_name_i, explicit_edge_calls_copy)¶

cwl_subinterpreter requires all paths to be absolute.

Parameters:

args (argparse.Namespace) – The command line arguments
in_dict_in (Yaml) – The in: subtag of a cwl_subinterpreter: tag. (Mutates in_dict_in)
namespaces (Namespaces) – Specifies the path in the yml AST to the current subworkflow
step_name_i (str) – The name of the current workflow step
explicit_edge_calls_copy (ExplicitEdgeCalls) – Stores the (path, value) of the explicit edge call sites

Return type:

wic.labshare¶

wic.labshare.delete_previously_uploaded(args, plugins_or_pipelines, name)¶

Delete plugins/pipelines previously uploaded to labshare.

Parameters:

args (argparse.Namespace) – The command line arguments
name (str) – ‘plugins’ or ‘pipelines’

Return type:

wic.labshare.remove_dot_dollar(tree)¶

Removes . and $ from dictionary keys, e.g. $namespaces and $schemas. Otherwise, you will get {‘error’: {‘statusCode’: 500, ‘message’: ‘Internal Server Error’}}. This is due to MongoDB; see https://www.mongodb.com/docs/manual/reference/limits/#Restrictions-on-Field-Names

Parameters:: tree (Cwl) – A Cwl document
Returns:: A Cwl document with . and $ removed from $namespaces and $schemas
Return type:: Cwl
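A minimal sketch of such a key clean-up is shown below. It is an assumption-laden illustration: remove_dot_dollar_sketch is a hypothetical name, and the sketch strips the characters from every dict key at every depth, whereas the real function may restrict itself to $namespaces and $schemas.

```python
from typing import Any


def remove_dot_dollar_sketch(tree: Any) -> Any:
    """Return a copy of tree with '.' and '$' removed from all dict keys."""
    if isinstance(tree, dict):
        cleaned = {}
        for key, val in tree.items():
            new_key = key.replace('.', '').replace('$', '') if isinstance(key, str) else key
            cleaned[new_key] = remove_dot_dollar_sketch(val)
        return cleaned
    if isinstance(tree, list):
        return [remove_dot_dollar_sketch(item) for item in tree]
    return tree  # scalars, including string values, are left untouched


doc = {'cwlVersion': 'v1.0', '$namespaces': {'edam': 'https://edamontology.org/'}}
cleaned_doc = remove_dot_dollar_sketch(doc)
# cleaned_doc == {'cwlVersion': 'v1.0', 'namespaces': {'edam': 'https://edamontology.org/'}}
```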

wic.labshare.pretty_print_request(request)¶

Pretty-prints a requests.PreparedRequest

Parameters:: request (requests.PreparedRequest) – The request to be printed
Return type:: None

wic.labshare.upload_plugin(compute_url, access_token, tool, name)¶

Uploads CWL CommandLineTools to Polus Compute

Parameters:

compute_url (str) – The url to the Compute API
access_token (str) – The access token used for authentication
tool (Cwl) – The CWL CommandLineTool
name (str) – The name of the CWL CommandLineTool

Raises:

Exception – If the upload failed for any reason

Returns:

The unique id of the plugin

Return type:

wic.labshare.print_plugins(compute_url)¶

Prints information on all currently available Compute plugins

Parameters:: compute_url (str) – The url to the Compute API
Return type:: None

wic.labshare.upload_all(rose_tree, tools, args, is_root)¶

Uploads all Plugins, Pipelines, and the root Workflow to the Compute platform

Parameters:

rose_tree (RoseTree) – The data associated with compiled subworkflows
tools (Tools) – The CWL CommandLineTool definitions found using get_tools_cwl()
args (argparse.Namespace) – The command line arguments
is_root (bool) – True if this is the root workflow

Raises:

Exception – If any of the uploads fails for any reason

Returns:

The unique id of the workflow

Return type:

https://docs.python.org/3/library/importlib.html#importing-a-source-file-directly

wic.main¶

wic.plugins¶

wic.python_cwl_adapter¶

wic.python_cwl_adapter.import_python_file(python_module_name, python_file_path)¶

This function imports a python file directly, as per the documentation

Parameters:

python_module_name (str) – The name of the python module
python_file_path (Path) – The path to the python file.

Returns:

The module that was loaded.

Return type:

ModuleType
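The recipe from the Python importlib documentation ("Importing a source file directly") looks roughly like this. The wrapper name import_python_file_sketch and the demo script are hypothetical; whether wic registers the module in sys.modules exactly as shown is an assumption.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path
from types import ModuleType


def import_python_file_sketch(python_module_name: str, python_file_path: Path) -> ModuleType:
    """Load a module from an explicit file path, bypassing sys.path lookup."""
    spec = importlib.util.spec_from_file_location(python_module_name, python_file_path)
    if spec is None or spec.loader is None:
        raise ImportError(f'Cannot load {python_file_path}')
    module = importlib.util.module_from_spec(spec)
    sys.modules[python_module_name] = module  # register before executing, as in the recipe
    spec.loader.exec_module(module)
    return module


# Demo: write a tiny hypothetical script to disk and import it.
with tempfile.TemporaryDirectory() as tmpdir:
    script = Path(tmpdir) / 'demo_script.py'
    script.write_text('inputs = {"x": int}\n\ndef main(x):\n    return x + 1\n', encoding='utf-8')
    demo = import_python_file_sketch('demo_script', script)
    result = demo.main(41)  # result == 42
```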

wic.python_cwl_adapter.get_main_args(module_)¶

Uses inspect to get the arguments to the main() function of the given module.

Parameters:: module_ (ModuleType) – A ModuleType object returned from import_python_file
Returns:: A dictionary of key-value pairs
Return type:: Dict[str, Any]
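Reading the parameters of main() with inspect can be sketched as follows. What the returned dictionary maps to is not specified above, so mapping each parameter name to its annotation is an assumption; get_main_args_sketch and demo_main are hypothetical names.

```python
import inspect
from types import ModuleType
from typing import Any, Dict


def get_main_args_sketch(module_: ModuleType) -> Dict[str, Any]:
    """Map each parameter name of module_.main to its annotation."""
    signature = inspect.signature(module_.main)
    return {name: param.annotation for name, param in signature.parameters.items()}


def demo_main(input_path: str, num_steps: int) -> None:
    """Hypothetical entry point of an annotated python script."""


demo_module = ModuleType('demo_module')
demo_module.main = demo_main
main_args = get_main_args_sketch(demo_module)
# main_args == {'input_path': str, 'num_steps': int}
```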

wic.python_cwl_adapter.check_args_match_inputs(module_, args, check=False)¶

Checks that the keys (only) of the args dict match the keys of the top-level inputs attribute.

Parameters:

module_ (ModuleType) – A ModuleType object returned from import_python_file
args (Dict[str, Any]) – A dictionary of key-value pairs

Return type:

wic.python_cwl_adapter.generate_CWL_CommandLineTool(module_inputs, module_outputs, python_script_docker_pull='')¶

Generates a CWL CommandLineTool for an arbitrary (annotated) python script.

Parameters:

module_inputs (Dict[str, Any]) – The top-level inputs attribute of the python module.
module_outputs (Dict[str, Any]) – The top-level outputs attribute of the python module.
python_script_docker_pull (str) – The username/image to use with docker pull …

Returns:

A CWL CommandLineTool with the given inputs and outputs.

Return type:

Dict[str, Any]

wic.python_cwl_adapter.get_module(python_script_mod, python_script_path, yml_args)¶

Imports the given python script and validates its top-level annotations.

Parameters:

python_script_mod (str) – The module name of the given python script.
python_script_path (Path) – The path to the given python script.
yml_args (Dict[str, Any]) – The contents of the python_script in: yml tag.

Returns:

The Module object associated with the given python script.

Return type:

ModuleType

wic.python_cwl_adapter.get_inputs_workflow(module_inputs, python_script_path, yml_args)¶

This generates the contents of the inputs file associated with generate_CWL_CommandLineTool

Note that this is already taken care of in the compiler, but this function is useful for standalone purposes. (Alternatively, just make a single-step workflow.)

Parameters:

module_inputs (Dict[str, Any]) – The top-level inputs attribute of the python module.
python_script_path (str) – The path to the given python script.
yml_args (Dict[str, Any]) – The contents of the python_script in: yml tag.

Returns:

The contents of the CWL inputs file.

Return type:

Dict[str, Any]

wic.run_local¶

wic.schemas.wic_schema¶

wic.schemas.wic_schema.default_schema(url=False)¶

A basic default schema (to avoid copy & paste).

Parameters:: url (bool, optional) – Determines whether to include the $schema url. Defaults to False.
Returns:: A basic default schema
Return type:: Json

wic.schemas.wic_schema.named_empty_schema(name)¶

Creates a schema which starts with name, but is otherwise an empty wildcard

Parameters:: name (str) – The identifier of the string
Returns:: A schema which matches anything starting with name
Return type:: Json

wic.schemas.wic_schema.named_null_schema(name)¶

Creates a schema which starts with name and contains nothing else

Parameters:: name (str) – The identifier of the string
Returns:: A schema which matches name and nothing else
Return type:: Json

wic.schemas.wic_schema.cwl_type_to_jsonschema_type_schema(type_obj)¶

Converts a canonicalized CWL type into the equivalent jsonschema type schema, if possible.

Parameters:: type_obj (Json) – A canonical CWL type object
Returns:: A JSON type schema corresponding to type_obj if valid else None
Return type:: Json

wic.schemas.wic_schema.cwl_type_to_jsonschema_type(type_obj)¶

Converts a canonicalized CWL type into the equivalent jsonschema type schema, if possible.

Parameters:: type_obj (Json) – A canonical CWL type object
Returns:: A JSON type schema corresponding to type_obj if valid else None
Return type:: Json

wic.schemas.wic_schema.cwl_schema(name, cwl, id_prefix)¶

Generates a schema (including documentation) based on the inputs of a CWL CommandLineTool or Workflow.

Parameters:

name (str) – The name of the CWL CommandLineTool or Workflow
cwl (Json) – The CWL CommandLineTool or Workflow
id_prefix (str) – Either the string ‘tools’ or ‘workflows’

Returns:

An autogenerated, documented schema based on the inputs and outputs of a CWL CommandLineTool or Workflow.

Return type:

Json

wic.schemas.wic_schema.wic_tag_schema(hypothesis=False)¶

The schema of the (recursive) wic: metadata annotation tag.

Parameters:: hypothesis (bool) – Determines whether we should restrict the search space.
Returns:: The schema of the (recursive) wic: metadata annotation tag.
Return type:: Json

wic.schemas.wic_schema.wic_main_schema(tools_cwl, yml_stems, schema_store, hypothesis=False)¶

The main schema which is used to validate yml files.

Parameters:

tools_cwl (Tools) – The CWL CommandLineTool definitions found using get_tools_cwl()
yml_stems (List[str]) – The names of the yml workflow definitions found using get_yml_paths()
schema_store (Dict[str, Json]) – A global mapping between ids and schemas
hypothesis (bool) – Determines whether we should restrict the search space.

Returns:

The main schema which is used to validate yml files.

Return type:

Json

wic.schemas.wic_schema.compile_workflow_generate_schema(homedir, yml_path_str, yml_path, tools_cwl, yml_paths, validator, ignore_validation_errors)¶

Compiles a workflow and generates a schema which (recursively) includes the inputs/outputs from subworkflows.

Parameters:

homedir (str) – The user’s home directory
yml_path_str (str) – The stem of the path to the yml file
yml_path (Path) – The path to the yml file
tools_cwl (Tools) – The CWL CommandLineTool definitions found using get_tools_cwl()
yml_paths (Dict[str, Dict[str, Path]]) – The yml workflow definitions found using get_yml_paths()
validator (Draft202012Validator) – Used to validate the yml files against the autogenerated schema.
ignore_validation_errors (bool) – Temporarily ignore validation errors. Do not use this permanently!

Returns:

An autogenerated, documented schema based on the inputs and outputs of the Workflow.

Return type:

Json

wic.schemas.wic_schema.get_validator(tools_cwl, yml_stems, schema_store={}, write_to_disk=False, hypothesis=False)¶

Generates the main schema used to check the yml files for correctness and returns a validator.

Parameters:

tools_cwl (Tools) – The CWL CommandLineTool definitions found using get_tools_cwl()
yml_stems (List[str]) – The names of the yml workflow definitions found using get_yml_paths()
schema_store (Dict[str, Json]) – A global mapping between ids and schemas
write_to_disk (bool) – Controls whether to write the schemas to disk.
hypothesis (bool) – Determines whether we should restrict the search space.

Returns:

A validator which is used to check the yml files for correctness.

Return type:

Draft202012Validator

wic.utils¶

wic.utils.step_name_str(yaml_stem, i, step_key)¶

Returns a string which uniquely and hierarchically identifies a step in a workflow

Parameters:

yaml_stem (str) – The name of the workflow (filepath stem)
i (int) – The (zero-based) step number
step_key (str) – The name of the step (used as a dict key)

Returns:

The parameters (and the word ‘step’) joined together with double underscores

Return type:

wic.utils.parse_step_name_str(step_name)¶

The inverse function to step_name_str()

Parameters:: step_name (str) – A string of the same form as returned by step_name_str()
Raises:: Exception – If the argument is not of the same form as returned by step_name_str()
Returns:: The parameters used to create step_name
Return type:: Tuple[str, int, str]
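The pair step_name_str() / parse_step_name_str() can be sketched as a round trip. The field order (yaml_stem, the word step, i, step_key) is an assumption based on the description above, and the _sketch names are hypothetical; the real functions may order or offset the fields differently.

```python
import re
from typing import Tuple

_STEP_NAME = re.compile(r'(?P<stem>.+?)__step__(?P<i>\d+)__(?P<key>.+)')


def step_name_str_sketch(yaml_stem: str, i: int, step_key: str) -> str:
    """Join the parameters and the word 'step' with double underscores."""
    return '__'.join([yaml_stem, 'step', str(i), step_key])


def parse_step_name_str_sketch(step_name: str) -> Tuple[str, int, str]:
    """Invert step_name_str_sketch; raise if the string has another form."""
    match = _STEP_NAME.fullmatch(step_name)
    if match is None:
        raise ValueError(f'{step_name!r} is not of the form stem__step__i__key')
    return match.group('stem'), int(match.group('i')), match.group('key')


name = step_name_str_sketch('basic', 0, 'minimize')  # 'basic__step__0__minimize'
parsed = parse_step_name_str_sketch(name)            # ('basic', 0, 'minimize')
```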

wic.utils.shorten_namespaced_output_name(namespaced_output_name, sep=' ')¶

Removes the intentionally redundant yaml_stem prefixes from the list of step_name_str values embedded in namespaced_output_name. (The redundant prefixes are what allow each step_name_str to be context-free and unique.) This is potentially dangerous; its only purpose is to slightly shorten the output filenames.

Parameters:

namespaced_output_name (str) – A string of the form '___'.join(namespaces + [step_name_i, out_key])
sep (str) – The separator used to construct the shortened step name strings.

Returns:

The first yaml_stem (so that this function can be inverted) and namespaced_output_name with the embedded yaml_stem prefixes removed and double underscores replaced with a single space.

Return type:

Tuple[str, str]

wic.utils.restore_namespaced_output_name(yaml_stem_init, shortened_output_name, sep=None)¶

The inverse function to shorten_namespaced_output_name()

Parameters:

yaml_stem_init (str) – The initial yaml_stem prefix
shortened_output_name (str) – The shortened namespaced_output_name
sep (Optional[str], optional) – The separator used for shortening. Defaults to None.

Raises:

Exception – If the argument is not of the same form as returned by shorten_namespaced_output_name

Returns:

The original namespaced_output_name before shortening.

Return type:

wic.utils.partition_by_lowest_common_ancestor(nss1, nss2)¶

See https://en.wikipedia.org/wiki/Lowest_common_ancestor

Parameters:

nss1 (Namespaces) – The namespaces associated with the first node
nss2 (Namespaces) – The namespaces associated with the second node

Returns:

nss1, partitioned by lowest common ancestor

Return type:

Tuple[Namespaces, Namespaces]
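If Namespaces is a list of strings describing a path from the root, the lowest common ancestor of two nodes corresponds to the longest common prefix of their paths. A minimal sketch under that assumption (the Namespaces alias and the function name are hypothetical stand-ins):

```python
from typing import List, Tuple

Namespaces = List[str]  # assumed: path of step names from the root


def partition_by_lca_sketch(nss1: Namespaces, nss2: Namespaces) -> Tuple[Namespaces, Namespaces]:
    """Split nss1 into the prefix shared with nss2 and the remainder."""
    common = 0
    for ns1, ns2 in zip(nss1, nss2):
        if ns1 != ns2:
            break
        common += 1
    return nss1[:common], nss1[common:]


shared, rest = partition_by_lca_sketch(['root', 'prep', 'minimize'], ['root', 'prep', 'analyze'])
# shared == ['root', 'prep'], rest == ['minimize']
```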

wic.utils.get_steps_keys(steps)¶

Returns the name (dict key) of each step in the given CWL workflow

Parameters:: steps (List[Yaml]) – The steps: tag of a CWL workflow
Returns:: The name of each step in the given CWL workflow
Return type:: List[str]

wic.utils.get_subkeys(steps_keys, tools_stems)¶

This function determines which step keys are associated with subworkflows.

This is critical for the control flow in many areas of the compiler.

Parameters:

steps_keys (List[str]) – All of the step keys for the current workflow.
tools_stems (List[str]) – All of the step keys associated with CommandLineTools.

Returns:

The list of step keys associated with subworkflows of the current workflow.

Return type:

List[str]
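Together with get_steps_keys, the selection can be sketched as below. The sketch assumes that each entry of steps: is a single-key dict and that any step key which does not name a CommandLineTool is a subworkflow; the _sketch names and the step names are hypothetical.

```python
from typing import Any, Dict, List

Yaml = Dict[str, Any]  # assumed alias for a parsed yml dict


def get_steps_keys_sketch(steps: List[Yaml]) -> List[str]:
    """Return the single dict key of each step."""
    return [next(iter(step)) for step in steps]


def get_subkeys_sketch(steps_keys: List[str], tools_stems: List[str]) -> List[str]:
    """Keep the step keys that are not CommandLineTools."""
    return [key for key in steps_keys if key not in tools_stems]


steps = [{'pdb2gmx': {'in': {}}}, {'minimize.yml': None}, {'grompp': None}]
steps_keys = get_steps_keys_sketch(steps)
subkeys = get_subkeys_sketch(steps_keys, ['pdb2gmx', 'grompp'])
# steps_keys == ['pdb2gmx', 'minimize.yml', 'grompp'], subkeys == ['minimize.yml']
```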

wic.utils.extract_implementation(yaml_tree, wic, yaml_path)¶

Chooses a specific implementation for a given CWL workflow step.

The implementations should be thought of as either ‘exactly’ identical, or at least the same high-level protocol but implemented with a different algorithm.

Parameters:

yaml_tree (Yaml) – A Yaml AST dict with sub-dicts for each implementation.
yaml_path (Path) – The filepath of yaml_tree, only used for error reporting.

Raises:

Exception – If the steps: and/or implementation: tags are not present.

Returns:

The Yaml AST dict of the chosen implementation.

Return type:

Tuple[str, Yaml]

wic.utils.flatten(lists)¶

Concatenates a list of lists into a single list.

Parameters:: lists (List[List[Any]]) – A list of lists
Returns:: A single list
Return type:: List[Any]

wic.utils.flatten_rose_tree(rose_tree)¶

Flattens the data contained in the Rose Tree into a List

Parameters:: rose_tree (RoseTree) – A Rose Tree
Returns:: The list of data associated with each node in the RoseTree
Return type:: List[Any]
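Flattening a rose tree can be built on flatten(), as in this minimal sketch. RoseTreeSketch is a hypothetical stand-in for the RoseTree type (a node carrying data plus a list of sub-trees), and the pre-order traversal is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class RoseTreeSketch:
    """Hypothetical rose tree: one datum and any number of children."""
    data: Any
    sub_trees: List['RoseTreeSketch'] = field(default_factory=list)


def flatten(lists: List[List[Any]]) -> List[Any]:
    """Concatenate a list of lists into a single list."""
    return [item for sublist in lists for item in sublist]


def flatten_rose_tree_sketch(rose_tree: RoseTreeSketch) -> List[Any]:
    """Collect the data of every node, parent first (pre-order)."""
    children = [flatten_rose_tree_sketch(sub) for sub in rose_tree.sub_trees]
    return [rose_tree.data] + flatten(children)


tree = RoseTreeSketch('root', [RoseTreeSketch('a', [RoseTreeSketch('b')]), RoseTreeSketch('c')])
flat = flatten_rose_tree_sketch(tree)
# flat == ['root', 'a', 'b', 'c']
```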

wic.utils.pretty_print_forest(forest)¶

Pretty-prints a YamlForest

Parameters:: forest (YamlForest) – The forest to be printed
Return type:: None

wic.utils.flatten_forest(forest)¶

Flattens the sub-trees encountered while traversing an AST

Parameters:: forest (YamlForest) – The yaml AST forest to be flattened
Raises:: Exception – If implementation: tags are missing.
Returns:: The flattened forest
Return type:: List[YamlForest]

wic.utils.recursively_delete_dict_key(key, obj)¶

Recursively deletes any dict entries with the given key.

Parameters:

key (str) – The key to be deleted
obj (Any) – The object from which to delete key.

Returns:

The original dict with the given key recursively deleted.

Return type:

Any
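A minimal sketch of recursive key deletion follows. Unlike what the Returns note may imply, this sketch builds a new object rather than mutating the original; recursively_delete_dict_key_sketch is a hypothetical name, and descending into lists is an assumption.

```python
from typing import Any


def recursively_delete_dict_key_sketch(key: str, obj: Any) -> Any:
    """Return a copy of obj without any dict entries named key."""
    if isinstance(obj, dict):
        return {k: recursively_delete_dict_key_sketch(key, v)
                for k, v in obj.items() if k != key}
    if isinstance(obj, list):
        return [recursively_delete_dict_key_sketch(key, item) for item in obj]
    return obj  # scalars are returned unchanged


nested = {'wic': {'x': 1}, 'steps': [{'a': {'wic': {}, 'in': {'n': 2}}}]}
stripped = recursively_delete_dict_key_sketch('wic', nested)
# stripped == {'steps': [{'a': {'in': {'n': 2}}}]}
```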

wic.utils.recursively_contains_dict_key(key, obj)¶

Recursively checks whether obj contains entries with the given key.

Parameters:

key (str) – The key to be checked
obj (Any) – The object from which to check the key.

Returns:

True if key is found, else False.

Return type:

bool
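The corresponding membership check can be sketched in the same style. The name recursively_contains_dict_key_sketch is hypothetical, and descending into lists is an assumption.

```python
from typing import Any


def recursively_contains_dict_key_sketch(key: str, obj: Any) -> bool:
    """Return True if key occurs as a dict key anywhere inside obj."""
    if isinstance(obj, dict):
        return key in obj or any(
            recursively_contains_dict_key_sketch(key, val) for val in obj.values())
    if isinstance(obj, list):
        return any(recursively_contains_dict_key_sketch(key, item) for item in obj)
    return False  # scalars contain no keys


workflow = {'steps': [{'a': {'in': {'wic': {}}}}]}
found = recursively_contains_dict_key_sketch('wic', workflow)       # True
missing = recursively_contains_dict_key_sketch('python_script', workflow)  # False
```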

wic.utils.parse_int_string_tuple(string)¶

Parses a string of the form ‘(int, string)’

Parameters:: string (str) – A string with the above encoding
Returns:: The parsed result
Return type:: Tuple[int, str]
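Parsing the ‘(int, string)’ encoding can be sketched with a regular expression. The exact whitespace tolerance and the error type of the real function are not documented here, so both are assumptions, as is the name parse_int_string_tuple_sketch.

```python
import re
from typing import Tuple

_INT_STRING = re.compile(r'\(\s*(\d+)\s*,\s*(.*?)\s*\)')


def parse_int_string_tuple_sketch(string: str) -> Tuple[int, str]:
    """Parse '(int, string)' into an (int, str) tuple."""
    match = _INT_STRING.fullmatch(string.strip())
    if match is None:
        raise ValueError(f'{string!r} is not of the form (int, string)')
    return int(match.group(1)), match.group(2)


parsed_key = parse_int_string_tuple_sketch('(1, minimize)')
# parsed_key == (1, 'minimize')
```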

wic.utils.reindex_wic_steps(wic_steps, index, num_steps=1)¶

Increments the one-based step indices by num_steps, starting from the step with the given index.

This function can be used to reindex steps after inserting num_steps steps at the given index: every entry in the wic: metadata annotations tag whose index (before insertion) is >= the given index is shifted.

Parameters:

wic_steps (Yaml) – The steps: subtag of the wic: metadata annotations tag.
index (int) – The (one-based) start index that needs to be reindexed.
num_steps (int) – The number of steps inserted.

Returns:

The updated wic: steps: tag, with the appropriate indices incremented.

Return type:

Yaml
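The reindexing can be sketched as follows, assuming the keys of the steps: subtag are strings of the form ‘(int, string)’ as parsed by parse_int_string_tuple. That key format, the function name, and the step names are assumptions for illustration only.

```python
import re
from typing import Any, Dict

Yaml = Dict[str, Any]  # assumed alias for a parsed yml dict
_KEY = re.compile(r'\(\s*(\d+)\s*,\s*(.*?)\s*\)')


def reindex_wic_steps_sketch(wic_steps: Yaml, index: int, num_steps: int = 1) -> Yaml:
    """Shift every one-based step index >= index up by num_steps."""
    reindexed: Yaml = {}
    for key, val in wic_steps.items():
        match = _KEY.fullmatch(key)
        if match is None:
            raise ValueError(f'{key!r} is not of the form (int, string)')
        i, name = int(match.group(1)), match.group(2)
        new_i = i + num_steps if i >= index else i
        reindexed[f'({new_i}, {name})'] = val
    return reindexed


before = {'(1, setup)': {}, '(2, minimize)': {'wic': {}}, '(3, analyze)': {}}
after = reindex_wic_steps_sketch(before, index=2)
# after == {'(1, setup)': {}, '(3, minimize)': {'wic': {}}, '(4, analyze)': {}}
```

After the shift, index 2 is free for the newly inserted step.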

wic.utils.get_step_name_1(step_1_names, yaml_stem, namespaces, steps_keys, subkeys)¶

Finds the name of the first step in the current subworkflow. If the first step is itself a subworkflow, the call site recurses until it finds a node. This is necessary because ranksame in GraphViz can only be applied to individual nodes, not cluster_subgraphs.

Parameters:

step_1_names (List[str]) – The list of potential first node names
yaml_stem (str) – The name of the current subworkflow (stem of the yaml filepath)
namespaces (Namespaces) – Specifies the path in the AST of the current subworkflow
steps_keys (List[str]) – The name of each step in the current CWL workflow
subkeys (List[str]) – The keys associated with subworkflows

Returns:

The name of the first step

Return type:

wic.utils.parse_provenance_output_files(output_json)¶

Parses the primary workflow provenance JSON object.

Parameters:: output_json (Json) – The JSON results object, containing the metadata for all output files.
Returns:: A List of (location, parentdirs, basename) for each output file.
Return type:: List[Tuple[str, str, str]]

wic.utils.parse_provenance_output_files_(obj, parentdirs)¶

Parses the primary workflow provenance JSON object.

Parameters:

obj (Any) – The provenance object or one of its recursive sub-objects.
parentdirs (str) – The directory associated with obj.

Returns:

A List of (location, parentdirs, basename) for each output file.

Return type:

List[Tuple[str, str, str]]

wic.utils.get_input_mappings(input_mapping, arg_keys, arg_key_in_yaml_tree_inputs)¶

Gets all of the workflow step inputs / call sites that are mapped from the given workflow inputs.

Parameters:

input_mapping (Dict[str, List[str]]) – Maps workflow inputs to workflow step inputs, recursively namespaced.
arg_keys (List[str]) – A (singleton) list of root workflow inputs.
arg_key_in_yaml_tree_inputs (bool) – Determines whether at least one level of recursion has been performed.

Returns:

A list of the workflow step inputs / call sites, recursively namespaced.

Return type:

List[str]

wic.utils.get_output_mapping(output_mapping, out_key)¶

Gets the workflow step output / return location that is mapped to the given workflow output.

Parameters:

output_mapping (Dict[str, str]) – Maps workflow outputs to workflow step outputs, recursively namespaced.
out_key (str) – The root workflow output.

Returns:

The workflow step output / return location, recursively namespaced.

Return type:

wic.utils_cwl¶

wic.utils_cwl.maybe_add_requirements(yaml_tree, tools, steps_keys, wic_steps, subkeys)¶

Adds any necessary CWL requirements

Parameters:

yaml_tree (Yaml) – A tuple of name and yml AST
tools (Tools) – The CWL CommandLineTool definitions found using get_tools_cwl()
steps_keys (List[str]) – The name of each step in the current CWL workflow
wic_steps (Yaml) – The metadata associated with the workflow steps
subkeys (List[str]) – The keys associated with subworkflows

Return type:

wic.utils_cwl.add_yamldict_keyval_in(steps_i, step_key, keyval)¶

Convenience function used to (mutably) merge two Yaml dicts.

Parameters:

steps_i (Yaml) – A partially-completed Yaml dict representing a step in a CWL workflow
step_key (str) – The name of the step in a CWL workflow
keyval (Yaml) – A Yaml dict with additional details to be merged into the first Yaml dict

Returns:

The first Yaml dict with the second Yaml dict merged into it.

Return type:

Yaml
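
A mutable merge of this kind might look like the following sketch. The function name merge_into_step, the one-level merge depth, and the treatment of an empty step as None are all assumptions for illustration; the actual wic behavior may differ.

```python
from typing import Any, Dict


def merge_into_step(steps_i: Dict[str, Any], step_key: str,
                    keyval: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical sketch: merge keyval into steps_i[step_key] in place."""
    if not steps_i.get(step_key):
        # Assumption: a step with no details yet is stored as None or {}.
        steps_i[step_key] = {}
    target = steps_i[step_key]
    for key, val in keyval.items():
        if isinstance(val, dict) and isinstance(target.get(key), dict):
            # Merge nested dicts one level deep instead of overwriting them.
            target[key].update(val)
        else:
            target[key] = dict(val) if isinstance(val, dict) else val
    return steps_i


step = {'minimize': None}
merge_into_step(step, 'minimize', {'in': {'a': 1}})
result = merge_into_step(step, 'minimize', {'in': {'b': 2}, 'scatter': 'a'})
```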

wic.utils_cwl.add_yamldict_keyval_out(steps_i, step_key, strs)¶

Convenience function used to (mutably) merge two Yaml dicts.

Parameters:

steps_i (Yaml) – A partially-completed Yaml dict representing a step in a CWL workflow
step_key (str) – The name of the step in a CWL workflow
keyval (Yaml) – A Yaml dict with additional details to be merged into the first Yaml dict

Returns:

The first Yaml dict with the second Yaml dict merged into it.

Return type:

Yaml

wic.utils_cwl.get_workflow_outputs(args, namespaces, is_root, yaml_stem, steps, outputs_workflow, vars_workflow_output_internal, graph, tools_lst, step_node_name)¶

Chooses a subset of the CWL outputs: to actually output

Parameters:

args (argparse.Namespace) – The command line arguments
namespaces (Namespaces) – Specifies the path in the AST of the current subworkflow
is_root (bool) – True if this is the root workflow
yaml_stem (str) – The name of the current subworkflow (stem of the yaml filepath)
steps (List[Yaml]) – The steps: tag of a CWL workflow
outputs_workflow (WorkflowOutputs) – Contains the contents of the out: tags for each step.
vars_workflow_output_internal (InternalOutputs) – Keeps track of output variables which are internal to the root workflow, but not necessarily to subworkflows.
graph (GraphReps) – A tuple of a GraphViz DiGraph and a networkx DiGraph
tools_lst (List[Tool]) – A list of the CWL CommandLineTools or compiled subworkflows for the current workflow.
step_node_name (str) – The namespaced name of the current step

Returns:

The actual outputs to be specified in the generated CWL file

Return type:

Dict[str, Dict[str, str]]

wic.utils_cwl.canonicalize_type(type_obj)¶

Recursively desugars the CWL type: field into a canonical normal form.

In particular, CWL automatically desugars File[] into {‘type’: ‘array’, ‘items’: File}, but File[][] causes a syntax error! Etc.

Parameters:: type_obj (Any) – An object that is a syntactic hodgepodge of valid CWL types.
Returns:: The JSON canonical normal form associated with type_obj
Return type:: Any
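
To make the desugaring concrete, here is a minimal sketch that expands the [] and ? shorthands recursively, so that a nested form such as File[][] is handled. The function name desugar_type is hypothetical, only the items field of dictionaries is recursed into, and the choice of ['null', T] as the expansion of T? follows the CWL shorthand rules rather than anything confirmed about wic's canonical form.

```python
from typing import Any


def desugar_type(type_obj: Any) -> Any:
    """Hypothetical sketch: expand CWL type shorthands recursively."""
    if isinstance(type_obj, str):
        if type_obj.endswith('?'):
            # 'T?' is shorthand for an optional type.
            return ['null', desugar_type(type_obj[:-1])]
        if type_obj.endswith('[]'):
            # 'T[]' is shorthand for an array of T; recurse for 'T[][]'.
            return {'type': 'array', 'items': desugar_type(type_obj[:-2])}
        return type_obj
    if isinstance(type_obj, list):
        return [desugar_type(t) for t in type_obj]
    if isinstance(type_obj, dict):
        result = dict(type_obj)
        if 'items' in result:
            result['items'] = desugar_type(result['items'])
        return result
    return type_obj


nested = desugar_type('File[][]')
optional = desugar_type('File[]?')
mixed = desugar_type({'type': 'array', 'items': 'File[]'})
```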

wic.utils_cwl.copy_cwl_input_output_dict(io_dict, remove_qmark=False)¶

Copies the type, format, label, and doc entries. Does NOT copy inputBinding and outputBinding.

Parameters:

io_dict (Dict) – A dictionary
remove_qmark (bool) – Determines whether to remove question marks and thus make optional types required

Returns:

A copy of the dictionary.

Return type:

Dict
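
The selective copy can be sketched as below. The name copy_io_fields is hypothetical, and the question-mark handling covers only string-valued types, which is an assumption made for brevity.

```python
from typing import Any, Dict


def copy_io_fields(io_dict: Dict[str, Any], remove_qmark: bool = False) -> Dict[str, Any]:
    """Hypothetical sketch: copy type, format, label and doc only."""
    copied = {key: io_dict[key]
              for key in ('type', 'format', 'label', 'doc')
              if key in io_dict}
    if remove_qmark and isinstance(copied.get('type'), str):
        # Dropping '?' turns an optional type such as 'File?' into 'File'.
        copied['type'] = copied['type'].replace('?', '')
    return copied


cwl_input = {
    'type': 'File?',
    'format': 'edam:format_2330',
    'inputBinding': {'position': 1},
}
required = copy_io_fields(cwl_input, remove_qmark=True)
unchanged = copy_io_fields(cwl_input)
```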

wic.utils_graphs¶

wic.utils_graphs.add_graph_edge(args, graph, nss1, nss2, label, color='')¶

Adds edges to (all of) our graph representations, with the ability to collapse all nodes below a given depth to a single node.

This function utilizes the fact that nodes have been carefully designed to have unique, hierarchical names. If we want to hide all of the details below a given depth, we can simply truncate each of the namespaces! (and do the same when creating the nodes)
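
The truncation idea can be illustrated without any graph library. In the sketch below, collapsed_edge, max_depth, and the '___' separator are hypothetical stand-ins (the real depth limit comes from args), and dropping edges whose endpoints collapse to the same node is an assumption.

```python
from typing import List, Optional, Tuple


def collapsed_edge(nss1: List[str], nss2: List[str], max_depth: int,
                   sep: str = '___') -> Optional[Tuple[str, str]]:
    """Hypothetical sketch: truncate namespaces to collapse deep nodes."""
    # Hierarchical names mean truncation maps every node below max_depth
    # onto its ancestor at that depth.
    node1 = sep.join(nss1[:max_depth])
    node2 = sep.join(nss2[:max_depth])
    if node1 == node2:
        # Assumption: an edge inside a collapsed node is not drawn.
        return None
    return (node1, node2)


deep = collapsed_edge(['root', 'setup', 'step1'], ['root', 'run', 'step2'], max_depth=3)
shallow = collapsed_edge(['root', 'setup', 'step1'], ['root', 'run', 'step2'], max_depth=2)
hidden = collapsed_edge(['root', 'setup', 'step1'], ['root', 'setup', 'step2'], max_depth=2)
```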

Parameters:

args (argparse.Namespace) – The command line arguments
graph (GraphReps) – A tuple of a GraphViz DiGraph and a networkx DiGraph
nss1 (Namespaces) – The namespaces associated with the first node
nss2 (Namespaces) – The namespaces associated with the second node
label (str) – The edge label
color (str, optional) – The edge color

Return type:

wic.utils_graphs.flatten_graphdata(graphdata, parent='')¶

Flattens graphdata by recursively inlining all subgraphs.

Parameters:

graphdata (GraphData) – A data structure which contains recursive subgraphs and other metadata.
parent (str, optional) – The name of the parent graph is encoded into the node attributes so that the subgraph information can be preserved after flattening.

Returns:

A GraphData instance with all of the recursive instances inlined

Return type:

GraphData
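
The inlining can be pictured with a small stand-in data structure. SimpleGraph below is a hypothetical class invented for this sketch and does not reflect the real fields of GraphData; the 'parent' attribute key is likewise an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class SimpleGraph:
    """Hypothetical stand-in for a graph with recursive subgraphs."""
    name: str
    nodes: List[Tuple[str, Dict[str, Any]]] = field(default_factory=list)
    edges: List[Tuple[str, str]] = field(default_factory=list)
    subgraphs: List['SimpleGraph'] = field(default_factory=list)


def flatten(graph: SimpleGraph, parent: str = '') -> SimpleGraph:
    """Inline all subgraphs, recording each node's parent graph name."""
    flat = SimpleGraph(graph.name)
    for node, attrs in graph.nodes:
        # Store the parent name so the grouping survives flattening.
        flat.nodes.append((node, {**attrs, 'parent': parent}))
    flat.edges.extend(graph.edges)
    for sub in graph.subgraphs:
        sub_flat = flatten(sub, parent=sub.name)
        flat.nodes.extend(sub_flat.nodes)
        flat.edges.extend(sub_flat.edges)
    return flat


inner = SimpleGraph('inner', nodes=[('c', {})])
root = SimpleGraph('root', nodes=[('a', {}), ('b', {})],
                   edges=[('a', 'b'), ('b', 'c')], subgraphs=[inner])
flat = flatten(root)
```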

wic.utils_graphs.graphdata_to_cytoscape(graphdata)¶

Converts a flattened graph into cytoscape json format.

Parameters:: graphdata (GraphData) – A flattened GraphData instance
Returns:: A Json object compatible with cytoscape.
Return type:: Json
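
For orientation, Cytoscape.js accepts an elements object with nodes and edges lists, where each element has a data dictionary (id for nodes; id, source, and target for edges). The sketch below builds that shape from plain lists. The function name to_cytoscape_elements and the input layout are hypothetical and do not reflect the real GraphData fields.

```python
from typing import Any, Dict, List, Tuple


def to_cytoscape_elements(nodes: List[Tuple[str, Dict[str, Any]]],
                          edges: List[Tuple[str, str]]) -> Dict[str, List[Dict[str, Any]]]:
    """Hypothetical sketch: build a Cytoscape.js elements object."""
    cy_nodes = []
    for name, attrs in nodes:
        data: Dict[str, Any] = {'id': name}
        if attrs.get('parent'):
            # Cytoscape.js reads data.parent as the id of a compound parent node.
            data['parent'] = attrs['parent']
        cy_nodes.append({'data': data})
    cy_edges = [{'data': {'id': f'{src}->{dst}', 'source': src, 'target': dst}}
                for src, dst in edges]
    return {'nodes': cy_nodes, 'edges': cy_edges}


elements = to_cytoscape_elements(
    [('inner', {}), ('a', {'parent': ''}), ('c', {'parent': 'inner'})],
    [('a', 'c')],
)
```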

wic.utils_graphs.make_tool_dag(tool_stem, tool, graph_dark_theme)¶

Uses the dot executable from the graphviz package to make a Directed Acyclic Graph corresponding to the given CWL CommandLineTool

Parameters:

tool_stem (str) – The name of the Tool
tool (Tool) – The CWL CommandLineTool
graph_dark_theme (bool) – See args.graph_dark_theme

Return type:

wic.utils_graphs.make_plugins_dag(tools, graph_dark_theme)¶

Uses the neato executable from the graphviz package to make a Directed Acyclic Graph consisting of a node for each CWL CommandLineTool and no edges.

Parameters:

tools (Tools) – The CWL CommandLineTool definitions found using get_tools_cwl()
graph_dark_theme (bool) – See args.graph_dark_theme

Return type:

wic.utils_graphs.add_subgraphs(args, graph, sibling_subgraphs, namespaces, step_1_names, steps_ranksame)¶

Add all subgraphs to the current graph, except for GraphViz subgraphs below a given depth, which allows us to hide irrelevant details.

Parameters:

args (argparse.Namespace) – The command line arguments
graph (GraphReps) – A tuple of a GraphViz DiGraph and a networkx DiGraph
sibling_subgraphs (List[Graph]) – The subgraphs of the immediate children of the current workflow
namespaces (Namespaces) – Specifies the path in the AST of the current subworkflow
step_1_names (List[str]) – The names of the first step
steps_ranksame (List[str]) – Additional node names to be aligned using ranksame

Return type: