User-defined transformers

Simple Ad-hoc Transformers

Transformers are the core feature of OntoWeaver. They are essentially a function object that ingest an atomic piece of data (e.g. a row of a table), transform it, and output a set of nodes and edges.

On top of the transformers provided by OntoWeaver, users can implement their own. This is especially useful if what you need to do with the data requires nested loops or if/then constructs, which would be less readable and maintainable if they were to be implemented with a declarative language. OntoWeaver takes good care of not providing transformers that would need a complex YAML configuration, and favors user-defined Python implementation for such cases.

To implement a transformer, you have several options, from the simplest ad-hoc function-object that will work only on your data (see the next sections), to the most generic functor that can work on any data (see below).

Simplest User-Defined Transformer

In essence, a transformer is an object that can be called like a function. At construction, it can declare the types of nodes and edges it uses, and it is called as a generator of nodes and edges.

To use such an ad-hoc transformer, you would put its name in the mapping, just like any regular one:

row:
    map: # A regular transformer.
        column: variant
        to_subject: variant
tranformers:
    - adhoc: # Our user-defined transformer.
        to_object: patient
        via_relation: patient_has_variant
        # We don't need more options, as every other thing is hard-coded.

Its minimal implementation is straightforward:

class adhoc(ontoweaver.base.Transformer):

    def __init__(self, **kwargs):
        super().__init__(**kwargs)

        # Declares two type of nodes.
        self.declare_types.make_node_class("patient")
        self.declare_types.make_node_class("variant")

        # Declare one type of edge:
        self.declare_types.make_edge_class("patient_has_variant", "patient", "variant")

    def __call__(self, row, i):
        # Extract only one (patient)--[patient_has_variant]->(variant) edge.
        # The `create` function takes care of the underlying machinery.
        # Note how we don't use the `column` option but hard-code the
        # "patient" column.
        yield self.create(row["patient"], row)

To make a user-defined transformer available from the mapping file, don’t forget to register it.

When calling from the ontoweave command, pass the path to your module file with the --register argument, for example:

ontoweave data.csv:mapping.yaml --register my_transformer.py

If you use your own weaving script, use ontoweave.transformer.register, for example:

import my_transformer
ontoweaver.transformer.register(my_transformer.adhoc)

Complete User-defined Transformer

This section shows a slightly more complex ad-hoc transformer, handling explicit options, properties and a more complex code. However, the architecture is essentially the same.

from ontoweaver import transformer, validate
from ontoweaver import types as owtypes

class MyTransformer(base.Transformer):
    """Custom end-user transformer."""

    def __init__(self, properties_of, value_maker = None, label_maker = None, branching_properties = None, columns=None, output_validator: validate.OutputValidator = None, multi_type_dict = None, raise_errors = True, **kwargs):

        super().__init__(properties_of, value_maker, label_maker, branching_properties, columns, output_validator,
                         multi_type_dict, raise_errors=raise_errors, **kwargs)

        # First declare all node and edge classes needed for your mapping. The declaration is done by using the
        # `declare_types` member variable, which is an instance of the ``ontoweaver.base.Declare`` class. Node classes are
        # declared by using the `` self.declare_types.make_node_class`` function. We first declare the name of the
        # possible source and target node classes (``my_source_node_class``, ``my_target_node_class``, ``another_node_class"``).
        # Then we extract the properties of those node classes from the `branching_properties` member variable, which is a dictionary
        # containing all the properties defined in the mapping file for each node and edge class (``self.branching_properties.get("my_source_node_class", {})``).

        self.declare_types.make_node_class("my_source_node_class", self.branching_properties.get("my_source_node_class", {}))
        self.declare_types.make_node_class("my_target_node_class", self.branching_properties.get("my_target_node_class", {}))
        self.declare_types.make_node_class("another_node_class", self.branching_properties.get("another_node_class", {}))

        # Edge classes are declared by using the `` self.declare_types.make_edge_class`` function. Again, we declare the
        # name of the edge class (``my_edge_class``) and the source and target node classes it connects. These are
        # retrieved by using the ``getattr`` function on the ``types`` module, which contains all the declared types in the ontology, as
        # well as the node classes we just declared above (``getattr(owtypes, "my_source_node_class")``) .
        # Finally, we extract the properties of the edge class from the ``branching_properties`` member variable
        # (``self.branching_properties.get("my_edge_class", {})``)

        self.declare_types.make_edge_class("my_edge_class", getattr(owtypes, "my_source_node_class"), getattr(owtypes, "my_target_node_class"), self.branching_properties.get("my_edge_class", {}))


    def __call__(self, row, i):

        # Initialize final type and properties_of member variables to ``None`` for each row processed. This is beacuase
        # the final type and properties may change depending on the values extracted from the current row.

        self.final_type = None
        self.properties_of = None

        # Extract branching information from the current row, as well as node ID. We branch based on the values of the
        # ``type`` and ``entity_type_target`` columns.

        node_id = row["target"]
        relationship_type = row["type"]
        entity = row["entity_type_target"]

        # Create branching logic and return correct elements. Elements are returned by using the ``yield`` statement,
        # which yields a tuple containing the node ID, edge type, target node type, and reverse edge type (if any).
        # At each step we can additionally set the ``final_type`` (See ``How to`` section for more details on ``final_type``) and
        # ``properties_of`` member variables, which will be used to extract properties for the current node.

        if relationship_type == "my_relationship_type":
            if entity == "my_entity_type":
                self.final_type = # Possible to set final type if feature is needed.
                self.properties_of = self.branching_properties.get("my_target_node_class", {})
                # Here, we don't use `create` but we are explicit.
                yield node_id, self.declare_types.get_edge_class("my_edge_class"), self.declare_types.get_node_class("my_target_node_class"), None

            else:  ...

        else: ...

An example of such a transformer-like function is provided in tests/custom_transformer/custom.py.

Once your transformer class is implemented, you should make it available to the ontoweaver module which will process the mapping:

ontoweaver.transformer.register(my_transformer)