How To

How to Add Properties to Nodes and Edges

If you do not need to create a new node, but simply attach some data to an existing node, use the to_property predicate, for example:

row:
   rowIndex:
      to_subject: phenotype
transformers:
    - map:
        column: patient
        to_object: case
        via_relation: case_to_phenotype
    - map:
        column: age
        to_property: patient_age
        for_object: case

This will add a “patient_age” property to nodes of type “case”.

Note

Note that you can add the same property value to several property fields of several node types:

- map:
    column: age
    to_properties:
        - patient_age
        - age_patient
    for_object:
        - case
        - phenotype

Note

Note that the properties declared in the BioCypher schema_config.yaml must match the properties declared in the mapping configuration file. Furthermore, when declaring the properties in the schema configuration file, take care that the property must always be a string (str) type - in order to avoid errors when importing the data into the Neo4j graph database.

How to Extract Additional Edges

Edges can be extracted from the mapping configuration, by defining a from_subject and to_object in the mapping configuration, where the from_subject is the node type from which the edge will start, and the to_object is the node type to which the edge will end.

For example, for the sample dataset below:

id	patient	sample
0	patient1	sample1
1	patient2	sample2
2	patient3	sample3
3	patient4	sample4

Consider the following mapping configuration:

row:
    map:
        column: id
        to_subject: variant
transformers:
    - map:
          column: patient
          to_object: patient
          via_relation: patient_has_variant
    - map:
          column: sample
          to_object: sample
          via_relation: variant_in_sample

If the user would like to extract an additional edge from the node type patient to the node type sample, they would need to add the following section to the transformers in the mapping configuration:

- map:
    column: patient
    from_subject: sample
    to_object: patient
    via_relation: sample_to_patient

How to add the same metadata properties to all nodes and edges

Metadata can be added to nodes and edges by defining a metadata section in the mapping configuration. You can specify all the property keys and values that you wish to add to your nodes and edges in a metadata section. For example:

metadata:
        - name: oncokb
        - url: https://oncokb.org/
        - license: CC BY-NC 4.0
        - version: 0.1

The metadata defined in the metadata section will be added to all nodes and edges created during the mapping process.

How to add the column of origin as a property to all nodes

In addition to the user-defined metadata, a property field add_source_column_names_as is also available. It allows to indicate the column name in which the data was found, as a property to each node.

Note

Note that this is not added to edges, because they are not mapped from a column per se.

For example, if the label of a node is extracted from the “indication” column, and you indicate add_source_column_name_as: source_column, the node will have a property: source_column: indication.

This can be added to the metadata section as follows:

metadata:
    - name: oncokb
    - url: https://oncokb.org/
    - license: CC BY-NC 4.0
    - version: 0.1
    - add_source_column_names_as: sources

Now each of the nodes contains a property sources that contains the names of the source columns from which it was extracted. Be sure to include all the added node properties in the schema configuration file, to ensure that the properties are correctly added to the nodes.

How to create user-defined adapters

You may manually define your own adapter class, inheriting from the OntoWeaver’s class that manages tabular mappings.

For example:

class MYADAPTER(ontoweaver.tabular.PandasAdapter):

    def __init__(self,
        df: pd.DataFrame,
        config: dict,
        type_affix: Optional[ontoweaver.tabular.TypeAffixes] = ontoweaver.tabular.TypeAffixes.prefix,
        type_affix_sep: Optional[str] = "//",
    ):
        # Default mapping as a simple config.
        from . import types
        parser = ontoweaver.tabular.YamlParser(config, types)
        mapping = parser()

        super().__init__(
            df,
            *mapping,
        )

When manually defining adapter classes, be sure to define the affix type and separator you wish to use in the mapping. Unless otherwise defined, the affix type defaults to suffix, and the separator defaults to :. In the example above, the affix type is defined as prefix and the separator is defined as //. If you wish to define affix as none, you should use type_affix: Optional[ontoweaver.tabular.TypeAffixes] = ontoweaver.tabular.TypeAffixes.none, and if you wish to define affix type as suffix, use type_affix: Optional[ontoweaver.tabular.TypeAffixes] = ontoweaver.tabular.TypeAffixes.suffix.

How to access dynamic Node and Edge Types

OntoWeaver relies a lot on meta-programming, as it actually creates Python types while parsing the mapping configuration. By default, those classes are dynamically created into the ontoweaver.types module.

You may manually define your own types, derivating from ontoweaver.base.Node or ontoweaver.base.Edge.

The ontoweaver.types module automatically gathers the list of available types in the ontoweaver.types.all submodule. This allows accessing the list of node and edge types:

node_types  = types.all.nodes()
edge_types  = types.all.edges()

How to map properties on several nodes of the same type

In some cases there might be a need to filter properties of the same ontological type. For example, if you have a table of proteins defining sources and targets of interactions, and you want to have the uniProt IDs as a property of these nodes:

SOURCE	TARGET	UNIPROT_ID_SOURCE	UNIPROT_ID_TARGET
A	B	uniprot_id_A	uniprot_id_B
C	A	uniprot_id_C	uniprot_id_A

In a conventional way of mapping, you would map the SOURCE column to the node type protein and the TARGET column to the node type protein.

By default, OntoWeaver will attach properties to all nodes of the same type. The UNIPROT_ID_SOURCE and UNIPROT_ID_TARGET columns would hence be mapped as properties to the type protein.

However, you might want to map the properties of the protein nodes either on the source or the target, but not both. In this case you can use the final_type keyword in the mapping configuration. The final_type keyword allows you to define a final node type to which the node will be converted, at the very end of the mapping process.

In a nutshell: you map the source node to a temporary protein_source and map properties to it. You map the target node to a temporary protein_target and map properties to it. You also set the final_type: protein , so that, after having mapped all properties, OntoWeaver will change the node type from the temporary protein_source and protein_target to the final protein. Thus, you can attach different properties to different nodes of the same type.

For example:

row:
    map:
        column: SOURCE
        to_subject: protein_source # Temporary type.
        final_type: protein # The final type of the node.

transformers:
    - map:
        column: TARGET
        to_object: protein_target # Temporary type.
        via_relation: protein_protein_interaction
        final_type: protein # The final type of the node.

    # Properties of for the node type 'source'
    - map:
        column: UNIPROT_ID_SOURCE
        to_property: uniprot_id # Give name of the property.
        for_object: protein_source # Temporary node type to which the property will be linked.
    # Properties of for the node type 'target'
    - map:
        column: UNIPROT_ID_TARGET
        to_property: uniprot_id
        for_object: protein_target # Temporary node type to which the property will be linked.

Note

Notice how in this way, we avoid mapping the source properties to the target node types, and instead map then to the source node type. We also avoid mapping the target properties to the source node types, and instead map them to the target node type.

The mapping thus results in the creation of three nodes: A, B, and C, all having the type protein, and the property uniprot_id.

Note

Note that node A have now been instantiated twice, with different properties attached to each instance. However, the expected result would be to have a single instance, with all the properties combined. To solve this kind of issue, OntoWeaver provides a “reconciliation” feature, that can be called after the mapping, onto the list of nodes. For more information see the Information Fusion section.

An edge of type protein_protein_interaction, will be created from node A to node B, as well as from node C to node A.

How to Extract Reverse Relations For Declared Edges

Reverse relations can be extracted for each edge in a declarative manner. Let’s assume you have a mapping file mapping each row index to the node type disease, and each cell value from the patient column to the node type patient. The two nodes are connected via a relation disease_affects_patient, but you would also wish to indicate a reverse edge of type patient_has_disease.

This can be done by using the reverse_relation keyword, which extracts the reverse edge of the type you declared. You may consult the Keyword Synonyms section for more synonyms.

row:
   rowIndex:
      to_subject: disease
transformers:
    - map:
        column: patient
        to_object: patient
        via_relation: disease_affects_patient
        reverse_relation: patient_has_disease

How to Compose Multiple Transformers

Custom transformers (See the User-defined Transformers and User-defined Transformer-Like Functions sections) can be configured to compose multiple transformers together. This is useful when you want to apply a series of transformations to your data.

For example, let’s look again at the example provided in the User-defined Transformer-Like Functions section.

Below we declare a custom transformer MyTransformer, which branches based on the values of the type and entity_type_target columns.

What if, instead of simply returning the extracted node_id, edge_type, target_node_type, and reverse_relation, we wanted to apply a concatenation transformer (See cat) to the node_id before yielding it?

In this case we can instantiate a concatenation transformer inside our custom transformer, and apply it to the node_id with the desired columns which values are to be concatenated, and later yield the results of this concatenation.

The example below follows the exact logic of the MyTransformer class we created in the User-defined Transformer-Like Functions section, but instantiates a concatenation transformer in the __init__ method. This cat transformer is then called in the __call__ method to concatenate the values of the desired columns before yielding the results.

from ontoweaver import transformer, validate
from ontoweaver import types as owtypes

class MyTransformer(base.Transformer):
    """Custom end-user transformer."""

    def __init__(self, properties_of, value_maker = None, label_maker = None, branching_properties = None, columns=None, output_validator: validate.OutputValidator = None, multi_type_dict = None, raise_errors = True, **kwargs):

        super().__init__(properties_of, value_maker, label_maker, branching_properties, columns, output_validator,
                         multi_type_dict, raise_errors=raise_errors, **kwargs)

        # First declare all node and edge classes needed for your mapping. The declaration is done by using the
        # `declare_types` member variable, which is an instance of the ``ontoweaver.base.Declare`` class. Node classes are
        # declared by using the `` self.declare_types.make_node_class`` function. We first declare the name of the
        # possible source and target node classes (``my_source_node_class``, ``my_target_node_class``, ``another_node_class"``).
        # Then we extract the properties of those node classes from the `branching_properties` member variable, which is a dictionary
        # containing all the properties defined in the mapping file for each node and edge class (``self.branching_properties.get("my_source_node_class", {})``).

        self.declare_types.make_node_class("my_source_node_class", self.branching_properties.get("my_source_node_class", {}))
        self.declare_types.make_node_class("my_target_node_class", self.branching_properties.get("my_target_node_class", {}))
        self.declare_types.make_node_class("another_node_class", self.branching_properties.get("another_node_class", {}))

        # Edge classes are declared by using the `` self.declare_types.make_edge_class`` function. Again, we declare the
        # name of the edge class (``my_edge_class``) and the source and target node classes it connects. These are
        # retrieved by using the ``getattr`` function on the ``types`` module, which contains all the declared types in the ontology, as
        # well as the node classes we just declared above (``getattr(owtypes, "my_source_node_class")``) .
        # Finally, we extract the properties of the edge class from the ``branching_properties`` member variable
        # (``self.branching_properties.get("my_edge_class", {})``)

        self.declare_types.make_edge_class("my_edge_class", getattr(owtypes, "my_source_node_class"), getattr(owtypes, "my_target_node_class"), self.branching_properties.get("my_edge_class", {}))

        # We instantiate a cat transformer to concatenate the columns ``column1`` and ``column2``. We pass the properties of the
        # target node class ``my_target_node_class``. We also define a multi_type_dict to indicate the possible types of the target node class.
        # We instantiate a ``multi_type_dict`` which holds the information about possible branching needed for the
        # types created by the transformer. Since the branching logic is already handled by our custom made transformer, we only need to
        # define a single entry in the ``multi_type_dict``. In this case we define a single entry for the key ``None``
        # (indicating no branching is needed), along with the corresponding ``to_object``, ``via_relation``, ``final_type``, and ``reverse_relation``
        # values. Finally, we use a ``SimpleLabelMaker`` to create labels for the concatenated nodes.

        self.cat = transformer.cat(columns=["target", "entity_type_target"],
                                    properties_of=self.branching_properties.get("my_target_node_class", {}),
                                    multi_type_dict={"None" : {"to_object": getattr(owtypes, "my_target_node_class"),
                                                               "via_relation" : getattr(owtypes, "my_edge_class"),
                                                               "final_type": None,
                                                               "reverse_relation": None}},
                                    label_maker=make_labels.SimpleLabelMaker())

    def __call__(self, row, i):

        # Initialize final type and properties_of member variables to ``None`` for each row processed. This is beacuase
        # the final type and properties may change depending on the values extracted from the current row.

        self.final_type = None
        self.properties_of = None

        # Extract branching information from the current row, as well as node ID. We branch based on the values of the
        # ``type`` and ``entity_type_target`` columns.

        node_id = row["target"]
        relationship_type = row["type"]
        entity = row["entity_type_target"]

        # Create branching logic and return correct elements. Elements are returned by using the ``yield`` statement,
        # which yields a tuple containing the node ID, edge type, target node type, and reverse edge type (if any).
        # At each step we can additionally set the ``final_type`` (See ``How to`` section for more details on ``final_type``) and
        # ``properties_of`` member variables, which will be used to extract properties for the current node.

        if relationship_type == "my_relationship_type":
            if entity == "my_entity_type":
                self.properties_of = self.branching_properties.get("my_target_node_class", {})
                # We call the ``cat`` transformer to concatenate the desired columns before yielding the results.
                for node_id, edge_type, target_type, reverse_relation in self.cat(row, i):
                    yield node_id, edge_type, target_type, reverse_relation

            else:  ...

        else: ...

How to Declare Properties On-the-Fly

Similarly as with the composition of transformers and their declaration on-the-fly within custom transformers, property transformers can also be declared on-the-fly within custom transformers.

For example, let’s use the same custom transformer MyTransformer from the previous sections (See the User-defined Transformers and User-defined Transformer-Like Functions sections) .

Let’s say you have two columns in your database called property_key and property_value, and you want to map these columns as properties to the target node type.

You can declare a property transformer on-the-fly within your custom transformer, in the properties_of member variable.

from ontoweaver import transformer, validate
from ontoweaver import types as owtypes

class MyTransformer(base.Transformer):
    """Custom end-user transformer."""

    def __init__(self, properties_of, value_maker = None, label_maker = None, branching_properties = None, columns=None, output_validator: validate.OutputValidator = None, multi_type_dict = None, raise_errors = True, **kwargs):

        super().__init__(properties_of, value_maker, label_maker, branching_properties, columns, output_validator,
                         multi_type_dict, raise_errors=raise_errors, **kwargs)

        # First declare all node and edge classes needed for your mapping. The declaration is done by using the
        # `declare_types` member variable, which is an instance of the ``ontoweaver.base.Declare`` class. Node classes are
        # declared by using the `` self.declare_types.make_node_class`` function. We first declare the name of the
        # possible source and target node classes (``my_source_node_class``, ``my_target_node_class``, ``another_node_class"``).
        # Then we extract the properties of those node classes from the `branching_properties` member variable, which is a dictionary
        # containing all the properties defined in the mapping file for each node and edge class (``self.branching_properties.get("my_source_node_class", {})``).

        self.declare_types.make_node_class("my_source_node_class", self.branching_properties.get("my_source_node_class", {}))
        self.declare_types.make_node_class("my_target_node_class", self.branching_properties.get("my_target_node_class", {}))
        self.declare_types.make_node_class("another_node_class", self.branching_properties.get("another_node_class", {}))

        # Edge classes are declared by using the `` self.declare_types.make_edge_class`` function. Again, we declare the
        # name of the edge class (``my_edge_class``) and the source and target node classes it connects. These are
        # retrieved by using the ``getattr`` function on the ``types`` module, which contains all the declared types in the ontology, as
        # well as the node classes we just declared above (``getattr(owtypes, "my_source_node_class")``) .
        # Finally, we extract the properties of the edge class from the ``branching_properties`` member variable
        # (``self.branching_properties.get("my_edge_class", {})``)

        self.declare_types.make_edge_class("my_edge_class", getattr(owtypes, "my_source_node_class"), getattr(owtypes, "my_target_node_class"), self.branching_properties.get("my_edge_class", {}))

    def __call__(self, row, i):

        # Initialize final type and properties_of member variables to ``None`` for each row processed. This is beacuase
        # the final type and properties may change depending on the values extracted from the current row.

        self.final_type = None
        self.properties_of = None

        # Extract branching information from the current row, as well as node ID. We branch based on the values of the
        # ``type`` and ``entity_type_target`` columns.

        node_id = row["target"]
        relationship_type = row["type"]
        entity = row["entity_type_target"]

        # Here we extract the value of the property key column.
        property_key = row["property_key"]

        # Create branching logic and return correct elements. Elements are returned by using the ``yield`` statement,
        # which yields a tuple containing the node ID, edge type, target node type, and reverse edge type (if any).
        # At each step we can additionally set the ``final_type`` (See ``How to`` section for more details on ``final_type``) and
        # ``properties_of`` member variables, which will be used to extract properties for the current node.

        if relationship_type == "my_relationship_type":
            if entity == "my_entity_type":
                self.final_type = # Possible to set final type if feature is needed.
                # We then declare a property transformer on-the-fly within the ``properties_of`` member variable.
                # setting ``property_key`` as the property name, and the  ``property_value`` column as the property value
                # to be extracted.
                self.properties_of = {transformer.map(columns="property_value", properties_of=None, label_maker=make_labels.SimpleLabelMaker()): property_key}
                yield node_id, getattr(owtypes, "my_edge_class"), getattr(owtypes, "my_target_node_class"), None

            else:  ...

        else: ...

In case of using this feature, remember to include all the dynamically created properties in the schema configuration file of BioCypher.

How to load multiple Parquet files?

In some specific cases, you may want to load several data files at once, and merge them in a single table before mapping the data. For instance, “parquet” files often come as a set of files.

To do so, you can use the “globbing” syntax that you may know from your command line shell.

For instance, if you want to select all the files ending with the .parquet extension in the my_dir directory:

ontoweave 'my_dir/*.parquet:my_mapping.yaml'

Warning

You have to encompass the globbing file syntax with quotes, or else your shell is going to expand it for you in a list of files, which is not supported by ontoweave. However, OntoWeaver is going to expand the common syntax by itself. You can thus safely use the common globbing syntax.

How to access several keys in nested dictionaries?

The get transformer allows you to access a value located in nested key-stores. But it can only access one value.

If you want to access several different keys in the same cell, then you will have to call the get transformer again, with the same first key, but with different sequence of keys.

For instance, if you have this data table:

LINE	WORDS
0	{“en”: “good”, “fr”: “ça va”}
1	{“en”: “awesome”, “fr”: “pas mal”}

Then, you will want to access first the column named “WORDS”, and the key named “en” in the nested JSON object.

To do so with get, you need to indicate the sequence of keys, in the order of the nesting. For instance:

transformers:
    - get:
        keys:
            - WORDS
            - en
        to_object: word  # The usual.
        via_relation: has_en_translation
    - get:
        keys:
            - WORDS
            - fr
        to_object: word  # The usual.
        via_relation: has_fr_translation