YARRRML

Unofficial Draft

Editors:
Ben De Meester, imec — Ghent University — IDLab,
Pieter Heyvaert, imec — Ghent University — IDLab,

Abstract

YARRRML is a human-readable, text-based representation for declarative generation rules. It is a subset of [YAML], a widely used data serialization language designed to be human-friendly.

Status of This Document

This document is a draft of a potential specification. It has no official standing of any kind and does not represent the support or consensus of any standards organisation.

1. Terminology

subject
the subject of an RDF triple
predicate
the predicate of an RDF triple
object
the object of an RDF triple
class
the type of an entity
datatype
the type of a literal value
reference
a pointer to a value in a record of a data source, written between $( and ) inside a template
function
a programmatic function that takes 0 or more input parameters and returns a single result

2. Profiles

This specification includes the following main profiles for tools that process YARRRML documents:

These profiles can be extended with additional profiles, which cannot be used on their own:

3. Base IRI

A base IRI can be defined. This IRI will be used for the creation of the IRIs of the term maps and sources of the [R2]RML rules.

A base IRI can be added by adding the key base, with the IRI as its value, to the root of the document. In the following example the base IRI is set to http://mybaseiri.com#.

Example 1: base IRI
base: http://mybaseiri.com#

4. Prefixes and namespaces

A set of prefixes and namespaces is predefined by default in every YARRRML document. They are the same as the predefined prefixes for RDFa.

Custom prefixes can be added by adding the collection prefixes to the root of the document. Each combination of a prefix and namespace is added to this collection as a key-value pair. In the following example two prefixes are defined: ex and test.

Example 2: custom prefixes
prefixes:
  ex: http://www.example.com/
  test: http://www.test.com/
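
As a hedged illustration of how a custom prefix might be used, the sketch below declares ex and uses it in a predicate. The namespace here ends in a slash, so ex:nickname would expand to http://www.example.com/nickname. The property ex:nickname and the field nickname are hypothetical and only serve the illustration.

```yaml
prefixes:
  ex: http://www.example.com/   # assumed namespace, ending in "/" so prefixed names expand cleanly

mappings:
  person:
    predicateobjects:
      # ex:nickname is a hypothetical property; nickname is a hypothetical field
      - [ex:nickname, $(nickname)]
```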

5. Template

A template contains 0 or more constant strings and 0 or more references to data in a data source. References are prefixed with $( and suffixed with ). For example, foo is a template with one constant string. $(id) is a template with one reference to id. foo-$(id) is a template with one constant string foo- and one reference to id.
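
A minimal sketch of how templates might appear inside a mapping, assuming records with the fields id, firstname, and lastname (the field names and the example.com IRI are illustrative only). The subject template has one constant string followed by one reference; the object template combines two references with a constant space between them.

```yaml
mappings:
  person:
    # constant string "http://example.com/person/" plus a reference to id
    subjects: http://example.com/person/$(id)
    predicateobjects:
      - predicates: foaf:name
        # two references separated by a constant space
        objects: $(firstname) $(lastname)
```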

6. Data sources

Data sources can be defined in the sources collection in the root of the document. Each source is added to this collection via a key-value pair. The key has to be unique for every source. The value is a collection with three keys: access (required), referenceFormulation (required), and iterator (conditionally required).

6.1 Keys

access
the local or remote location of the data source
driver (required profiles: RML, D2RQ)
the driver required to connect to the data source
referenceFormulation (required profiles: RML)
the reference formulation used to access the data source
iterator (required profiles: RML)
the path to the different records over which to iterate
delimiter (required profiles: RML, CSVW)
the delimiter to separate fields in a record
encoding (required profiles: RML, CSVW)
the encoding used for the data
query (required profiles: RML, R2RML)
the query to execute on the data source to select only a part of the data
queryFormulation (required profiles: RML, R2RML)
the query formulation used to query the data source
credentials (required profiles: RML, D2RQ)
the credentials required to access the data source
username (required profiles: RML, D2RQ)
the username required to access the data source
password (required profiles: RML, D2RQ)
the password required to access the data source
Note

To be precise, the iterator contains a "query" as a string, interpreted according to the specification of the referenceFormulation. E.g., when using referenceFormulation xpath, the iterator is interpreted as an XPath expression. When using referenceFormulation sparql, the iterator is interpreted as a SPARQL query. The same holds for queryFormulation and query. However, the query is executed when retrieving the data and yields a so-called "view" of the data, whereas the iterator is executed on top of this view.
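
The keys listed in this section can be combined in one source description. The following is only a sketch for a CSV file: the file name, the semicolon delimiter, and the encoding label utf-8 are assumptions chosen for illustration, and the exact values a given processor accepts are not defined here.

```yaml
sources:
  person-source:
    access: data/person.csv        # hypothetical file
    referenceFormulation: csv
    delimiter: ";"                 # assumed delimiter, quoted to keep it a string
    encoding: utf-8                # assumed encoding label
```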

6.2 Reference formulations

The following reference formulations are supported for these data formats.
CSV (required profiles: RML)
The reference formulation csv is used for data sources in the CSV format. No iterator is required. Every row of a CSV data source is considered as a record.
JSON (required profiles: RML)
The reference formulation jsonpath is used for data sources that can be accessed via JSONPath expressions. An iterator is required.
XML (required profiles: RML)
The reference formulation xpath is used for data sources that can be accessed via XPath expressions. An iterator is required. The reference formulation xquery is used for data sources that can be accessed via XQuery queries. No iterator is required. Every row of the result of the query is considered as a record.
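
To make the difference between the formulations concrete, the sketch below defines one CSV source without an iterator and one XML source with an XPath iterator. The source keys, file names, and the XPath expression /persons/person are hypothetical.

```yaml
sources:
  people-csv:
    access: data/person.csv      # hypothetical file
    referenceFormulation: csv    # no iterator: every row is a record
  people-xml:
    access: data/person.xml      # hypothetical file
    referenceFormulation: xpath
    iterator: /persons/person    # assumed XPath selecting one node per record
```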

6.3 Query formulations

The following query formulations are supported for these query languages.
SQL
The following query formulations can be used for data sources that can be queried via a SQL query:
  • sql2008
  • oracle
  • mysql
  • mssql
  • hsql
  • postgresql
  • db2
  • informix
  • ingres
  • progress
  • sybasease
  • sybasesqlanywhere
  • virtuoso
  • firebird
This list is based on version IRIs identified by RDB2RDF.
SPARQL (required profiles: RML, SD)
The query formulation sparql11 is used for data sources that can be queried via a SPARQL 1.1 query.

6.4 Examples

In the following example a single data source, person-source, is defined.

Example 3: one data source
sources:
  person-source:
    access: data/person.json
    referenceFormulation: jsonpath
    iterator: $

A shortcut version of this example looks as follows.

Example 5: one data source using shortcuts
sources:
  person-source: [data/person.json~jsonpath, $]

The collection is replaced with an array: the first element contains the value for access, followed by ~ and the value for referenceFormulation, and the second element contains the iterator.
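
For a reference formulation that needs no iterator, such as csv, the array would presumably contain only its first element. This is an assumption derived from the description of the shortcut, not a rule stated in this document, and the file name is hypothetical.

```yaml
sources:
  # access and referenceFormulation joined by "~"; no second element,
  # on the assumption that csv needs no iterator
  person-source: [data/person.csv~csv]
```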

The following mapping accesses a SQL database and selects the required data via a query.

Example 7: mapping with database as source
mappings:
  person:
    sources:
      access: http://localhost/example
      type: mysql
      credentials:
        username: root
        password: root
      queryFormulation: sql2008
      query: |
        SELECT DEPTNO, DNAME, LOC,
        (SELECT COUNT(*) FROM EMP WHERE EMP.DEPTNO=DEPT.DEPTNO) AS STAFF
        FROM DEPT;
      referenceFormulation: csv

7. Mappings

The mappings collection contains all the mappings of the document. Each mapping is added to this collection via a key-value pair. The key is unique for each mapping. The value is a collection containing rules to generate the subjects, predicates, and objects. In the following example two mappings are defined: person and project.

Example 9: two mappings
mappings:
  person: ...
  project: ...

7.1 Data sources

Besides defining data sources at the root of the document, data sources can also be defined inside a mapping via the collection sources. However, no unique key is specified for such a source, and thus it cannot be referred to from other mappings. The key-value pairs of a source are the same as when defining sources at the root of the document. In the following example the mapping person has one source.

Example 12: mapping with one data source
mappings:
  person:
    sources:
      access: data/person.json
      referenceFormulation: jsonpath
      iterator: $
A shortcut version of this example looks as follows.
Example 14: mapping with one data source using shortcuts
mappings:
  person:
    sources: [data/person.json~jsonpath, $]
In case a mapping needs to be applied to multiple data sources, multiple sources can be added to the sources collection. In the following example the person mapping has two data sources.
Example 16: mapping with two data sources
mappings:
  person:
    sources:
      - access: data/person.json
        referenceFormulation: jsonpath
        iterator: $
      - access: data/person2.json
        referenceFormulation: jsonpath
        iterator: "$.persons[*]"
A shortcut version of this example looks as follows.
Example 18: mapping with two data sources using shortcuts
mappings:
  person:
    sources:
      - [data/person.json~jsonpath, $]
      - [data/person2.json~jsonpath, "$.persons[*]"]
A data source that is defined at the root of the document can be included in a mapping via its unique key.
Example 20: mapping with one data source
sources:
  person-source:
    access: data/person.json
    referenceFormulation: jsonpath
    iterator: $

mappings:
  person:
    sources: person-source
Multiple sources can be referenced by using an array of source keys as the value of the key sources.
Example 22: mapping with two data sources
sources:
  person-source:
    access: data/person.json
    referenceFormulation: jsonpath
    iterator: $
  person-source2:
    access: data/person2.json
    referenceFormulation: jsonpath
    iterator: "$.persons[*]"

mappings:
  person:
    sources: [person-source, person-source2]
Sources defined outside and inside a mapping can be combined.
Example 24: mapping with two data sources
sources:
  person-source:
    access: data/person.json
    referenceFormulation: jsonpath
    iterator: $

mappings:
  person:
    sources:
     - person-source
     - access: data/person2.json
       referenceFormulation: jsonpath
       iterator: "$.persons[*]"

7.2 Subjects

For every triple, it must be defined whether the subject is an IRI or a blank node. This information is added to a mapping via the collection subjects. In the case of an IRI, this collection contains 0 or more templates that specify the IRI. In the case of a blank node, this collection is set to null or is not specified at all. In the following example the mapping person generates IRIs for the subjects based on the template http://www.example.com/person/$(id).

Example 26: mapping with one subject
mappings:
  person:
    subjects: http://www.example.com/person/$(id)
It is also possible to specify multiple subjects. In this case an array of templates is used. In the following example the mapping person generates subjects based on the templates http://www.example.com/person/$(id) and http://www.test.com/$(firstname).
Example 28: mapping with two subjects
mappings:
  person:
    subjects: [http://www.example.com/person/$(id), http://www.test.com/$(firstname)]
It is possible to apply functions on subjects.
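
A hedged sketch of what a function on a subject could look like, assuming that subjects accept the same function and parameters structure that Section 8 shows for objects. The function ex:toLowerCase and its parameter ex:input are the illustrative names used in Section 8, not functions defined by this document.

```yaml
mappings:
  person:
    subjects:
      # assumption: same function/parameters layout as for objects
      - function: ex:toLowerCase
        parameters:
          - [ex:input, http://www.example.com/person/$(id)]
```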

7.3 Predicates and objects

In the following example the mapping person generates combinations of predicates and objects, where the predicate is foaf:firstName and the object is the firstname of each person.
Example 30: mapping with one predicate and object
mappings:
  person:
    predicateobjects:
      - predicates: foaf:firstName
        objects: $(firstname)
A shortcut version of this example looks as follows.
Example 32: mapping with one predicate and object using shortcuts
mappings:
  person:
    predicateobjects:
      - [foaf:firstName, $(firstname)]
It is possible to specify multiple predicates and objects. In this case an array of templates is used.
Example 34: mapping with two predicates and objects
mappings:
  person:
    predicateobjects:
      - predicates: [foaf:name, rdfs:label]
        objects: [$(firstname), $(lastname)]
A shortcut version of this example looks as follows.
Example 36: mapping with two predicates and objects using shortcuts
mappings:
  person:
    predicateobjects:
      - [[foaf:name, rdfs:label], [$(firstname), $(lastname)]]
An object can be generated as an IRI by setting the key type to iri.
Example 38: mapping with object that generates an IRI
mappings:
  person:
    predicateobjects:
      - predicates: foaf:knows
        objects:
          value: $(colleague)
          type: iri
A shortcut version of this example looks as follows.
Example 40: mapping with object that generates an IRI using shortcuts
mappings:
  person:
    predicateobjects:
      - [foaf:knows, $(colleague)~iri]
The inverse predicate can also be added. This is only valid when the object is an IRI or a blank node.
Example 42: mapping with one inverse predicate
mappings:
  work:
    predicateobjects:
      - predicates: ex:createdBy
        inversepredicates: ex:created
        objects:
          value: $(foafprofile)
          type: iri

7.4 Datatypes

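The datatype of an object is specified via the key datatype. In the following example the objects are given the datatype xsd:string.
Example 43: mapping with one datatype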
mappings:
  person:
    predicateobjects:
      - predicates: foaf:firstName
        objects:
          value: $(firstname)
          datatype: xsd:string
A shortcut version of this example looks as follows.
Example 45: mapping with one datatype using shortcuts
mappings:
  person:
    predicateobjects:
      - [foaf:firstName, $(firstname), xsd:string]
When multiple objects are specified, each object can have its own datatype.
Example 47: mapping with two datatypes
mappings:
  person:
    predicateobjects:
      - predicates: foaf:name
        objects:
          - value: $(firstname)
            datatype: ex:string
          - value: $(lastname)
            datatype: ex:anotherString
A shortcut version of this example looks as follows.
Example 49: mapping with two datatypes using shortcuts
mappings:
  person:
    predicateobjects:
      - predicates: foaf:name
        objects: [[$(firstname), ex:string], [$(lastname), ex:anotherString]]

7.5 Languages

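The language of an object is specified via the key language. In the following example the objects are given the language tag en.
Example 51: mapping with one language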
mappings:
  person:
    predicateobjects:
      - predicates: foaf:firstName
        objects:
          value: $(firstname)
          language: en
A shortcut version of this example looks as follows.
Example 53: mapping with one language using shortcuts
mappings:
  person:
    predicateobjects:
      - [foaf:firstName, $(firstname), en~lang]
When multiple objects are specified, each object can have its own language.
Example 55: mapping with two languages
mappings:
  person:
    predicateobjects:
      - predicates: foaf:name
        objects:
          - value: $(firstname)
            language: en
          - value: $(lastname)
            language: nl
A shortcut version of this example looks as follows.
Example 57: mapping with two languages using shortcuts
mappings:
  person:
    predicateobjects:
      - predicates: foaf:name
        objects: [[$(firstname), en~lang], [$(lastname), nl~lang]]

7.6 Referring to other mappings

In certain use cases, triples need to be generated between the records of two mappings. The following example has two mappings: person and project. In the existing data, every person has a field projectID that refers to the project on which the person is working, so we want to generate triples between every person and their project. The objects collection has an object with the key mapping. The value of this key refers to the mapping that provides the IRIs that serve as objects for the predicate-object combination. Furthermore, a condition is added so that persons and projects are only linked when they are actually related, based on the projectID of the person and the ID of the project. Note that a condition is not required; without it, this example would result in relationships between every person and every project.

Example 59: interlinking two mappings
mappings:
  person:
    subjects: http://example.com/person/$(ID)
    predicateobjects:
      - predicates: foaf:worksFor
        objects:
        - mapping: project
          condition:
            function: equal
            parameters:
              - [str1, $(projectID)]
              - [str2, $(ID)]
  project:
    subjects: http://example.com/project/$(ID)
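
Since the text notes that a condition is not required, the same interlinking can be sketched without one. Under that reading, every person would be linked to every project; the mapping and field names are those of the example above.

```yaml
mappings:
  person:
    subjects: http://example.com/person/$(ID)
    predicateobjects:
      - predicates: foaf:worksFor
        objects:
          # no condition: each person is related to each project
          - mapping: project
  project:
    subjects: http://example.com/project/$(ID)
```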

7.7 Graphs

7.7.1 All triples

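A graph can be specified for all triples generated by a mapping by adding the key graphs at the level of the mapping.
Example 61: mapping with graph for all triples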
mappings:
  person:
    graphs: ex:myGraph

7.7.2 All triples with a specific predicate and object

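A graph can be specified for only the triples of a specific predicate-object combination by adding the key graphs to that combination.
Example 63: mapping with graph for all triples with a specific predicate and object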
mappings:
  person:
    predicateobjects:
     - predicates: foaf:firstName
       objects: $(firstname)
       graphs: ex:myGraph
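
The keys described throughout this section can appear together in one document. The following sketch combines only constructs shown earlier (a root-level source referenced by key, a subject template, a mapping-level graph, a language-tagged object, and an IRI object). The ex namespace and the field names id, firstname, and colleague are assumptions made for the illustration.

```yaml
prefixes:
  ex: http://www.example.com/          # assumed namespace

sources:
  person-source:
    access: data/person.json
    referenceFormulation: jsonpath
    iterator: $

mappings:
  person:
    sources: person-source             # refers to the root-level source by key
    subjects: http://www.example.com/person/$(id)
    graphs: ex:myGraph                 # applies to all triples of this mapping
    predicateobjects:
      - predicates: foaf:firstName
        objects:
          value: $(firstname)
          language: en
      - predicates: foaf:knows
        objects:
          value: $(colleague)
          type: iri
```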

8. Functions

Functions can be added to subjects, predicates, and objects.
Example 65: mapping with function on firstname
mappings:
  person:
    predicateobjects:
     - predicates: foaf:firstName
       objects:
        - function: ex:toLowerCase
          parameters:
           - parameter: ex:input
             value: $(firstname)
A shortcut version of this example looks as follows.
Example 67: mapping with function on firstname using shortcuts
mappings:
  person:
    predicateobjects:
     - predicates: foaf:firstName
       objects:
        - function: ex:toLowerCase
          parameters:
           - [ex:input, $(firstname)]
It is possible to combine multiple functions, i.e., the value of a parameter of a function is the result of another function.
Example 69: mapping with multiple functions
mappings:
  person:
    predicateobjects:
     - predicates: schema:name
       objects:
        - function: ex:escape
          parameters:
           - parameter: ex:valueParameter
             value:
               function: ex:toUpperCase
               parameters:
                 - [ex:valueParameter, $(name)]
           - [ex:modeParameter, html]

9. Conditions

A subject or predicate-object combination is in certain cases only generated when a condition is fulfilled. In the following example, the predicate-object is only generated when the firstname is valid.

Example 71: mapping with condition on predicate object
mappings:
  person:
    predicateobjects:
     - predicates: foaf:firstName
       objects: $(firstname)
       condition:
        function: ex:isValid
        parameters:
         - [ex:input, $(firstname)]
In the following example, the mapping is only executed for records whose ID is set.
Example 72: mapping with condition
mappings:
  person:
    subjects: http://example.com/$(ID)
    condition:
        function: ex:isSet
        parameters:
         - [ex:input, $(ID)]
    predicateobjects:
     - predicates: foaf:firstName
       objects: $(firstname)

10. Shortcuts

10.1 Keys

10.2 Predicates

A. References

A.1 Informative references

[csv2rdf]
Generating RDF from Tabular Data on the Web. Jeremy Tandy; Ivan Herman; Gregg Kellogg. W3C. 17 December 2015. W3C Recommendation. URL: https://www.w3.org/TR/csv2rdf/
[R2RML]
R2RML: RDB to RDF Mapping Language. Souripriya Das; Seema Sundara; Richard Cyganiak. W3C. 27 September 2012. W3C Recommendation. URL: https://www.w3.org/TR/r2rml/
[YAML]
YAML Ain’t Markup Language (YAML™) Version 1.2. Oren Ben-Kiki; Clark Evans; Ingy döt Net. 1 October 2009. URL: http://yaml.org/spec/1.2/spec.html