YARRRML

Unofficial Draft

Editors:
Ben De Meester, Ghent University - IDLab, imec
Pieter Heyvaert, Ghent University - IDLab, imec
Anastasia Dimou, Ghent University - IDLab, imec

Abstract

YARRRML is a human-readable, text-based representation for declarative generation rules. It is a subset of [YAML], a widely used data serialization language designed to be human-friendly.

Status of This Document

This document is a draft of a potential specification. It has no official standing of any kind and does not represent the support or consensus of any standards organisation.

1. Terminology

subject
the subject of an RDF triple
predicate
the predicate of an RDF triple
object
the object of an RDF triple
class
the type of an entity
datatype
the type of a literal value
reference
a reference to a part of the data in a data source
function
a programmatic function that takes 0 or more input parameters and returns a single result

2. Profiles

This specification includes the following main profiles for tools that process YARRRML documents:

These profiles can be extended with additional profiles, which cannot be used on their own:

3. Base IRI

A base IRI can be defined. This IRI is used to create the IRIs of the term maps and sources of the generated [R2]RML rules.

A base IRI is set by adding the key base, with the IRI as its value, to the root of the document. In the following example, the base IRI is set to http://mybaseiri.com#.

Example 1: base IRI
base: http://mybaseiri.com#

4. Prefixes and namespaces

A set of prefixes and namespaces is predefined in every YARRRML document. These are the same as the predefined prefixes of RDFa.

Custom prefixes can be added by adding the collection prefixes to the root of the document. Each combination of a prefix and namespace is added to this collection as a key-value pair. In the following example two prefixes are defined: ex and test.

Example 2: custom prefixes
prefixes:
  ex: http://www.example.com/
  test: http://www.test.com/

5. Authors

The authors of the YARRRML rules can be added via the key authors. The value is an array with an object for each author. Each object can have the following keys:

name
the name of the author
email
the email of the author
website
the website of the author

In the following example two authors are defined.

Example 3: mapping with multiple authors
authors:
  - name: John Doe
    email: john@doe.com
  - name: Jane Doe
    website: https://janedoe.com

A string template provides a shortcut version: name <email> (website). In the following example, the same two authors are defined.

Example 5: mapping with multiple authors
authors:
  - John Doe <john@doe.com>
  - Jane Doe (https://janedoe.com)

If an author has a WebID, it can be used instead of providing the name, email, and website. In the following example, the same two authors are added via their WebIDs.

Example 7: mapping with multiple authors
authors:
  - http://johndoe.com/#me
  - http://janedoe.com/#me

In all the above cases, when there is only one author, an array is not needed. In the following example, one author is defined.

Example 9: mapping with one author
authors: John Doe <john@doe.com>

6. Template

A template contains 0 or more constant strings and 0 or more references to data in a data source. References are prefixed with $( and suffixed with ). For example, foo is a template with one constant string, $(id) is a template with one reference to id, and foo-$(id) is a template with the constant string foo- followed by a reference to id.
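
As an illustration, the following sketch uses these templates in a mapping. The mapping name person, the IRI http://example.org/person/, and the choice of the predefined rdfs:label and rdfs:comment predicates are hypothetical, not taken from this specification.

```yaml
mappings:
  person:
    # subject template: constant string followed by a reference to id
    subjects: http://example.org/person/$(id)
    predicateobjects:
      # object template with only a constant string
      - [rdfs:label, foo]
      # object template with the constant string foo- followed by a reference to id
      - [rdfs:comment, foo-$(id)]
```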

7. Data sources

The data sources are defined in the sources collection in the root of the document. Each source is added to this collection via a key-value pair. The key has to be unique for every source. The value is a collection with the keys described below.

7.1 Keys

type
Type of the data source.
Required profiles: RML.
Datatype: string.
access
Local or remote location of the data source.
Required profiles: RML.
Datatype: string.
credentials
Credentials required to access the data source.
Required profiles: RML.
Datatype: collection.
queryFormulation
Query formulation used to query the data source.
Required profiles: RML or R2RML.
Datatype: string.
query
Query to execute on the data source to retrieve the desired data.
Required profiles: RML or R2RML.
Datatype: string.
encoding
Encoding used by the data.
Required profiles: RML and CSVW.
Datatype: string.
delimiter
Delimiter to separate fields in a record.
Required profiles: RML and CSVW.
Datatype: string.
referenceFormulation
Reference formulation used to access the data retrieved from the data source.
Required profiles: RML.
Datatype: string.
iterator
Path to the records over which to iterate.
Required profiles: RML.
Datatype: string.

7.2 Type

This key's value describes which type of data source is used, so that the correct way of connecting to the data source can be determined. If no type is given, it is derived from the access value. When access is a path such as file.json, a local file is assumed and the value of type is implicitly localfile. When access is a URL such as http://example.org/file.json, a remote file is assumed, which is retrieved via an HTTP GET, and the value of type is implicitly remotefile. The following values are supported:

Data source type | Value | Required profiles
Oracle Database | oracle | RML and D2RQ
MySQL | mysql | RML and D2RQ
HSQLDB | hsql | RML and D2RQ
PostgreSQL | postgresql | RML and D2RQ
IBM DB2 | db2 | RML and D2RQ
IBM Informix | informix | RML and D2RQ
Ingres | ingres | RML and D2RQ
SAP Adaptive Server Enterprise | sapase | RML and D2RQ
SAP SQL Anywhere | sapsqlanywhere | RML and D2RQ
Firebird | firebird | RML and D2RQ
SPARQL endpoint | sparql | RML and SD
Local file | localfile | RML
Remote file | remotefile | RML

Note

R2RML rules do not include this information: it is supplied directly to the R2RML processor in use. Also, R2RML does not support SPARQL endpoints, local files, or remote files.
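
A minimal sketch of the defaulting rule for remote files: for the (hypothetical) JSON file below, the two sources should be treated the same way, the first relying on the implicit type and the second stating it explicitly. The source names are illustrative only.

```yaml
sources:
  implicit-remote:
    # type is omitted: a URL as access value implies remotefile
    access: http://example.org/file.json
    referenceFormulation: jsonpath
    iterator: $
  explicit-remote:
    access: http://example.org/file.json
    type: remotefile
    referenceFormulation: jsonpath
    iterator: $
```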

7.3 Access

This key's value describes where the data source can be accessed. Examples are file.json and http://example.org/my/db.

7.4 Query formulation

Query formulations define which query language is used to query a data source. The supported values of this key are the same as the values of type. Therefore, if only a type is provided, the query formulation is implicitly the same as the type. However, if you want to use, for example, a MySQL query with an Oracle Database, then you need to specify both the type and the query formulation.

In the case of SPARQL endpoints, defined by the value sparql for type, the query formulation is by default sparql11: the data sources are queried via a SPARQL 1.1 query.
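
The Oracle Database with MySQL query case above could be sketched as follows. The source name, access location, and query are hypothetical (the query reuses the DEPT table of the database example in 7.12), and referenceFormulation: csv follows that same example for tabular query results.

```yaml
sources:
  dept-source:
    type: oracle              # the database that is connected to
    access: http://localhost/example
    queryFormulation: mysql   # the query below is written in MySQL syntax
    query: SELECT DEPTNO, DNAME, LOC FROM DEPT
    referenceFormulation: csv
```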

7.5 Query

This key's value is a query that conforms to the selected query formulation. For example, if the query formulation is mysql, then the value of query needs to be a valid MySQL query.

7.6 Reference formulation

Reference formulations define how to access the data retrieved from the data source. For example, the retrieved data can be query results coming from a database or a JSON file on the local file system. The value of this key is a string. The supported values are:

Reference formulation | Value | Required profiles
CSV (tables with columns and rows) | csv | RML
JSONPath | jsonpath | RML
XPath | xpath | RML
XQuery | xquery | RML

7.7 Iterator

This key's value defines what records are processed. It has to conform to the selected reference formulation.

Consider the following JSON example with an array of two people. One person is one record.

{
  "people": [
    {...},
    {...}
  ]
}
        

To iterate over all the people, the iterator is $.people[*] when using jsonpath as a reference formulation. If no iterator is provided, it is unclear what the records are.
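
As a sketch, a source for the JSON above, assuming it is stored in a hypothetical local file data/people.json, could look like this:

```yaml
sources:
  people-source:
    access: data/people.json
    referenceFormulation: jsonpath
    # each element of the people array is one record
    iterator: "$.people[*]"
```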

7.8 Delimiter

This key's value defines the delimiter when working with CSV files. The default is a comma (,).

7.9 Encoding

This key's value defines the encoding of the retrieved data. The default is utf-8.
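
The following sketch sets both keys for a CSV file that uses semicolons as delimiter. The file and source names are hypothetical, and the exact spelling accepted for encoding values (here iso-8859-1) is an assumption, as this specification only states the default utf-8. Both keys require the RML and CSVW profiles.

```yaml
sources:
  person-csv:
    access: data/person.csv
    referenceFormulation: csv
    delimiter: ";"         # instead of the default comma
    encoding: iso-8859-1   # instead of the default utf-8 (assumed value format)
```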

7.10 Credentials

Credentials are provided when accessing a data source requires authentication. This key's value is a collection with the keys described below.

username
User name required to access the data source.
Required profiles: RML and D2RQ.
password
Password required to access the data source.
Required profiles: RML and D2RQ.

7.11 Query formulation vs reference formulation

Sequence diagram showing difference between query and reference formulation

In the figure above, you find a sequence diagram showing at which point the query and query formulation are used and at which point the iterator and reference formulation are used. The component "Processor" represents the software application that executes the converted YARRRML rules, e.g., RML rules. For clarity, this conversion is not included in the figure. First, the processor gets the type and access from the YARRRML rules and uses this information to create a connection with the data source. Next, the processor gets the query formulation and query, and uses them to query the desired data from the data source over the earlier created connection. Once the data is retrieved, the connection is closed. Finally, the processor gets the reference formulation and iterator, and uses them to iterate over the retrieved data.

Sequence diagram showing difference between query and reference formulation for a SPARQL endpoint

In the figure above, you find a sequence diagram showing an example of the use of query, query formulation, iterator, and reference formulation. The data source is a SPARQL endpoint, available at http://example.org/sparql, which is queried using a SPARQL 1.1 query. The query formulation is sparql11 and the query is SELECT * WHERE { ?s ?p ?o }. The result is an XML document, which is iterated over using the iterator /sparql/results/result, which conforms to the XPath specification.
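
The SPARQL endpoint example described above might be written as the following source. The source name is hypothetical; queryFormulation could also be omitted, since sparql11 is the default for the type sparql.

```yaml
sources:
  sparql-source:
    type: sparql
    access: http://example.org/sparql
    queryFormulation: sparql11
    query: "SELECT * WHERE { ?s ?p ?o }"
    # the results are an XML document, iterated over with XPath
    referenceFormulation: xpath
    iterator: /sparql/results/result
```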

7.12 Examples

In the following example, a single data source, person-source, is defined.

Example 11: one data source
sources:
  person-source:
    access: data/person.json
    referenceFormulation: jsonpath
    iterator: $

A shortcut version of this example looks as follows.

Example 13: one data source using shortcuts
sources:
  person-source: [data/person.json~jsonpath, $]

The collection is replaced with an array where the first element contains the value of access, followed by ~ and the value of referenceFormulation, and the second element contains the iterator.
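
The same rule applies to other reference formulations. As a sketch with a hypothetical XML file data/person.xml and hypothetical source names, the following two sources should be equivalent:

```yaml
sources:
  person-xml-long:
    access: data/person.xml
    referenceFormulation: xpath
    iterator: /persons/person
  # shortcut: [access~referenceFormulation, iterator]
  person-xml-short: [data/person.xml~xpath, /persons/person]
```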

The following mapping accesses a SQL database and selects the required data via a query.

Example 15: mapping with database as source
mapping:
  person:
    sources:
      access: http://localhost/example
      type: mysql
      credentials:
        username: root
        password: root
      queryFormulation: sql2008
      query: |
        SELECT DEPTNO, DNAME, LOC,
        (SELECT COUNT(*) FROM EMP WHERE EMP.DEPTNO=DEPT.DEPTNO) AS STAFF
        FROM DEPT;
      referenceFormulation: csv

8. Mappings

The mappings collection contains all the mappings of the document. Each mapping is added to this collection via a key-value pair. The key is unique for each mapping. The value is a collection containing the rules to generate the subjects, predicates, and objects. In the following example, two mappings are defined: person and project.

Example 17: two mappings
mappings:
  person: ...
  project: ...

8.1 Data sources

Besides defining data sources at the root of the document, data sources can also be defined inside a mapping via the collection sources. However, such a source has no unique key, and, thus, it cannot be referred to from other mappings. The keys of such a source are the same as those used when defining sources at the root of the document. In the following example, the mapping person has one source.

Example 20: mapping with one data source
mapping:
  person:
    sources:
      access: data/person.json
      referenceFormulation: jsonpath
      iterator: $
A shortcut version of this example looks as follows.
Example 22: mapping with one data source using shortcuts
mapping:
  person:
    sources: [data/person.json~jsonpath, $]
In case a mapping needs to be applied to multiple data sources, multiple sources can be added to the sources collection. In the following example, the person mapping has two data sources.
Example 24: mapping with two data sources
mapping:
  person:
    sources:
      - access: data/person.json
        referenceFormulation: jsonpath
        iterator: $
      - access: data/person2.json
        referenceFormulation: jsonpath
        iterator: "$.persons[*]"
A shortcut version of this example looks as follows.
Example 26: mapping with two data sources using shortcuts
mapping:
  person:
    sources:
      - [data/person.json~jsonpath, $]
      - [data/person2.json~jsonpath, "$.persons[*]"]
If a data source is defined outside of the mappings, a mapping can include it via its unique key.
Example 28: mapping with one data source
sources:
  person-source:
    access: data/person.json
    referenceFormulation: jsonpath
    iterator: $

mapping:
  person:
    sources: person-source
Multiple sources can be used by using an array of source keys as value for the key sources.
Example 30: mapping with two data sources
sources:
  person-source:
    access: data/person.json
    referenceFormulation: jsonpath
    iterator: $
  person-source2:
    access: data/person2.json
    referenceFormulation: jsonpath
    iterator: "$.person[*]"

mapping:
  person:
    sources:
      - person-source
      - person-source2
A combination of both sources defined outside and inside a mapping is possible.
Example 32: mapping with two data sources
sources:
  person-source:
    access: data/person.json
    referenceFormulation: jsonpath
    iterator: $

mapping:
  person:
    sources:
     - person-source
     - access: data/person2.json
       referenceFormulation: jsonpath
       iterator: "$.persons[*]"

8.2 Subjects

For every triple, it has to be defined whether its subject is an IRI or a blank node. This information is added to a mapping via the collection subjects. In the case of an IRI, this collection contains one or more templates that specify the IRI. In the case of a blank node, this collection is set to null or is not specified at all. In the following example, the mapping person generates IRIs for the subjects based on the template http://www.example.com/person/$(id).

Example 34: mapping with one subject
mappings:
  person:
    subjects: http://www.example.com/person/$(id)
It is also possible to specify multiple subjects. In this case, an array of templates is used. In the following example, the mapping person generates subjects based on the templates http://www.example.com/person/$(id) and http://www.test.com/$(firstname).
Example 36: mapping with two subjects
mappings:
  person:
    subjects: [http://www.example.com/person/$(id), http://www.test.com/$(firstname)]
It is also possible to apply functions to subjects (see 9. Functions).
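
For the blank node case described in this section, a minimal sketch (reusing the mapping and predicate of the other examples) sets subjects to null, so that no IRI template is used and the subjects are blank nodes:

```yaml
mappings:
  person:
    subjects: null   # same effect as omitting subjects: blank node subjects
    predicateobjects:
      - [foaf:firstName, $(firstname)]
```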

8.3 Predicates and objects

In the following example the mapping person generates combinations of predicates and objects, where the predicate is foaf:firstName and the object is the firstname of each person.
Example 38: mapping with one predicate and object
mappings:
  person:
    predicateobjects:
      - predicates: foaf:firstName
        objects: $(firstname)
A shortcut version of this example looks as follows.
Example 40: mapping with one predicate and object using shortcuts
mappings:
  person:
    predicateobjects:
      - [foaf:firstName, $(firstname)]
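It is possible to specify multiple predicates and objects. In this case, arrays are used. Every predicate is combined with every object, so the following example generates four predicate-object combinations for each person.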
Example 42: mapping with two predicates and objects
mappings:
  person:
    predicateobjects:
      - predicates: [foaf:name, rdfs:label]
        objects: [$(firstname), $(lastname)]
A shortcut version of this example looks as follows.
Example 44: mapping with two predicates and objects using shortcuts
mappings:
  person:
    predicateobjects:
      - [[foaf:name, rdfs:label], [$(firstname), $(lastname)]]
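Objects can also generate IRIs instead of literals. In that case, objects is a collection with the keys value and type, where type is set to iri.
Example 46: mapping with object that generates an IRI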
mappings:
  person:
    predicateobjects:
      - predicates: foaf:knows
        objects:
          value: $(colleague)
          type: iri
A shortcut version of this example looks as follows.
Example 48: mapping with object that generates an IRI using shortcuts
mappings:
  person:
    predicateobjects:
      - [foaf:knows, $(colleague)~iri]
An inverse predicate can also be added via the key inversepredicates. This is only valid when the object is an IRI or a blank node.
Example 50: mapping with one inverse predicate
mappings:
  work:
    predicateobjects:
      - predicates: ex:createdBy
        inversepredicates: ex:created
        objects:
          value: $(foafprofile)
          type: iri

8.4 Datatypes

A datatype is assigned to a literal object via the key datatype.
Example 51: mapping with one datatype
mappings:
  person:
    predicateobjects:
      - predicates: foaf:firstName
        objects:
          value: $(firstname)
          datatype: xsd:string
A shortcut version of this example looks as follows.
Example 53: mapping with one datatype using shortcuts
mappings:
  person:
    predicateobjects:
      - [foaf:firstName, $(firstname), xsd:string]
Example 55: mapping with two datatypes
mappings:
  person:
    predicateobjects:
      - predicates: foaf:name
        objects:
          - value: $(firstname)
            datatype: ex:string
          - value: $(lastname)
            datatype: ex:anotherString
A shortcut version of this example looks as follows.
Example 57: mapping with two datatypes using shortcuts
mappings:
  person:
    predicateobjects:
      - predicates: foaf:name
        objects: [[$(firstname), ex:string], [$(lastname), ex:anotherString]]

8.5 Languages

A language tag is assigned to a literal object via the key language.
Example 59: mapping with one language
mappings:
  person:
    predicateobjects:
      - predicates: foaf:firstName
        objects:
          value: $(firstname)
          language: en
A shortcut version of this example looks as follows.
Example 61: mapping with one language using shortcuts
mappings:
  person:
    predicateobjects:
      - [foaf:firstName, $(firstname), en~lang]
Example 63: mapping with two languages
mappings:
  person:
    predicateobjects:
      - predicates: foaf:name
        objects:
          - value: $(firstname)
            language: en
          - value: $(lastname)
            language: nl
A shortcut version of this example looks as follows.
Example 65: mapping with two languages using shortcuts
mappings:
  person:
    predicateobjects:
      - predicates: foaf:name
        objects: [[$(firstname), en~lang], [$(lastname), nl~lang]]

8.6 Referring to other mappings

In certain use cases, triples need to be generated between the records of two mappings. In the following example, there are two mappings: person and project. In the data, every person has a field projectID that refers to the project the person is working on. Therefore, we want to generate a triple between every person and their project. The objects collection contains an object with the key mapping, whose value refers to the mapping that provides the IRIs that serve as objects of the predicate-object combination. Furthermore, a condition is added, so that persons and projects are only linked when they are actually related, i.e., when the projectID of the person equals the ID of the project.

A condition is not required. However, when a condition is used, each parameter of its function can be given an extra value: s or o. s means that the value of the parameter comes from the subject of the relationship (here, a person), while o means that it comes from the object of the relationship (here, a project). The default value is s. In this example, the result is a relationship between every person and their project.

Example 67: interlinking two mappings
mappings:
  person:
    subjects: http://example.com/person/$(ID)
    predicateobjects:
      - predicates: foaf:worksFor
        objects:
        - mapping: project
          condition:
            function: equal
            parameters:
              - [str1, $(projectID), s]
              - [str2, $(ID), o]
  project:
    subjects: http://example.com/project/$(ID)

8.7 Graphs

8.7.1 All triples

A graph for all triples generated by a mapping is specified via the key graphs of that mapping.
Example 69: mapping with graph for all triples
mappings:
  person:
    graphs: ex:myGraph

8.7.2 All triples with a specific predicate and object

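A graph for the triples of a specific predicate-object combination is specified via the key graphs of that combination.
Example 71: mapping with graph for all triples with a specific predicate and object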
mappings:
  person:
    predicateobjects:
     - predicates: foaf:firstName
       objects: $(firstname)
       graphs: ex:myGraph

9. Functions

Functions can be added to subjects, predicates, and objects.
Example 73: mapping with function on firstname
mappings:
  person:
    predicateobjects:
     - predicates: foaf:firstName
       objects:
        - function: ex:toLowerCase
          parameters:
           - parameter: ex:input
             value: $(firstname)
A shortcut version of this example looks as follows.
Example 75: mapping with function on firstname using shortcuts
mappings:
  person:
    predicateobjects:
     - predicates: foaf:firstName
       objects:
        - function: ex:toLowerCase
          parameters:
           - [ex:input, $(firstname)]
Datatypes can also be assigned to the results of functions.
Example 77: mapping with function on age and setting datatype to integer
mappings:
  person:
    predicateobjects:
     - predicates: ex:age
       objects:
        - function: ex:double
          parameters:
           - [ex:input, $(age)]
          datatype: xsd:integer
It is possible to combine multiple functions, i.e., the value of a parameter of a function is the result of another function.
Example 79: mapping with multiple functions
mappings:
  person:
    predicateobjects:
     - predicates: schema:name
       objects:
        - function: ex:escape
          parameters:
           - parameter: ex:valueParameter
             value:
               function: ex:toUpperCase
               parameters:
                 - [ex:valueParameter, $(name)]
           - [ex:modeParameter, html]
Additionally, it is possible to combine the function and its parameters in one line. The function is followed by parentheses ((...)), parameter-value pairs are separated by commas (,), and each parameter is separated from its value by an equal sign (=).
Example 81: mapping with function on firstname using one line
mappings:
  person:
    predicateobjects:
     - predicates: foaf:firstName
       objects:
        - function: ex:toLowerCase(ex:input = $(firstname))
Note that the prefix of a parameter can be omitted if it is the same as the prefix of the function:
Example 83: mapping with function on firstname using one line without prefix
mappings:
  person:
    predicateobjects:
     - predicates: foaf:firstName
       objects:
        - function: ex:toLowerCase(input = $(firstname))
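
Following the same one-line rules, a function with several parameters lists its parameter-value pairs separated by commas. The sketch below is a one-line variant of the ex:escape call in Example 79, applied directly to $(name) without the nested ex:toUpperCase; ex:escape and its parameters are the illustrative names from that example, not a real function library.

```yaml
mappings:
  person:
    predicateobjects:
     - predicates: schema:name
       objects:
        # both parameters use the prefix of the function, so it is omitted
        - function: ex:escape(valueParameter = $(name), modeParameter = html)
```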

10. Conditions

In certain cases, a subject or predicate-object combination is only generated when a condition is fulfilled. In the following example, the predicate-object combination is only generated when the first name is valid, as determined by the function ex:isValid.

Example 85: mapping with condition on predicate object
mappings:
  person:
    predicateobjects:
     - predicates: foaf:firstName
       objects: $(firstname)
       condition:
        function: ex:isValid
        parameters:
         - [ex:input, $(firstname)]
In the following example, the mapping is only executed for records that have their ID set, as determined by the function ex:isSet.
Example 86: mapping with condition
mappings:
  person:
    subjects: http://example.com/$(ID)
    condition:
        function: ex:isSet
        parameters:
         - [ex:input, $(ID)]
    predicateobjects:
     - predicates: foaf:firstName
       objects: $(firstname)

11. External references

It is possible to define references that do not refer to data in a data source. These references are called "external references". They are provided via the key external, whose value is a collection of key-value pairs, each pairing an external reference with its value. In the mappings, an external reference is used by prefixing its name with an underscore, e.g., $(_name). In the following example, two external references are defined, name and city, with the values John and Ghent, respectively.

Example 87: defining external references
external:
  name: John
  city: Ghent

mappings:
  person:
    subjects: http://example.org/$(id)
    po:
      - [ex:name, $(_name)]
      - [ex:firstName, $(_name)]
      - [ex:city, $(_city)]

Replacing the external references with their actual values results in the following.

Example 88: external references are filled in
mappings:
  person:
    subjects: http://example.org/$(id)
    po:
      - [ex:name, John]
      - [ex:firstName, John]
      - [ex:city, Ghent]

If the value for an external reference is not provided, then the reference is not replaced. In the following example no value is provided for name.

Example 89: defining some external references
external:
  city: Ghent

mappings:
  person:
    subjects: http://example.org/$(id)
    po:
      - [ex:name, $(_name)]
      - [ex:firstName, $(_name)]
      - [ex:city, $(_city)]

Replacing the remaining external reference with its actual value results in the following.

Example 90: some external references are filled in
mappings:
  person:
    subjects: http://example.org/$(id)
    po:
      - [ex:name, $(_name)]
      - [ex:firstName, $(_name)]
      - [ex:city, Ghent]

Note that $(_name) is not replaced, because no value is provided for name.

If you want to use a reference as both a regular and an external reference, add a \ before the underscore of the regular reference. In the following example, $(_name) is an external reference and $(\_name) is a regular reference.

Example 91: defining both regular and external references
external:
  name: John

mappings:
  person:
    subjects: http://example.org/$(id)
    po:
      - [ex:name, $(_name)]
      - [ex:firstName, $(\_name)]
Replacing the external reference with its actual value results in the following.
Example 92: external reference is filled in, ignoring regular reference
mappings:
  person:
    subjects: http://example.org/$(id)
    po:
      - [ex:name, John]
      - [ex:firstName, $(_name)]

12. Shortcuts

12.1 Keys

12.2 Predicates

13. Reference implementation

The YARRRML Parser is a reference implementation that generates [R2]RML rules based on YARRRML. The parser's code also includes tests to validate a parser's conformance to the YARRRML specification.

A. References

A.1 Informative references

[csv2rdf]
Generating RDF from Tabular Data on the Web. Jeremy Tandy; Ivan Herman; Gregg Kellogg. W3C. 17 December 2015. W3C Recommendation. URL: https://www.w3.org/TR/csv2rdf/
[R2RML]
R2RML: RDB to RDF Mapping Language. Souripriya Das; Seema Sundara; Richard Cyganiak. W3C. 27 September 2012. W3C Recommendation. URL: https://www.w3.org/TR/r2rml/
[YAML]
YAML Ain’t Markup Language (YAML™) Version 1.2. Oren Ben-Kiki; Clark Evans; Ingy döt Net. 1 October 2009. URL: http://yaml.org/spec/1.2/spec.html