This document is licensed under a Creative Commons Attribution 3.0 License.
YARRRML (pronounced /jɑɹməl/) is a human readable text-based representation for declarative generation rules. It is a subset of [YAML], a widely used data serialization language designed to be human-friendly.
This document is a draft of a potential specification. It has no official standing of any kind and does not represent the support or consensus of any standards organisation.
This specification includes the following main profiles for tools that process YARRRML documents:
These profiles can be extended with additional profiles, which cannot be used on their own:
A base IRI can be defined. This IRI will be used for the creation of the IRIs of the term maps and sources of the [R2]RML rules.
A base IRI is set by adding the key base, with the IRI itself as value, to the root of the document.
In the following example the base IRI is set to http://mybaseiri.com#.
base: http://mybaseiri.com#
A set of prefixes and namespaces is predefined by default in every YARRRML document. These are the same as the predefined prefixes for RDFa.
Custom prefixes can be added by adding the collection prefixes to the root of the document.
Each combination of a prefix and namespace is added to this collection as a key-value pair.
In the following example two prefixes are defined: ex and test.
prefixes:
  ex: http://www.example.com
  test: http://www.test.com
A template contains 0 or more constant strings and 0 or more references to data in a data source.
References are prefixed with $( and suffixed with ).
For example, foo is a template with one constant string.
$(id) is a template with one reference to id.
foo-$(id) is a template with one constant string foo- and one reference to id.
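As a minimal sketch of how templates are used inside rules, the following mapping uses one template for the subject and one for an object. The mapping name person, the predicate foaf:nick, and the example.com IRI are assumptions chosen for illustration.

```yaml
mappings:
  person:
    # Template with a constant string and one reference to id.
    subjects: http://example.com/person/$(id)
    predicateobjects:
      - predicates: foaf:nick
        # Template with the constant string foo- and one reference to id.
        objects: foo-$(id)
```

For a record whose id is 42, the subject template would be expected to yield http://example.com/person/42 and the object template foo-42.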
The data sources are defined in the sources collection in the root of the document.
Each source is added to this collection via a key-value pair.
The key has to be unique for every source.
The value is a collection with the keys described below.
The value of the key type describes what type of data source is used,
so that the correct way of connecting to the data source can be determined.
By default a local file is assumed when the access value is a path such as file.json.
The value of type is then implicitly localfile.
By default a remote file is assumed and retrieved via an HTTP GET when the access value is a URL such as http://example.org/file.json.
The value of type is then implicitly remotefile.
The following values are supported:
Data source type | Value | Required profiles |
---|---|---|
Oracle Database | oracle | RML and D2RQ |
MySQL | mysql | RML and D2RQ |
HSQLDB | hsql | RML and D2RQ |
PostgreSQL | postgresql | RML and D2RQ |
IBM DB2 | db2 | RML and D2RQ |
IBM Informix | informix | RML and D2RQ |
Ingres | ingres | RML and D2RQ |
SAP Adaptive Server Enterprise | sapase | RML and D2RQ |
SAP SQL Anywhere | sapsqlanywhere | RML and D2RQ |
Firebird | firebird | RML and D2RQ |
Microsoft SQL Server | mssqlserver | RML and D2RQ |
Virtuoso | virtuoso | RML and D2RQ |
Web of Things | wot | RML and WOT |
SPARQL endpoint | sparql | RML and SD |
Local file | localfile | RML |
Remote file | remotefile | RML |
R2RML rules do not include this information: it is supplied directly to the R2RML processor that is used. In addition, R2RML does not support SPARQL endpoints, local files, or remote files.
The value of the key access describes where the data source can be accessed.
Examples are file.json and http://example.org/my/db.
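The defaults for type can be sketched as follows. The source keys local-source and remote-source are hypothetical. The first source relies on the implicit localfile type, while the second states remotefile explicitly, which should be equivalent to omitting it.

```yaml
sources:
  local-source:
    # A path: type is implicitly localfile.
    access: file.json
    referenceFormulation: jsonpath
    iterator: $
  remote-source:
    # A URL: type would implicitly be remotefile; stated here for clarity.
    access: http://example.org/file.json
    type: remotefile
    referenceFormulation: jsonpath
    iterator: $
```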
Query formulations define what type of query is used to query a data source. The supported values of the key queryFormulation are the same as the values for type, together with the additional values below. Therefore, if only a type is provided, the query formulation is implicitly the same as the type. If you want, for example, to use a MySQL query with an Oracle Database, you need to specify both the type and the query formulation.
In the case of SPARQL endpoints, defined by the value sparql for type, the query formulation is by default sparql11: the data sources are queried via a SPARQL 1.1 query.
Query formulation | Value | Required profiles |
---|---|---|
SPARQL 1.1 query (default for SPARQL endpoint) | sparql11 | RML and SD |
SQL:2008 | sql2008 | RML and D2RQ, or R2RML |
SQL:2011 | sql2011 | RML and D2RQ, or R2RML |
SQL:2016 | sql2016 | RML and D2RQ, or R2RML |
The value of the key query is a query that conforms to the selected query formulation.
For example, if the query formulation is mysql, then the value of query needs to be a valid MySQL query.
Reference formulations define how to access the data retrieved from the data source. For example, this retrieved data can be query results coming from a database or a JSON file on the local file system. The value of the key referenceFormulation is a string. The supported values are listed below.
Reference formulation | Value | Required profiles |
---|---|---|
CSV (tables with columns and rows) | csv | RML |
JSONPath | jsonpath | RML |
XPath | xpath | RML |
XQuery | xquery | RML |
The value of the key iterator defines what records are processed. It has to conform to the selected reference formulation.
Consider the following JSON example with an array of two people. One person is one record.
{
  "people": [
    {...},
    {...}
  ]
}
To iterate over all the people, the iterator is $.people[*] when using jsonpath as a reference formulation.
If no iterator is provided, it is unclear what the records are.
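A source that iterates over the people in the JSON document above could be declared as in the sketch below. The source key people-source and the file name people.json are assumptions.

```yaml
sources:
  people-source:
    access: people.json
    referenceFormulation: jsonpath
    # One record per element of the people array.
    iterator: "$.people[*]"
```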
This key's value defines the delimiter when working with CSV files.
The default is a comma (,).
This key's value defines the encoding of the retrieved data.
The default is utf-8.
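As a hedged sketch, a semicolon-separated CSV file in a non-default encoding might be described as follows. The key names delimiter and encoding are assumptions, since the text above does not spell them out, and the source key and file name are hypothetical.

```yaml
sources:
  people-csv:
    access: data/people.csv
    referenceFormulation: csv
    # Assumed key name; overrides the default delimiter (a comma).
    delimiter: ";"
    # Assumed key name; overrides the default encoding (utf-8).
    encoding: utf-16
```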
Credentials are provided when accessing a data source requires authentication. This key's value is a collection with the keys described below.
This key's value defines the content type of the retrieved data with a MIME type.
This key's value defines the operation type for the Web of Things description.
Operation | Value | Operation type |
---|---|---|
Retrieve data from Web API or stream | read | td:readproperty |
Push data to Web API or stream | write | td:writeproperty |
The Web of Things Security description describes how authentication should be performed against a Web API or stream when accessing its data.
Security | Value | Web of Things Security scheme |
---|---|---|
Security through API key | apikey | wotsec:APISecurityScheme |
In the figure above, you find a sequence diagram showing at what point the query and query formulation are used and
at what point the iterator and reference formulation are used.
The component "Processor" represents the software application that executes the converted YARRRML rules,
e.g., RML rules.
For clarity, this conversion is not included in the figure.
The processor gets the type and access from the YARRRML rules.
It uses this information to create a connection with the data source.
Next, the processor gets the query formulation and query.
It uses this information to query the desired data from the data source,
making use of the earlier created connection.
Once the data is retrieved the connection is closed.
The processor gets the reference formulation and iterator.
It uses this information to iterate over the retrieved data.
In the figure above, you find a sequence diagram showing an example of
the use of query, query formulation, iterator, and reference formulation.
The data source is a SPARQL endpoint, available at http://example.org/sparql,
which is queried using a SPARQL 1.1 query.
The query formulation is sparql11 and the query is SELECT * WHERE{?s ?p ?o}.
The result is an XML document which is iterated upon using the iterator /sparql/results/result
that conforms to the XPath specification.
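The SPARQL example described above could be written as the following source. This is a sketch: the source key sparql-source is hypothetical, and the other keys are the ones described earlier in this section.

```yaml
sources:
  sparql-source:
    access: http://example.org/sparql
    type: sparql
    # sparql11 is the default for SPARQL endpoints; stated here for clarity.
    queryFormulation: sparql11
    query: "SELECT * WHERE{?s ?p ?o}"
    # The XML result is iterated with an XPath expression.
    referenceFormulation: xpath
    iterator: /sparql/results/result
```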
In the following example a single data source, person-source, is defined.
sources:
  person-source:
    access: data/person.json
    referenceFormulation: jsonpath
    iterator: $
A shortcut version of this example looks as follows.
sources:
  person-source: [data/person.json~jsonpath, $]
The collection is replaced with an array where
the first element contains the value for access appended with the ~ and the value for referenceFormulation, and
the second element contains the iterator.
The following mapping accesses a SQL database and selects the required data via a query.
mapping:
  person:
    sources:
      access: http://localhost/example
      type: mysql
      credentials:
        username: root
        password: root
      queryFormulation: sql2008
      query: |
        SELECT DEPTNO, DNAME, LOC,
        (SELECT COUNT(*) FROM EMP WHERE EMP.DEPTNO=DEPT.DEPTNO) AS STAFF
        FROM DEPT;
      referenceFormulation: csv
The value of the key type describes what type of target is used, so that the correct way
of accessing the target can be determined. By default a local file is assumed
when the access value is a path such as file.nq. The value of type is
then implicitly localfile. The following values are supported:
Target type | Value | Required profiles |
---|---|---|
SPARQL endpoint | sparql | RML, RMLT and SD |
Local file | localfile | RML, RMLT and VOID/DCAT |
The value of the key access describes where the target can be accessed. Example: file.nq.
This key's value is the serialization format that should be used to serialize the RDF when exporting to the target. By default, the serialization format is N-Quads [DataIO]. The supported serialization formats are listed by the W3C Formats namespace.
Serialization format | Value |
---|---|
JSON-LD | jsonld |
N3 | n3 |
N-Triples | ntriples |
N-Quads | nquads |
LD Patch | ldpatch |
microdata | microdata |
OWL XML Serialization | owlxml |
OWL Functional Syntax | owlfunctional |
OWL Manchester Syntax | owlmanchester |
POWDER | powder |
POWDER-S | powder-s |
PROV-N | prov-n |
PROV-XML | prov-xml |
RDFa | rdfa |
RDF/JSON | rdfjson |
RDF/XML | rdfxml |
RIF XML Syntax | rifxml |
SPARQL Results in XML | sparqlxml |
SPARQL Results in JSON | sparqljson |
SPARQL Results in CSV | sparqlcsv |
SPARQL Results in TSV | sparqltsv |
Turtle | turtle |
TriG | trig |
Compression defines which compression algorithm should be applied when exporting to a target. By default, no compression algorithm is applied [DataIO]. The supported compression algorithms are listed by the Compression namespace.
Compression algorithm | Value |
---|---|
GZip | gzip |
Zip | zip |
TarGZip | targzip |
TarXz | tarxz |
In the following example a single target, person-target, is defined.
targets:
  person-target:
    access: data/dump.ttl.gz
    type: void
    serialization: turtle
    compression: gzip
A shortcut version of this example looks as follows.
targets:
  person-target: [data/dump.ttl.gz~void, turtle, gzip]
The collection is replaced with an array where the first element contains the value
for access appended with the ~ and the value for type, the
second element contains the serialization format, and the third element contains the
compression algorithm. The serialization format and compression algorithm are not required.
By default N-Quads is used as the serialization format and no compression is applied.
The following mapping accesses a SPARQL endpoint as target using SPARQL UPDATE queries:
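A minimal sketch of such a mapping is given below. It assumes a hypothetical endpoint at http://localhost/sparql and reuses the target keys described above; the subject template and the choice of N-Triples are also illustrative.

```yaml
mapping:
  person:
    subjects:
      - value: "http://example.org/$(id)"
        targets:
          # Hypothetical endpoint; type sparql selects a SPARQL endpoint target.
          - access: http://localhost/sparql
            type: sparql
            serialization: ntriples
```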
The mappings collection contains all the mappings of the document.
Each mapping is added to this collection via a key-value pair.
The key is unique for each mapping.
The value is a collection containing rules to generate the subjects, predicates, and objects.
In the following example two mappings are defined: person and project.
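A minimal sketch of a mappings collection with these two mappings could look as follows. The subject templates are assumptions added so that each mapping has content; subjects are described further on.

```yaml
mappings:
  person:
    subjects: http://example.com/person/$(id)
  project:
    subjects: http://example.com/project/$(id)
```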
Besides defining data sources at the root of the document,
data sources can also be defined inside a mapping via the collection sources.
However, no unique key is specified for such a source and, thus, it cannot be referred to from other mappings.
The key-value pairs to add to a source are the same as when defining sources at the root of the document.
In the following example the mapping person has one source.
mapping:
  person:
    sources:
      access: data/person.json
      referenceFormulation: jsonpath
      iterator: $
A shortcut version of this example looks as follows.
mapping:
  person:
    sources:
      - [data/person.json~jsonpath, $]
Multiple sources can be added to the sources collection.
In the following example the person mapping has two data sources.
mapping:
  person:
    sources:
      - access: data/person.json
        referenceFormulation: jsonpath
        iterator: $
      - access: data/person2.json
        referenceFormulation: jsonpath
        iterator: "$.persons[*]"
A shortcut version of this example looks as follows.
mapping:
  person:
    sources:
      - [data/person.json~jsonpath, $]
      - [data/person2.json~jsonpath, "$.persons[*]"]
A source that is described outside of the mappings can be included via its unique key.
sources:
  person-source:
    access: data/person.json
    referenceFormulation: jsonpath
    iterator: $
mapping:
  person:
    sources: person-source
Multiple sources can be used by using an array of source keys as value for the key sources.
sources:
  person-source:
    access: data/person.json
    referenceFormulation: jsonpath
    iterator: $
  person-source2:
    access: data/person2.json
    referenceFormulation: jsonpath
    iterator: "$.person[*]"
mapping:
  person:
    sources:
      - person-source
      - person-source2
Besides defining targets at the root of the document, targets can also be defined inside a
mapping via the collection targets. However, no unique key is specified for such a target
and, thus, it cannot be referred to from other mappings. The key-value pairs to add to a target are the
same as when defining targets at the root of the document. In the following example the mapping
person has one target.
mapping:
  person:
    subjects:
      - value: "http://example.org/$(id)"
        targets:
          access: data/dump.ttl.gz
          type: void
          serialization: turtle
          compression: gzip
A shortcut version of this example looks as follows.
mapping:
  person:
    subjects:
      - value: "http://example.org/$(id)"
        targets:
          - ["data/dump.ttl.gz~void", "turtle", "gzip"]
In case the output of a mapping needs to be exported to multiple targets, multiple targets
can be added to the targets collection. In the following example the person mapping
has two targets.
mapping:
  person:
    subjects:
      - value: "http://example.org/$(id)"
        targets:
          - access: data/dump1.nq
            type: void
          - access: data/dump2.nq
            type: void
A shortcut version of this example looks as follows.
mapping:
  person:
    subjects:
      - value: "http://example.org/$(id)"
        targets:
          - ["data/dump1.nq~void"]
          - ["data/dump2.nq~void"]
If you describe a target outside of the mappings, you can include it via its unique key.
targets:
  person-target:
    access: data/dump.jsonld.gz
    type: dcat
    serialization: jsonld
    compression: gzip
mapping:
  person:
    subjects:
      - value: "http://example.org/$(id)"
        targets: person-target
Multiple targets can be used by using an array of target keys as value for the key targets.
targets:
  person-target1:
    access: data/dump.jsonld.gz
    type: dcat
    serialization: jsonld
    compression: gzip
  person-target2:
    access: data/dump2.rdf
    type: void
    serialization: rdfxml
mapping:
  person:
    subjects:
      - value: "http://example.org/$(id)"
        targets:
          - person-target1
          - person-target2
A combination of both targets defined outside and inside a mapping is possible.
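Such a combination might look like the sketch below, which refers to one target by its key and defines a second one inline with the shortcut notation. That both forms can be mixed in the same array is an assumption based on the forms shown above; the file names are illustrative.

```yaml
targets:
  person-target1:
    access: data/dump.jsonld.gz
    type: dcat
    serialization: jsonld
    compression: gzip
mapping:
  person:
    subjects:
      - value: "http://example.org/$(id)"
        targets:
          # Target defined at the root of the document, included via its key.
          - person-target1
          # Target defined inside the mapping, using the shortcut notation.
          - ["data/dump2.nq~void"]
```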
For every triple it is required to define whether an IRI or a blank node is used as subject.
This information is added to a mapping via the collection subjects.
In the case of an IRI, this collection contains 0 or more templates that specify the IRI.
In the case of a blank node, this collection is set to null or is not specified at all.
In the following example the mapping person generates IRIs for the subjects based on the template http://wwww.example.com/person/$(id).
mappings:
  person:
    subjects: http://wwww.example.com/person/$(id)
In the following example the mapping person generates subjects based on the templates http://wwww.example.com/person/$(id) and http://www.test.com/$(firstname).
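Assuming that multiple templates are given as an array, in line with the other collections in this document, a mapping with these two subject templates could be written as follows.

```yaml
mappings:
  person:
    # Two templates: each one generates a subject for every record.
    subjects: [http://wwww.example.com/person/$(id), http://www.test.com/$(firstname)]
```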
It is possible to apply functions on subjects.
In the following example the mapping person generates combinations of predicates and objects,
where the predicate is foaf:firstName and the object is the firstname of each person.
mappings:
  person:
    predicateobjects:
      - predicates: foaf:firstName
        objects: $(firstname)
A shortcut version of this example looks as follows.
mappings:
  person:
    predicateobjects:
      - [foaf:firstName, $(firstname)]
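In the following example the predicates are foaf:name and rdfs:label, and the objects are the firstname and lastname of each person.
mappings:
  person:
    predicateobjects:
      - predicates: [foaf:name, rdfs:label]
        objects: [$(firstname), $(lastname)]
A shortcut version of this example looks as follows.
mappings:
  person:
    predicateobjects:
      - [[foaf:name, rdfs:label], [$(firstname), $(lastname)]]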
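In the following example the object is an IRI, based on the value of colleague.
mappings:
  person:
    predicateobjects:
      - predicates: foaf:knows
        objects:
          value: $(colleague)
          type: iri
In the shortcut version, the type is appended to the object with ~iri.
mappings:
  person:
    predicateobjects:
      - [[foaf:knows, rdfs:label], $(colleague)~iri]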
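In the following example an inverse predicate, ex:created, is defined next to the predicate ex:createdBy.
mappings:
  work:
    predicateobjects:
      - predicates: ex:createdBy
        inversepredicates: ex:created
        objects: $(foafprofile)
        type: iri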
In the following example the object has the datatype xsd:string.
mappings:
  person:
    predicateobjects:
      - predicates: foaf:firstName
        objects:
          value: $(firstname)
          datatype: xsd:string
A shortcut version of this example looks as follows.
mappings:
  person:
    predicateobjects:
      - [foaf:firstName, $(firstname), xsd:string]
In the following example each object has its own datatype.
mappings:
  person:
    predicateobjects:
      - predicates: foaf:name
        objects:
          - value: $(firstname)
            datatype: ex:string
          - value: $(lastname)
            datatype: ex:anotherString
      - predicates: rdfs:label
        objects:
          - value: $(firstname)
            datatype: ex:string
          - value: $(lastname)
            datatype: ex:anotherString
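In the following example the object has the language tag en.
mappings:
  person:
    predicateobjects:
      - predicates: foaf:firstName
        objects:
          value: $(firstname)
          language: en
A shortcut version of this example looks as follows.
mappings:
  person:
    predicateobjects:
      - [foaf:firstName, $(firstname), en~lang]
In the following example each object has its own language tag.
mappings:
  person:
    predicateobjects:
      - predicates: foaf:name
        objects:
          - value: $(firstname)
            language: en
          - value: $(lastname)
            language: nl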
In certain use cases triples need to be generated between the records of two mappings.
In the following example we have two mappings: person and project.
In the existing data there is for every person a field projectID referring to the project
on which the person is working.
Therefore, we want to generate triples between every person and their project.
The objects collection has an object with the key mapping.
The value of this key refers to the mapping that provides the IRIs that will serve as object for the predicate-object combination.
Furthermore, a condition is added, so that only persons and projects are linked when they are actually related, based on the projectID of the person and the ID of the project.
Note that a condition is not required.
But when a condition is used an extra value can be given to a parameter of a function.
This is either s or o.
s means that the value of the parameter is coming from the subject of the relationship, while
o means that the value is coming from the object of the relationship.
The default value is s.
In this example it would result in relationships between every person and their projects.
mappings:
  person:
    subjects: http://example.com/person/$(ID)
    predicateobjects:
      - predicates: foaf:worksFor
        objects:
          - mapping: project
            condition:
              function: equal
              parameters:
                - [str1, $(projectID), s]
                - [str2, $(ID), o]
  project:
    subjects: http://example.com/project/$(ID)
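In the following example the function ex:toLowerCase is applied to the firstname of each person.
mappings:
  person:
    predicateobjects:
      - predicates: foaf:firstName
        objects:
          - function: ex:toLowerCase
            parameters:
              - parameter: ex:input
                value: $(firstname)
A shortcut version of this example looks as follows.
mappings:
  person:
    predicateobjects:
      - predicates: foaf:firstName
        objects:
          - function: ex:toLowerCase
            parameters:
              - [ex:input, $(firstname)]
In the following example the datatype xsd:integer is assigned to the result of the function ex:double.
mappings:
  person:
    predicateobjects:
      - predicates: ex:age
        objects:
          - function: ex:double
            parameters:
              - [ex:input, $(age)]
            datatype: xsd:integer
In the following example the result of the function ex:toUpperCase is used as the value of the parameter ex:valueParameter of the function ex:escape.
mappings:
  person:
    predicateobjects:
      - predicates: schema:name
        objects:
          - function: ex:escape
            parameters:
              - parameter: ex:valueParameter
                value:
                  function: ex:toUpperCase
                  parameters:
                    - [ex:valueParameter, $(name)]
              - [ex:modeParameter, html]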
In the shortcut version of a function, the parameters are placed between parentheses ((...)),
every parameter-value pair is separated by a comma (,), and
parameters are separated from their value by an equal sign (=).
mappings:
  person:
    predicateobjects:
      - predicates: foaf:firstName
        objects:
          - function: ex:toLowerCase(ex:input = $(firstname))
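With more than one parameter, the comma-separated form could look like the sketch below, which calls ex:escape with two parameters. That the inline form accepts several parameters in exactly this way is an assumption based on the description above, not something the examples in this document confirm.

```yaml
mappings:
  person:
    predicateobjects:
      - predicates: schema:name
        objects:
          # Two parameter-value pairs, separated by a comma.
          - function: ex:escape(ex:valueParameter = $(name), ex:modeParameter = html)
```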
A subject or predicate-object combination is in certain cases only generated when a condition is fulfilled.
In the following example, the predicate-object is only generated when the firstname is valid.
mappings:
  person:
    predicateobjects:
      - predicates: foaf:firstName
        objects: $(firstname)
        condition:
          function: ex:isValid
          parameters:
            - [ex:input, $(firstname)]
In the following example, the subject is only generated when a value is set for ID.
mappings:
  person:
    subjects: http://example.com/$(ID)
    condition:
      function: ex:isSet
      parameters:
        - [ex:input, $(ID)]
    predicateobjects:
      - predicates: foaf:firstName
        objects: $(firstname)
It is possible to define references that do not refer to data in a data source.
These references are called "external references".
They are provided via the external key that has
as value a list of references with their values.
In a template, an external reference is written with a leading underscore, for example $(_name).
In the following example two external references are defined:
name and city with as values John and Ghent.
external:
  name: John
  city: Ghent
mappings:
  person:
    subjects: http://example.org/$(id)
    po:
      - [ex:name, $(_name)]
      - [ex:firstName, $(_name)]
      - [ex:city, $(_city)]
Replacing the external references with their actual values results in the following.
mappings:
  person:
    s: http://example.org/$(id)
    po:
      - [ex:name, John]
      - [ex:firstName, John]
      - [ex:city, Ghent]
If the value for an external reference is not provided, then the reference is not replaced.
In the following example no value is provided for name.
external:
  city: Ghent
mappings:
  person:
    subjects: http://example.org/$(id)
    po:
      - [ex:name, $(_name)]
      - [ex:firstName, $(_name)]
      - [ex:city, $(_city)]
Replacing the remaining external reference with its actual value results in the following.
mappings:
  person:
    subjects: http://example.org/$(id)
    po:
      - [ex:name, $(_name)]
      - [ex:firstName, $(_name)]
      - [ex:city, Ghent]
$(_name) is not replaced.
If you want to use a reference as both a regular and an external reference,
you add a \ before the regular reference.
In the following example $(_name) is an external reference and
$(\_name) is a regular reference.
external:
  name: John
mappings:
  person:
    subjects: http://example.org/$(id)
    po:
      - [ex:name, $(_name)]
      - [ex:firstName, $(\_name)]
Replacing the external reference with its actual value results in the following.
mappings:
  person:
    subjects: http://example.org/$(id)
    po:
      - [ex:name, John]
      - [ex:firstName, $(_name)]
The YARRRML Parser is a reference implementation that generates [R2]RML rules based on YARRRML. The parser's code also includes tests to validate a parser's conformance to the YARRRML specification.