At the end of the tutorial you will be able to generate RDF from an XML file using RML rules.
We assume that you understand
There are two ways to complete this tutorial: you read the explanations and either
For the second option, you need a tool that executes RML rules, for example the RMLMapper or the RMLStreamer.
Consider the following XML file called "characters.xml":
```xml
<?xml version="1.0" encoding="UTF-8"?>
<characters>
  <character id="0">
    <firstname>Ash</firstname>
    <lastname>Ketchum</lastname>
    <hair>black</hair>
  </character>
  <character id="1">
    <firstname>Misty</firstname>
    <hair>orange</hair>
  </character>
</characters>
```
It contains information about two characters. For each character, the id, first name, last name, and hair color are included; the last name and the hair color are optional. We want to annotate every character and generate the corresponding RDF triples.
For example, consider the character described by the first XML element:
```xml
<character id="0">
  <firstname>Ash</firstname>
  <lastname>Ketchum</lastname>
  <hair>black</hair>
</character>
```
We want to generate the corresponding RDF triples for this element:
```turtle
@prefix schema: <http://schema.org/> .
@prefix dbo: <http://dbpedia.org/ontology/> .
@prefix character: <http://example.org/character/> .

character:0 a schema:Person;
  schema:givenName "Ash";
  schema:lastName "Ketchum";
  dbo:hairColor "black".
```
In the following sections we explain
Two sets of rules are needed:
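In our example, we need rules that define that:

- the IRI of each character is generated by concatenating `http://example.org/character/` with the character's id;
- every character is annotated with the class `schema:Person`;
- the first name is annotated with the property `schema:givenName`;
- the last name is annotated with the property `schema:lastName`;
- the hair color is annotated with the property `dbo:hairColor`.

We write the RML rules in a Turtle document, because RML rules are RDF themselves.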
We add the following prefixes:

| Prefix | Description |
|---|---|
| rml | The RML ontology |
| rr | The R2RML ontology, which is extended by RML |
| ql | The Query Language vocabulary, which is used together with RML |
| rdf | The RDF Concepts Vocabulary |
| (empty) | The prefix used for our RML rules |
| schema | The schema.org vocabulary |
| dbo | The DBpedia ontology |

The last two are added because our rules use classes and properties from these vocabularies.
The prefixes are added in Turtle like this:
```turtle
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ql: <http://semweb.mmlab.be/ns/ql#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix : <http://example.org/rules/> .
@prefix schema: <http://schema.org/> .
@prefix dbo: <http://dbpedia.org/ontology/> .
```
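To see what these prefixes do, here is a minimal Python sketch that turns a prefixed name into the full IRI it abbreviates. It is our own illustration, not part of any RML tool: the `expand` function is hypothetical and ignores Turtle's escaping rules for local names.

```python
# Prefix table taken from the @prefix declarations above.
PREFIXES = {
    "rml": "http://semweb.mmlab.be/ns/rml#",
    "rr": "http://www.w3.org/ns/r2rml#",
    "ql": "http://semweb.mmlab.be/ns/ql#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "": "http://example.org/rules/",  # the empty prefix used for our rule names
    "schema": "http://schema.org/",
    "dbo": "http://dbpedia.org/ontology/",
}


def expand(prefixed_name: str) -> str:
    """Expand a prefixed name such as 'schema:givenName' into a full IRI.

    Hypothetical helper; it does not handle Turtle's escape rules for local names.
    """
    prefix, sep, local = prefixed_name.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown prefix in {prefixed_name!r}")
    return PREFIXES[prefix] + local


print(expand("schema:givenName"))  # http://schema.org/givenName
print(expand(":TriplesMap"))       # http://example.org/rules/TriplesMap
```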
In our example, the character data is stored in an XML file. We add the following RML rules, which define which XML file is used and how we iterate over its elements:
```turtle
:TriplesMap a rr:TriplesMap;
  rml:logicalSource [
    rml:source "characters.xml";
    rml:referenceFormulation ql:XPath;
    rml:iterator "/characters/character"
  ].
```
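The iterator decides which parts of the XML file are processed one at a time. As a rough mental model only, the following Python sketch uses the standard library's `xml.etree.ElementTree` to select the elements matched by a plain absolute path such as `/characters/character`. The `iterate` function is hypothetical and handles only simple `/a/b/c` paths; it is not a full XPath implementation and not how the RMLMapper or RMLStreamer work internally.

```python
import xml.etree.ElementTree as ET

# Content of characters.xml (XML declaration omitted).
CHARACTERS_XML = """<characters>
  <character id="0">
    <firstname>Ash</firstname>
    <lastname>Ketchum</lastname>
    <hair>black</hair>
  </character>
  <character id="1">
    <firstname>Misty</firstname>
    <hair>orange</hair>
  </character>
</characters>"""


def iterate(xml_text: str, iterator: str) -> list[ET.Element]:
    """Return the elements selected by an absolute child path like '/characters/character'."""
    steps = iterator.strip("/").split("/")
    root = ET.fromstring(xml_text)
    if root.tag != steps[0]:
        return []  # the first step must match the root element
    if len(steps) == 1:
        return [root]
    # ElementTree's findall accepts a relative child path such as 'character'.
    return root.findall("/".join(steps[1:]))


for element in iterate(CHARACTERS_XML, "/characters/character"):
    print(element.get("id"), element.findtext("firstname"))
# Prints:
# 0 Ash
# 1 Misty
```

Each selected `character` element is one iteration: every rule below is evaluated once per element.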
The different rules work as follows:

- `:TriplesMap a rr:TriplesMap;` defines the Triples Map that groups all rules for the characters.
- `:TriplesMap rml:logicalSource [ ... ]` contains all rules about the XML file. The blank node is implicitly of the class `rml:LogicalSource`.
- `[rml:source "characters.xml"]` says that we access the XML file `characters.xml`.
- `[rml:referenceFormulation ql:XPath]` says that we use XPath to access the data in the XML file.
- `[rml:iterator "/characters/character"]` says that we iterate over all elements that match the XPath expression `/characters/character`.

We add the following rules that define how the subject IRI of a character is generated:
```turtle
:TriplesMap rr:subjectMap [
  rr:template "http://example.org/character/{@id}"
].
```
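The template works like string substitution: each `{...}` is replaced by a value taken from the current `character` element. The Python sketch below illustrates this under stated assumptions: the hypothetical `expand_template` function reads `{@name}` as an attribute and `{name}` as a child element, returns `None` when a referenced value is missing (in line with R2RML, where no term is generated in that case), and skips the IRI-safe percent-encoding that R2RML prescribes for values inserted into IRIs.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional


def expand_template(template: str, element: ET.Element) -> Optional[str]:
    """Fill each {reference} in an RML-style template with data from `element`.

    Hypothetical helper: '{@name}' reads an attribute, '{name}' reads the text of a
    child element, and the result is None if any referenced value is missing.
    No percent-encoding is applied.
    """
    missing = False

    def substitute(match):
        nonlocal missing
        reference = match.group(1)
        if reference.startswith("@"):
            value = element.get(reference[1:])  # attribute value
        else:
            value = element.findtext(reference)  # child element text
        if value is None:
            missing = True
            return ""
        return value

    result = re.sub(r"\{([^{}]+)\}", substitute, template)
    return None if missing else result


ash = ET.fromstring('<character id="0"><firstname>Ash</firstname></character>')
print(expand_template("http://example.org/character/{@id}", ash))
# http://example.org/character/0
```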
The different rules work as follows:

- `:TriplesMap rr:subjectMap [ ... ]` contains all the rules about the subject of a triple. The blank node is implicitly of the class `rr:SubjectMap`.
- `[rr:template "http://example.org/character/{@id}"]` says that the IRI of the subject is generated by concatenating `http://example.org/character/` with the value of the attribute `id` of the `character` element (`@id` is the XPath expression that selects this attribute).

In our example, we need to annotate every character with the class `schema:Person`. We add the following RML rules:
```turtle
:TriplesMap rr:predicateObjectMap [
  rr:predicate rdf:type;
  rr:objectMap [ rr:constant schema:Person ];
].
```
The different rules work as follows:

- `:TriplesMap rr:predicateObjectMap [ ... ]` contains all the rules about the predicate and the object of a triple. The blank node is implicitly of the class `rr:PredicateObjectMap`.
- `[rr:predicate rdf:type]` says that we use the predicate `rdf:type`.
- `[rr:objectMap [ ... ]]` contains all the rules about the object of a triple. The blank node is implicitly of the class `rr:ObjectMap`.
- `[rr:constant schema:Person]` says that the object of the triple is `schema:Person` for every character.

Putting all the rules we have so far together results in:
```turtle
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ql: <http://semweb.mmlab.be/ns/ql#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix : <http://example.org/rules/> .
@prefix schema: <http://schema.org/> .
@prefix dbo: <http://dbpedia.org/ontology/> .

:TriplesMap a rr:TriplesMap;
  rml:logicalSource [
    rml:source "characters.xml";
    rml:referenceFormulation ql:XPath;
    rml:iterator "/characters/character"
  ].

:TriplesMap rr:subjectMap [
  rr:template "http://example.org/character/{@id}"
].

:TriplesMap rr:predicateObjectMap [
  rr:predicate rdf:type;
  rr:objectMap [
    rr:constant schema:Person
  ]
].
```
If we execute these rules, the following triples are generated:
```turtle
@prefix schema: <http://schema.org/> .

<http://example.org/character/0> a schema:Person .
<http://example.org/character/1> a schema:Person .
```
Two triples are generated: one for each character. Each character has a unique subject IRI and is annotated with the class `schema:Person`.
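To make the combination of the subject template and the constant object concrete, the following Python sketch applies the rules so far to `characters.xml` and prints the two triples in N-Triples syntax, with full IRIs instead of prefixed names. The `type_triples` function is a hypothetical hand-written equivalent of these rules, not how the RMLMapper or RMLStreamer work internally.

```python
import xml.etree.ElementTree as ET

# Content of characters.xml (XML declaration omitted).
CHARACTERS_XML = """<characters>
  <character id="0">
    <firstname>Ash</firstname>
    <lastname>Ketchum</lastname>
    <hair>black</hair>
  </character>
  <character id="1">
    <firstname>Misty</firstname>
    <hair>orange</hair>
  </character>
</characters>"""

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SCHEMA_PERSON = "http://schema.org/Person"


def type_triples(xml_text: str) -> list[str]:
    """Apply the subject template and the rdf:type rule; return N-Triples lines."""
    root = ET.fromstring(xml_text)  # the <characters> element
    triples = []
    for character in root.findall("character"):  # iterator: /characters/character
        character_id = character.get("id")
        if character_id is None:
            continue  # no id means no subject IRI, so no triples
        subject = f"<http://example.org/character/{character_id}>"  # rr:template
        # rr:constant: the object is schema:Person for every character.
        triples.append(f"{subject} <{RDF_TYPE}> <{SCHEMA_PERSON}> .")
    return triples


for line in type_triples(CHARACTERS_XML):
    print(line)
```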
In our example, we need to annotate the values in the `firstname` tags with the property `schema:givenName`. We add the following rules:
```turtle
:TriplesMap rr:predicateObjectMap [
  rr:predicate schema:givenName;
  rr:objectMap [
    rml:reference "firstname"
  ]
].
```
These rules differ from the ones that annotate with a class: `rml:reference` is used instead of `rr:constant`, because the object is not the same for every character. More specifically, `[rml:reference "firstname"]` says that the data in the tag `firstname` is used for the object. Because the object map specifies no term type, the object is generated as a literal, such as `"Ash"`.
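The difference between `rr:constant` and `rml:reference` can be illustrated with a small Python sketch. The hypothetical `reference_values` function collects, for every character, the text of a given child element; a constant would instead give the same value for every character. Characters without that child element get no value, and therefore no triple.

```python
import xml.etree.ElementTree as ET

# Content of characters.xml (XML declaration omitted).
CHARACTERS_XML = """<characters>
  <character id="0">
    <firstname>Ash</firstname>
    <lastname>Ketchum</lastname>
    <hair>black</hair>
  </character>
  <character id="1">
    <firstname>Misty</firstname>
    <hair>orange</hair>
  </character>
</characters>"""


def reference_values(xml_text: str, reference: str) -> dict[str, str]:
    """Return {character id: text of the `reference` child element}.

    Characters without that child element are skipped, so no object
    (and no triple) would be produced for them.
    """
    root = ET.fromstring(xml_text)
    values = {}
    for character in root.findall("character"):  # iterator: /characters/character
        text = character.findtext(reference)  # None if the child element is absent
        if text is not None:
            values[character.get("id")] = text
    return values


print(reference_values(CHARACTERS_XML, "firstname"))  # {'0': 'Ash', '1': 'Misty'}
```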
Putting all the rules we have so far together results in:
```turtle
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ql: <http://semweb.mmlab.be/ns/ql#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix : <http://example.org/rules/> .
@prefix schema: <http://schema.org/> .
@prefix dbo: <http://dbpedia.org/ontology/> .

:TriplesMap a rr:TriplesMap;
  rml:logicalSource [
    rml:source "characters.xml";
    rml:referenceFormulation ql:XPath;
    rml:iterator "/characters/character"
  ].

:TriplesMap rr:subjectMap [
  rr:template "http://example.org/character/{@id}"
].

:TriplesMap rr:predicateObjectMap [
  rr:predicate rdf:type;
  rr:objectMap [
    rr:constant schema:Person
  ]
].

:TriplesMap rr:predicateObjectMap [
  rr:predicate schema:givenName;
  rr:objectMap [
    rml:reference "firstname"
  ]
].
```
If we execute these rules, the following triples are generated:
```turtle
@prefix schema: <http://schema.org/> .

<http://example.org/character/0> a schema:Person;
  schema:givenName "Ash" .

<http://example.org/character/1> a schema:Person;
  schema:givenName "Misty" .
```
Two triples are added: one for the first name of each character.
We add the following rules to annotate the last name and the hair color in the same way as the first name:
```turtle
:TriplesMap rr:predicateObjectMap [
  rr:predicate schema:lastName;
  rr:objectMap [
    rml:reference "lastname"
  ]
].

:TriplesMap rr:predicateObjectMap [
  rr:predicate dbo:hairColor;
  rr:objectMap [
    rml:reference "hair"
  ]
].
```
The complete Turtle document with the RML rules is:
```turtle
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ql: <http://semweb.mmlab.be/ns/ql#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix : <http://example.org/rules/> .
@prefix schema: <http://schema.org/> .
@prefix dbo: <http://dbpedia.org/ontology/> .

:TriplesMap a rr:TriplesMap;
  rml:logicalSource [
    rml:source "characters.xml";
    rml:referenceFormulation ql:XPath;
    rml:iterator "/characters/character"
  ].

:TriplesMap rr:subjectMap [
  rr:template "http://example.org/character/{@id}"
].

:TriplesMap rr:predicateObjectMap [
  rr:predicate rdf:type;
  rr:objectMap [
    rr:constant schema:Person
  ]
].

:TriplesMap rr:predicateObjectMap [
  rr:predicate schema:givenName;
  rr:objectMap [
    rml:reference "firstname"
  ]
].

:TriplesMap rr:predicateObjectMap [
  rr:predicate schema:lastName;
  rr:objectMap [
    rml:reference "lastname"
  ]
].

:TriplesMap rr:predicateObjectMap [
  rr:predicate dbo:hairColor;
  rr:objectMap [
    rml:reference "hair"
  ]
].
```
If we execute these rules, the following final triples are generated:
```turtle
@prefix dbo: <http://dbpedia.org/ontology/> .
@prefix schema: <http://schema.org/> .

<http://example.org/character/0> a schema:Person;
  dbo:hairColor "black";
  schema:givenName "Ash";
  schema:lastName "Ketchum" .

<http://example.org/character/1> a schema:Person;
  dbo:hairColor "orange";
  schema:givenName "Misty" .
```
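As a final illustration, the Python sketch below applies all rules of the complete Turtle document to `characters.xml` and prints the seven resulting triples in N-Triples syntax. The `map_characters` and `literal` functions are a hypothetical hand-written equivalent of this one mapping, built on `xml.etree.ElementTree`; they are not a general RML processor.

```python
import xml.etree.ElementTree as ET

# Content of characters.xml (XML declaration omitted).
CHARACTERS_XML = """<characters>
  <character id="0">
    <firstname>Ash</firstname>
    <lastname>Ketchum</lastname>
    <hair>black</hair>
  </character>
  <character id="1">
    <firstname>Misty</firstname>
    <hair>orange</hair>
  </character>
</characters>"""

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SCHEMA_PERSON = "http://schema.org/Person"
# Predicate IRI -> reference, one entry per rml:reference predicate-object map.
REFERENCES = {
    "http://schema.org/givenName": "firstname",
    "http://schema.org/lastName": "lastname",
    "http://dbpedia.org/ontology/hairColor": "hair",
}


def literal(text: str) -> str:
    """Serialize a plain string literal in N-Triples syntax."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def map_characters(xml_text: str) -> list[str]:
    """Apply all rules of the complete Turtle document; return N-Triples lines."""
    root = ET.fromstring(xml_text)
    triples = []
    for character in root.findall("character"):  # iterator: /characters/character
        character_id = character.get("id")
        if character_id is None:
            continue  # no id means no subject IRI
        subject = f"<http://example.org/character/{character_id}>"  # rr:template
        triples.append(f"{subject} <{RDF_TYPE}> <{SCHEMA_PERSON}> .")  # rr:constant
        for predicate, reference in REFERENCES.items():
            value = character.findtext(reference)  # rml:reference
            if value is None:
                continue  # e.g. Misty has no <lastname>, so no schema:lastName triple
            triples.append(f"{subject} <{predicate}> {literal(value)} .")
    return triples


for triple in map_characters(CHARACTERS_XML):
    print(triple)
```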
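Note that no `schema:lastName` triple is generated for Misty: her element has no `lastname` tag, so the reference has no value.

Congratulations! You have created your own RML rules that generate RDF from data in an XML file. Nice work! We hope you now have a decent grasp of how RML rules work.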
You can find more information about RML in its specification. There is also a human-readable, text-based representation for RML rules called YARRRML. It is a subset of YAML, a widely used data serialization language designed to be human-friendly.
If you have questions or remarks, don't hesitate to contact us via email!