At the end of the tutorial you will be able to generate RDF from a Web API using RML rules.
We assume that you understand
There are two ways to complete this tutorial: you read the explanations and either
For the second option you need a tool that executes RML rules. We suggest using the RMLMapper for this tutorial.
Consider the following response from the Web API "https://api.irail.be/stations/?format=xml":
<stations version="1.1" timestamp="1622209623">
  <station locationX="4.336531"
           locationY="50.835707"
           id="BE.NMBS.008814001"
           URI="http://irail.be/stations/NMBS/008814001"
           standardname="Brussel-Zuid/Bruxelles-Midi">
    Brussels-South/Brussels-Midi
  </station>
  <station locationX="0.12380800"
           locationY="51.5304000"
           id="BE.NMBS.007015400"
           URI="http://irail.be/stations/NMBS/007015400"
           standardname="London Saint Pancras International">
    London Saint Pancras International
  </station>
</stations>
Note: The actual response of the iRail Web API contains more than 2 stations, but the others were omitted for readability.
The response contains information about the Belgian railway stations: the location, id, name, local name, and IRI of each station. We want to annotate every station and generate the corresponding RDF triples.
For example, consider the first station, described by the first station element in the XML:
<station locationX="4.336531"
         locationY="50.835707"
         id="BE.NMBS.008814001"
         URI="http://irail.be/stations/NMBS/008814001"
         standardname="Brussel-Zuid/Bruxelles-Midi">
  Brussels-South/Brussels-Midi
</station>
We want to generate the corresponding RDF triples for this object:
@prefix schema: <http://schema.org/> .
@prefix gtfs: <http://vocab.gtfs.org/terms#> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
<http://irail.be/stations/NMBS/008814001> a gtfs:Station;
  geo:latitude "50.835707" ;
  geo:longitude "4.336531" ;
  schema:name "Brussels-South/Brussels-Midi" ;
.
In the following sections we explain
Two sets of rules are needed:
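In our example we need rules that define that:
- we use the @URI attribute as IRI for the station.
- every station is annotated with the class gtfs:Station.
- we annotate the longitude (@locationX) as geo:longitude.
- we annotate the latitude (@locationY) as geo:latitude.
- we annotate the name (text()) as schema:name.

We write the RML rules in a Turtle document. RML rules are RDF themselves.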
We add the following prefixes:
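Prefix | Description |
---|---|
rml | The RML ontology |
rr | The R2RML ontology, which is extended by RML |
ql | The Query Language vocabulary, which is used together with RML |
rdf | The RDF Concepts Vocabulary |
td | The W3C Web of Things Thing Description ontology, used to describe the Web API |
htv | The HTTP Vocabulary in RDF, used for the HTTP method and headers |
hctl | The W3C Web of Things Hypermedia Controls ontology, used for the target URL and content type |
base | The base IRI (@base) used for our RML rules |
schema | The schema.org vocabulary |
xsd | The XML Schema ontology |
gtfs | The GTFS ontology |
geo | The WGS84 Geo Positioning ontology |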
The last four are added because they are used for the classes and properties.
The prefixes are added in Turtle like this:
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ql: <http://semweb.mmlab.be/ns/ql#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix td: <https://www.w3.org/2019/wot/td#> .
@prefix htv: <http://www.w3.org/2011/http#> .
@prefix hctl: <https://www.w3.org/2019/wot/hypermedia#> .
@prefix schema: <http://schema.org/> .
@prefix gtfs: <http://vocab.gtfs.org/terms#> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@base <http://example.org/rules/> .
In our example, the data of the stations is retrieved from a Web API with an XML response. We add the following RML rules that define which Web API is used and how we iterate over the elements in its response:
<#WoTWebAPISource> a td:PropertyAffordance;
  td:hasForm [
    # URL and content type
    hctl:hasTarget "http://api.irail.be/stations?format=xml";
    hctl:forContentType "application/xml";
    # Read only
    hctl:hasOperationType td:readproperty;
    # Set HTTP method and headers
    htv:methodName "GET";
    htv:headers ([
      htv:fieldName "User-Agent";
      htv:fieldValue "RMLMapper";
    ]);
  ];
.
<#WoTWebAPI> a td:Thing;
  td:hasPropertyAffordance <#WoTWebAPISource>;
.
<#TriplesMap> a rr:TriplesMap;
  rml:logicalSource [ a rml:LogicalSource;
    rml:source <#WoTWebAPISource>;
    rml:referenceFormulation ql:XPath;
    rml:iterator "/stations/station";
  ];
.
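To make the source description more concrete, the sketch below restates it in Python using only the standard library. It illustrates what the rules describe, not how the RMLMapper is implemented: the names `build_request`, `iterate_stations`, and the shortened `SAMPLE` response are assumptions made for this example. The request is built but never sent, so the sketch runs offline.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import List

# Values copied from the td:hasForm description in the rules.
TARGET = "http://api.irail.be/stations?format=xml"
USER_AGENT = "RMLMapper"

# Shortened sample response (hypothetical excerpt with two stations).
SAMPLE = """<stations version="1.1" timestamp="1622209623">
  <station URI="http://irail.be/stations/NMBS/008814001">Brussels-South/Brussels-Midi</station>
  <station URI="http://irail.be/stations/NMBS/007015400">London Saint Pancras International</station>
</stations>"""


def build_request(target: str = TARGET) -> urllib.request.Request:
    """Build (but do not send) the GET request that the form describes."""
    return urllib.request.Request(
        target,
        headers={"User-Agent": USER_AGENT},  # htv:fieldName / htv:fieldValue
        method="GET",                        # htv:methodName
    )


def iterate_stations(xml_text: str) -> List[ET.Element]:
    """Return one element per iteration of /stations/station."""
    root = ET.fromstring(xml_text)
    if root.tag != "stations":
        return []
    # ElementTree paths are relative to the root element, so "station"
    # selects the same nodes as the absolute XPath /stations/station.
    return root.findall("station")


request = build_request()
stations = iterate_stations(SAMPLE)
print(request.get_method(), request.full_url)
print(len(stations), "iterations")
```

Each element returned by `iterate_stations` corresponds to one iteration of the logical source; all references in the remaining rules are evaluated relative to such an element.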
The different rules work as follows:
- <#TriplesMap> a rr:TriplesMap; defines the Triples Map that groups all rules for the stations.
- <#TriplesMap> rml:logicalSource [ ... ] contains all rules about the Web API. The blank node is of the class rml:LogicalSource.
- [rml:source <#WoTWebAPISource>] says that we access the Web API described using W3C Web of Things.
- [rml:referenceFormulation ql:XPath] says that we use XPath to access the data in the Web API.
- [rml:iterator "/stations/station"] says that we iterate over all elements that match the XPath expression /stations/station.
- [htv:fieldName "User-Agent"] says that we want to configure the HTTP User-Agent header.
- [htv:fieldValue "RMLMapper"] says that the HTTP User-Agent header must have the value RMLMapper.

We add the following rules that define how the subject IRI of a station is generated:
<#TriplesMap>
  rr:subjectMap [
    rml:reference "@URI";
  ];
.
The different rules work as follows:
- <#TriplesMap> rr:subjectMap [ ... ] contains all the rules about the subject of a triple. The blank node is implicitly of the class rr:SubjectMap.
- [rml:reference "@URI"] says that the IRI of the subject is retrieved from the XML URI attribute.

In our example we need to annotate every station with the class gtfs:Station.
We add the following RML rules:
<#TriplesMap>
  rr:predicateObjectMap [
    rr:predicate rdf:type;
    rr:object gtfs:Station;
  ];
.
The different rules work as follows:
- <#TriplesMap> rr:predicateObjectMap [ ... ] contains all the rules about a specific predicate of a triple. The blank node is implicitly of the class rr:PredicateObjectMap.
- [rr:predicate rdf:type] says that we use the predicate rdf:type.
- [rr:object gtfs:Station] says that the object of the triple is gtfs:Station for every station. It is a shortcut for [rr:objectMap [ rr:constant gtfs:Station ]], where the inner blank node is implicitly of the class rr:ObjectMap.

Putting all the rules we have so far together results in:
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ql: <http://semweb.mmlab.be/ns/ql#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix td: <https://www.w3.org/2019/wot/td#> .
@prefix htv: <http://www.w3.org/2011/http#> .
@prefix hctl: <https://www.w3.org/2019/wot/hypermedia#> .
@prefix schema: <http://schema.org/> .
@prefix gtfs: <http://vocab.gtfs.org/terms#> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@base <http://example.org/rules/> .
<#WoTWebAPISource> a td:PropertyAffordance;
  td:hasForm [
    # URL and content type
    hctl:hasTarget "http://api.irail.be/stations?format=xml";
    hctl:forContentType "application/xml";
    # Read only
    hctl:hasOperationType td:readproperty;
    # Set HTTP method and headers
    htv:methodName "GET";
    htv:headers ([
      htv:fieldName "User-Agent";
      htv:fieldValue "RMLMapper";
    ]);
  ];
.
<#WoTWebAPI> a td:Thing;
  td:hasPropertyAffordance <#WoTWebAPISource>;
.
<#TriplesMap> a rr:TriplesMap;
  rml:logicalSource [ a rml:LogicalSource;
    rml:source <#WoTWebAPISource>;
    rml:referenceFormulation ql:XPath;
    rml:iterator "/stations/station";
  ];
  rr:subjectMap [
    rml:reference "@URI";
  ];
  rr:predicateObjectMap [
    rr:predicate rdf:type;
    rr:object gtfs:Station;
  ];
.
You can download the Turtle file here. If we execute these rules, the following triples are generated:
@prefix gtfs: <http://vocab.gtfs.org/terms#> .
<http://irail.be/stations/NMBS/008814001> a gtfs:Station .
<http://irail.be/stations/NMBS/007015400> a gtfs:Station .
Two triples are generated: one for each station.
There is a unique subject IRI for each station and each station is annotated with the class gtfs:Station.
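As a rough illustration of what the rules compute at this stage, the following Python sketch derives the same kind of triples from a shortened response. It is a hand-written stand-in for the mapping, not the RMLMapper's algorithm; the function name `class_triples` and the `SAMPLE` excerpt are assumptions of this example, and the output uses N-Triples syntax rather than Turtle.

```python
import xml.etree.ElementTree as ET

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
GTFS_STATION = "http://vocab.gtfs.org/terms#Station"

# Shortened sample response (hypothetical excerpt with two stations).
SAMPLE = """<stations>
  <station URI="http://irail.be/stations/NMBS/008814001">Brussels-South/Brussels-Midi</station>
  <station URI="http://irail.be/stations/NMBS/007015400">London Saint Pancras International</station>
</stations>"""


def class_triples(xml_text):
    """Emit one rdf:type triple per station, as N-Triples lines."""
    lines = []
    for station in ET.fromstring(xml_text).findall("station"):  # iterator
        subject = station.get("URI")  # rml:reference "@URI"
        if not subject:
            continue  # this sketch skips stations without a URI attribute
        # rr:object gtfs:Station: the same constant object in every iteration
        lines.append(f"<{subject}> <{RDF_TYPE}> <{GTFS_STATION}> .")
    return lines


for line in class_triples(SAMPLE):
    print(line)
```

The subject changes per iteration because it comes from a reference, while the object stays the same because it is a constant.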
In our example we need to annotate the text content of every station element, selected with text(), with the property schema:name.
We add the following rules:
<#TriplesMap>
  rr:predicateObjectMap [
    rr:predicate schema:name;
    rr:objectMap [
      rml:reference "text()";
    ];
  ];
.
The rules differ from those used when annotating with a class: rml:reference, inside an rr:objectMap, is used instead of rr:object because the object is not the same for every station. More specifically, [rml:reference "text()"] says that the text content of the station element, selected by the XPath expression text(), is used for the object.
Putting all rules we have so far together results in
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ql: <http://semweb.mmlab.be/ns/ql#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix td: <https://www.w3.org/2019/wot/td#> .
@prefix htv: <http://www.w3.org/2011/http#> .
@prefix hctl: <https://www.w3.org/2019/wot/hypermedia#> .
@prefix schema: <http://schema.org/> .
@prefix gtfs: <http://vocab.gtfs.org/terms#> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@base <http://example.org/rules/> .
<#WoTWebAPISource> a td:PropertyAffordance;
  td:hasForm [
    # URL and content type
    hctl:hasTarget "http://api.irail.be/stations?format=xml";
    hctl:forContentType "application/xml";
    # Read only
    hctl:hasOperationType td:readproperty;
    # Set HTTP method and headers
    htv:methodName "GET";
    htv:headers ([
      htv:fieldName "User-Agent";
      htv:fieldValue "RMLMapper";
    ]);
  ];
.
<#WoTWebAPI> a td:Thing;
  td:hasPropertyAffordance <#WoTWebAPISource>;
.
<#TriplesMap> a rr:TriplesMap;
  rml:logicalSource [ a rml:LogicalSource;
    rml:source <#WoTWebAPISource>;
    rml:referenceFormulation ql:XPath;
    rml:iterator "/stations/station";
  ];
  rr:subjectMap [
    rml:reference "@URI";
  ];
  rr:predicateObjectMap [
    rr:predicate rdf:type;
    rr:object gtfs:Station;
  ];
  rr:predicateObjectMap [
    rr:predicate schema:name;
    rr:objectMap [
      rml:reference "text()";
    ];
  ];
.
You can download the Turtle file here. If we execute these rules, the following triples are generated:
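<http://irail.be/stations/NMBS/008814001> a gtfs:Station;
  schema:name "Brussels-South/Brussels-Midi" .
<http://irail.be/stations/NMBS/007015400> a gtfs:Station;
  schema:name "London Saint Pancras International" .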
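The difference between a constant object and a referenced object can be sketched in a few lines of Python. This is only an illustration under assumptions: `object_value` and the dictionaries passed to it are hypothetical stand-ins for an Object Map, and only the two reference forms used in this tutorial (`@attribute` and `text()`) are handled.

```python
import xml.etree.ElementTree as ET

# Shortened sample response (hypothetical excerpt with two stations).
SAMPLE = """<stations>
  <station URI="http://irail.be/stations/NMBS/008814001">Brussels-South/Brussels-Midi</station>
  <station URI="http://irail.be/stations/NMBS/007015400">London Saint Pancras International</station>
</stations>"""


def object_value(station, object_map):
    """Return the object for one iteration.

    object_map is a hypothetical dict with either a "constant" key
    (same value for every station) or a "reference" key (value read
    from the current station element).
    """
    if "constant" in object_map:
        return object_map["constant"]
    reference = object_map["reference"]
    if reference.startswith("@"):
        return station.get(reference[1:])  # attribute of the element
    if reference == "text()":
        return station.text                # text content of the element
    raise ValueError(f"unsupported reference: {reference}")


stations = ET.fromstring(SAMPLE).findall("station")
types = [object_value(s, {"constant": "gtfs:Station"}) for s in stations]
names = [object_value(s, {"reference": "text()"}) for s in stations]
print(types)
print(names)
```

The constant yields the same object for both stations, while the reference yields a different name per station, which is why the name rule needs rml:reference.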
Finally, each station needs two more triples that indicate its latitude and longitude. We add the following rules, which annotate the latitude and longitude in the same way as the station's name:
<#TriplesMap>
  rr:predicateObjectMap [
    rr:predicate geo:latitude;
    rr:objectMap [
      rml:reference "@locationY";
    ];
  ];
  rr:predicateObjectMap [
    rr:predicate geo:longitude;
    rr:objectMap [
      rml:reference "@locationX";
    ];
  ];
.
The complete Turtle document with RML rules is
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ql: <http://semweb.mmlab.be/ns/ql#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix td: <https://www.w3.org/2019/wot/td#> .
@prefix htv: <http://www.w3.org/2011/http#> .
@prefix hctl: <https://www.w3.org/2019/wot/hypermedia#> .
@prefix schema: <http://schema.org/> .
@prefix gtfs: <http://vocab.gtfs.org/terms#> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@base <http://example.org/rules/> .
<#WoTWebAPISource> a td:PropertyAffordance;
  td:hasForm [
    # URL and content type
    hctl:hasTarget "http://api.irail.be/stations?format=xml";
    hctl:forContentType "application/xml";
    # Read only
    hctl:hasOperationType td:readproperty;
    # Set HTTP method and headers
    htv:methodName "GET";
    htv:headers ([
      htv:fieldName "User-Agent";
      htv:fieldValue "RMLMapper";
    ]);
  ];
.
<#WoTWebAPI> a td:Thing;
  td:hasPropertyAffordance <#WoTWebAPISource>;
.
<#TriplesMap> a rr:TriplesMap;
  rml:logicalSource [ a rml:LogicalSource;
    rml:source <#WoTWebAPISource>;
    rml:referenceFormulation ql:XPath;
    rml:iterator "/stations/station";
  ];
  rr:subjectMap [
    rml:reference "@URI";
  ];
  rr:predicateObjectMap [
    rr:predicate rdf:type;
    rr:object gtfs:Station;
  ];
  rr:predicateObjectMap [
    rr:predicate schema:name;
    rr:objectMap [
      rml:reference "text()";
    ];
  ];
  rr:predicateObjectMap [
    rr:predicate geo:latitude;
    rr:objectMap [
      rml:reference "@locationY";
    ];
  ];
  rr:predicateObjectMap [
    rr:predicate geo:longitude;
    rr:objectMap [
      rml:reference "@locationX";
    ];
  ];
.
You can download the Turtle file here. If we execute these rules, the final triples are generated:
<http://irail.be/stations/NMBS/008814001> a gtfs:Station;
  schema:name "Brussels-South/Brussels-Midi";
  geo:latitude "50.835707";
  geo:longitude "4.336531" .
<http://irail.be/stations/NMBS/007015400> a gtfs:Station;
  schema:name "London Saint Pancras International";
  geo:latitude "51.5304000";
  geo:longitude "0.12380800" .
Note: The actual response of the iRail Web API contains more than 2 stations, but the others were omitted for readability. You can download the complete response as Turtle here.
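For readers who want to cross-check the result without running a mapper, the complete set of rules can be approximated by a small table-driven Python sketch. It assumes a shortened two-station response (`SAMPLE`), uses the hypothetical names `REFERENCE_RULES`, `resolve`, and `map_stations`, and returns plain tuples instead of serialized RDF; it is an illustration of the mapping logic, not a replacement for an RML processor.

```python
import xml.etree.ElementTree as ET

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
GTFS_STATION = "http://vocab.gtfs.org/terms#Station"
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"

# Predicate IRI -> reference, one entry per reference-valued
# rr:predicateObjectMap in the complete rules.
REFERENCE_RULES = {
    "http://schema.org/name": "text()",
    GEO + "latitude": "@locationY",
    GEO + "longitude": "@locationX",
}

# Shortened sample response (hypothetical excerpt with two stations).
SAMPLE = """<stations>
  <station locationX="4.336531" locationY="50.835707" URI="http://irail.be/stations/NMBS/008814001">Brussels-South/Brussels-Midi</station>
  <station locationX="0.12380800" locationY="51.5304000" URI="http://irail.be/stations/NMBS/007015400">London Saint Pancras International</station>
</stations>"""


def resolve(station, reference):
    """Resolve an attribute reference (@name) or the text() reference."""
    if reference.startswith("@"):
        return station.get(reference[1:])
    return station.text  # the only non-attribute reference used here


def map_stations(xml_text):
    """Return (subject, predicate, object) tuples for every station."""
    triples = []
    for station in ET.fromstring(xml_text).findall("station"):
        subject = station.get("URI")
        triples.append((subject, RDF_TYPE, GTFS_STATION))
        for predicate, reference in REFERENCE_RULES.items():
            value = resolve(station, reference)
            if value is not None:  # no value, no triple
                triples.append((subject, predicate, value))
    return triples


triples = map_stations(SAMPLE)
for s, p, o in triples:
    print(s, p, o)
```

Each station contributes four tuples: one for the class and one for each of the three referenced values.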
Congratulations! You have created your own RML rules that generate RDF from XML data retrieved from a Web API. Nice work! We hope you now feel like you have a decent grasp on how RML rules work.
You can find more information about RML in its specification. There is also a human readable text-based representation available for RML rules called YARRRML. It is a subset of YAML, a widely used data serialization language designed to be human-friendly.
If you have questions or remarks, don't hesitate to contact us via email!