Tutorial: Generate RDF from a Web API with XML responses

1 Before we start the tutorial

1.1 What you learn

At the end of the tutorial you will be able to generate RDF from a Web API using RML rules.

1.2 What you need

We assume that you understand

1.3 How you use the tutorial

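There are two ways to complete this tutorial: you read the explanations and either

  1. only study the RML rules and the triples they generate, or
  2. also execute the RML rules yourself.

For the second option you need a tool that executes RML rules. We suggest using the RMLMapper for this tutorial.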

2 Example

Consider the following response from the Web API "https://api.irail.be/stations/?format=xml":

<stations version="1.1" timestamp="1622209623">
  <station locationX="4.336531" 
           locationY="50.835707" 
           id="BE.NMBS.008814001"
           URI="http://irail.be/stations/NMBS/008814001"
           standardname="Brussel-Zuid/Bruxelles-Midi">
           Brussels-South/Brussels-Midi
  </station>
  <station locationX="0.12380800"
           locationY="51.5304000"
           id="BE.NMBS.007015400"
           URI="http://irail.be/stations/NMBS/007015400"
           standardname="London Saint Pancras International">
           London Saint Pancras International
  </station>
</stations>

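Note: The actual response of the iRail Web API contains more than two stations, but we omitted them for readability.

It contains information about railway stations in the iRail API, such as Brussels-South and London Saint Pancras International. For every station it includes the location (locationX and locationY), the id, the standard name, the localized name (the text content of the element), and the IRI. We want to annotate every station and generate the corresponding RDF triples.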

For example, consider the first station, described by the first station element:

<station locationX="4.336531"
         locationY="50.835707"
         id="BE.NMBS.008814001"
         URI="http://irail.be/stations/NMBS/008814001"
         standardname="Brussel-Zuid/Bruxelles-Midi">
         Brussels-South/Brussels-Midi
</station>

We want to generate the corresponding RDF triples for this object:

@prefix schema: <http://schema.org/> .
@prefix gtfs: <http://vocab.gtfs.org/terms#> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://irail.be/stations/NMBS/008814001> a gtfs:Station;
  geo:latitude "50.835707"^^xsd:double ;
  geo:longitude "4.336531"^^xsd:double ;
  schema:name "Brussels-South/Brussels-Midi" ;
.

In the following sections we explain

  1. what rules you need to generate these triples, and
  2. how you write them using RML.

3 What rules are needed

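Two sets of rules are needed:

  1. rules that define what data to use, and
  2. rules that define how to generate the subjects, predicates, and objects of the triples from that data.

In our example we need rules that define that:

  1. the data comes from the XML response of the iRail Web API, and every station element is handled separately;
  2. the URI attribute of a station is the subject IRI;
  3. every station is annotated with the class gtfs:Station;
  4. the text content of a station is annotated with the property schema:name; and
  5. the locationY and locationX attributes are annotated with geo:latitude and geo:longitude.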

4 How to start a document with RML rules

We write the RML rules in a Turtle document. RML rules are RDF themselves.

We add the following prefixes:

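Prefix    Description
rml       The RML ontology
rr        The R2RML ontology, which is extended by RML
ql        The Query Language vocabulary, which is used together with RML
rdf       The RDF Concepts Vocabulary
td        The W3C Web of Things (WoT) Thing Description ontology
htv       The HTTP Vocabulary in RDF
hctl      The WoT Hypermedia Controls ontology
schema    The schema.org vocabulary
xsd       The XML Schema datatypes
gtfs      The GTFS ontology
geo       The WGS84 Geo Positioning ontology

The td, htv, and hctl prefixes are used to describe the Web API. The last four provide the classes, properties, and datatypes of the triples we want to generate. We also set @base, the base IRI against which the relative IRIs of our RML rules, such as <#TriplesMap>, are resolved.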

The prefixes are added in Turtle like this:

@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ql: <http://semweb.mmlab.be/ns/ql#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix td: <https://www.w3.org/2019/wot/td#> .
@prefix htv: <http://www.w3.org/2011/http#> .
@prefix hctl: <https://www.w3.org/2019/wot/hypermedia#> .
@prefix schema: <http://schema.org/> .
@prefix gtfs: <http://vocab.gtfs.org/terms#> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@base <http://example.org/rules/> .
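Prefixed names such as gtfs:Station are shorthand for full IRIs, and relative IRIs such as <#TriplesMap> are resolved against the @base IRI. The Python sketch below illustrates both expansions for a few of the prefixes above; the helper names expand and resolve are hypothetical and not part of any RML tool.

```python
from urllib.parse import urljoin

# A subset of the prefixes declared above.
PREFIXES = {
    "rr": "http://www.w3.org/ns/r2rml#",
    "rml": "http://semweb.mmlab.be/ns/rml#",
    "gtfs": "http://vocab.gtfs.org/terms#",
    "geo": "http://www.w3.org/2003/01/geo/wgs84_pos#",
}
BASE = "http://example.org/rules/"


def expand(prefixed_name: str) -> str:
    """Expand a prefixed name like 'gtfs:Station' into a full IRI."""
    prefix, local = prefixed_name.split(":", 1)
    return PREFIXES[prefix] + local


def resolve(relative_iri: str) -> str:
    """Resolve a relative IRI like '#TriplesMap' against the @base IRI."""
    return urljoin(BASE, relative_iri)


print(expand("gtfs:Station"))  # http://vocab.gtfs.org/terms#Station
print(resolve("#TriplesMap"))  # http://example.org/rules/#TriplesMap
```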

5 What data to use

In our example, the data of the stations is retrieved from a Web API with an XML response. We add the following RML rules that define which Web API is used and how we iterate over the station elements in its response:

<#WoTWebAPISource> a td:PropertyAffordance;
  td:hasForm [
    # URL and content type
    hctl:hasTarget "http://api.irail.be/stations?format=xml";
    hctl:forContentType "application/xml";
    # Read only
    hctl:hasOperationType td:readproperty;
    # Set HTTP method and headers
    htv:methodName "GET";
    htv:headers ([
      htv:fieldName "User-Agent";
      htv:fieldValue "RMLMapper";
    ]);
  ];
.

<#WoTWebAPI> a td:Thing;
  td:hasPropertyAffordance <#WoTWebAPISource>;
.

<#TriplesMap> a rr:TriplesMap;
  rml:logicalSource [ a rml:LogicalSource;
    rml:source <#WoTWebAPISource>;
    rml:referenceFormulation ql:XPath;
    rml:iterator "/stations/station";
  ];
.

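The different rules work as follows:

  1. <#WoTWebAPISource> describes the Web API as a Web of Things (WoT) property affordance. Its form (td:hasForm) gives the URL of the Web API (hctl:hasTarget), the content type of the response (hctl:forContentType), the operation, which only reads data (hctl:hasOperationType td:readproperty), and the HTTP request to send: a GET request (htv:methodName) with the header User-Agent: RMLMapper (htv:headers).
  2. <#WoTWebAPI> is the WoT Thing that has this property affordance.
  3. <#TriplesMap> is a triples map. Its logical source (rml:logicalSource) uses <#WoTWebAPISource> as source (rml:source).
  4. rml:referenceFormulation ql:XPath says that XPath expressions are used to refer to data in the XML response.
  5. rml:iterator "/stations/station" says that the rules are applied to every station element in the root element stations.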
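The logical source combines an HTTP request with an XPath iterator. As a rough illustration of what these rules describe (not of how the RMLMapper implements them), the Python sketch below builds, but does not send, the equivalent GET request with urllib.request, and then applies the iterator /stations/station to a trimmed copy of the response from Section 2. The function names build_request and iterate are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.request import Request

# Values copied from the td:hasForm description of <#WoTWebAPISource>.
TARGET = "http://api.irail.be/stations?format=xml"


def build_request(target: str = TARGET) -> Request:
    """Build (but do not send) the HTTP request described by td:hasForm."""
    return Request(target, headers={"User-Agent": "RMLMapper"}, method="GET")


def iterate(xml_text: str) -> list[ET.Element]:
    """Return the elements selected by the iterator /stations/station.

    ElementTree supports only a subset of XPath and rejects absolute paths
    on elements, so the first step (/stations) is checked on the root tag
    and the second step (station) selects the root's child elements.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "stations":
        return []
    return root.findall("station")


# Trimmed copy of the response in Section 2, used instead of a live request.
RESPONSE = """<stations version="1.1" timestamp="1622209623">
  <station URI="http://irail.be/stations/NMBS/008814001">Brussels-South/Brussels-Midi</station>
  <station URI="http://irail.be/stations/NMBS/007015400">London Saint Pancras International</station>
</stations>"""

request = build_request()
print(request.get_method(), request.full_url)
# With network access, urllib.request.urlopen(request).read() would return the
# live response as bytes, which ET.fromstring also accepts.
for station in iterate(RESPONSE):
    print(station.get("URI"))
```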

6 How to generate subjects

We add the following rules that define how the subject IRI of a station is generated:

<#TriplesMap>
  rr:subjectMap [
    rml:reference "@URI";
  ];
.

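The different rules work as follows:

  1. rr:subjectMap defines how the subject of the triples of each station is generated.
  2. rml:reference "@URI" says that the value of the URI attribute of the station element is used. Subject maps generate IRIs by default, so for the first station the subject is <http://irail.be/stations/NMBS/008814001>.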
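The subject map reads the URI attribute of the current station element and turns its value into an IRI. A minimal Python sketch of that step, writing the IRI in N-Triples syntax; the function name subject_iri is hypothetical, and returning no subject when the attribute is missing is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def subject_iri(station: ET.Element) -> Optional[str]:
    """Emulate rr:subjectMap [ rml:reference "@URI" ] for one station element."""
    value = station.get("URI")  # "@URI" selects the URI attribute
    if value is None:
        return None  # assumption: no subject (and no triples) without a URI
    return f"<{value}>"  # subject maps produce IRIs by default


station = ET.fromstring(
    '<station id="BE.NMBS.008814001" URI="http://irail.be/stations/NMBS/008814001"/>'
)
print(subject_iri(station))
```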

7 How to generate predicates and objects

7.1 How to annotate with a class

In our example we need to annotate every station with the class gtfs:Station. We add the following RML rules:

<#TriplesMap>
  rr:predicateObjectMap [
    rr:predicate rdf:type;
    rr:object gtfs:Station;
  ];
.

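The different rules work as follows:

  1. rr:predicateObjectMap defines a predicate and an object for the triples of each station.
  2. rr:predicate rdf:type says that the predicate is rdf:type.
  3. rr:object gtfs:Station says that the object is always gtfs:Station, regardless of the station.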
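Because rr:object is a constant, every station gets the same predicate and object; only the subject changes. A short Python sketch of that behaviour, emitting N-Triples lines for the two example subjects (the helper type_triple is hypothetical):

```python
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
GTFS_STATION = "<http://vocab.gtfs.org/terms#Station>"


def type_triple(subject: str) -> str:
    """Emulate rr:predicate rdf:type; rr:object gtfs:Station for one subject."""
    return f"{subject} {RDF_TYPE} {GTFS_STATION} ."


subjects = [
    "<http://irail.be/stations/NMBS/008814001>",
    "<http://irail.be/stations/NMBS/007015400>",
]
for s in subjects:
    print(type_triple(s))
```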

Putting all rules we have so far together results in

@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ql: <http://semweb.mmlab.be/ns/ql#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix td: <https://www.w3.org/2019/wot/td#> .
@prefix htv: <http://www.w3.org/2011/http#> .
@prefix hctl: <https://www.w3.org/2019/wot/hypermedia#> .
@prefix schema: <http://schema.org/> .
@prefix gtfs: <http://vocab.gtfs.org/terms#> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@base <http://example.org/rules/> .

<#WoTWebAPISource> a td:PropertyAffordance;
  td:hasForm [
    # URL and content type
    hctl:hasTarget "http://api.irail.be/stations?format=xml";
    hctl:forContentType "application/xml";
    # Read only
    hctl:hasOperationType td:readproperty;
    # Set HTTP method and headers
    htv:methodName "GET";
    htv:headers ([
      htv:fieldName "User-Agent";
      htv:fieldValue "RMLMapper";
    ]);
  ];
.

<#WoTWebAPI> a td:Thing;
  td:hasPropertyAffordance <#WoTWebAPISource>;
.

<#TriplesMap> a rr:TriplesMap;
  rml:logicalSource [ a rml:LogicalSource;
    rml:source <#WoTWebAPISource>;
    rml:referenceFormulation ql:XPath;
    rml:iterator "/stations/station";
  ];
  rr:subjectMap [
    rml:reference "@URI";
  ];
  rr:predicateObjectMap [
    rr:predicate rdf:type;
    rr:object gtfs:Station;
  ];
.

You can download the Turtle file here. If we execute these rules, the following triples are generated:

@prefix gtfs: <http://vocab.gtfs.org/terms#> .

<http://irail.be/stations/NMBS/008814001> a gtfs:Station .
<http://irail.be/stations/NMBS/007015400> a gtfs:Station .

Two triples are generated: one for each station. There is a unique subject IRI for each station and each station is annotated with the class gtfs:Station.

7.2 How to annotate with a property

In our example we need to annotate the text content of every station element, which the XPath expression text() selects, with the property schema:name. We add the following rules:

<#TriplesMap> 
  rr:predicateObjectMap [
    rr:predicate schema:name;
    rr:objectMap [
      rml:reference "text()";
    ];
  ];
.

The rules are different from when annotating with a class: an rr:objectMap with rml:reference is used instead of rr:object, because the object is not the same for every station. More specifically, rml:reference "text()" says that the text content of the current station element is used as the object. Because it is a reference in an object map, the value becomes a literal.
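In Python's ElementTree, the text content that text() refers to is available as Element.text. The sketch below turns it into an N-Triples literal, escaping backslashes, quotes, and line breaks. It strips the surrounding whitespace of the pretty-printed example; that trimming is an assumption of this sketch, so check how your RML processor handles whitespace. The function name name_literal is hypothetical.

```python
import xml.etree.ElementTree as ET


def name_literal(station: ET.Element) -> str:
    """Emulate rr:objectMap [ rml:reference "text()" ] as an N-Triples literal."""
    text = (station.text or "").strip()  # assumption: surrounding whitespace is trimmed
    # Escape the characters that N-Triples does not allow verbatim in a literal.
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


station = ET.fromstring(
    '<station URI="http://irail.be/stations/NMBS/008814001">\n'
    "  Brussels-South/Brussels-Midi\n"
    "</station>"
)
print(name_literal(station))
```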

Putting all rules we have so far together results in

@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ql: <http://semweb.mmlab.be/ns/ql#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix td: <https://www.w3.org/2019/wot/td#> .
@prefix htv: <http://www.w3.org/2011/http#> .
@prefix hctl: <https://www.w3.org/2019/wot/hypermedia#> .
@prefix schema: <http://schema.org/> .
@prefix gtfs: <http://vocab.gtfs.org/terms#> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@base <http://example.org/rules/> .

<#WoTWebAPISource> a td:PropertyAffordance;
  td:hasForm [
    # URL and content type
    hctl:hasTarget "http://api.irail.be/stations?format=xml";
    hctl:forContentType "application/xml";
    # Read only
    hctl:hasOperationType td:readproperty;
    # Set HTTP method and headers
    htv:methodName "GET";
    htv:headers ([
      htv:fieldName "User-Agent";
      htv:fieldValue "RMLMapper";
    ]);
  ];
.

<#WoTWebAPI> a td:Thing;
  td:hasPropertyAffordance <#WoTWebAPISource>;
.

<#TriplesMap> a rr:TriplesMap;
  rml:logicalSource [ a rml:LogicalSource;
    rml:source <#WoTWebAPISource>;
    rml:referenceFormulation ql:XPath;
    rml:iterator "/stations/station";
  ];
  rr:subjectMap [
    rml:reference "@URI";
  ];
  rr:predicateObjectMap [
    rr:predicate rdf:type;
    rr:object gtfs:Station;
  ];
  rr:predicateObjectMap [
    rr:predicate schema:name;
    rr:objectMap [
      rml:reference "text()";
    ];
  ];
.

You can download the Turtle file here. If we execute these rules, the following triples are generated:

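@prefix schema: <http://schema.org/> .
@prefix gtfs: <http://vocab.gtfs.org/terms#> .

<http://irail.be/stations/NMBS/008814001> a gtfs:Station;
  schema:name "Brussels-South/Brussels-Midi" .
<http://irail.be/stations/NMBS/007015400> a gtfs:Station;
  schema:name "London Saint Pancras International" .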

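Finally, each station needs two more triples: one for its latitude and one for its longitude. We add the following rules to annotate the locationY and locationX attributes in the same way as the station's name. These rules generate plain literals; to generate the xsd:double literals of the desired triples in Section 2, you can add rr:datatype xsd:double to both object maps.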

<#TriplesMap>
  rr:predicateObjectMap [
    rr:predicate geo:latitude;
    rr:objectMap [
      rml:reference "@locationY";
    ];
  ];
  rr:predicateObjectMap [
    rr:predicate geo:longitude;
    rr:objectMap [
      rml:reference "@locationX";
    ];
  ];
.
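Both attribute references keep the exact lexical form of the value in the response, for example "51.5304000". The Python sketch below builds the latitude and longitude objects for one station and, for comparison, the xsd:double-typed form of the desired triples in Section 2. The function coordinate and its typed switch are hypothetical parts of this sketch, not RML options; in RML the datatype comes from rr:datatype.

```python
import xml.etree.ElementTree as ET

XSD_DOUBLE = "<http://www.w3.org/2001/XMLSchema#double>"


def coordinate(station: ET.Element, attribute: str, typed: bool = False) -> str:
    """Emulate rml:reference "@locationY" or "@locationX" as an N-Triples literal.

    With typed=True the literal gets the xsd:double datatype, as in the
    desired triples; the rules in this section leave it untyped.
    """
    value = station.get(attribute)
    if value is None:
        raise KeyError(f"station has no {attribute} attribute")
    literal = f'"{value}"'
    return f"{literal}^^{XSD_DOUBLE}" if typed else literal


station = ET.fromstring('<station locationX="4.336531" locationY="50.835707"/>')
print(coordinate(station, "locationY"))              # geo:latitude, plain
print(coordinate(station, "locationX", typed=True))  # geo:longitude, typed
```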

8 Complete Turtle document with RML rules

The complete Turtle document with RML rules is

@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ql: <http://semweb.mmlab.be/ns/ql#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix td: <https://www.w3.org/2019/wot/td#> .
@prefix htv: <http://www.w3.org/2011/http#> .
@prefix hctl: <https://www.w3.org/2019/wot/hypermedia#> .
@prefix schema: <http://schema.org/> .
@prefix gtfs: <http://vocab.gtfs.org/terms#> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@base <http://example.org/rules/> .

<#WoTWebAPISource> a td:PropertyAffordance;
  td:hasForm [
    # URL and content type
    hctl:hasTarget "http://api.irail.be/stations?format=xml";
    hctl:forContentType "application/xml";
    # Read only
    hctl:hasOperationType td:readproperty;
    # Set HTTP method and headers
    htv:methodName "GET";
    htv:headers ([
      htv:fieldName "User-Agent";
      htv:fieldValue "RMLMapper";
    ]);
  ];
.

<#WoTWebAPI> a td:Thing;
  td:hasPropertyAffordance <#WoTWebAPISource>;
.

<#TriplesMap> a rr:TriplesMap;
  rml:logicalSource [ a rml:LogicalSource;
    rml:source <#WoTWebAPISource>;
    rml:referenceFormulation ql:XPath;
    rml:iterator "/stations/station";
  ];
  rr:subjectMap [
    rml:reference "@URI";
  ];
  rr:predicateObjectMap [
    rr:predicate rdf:type;
    rr:object gtfs:Station;
  ];
  rr:predicateObjectMap [
    rr:predicate schema:name;
    rr:objectMap [
      rml:reference "text()";
    ];
  ];
  rr:predicateObjectMap [
    rr:predicate geo:latitude;
    rr:objectMap [
      rml:reference "@locationY";
    ];
  ];
  rr:predicateObjectMap [
    rr:predicate geo:longitude;
    rr:objectMap [
      rml:reference "@locationX";
    ];
  ];
.

You can download the Turtle file here. If we execute these rules, the following triples are generated:

@prefix schema: <http://schema.org/> .
@prefix gtfs: <http://vocab.gtfs.org/terms#> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .

<http://irail.be/stations/NMBS/008814001> a gtfs:Station;
  schema:name "Brussels-South/Brussels-Midi";
  geo:latitude "50.835707";
  geo:longitude "4.336531" .
<http://irail.be/stations/NMBS/007015400> a gtfs:Station;
  schema:name "London Saint Pancras International";
  geo:latitude "51.5304000";
  geo:longitude "0.12380800" .
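To cross-check these results without an RML processor, the Python sketch below applies the same subject map and four predicate-object maps to a trimmed copy of the response from Section 2 and returns N-Triples lines. It only illustrates what the rules express and is not a replacement for the RMLMapper; trimming the text content is an assumption, and literal escaping is left out because the example values do not need it. The function name map_stations is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the response from Section 2.
RESPONSE = """<stations version="1.1" timestamp="1622209623">
  <station locationX="4.336531" locationY="50.835707" id="BE.NMBS.008814001"
           URI="http://irail.be/stations/NMBS/008814001"
           standardname="Brussel-Zuid/Bruxelles-Midi">Brussels-South/Brussels-Midi</station>
  <station locationX="0.12380800" locationY="51.5304000" id="BE.NMBS.007015400"
           URI="http://irail.be/stations/NMBS/007015400"
           standardname="London Saint Pancras International">London Saint Pancras International</station>
</stations>"""

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
GTFS = "http://vocab.gtfs.org/terms#"
SCHEMA = "http://schema.org/"
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"


def map_stations(xml_text: str) -> list[str]:
    """Apply the triples map of Section 8 and return N-Triples lines."""
    root = ET.fromstring(xml_text)
    triples = []
    # rml:iterator "/stations/station"
    stations = root.findall("station") if root.tag == "stations" else []
    for station in stations:
        if station.get("URI") is None:
            continue  # assumption: no subject, so no triples
        subject = f"<{station.get('URI')}>"  # rr:subjectMap, rml:reference "@URI"
        objects = [
            (RDF_TYPE, f"<{GTFS}Station>"),                          # rr:object gtfs:Station
            (SCHEMA + "name", f'"{(station.text or "").strip()}"'),  # rml:reference "text()"
            (GEO + "latitude", f'"{station.get("locationY")}"'),     # rml:reference "@locationY"
            (GEO + "longitude", f'"{station.get("locationX")}"'),    # rml:reference "@locationX"
        ]
        for predicate, obj in objects:
            triples.append(f"{subject} <{predicate}> {obj} .")
    return triples


for line in map_stations(RESPONSE):
    print(line)
```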

Note: The actual response of the iRail Web API contains more than two stations, but we omitted them for readability. You can download the triples for all stations as Turtle here.

9 Wrapping up

Congratulations! You have created your own RML rules that generate RDF from XML data retrieved from a Web API. Nice work! We hope you now have a good grasp of how RML rules work.

10 More information

You can find more information about RML in its specification. There is also a human-readable, text-based representation of RML rules called YARRRML. It is written in YAML, a widely used data serialization language designed to be human-friendly.

If you have questions or remarks, don't hesitate to contact us via email!