Tutorial: Generate RDF from a TSV file

1 Before we start the tutorial

1.1 What you learn

At the end of the tutorial you will be able to generate RDF from a TSV file using RML rules.

1.2 What you need

We assume that you understand the basics of RDF, the Turtle syntax, and the TSV format.

1.3 How you use the tutorial

There are two ways to complete this tutorial: you read the explanations and either only read the examples, or also execute the examples yourself.

For the second option you need a tool that executes RML rules. Suggestions are the RMLMapper and the RMLStreamer.

2 Example

Consider the following TSV file called "characters.tsv":

id firstname lastname hair
0 Ash Ketchum black
1 Misty  orange

It describes two characters. Every row holds the id, first name, last name, and hair color of one character. The last name and the hair color are optional: the second row, for example, has no last name. We want to annotate every character and generate the corresponding RDF triples.

For example, consider the character described by the first row:

0 Ash Ketchum black

We want to generate the corresponding RDF triples for this row:

@prefix schema: <http://schema.org/> .
@prefix dbo: <http://dbpedia.org/ontology/> .
@prefix character: <http://example.org/character/> .

character:0 a schema:Person;
  schema:givenName "Ash";
  schema:lastName "Ketchum";
  dbo:hairColor "black".

In the following sections we explain

  1. what rules you need to generate these triples, and
  2. how you write them using RML.

3 What rules are needed

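Two sets of rules are needed:

  1. rules that define what data is used, and
  2. rules that define how the subjects, predicates, and objects of the triples are generated.

In our example we need rules that define that:

  1. the data is read from the TSV file "characters.tsv",
  2. the subject IRI of every character is built from its id,
  3. every character is annotated with the class schema:Person, and
  4. the first name, last name, and hair color are annotated with the properties schema:givenName, schema:lastName, and dbo:hairColor.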

4 How to start a document with RML rules

We write the RML rules in a Turtle document. RML rules are RDF themselves.

We add the following prefixes:

Prefix    Description
rml       The RML ontology
rr        The R2RML ontology, which is extended by RML
ql        The Query Language vocabulary, which is used together with RML
csvw      The CSVW Vocabulary, which is used to describe the TSV file
rdf       The RDF Concepts Vocabulary
(empty)   The prefix used for our RML rules
schema    The schema.org vocabulary
dbo       The DBpedia ontology

The last two are added because they are used for the classes and properties.

The prefixes are added in Turtle like this:

@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ql: <http://semweb.mmlab.be/ns/ql#> .
@prefix csvw: <http://www.w3.org/ns/csvw#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix : <http://example.org/rules/> .
@prefix schema: <http://schema.org/> .
@prefix dbo: <http://dbpedia.org/ontology/> .

5 What data to use

In our example the data of the characters is stored in a TSV file. We add the following RML rules that define what TSV file is used:

:TriplesMap a rr:TriplesMap;
  rml:logicalSource [
    rml:source [
      a csvw:Table;
      csvw:url "characters.tsv";
      csvw:dialect [
        a csvw:Dialect;
        csvw:delimiter "\t"
      ]
    ];
    rml:referenceFormulation ql:CSV
  ].

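The different rules work as follows:

  1. rml:logicalSource describes the data that the Triples Map iterates over.
  2. rml:source points to the description of the file: a csvw:Table whose csvw:url is "characters.tsv".
  3. csvw:dialect with csvw:delimiter "\t" states that the columns are separated by tabs.
  4. rml:referenceFormulation ql:CSV states that we refer to the data by column name, as we would for a CSV file.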

Note that we access a TSV file as if it is a CSV file because we consider it a CSV file with a different delimiter.
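
To make the "CSV with a different delimiter" idea concrete, here is a minimal Python sketch that reads the example data with the standard csv module and a tab as delimiter. It only illustrates the idea and is not how an RML processor is implemented; the name read_rows and the inlined TSV string are assumptions made for this example.

```python
import csv
import io

# The example file from this tutorial, inlined so the sketch is self-contained.
# The two consecutive tabs in the last row mean that Misty has no last name.
TSV = "id\tfirstname\tlastname\thair\n0\tAsh\tKetchum\tblack\n1\tMisty\t\torange\n"


def read_rows(text, delimiter="\t"):
    """Parse delimited text into one dict per data row, keyed by column name."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


rows = read_rows(TSV)
print(rows[0]["firstname"])  # Ash
```

Every row becomes a dictionary keyed by the column names of the header row, which is the same way the rules in the following sections refer to the data.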

6 How to generate subjects

We add the following rules that define how the subject IRI of a character is generated:

:TriplesMap rr:subjectMap [
  rr:template "http://example.org/character/{id}"
].

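The different rules work as follows:

  1. rr:subjectMap defines how the subject of every triple generated by this Triples Map is built.
  2. rr:template gives the pattern of the subject IRI: {id} is replaced by the value in the column id of each row, so the first row results in http://example.org/character/0.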
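
The effect of rr:template can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code of any RML processor: the function name expand_template is hypothetical, and the "IRI-safe" encoding that R2RML asks for is approximated here with urllib.parse.quote.

```python
import re
from urllib.parse import quote


def expand_template(template, row):
    """Replace each {column} placeholder with the row's percent-encoded value."""

    def substitute(match):
        value = row[match.group(1)]
        # Percent-encode the value so that it is safe to use inside an IRI.
        return quote(value, safe="")

    return re.sub(r"\{([^{}]+)\}", substitute, template)


row = {"id": "0", "firstname": "Ash"}
subject = expand_template("http://example.org/character/{id}", row)
print(subject)  # http://example.org/character/0
```

Because the id differs per row, every character ends up with its own subject IRI.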

7 How to generate predicates and objects

7.1 How to annotate with a class

In our example we need to annotate every character with the class schema:Person. We add the following RML rules:

:TriplesMap rr:predicateObjectMap [
  rr:predicate rdf:type;
  rr:objectMap [
    rr:constant schema:Person
  ]
].

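The different rules work as follows:

  1. rr:predicateObjectMap pairs a predicate with an object; together with the subject they form a triple.
  2. rr:predicate rdf:type sets the predicate of the triple.
  3. rr:objectMap with rr:constant schema:Person sets the object to the same class for every character.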
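
As a rough illustration of a constant object, the following Python sketch builds one type triple per character id in N-Triples syntax. The function name type_triples is hypothetical, and the subject IRI pattern is hard-coded to the one used in this tutorial.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SCHEMA_PERSON = "http://schema.org/Person"


def type_triples(ids):
    """Return one N-Triples line per id; predicate and object never change."""
    return [
        f"<http://example.org/character/{i}> <{RDF_TYPE}> <{SCHEMA_PERSON}> ."
        for i in ids
    ]


for line in type_triples(["0", "1"]):
    print(line)
```

Only the subject varies between the two lines; the predicate and the object are identical for every row, which is what rr:constant expresses.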

Putting all the rules we have so far together results in:

@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ql: <http://semweb.mmlab.be/ns/ql#> .
@prefix csvw: <http://www.w3.org/ns/csvw#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix : <http://example.org/rules/> .
@prefix schema: <http://schema.org/> .
@prefix dbo: <http://dbpedia.org/ontology/> .

:TriplesMap a rr:TriplesMap;
  rml:logicalSource [
    rml:source [
      a csvw:Table;
      csvw:url "characters.tsv";
      csvw:dialect [ 
        a csvw:Dialect;
        csvw:delimiter "\t"
      ]
    ];
    rml:referenceFormulation ql:CSV
  ].

:TriplesMap rr:subjectMap [
  rr:template "http://example.org/character/{id}"
].

:TriplesMap rr:predicateObjectMap [
  rr:predicate rdf:type;
  rr:objectMap [
    rr:constant schema:Person
  ]
].

If we execute these rules, the following triples are generated:

@prefix schema: <http://schema.org/> .

<http://example.org/character/0> a schema:Person .
<http://example.org/character/1> a schema:Person .

Two triples are generated: one for each character. There is a unique subject IRI for each character and each character is annotated with the class schema:Person.

7.2 How to annotate with a property

In our example we need to annotate the values in the column firstname with the property schema:givenName. We add the following rules:

:TriplesMap rr:predicateObjectMap [
  rr:predicate schema:givenName;
  rr:objectMap [
    rml:reference "firstname"
  ]
].

These rules differ from the ones we used to annotate with a class: rml:reference is used instead of rr:constant because the object is not the same for every character. More specifically, [rml:reference "firstname"] says that the data in the column firstname is used for the object.
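
The difference between rr:constant and rml:reference can be sketched in Python as follows. The dictionary layout with the keys "constant" and "reference" and the function name evaluate_object_map are assumptions made for this illustration; they are not part of RML or of any processor's API.

```python
def evaluate_object_map(object_map, row):
    """Return the object value that an object map yields for one row."""
    if "constant" in object_map:
        # A constant ignores the row: every character gets the same object.
        return object_map["constant"]
    # A reference looks up the named column, so the object differs per row.
    return row[object_map["reference"]]


ash = {"id": "0", "firstname": "Ash"}
misty = {"id": "1", "firstname": "Misty"}

print(evaluate_object_map({"constant": "http://schema.org/Person"}, ash))
print(evaluate_object_map({"reference": "firstname"}, ash))    # Ash
print(evaluate_object_map({"reference": "firstname"}, misty))  # Misty
```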

Putting all the rules we have so far together results in:

@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ql: <http://semweb.mmlab.be/ns/ql#> .
@prefix csvw: <http://www.w3.org/ns/csvw#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix : <http://example.org/rules/> .
@prefix schema: <http://schema.org/> .
@prefix dbo: <http://dbpedia.org/ontology/> .

:TriplesMap a rr:TriplesMap;
  rml:logicalSource [
    rml:source [
      a csvw:Table;
      csvw:url "characters.tsv";
      csvw:dialect [ 
        a csvw:Dialect;
        csvw:delimiter "\t"
      ]
    ];
    rml:referenceFormulation ql:CSV
  ].

:TriplesMap rr:subjectMap [
  rr:template "http://example.org/character/{id}"
].

:TriplesMap rr:predicateObjectMap [
  rr:predicate rdf:type;
  rr:objectMap [
    rr:constant schema:Person
  ]
].

:TriplesMap rr:predicateObjectMap [
  rr:predicate schema:givenName;
  rr:objectMap [
    rml:reference "firstname"
  ]
].

If we execute these rules, the following triples are generated:

@prefix schema: <http://schema.org/> .

<http://example.org/character/0> a schema:Person;
  schema:givenName "Ash" .

<http://example.org/character/1> a schema:Person;
  schema:givenName "Misty" .

Two triples are added: one for the first name of each character.

We add the following rules to annotate the last name and the hair color in the same way as the first name:

:TriplesMap rr:predicateObjectMap [
  rr:predicate schema:lastName;
  rr:objectMap [
    rml:reference "lastname"
  ]
].

:TriplesMap rr:predicateObjectMap [
  rr:predicate dbo:hairColor;
  rr:objectMap [
    rml:reference "hair"
  ]
].

8 Complete Turtle document with RML rules

The complete Turtle document with RML rules is:

@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ql: <http://semweb.mmlab.be/ns/ql#> .
@prefix csvw: <http://www.w3.org/ns/csvw#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix : <http://example.org/rules/> .
@prefix schema: <http://schema.org/> .
@prefix dbo: <http://dbpedia.org/ontology/> .

:TriplesMap a rr:TriplesMap;
  rml:logicalSource [
    rml:source [
      a csvw:Table;
      csvw:url "characters.tsv";
      csvw:dialect [ 
        a csvw:Dialect;
        csvw:delimiter "\t"
      ]
    ];
    rml:referenceFormulation ql:CSV
  ].

:TriplesMap rr:subjectMap [
  rr:template "http://example.org/character/{id}"
].

:TriplesMap rr:predicateObjectMap [
  rr:predicate rdf:type;
  rr:objectMap [
    rr:constant schema:Person
  ]
].

:TriplesMap rr:predicateObjectMap [
  rr:predicate schema:givenName;
  rr:objectMap [
    rml:reference "firstname"
  ]
].

:TriplesMap rr:predicateObjectMap [
  rr:predicate schema:lastName;
  rr:objectMap [
    rml:reference "lastname"
  ]
].

:TriplesMap rr:predicateObjectMap [
  rr:predicate dbo:hairColor;
  rr:objectMap [
    rml:reference "hair"
  ]
].

If we execute these rules, the final triples are generated:

@prefix dbo: <http://dbpedia.org/ontology/> .
@prefix schema: <http://schema.org/> .

<http://example.org/character/0> a schema:Person;
  dbo:hairColor "black";
  schema:givenName "Ash";
  schema:lastName "Ketchum" .

<http://example.org/character/1> a schema:Person;
  dbo:hairColor "orange";
  schema:givenName "Misty";
  schema:lastName "" .
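
To see how the rules fit together, the following Python sketch imitates the complete mapping with the standard library only and prints the result as N-Triples. It is a simplified illustration, not a replacement for a tool such as the RMLMapper: the names generate_triples and REFERENCE_RULES are hypothetical, literals are not escaped, and the skip_empty switch is an assumption of this sketch rather than an RML feature. Whether a real processor emits a triple for an empty cell may differ per tool, so check the behaviour of the tool you use.

```python
import csv
import io

# The example file from this tutorial, inlined so the sketch is self-contained.
TSV = "id\tfirstname\tlastname\thair\n0\tAsh\tKetchum\tblack\n1\tMisty\t\torange\n"

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SCHEMA = "http://schema.org/"
DBO = "http://dbpedia.org/ontology/"

# (predicate IRI, column name) pairs that mirror the rml:reference rules.
REFERENCE_RULES = [
    (SCHEMA + "givenName", "firstname"),
    (SCHEMA + "lastName", "lastname"),
    (DBO + "hairColor", "hair"),
]


def generate_triples(tsv_text, skip_empty=False):
    """Return a list of N-Triples lines for the rows in tsv_text."""
    triples = []
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        # Subject: the template http://example.org/character/{id}.
        subject = f"<http://example.org/character/{row['id']}>"
        # Class: the constant schema:Person.
        triples.append(f"{subject} <{RDF_TYPE}> <{SCHEMA}Person> .")
        # Properties: one literal per referenced column.
        for predicate, column in REFERENCE_RULES:
            value = row[column]
            if skip_empty and value == "":
                continue  # hypothetical switch of this sketch
            # Plain literal; quotes and backslashes are not escaped here.
            triples.append(f'{subject} <{predicate}> "{value}" .')
    return triples


for line in generate_triples(TSV):
    print(line)
```

With the default settings the sketch yields eight triples, four per character, which matches the Turtle output shown above, including the empty last name of the second character.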

9 Wrapping up

Congratulations! You have created your own RML rules that generate RDF from data in a TSV file. Nice work! We hope you now feel like you have a decent grasp on how RML rules work.

10 More information

You can find more information about RML in its specification. There is also a human-readable, text-based representation for RML rules called YARRRML. It is a subset of YAML, a widely used data serialization language designed to be human-friendly.

If you have questions or remarks, don't hesitate to contact us via email!