
Generating Linked Data with YARRRML: using targets
Before we start the tutorial
Learning objective
At the end of this tutorial, you'll be able to manually write YARRRML rules that output RDF to different locations, including local files and web resources with and without authentication.
Prerequisites
We assume that you have completed our getting started tutorial. As in that tutorial, we run our tools via Docker containers, using the following versions:
- YARRRML parser: 1.11.0
- RMLMapper: 7.3.3
We use the following bash script, map.sh, on Linux:
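#!/bin/bash
# Usage:
# - run in the directory where this script is located
# - supply the YARRRML file as the one and only argument
#
# Example:
# ./map.sh rules.yml
set -e  # stop as soon as a command fails
if [ "$#" -ne 1 ]; then
  echo "Usage: $0 <yarrrml-file>" >&2
  exit 1
fi
YARRRML_FILE=$1
RML_FILE=temp.rml.ttl
# All local file targets in this tutorial write to /data/out, i.e., ./out on the host
mkdir -p out
# Translate the YARRRML rules into RML rules
docker run --rm -it -v "$(pwd)":/data rmlio/yarrrml-parser:1.11.0 -i "/data/${YARRRML_FILE}" -o "/data/${RML_FILE}"
# Execute the RML rules; --net=host lets the mapper reach the Solid server on localhost:3000
docker run --net=host --rm -it -v "$(pwd)":/data rmlio/rmlmapper-java:v7.3.3 -m "/data/${RML_FILE}"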
Create a working directory on your local machine now, download this script as map.sh, and make it executable via
chmod +x map.sh
If you're not on Linux, try to make a similar script that suits your environment.
Note that the script creates the subdirectory out if it doesn't exist yet. All local file outputs in this tutorial are written there (inside the containers, it is available as /data/out).
To read and write web resources, we use an instance of the Community Solid Server with a single pod, which we can also run via a Docker container:
docker run --rm -v ./Solid:/data -p 3000:3000 -it solidproject/community-server:7.1.7 -c config/file-root-pod.json
Next, in a separate terminal (the server keeps running in the foreground), we add the following files to the directory Solid in our working directory:
- people$.ttl
- people.acl
- people-private$.ttl
- people-private.acl
We can do this using curl:
cd Solid
curl https://rml.io/yarrrml/tutorial/data/targets/people$.ttl -o people$.ttl
curl https://rml.io/yarrrml/tutorial/data/targets/people.acl -o people.acl
curl https://rml.io/yarrrml/tutorial/data/targets/people-private$.ttl -o people-private$.ttl
curl https://rml.io/yarrrml/tutorial/data/targets/people-private.acl -o people-private.acl
You don't need to know any details about how Solid and pods work. You only need to know that we can use a pod to host public and private web resources that are accessible via HTTP methods.
Concepts
Targets
A target is a collection of properties that describe where the output of an RDF mapping engine goes and how to access that location.
In our getting started tutorial, we had a single output location: the console output of the mapping engine. Optionally, we could have sent that output to a local file, for example, using RMLMapper's -o option on the command line:
java -jar /path/to/rmlmapper.jar -m rules.rml.ttl -o outputfile.ttl
In this tutorial, we'll learn how to specify targets inside our YARRRML document and assign them to mappings:
- We can assign multiple targets to one mapping.
- We can assign one target to multiple mappings.
Basic targets
Throughout this tutorial, we use the term "basic target" where needed to distinguish dynamic targets from targets whose output goes to local files (optionally compressed) or to web resources (optionally with authentication). The term is not normative.
Local file targets
A local file target is a target resulting in output going to local files.
HTTP request targets
An HTTP request target is a target resulting in output going to web resources, using HTTP methods. We distinguish two types of HTTP request targets:
- Direct HTTP request targets, which write to a web resource.
- Linked HTTP request targets, which write to a web resource that is linked to another web resource. We don't cover this type in this tutorial.
Dynamic targets
A dynamic target is a blueprint for basic targets. For a dynamic target, a YARRRML interpreter generates rules that enable a mapping engine to compute a basic target from one or more fields of each record in the source associated with the dynamic target. Thus, a dynamic target can result in as many output locations as there are records in its associated source.
Example
Consider the following columns from the CSV file people.csv:
person_id | firstname | lastname |
---|---|---|
0 | Natsu | Dragneel |
1 | Gray | Fullbuster |
2 | Gajeel | Redfox |
3 | Lucy | Heartfilia |
4 | Erza | Scarlet |
The five rows describe five characters from the same TV show: their id, first name, and last name. Also, consider the CSV file friends.csv, which contains information about who is friends with whom:
id-person | id-friend |
---|---|
0 | 2 |
1 | 3 |
1 | 4 |
We would like to annotate every character, link them to their friends, and generate the corresponding RDF triples.
Initial YARRRML document
Based on what we learned in the getting started tutorial, the following YARRRML document generates the aforementioned triples:
prefixes:
  e: http://myontology.com/
  ex: http://www.example.com/
  schema: http://schema.org/
sources:
  source-people: ['people.csv~csv']
  source-friends: ['friends.csv~csv']
mappings:
  people:
    sources:
      - source-people
    s:
      value: ex:$(person_id)
    po:
      - [a, schema:Person]
      - [schema:givenName, $(firstname)]
      - [schema:familyName, $(lastname)]
  friends:
    sources:
      - source-friends
    s:
      value: ex:$(id-person)
    po:
      - [e:hasFriend, ex:$(id-friend)]
We use the predicate e:hasFriend to express that a person has another person as a friend.
To produce RDF output, we download the above YARRRML document, save it to initial.yml in our working directory, download people.csv and friends.csv to our working directory, and execute the mapping commands by calling our map.sh script:
./map.sh initial.yml
Different output locations
We want to output the RDF triples generated by the people and friends mappings to different locations:
- a local file /data/out/people.ttl, serialized as Turtle,
- a local file /data/out/people-gz.ttl.gz, serialized as Turtle and gzip-compressed,
- a public web resource at URL http://localhost:3000/people, serialized as Turtle,
- a private web resource at URL http://localhost:3000/people-private, serialized as Turtle,
- multiple local files /data/out/people-<x>.ttl, serialized as Turtle, where <x> matches a value of the person_id column of people.csv.
In the following sections, we explain what rules we need to achieve this, and how we write them using YARRRML.
What rules are needed
We need two sets of rules:
- rules that describe the output locations
- rules that link these output locations to the existing mappings
In our example, we need rules that define:
- that the local file /data/out/people.ttl contains the generated triples,
- that the triples of /data/out/people.ttl are serialized as Turtle,
- that the local file /data/out/people-gz.ttl.gz contains the same content as /data/out/people.ttl and that it's gzip-compressed,
- that the public web resource at URL http://localhost:3000/people contains the generated triples,
- that the triples of http://localhost:3000/people are serialized as Turtle,
- that the private web resource at URL http://localhost:3000/people-private contains the generated triples,
- that the triples of http://localhost:3000/people-private are serialized as Turtle,
- the authentication required for writing to http://localhost:3000/people-private,
- that there should be multiple files /data/out/people-<x>.ttl, and
- that /data/out/people-<x>.ttl contains the triples that correspond with the row that has <x> as value for the person_id column of people.csv.
How to output to local files
In our example, we need to output our RDF to local files, both with and without compression. We define the corresponding targets via the top-level targets collection:
targets:
We group the rules per target and give each group a unique key. In our example, we use target-people as the key for the rules that define that the triples should be written to /data/out/people.ttl, serialized as Turtle:
targets:
  target-people:
We need to define that the location of the file is /data/out/people.ttl. We do this via the key access:
targets:
  target-people:
    access: /data/out/people.ttl
We could state explicitly that /data/out/people.ttl is a local file by using the key type with the value localfile. However, this is not required: if the value of access is a path, the type is implicitly set to localfile.
How to define the serialization
We need to define that the serialization should be Turtle. We do this via the key serialization with the value turtle:
targets:
  target-people:
    access: /data/out/people.ttl
    serialization: turtle
We can write this using an array notation as well:
targets:
  target-people: [/data/out/people.ttl, turtle]
How to define compression
For the local file /data/out/people-gz.ttl.gz, we use the key target-people-gz, set the value of access to /data/out/people-gz.ttl.gz, and set the value of serialization to turtle:
targets:
  target-people-gz:
    access: /data/out/people-gz.ttl.gz
    serialization: turtle
We define that we want compression via the key compression. In our example, we want gzip compression:
targets:
  target-people-gz:
    access: /data/out/people-gz.ttl.gz
    serialization: turtle
    compression: gzip
We can write this using an array notation as well:
targets:
  target-people-gz: [/data/out/people-gz.ttl.gz, turtle, gzip]
How to link targets to mappings
In our example, we need to link the mappings to the targets target-people and target-people-gz. We do this by adding the targets to the subject of each mapping, using the key targets. For the people mapping, this results in:
mappings:
  people:
    sources:
      - ['people.csv~csv']
    s:
      value: ex:$(person_id)
      targets:
        - target-people
        - target-people-gz
It is also possible to use an inline style. To do so, we describe our target specifications directly as values of the targets key of the subject, rather than in the top-level targets collection:
mappings:
  people:
    sources:
      - ['people.csv~csv']
    s:
      value: ex:$(person_id)
      targets:
        - ['/data/out/people.ttl', 'turtle']
        - ['/data/out/people-gz.ttl.gz', 'turtle', 'gzip']
This is convenient for targets that appear only in a single mapping.
Our complete YARRRML document looks like this:
prefixes:
  e: http://myontology.com/
  ex: http://www.example.com/
  schema: http://schema.org/
sources:
  source-people: ['people.csv~csv']
  source-friends: ['friends.csv~csv']
targets:
  target-people: ['/data/out/people.ttl', 'turtle']
  target-people-gz: ['/data/out/people-gz.ttl.gz', 'turtle', 'gzip']
mappings:
  people:
    sources:
      - source-people
    s:
      value: ex:$(person_id)
      targets:
        - target-people
        - target-people-gz
    po:
      - [a, schema:Person]
      - [schema:givenName, $(firstname)]
      - [schema:familyName, $(lastname)]
  friends:
    sources:
      - source-friends
    s:
      value: ex:$(id-person)
      targets:
        - target-people
        - target-people-gz
    po:
      - [e:hasFriend, ex:$(id-friend)]
To produce RDF output, we download the above YARRRML document, save it to targets-basic.yml in our working directory, and execute the mapping commands by calling our map.sh script:
./map.sh targets-basic.yml
Now we can view the RDF output in our subdirectory out:
cd out
ls -1
The content of the directory is:
people-gz.ttl.gz
people.ttl
Next, we gunzip the gzip file:
gunzip people-gz.ttl.gz
ls -1
Now, the content of the directory is:
people-gz.ttl
people.ttl
The contents of the files people.ttl and people-gz.ttl are identical and equal to:
<http://www.example.com/0> a <http://schema.org/Person>;
<http://myontology.com/hasFriend> "http://www.example.com/2";
<http://schema.org/familyName> "Dragneel";
<http://schema.org/givenName> "Natsu" .
<http://www.example.com/1> a <http://schema.org/Person>;
<http://myontology.com/hasFriend> "http://www.example.com/3", "http://www.example.com/4";
<http://schema.org/familyName> "Fullbuster";
<http://schema.org/givenName> "Gray" .
<http://www.example.com/2> a <http://schema.org/Person>;
<http://schema.org/familyName> "Redfox";
<http://schema.org/givenName> "Gajeel" .
<http://www.example.com/3> a <http://schema.org/Person>;
<http://schema.org/familyName> "Heartfilia";
<http://schema.org/givenName> "Lucy" .
<http://www.example.com/4> a <http://schema.org/Person>;
<http://schema.org/familyName> "Scarlet";
<http://schema.org/givenName> "Erza" .
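Instead of unpacking the archive by hand, we can also compare the compressed and uncompressed outputs directly. The bash function below is a minimal sketch: same_turtle_content is a name we made up (it is not part of the YARRRML or RMLMapper tooling), and it compares raw bytes, so it assumes both targets serialize the triples in the same order, as they did here.

```shell
# Hypothetical helper, not part of the YARRRML or RMLMapper tooling:
# succeeds if the gzip file decompresses to exactly the bytes of the plain file.
same_turtle_content() {
  local gz_file="$1" plain_file="$2"
  # gzip -dc writes the decompressed data to stdout and leaves the .gz file in place;
  # cmp -s compares silently and reports the result only via its exit status.
  gzip -dc "$gz_file" | cmp -s - "$plain_file"
}

# Example, from the working directory, after running ./map.sh targets-basic.yml
# again (the gunzip step above replaced people-gz.ttl.gz with people-gz.ttl):
#   same_turtle_content out/people-gz.ttl.gz out/people.ttl && echo identical
```

Because gzip -dc only writes to stdout, this check leaves the out directory unchanged.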
How to output to web resources
In our example, we need to output RDF not only to local files, but also to web resources. Specifically, we need to define that the mapping engine stores the generated triples at http://localhost:3000/people. We add another key-value pair to the top-level targets collection to achieve this:
targets:
  target-web:
    type: directhttprequest
    access: http://localhost:3000/people
    methodName: PUT
    headers:
      - name: Content-Type
        value: text/turtle
    serialization: turtle
This description instructs the mapping engine to send an HTTP request to http://localhost:3000/people, using the PUT method, with the Content-Type header set to text/turtle, and with RDF, serialized as Turtle, in the body.
The default methodName is PUT, and a YARRRML interpreter derives the Content-Type header value text/turtle from serialization: turtle, so we can shorten this to:
targets:
  target-web:
    type: directhttprequest
    access: http://localhost:3000/people
    serialization: turtle
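To make the request described by target-web more concrete, here is a rough curl equivalent wrapped in a bash function. This is a sketch under our own assumptions: put_turtle is a hypothetical name, the Turtle is assumed to already be in a local file, and the mapping engine may send additional headers that we don't show here.

```shell
# Hypothetical helper: send a Turtle file to a web resource with an HTTP PUT,
# mirroring the method, Content-Type header, and body described by target-web.
put_turtle() {
  local file="$1" url="$2"
  # --data-binary sends the file byte for byte; -d would strip its newlines.
  curl --silent --show-error --fail \
    -X PUT \
    -H 'Content-Type: text/turtle' \
    --data-binary "@${file}" \
    "$url"
}

# Example (the Community Solid Server from the prerequisites must be running):
#   put_turtle out/people.ttl http://localhost:3000/people
```

With target-web in place, the mapping engine takes care of this request itself; the function only illustrates what goes over the wire.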
The complete YARRRML document looks like this:
prefixes:
  e: http://myontology.com/
  ex: http://www.example.com/
  schema: http://schema.org/
sources:
  source-people: ['people.csv~csv']
  source-friends: ['friends.csv~csv']
targets:
  target-people: ['/data/out/people.ttl', 'turtle']
  target-people-gz: ['/data/out/people-gz.ttl.gz', 'turtle', 'gzip']
  target-web:
    type: directhttprequest
    access: http://localhost:3000/people
    serialization: turtle
mappings:
  people:
    sources:
      - source-people
    s:
      value: ex:$(person_id)
      targets:
        - target-people
        - target-people-gz
        - target-web
    po:
      - [a, schema:Person]
      - [schema:givenName, $(firstname)]
      - [schema:familyName, $(lastname)]
  friends:
    sources:
      - source-friends
    s:
      value: ex:$(id-person)
      targets:
        - target-people
        - target-people-gz
        - target-web
    po:
      - [e:hasFriend, ex:$(id-friend)]
To produce RDF output, we download the above YARRRML document, save it to targets-web.yml in our working directory, and execute the mapping commands by calling our map.sh script:
./map.sh targets-web.yml
We can see the data that is now available at http://localhost:3000/people via
curl http://localhost:3000/people
The result is the same as the contents of the files people.ttl and people-gz.ttl:
<http://www.example.com/0> a <http://schema.org/Person>;
<http://myontology.com/hasFriend> "http://www.example.com/2";
<http://schema.org/familyName> "Dragneel";
<http://schema.org/givenName> "Natsu" .
<http://www.example.com/1> a <http://schema.org/Person>;
<http://myontology.com/hasFriend> "http://www.example.com/3", "http://www.example.com/4";
<http://schema.org/familyName> "Fullbuster";
<http://schema.org/givenName> "Gray" .
<http://www.example.com/2> a <http://schema.org/Person>;
<http://schema.org/familyName> "Redfox";
<http://schema.org/givenName> "Gajeel" .
<http://www.example.com/3> a <http://schema.org/Person>;
<http://schema.org/familyName> "Heartfilia";
<http://schema.org/givenName> "Lucy" .
<http://www.example.com/4> a <http://schema.org/Person>;
<http://schema.org/familyName> "Scarlet";
<http://schema.org/givenName> "Erza" .
How to output to web resources requiring authentication
In our example, we need to write to the private web resource at http://localhost:3000/people-private, which is hosted on a Solid pod. We can use Client Credentials Authentication because we host the pod on an instance of the Community Solid Server. The details we need are:
- WebID: http://localhost:3000/profile/card#me
- OIDC issuer: http://localhost:3000/
- email: test@example.com
- password: secret!
It's not required for this tutorial to know what a WebID and OIDC issuer are.
We add this information to the top-level collection authentications, using the key auth1:
authentications:
  auth1:
    type: cssclientcredentials
    email: test@example.com
    password: secret!
    webId: http://localhost:3000/profile/card#me
    oidcIssuer: http://localhost:3000/
We also need to include a new target for the web resource http://localhost:3000/people-private and, using the authentication key, define that the mapping engine needs to use the authentication information we defined at auth1:
targets:
  target-web-private:
    type: directhttprequest
    access: http://localhost:3000/people-private
    serialization: turtle
    authentication: auth1
The complete YARRRML document looks like this:
prefixes:
  e: http://myontology.com/
  ex: http://www.example.com/
  schema: http://schema.org/
sources:
  source-people: ['people.csv~csv']
  source-friends: ['friends.csv~csv']
targets:
  target-people: ['/data/out/people.ttl', 'turtle']
  target-people-gz: ['/data/out/people-gz.ttl.gz', 'turtle', 'gzip']
  target-web:
    type: directhttprequest
    access: http://localhost:3000/people
    serialization: turtle
  target-web-private:
    type: directhttprequest
    access: http://localhost:3000/people-private
    serialization: turtle
    authentication: auth1
authentications:
  auth1:
    type: cssclientcredentials
    email: test@example.com
    password: secret!
    webId: http://localhost:3000/profile/card#me
    oidcIssuer: http://localhost:3000/
mappings:
  people:
    sources:
      - source-people
    s:
      value: ex:$(person_id)
      targets:
        - target-people
        - target-people-gz
        - target-web
        - target-web-private
    po:
      - [a, schema:Person]
      - [schema:givenName, $(firstname)]
      - [schema:familyName, $(lastname)]
  friends:
    sources:
      - source-friends
    s:
      value: ex:$(id-person)
      targets:
        - target-people
        - target-people-gz
        - target-web
        - target-web-private
    po:
      - [e:hasFriend, ex:$(id-friend)]
To produce RDF output, we download the above YARRRML document, save it to targets-web-auth.yml in our working directory, and execute the mapping commands by calling our map.sh script:
./map.sh targets-web-auth.yml
We can see the data that is now available at http://localhost:3000/people-private via
curl http://localhost:3000/people-private
Note that the pod requires authentication to write to the web resource, but that it doesn't require authentication to read the web resource. The result is the same as the contents of http://localhost:3000/people:
<http://www.example.com/0> a <http://schema.org/Person>;
<http://myontology.com/hasFriend> "http://www.example.com/2";
<http://schema.org/familyName> "Dragneel";
<http://schema.org/givenName> "Natsu" .
<http://www.example.com/1> a <http://schema.org/Person>;
<http://myontology.com/hasFriend> "http://www.example.com/3", "http://www.example.com/4";
<http://schema.org/familyName> "Fullbuster";
<http://schema.org/givenName> "Gray" .
<http://www.example.com/2> a <http://schema.org/Person>;
<http://schema.org/familyName> "Redfox";
<http://schema.org/givenName> "Gajeel" .
<http://www.example.com/3> a <http://schema.org/Person>;
<http://schema.org/familyName> "Heartfilia";
<http://schema.org/givenName> "Lucy" .
<http://www.example.com/4> a <http://schema.org/Person>;
<http://schema.org/familyName> "Scarlet";
<http://schema.org/givenName> "Erza" .
How to output to dynamic locations
In our example, we need to add all the triples of one person to one local file. Specifically, we need to output all triples of the person with the value <x> in the column person_id into the file /data/out/people-<x>.ttl, serialized as Turtle. We do this by using dynamic targets.
Dynamic targets use the same keys as basic targets, but their values are templates instead of static strings. They also have one extra key, source, which refers to the source whose records fill in the templates. In our example, access has the value /data/out/people-$(person_id).ttl and source has the value source-people:
targets:
  target-people-dt:
    source: source-people
    access: /data/out/people-$(person_id).ttl
    serialization: turtle
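Conceptually, the mapping engine fills in the access template once per record of source-people. The bash function below imitates only that expansion step to make the idea tangible; it is not how RMLMapper works internally. expand_access_paths is a hypothetical name, and the sketch assumes a simple CSV file without quoted fields and a template that references a single column, such as $(person_id).

```shell
# Hypothetical helper, for illustration only: print the output path that a
# dynamic target such as target-people-dt yields for each record of a CSV file.
# Assumes a plain CSV file (no quoted fields) and one column reference in the template.
expand_access_paths() {
  local csv_file="$1" column="$2" template="$3"
  awk -F',' -v col="$column" -v tpl="$template" '
    # Drop a trailing carriage return so CRLF files work too.
    { sub(/\r$/, "") }
    # Header row: find the position of the referenced column.
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == col) idx = i
      if (!idx) { print "column not found: " col > "/dev/stderr"; exit 1 }
      next
    }
    # Data rows: replace every $(column) in the template with the record value.
    {
      ref = "$(" col ")"
      rest = tpl
      path = ""
      while ((p = index(rest, ref)) > 0) {
        path = path substr(rest, 1, p - 1) $idx
        rest = substr(rest, p + length(ref))
      }
      path = path rest
      # Records with the same value share one output location, so print it once.
      if (!seen[path]++) print path
    }
  ' "$csv_file"
}

# Example, from the working directory (single quotes keep the shell from
# treating $(person_id) as a command substitution):
#   expand_access_paths people.csv person_id '/data/out/people-$(person_id).ttl'
```

With the people.csv of this tutorial, this should print one path per value of person_id, from /data/out/people-0.ttl to /data/out/people-4.ttl.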
The complete YARRRML document looks like this:
prefixes:
  e: http://myontology.com/
  ex: http://www.example.com/
  schema: http://schema.org/
sources:
  source-people: ['people.csv~csv']
  source-friends: ['friends.csv~csv']
targets:
  target-people: ['/data/out/people.ttl', 'turtle']
  target-people-gz: ['/data/out/people-gz.ttl.gz', 'turtle', 'gzip']
  target-web:
    type: directhttprequest
    access: http://localhost:3000/people
    serialization: turtle
  target-web-private:
    type: directhttprequest
    access: http://localhost:3000/people-private
    serialization: turtle
    authentication: auth1
  target-people-dt:
    source: source-people
    access: /data/out/people-$(person_id).ttl
    serialization: turtle
authentications:
  auth1:
    type: cssclientcredentials
    email: test@example.com
    password: secret!
    webId: http://localhost:3000/profile/card#me
    oidcIssuer: http://localhost:3000/
mappings:
  people:
    sources:
      - source-people
    s:
      value: ex:$(person_id)
      targets:
        - target-people
        - target-people-gz
        - target-web
        - target-web-private
        - target-people-dt
    po:
      - [a, schema:Person]
      - [schema:givenName, $(firstname)]
      - [schema:familyName, $(lastname)]
  friends:
    sources:
      - source-friends
    s:
      value: ex:$(id-person)
      targets:
        - target-people
        - target-people-gz
        - target-web
        - target-web-private
    po:
      - [e:hasFriend, ex:$(id-friend)]
To produce RDF output, we download the above YARRRML document, save it to targets-dynamic.yml in our working directory, and execute the mapping commands by calling our map.sh script:
./map.sh targets-dynamic.yml
Now we can view the RDF output in our subdirectory out:
cd out
ls -1
The content of the directory is:
people-gz.ttl.gz
people.ttl
people-0.ttl
people-1.ttl
people-2.ttl
people-3.ttl
people-4.ttl
There are five files of the form people-<x>.ttl because there are five records in people.csv, each with a unique value in the column person_id.
The contents of people-0.ttl, containing the triples of the first record, are:
<http://www.example.com/0> a <http://schema.org/Person>;
<http://schema.org/familyName> "Dragneel";
<http://schema.org/givenName> "Natsu" .
The contents of people-1.ttl, containing the triples of the second record, are:
<http://www.example.com/1> a <http://schema.org/Person>;
<http://schema.org/familyName> "Fullbuster";
<http://schema.org/givenName> "Gray" .
How to output to dynamic locations using different sources
In our example, we also need to output the triples generated via the friends mapping to the files /data/out/people-$(person_id).ttl. We can't just add target-people-dt to the targets of the subject of the friends mapping, because the target and the mapping have different sources. Instead, we add the id key to our target:
targets:
  target-people-dt:
    source: source-people
    id: id-target-people-dt-$(person_id)
    access: /data/out/people-$(person_id).ttl
    serialization: turtle
We can now refer to the target via the value of its id key. For example, if we use id-target-people-dt-$(id-person), the mapping engine takes the value of id-person from the processed record of friends.csv and uses it as the value of person_id in the target's id. Because the target's id and access both use person_id, this also determines the value of access.
If we only look at the friends mapping, we add the target as follows:
friends:
  sources:
    - source-friends
  s:
    value: ex:$(id-person)
    targets:
      - id-target-people-dt-$(id-person)
  po:
    - [e:hasFriend, ex:$(id-friend)]
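To see which file each friends record ends up in, here is a bash sketch of the lookup described above: for every record of friends.csv, it builds the reference id-target-people-dt-$(id-person) and looks for the people.csv record whose target id matches. resolve_friend_targets is a hypothetical name, the id and access templates of target-people-dt are hard-coded, and the sketch assumes simple CSV files without quoted fields. It only mirrors the matching logic, not RMLMapper's implementation.

```shell
# Hypothetical helper, for illustration only: resolve the target reference of
# each friends.csv record against the targets generated from people.csv.
# The templates of target-people-dt (id and access) are hard-coded below.
resolve_friend_targets() {
  local people_csv="$1" friends_csv="$2"
  awk -F',' '
    { sub(/\r$/, "") }
    # Header rows: remember the position of every column, per file.
    FNR == 1 { for (i = 1; i <= NF; i++) col[FILENAME, $i] = i; next }
    # people.csv: one target per record, keyed by id-target-people-dt-<person_id>.
    FILENAME == ARGV[1] {
      x = $(col[FILENAME, "person_id"])
      access["id-target-people-dt-" x] = "/data/out/people-" x ".ttl"
      next
    }
    # friends.csv: build id-target-people-dt-<id-person> and look it up.
    {
      p = $(col[FILENAME, "id-person"])
      f = $(col[FILENAME, "id-friend"])
      ref = "id-target-people-dt-" p
      print p " hasFriend " f " -> " ((ref in access) ? access[ref] : "no matching target")
    }
  ' "$people_csv" "$friends_csv"
}

# Example, from the working directory:
#   resolve_friend_targets people.csv friends.csv
```

With the tutorial's data, the records for person 1 (friends 3 and 4) should both resolve to /data/out/people-1.ttl, which is why both friendships end up in the same file.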
The complete YARRRML document looks like this:
prefixes:
  e: http://myontology.com/
  ex: http://www.example.com/
  schema: http://schema.org/
sources:
  source-people: ['people.csv~csv']
  source-friends: ['friends.csv~csv']
targets:
  target-people: ['/data/out/people.ttl', 'turtle']
  target-people-gz: ['/data/out/people-gz.ttl.gz', 'turtle', 'gzip']
  target-web:
    type: directhttprequest
    access: http://localhost:3000/people
    serialization: turtle
  target-web-private:
    type: directhttprequest
    access: http://localhost:3000/people-private
    serialization: turtle
    authentication: auth1
  target-people-dt:
    source: source-people
    id: id-target-people-dt-$(person_id)
    access: /data/out/people-$(person_id).ttl
    serialization: turtle
authentications:
  auth1:
    type: cssclientcredentials
    email: test@example.com
    password: secret!
    webId: http://localhost:3000/profile/card#me
    oidcIssuer: http://localhost:3000/
mappings:
  people:
    sources:
      - source-people
    s:
      value: ex:$(person_id)
      targets:
        - target-people
        - target-people-gz
        - target-web
        - target-web-private
        - target-people-dt
    po:
      - [a, schema:Person]
      - [schema:givenName, $(firstname)]
      - [schema:familyName, $(lastname)]
  friends:
    sources:
      - source-friends
    s:
      value: ex:$(id-person)
      targets:
        - target-people
        - target-people-gz
        - target-web
        - target-web-private
        - id-target-people-dt-$(id-person)
    po:
      - [e:hasFriend, ex:$(id-friend)]
To produce RDF output, we download the above YARRRML document, save it to targets-dynamic-friends.yml in our working directory, and execute the mapping commands by calling our map.sh script:
./map.sh targets-dynamic-friends.yml
Now we can view the RDF output in our subdirectory out:
cd out
ls -1
The content of the directory is:
people-gz.ttl.gz
people.ttl
people-0.ttl
people-1.ttl
people-2.ttl
people-3.ttl
people-4.ttl
The contents of people-0.ttl, containing the triples of the first record, are:
<http://www.example.com/0> a <http://schema.org/Person>;
<http://myontology.com/hasFriend> "http://www.example.com/2";
<http://schema.org/familyName> "Dragneel";
<http://schema.org/givenName> "Natsu" .
Note that the mapping engine now also wrote the triple with the predicate http://myontology.com/hasFriend to this file.
The contents of people-1.ttl, containing the triples of the second record, are:
<http://www.example.com/1> a <http://schema.org/Person>;
<http://myontology.com/hasFriend> "http://www.example.com/3", "http://www.example.com/4";
<http://schema.org/familyName> "Fullbuster";
<http://schema.org/givenName> "Gray" .
Wrapping up
Congratulations! You have created your own YARRRML rules that:
- output RDF to local files,
- compress the output,
- output RDF to web resources,
- use authentication, and
- output RDF to different local files in a dynamic way.
Nice work! We hope you now feel like you have a decent grasp on how targets in YARRRML work.
More information
You can find more information in the following: