Generating Linked Data with YARRRML: using targets

Before we start the tutorial

Learning objective

At the end of this tutorial, you'll be able to manually write YARRRML rules that output RDF to different locations, including local files and web resources with and without authentication.

Prerequisites

We assume that you have completed our getting started tutorial. For this tutorial, we run our tools via Docker containers, using the following versions:

  • YARRRML parser: 1.11.0
  • RMLMapper: 7.3.3

We use the following bash script map.sh on Linux:

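#!/bin/bash
# Usage:
# - run in the directory where this script is located
# - supply the YARRRML file as the one and only argument
#
# Example:
#   ./map.sh rules.yml

# Stop if the YARRRML file is missing
if [ "$#" -ne 1 ]; then
  echo "Usage: $0 <yarrrml-file>" >&2
  exit 1
fi

YARRRML_FILE=$1
RML_FILE=temp.rml.ttl

# Local file targets write to /data/out inside the containers, i.e., ./out here
mkdir -p out
# Convert the YARRRML rules into RML rules
docker run --rm -it -v "$(pwd)":/data rmlio/yarrrml-parser:1.11.0 -i /data/${YARRRML_FILE} -o /data/${RML_FILE}
# Execute the RML rules; --net=host lets the mapper reach the Solid server on localhost:3000
docker run --net=host --rm -it -v "$(pwd)":/data rmlio/rmlmapper-java:v7.3.3 -m /data/${RML_FILE}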

Create a working directory on your local machine now, download this script as map.sh, and make it executable via

chmod +x map.sh

If you're not on Linux, create a similar script that suits your environment. Note that the script ensures that the subdirectory out exists: because the working directory is mounted at /data inside the containers, all local file outputs in this tutorial, which we write to /data/out, end up in out.

To read from and write to web resources, we use an instance of the Community Solid Server with a single pod, which we also run via a Docker container:

docker run --rm -v ./Solid:/data -p 3000:3000 -it solidproject/community-server:7.1.7 -c config/file-root-pod.json

Next, we need to add the files people$.ttl, people.acl, people-private$.ttl, and people-private.acl to the directory Solid in our working directory. We can do this using curl via

cd Solid
curl https://rml.io/yarrrml/tutorial/data/targets/people$.ttl -o people$.ttl
curl https://rml.io/yarrrml/tutorial/data/targets/people.acl -o people.acl
curl https://rml.io/yarrrml/tutorial/data/targets/people-private$.ttl -o people-private$.ttl
curl https://rml.io/yarrrml/tutorial/data/targets/people-private.acl -o people-private.acl

You don't need to know any details about how Solid and pods work. You only need to know that we can use a pod to host public and private web resources that are accessible via HTTP methods.

Concepts

Targets

A target is a collection of properties that describe where the output of an RDF mapping engine goes and how to access that location.

In our getting started tutorial, we had a single output location: the console output of the mapping engine. Optionally, we could send that output to a local file, for example, using RMLMapper's -o option on the command line:

java -jar /path/to/rmlmapper.jar -m rules.rml.ttl -o outputfile.ttl

In this tutorial, we'll learn how to specify targets inside our YARRRML document and assign them to mappings:

  • We can assign multiple targets to one mapping.
  • We can assign one target to multiple mappings.

Basic targets

Throughout this tutorial, we use the term "basic target" where needed to distinguish targets whose output goes to local files (optionally compressed) or to web resources (optionally with authentication) from dynamic targets. The term is not normative.

Local file targets

A local file target is a target whose output goes to a local file.

HTTP request targets

An HTTP request target is a target whose output is sent to web resources via HTTP requests. We distinguish two types of HTTP request targets:

  • Direct HTTP request targets, which write to a web resource.
  • Linked HTTP request targets, which write to a web resource that is linked to another web resource. We don't cover this type in this tutorial.

Dynamic targets

A dynamic target is a blueprint for basic targets. For a dynamic target, a YARRRML interpreter generates rules that enable a mapping engine to compute basic targets from one or more fields of each record in the source associated with the dynamic target. Thus, a dynamic target can result in as many output locations as there are records in its associated source.

Example

Consider the following columns from the CSV file people.csv:

person_id   firstname   lastname
0           Natsu       Dragneel
1           Gray        Fullbuster
2           Gajeel      Redfox
3           Lucy        Heartfilia
4           Erza        Scarlet

The five rows describe five characters from the same TV show: their id, first name, and last name. The CSV file friends.csv describes who is friends with whom:

id-person   id-friend
0           2
1           3
1           4

We would like to annotate every character, link them to their friends, and generate the corresponding RDF triples.

Initial YARRRML document

Based on what we learned in the getting started tutorial, the following YARRRML document generates the aforementioned triples:

prefixes:
  e: http://myontology.com/
  ex: http://www.example.com/
  schema: http://schema.org/

sources:
  source-people: ['people.csv~csv']
  source-friends: ['friends.csv~csv']

mappings:
  people:
    sources:
      - source-people
    s:
      value: ex:$(person_id)
    po:
      - [a, schema:Person]
      - [schema:givenName, $(firstname)]
      - [schema:familyName, $(lastname)]

  friends:
    sources:
      - source-friends
    s:
      value: ex:$(id-person)
    po:
      - [e:hasFriend, ex:$(id-friend)]

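We use the predicate e:hasFriend to express that a person has another person as a friend. Because we don't append ~iri to the object ex:$(id-friend), the objects of e:hasFriend are generated as literals, which is why the friends appear as quoted strings in the outputs of this tutorial. To generate IRIs instead, we would write ex:$(id-friend)~iri.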

To produce RDF output, we download the above YARRRML document, save it to initial.yml in our working directory, download people.csv and friends.csv to our working directory, and execute the mapping commands by calling our map.sh script:

./map.sh initial.yml

Different output locations

We want to output our generated RDF triples from the people and friends mappings to different locations:

  • a local file /data/out/people.ttl, serialized as Turtle.
  • a local file /data/out/people-gz.ttl.gz, serialized as Turtle and gzip-compressed.
  • a public web resource at URL http://localhost:3000/people, serialized as Turtle.
  • a private web resource at URL http://localhost:3000/people-private, serialized as Turtle.
  • multiple local files /data/out/people-<x>.ttl, serialized as Turtle, where <x> matches a value of the person_id column of people.csv.

In the following sections, we explain what rules we need to achieve this, and how we write them using YARRRML.

What rules are needed

We need two sets of rules:

  • rules that describe the output locations
  • rules that link these output locations to the existing mappings

In our example, we need rules that define:

  • that the local file /data/out/people.ttl contains the generated triples,
  • that the triples of /data/out/people.ttl are serialized as Turtle,
  • that the local file /data/out/people-gz.ttl.gz contains the same content as /data/out/people.ttl, gzip-compressed,
  • that the public web resource at URL http://localhost:3000/people contains the generated triples,
  • that the triples of http://localhost:3000/people are serialized as Turtle,
  • that the private web resource at URL http://localhost:3000/people-private contains the generated triples,
  • that the triples of http://localhost:3000/people-private are serialized as Turtle,
  • the authentication required for writing to http://localhost:3000/people-private,
  • that there should be multiple files /data/out/people-<x>.ttl,
  • that /data/out/people-<x>.ttl contains the triples that correspond with the row that has <x> as value for the person_id column of people.csv.

How to output to local files

In our example, we need to output our RDF to local files, both with and without compression. We define the corresponding targets via the top-level targets collection:

targets:

We group the rules per target and give each group a unique key. In our example, we use target-people as the key for the rules that define that the triples should be written to /data/out/people.ttl, serialized as Turtle:

targets:
  target-people:

We need to define that the location of the file is /data/out/people.ttl. We do this via the key access:

targets:
  target-people:
    access: /data/out/people.ttl

We can define that /data/out/people.ttl is a local file by using the key type with the value localfile. But this is not required because if the value of access is a path, then the type is implicitly set to localfile.
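
For comparison, here is a minimal sketch of the same target with the type spelled out. Based on the paragraph above, it should behave the same as the version without type; the explicit key only makes the intent visible to the reader.

```yaml
targets:
  target-people:
    # Optional: implied when access is a local path
    type: localfile
    access: /data/out/people.ttl
```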

How to define the serialization

We need to define that the serialization should be Turtle. We do this via the key serialization and value turtle:

targets:
  target-people:
    access: /data/out/people.ttl
    serialization: turtle

We can write this using an array notation as well:

targets:
  target-people: [/data/out/people.ttl, turtle]

How to define compression

For the local file /data/out/people-gz.ttl.gz, we use the key target-people-gz, set the value of access to /data/out/people-gz.ttl.gz, and set the value of serialization to turtle:

targets:
  target-people-gz:
    access: /data/out/people-gz.ttl.gz
    serialization: turtle

We define that we want compression via the key compression. In our example, we want gzip compression:

targets:
  target-people-gz:
    access: /data/out/people-gz.ttl.gz
    serialization: turtle
    compression: gzip

We can write this using an array notation as well:

targets:
  target-people-gz: [/data/out/people-gz.ttl.gz, turtle, gzip]

In our example, we need to link the mappings to the targets target-people and target-people-gz. We do this by adding the targets to the subject of each mapping via the key targets. For the people mapping, this results in:

mappings:
  people:
    sources:
      - ['people.csv~csv']
    s:
      value: ex:$(person_id)
      targets:
        - target-people
        - target-people-gz

We can also use an inline style: instead of referring to keys of the top-level targets collection, we write the target descriptions directly as values of targets of the subject:

mappings:
  people:
    sources:
      - ['people.csv~csv']
    s:
      value: ex:$(person_id)
      targets:
        - ['/data/out/people.ttl', 'turtle']
        - ['/data/out/people-gz.ttl.gz', 'turtle', 'gzip']

This is convenient for targets that appear only in a single mapping.

Our complete YARRRML document looks like this:

prefixes:
  e: http://myontology.com/
  ex: http://www.example.com/
  schema: http://schema.org/

sources:
  source-people: ['people.csv~csv']
  source-friends: ['friends.csv~csv']

targets:
  target-people: ['/data/out/people.ttl', 'turtle']
  target-people-gz: ['/data/out/people-gz.ttl.gz', 'turtle', 'gzip']

mappings:
  people:
    sources:
      - source-people
    s:
      value: ex:$(person_id)
      targets:
        - target-people
        - target-people-gz
    po:
      - [a, schema:Person]
      - [schema:givenName, $(firstname)]
      - [schema:familyName, $(lastname)]

  friends:
    sources:
      - source-friends
    s:
      value: ex:$(id-person)
      targets:
        - target-people
        - target-people-gz
    po:
      - [e:hasFriend, ex:$(id-friend)]

To produce RDF output, we download the above YARRRML document, save it to targets-basic.yml in our working directory, and execute the mapping commands by calling our map.sh script:

./map.sh targets-basic.yml

Now we can view the RDF output in our subdirectory out:

cd out
ls -1

The content of the directory is:

people-gz.ttl.gz
people.ttl

Next, we gunzip the gzip file:

gunzip people-gz.ttl.gz
ls -1

Now, the content of the directory is:

people-gz.ttl
people.ttl

The contents of the files people.ttl and people-gz.ttl are identical and equal to:

<http://www.example.com/0> a <http://schema.org/Person>;
  <http://myontology.com/hasFriend> "http://www.example.com/2";
  <http://schema.org/familyName> "Dragneel";
  <http://schema.org/givenName> "Natsu" .

<http://www.example.com/1> a <http://schema.org/Person>;
  <http://myontology.com/hasFriend> "http://www.example.com/3", "http://www.example.com/4";
  <http://schema.org/familyName> "Fullbuster";
  <http://schema.org/givenName> "Gray" .

<http://www.example.com/2> a <http://schema.org/Person>;
  <http://schema.org/familyName> "Redfox";
  <http://schema.org/givenName> "Gajeel" .

<http://www.example.com/3> a <http://schema.org/Person>;
  <http://schema.org/familyName> "Heartfilia";
  <http://schema.org/givenName> "Lucy" .

<http://www.example.com/4> a <http://schema.org/Person>;
  <http://schema.org/familyName> "Scarlet";
  <http://schema.org/givenName> "Erza" .

How to output to web resources

In our example, we need to output RDF not only to local files but also to web resources. Specifically, we need to define that the mapping engine stores the generated triples at http://localhost:3000/people. We add another entry to the top-level targets collection to achieve this:

targets:
  target-web:
    type: directhttprequest
    access: http://localhost:3000/people
    methodName: PUT
    headers:
      - name: Content-Type
        value: text/turtle
    serialization: turtle

This description instructs the mapping engine to send an HTTP request to http://localhost:3000/people, using the PUT method, with the Content-Type header set to text/turtle, and with RDF, serialized as Turtle, in the body.

The default methodName is PUT and a YARRRML interpreter derives the Content-Type header value text/turtle from serialization: turtle, so we can shorten this to:

targets:
  target-web:
    type: directhttprequest
    access: http://localhost:3000/people
    serialization: turtle

The complete YARRRML document looks like this:

prefixes:
  e: http://myontology.com/
  ex: http://www.example.com/
  schema: http://schema.org/

sources:
  source-people: ['people.csv~csv']
  source-friends: ['friends.csv~csv']

targets:
  target-people: ['/data/out/people.ttl', 'turtle']
  target-people-gz: ['/data/out/people-gz.ttl.gz', 'turtle', 'gzip']
  target-web:
    type: directhttprequest
    access: http://localhost:3000/people
    serialization: turtle

mappings:
  people:
    sources:
      - source-people
    s:
      value: ex:$(person_id)
      targets:
        - target-people
        - target-people-gz
        - target-web
    po:
      - [a, schema:Person]
      - [schema:givenName, $(firstname)]
      - [schema:familyName, $(lastname)]

  friends:
    sources:
      - source-friends
    s:
      value: ex:$(id-person)
      targets:
        - target-people
        - target-people-gz
        - target-web
    po:
      - [e:hasFriend, ex:$(id-friend)]

To produce RDF output, we download the above YARRRML document, save it to targets-web.yml in our working directory, and execute the mapping commands by calling our map.sh script:

./map.sh targets-web.yml

We can see the data that is now available at http://localhost:3000/people via

curl http://localhost:3000/people

The result is the same as the contents of the files people.ttl and people-gz.ttl:

<http://www.example.com/0> a <http://schema.org/Person>;
  <http://myontology.com/hasFriend> "http://www.example.com/2";
  <http://schema.org/familyName> "Dragneel";
  <http://schema.org/givenName> "Natsu" .

<http://www.example.com/1> a <http://schema.org/Person>;
  <http://myontology.com/hasFriend> "http://www.example.com/3", "http://www.example.com/4";
  <http://schema.org/familyName> "Fullbuster";
  <http://schema.org/givenName> "Gray" .

<http://www.example.com/2> a <http://schema.org/Person>;
  <http://schema.org/familyName> "Redfox";
  <http://schema.org/givenName> "Gajeel" .

<http://www.example.com/3> a <http://schema.org/Person>;
  <http://schema.org/familyName> "Heartfilia";
  <http://schema.org/givenName> "Lucy" .

<http://www.example.com/4> a <http://schema.org/Person>;
  <http://schema.org/familyName> "Scarlet";
  <http://schema.org/givenName> "Erza" .

How to output to web resources requiring authentication

In our example, we need to write to the private web resource at http://localhost:3000/people-private, which is hosted on a Solid pod. Because we use an instance of the Community Solid Server to host the pod, we can use Client Credentials Authentication. The details we need are:

  • email: test@example.com
  • password: secret!
  • WebID: http://localhost:3000/profile/card#me
  • OIDC issuer: http://localhost:3000/

You don't need to know what a WebID or an OIDC issuer is for this tutorial.

We add this information to the top-level collection authentications using the key auth1:

authentications:
  auth1:
    type: cssclientcredentials
    email: test@example.com
    password: secret!
    webId: http://localhost:3000/profile/card#me
    oidcIssuer: http://localhost:3000/

We also add a new target for the web resource http://localhost:3000/people-private and, via the key authentication, tell the mapping engine to use the credentials defined under auth1:

targets:
  target-web-private:
    type: directhttprequest
    access: http://localhost:3000/people-private
    serialization: turtle
    authentication: auth1

The complete YARRRML document looks like this:

prefixes:
  e: http://myontology.com/
  ex: http://www.example.com/
  schema: http://schema.org/

sources:
  source-people: ['people.csv~csv']
  source-friends: ['friends.csv~csv']

targets:
  target-people: ['/data/out/people.ttl', 'turtle']
  target-people-gz: ['/data/out/people-gz.ttl.gz', 'turtle', 'gzip']
  target-web:
    type: directhttprequest
    access: http://localhost:3000/people
    serialization: turtle
  target-web-private:
    type: directhttprequest
    access: http://localhost:3000/people-private
    serialization: turtle
    authentication: auth1

authentications:
  auth1:
    type: cssclientcredentials
    email: test@example.com
    password: secret!
    webId: http://localhost:3000/profile/card#me
    oidcIssuer: http://localhost:3000/

mappings:
  people:
    sources:
      - source-people
    s:
      value: ex:$(person_id)
      targets:
        - target-people
        - target-people-gz
        - target-web
        - target-web-private
    po:
      - [a, schema:Person]
      - [schema:givenName, $(firstname)]
      - [schema:familyName, $(lastname)]

  friends:
    sources:
      - source-friends
    s:
      value: ex:$(id-person)
      targets:
        - target-people
        - target-people-gz
        - target-web
        - target-web-private
    po:
      - [e:hasFriend, ex:$(id-friend)]

To produce RDF output, we download the above YARRRML document, save it to targets-web-auth.yml in our working directory, and execute the mapping commands by calling our map.sh script:

./map.sh targets-web-auth.yml

We can see the data that is now available at http://localhost:3000/people-private via

curl http://localhost:3000/people-private

Note that the pod requires authentication to write to the web resource, but that it doesn't require authentication to read the web resource. The result is the same as the contents of http://localhost:3000/people:

<http://www.example.com/0> a <http://schema.org/Person>;
  <http://myontology.com/hasFriend> "http://www.example.com/2";
  <http://schema.org/familyName> "Dragneel";
  <http://schema.org/givenName> "Natsu" .

<http://www.example.com/1> a <http://schema.org/Person>;
  <http://myontology.com/hasFriend> "http://www.example.com/3", "http://www.example.com/4";
  <http://schema.org/familyName> "Fullbuster";
  <http://schema.org/givenName> "Gray" .

<http://www.example.com/2> a <http://schema.org/Person>;
  <http://schema.org/familyName> "Redfox";
  <http://schema.org/givenName> "Gajeel" .

<http://www.example.com/3> a <http://schema.org/Person>;
  <http://schema.org/familyName> "Heartfilia";
  <http://schema.org/givenName> "Lucy" .

<http://www.example.com/4> a <http://schema.org/Person>;
  <http://schema.org/familyName> "Scarlet";
  <http://schema.org/givenName> "Erza" .

How to output to dynamic locations

In our example, we need to write all the triples of one person to one local file. Specifically, we need to output all triples of the person with the value <x> in the column person_id to the file /data/out/people-<x>.ttl, serialized as Turtle. We do this by using a dynamic target. A dynamic target has the same keys as a basic target, but their values are templates instead of static strings, and it has one extra key, source, which refers to the source whose records fill in those templates. In our example, access has the value /data/out/people-$(person_id).ttl and source has the value source-people:

targets:
  target-people-dt:
    source: source-people
    access: /data/out/people-$(person_id).ttl
    serialization: turtle
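
Because a dynamic target accepts the same keys as a basic target, we'd expect that each generated file can be compressed as well. The following is an untested sketch under that assumption; the key target-people-dt-gz is hypothetical and isn't used anywhere else in this tutorial.

```yaml
targets:
  # Hypothetical dynamic target: one gzip-compressed Turtle file per record of people.csv
  target-people-dt-gz:
    source: source-people
    access: /data/out/people-$(person_id).ttl.gz
    serialization: turtle
    compression: gzip
```

To use it, we would add target-people-dt-gz to the targets of the subject of the people mapping, just like target-people-dt.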

The complete YARRRML document looks like this:

prefixes:
  e: http://myontology.com/
  ex: http://www.example.com/
  schema: http://schema.org/

sources:
  source-people: ['people.csv~csv']
  source-friends: ['friends.csv~csv']

targets:
  target-people: ['/data/out/people.ttl', 'turtle']
  target-people-gz: ['/data/out/people-gz.ttl.gz', 'turtle', 'gzip']
  target-web:
    type: directhttprequest
    access: http://localhost:3000/people
    serialization: turtle
  target-web-private:
    type: directhttprequest
    access: http://localhost:3000/people-private
    serialization: turtle
    authentication: auth1
  target-people-dt:
    source: source-people
    access: /data/out/people-$(person_id).ttl
    serialization: turtle

authentications:
  auth1:
    type: cssclientcredentials
    email: test@example.com
    password: secret!
    webId: http://localhost:3000/profile/card#me
    oidcIssuer: http://localhost:3000/

mappings:
  people:
    sources:
      - source-people
    s:
      value: ex:$(person_id)
      targets:
        - target-people
        - target-people-gz
        - target-web
        - target-web-private
        - target-people-dt
    po:
      - [a, schema:Person]
      - [schema:givenName, $(firstname)]
      - [schema:familyName, $(lastname)]

  friends:
    sources:
      - source-friends
    s:
      value: ex:$(id-person)
      targets:
        - target-people
        - target-people-gz
        - target-web
        - target-web-private
    po:
      - [e:hasFriend, ex:$(id-friend)]

To produce RDF output, we download the above YARRRML document, save it to targets-dynamic.yml in our working directory, and execute the mapping commands by calling our map.sh script:

./map.sh targets-dynamic.yml

Now we can view the RDF output in our subdirectory out:

cd out
ls -1

The content of the directory is:

people-0.ttl
people-1.ttl
people-2.ttl
people-3.ttl
people-4.ttl
people-gz.ttl.gz
people.ttl

There are five files of the form people-<x>.ttl because people.csv has five records, each with a unique value in the column person_id.

The content of people-0.ttl, which contains the triples of the first record, is:

<http://www.example.com/0> a <http://schema.org/Person>;
  <http://schema.org/familyName> "Dragneel";
  <http://schema.org/givenName> "Natsu" .

The content of people-1.ttl, which contains the triples of the second record, is:

<http://www.example.com/1> a <http://schema.org/Person>;
  <http://schema.org/familyName> "Fullbuster";
  <http://schema.org/givenName> "Gray" .

How to output to dynamic locations using different sources

In our example, we also need to output the triples generated by the friends mapping to the files /data/out/people-<x>.ttl. We can't just add target-people-dt to the targets of the subject of the friends mapping, because the target's source (people.csv) differs from the mapping's source (friends.csv): the template $(person_id) refers to a column that friends.csv doesn't have. Instead, we add the id key to our target:

targets:
  target-people-dt:
    source: source-people
    id: id-target-people-dt-$(person_id)
    access: /data/out/people-$(person_id).ttl
    serialization: turtle

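We can now refer to the target via the value of the id key. For example, if we use id-target-people-dt-$(id-person), then the mapping engine uses the value of id-person of the processed record of friends.csv as the value for person_id in the target's id. Because id and access both use person_id, this also determines the value of access. For instance, the record of friends.csv with id-person 1 refers to id-target-people-dt-1, which is the id of the target computed for the record of people.csv with person_id 1, so the generated triples go to /data/out/people-1.ttl.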

If we only look at the friends mapping, then we add the target as follows:

friends:
  sources:
    - source-friends
  s:
    value: ex:$(id-person)
    targets:
      - id-target-people-dt-$(id-person)
  po:
    - [e:hasFriend, ex:$(id-friend)]

The complete YARRRML document looks like this:

prefixes:
  e: http://myontology.com/
  ex: http://www.example.com/
  schema: http://schema.org/

sources:
  source-people: ['people.csv~csv']
  source-friends: ['friends.csv~csv']

targets:
  target-people: ['/data/out/people.ttl', 'turtle']
  target-people-gz: ['/data/out/people-gz.ttl.gz', 'turtle', 'gzip']
  target-web:
    type: directhttprequest
    access: http://localhost:3000/people
    serialization: turtle
  target-web-private:
    type: directhttprequest
    access: http://localhost:3000/people-private
    serialization: turtle
    authentication: auth1
  target-people-dt:
    source: source-people
    id: id-target-people-dt-$(person_id)
    access: /data/out/people-$(person_id).ttl
    serialization: turtle

authentications:
  auth1:
    type: cssclientcredentials
    email: test@example.com
    password: secret!
    webId: http://localhost:3000/profile/card#me
    oidcIssuer: http://localhost:3000/

mappings:
  people:
    sources:
      - source-people
    s:
      value: ex:$(person_id)
      targets:
        - target-people
        - target-people-gz
        - target-web
        - target-web-private
        - target-people-dt
    po:
      - [a, schema:Person]
      - [schema:givenName, $(firstname)]
      - [schema:familyName, $(lastname)]

  friends:
    sources:
      - source-friends
    s:
      value: ex:$(id-person)
      targets:
        - target-people
        - target-people-gz
        - target-web
        - target-web-private
        - id-target-people-dt-$(id-person)
    po:
      - [e:hasFriend, ex:$(id-friend)]

To produce RDF output, we download the above YARRRML document, save it to targets-dynamic-friends.yml in our working directory, and execute the mapping commands by calling our map.sh script:

./map.sh targets-dynamic-friends.yml

Now we can view the RDF output in our subdirectory out:

cd out
ls -1

The content of the directory is:

people-0.ttl
people-1.ttl
people-2.ttl
people-3.ttl
people-4.ttl
people-gz.ttl.gz
people.ttl

The content of people-0.ttl, which contains the triples of the first record, is:

<http://www.example.com/0> a <http://schema.org/Person>;
  <http://myontology.com/hasFriend> "http://www.example.com/2";
  <http://schema.org/familyName> "Dragneel";
  <http://schema.org/givenName> "Natsu" .

Note that, compared with the previous run, the mapping engine now also added the triple with the predicate http://myontology.com/hasFriend, which the friends mapping generates.

The content of people-1.ttl, which contains the triples of the second record, is:

<http://www.example.com/1> a <http://schema.org/Person>;
  <http://myontology.com/hasFriend> "http://www.example.com/3", "http://www.example.com/4";
  <http://schema.org/familyName> "Fullbuster";
  <http://schema.org/givenName> "Gray" .

Wrapping up

Congratulations! You have created your own YARRRML rules that:

  • output RDF to local files,
  • compress the output,
  • output RDF to web resources,
  • use authentication, and
  • output RDF to different local files in a dynamic way.

Nice work! We hope you now feel like you have a decent grasp on how targets in YARRRML work.

More information

You can find more information in the following:
