RML and Data Retrieval Description


RML relies on W3C-standardized or widely accepted vocabularies, originally designed to advertise services or datasets, to define how data sources are retrieved and accessed, whether or not they are available on the Web. Such descriptions, either provided by data owners/publishers or written by data publishers/consumers, specify how the data is retrieved.

Dataset and Service descriptions

Dataset description

Data Catalog Vocabulary (DCAT) is the W3C recommended vocabulary for describing datasets in data catalogs, enabling applications to easily consume the underlying data.
dcat:Dataset represents a dataset in the catalog. DCAT considers as a dataset a collection of data, published or curated by a single agent, and available for access or download in one or more formats.
dcat:Distribution represents an accessible form of a dataset, e.g., a downloadable file, an RSS feed or a Web Service.
@prefix dcat: <http://www.w3.org/ns/dcat#> .
<#DCAT_source>
    a dcat:Dataset ;
    dcat:distribution [
        a dcat:Distribution;
        dcat:downloadURL "http://example.org/file.xml" ].
A DCAT description of a Dataset

Web APIs description

The Hydra core vocabulary (Hydra) is a lightweight vocabulary, published by the Hydra W3C Community Group, for describing Hypermedia-Driven Web APIs. Hydra can describe both static data sources, identified by a URI, and dynamic sources, described by a template-valued URI that contains variables whose values depend on information known only to the client.
Hydra enables a server to advertise valid state transitions. A client can use this information to construct HTTP requests to retrieve the data.
hydra:IriTemplate represents an IRI template. hydra:TemplateMapping represents a mapping from an IRI template variable to a property.
@prefix hydra: <http://www.w3.org/ns/hydra/core#> .
<#API_template_source>
    a hydra:IriTemplate ;
    hydra:template "https://biblio.ugent.be/publication/{id}?format={format}";
    hydra:mapping 
        [ a hydra:TemplateMapping ;
          hydra:variable "id";
          hydra:required true ],
        [ a hydra:TemplateMapping ;
          hydra:variable "format";
          hydra:required false ] . 
A Hydra description of a template-valued Web API.
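To illustrate how a client might instantiate such a template, the following Python sketch fills in the variables and refuses to build an IRI when a required variable is missing. It is a minimal sketch rather than a full URI Template implementation: the function name expand_template, the sample identifier 1234, and the choice to expand an absent optional variable to an empty string are assumptions made for illustration.

```python
import re
from urllib.parse import quote


def expand_template(template, required, values):
    """Fill {variable} placeholders in an IRI template (simple expansion only)."""
    missing = sorted(name for name in required if name not in values)
    if missing:
        raise ValueError("missing required variable(s): " + ", ".join(missing))

    def substitute(match):
        # Assumption: an optional variable without a value expands to "".
        value = values.get(match.group(1), "")
        # Percent-encode the value so it is safe inside a path or query part.
        return quote(str(value), safe="")

    return re.sub(r"\{(\w+)\}", substitute, template)


template = "https://biblio.ugent.be/publication/{id}?format={format}"
# "1234" is a made-up identifier used only for this example.
iri = expand_template(template, required={"id"}, values={"id": "1234", "format": "json"})
print(iri)
```

The required set mirrors the hydra:required flags of the template mappings: here only id must be supplied, while format may be left out.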

Database service description

The D2RQ Mapping Language (D2RQ) is a declarative mapping language for describing the relation between a relational database schema and RDFS vocabularies or OWL ontologies.
D2RQ defines d2rq:Database to represent a JDBC connection to a local or remote relational database.
@prefix d2rq: <http://www.wiwiss.fu-berlin.de/suhl/bizer/D2RQ/0.1#> .
<#DB_source> a d2rq:Database;
    d2rq:jdbcDSN "jdbc:mysql://localhost/example";
    d2rq:jdbcDriver "com.mysql.jdbc.Driver";
    d2rq:username "user";
    d2rq:password "password" . 
A D2RQ description of a Database.

SPARQL service description

SPARQL service description (SPARQL-SD) is a W3C standardized vocabulary for describing SPARQL services made available via the SPARQL 1.1 Protocol. These descriptions provide a mechanism by which a client or end user can discover information about the SPARQL service and details about the available dataset.
SPARQL-SD defines sd:Service to represent a SPARQL service made available via the SPARQL Protocol, sd:Dataset to represent an RDF Dataset comprising a default graph and zero or more named graphs, and sd:Graph to represent the description of an RDF graph.
@prefix sd: <http://www.w3.org/ns/sparql-service-description#> .
<#SPARQL_JSON_source> a sd:Service ;
    sd:endpoint <http://dbpedia.org/sparql/> ;
    sd:supportedLanguage sd:SPARQL11Query ;
    sd:resultFormat <http://www.w3.org/ns/formats/SPARQL_Results_JSON> . 
A SPARQL-SD description of a SPARQL endpoint set to return data in JSON format.

@prefix sd: <http://www.w3.org/ns/sparql-service-description#> .
<#SPARQL_XML_source> a sd:Service ;
    sd:endpoint <http://dbpedia.org/sparql/> ;
    sd:supportedLanguage sd:SPARQL11Query ;
    sd:resultFormat <http://www.w3.org/ns/formats/SPARQL_Results_XML> . 
A SPARQL-SD description of a SPARQL endpoint set to return data in XML.
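As a rough illustration of how a client could act on such a description, the Python sketch below turns the sd:endpoint and sd:resultFormat values into a SPARQL 1.1 Protocol query request. The request is only built, not sent. The function name build_query_request, the MEDIA_TYPES table and the sample query are assumptions introduced for this example.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Media types corresponding to the two result formats used in the descriptions.
MEDIA_TYPES = {
    "http://www.w3.org/ns/formats/SPARQL_Results_JSON": "application/sparql-results+json",
    "http://www.w3.org/ns/formats/SPARQL_Results_XML": "application/sparql-results+xml",
}


def build_query_request(endpoint, result_format, query):
    """Build (but do not send) a SPARQL query request via HTTP GET."""
    # The SPARQL Protocol passes the query string in the "query" parameter.
    url = endpoint + "?" + urlencode({"query": query})
    # The Accept header asks the service for the described result format.
    return Request(url, headers={"Accept": MEDIA_TYPES[result_format]})


request = build_query_request(
    "http://dbpedia.org/sparql/",
    "http://www.w3.org/ns/formats/SPARQL_Results_JSON",
    "SELECT * WHERE { ?s ?p ?o } LIMIT 1",  # placeholder query
)
print(request.full_url)
print(request.get_header("Accept"))
```

Passing the XML format IRI instead would yield the same URL with an Accept header of application/sparql-results+xml.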


CSV on the Web description

The CSV on the Web Vocabulary (CSVW) is a W3C-standardized vocabulary for metadata that annotates tabular data.
CSVW defines csvw:Table, which represents a table within a CSV file, and csvw:Dialect, which describes the CSV dialect and tells parsers how to parse the file referenced in a table description.
@prefix csvw: <http://www.w3.org/ns/csvw#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
<#CSVW_source> a csvw:Table;
    csvw:url "http://rml.io/data/csvw/Airport.csv" ;
    csvw:dialect [ a csvw:Dialect;
        csvw:delimiter ";";
        csvw:encoding "UTF-8";
        csvw:header "1"^^xsd:boolean ] .
A CSVW description of a CSV file on the Web.
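The dialect properties map naturally onto parser settings. The Python sketch below shows one way the delimiter, encoding and header values could be applied when reading the file. The function name read_table, the generated col1, col2 names for header-less files, and the two-line sample content are assumptions made for illustration, not the actual contents of Airport.csv.

```python
import csv
import io


def read_table(raw, delimiter=";", encoding="utf-8", header=True):
    """Parse raw CSV bytes into a list of dicts using the given dialect settings."""
    text = raw.decode(encoding)
    rows = list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))
    if not rows:
        return []
    if header:
        # The first row carries the column names.
        names, rows = rows[0], rows[1:]
    else:
        # Assumption: fall back to generated names when there is no header row.
        names = [f"col{i + 1}" for i in range(len(rows[0]))]
    return [dict(zip(names, row)) for row in rows]


# Made-up sample content standing in for the downloaded file.
sample = "id;name\n1;Brussels Airport\n".encode("utf-8")
records = read_table(sample, delimiter=";", encoding="utf-8", header=True)
print(records)
```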

RML Logical Source descriptions


Mapping a Local File

Original data: If you want to map data stored in a local file
Access description: Provide the path to this file
<#TriplesMapLocalFile> rml:logicalSource [
    rml:source "/path/to/local/file.csv" ;
    rml:referenceFormulation ql:CSV ] . 
A Triples Map mapping data from a local file.

Mapping a File published in a catalog on the Web

Original data: If you want to map data published in a data catalog on the Web
Access description: Provide the distribution description of the published dataset as the data source (DCAT).
<#TriplesMapCatalog> rml:logicalSource [
    rml:source <#DCAT_source> ; 
    rml:referenceFormulation ql:XPath;
    rml:iterator "/" ] . 
A Triples Map mapping data from a file published as a Dataset's Distribution, which is in turn described using DCAT.

Mapping data from a Web API

Original data: If you want to map data published on the Web and accessed via a Web API
Access description: Provide the API description or, at least, the description of the IRI (template) as the data source (Web API).
<#TriplesMapWebAPI> rml:logicalSource [
    rml:source <#API_template_source> ; 
    rml:referenceFormulation ql:JSONPath;
    rml:iterator "$" ] . 
A Triples Map mapping data from a Web API, described using Hydra, whose template is instantiated with certain values.

Mapping data from a Database

Original data: If you want to map data stored in a database
Access description: Provide the database connectivity description as the data source (Database description).
<#TriplesMapDatabase> rml:logicalSource [
    rml:source <#DB_source> ; 
    rr:sqlVersion rr:SQL2008;
    rml:query """
SELECT DEPTNO, DNAME, LOC,
       (SELECT COUNT(*) FROM EMP WHERE EMP.DEPTNO=DEPT.DEPTNO) AS STAFF
FROM DEPT; """ ] .
A Triples Map mapping data from a database, described using D2RQ.
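To make the shape of the rows returned by this query concrete, the Python sketch below runs it against an in-memory SQLite database. SQLite merely stands in for the MySQL database of the D2RQ description, and the table definitions and sample rows are assumptions invented for this example.

```python
import sqlite3

# The query from the Triples Map, without the trailing semicolon.
QUERY = """
SELECT DEPTNO, DNAME, LOC,
       (SELECT COUNT(*) FROM EMP WHERE EMP.DEPTNO=DEPT.DEPTNO) AS STAFF
FROM DEPT
"""

conn = sqlite3.connect(":memory:")
# Hypothetical schema and data, just enough for the query to run.
conn.executescript("""
CREATE TABLE DEPT (DEPTNO INTEGER PRIMARY KEY, DNAME TEXT, LOC TEXT);
CREATE TABLE EMP (EMPNO INTEGER PRIMARY KEY, ENAME TEXT, DEPTNO INTEGER);
INSERT INTO DEPT VALUES (10, 'ACCOUNTING', 'NEW YORK'), (20, 'RESEARCH', 'DALLAS');
INSERT INTO EMP VALUES (7369, 'SMITH', 20), (7782, 'CLARK', 10), (7788, 'SCOTT', 20);
""")
conn.row_factory = sqlite3.Row  # expose columns by name
rows = [dict(row) for row in conn.execute(QUERY)]
conn.close()

for row in rows:
    print(row)
```

Each result row becomes one iteration of the logical source, with DEPTNO, DNAME, LOC and STAFF available as column references.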

Mapping data from a SPARQL Endpoint

Original data: If you want to map data that is already available as RDF via a SPARQL endpoint
Access description: Provide the SPARQL service description (SPARQL).
<#TriplesMapSPARQL> rml:logicalSource [
    rml:source <#SPARQL_XML_source> ; 
    rml:referenceFormulation ql:XPath;
    rml:iterator "/";
    rml:query """ select distinct ?resource ?resource_label 
                where { ?resource rdfs:label ?resource_label } """ ] . 
A Triples Map mapping data from a SPARQL endpoint, described using SPARQL-SD.