Data retrieval with RML

RML reuses W3C-standardized or widely accepted vocabularies, originally designed to advertise services and datasets, to define how data sources are accessed and retrieved, whether or not they are available on the Web. Such descriptions, either derived from data owners or publishers, or defined by data publishers or consumers, specify how the data is retrieved.

Dataset and service descriptions

Dataset description

Data Catalog Vocabulary (DCAT) is the W3C recommended vocabulary for describing datasets in data catalogs, enabling applications to easily consume the underlying data. dcat:Dataset represents a dataset in the catalog. DCAT considers as a dataset a collection of data, published or curated by a single agent, and available for access or download in one or more formats. dcat:Distribution represents an accessible form of a dataset, e.g., a downloadable file, an RSS feed or a Web Service. An example is shown below.


@prefix dcat: <http://www.w3.org/ns/dcat#> .
<#DCAT_source>
    a dcat:Dataset ;
    dcat:distribution [
        a dcat:Distribution;
        dcat:downloadURL "http://example.org/file.xml" ].

Web API description

Hydra core vocabulary (Hydra) is a lightweight vocabulary, published by the Hydra W3C Community Group, for the description of Hypermedia-Driven Web APIs. Hydra can be used to describe both static data sources, identified by a URI, and dynamic sources, described by a template-valued URI that contains variables whose values depend on information known only to the client. Hydra enables a server to advertise valid state transitions. A client can use this information to construct HTTP requests to retrieve the data. hydra:IriTemplate represents an IRI template. hydra:TemplateMapping represents a mapping from an IRI template variable to a property. An example is shown below.


@prefix hydra: <http://www.w3.org/ns/hydra/core#> .
<#API_template_source>
    a hydra:IriTemplate ;
    hydra:template "https://biblio.ugent.be/publication/{id}?format={format}";
    hydra:mapping
        [ a hydra:TemplateMapping ;
          hydra:variable "id";
          hydra:required true ],
        [ a hydra:TemplateMapping ;
          hydra:variable "format";
          hydra:required false ] .
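To illustrate how a client might turn such a description into a concrete request URL, the following is a minimal Python sketch. It assumes simple `{variable}` substitution rather than full RFC 6570 expansion; the `expand` helper, the `MAPPINGS` dictionary and the sample identifier `12345` are hypothetical and only mirror the Hydra description above.

```python
import re
from urllib.parse import quote

# Template string taken from the hydra:template value above.
TEMPLATE = "https://biblio.ugent.be/publication/{id}?format={format}"

# Mirrors the hydra:mapping entries: variable name -> hydra:required flag.
MAPPINGS = {"id": True, "format": False}


def expand(template: str, mappings: dict, values: dict) -> str:
    """Substitute {variable} placeholders; fail if a required one is missing."""
    missing = [name for name, required in mappings.items()
               if required and name not in values]
    if missing:
        raise ValueError(f"missing required variables: {missing}")

    def substitute(match: re.Match) -> str:
        # Optional variables without a value expand to the empty string.
        return quote(values.get(match.group(1), ""), safe="")

    return re.sub(r"\{(\w+)\}", substitute, template)


# "12345" is a made-up publication identifier.
print(expand(TEMPLATE, MAPPINGS, {"id": "12345", "format": "json"}))
```

Omitting `id` raises a `ValueError`, which reflects the `hydra:required true` flag; omitting `format` is allowed and simply leaves the parameter empty.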

Database service description

The D2RQ Mapping Language (D2RQ) is a declarative Mapping Language for describing the relation between a relational database schema and RDFS vocabularies or OWL ontologies. D2RQ defines d2rq:Database to represent a JDBC connection to a local or remote relational database. An example is shown below.

        
@prefix d2rq: <http://www.wiwiss.fu-berlin.de/suhl/bizer/D2RQ/0.1#> .
<#DB_source> a d2rq:Database;
    d2rq:jdbcDSN "jdbc:mysql://localhost/example";
    d2rq:jdbcDriver "com.mysql.jdbc.Driver";
    d2rq:username "user";
    d2rq:password "password" .
        
    
SPARQL service description

SPARQL service description (SPARQL-SD) is a W3C standardized vocabulary for describing SPARQL services made available via the SPARQL 1.1 Protocol. These descriptions provide a mechanism by which a client or end user can discover information about the SPARQL service and details about the available dataset. SPARQL-SD defines sd:Service to represent a SPARQL service made available via the SPARQL Protocol, sd:Dataset to represent an RDF Dataset comprised of a default graph and zero or more named graphs, and sd:Graph to represent the description of an RDF graph. Two examples are shown below. The first is a SPARQL-SD description of a SPARQL endpoint set to return data in JSON format. The second is a SPARQL-SD description of a SPARQL endpoint set to return data in XML.

            
@prefix sd: <http://www.w3.org/ns/sparql-service-description#> .
<#SPARQL_JSON_source> a sd:Service ;
    sd:endpoint  <http://dbpedia.org/sparql/> ;
    sd:supportedLanguage sd:SPARQL11Query ;
    sd:resultFormat  <http://www.w3.org/ns/formats/SPARQL_Results_JSON> .
            
        
            
@prefix sd: <http://www.w3.org/ns/sparql-service-description#> .
<#SPARQL_XML_source> a sd:Service ;
    sd:endpoint <http://dbpedia.org/sparql/> ;
    sd:supportedLanguage sd:SPARQL11Query ;
    sd:resultFormat <http://www.w3.org/ns/formats/SPARQL_Results_XML> .
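As a rough illustration of how a client could act on such a service description, the Python sketch below builds (but does not send) a SPARQL 1.1 Protocol GET request. The `build_request` helper and the `MEDIA_TYPES` lookup table are assumptions made for this example; the table maps the two W3C format IRIs to their registered media types, and the query adds an explicit `PREFIX` declaration so that it is self-contained.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Media types corresponding to the W3C format IRIs used in sd:resultFormat.
MEDIA_TYPES = {
    "http://www.w3.org/ns/formats/SPARQL_Results_JSON": "application/sparql-results+json",
    "http://www.w3.org/ns/formats/SPARQL_Results_XML": "application/sparql-results+xml",
}

QUERY = """PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?resource ?resource_label
WHERE { ?resource rdfs:label ?resource_label }"""


def build_request(endpoint: str, result_format: str, query: str) -> Request:
    """Prepare a GET request; the result format drives the Accept header."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": MEDIA_TYPES[result_format]})


# Values copied from the JSON-flavoured service description.
request = build_request(
    "http://dbpedia.org/sparql/",
    "http://www.w3.org/ns/formats/SPARQL_Results_JSON",
    QUERY,
)
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))
```

Passing the prepared object to `urllib.request.urlopen` would perform the actual call; it is left out here so the sketch runs without network access.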
            
        
CSV on the Web description

CSV on the Web Vocabulary (CSVW) is a W3C working draft vocabulary for metadata that annotates tabular data. CSVW defines csvw:Table, which represents a table within a CSV file, and csvw:Dialect, which represents a CSV dialect and tells parsers how to parse the file referenced in the table description. An example is shown below.

        
@prefix csvw: <http://www.w3.org/ns/csvw#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
<#CSVW_source> a csvw:Table;
    csvw:url "http://rml.io/data/csvw/Airport.csv" ;
    csvw:dialect [ a csvw:Dialect;
        csvw:delimiter ";";
        csvw:encoding "UTF-8";
        csvw:header "1"^^xsd:boolean ] .
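The dialect settings can be read as parser configuration. The Python sketch below applies the delimiter, encoding and header flag from the description above to a small in-memory sample. The `read_table` helper, the fallback column names and the sample rows are hypothetical; the real content of Airport.csv is not shown in this document.

```python
import csv
import io

# Dialect settings copied from the csvw:Dialect description above.
DIALECT = {"delimiter": ";", "encoding": "UTF-8", "header": True}

# Made-up sample bytes standing in for the remote CSV file.
RAW = (
    "id;name;city\n"
    "1;Brussels Airport;Brussels\n"
    "2;Ostend-Bruges;Ostend\n"
).encode(DIALECT["encoding"])


def read_table(raw: bytes, dialect: dict) -> list:
    """Decode and split the file as the dialect prescribes; one dict per row."""
    text = raw.decode(dialect["encoding"])
    rows = list(csv.reader(io.StringIO(text), delimiter=dialect["delimiter"]))
    if dialect["header"]:
        # The first row carries the column names.
        names, rows = rows[0], rows[1:]
    else:
        # Hypothetical fallback names when the file has no header row.
        names = [f"column{i + 1}" for i in range(len(rows[0]))] if rows else []
    return [dict(zip(names, row)) for row in rows]


for record in read_table(RAW, DIALECT):
    print(record)
```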
        
    

RML Logical Source descriptions

Local file

Original data: If you want to map data stored in a local file. Access description: Provide the path to this file. An example is shown below.


<#TriplesMapLocalFile> rml:logicalSource [
    rml:source "/path/to/local/file.csv" ;
    rml:referenceFormulation ql:CSV ] .

File published in a catalog on the Web

Original data: If you want to map data published in a data catalog on the Web. Access description: Provide the distribution description of the published dataset as the data source (DCAT). An example is shown below.


<#TriplesMapLocalFile> rml:logicalSource [
    rml:source <#DCAT_source> ;
    rml:referenceFormulation ql:XPath;
    rml:iterator "/" ] .

Data from a Web API

Original data: If you want to map data published on the Web and accessed via a Web API. Access description: Provide the API description or, at least, the description of the IRI (template) as the data source (Web API). An example is shown below.


<#TriplesMapLocalFile> rml:logicalSource [
    rml:source <#API_template_source> ;
    rml:referenceFormulation ql:JSONPath;
    rml:iterator "$" ] .

Data from a database

Original data: If you want to map data stored in a database. Access description: Provide the database connectivity description as the data source (Database description). An example is shown below.


<#TriplesMapLocalFile> rml:logicalSource [
    rml:source <#DB_source> ;
    rr:sqlVersion rr:SQL2008;
    rml:query """
SELECT DEPTNO, DNAME, LOC,
       (SELECT COUNT(*) FROM EMP WHERE EMP.DEPTNO=DEPT.DEPTNO) AS STAFF
FROM DEPT; """ ] .
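To see what rows such a query hands to the mapping, the Python sketch below runs the same SELECT against a throwaway in-memory database. SQLite stands in for the MySQL database of the D2RQ description purely for convenience, and the table definitions and sample rows are invented for this illustration.

```python
import sqlite3

# The query from the logical source above, unchanged.
QUERY = """
SELECT DEPTNO, DNAME, LOC,
       (SELECT COUNT(*) FROM EMP WHERE EMP.DEPTNO=DEPT.DEPTNO) AS STAFF
FROM DEPT;
"""

# Hypothetical schema and data; SQLite replaces MySQL for this sketch only.
connection = sqlite3.connect(":memory:")
connection.row_factory = sqlite3.Row
connection.executescript("""
CREATE TABLE DEPT (DEPTNO INTEGER PRIMARY KEY, DNAME TEXT, LOC TEXT);
CREATE TABLE EMP (EMPNO INTEGER PRIMARY KEY, ENAME TEXT, DEPTNO INTEGER);
INSERT INTO DEPT VALUES (10, 'APPSERVER', 'NEW YORK'), (20, 'RESEARCH', 'BOSTON');
INSERT INTO EMP VALUES (7369, 'SMITH', 10), (7499, 'ALLEN', 10);
""")

# Each result row becomes one record; column names act as references.
rows = [dict(row) for row in connection.execute(QUERY)]
for row in rows:
    print(row)
```

Each row exposes the columns DEPTNO, DNAME, LOC and the computed STAFF count, so a department without employees still appears with a STAFF value of 0.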

Data from a SPARQL endpoint

Original data: If you want to map data that is already in RDF and exposed through a SPARQL endpoint. Access description: Provide the SPARQL service description (SPARQL). An example is shown below.


<#TriplesMapLocalFile> rml:logicalSource [
    rml:source <#SPARQL_XML_source> ;
    rml:referenceFormulation ql:XPath;
    rml:iterator "/";
    rml:query """ select distinct ?resource ?resource_label
                where { ?resource rdfs:label ?resource_label } """ ] .