Automated Metadata Generation for
Linked Data Generation and Publishing Workflows

Anastasia Dimou, Ghent University - iMinds

Anastasia.Dimou@ugent.be


@natadimou


Anastasia Dimou, Tom De Nies, Ruben Verborgh,
Erik Mannens, Rik Van de Walle

Not all data is intuitively represented
as Linked Data, nor ever will be

Where is Linked Data derived from?

We know the provenance of barely
36% of the RDF datasets
published on the Linked Data Cloud

M. Schmachtenberg, C. Bizer, H. Paulheim
Adoption of the Linked Data Best Practices in Different Topical Domains
ISWC 2014

However, what we know
is of questionable trustworthiness
and lacks accuracy and detail
(28% use DC or DCTerms, 0.8% use PROV)

Provenance and metadata
are manually provided by
the data publishers
(person-agents)

... and not automatically
by the applications
that generate the RDF dataset
(software-agents)

Automated self-descriptive Metadata
of the Linked Data Generation

based on the mapping rules and the
declarative data access descriptions

Linked Data Generation Workflows

Linked Data Generation

Linked Data is derived from

Declarative Data (Access) Descriptions
Examples of Vocabularies

Data Input Source - DCAT Example
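
A minimal sketch of what a DCAT description of an input source might look like. The dataset name, title, and example.org URL are placeholders assumed for illustration; only the DCAT and Dublin Core terms themselves come from the published vocabularies.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .

# Hypothetical dataset; name and title are placeholders.
<#DCAT_Dataset> a dcat:Dataset ;
    dct:title "Venues" ;
    dcat:distribution <#DCAT_Input> .

# The distribution states where the raw XML file can be retrieved.
<#DCAT_Input> a dcat:Distribution ;
    dcat:downloadURL <http://example.org/data/venues.xml> ;
    dcat:mediaType <https://www.iana.org/assignments/media-types/application/xml> .
```

Because the access details live in the description rather than in the processor's code, the same description can later be reused as provenance of the generated dataset.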

RDF Mapping Language (RML)

A. Dimou, M. Vander Sande, P. Colpaert, R. Verborgh, E. Mannens, and R. Van de Walle.
RML: A Generic Language for Integrated RDF Mappings of Heterogeneous Data.
LDOW 2014

Triples Map

Logical Source

Logical Source - Example XML data

Input Source


A. Dimou, R. Verborgh, M. Vander Sande, E. Mannens, and R. Van de Walle
Machine-Interpretable Dataset & Service Descriptions for Heterogeneous Data Access & Retrieval
SEMANTiCS 2015

Input Source - Different Vocabularies
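
To illustrate an input source described with a vocabulary other than DCAT, the sketch below assumes a SPARQL endpoint described with the W3C SPARQL Service Description vocabulary. The resource name and endpoint URL are placeholders, not taken from the slides.

```turtle
@prefix sd: <http://www.w3.org/ns/sparql-service-description#> .

# Hypothetical endpoint; the URL is a placeholder.
<#SPARQL_Input> a sd:Service ;
    sd:endpoint <http://example.org/sparql> ;
    sd:supportedLanguage sd:SPARQL11Query ;
    sd:resultFormat <http://www.w3.org/ns/formats/SPARQL_Results_XML> .
```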

Logical Source - Example
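
A possible Logical Source, sketched under the assumption that the input is an XML file described as a DCAT distribution named <#DCAT_Input>. The iterator path is a guess at the XML structure and is purely illustrative.

```turtle
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix ql:  <http://semweb.mmlab.be/ns/ql#> .

<#DCAT_LogicalSource> a rml:LogicalSource ;
    rml:source <#DCAT_Input> ;             # points to the access description
    rml:referenceFormulation ql:XPath ;    # how references into the data are written
    rml:iterator "/venues/venue" .         # assumed XML structure: one venue per iteration
```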

Triples Map

Generating RDF Terms
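
A minimal Triples Map sketch showing how RDF terms could be generated from the data. The ex: namespace, the IRI template, and the "@id" and "name" references are assumptions about the input, chosen only for illustration.

```turtle
@prefix rr:  <http://www.w3.org/ns/r2rml#> .
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix ex:  <http://example.org/ns#> .

<#XML_VenueMapping> a rr:TriplesMap ;
    rml:logicalSource <#DCAT_LogicalSource> ;
    # Subject: one IRI per venue, built from an assumed id attribute.
    rr:subjectMap [
        rr:template "http://example.org/venue/{@id}" ;
        rr:class ex:Venue
    ] ;
    # Object: a literal taken from an assumed name element.
    rr:predicateObjectMap [
        rr:predicate ex:name ;
        rr:objectMap [ rml:reference "name" ]
    ] .
```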

Generating RDF Datasets

Linked Data Generation Workflows

Mapping rules execution

Generating Provenance

<#RDF_Dataset> prov:wasGeneratedBy <#Mapping_Activity>.

Generating RDF Datasets with Provenance

<#RDF_Dataset> prov:wasGeneratedBy <#Mapping_Activity>.
<#Mapping_Activity> prov:used <#XML_VenueMapping>.

<#Mapping_Activity>
    prov:used <#XML_VenueMapping>, <#DCAT_LogicalSource>.

<#DCAT_LogicalSource> 
    prov:wasGeneratedBy <#Retrieval_Activity>.

<#Retrieval_Activity> prov:used <#DCAT_Input>.

<#RDF_Dataset> prov:wasDerivedFrom <#DCAT_Input> .
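
Put together, the statements form one provenance chain from the generated dataset back to the original input. The sketch below collects them in a single Turtle document and adds explicit PROV types. The <#RML_Processor> software agent is a hypothetical name introduced here to show how the generating application, rather than a person, could be recorded.

```turtle
@prefix prov: <http://www.w3.org/ns/prov#> .

<#RDF_Dataset> a prov:Entity ;
    prov:wasGeneratedBy <#Mapping_Activity> ;
    prov:wasDerivedFrom <#DCAT_Input> .

<#Mapping_Activity> a prov:Activity ;
    prov:used <#XML_VenueMapping>, <#DCAT_LogicalSource> ;
    prov:wasAssociatedWith <#RML_Processor> .    # hypothetical agent

<#DCAT_LogicalSource> a prov:Entity ;
    prov:wasGeneratedBy <#Retrieval_Activity> .

<#Retrieval_Activity> a prov:Activity ;
    prov:used <#DCAT_Input> ;
    prov:wasAssociatedWith <#RML_Processor> .    # hypothetical agent

<#RML_Processor> a prov:SoftwareAgent .
```

Each statement follows directly from a mapping rule or a data access description, so a processor can emit all of them without manual input.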

Metadata Generation

Metadata Generation - DCAT Example
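
One way the generated dataset itself might be described with DCAT, sketched under assumptions: the title, the distribution name, and the example.org URL are placeholders, and dct:source is used here only as a plausible link back to the input.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .

<#RDF_Dataset> a dcat:Dataset ;
    dct:title "Venues as Linked Data" ;    # placeholder title
    dct:source <#DCAT_Input> ;             # the input the dataset was generated from
    dcat:distribution <#RDF_Distribution> .

# Hypothetical distribution of the generated RDF.
<#RDF_Distribution> a dcat:Distribution ;
    dcat:downloadURL <http://example.org/data/venues.ttl> ;
    dcat:mediaType <https://www.iana.org/assignments/media-types/text/turtle> .
```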

Automated self-descriptive
Provenance and Metadata

based on RML mapping rules &
declarative data descriptions

RML.io


Anastasia.Dimou@ugent.be
@natadimou