Unofficial Draft
Copyright © 2024 the document editors/authors. Text is available under the Creative Commons Attribution 4.0 International Public License; additional terms may apply.
Linked Data Event Streams (LDES) is an advanced Knowledge Graph (KG) publication specification aimed at continuous replication and synchronization of data sources, with benefits such as versioning of data entities and history retention, while providing a self-descriptive API. However, building an LDES requires a high level of expertise in the Semantic Web ecosystem. This specification provides an extension to YARRRML, a human-friendly way to configure KG generation via RML. The extension offers an easy-to-use starting point for anyone wanting to create an LDES from non-semantic data.
This is a general, full-fledged example with all options; it shows how to generate IncRML with YARRRML.
sources:
  data-source-1: a data source
  data-source-2: another data source
targets:
  # an "ordinary" target
  my-boring-target:
    access: out.nq
    type: localfile
    serialization: nquads
  # an LDES target
  my-special-target:
    access: out-ldes.nq
    type: localfile
    serialization: nquads
    ldes: # LDES specific keys
      id: https://my-ldes.org/the-one-and-only-ldes # The identifier of the Event Stream object.
      timestampPath: dcterms:created # optional, default = dcterms:created.
      versionOfPath: dcterms:isVersionOf # optional, default = dcterms:isVersionOf.
      generateImmutableIRI: false # optional, default = false. If true, turn the member subject IRI into a unique one.
mapping:
  general-mapping:
    # Here come some existing YARRRML rules
    sources: data-source-1
    graphs: ex:some-graph
    subjects:
      - value: ex:$(ObservationID)
        targets: my-boring-target
    po:
      - some-PO-mappings: fun!
    # Here's where the magic happens.
    # Specify what to do when certain changes in data are detected.
    # You can specify any combination of 'create', 'update' and 'delete' here
    changeDetection:
      # The operation to perform when a change is detected.
      # Can be `create`, `update` or `delete`.
      # In this case the explicit creation of new data objects
      create:
        # The type of operation: is the create operation explicit or not?
        # Optional.
        # `true` = explicitly advertised by the data source (default)
        # `false` = implicitly advertised by the data source
        # See more explanation in section "Change detection".
        explicit: true
        # Optional. Things that will be *added to* the current mapping for this operation
        mappingAdd:
          sources: data-source-2
          graphs: ex:a-second-graph
          subjects:
            - value: ex:$(Sensor)/$(ObservationID)
              targets: my-special-target
          po: [even more fun]
        # Optional. Things that will be removed (ignored) from the original mapping
        mappingRemove:
          subjects: [] # The empty list means "all subjects"
          graphs: []
          po: []
          sources: []
      # Here's an "implicit update" example:
      update:
        explicit: false
        # References to data attributes that trigger an update when they change.
        watchedProperties: [$(temperature)]
        mappingAdd:
          # Add a graph to the original mapping.
          graphs: ex:update
          # Add a target to the original subjects.
          subjects:
            - targets: my-special-target
        # Remove the original graphs at mapping level
        mappingRemove:
          graphs: []
      # The "delete" operation works the same.
The delete operation removes all PO maps from the generated triples map.
By adding PO maps using mappingAdd, you can create RDF that provides hints/classifications, e.g. ex:id4 ex:currentState <deleted>.
If no mappingAdd po elements are added, all original rdf:type PO maps are kept.
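As a minimal sketch of such a hint, the fragment below adds one predicate-object mapping to an explicit delete, so that every deleted member carries a state triple. The ex:value predicate, the $(value) reference and the ex:deleted IRI are illustrative assumptions, not names defined by this specification; the ~iri suffix is the regular YARRRML notation for an IRI object.

```yaml
mapping:
  general-mapping:
    sources: data-source-1
    subjects: ex:$(ObservationID)
    po:
      # Hypothetical original PO mapping; not emitted for deletes.
      - [ex:value, $(value)]
    changeDetection:
      delete:
        explicit: true
        mappingAdd:
          po:
            # Hypothetical hint triple attached to every deleted member.
            - [ex:currentState, ex:deleted~iri]
```

With this configuration a delete for ObservationID id4 would be expected to yield a triple of the form ex:id4 ex:currentState ex:deleted, assuming the processor applies mappingAdd as described above.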
Here are some brief examples.
# explicit create with default options
create:
  explicit: true

# implicit create with default options
create:
  explicit: false

# explicit create with a specific data source
create:
  explicit: true
  mappingRemove:
    sources: []
  mappingAdd:
    sources: create-source

# implicit update with properties to watch for change.
# Results in using the default `implicitUpdate` IDLab function
# to check the properties.
update:
  explicit: false
  watchedProperties: [$(temperature)]

# Custom change detection can be accomplished by just using
# functions at subjects level without specifying changeDetection.
mappings:
  general-mapping:
    subjects:
      - function: idlab-fn:implicitUpdate
        parameters:
          - [idlab-fn:watchedProperty, $(temperature)]
To model changes in data (e.g., modeling a stream of events), we introduce the changeDetection key. It specifies how to detect and act upon changes in the data of a certain mapping.
mapping:
  myMapping:
    subjects: subject mappings
    predicateObjects: predicate-object mappings
    graphs: graph mappings
    changeDetection:
      ... # details of the change detection...
How changes are detected depends on the data source: it can publish changes explicitly or implicitly.
Types of changes are create, update, and delete.
This results in six combinations to handle: each type of change can be explicit or implicit.
In YARRRML this is defined by an operation key (create, update, or delete) and a boolean explicit sub-key.
Explicit create:
changeDetection:
  create:
    explicit: true
Implicit create:
changeDetection:
  create:
    explicit: false
Explicit update:
changeDetection:
  update:
    explicit: true
Implicit update:
changeDetection:
  update:
    explicit: false
Explicit delete:
changeDetection:
  delete:
    explicit: true
Implicit delete:
changeDetection:
  delete:
    explicit: false
By default, changes are detected by checking the presence or absence of the IRI generated by the subjects mappings.
This translates to one of the IDLab functions explicitCreate, implicitCreate, explicitUpdate, implicitUpdate, explicitDelete, and implicitDelete applied to the subject mapping.
The next table illustrates what a subject mapping with change detection generates:
Run | Incoming IRI | Create e | Create i | Update e | Update i | Delete e | Delete i |
---|---|---|---|---|---|---|---|
1 | example.org/1 | example.org/1 | example.org/1 | example.org/1 | x | example.org/1 | x |
2 | example.org/2 | example.org/2 | example.org/2 | example.org/2 | x | example.org/2 | example.org/1 |
3 | example.org/1 | x | x | x | example.org/1 | x | example.org/2 |
In the first run the dataset consists of example.org/1. In the second run the dataset consists of example.org/2, and in the third run the dataset consists of example.org/1 again.
e stands for explicit, i stands for implicit.
x means the subject mapping is not executed.
Explicit and implicit create behave the same: if an IRI has not been seen yet, it is considered new and gets generated by the subject mapping.
Explicit update and explicit delete consider the incoming data as updates or deletes, respectively, and the corresponding IRIs get generated by the subject mapping. Duplicates are ignored: their IRIs have already been updated or deleted by the data source.
Implicit update treats only IRIs that have already been seen as updates; these get generated by the subject mapping. To also consider changes in data fields that are not used to generate the subject IRI, see watchedProperties.
Implicit delete considers IRIs that it no longer encounters in the next run as deleted. The subject mapping generates those deleted IRIs.
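Operations can be combined within one mapping. As a sketch, assume a source that republishes its full dataset on every run: declaring implicit create and implicit delete together would emit unseen IRIs as new members and IRIs missing from the next run as deleted ones. The mapping name sensors, the subject template and the ex:temperature predicate are illustrative assumptions.

```yaml
mapping:
  sensors: # hypothetical mapping name
    subjects: ex:sensor/$(sensorID)
    po:
      - [ex:temperature, $(temperature)]
    changeDetection:
      # IRIs not seen in earlier runs are treated as new.
      create:
        explicit: false
      # IRIs seen before but absent from the current run are treated as deleted.
      delete:
        explicit: false
```

Applied to the three runs in the table, this combination would correspond to the "Create i" and "Delete i" columns.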
Implicit changes sometimes require monitoring certain properties or attributes of the data that are not used for subject IRI generation.
For example, consider this initial dataset:
sensorID | temperature |
---|---|
1 | 15.1 |
2 | 14.9 |
We'd map this in YARRRML as an implicit create, because an update would use the same sensor IDs.
mapping:
  temperatures:
    subjects: https://thermometer.net/sensor_$(sensorID)
    po:
      - [ex:temperature, $(temperature)]
    changeDetection:
      create:
        explicit: false
Next we get an update of this data set:
sensorID | temperature |
---|---|
1 | 17 |
2 | 14.9 |
Notice that sensor 1 changes its reading while sensor 2 stays the same. We want to capture the change in sensor 1's value.
We'd map this in YARRRML as an implicit update:
mapping:
  temperatures:
    subjects: https://thermometer.net/sensor_$(sensorID)
    po:
      - [ex:temperature, $(temperature)]
    changeDetection:
      create:
        explicit: false
      update:
        explicit: false
This is not enough though: this mapping would not generate any triples, because the sensorIDs remain the same, so no change is detected in the subject IRI.
To fix this, a watchedProperties key can be added, specifying that temperature is to be monitored for changes:
mapping:
  temperatures:
    subjects: https://thermometer.net/sensor_$(sensorID)
    po:
      - [ex:temperature, $(temperature)]
    changeDetection:
      create:
        explicit: false
      update:
        explicit: false
        watchedProperties: [$(temperature)]
When processing the updated dataset the temperature change of sensor 1 will be detected and a new triple will be generated.
It is possible to act differently on different changes.
For example, changes could be written to a specific named graph or to another target, or certain predicateObject mappings could be skipped for certain changes.
This chapter describes the possibilities.
Modifications to the original mappings can be specified with the mappingAdd and mappingRemove sub-keys of each operation under changeDetection.
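As a minimal sketch of the two sub-keys working together, the fragment below writes implicit updates of the temperature mapping to an extra named graph and leaves one predicate-object mapping out of them. The ex:pressure mapping and the $(pressure) reference are illustrative assumptions added to have something to remove; ex:update is only an example graph name.

```yaml
mapping:
  temperatures:
    subjects: https://thermometer.net/sensor_$(sensorID)
    po:
      - [ex:temperature, $(temperature)]
      # Hypothetical second PO mapping, skipped for updates below.
      - [ex:pressure, $(pressure)]
    changeDetection:
      update:
        explicit: false
        watchedProperties: [$(temperature)]
        mappingAdd:
          # Updates are additionally written to this named graph.
          graphs: ex:update
        mappingRemove:
          # This PO mapping is not executed for updates.
          po: [[ex:pressure, $(pressure)]]
```

Keeping the modifications under the update operation leaves the original mapping untouched for any other operation declared alongside it.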
Suppose we process a stream of messages like these:
{
  "create": [{"fruit": "apple", "colour": "green"}, {"fruit": "orange", "colour": "orange"}],
  "update": [],
  "delete": []
}
{
  "create": [{"fruit": "melon", "colour": "yellow"}],
  "update": [{"fruit": "apple", "colour": "red"}],
  "delete": []
}
{
  "create": [{"fruit": "pear", "colour": "green"}],
  "update": [],
  "delete": [{"fruit": "apple"}, {"fruit": "melon"}]
}
All changes are explicitly advertised in separate JSON keys for every type of change (what a coincidence!).
One way of specifying this in YARRRML is to remove the original source(s) and add a different source per operation, since each operation needs a different iterator:
# Different iterators, so different sources:
sources:
  create-source: [message.json~jsonpath, $.create.*]
  update-source: [message.json~jsonpath, $.update.*]
  delete-source: [message.json~jsonpath, $.delete.*]
mappings:
  fruits: # mapping name; any name works
    s: http://fruit.org/$(fruit)
    po:
      - [ex:colour, $(colour)]
    changeDetection:
      create:
        explicit: true
        mappingAdd:
          sources: create-source
        mappingRemove:
          sources: []
      update:
        explicit: true
        mappingAdd:
          sources: update-source
        mappingRemove:
          sources: []
      delete:
        explicit: true
        mappingAdd:
          sources: delete-source
        mappingRemove:
          sources: []
A more concise way of achieving the same result is "updating" the iterator of the source mapping:
mappings:
  fruits: # mapping name; any name works
    # This source will be updated by the operations:
    sources: [message.json~jsonpath, $.*]
    s: http://fruit.org/$(fruit)
    po:
      - [ex:colour, $(colour)]
    changeDetection:
      create:
        explicit: true
        mappingAdd:
          sources:
            - iterator: $.create.*
      update:
        explicit: true
        mappingAdd:
          sources:
            - iterator: $.update.*
      delete:
        explicit: true
        mappingAdd:
          sources:
            - iterator: $.delete.*
This is what can be done with mappingAdd and mappingRemove:
changeDetection:
  create:
    mappingAdd:
      # Adds a new source when processing this `create`
      sources: reference-to-a-new-source

changeDetection:
  create:
    mappingAdd:
      # Adds a new source when processing this `create`, this time with
      # an inline source definition
      sources:
        - [data.json~jsonpath, $.*]
access key at mappingAdd/sources level:
changeDetection:
  create:
    mappingAdd:
      # Updates the original source with a new iterator:
      sources:
        - iterator: $.*
The same can be done for all other sub-keys of sources, such as delimiter, query, encoding, etc.

changeDetection:
  create:
    mappingRemove:
      # The empty list removes all sources
      sources: []

changeDetection:
  create:
    mappingRemove:
      # Removes only the referenced source
      sources:
        - reference-to-a-source
compression and delimiter):
changeDetection:
  create:
    mappingRemove:
      sources:
        - compression
        - delimiter
compression and delimiter):
changeDetection:
  create:
    mappingRemove:
      sources:
        - compression
        - delimiter
      graphs:
changeDetection:
  create:
    mappingRemove:
      predicateobjects: []
or
changeDetection:
  create:
    mappingRemove:
      po: []
changeDetection:
  create:
    mappingRemove:
      po: [["ex:pressure", "$(pressure)"]]

changeDetection:
  create:
    mappingRemove:
      subjects: ex:sensor/$(sensor)

changeDetection:
  create:
    mappingRemove:
      subjects:
        targets: []

changeDetection:
  create:
    mappingRemove:
      subjects:
        - ex:sensor/$(id) # specific subject
        - targets: [] # all targets
Not possible yet:
compression
.