1 Computer Science Department, University of Crete 2

Provenance Management for SPARQL Updates
Argyro Avgoustaki
1, 2,
Giorgos Flouris
2,
Irini Fundulaki
2
{argiro, fgeo, fundul}@ics.forth.gr
2 Institute
1 Computer
Science Department,
University of Crete
of Computer Science,
FORTH
http://www.csd.uoc.gr
http://www.ics.forth.gr
Motivation
Model Features
 suitable for encoding the triple and attribute level provenance of RDF
quadruples
 uses complex algebraic expressions.
 represents the provenance of a quadruple as a set of provenance expressions
indicating that the quadruple can be produced by different updates
 based on how and where provenance models
 supports unions of basic graph patterns
In the last few years we have witnessed an explosion in the publication of data in the form of
Linked Data. Recording the provenance information of Linked Data is an essential task in order to
effectively support trustworthiness, accountability and repeatability. In this context, our work:
 Introduces a new provenance model for SPARQL INSERT Updates
 Allows the reconstructability of SPARQL INSERT Updates from their provenance
 Provides algorithmic support via the Provenance Construction and the Update
Reconstruction algorithms
Provenance Model
quadruple
identifier
Quadruple (q)
Provenance (P)
Case
s
p
o
n
c1
c2
a
d
b
b
d
a
n1
n2
P1 = { (⊥, ⊥, ⊥) }
P2 = { ( qp1 .o (c1), qp1 .p (c1), qp1 .s (c1)) }
1
1
1
Attributes derived from const values
Attributes derived by ``copying ‘’ values
c3
a
d
b
n3
P3 = { ( qp1 .o (c1 {qp 1 .p, qp 1 .o} ⊙ {qp 1 .p, qp1 .o} c2),
1
2
1
2
2
qp11 .o (c1 {qp 1 .p, qp 1 .o} ⊙ {qp 1 .p, qp1 .o} c2),
1
2
1
2
qp 11 .p (c1 {qp 1 .p, qp 1 .o} ⊙ {qp 1 .p, qp1 .o} c2) ) }
1
2
1
2
Attributes derived by ``copying’’ values
generated via joins
c4
a
d
b
n4
P4 = { ( ( qp1 .s (c3), qp1 .p (c3), qp 1 .o (c3)) ⊕ (( qp 2 .o (c2), qp 2 .s (c2), qp 2 .p (c2)) ) }
1
1
1
1
1
1
Provenance Construction
Attributes derived by ``copying’’ values
from two different sources
Update Reconstruction
Provenance
Compatible
Updates
• Quadruple q3: (a, d, b, n3)
position
INSERT
s
p
o
{ (?o1, ?o, ?p,
WHERE { (?s,
(?o,
quad
pattern
identifier
?p, ?o,
n3) }
n1) .
?p, ?o1, n2) }
qi is assigned to
identifier ci and has
provenance Pi
*c1 and c2 result from the evaluation
1
1
of qp1 and qp2 respectively
qpins
1
qp 1
qp 21
Dataset
--------------{q1, q2}
position
q3: (a, d, b, n3)
INSERT
P3: { ( qp 1 (c1* qp 1 qp 1 ⊙ qp 1 qp 1 c2*),
{ 2 .p , 2 .s }
{
2 .o
1 .o }
1 .p ,
1
qp 1 .o (c1 { qp 1 .p , qp 11 .o } ⊙ { qp 1 .p , qp21 .s } c2 ),
1
2
1
qp 1 .p (c1 { qp 1 .p , qp 11 .o } ⊙ { qp 1 .p , qp21 .s } c2) ) }
1
Provenance Construction
Algorithm
O(n) w.r.t the update size
O(logn) w.r..t. the Dataset size
O(n) w.r.t. the result size
Future Work
1. Include FILTER and other non-monotonic operators
2. Study the SPARQL DELETE, CREATE and DROP operations, since all SPARQL
operations can be written as a combination of INSERT, DELETE, CREATE and DROP
statements
3. Implement our model to experimentally evaluate its performance and scalability for
large SPARQL updates and/or updates with a large output.
This work was partially supported by the PlanetData NoE (FP7:ICT-2009.3.4,
#257641) and DIACHRON IP (FP7:ICT-2011.4.3, #601043)
WHERE
s
{ (?x, ?y,
{ (?w, ?z,
(?y,
2
p
?z,
o
quad
pattern
identifier
?z, n3) }
qpins
?y, n1) .
1
qp 1
?x, n2) }
1
qp 2
Update Reconstruction
Algorithm
O(n) w.r.t the update size
O(n) w.r..t. the Dataset size
References
1. Theoharis, Y., Fundulaki, I., Karvounarakis, G., Christophides, V.: On
Provenance of Queries on Semantic Web Data. IEE 15(1), 31-39 (2011)
2. Buneman, P., Cheney, J., Vansummeren, S.: On the Expressiveness of
Implicit Provenance in Query and Update Languages. In: Schwentick, T.,
Suciu, D. (eds.) ICDT. vol. 4353, pp. 209-223. Springer (2007)
3. Green, T.J., Karvounarakis, G., Tannen, V.: Provenance semirings. In:
PODS. pp. 31-40. ACM (2007)