slides - Antoine Amarilli

Introduction
Known results
Unordered documents
Unambiguous labels
Conclusion
The Possibility Problem
for Probabilistic XML
Antoine Amarilli
Télécom ParisTech, Paris, France
June 5, 2014
1/21
Introduction
Known results
Unordered documents
Unambiguous labels
Conclusion
Probabilistic XML
We are unsure about the exact contents of an XML document.
directory
person
person
address
ind
x
0.6
John Doe 42 Foo St. mobile
name
1234
name
0.8
mux
0.2
address
x
42 Foo St.
Jane Doe Jean Doe
Semantics: probability distribution over deterministic documents.
2/21
Introduction
Known results
Unordered documents
Unambiguous labels
Conclusion
Local formalisms: possible worlds semantics
r
ind
α β
a b
⇒
(1 − α)(1 − β)
r
α(1 − β)
r
(1 − α)β
r
αβ
r
a
b
a b
3/21
Introduction
Known results
Unordered documents
Unambiguous labels
Conclusion
Local formalisms: possible worlds semantics
r
ind
α β
a b
⇒
r
mux
α β
a b
⇒
(1 − α)(1 − β)
r
1−α−β
r
α(1 − β)
r
(1 − α)β
r
αβ
r
a
b
a b
α
r
β
r
a
b
3/21
Introduction
Known results
Unordered documents
Unambiguous labels
Conclusion
Local formalisms: possible worlds semantics
r
ind
α β
a b
⇒
r
mux
α β
a b
⇒
r
det
a b
⇒
(1 − α)(1 − β)
r
1−α−β
r
α(1 − β)
r
(1 − α)β
r
αβ
r
a
b
a b
α
r
β
r
a
b
1
r
a b
Caution: we impose α < 1, β < 1 in ind.
3/21
Introduction
Known results
Unordered documents
Unambiguous labels
Conclusion
Event formalisms
Probability distribution on events
x
y
0.7
0.4
r
Draw events independently
Edges annotated with formulae on the events
Edges with false formulae are removed
x
¬x ∧ y
a
b
⇒ mie: multivalued events (see later)
⇒ cie: conjunctions of Boolean events
⇒ fie: formulae of Boolean events
4/21
Introduction
Known results
Unordered documents
Unambiguous labels
Conclusion
Possibility problem (Poss)
Given:
a probabilistic document D
a deterministic document W
Is W a possible world of D?
If yes, with which probability?
Diverse probabilistic formalisms, ordered and unordered
Like query evaluation but:
⇒ Need inequality: “don’t collapse nodes”
⇒ Need negation: “no additional things”
⇒ Query depends on input W
⇒ Specific bounds for this Poss problem?
5/21
Introduction
Known results
Unordered documents
Unambiguous labels
Conclusion
Table of contents
1
Introduction
2
Known results
3
Unordered documents
4
Unambiguous labels
5
Conclusion
6/21
Introduction
Known results
Unordered documents
Unambiguous labels
Conclusion
In NP, in FP#P
Guess a valuation of the events
Guess a match of W in D
Check that the match is realized by the valuation
⇒ Likewise, probability computation is in FP#P
⇒ Of course Poss is NP-hard for fie
7/21
Introduction
Known results
Unordered documents
Unambiguous labels
Conclusion
Tractable for ordered local documents
Local choices and ordered documents
Possibility decision and computation are in PTIME
Intuitively:
match each possible subsequences of siblings
dynamic algorithm for match at each level
⇒ Implied by determininstic tree automata on probabilistic XML:
Cohen, Kimelfeld, and Sagiv 2009
⇒ Assumption of order is crucial
8/21
Introduction
Known results
Unordered documents
Unambiguous labels
Conclusion
Table of contents
1
Introduction
2
Known results
3
Unordered documents
4
Unambiguous labels
5
Conclusion
9/21
Introduction
Known results
Unordered documents
Unambiguous labels
Conclusion
Computation is #P-hard for ind or mux
1
2
3
1
2
3
D
r
⊤
1/2
a1
1/2
a2
W
r
⊤
⊤
1/2
1/2
a3
a2
⊤
⊤
⊤
a1
a2
a3
1/2
a3
⇒ Probability of match times 2n : number of perfect matchings
⇒ Computation is #P-hard for unordered and ind or mux
10/21
Introduction
Known results
Unordered documents
Unambiguous labels
Conclusion
Decision is in PTIME for ind or mux
Compute bottom-up if a node has the empty possible world
Check dynamically between all nodes of D and W
⇒ Build bipartite graph based on child compatibility
⇒ Add dummy nodes for deletions of nodes that can be deleted
⇒ Check in PTIME if graph has a perfect matching
D
r
X1
X2
ind
1/2
a
1/2
b
ind
1/2
b
W
r
ind
1/2
c
G
X1 X2
X3
X4
a
b
c
X3 X4 del
11/21
Introduction
Known results
Unordered documents
Unambiguous labels
Conclusion
Decision is NP-hard for any two of ind, mux, det
With det, reduction from exact cover
S = {Si }, Si = {sij }
∪
∪
Is there T ⊆ S such that T = S with no dupes?
D
r
S = {{a, b},
{a, c},
1/2
{b}}
1/2
det
a
W
r
1/2
det
det
b
a
c
a
b
c
b
12/21
Introduction
Known results
Unordered documents
Unambiguous labels
Conclusion
Decision is NP-hard for any two of ind, mux, det (cont’d)
With ind and mux, reduction from SAT
F = (a ∨ b ∨ ¬c) ∧ (a ∨ c) ∧ (¬a)
D
r
mux
1/2
ind
1/2
mux
mux
1/2 1/2
1/2 1/2
1/2
ind
1/2 1/2
c1 c2
W
r
c3
ind ind
1/2
ind ind
1/2 1/2
c1
c2
c1
c2
c3
c3
13/21
Introduction
Known results
Unordered documents
Unambiguous labels
Conclusion
Table of contents
1
Introduction
2
Known results
3
Unordered documents
4
Unambiguous labels
5
Conclusion
14/21
Introduction
Known results
Unordered documents
Unambiguous labels
Conclusion
Unambiguity
D is unambiguous if node labels are unique
Possible refinements (unique among siblings, etc.)
⇒ There is at most one way to match W!
All local models tractable (can impose order)
⇒ Can we have correlations?
15/21
Introduction
Known results
Unordered documents
Unambiguous labels
Conclusion
Still NP-hard for cie
F=
∧∨
i
j ±xj
in CNF
∧ ∧
Equivalently: i ¬ j ∓xij
i
∧
D
r
∧
1
j ∓xj
a1
W
r
...
1
j ∓xj
an
⇒ W is a possible world of D iff F is satisfiable
⇒ Decision for Poss is NP-hard
16/21
Introduction
Known results
Unordered documents
Unambiguous labels
Conclusion
The mie class
Var
Val
Prob
x
x
x
x
1
2
3
4
0.6
0.2
0.1
0.1
No conjunctions allowed
y
y
1
2
0.5
0.5
Intractable if ambiguous
mie: Multivalued independent events
Captures mux
Doesn’t capture det or ind hierarchies
⇒ If non-ambiguous, do we have tractability?
17/21
Introduction
Known results
Unordered documents
Unambiguous labels
Conclusion
mie tractable on non-ambiguous documents
Var
Val
Prob
x
x
x
x
1
2
3
4
0.6
0.2
0.1
0.1
y
y
1
2
0.5
0.5
D
r
y=2
x=1
b
a
c
x=2 y=1 y=2
d
e f
g
W
r
a
b
d
x ̸= 2, x ̸= 1, y = 2, y ̸= 1
x ∈ {3, 4}, y ∈ {2}.
⇒ Probability 0.1.
18/21
Introduction
Known results
Unordered documents
Unambiguous labels
Conclusion
Table of contents
1
Introduction
2
Known results
3
Unordered documents
4
Unambiguous labels
5
Conclusion
19/21
Introduction
Known results
Unordered documents
Unambiguous labels
Conclusion
Conclusion
Ordered local models are tractable
Unordered local models are tractable
⇒ For decision only, and
⇒ With only mux or only ind
mie is tractable on unambiguous documents
Other cases are hard
20/21
Introduction
Known results
Unordered documents
Unambiguous labels
Conclusion
Conclusion
Ordered local models are tractable
Unordered local models are tractable
⇒ For decision only, and
⇒ With only mux or only ind
mie is tractable on unambiguous documents
Other cases are hard
⇒ Height does not matter
⇒ Probabilities do not matter
⇒ Can we refine mie, unambiguity, mux–ind interaction?
⇒ What if D is partially ordered?
20/21
Introduction
Known results
Unordered documents
Unambiguous labels
Conclusion
Conclusion
Ordered local models are tractable
Unordered local models are tractable
⇒ For decision only, and
⇒ With only mux or only ind
mie is tractable on unambiguous documents
Other cases are hard
⇒ Height does not matter
⇒ Probabilities do not matter
⇒ Can we refine mie, unambiguity, mux–ind interaction?
⇒ What if D is partially ordered?
Thanks for your attention!
20/21
Additional material
References
References
Cohen, Sara, Benny Kimelfeld, and Yehoshua Sagiv (2009).
“Running tree automata on probabilistic XML”. In: Proc.
PODS. ACM, pp. 227–236.
21/21