Introduction Known results Unordered documents Unambiguous labels Conclusion The Possibility Problem for Probabilistic XML Antoine Amarilli Télécom ParisTech, Paris, France June 5, 2014 1/21 Introduction Known results Unordered documents Unambiguous labels Conclusion Probabilistic XML We are unsure about the exact contents of an XML document. directory person person address ind x 0.6 John Doe 42 Foo St. mobile name 1234 name 0.8 mux 0.2 address x 42 Foo St. Jane Doe Jean Doe Semantics: probability distribution over deterministic documents. 2/21 Introduction Known results Unordered documents Unambiguous labels Conclusion Local formalisms: possible worlds semantics r ind α β a b ⇒ (1 − α)(1 − β) r α(1 − β) r (1 − α)β r αβ r a b a b 3/21 Introduction Known results Unordered documents Unambiguous labels Conclusion Local formalisms: possible worlds semantics r ind α β a b ⇒ r mux α β a b ⇒ (1 − α)(1 − β) r 1−α−β r α(1 − β) r (1 − α)β r αβ r a b a b α r β r a b 3/21 Introduction Known results Unordered documents Unambiguous labels Conclusion Local formalisms: possible worlds semantics r ind α β a b ⇒ r mux α β a b ⇒ r det a b ⇒ (1 − α)(1 − β) r 1−α−β r α(1 − β) r (1 − α)β r αβ r a b a b α r β r a b 1 r a b Caution: we impose α < 1, β < 1 in ind. 3/21 Introduction Known results Unordered documents Unambiguous labels Conclusion Event formalisms Probability distribution on events x y 0.7 0.4 r Draw events independently Edges annotated with formulae on the events Edges with false formulae are removed x ¬x ∧ y a b ⇒ mie: multivalued events (see later) ⇒ cie: conjunctions of Boolean events ⇒ fie: formulae of Boolean events 4/21 Introduction Known results Unordered documents Unambiguous labels Conclusion Possibility problem (Poss) Given: a probabilistic document D a deterministic document W Is W a possible world of D? If yes, with which probability? Diverse probabilistic formalisms, ordered and unordered Like query evaluation but: ⇒ Need inequality: “don’t collapse nodes” ⇒ Need negation: “no additional things” ⇒ Query depends on input W ⇒ Specific bounds for this Poss problem? 5/21 Introduction Known results Unordered documents Unambiguous labels Conclusion Table of contents 1 Introduction 2 Known results 3 Unordered documents 4 Unambiguous labels 5 Conclusion 6/21 Introduction Known results Unordered documents Unambiguous labels Conclusion In NP, in FP#P Guess a valuation of the events Guess a match of W in D Check that the match is realized by the valuation ⇒ Likewise, probability computation is in FP#P ⇒ Of course Poss is NP-hard for fie 7/21 Introduction Known results Unordered documents Unambiguous labels Conclusion Tractable for ordered local documents Local choices and ordered documents Possibility decision and computation are in PTIME Intuitively: match each possible subsequences of siblings dynamic algorithm for match at each level ⇒ Implied by determininstic tree automata on probabilistic XML: Cohen, Kimelfeld, and Sagiv 2009 ⇒ Assumption of order is crucial 8/21 Introduction Known results Unordered documents Unambiguous labels Conclusion Table of contents 1 Introduction 2 Known results 3 Unordered documents 4 Unambiguous labels 5 Conclusion 9/21 Introduction Known results Unordered documents Unambiguous labels Conclusion Computation is #P-hard for ind or mux 1 2 3 1 2 3 D r ⊤ 1/2 a1 1/2 a2 W r ⊤ ⊤ 1/2 1/2 a3 a2 ⊤ ⊤ ⊤ a1 a2 a3 1/2 a3 ⇒ Probability of match times 2n : number of perfect matchings ⇒ Computation is #P-hard for unordered and ind or mux 10/21 Introduction Known results Unordered documents Unambiguous labels Conclusion Decision is in PTIME for ind or mux Compute bottom-up if a node has the empty possible world Check dynamically between all nodes of D and W ⇒ Build bipartite graph based on child compatibility ⇒ Add dummy nodes for deletions of nodes that can be deleted ⇒ Check in PTIME if graph has a perfect matching D r X1 X2 ind 1/2 a 1/2 b ind 1/2 b W r ind 1/2 c G X1 X2 X3 X4 a b c X3 X4 del 11/21 Introduction Known results Unordered documents Unambiguous labels Conclusion Decision is NP-hard for any two of ind, mux, det With det, reduction from exact cover S = {Si }, Si = {sij } ∪ ∪ Is there T ⊆ S such that T = S with no dupes? D r S = {{a, b}, {a, c}, 1/2 {b}} 1/2 det a W r 1/2 det det b a c a b c b 12/21 Introduction Known results Unordered documents Unambiguous labels Conclusion Decision is NP-hard for any two of ind, mux, det (cont’d) With ind and mux, reduction from SAT F = (a ∨ b ∨ ¬c) ∧ (a ∨ c) ∧ (¬a) D r mux 1/2 ind 1/2 mux mux 1/2 1/2 1/2 1/2 1/2 ind 1/2 1/2 c1 c2 W r c3 ind ind 1/2 ind ind 1/2 1/2 c1 c2 c1 c2 c3 c3 13/21 Introduction Known results Unordered documents Unambiguous labels Conclusion Table of contents 1 Introduction 2 Known results 3 Unordered documents 4 Unambiguous labels 5 Conclusion 14/21 Introduction Known results Unordered documents Unambiguous labels Conclusion Unambiguity D is unambiguous if node labels are unique Possible refinements (unique among siblings, etc.) ⇒ There is at most one way to match W! All local models tractable (can impose order) ⇒ Can we have correlations? 15/21 Introduction Known results Unordered documents Unambiguous labels Conclusion Still NP-hard for cie F= ∧∨ i j ±xj in CNF ∧ ∧ Equivalently: i ¬ j ∓xij i ∧ D r ∧ 1 j ∓xj a1 W r ... 1 j ∓xj an ⇒ W is a possible world of D iff F is satisfiable ⇒ Decision for Poss is NP-hard 16/21 Introduction Known results Unordered documents Unambiguous labels Conclusion The mie class Var Val Prob x x x x 1 2 3 4 0.6 0.2 0.1 0.1 No conjunctions allowed y y 1 2 0.5 0.5 Intractable if ambiguous mie: Multivalued independent events Captures mux Doesn’t capture det or ind hierarchies ⇒ If non-ambiguous, do we have tractability? 17/21 Introduction Known results Unordered documents Unambiguous labels Conclusion mie tractable on non-ambiguous documents Var Val Prob x x x x 1 2 3 4 0.6 0.2 0.1 0.1 y y 1 2 0.5 0.5 D r y=2 x=1 b a c x=2 y=1 y=2 d e f g W r a b d x ̸= 2, x ̸= 1, y = 2, y ̸= 1 x ∈ {3, 4}, y ∈ {2}. ⇒ Probability 0.1. 18/21 Introduction Known results Unordered documents Unambiguous labels Conclusion Table of contents 1 Introduction 2 Known results 3 Unordered documents 4 Unambiguous labels 5 Conclusion 19/21 Introduction Known results Unordered documents Unambiguous labels Conclusion Conclusion Ordered local models are tractable Unordered local models are tractable ⇒ For decision only, and ⇒ With only mux or only ind mie is tractable on unambiguous documents Other cases are hard 20/21 Introduction Known results Unordered documents Unambiguous labels Conclusion Conclusion Ordered local models are tractable Unordered local models are tractable ⇒ For decision only, and ⇒ With only mux or only ind mie is tractable on unambiguous documents Other cases are hard ⇒ Height does not matter ⇒ Probabilities do not matter ⇒ Can we refine mie, unambiguity, mux–ind interaction? ⇒ What if D is partially ordered? 20/21 Introduction Known results Unordered documents Unambiguous labels Conclusion Conclusion Ordered local models are tractable Unordered local models are tractable ⇒ For decision only, and ⇒ With only mux or only ind mie is tractable on unambiguous documents Other cases are hard ⇒ Height does not matter ⇒ Probabilities do not matter ⇒ Can we refine mie, unambiguity, mux–ind interaction? ⇒ What if D is partially ordered? Thanks for your attention! 20/21 Additional material References References Cohen, Sara, Benny Kimelfeld, and Yehoshua Sagiv (2009). “Running tree automata on probabilistic XML”. In: Proc. PODS. ACM, pp. 227–236. 21/21
© Copyright 2024 ExpyDoc