Translation of Tree-processing Programs into Stream-processing Programs based on Ordered Linear Type Koichi Kodama (Tokyo Institute of Technology) Kohei Suenaga (University of Tokyo) Naoki Kobayashi (Tohoku University) Two main methods of XML processing Tree-processing Views XML documents as tree structures Manipulates tree structure of input data Trees are kept on memory in many implementations e.g., DOM API, XDuce, CDuce Stream-processing Views XML documents as stream of tokens Read/Write tokens from/to streams e.g., SAX 2015/10/1 The Second Asian Symposium on Programming Language and Systems 2 Tree-processing node leaf node 3 4 leaf 3 leaf 2 memory leaf Programmers write here disk, network <node> <leaf>2</leaf> <leaf>3</leaf> </node> 2015/10/1 The Second Asian Symposium on Programming Language and Systems <node> <leaf>3</leaf> <leaf>4</leaf> </node> 3 Stream-processing memory disk, network Programmers write here <node> <node> <leaf>2</leaf> <leaf>3</leaf> </node> 2015/10/1 The Second Asian Symposium on Programming Language and Systems <leaf>3</leaf> <leaf>4</leaf> </node> 4 Example of programs in each style Incrementing the value of each leaf Tree-processing fix f. λt. case t of leaf x → leaf (x + 1) | node x1 x2 → node (f x1) (f x2) Stream-processing fix f. λt. case read() of leaf → write(leaf); write(read() + 1) | node → write(node); f (); f () 2015/10/1 The Second Asian Symposium on Programming Language and Systems 5 Comparison of two styles Tree-processing Stream-processing Readability, writability better worse Memory efficiency worse better 2015/10/1 The Second Asian Symposium on Programming Language and Systems 6 Our goal Taking the best of both world Readability and writability of tree-processing High memory efficiency of stream-processing Approach Automatic translation of tree-processing programs into stream-processing programs 2015/10/1 The Second Asian Symposium on Programming Language and Systems 7 Key observation (1/2) In order for stream-processing to be effective, a program should access an input tree from left to right, in the depth-first order. fix f. λt. case t of leaf x → leaf (x + 1) | node x1 x2 → node (f x2) (f x1) 2015/10/1 The Second Asian Symposium on Programming Language and Systems 8 Key observation (2/2) If a program accesses an input tree in that order, then there is a simple, structure-preserving translation into an equivalent stream-processing program. fix f. λt. case t of leaf x → leaf (x + 1) | node x1 x2 → node (f x1) (f x2) 2015/10/1 fix f. λt. case read() of leaf → write(leaf); write(read() + 1) | node → write(node); f (); f () The Second Asian Symposium on Programming Language and Systems 9 Our solution Use the idea of ordered linear types [Polakow 2000] Guarantee “correct” access order by assigning ordered linear types to input trees 2015/10/1 The Second Asian Symposium on Programming Language and Systems 10 Outline Syntax of languages Type system Translation algorithm Extension Related work Conclusion 2015/10/1 The Second Asian Symposium on Programming Language and Systems 11 Outline Syntax of languages Type system Translation algorithm Extension Related work Conclusion 2015/10/1 The Second Asian Symposium on Programming Language and Systems 12 Languages Source language Call-by-value λ-calculus + primitives for binary tree processing Target language Call-by-value λ-calculus + primitives for stream processing 2015/10/1 The Second Asian Symposium on Programming Language and Systems 13 Source language M (terms) ::= i (integer) | M1 + M2 (addition) |x (variable) | λx.M (abstraction) | M1 M2 (application) | fix f. M (recursive function) | leaf M (leaf with an integer) | node M1 M2 (branch) | (case M of (case analysis) leaf x → M1 | node x1 x2 → M2) 2015/10/1 The Second Asian Symposium on Programming Language and Systems 14 Target language e (terms) ::= i | e1 + e2 | () |x |λx.e | e1 e2 | fix f. e | leaf | node | read () | write e | (case e of leaf → e1 | node → e2) 2015/10/1 (integer) (addition) (unit) (variable) (abstraction) (application) (recursive function) (tag) (stream manipulation) (case analysis) The Second Asian Symposium on Programming Language and Systems 15 Representation of trees in each language node Source language node (leaf 1) (leaf 2) Target language node; leaf; 1; leaf; 2 leaf leaf 1 2 We do not consider closing tags because we only focus on binary trees for now 2015/10/1 The Second Asian Symposium on Programming Language and Systems 16 Example of programs Incrementing the value of each leaf Source language fix f. λt. case t of leaf x → leaf (x + 1) | node x1 x2 → node (f x1) (f x2) Target language fix f. λt. case read() of leaf → write(leaf); write(read() + 1) | node → write(node); f (); f () 2015/10/1 The Second Asian Symposium on Programming Language and Systems 17 Outline Syntax of languages Type system Translation algorithm Extension Related work Conclusion 2015/10/1 The Second Asian Symposium on Programming Language and Systems 18 Type system Utilizes ordered linear type Properties of well-typed programs: access each node of the input tree exactly once in left-to-right, depth-first order do not construct trees on memory 2015/10/1 The Second Asian Symposium on Programming Language and Systems 19 Types τ (types) ::= Int (integer) | τ1→τ2 (function) | InTree (input tree) | OutTree (output tree) 2015/10/1 The Second Asian Symposium on Programming Language and Systems 20 Type judgment Γ | Δ├ M :τ (Non-ordered) type environment: Γ { x1:τ1, x2:τ2, ..., xn:τn } (τi ∈ {InTree, OutTree} ) Set of bindings Represents that each of x1, x2, ..., xn is accessed exactly once in the order Ordered linear type environment: Δ x1:InTree, x2:InTree, ..., xn:InTree Sequence of bindings 2015/10/1 The Second Asian Symposium on Programming Language and Systems 21 Example of type judgment Γ = f : InTree → OutTree Δ = x1 : InTree, x2 : InTree Γ | Δ├ node (f x1) (f x2) : OutTree Γ | Δ├ node (f x2) (f x1) : OutTree Γ | Δ├ node (f x1) (f x1) : OutTree 2015/10/1 The Second Asian Symposium on Programming Language and Systems 22 Typing rules Γ|Δ1├M1 :OutTree Γ|Δ2├M2 :OutTree (T-NODE) Γ|Δ1,Δ2├ node M1 M2 :OutTree Γ|Δ1├M :InTree Γ,x:Int|Δ2├M1 :τ Γ|x1:InTree, x2:InTree, Δ2├M2 :τ (T-CASE) Γ|Δ1,Δ2├ case M of leaf x → M1 |node x1 x2 → M2 :τ 2015/10/1 The Second Asian Symposium on Programming Language and Systems 23 Outline Syntax of languages Type system Translation algorithm Extension Related work Conclusion 2015/10/1 The Second Asian Symposium on Programming Language and Systems 24 Translation algorithm (1/2) A(i) = i A(x) = x A(M1 + M2) = A(M1) + A(M2) A(λx.M) = λx. A(M) A(M1 M2) = A(M1) A(M2) A(fix f.M) = fix f. A(M) 2015/10/1 The Second Asian Symposium on Programming Language and Systems 25 Translation algorithm (2/2) A(leaf M) = write(leaf);write(A(M)) A(node M1 M2) = write(node);A(M1);A(M2) A(case M of leaf x → M1 | node x1 x2 → M2) = (case A(M);read() of leaf → let x=read() in A(M1) | node → [()/x1,()/x2]A(M2)) 2015/10/1 The Second Asian Symposium on Programming Language and Systems 26 Correctness of the translation If φ | x : InTree├ M : OutTree (M, {x=V}) * W iff (A(M), <V>, ) * ( ( ), , <W>) input stream 2015/10/1 output stream The Second Asian Symposium on Programming Language and Systems 27 Outline Syntax of languages Type system Translation algorithm Extension Related work Conclusion 2015/10/1 The Second Asian Symposium on Programming Language and Systems 28 Extensions Introduce buffering of trees Introduce primitives for buffered tree construction and destruction Extend typing rules and translation algorithm for those primitives 2015/10/1 The Second Asian Symposium on Programming Language and Systems 29 Source language M (terms) ::= ... | mleaf M (buffered leaf) | mnode M1 M2 (buffered branch) | (mcase M of (case analysis mleaf x → M1 for buffered tree) | mnode x1 x2 → M2) 2015/10/1 The Second Asian Symposium on Programming Language and Systems 30 Target language e (terms) ::= ... | mleaf M (buffered leaf) | mnode M1 M2 (buffered branch) | (mcase M of (case analysis mleaf x → M1 for buffered tree) | mnode x1 x2 → M2) 2015/10/1 The Second Asian Symposium on Programming Language and Systems 31 Types τ (types) ::= ... | MTree Type of buffered trees (kept in non-ordered environment) 2015/10/1 The Second Asian Symposium on Programming Language and Systems 32 Example of type judgment Γ = f : MTree → OutTree Γ’ = g : InTree → OutTree Γ, x : MTree | φ├ node (f x) (f x) : OutTree Γ’ | x : InTree ├ node (g x) (g x) : OutTree 2015/10/1 The Second Asian Symposium on Programming Language and Systems 33 Typing rules Γ|Δ1├M : MTree Γ,x:Int|Δ2├M1 :τ Γ, x1:MTree ,x2:MTree | Δ2├M2 :τ (T-MCASE) Γ|Δ1,Δ2├ mcase M of mleaf x → M1 |mnode x1 x2 → M2 :τ 2015/10/1 The Second Asian Symposium on Programming Language and Systems 34 Translation algorithm A(mleaf M) = mleaf A(M) A(mnode M1 M2) = mnode A(M1) A(M2) A(mcase M of mleaf x → M1 | mnode x1 x2 → M2) = (mcase A(M) of leaf x → A(M1) | node x1 x2 → A(M2)) 2015/10/1 The Second Asian Symposium on Programming Language and Systems 35 Example of programs fix s2m. λt. case t of leaf x → mleaf x | node x1 x2 → mnode (s2m x1) (s2m x2) fix m2s. λt. mcase t of mleaf x → leaf x | mnode x1 x2 → node (m2s x1) (m2s x2) 2015/10/1 The Second Asian Symposium on Programming Language and Systems 36 Outline Syntax of languages Type system Translation algorithm Extension Related work Conclusion 2015/10/1 The Second Asian Symposium on Programming Language and Systems 37 Related work Attribute grammar based approach [Nakano et al. 2004] Translates tree-processing attribute grammars into stream-processing attribute grammars Pros: need not be conscious which part of stream to be buffered Cons: cannot specify evaluation order Introduction of side-effect may be problematic 2015/10/1 The Second Asian Symposium on Programming Language and Systems 38 Related work Deforestation [Wadler 1988] Remove intermediate trees from programs Put syntactic restrictions (treelessness) on programs Variables have to occur only once Only variables can be passed to functions Do not require order restrictions Translated programs may not suitable for stream-processing 2015/10/1 The Second Asian Symposium on Programming Language and Systems 39 Related work Ordered linear logic [Polakow 2000] Formalizes ordered linear logic Type system for memory allocation and data layout [Petersen et al. 2003] Uses ordered linear type to express spatial order on memory 2015/10/1 The Second Asian Symposium on Programming Language and Systems 40 Outline Syntax of languages Type system Translation algorithm Extension Related work Conclusion 2015/10/1 The Second Asian Symposium on Programming Language and Systems 41 Conclusion Proposed a method of translating treeprocessing programs to stream-processing ones utilizing ordered linear type 2015/10/1 The Second Asian Symposium on Programming Language and Systems 42 Future work Providing automatic insertion of buffering primitives Extension to general XML documents Dealing with rose trees Considering closing tags 2015/10/1 The Second Asian Symposium on Programming Language and Systems 43 Fin
© Copyright 2024 ExpyDoc