木構造処理プログラムから

Translation of Tree-processing Programs
into Stream-processing Programs
based on Ordered Linear Type
Koichi Kodama (Tokyo Institute of Technology)
Kohei Suenaga (University of Tokyo)
Naoki Kobayashi (Tohoku University)
Two main methods of XML processing
Tree-processing
Views XML documents as tree structures
Manipulates tree structure of input data
Trees are kept on memory in many implementations
e.g., DOM API, XDuce, CDuce
Stream-processing
Views XML documents as stream of tokens
Read/Write tokens from/to streams
e.g., SAX
2015/10/1
The Second Asian Symposium on Programming
Language and Systems
2
Tree-processing
node
leaf
node
3
4
leaf 3
leaf 2
memory
leaf
Programmers
write here
disk, network
<node>
<leaf>2</leaf>
<leaf>3</leaf>
</node>
2015/10/1
The Second Asian Symposium on Programming
Language and Systems
<node>
<leaf>3</leaf>
<leaf>4</leaf>
</node>
3
Stream-processing
memory
disk, network
Programmers
write here <node>
<node>
<leaf>2</leaf>
<leaf>3</leaf>
</node>
2015/10/1
The Second Asian Symposium on Programming
Language and Systems
<leaf>3</leaf>
<leaf>4</leaf>
</node>
4
Example of programs in each style
 Incrementing the value of each leaf
 Tree-processing
fix f. λt.
case t of
leaf x → leaf (x + 1)
| node x1 x2 → node (f x1) (f x2)
 Stream-processing
fix f. λt.
case read() of
leaf → write(leaf); write(read() + 1)
| node → write(node); f (); f ()
2015/10/1
The Second Asian Symposium on Programming
Language and Systems
5
Comparison of two styles
Tree-processing
Stream-processing
Readability, writability
better
worse
Memory efficiency
worse
better
2015/10/1
The Second Asian Symposium on Programming
Language and Systems
6
Our goal
Taking the best of both world
Readability and writability of tree-processing
High memory efficiency of stream-processing
Approach
Automatic translation of tree-processing
programs into stream-processing programs
2015/10/1
The Second Asian Symposium on Programming
Language and Systems
7
Key observation (1/2)
 In order for stream-processing to be effective, a
program should access an input tree from left to
right, in the depth-first order.
fix f. λt.
case t of
leaf x → leaf (x + 1)
| node x1 x2 → node (f x2) (f x1)
2015/10/1
The Second Asian Symposium on Programming
Language and Systems
8
Key observation (2/2)
 If a program accesses an input tree in that order,
then there is a simple, structure-preserving
translation into an equivalent stream-processing
program.
fix f. λt.
case t of
leaf x →
leaf (x + 1)
| node x1 x2 →
node (f x1) (f x2)
2015/10/1
fix f. λt.
case read() of
leaf →
write(leaf);
write(read() + 1)
| node →
write(node); f (); f ()
The Second Asian Symposium on Programming
Language and Systems
9
Our solution
Use the idea of ordered linear types
[Polakow 2000]
Guarantee “correct” access order by assigning
ordered linear types to input trees
2015/10/1
The Second Asian Symposium on Programming
Language and Systems
10
Outline
Syntax of languages
Type system
Translation algorithm
Extension
Related work
Conclusion
2015/10/1
The Second Asian Symposium on Programming
Language and Systems
11
Outline
Syntax of languages
Type system
Translation algorithm
Extension
Related work
Conclusion
2015/10/1
The Second Asian Symposium on Programming
Language and Systems
12
Languages
Source language
Call-by-value λ-calculus +
primitives for binary tree processing
Target language
Call-by-value λ-calculus +
primitives for stream processing
2015/10/1
The Second Asian Symposium on Programming
Language and Systems
13
Source language
M (terms) ::= i
(integer)
| M1 + M2
(addition)
|x
(variable)
| λx.M
(abstraction)
| M1 M2
(application)
| fix f. M
(recursive function)
| leaf M
(leaf with an integer)
| node M1 M2
(branch)
| (case M of
(case analysis)
leaf x → M1
| node x1 x2 → M2)
2015/10/1
The Second Asian Symposium on Programming
Language and Systems
14
Target language
e (terms) ::= i
| e1 + e2
| ()
|x
|λx.e
| e1 e2
| fix f. e
| leaf | node
| read () | write e
| (case e of
leaf → e1
| node → e2)
2015/10/1
(integer)
(addition)
(unit)
(variable)
(abstraction)
(application)
(recursive function)
(tag)
(stream manipulation)
(case analysis)
The Second Asian Symposium on Programming
Language and Systems
15
Representation of trees in each language
node
Source language
node (leaf 1) (leaf 2)
Target language
node; leaf; 1; leaf; 2
leaf
leaf
1
2
We do not consider closing tags
because we only focus on binary trees for now
2015/10/1
The Second Asian Symposium on Programming
Language and Systems
16
Example of programs
 Incrementing the value of each leaf
 Source language
fix f. λt.
case t of
leaf x → leaf (x + 1)
| node x1 x2 → node (f x1) (f x2)
 Target language
fix f. λt.
case read() of
leaf → write(leaf); write(read() + 1)
| node → write(node); f (); f ()
2015/10/1
The Second Asian Symposium on Programming
Language and Systems
17
Outline
Syntax of languages
Type system
Translation algorithm
Extension
Related work
Conclusion
2015/10/1
The Second Asian Symposium on Programming
Language and Systems
18
Type system
Utilizes ordered linear type
Properties of well-typed programs:
access each node of the input tree exactly once
in left-to-right, depth-first order
do not construct trees on memory
2015/10/1
The Second Asian Symposium on Programming
Language and Systems
19
Types
τ (types) ::= Int
(integer)
| τ1→τ2 (function)
| InTree
(input tree)
| OutTree
(output tree)
2015/10/1
The Second Asian Symposium on Programming
Language and Systems
20
Type judgment
Γ | Δ├ M :τ
(Non-ordered) type environment: Γ
 { x1:τ1, x2:τ2, ..., xn:τn }
(τi ∈ {InTree, OutTree} )
Set of bindings
Represents that
each of x1, x2, ..., xn is
accessed exactly once
in the order
Ordered linear type environment: Δ
x1:InTree, x2:InTree, ..., xn:InTree
Sequence of bindings
2015/10/1
The Second Asian Symposium on Programming
Language and Systems
21
Example of type judgment
Γ = f : InTree → OutTree
Δ = x1 : InTree, x2 : InTree
Γ | Δ├ node (f x1) (f x2) : OutTree
Γ | Δ├ node (f x2) (f x1) : OutTree
Γ | Δ├ node (f x1) (f x1) : OutTree
2015/10/1
The Second Asian Symposium on Programming
Language and Systems
22
Typing rules
Γ|Δ1├M1 :OutTree Γ|Δ2├M2 :OutTree
(T-NODE)
Γ|Δ1,Δ2├ node M1 M2 :OutTree
Γ|Δ1├M :InTree
Γ,x:Int|Δ2├M1 :τ
Γ|x1:InTree, x2:InTree, Δ2├M2 :τ
(T-CASE)
Γ|Δ1,Δ2├ case M of leaf x → M1 |node x1 x2 → M2 :τ
2015/10/1
The Second Asian Symposium on Programming
Language and Systems
23
Outline
Syntax of languages
Type system
Translation algorithm
Extension
Related work
Conclusion
2015/10/1
The Second Asian Symposium on Programming
Language and Systems
24
Translation algorithm (1/2)
A(i) = i
A(x) = x
A(M1 + M2) = A(M1) + A(M2)
A(λx.M) = λx. A(M)
A(M1 M2) = A(M1) A(M2)
A(fix f.M) = fix f. A(M)
2015/10/1
The Second Asian Symposium on Programming
Language and Systems
25
Translation algorithm (2/2)
A(leaf M) = write(leaf);write(A(M))
A(node M1 M2) =
write(node);A(M1);A(M2)
A(case M of leaf x → M1
| node x1 x2 → M2)
= (case A(M);read() of
leaf → let x=read() in A(M1)
| node → [()/x1,()/x2]A(M2))
2015/10/1
The Second Asian Symposium on Programming
Language and Systems
26
Correctness of the translation
If φ | x : InTree├ M : OutTree
(M, {x=V}) * W
iff
(A(M), <V>, ) * ( ( ), , <W>)
input stream
2015/10/1
output stream
The Second Asian Symposium on Programming
Language and Systems
27
Outline
Syntax of languages
Type system
Translation algorithm
Extension
Related work
Conclusion
2015/10/1
The Second Asian Symposium on Programming
Language and Systems
28
Extensions
Introduce buffering of trees
Introduce primitives for buffered tree
construction and destruction
Extend typing rules and translation algorithm for
those primitives
2015/10/1
The Second Asian Symposium on Programming
Language and Systems
29
Source language
M (terms) ::= ...
| mleaf M
(buffered leaf)
| mnode M1 M2
(buffered branch)
| (mcase M of
(case analysis
mleaf x → M1
for buffered tree)
| mnode x1 x2 → M2)
2015/10/1
The Second Asian Symposium on Programming
Language and Systems
30
Target language
e (terms) ::= ...
| mleaf M
(buffered leaf)
| mnode M1 M2
(buffered branch)
| (mcase M of
(case analysis
mleaf x → M1
for buffered tree)
| mnode x1 x2 → M2)
2015/10/1
The Second Asian Symposium on Programming
Language and Systems
31
Types
τ (types) ::= ... | MTree
Type of buffered trees
(kept in non-ordered environment)
2015/10/1
The Second Asian Symposium on Programming
Language and Systems
32
Example of type judgment
Γ = f : MTree → OutTree
Γ’ = g : InTree → OutTree
Γ, x : MTree | φ├ node (f x) (f x) : OutTree
Γ’ | x : InTree ├ node (g x) (g x) : OutTree
2015/10/1
The Second Asian Symposium on Programming
Language and Systems
33
Typing rules
Γ|Δ1├M : MTree
Γ,x:Int|Δ2├M1 :τ
Γ, x1:MTree ,x2:MTree | Δ2├M2 :τ
(T-MCASE)
Γ|Δ1,Δ2├ mcase M of mleaf x → M1
|mnode x1 x2 → M2 :τ
2015/10/1
The Second Asian Symposium on Programming
Language and Systems
34
Translation algorithm
A(mleaf M) = mleaf A(M)
A(mnode M1 M2) = mnode A(M1) A(M2)
A(mcase M of mleaf x → M1
| mnode x1 x2 → M2)
= (mcase A(M) of
leaf x → A(M1)
| node x1 x2 → A(M2))
2015/10/1
The Second Asian Symposium on Programming
Language and Systems
35
Example of programs
fix s2m. λt. case t of
leaf x → mleaf x
| node x1 x2 →
mnode (s2m x1) (s2m x2)
fix m2s. λt. mcase t of
mleaf x → leaf x
| mnode x1 x2 →
node (m2s x1) (m2s x2)
2015/10/1
The Second Asian Symposium on Programming
Language and Systems
36
Outline
Syntax of languages
Type system
Translation algorithm
Extension
Related work
Conclusion
2015/10/1
The Second Asian Symposium on Programming
Language and Systems
37
Related work
Attribute grammar based approach
[Nakano et al. 2004]
Translates tree-processing attribute grammars into
stream-processing attribute grammars
Pros: need not be conscious which part of stream
to be buffered
Cons: cannot specify evaluation order
Introduction of side-effect may be problematic
2015/10/1
The Second Asian Symposium on Programming
Language and Systems
38
Related work
Deforestation [Wadler 1988]
Remove intermediate trees from programs
Put syntactic restrictions (treelessness) on
programs
Variables have to occur only once
Only variables can be passed to functions
Do not require order restrictions
Translated programs may not suitable
for stream-processing
2015/10/1
The Second Asian Symposium on Programming
Language and Systems
39
Related work
Ordered linear logic [Polakow 2000]
Formalizes ordered linear logic
Type system for memory allocation and
data layout [Petersen et al. 2003]
Uses ordered linear type to express spatial
order on memory
2015/10/1
The Second Asian Symposium on Programming
Language and Systems
40
Outline
Syntax of languages
Type system
Translation algorithm
Extension
Related work
Conclusion
2015/10/1
The Second Asian Symposium on Programming
Language and Systems
41
Conclusion
Proposed a method of translating treeprocessing programs to stream-processing
ones utilizing ordered linear type
2015/10/1
The Second Asian Symposium on Programming
Language and Systems
42
Future work
Providing automatic insertion of buffering
primitives
Extension to general XML documents
Dealing with rose trees
Considering closing tags
2015/10/1
The Second Asian Symposium on Programming
Language and Systems
43
Fin