BDD/ZDD を基盤とする種々の離散構造の演算処理と代数系 (algebra

Title
Author(s)
Citation
Issue Date
BDD/ZDDを基盤とする種々の離散構造の演算処理と代数
系(algebra)について
湊, 真一
2010年度科学技術振興機構ERATO湊離散構造処理系プロ
ジェクト講究録. p.148-157.
2011-06
DOI
Doc URL
http://hdl.handle.net/2115/48463
Right
Type
conference presentation
Additional
Information
File
Information
22_all.pdf
Instructions for use
Hokkaido University Collection of Scholarly and Academic Papers : HUSCAP
ERATO セミナ 2010 - No. 22
BDD/ZDD を基盤とする種々の離散構造の演算処理
と代数系 (algebra) について
湊 真一
北大情報科学研究科/JST ERATO 湊離散構造プロジェクト
2010/10/22
概要
本 ERATO プ ロ ジ ェ ク ト で は 、離 散 構 造 を 統 合 的 に 扱 う 基 本 処 理 系 と し て
BDD/ZDD を位置づけ,分野横断的な応用を持つ技術体系として再構築することを
目指して研究活動を開始している.本講演では,BDD/ZDD を基盤とする離散構造
とその演算処理の代数系 (algebra) の例として,BDD による論理関数処理系, ZDD
による組合せ集合の処理系,ZDD ベクトル表現による組合せ頻度表の演算処理系,
および Sequence BDD による系列集合の演算処理系を取り上げ、それらの代数系を
比較するとともに、今後の展望について述べる。
148
149
2010年度 ERATO湊離散構造処理系プロジェクト講究録
Background
„
Recent Topics on Decision Diagrams and
Discrete Structure Manipulation
BDD-based algorithms have been developed mainly in
VLSI logic design area. (since early 1990’s.)
„
„
„
„
„
Shin-ichi Minato
Graduate School of Information Science and Technology
Equivalence checking for combinational circuits.
Symbolic model checking for logic / behavioral designs.
Logic synthesis / optimization.
Test pattern generation.
Recently, BDDs are applied for not only VLSI design
but also for more general purposes.
„
Hokkaido University, Japan.
„
Data mining (Fast frequent itemset mining)
[Minato2005,2008][Loekito,Bailey2006]
Computation of Bayesian networks for probabilistic system
analysis.[Minato2007]
Sep. 17, 2010
Contents of this talk
„
„
„
„
„
„
„
Frequent itemset mining
ZDD vector and “Itemset-histogram algebra”
„
„
Set of sequences and ZDD
Sequence BDD and “sequence family algebra”
„
„
Sep. 17, 2010
3
Set of sequences and ZDD
Sequence BDD and “sequence family algebra”
Our project for discrete structure manipulation system
„
Shin-ichi Minato
Frequent itemset mining
ZDD vector and “Itemset-histogram algebra”
Sequence BDD for set of sequences
„
JST “ERATO” project and current status
BDD and Boolean function algebra
ZDD and “Family algebra”
Database analysis based on ZDD manipulation
„
Our project for discrete structure manipulation system
„
BDD and ZDD for discrete structure manipulation
„
Sequence BDD for set of sequences
„
„
„
BDD and Boolean function algebra
ZDD and “Family algebra”
Database analysis based on ZDD manipulation
„
„
Contents of this talk
BDD and ZDD for discrete structure manipulation
„
JST “ERATO” project and current status
Sep. 17, 2010
BDD reduction rules
Graph representation of Boolean function data.
Canonical form obtained by applying reduction rules
to a binary tree with a fixed variable ordering.
„
a
a
c
c
0
1
0
c
c
c
0
1
0
1
1
0
Binary decision tree
equivalent to truth table
Sep. 17, 2010
f
1
reduction
1
1
x
0
b
b
x
x
x
(share)
f0
f1
f0
f1
(jump)
f
Eliminate all redundant nodes.
b
Share all equivalent nodes.
1
0
Gives
Gives aa unique
unique and
and compressed
compressed representation
representation
for
for aa given
given Boolean
Boolean function
function
under
under aa fixed
fixed variable
variable ordering.
ordering.
1
Reduced Ordered BDD
Shin-ichi Minato
4
Shin-ichi Minato
BDD (Binary Decision Diagram) [Bryant86]
„
2
Shin-ichi Minato
5
Sep. 17, 2010
150
Shin-ichi Minato
6
Effect of BDD reduction rules
„
BDD-based logic operation algorithm
„
Exponential advantage can be seen in extreme cases.
„
Depends on instances, but effective for many practical ones.
„
If we generate BDDs from the binary tree:
always requires exponential time & space.
(Æ impracticable for large number of variables)
Innovative BDD synthesis algorithm
„
„
Proposed by R. Bryant in 1986.
Best cited paper for many years in EE&CS areas.
F
F and G
AND
(Reduced) BDD
BDD
BDD
BDD (Reduced)
G
(Reduced) BDD
BDD
A BDD can be constructed from the two operands of BDDs.
(Computation time is linear to BDD size.)
O(n)
O(2n)
Sep. 17, 2010
Shin-ichi Minato
7
Sep. 17, 2010
Boolean function and combinatorial itemset
a b c F
0
1
0
1
0
1
0
1
0
0
1
1
0
0
1
1
0
0
0
0
1
1
1
1
Sep. 17, 2010
Zero-suppressed BDD (ZDD) [Minato93]
„
Boolean function:
F = (a b ~c) V (~b c)
„
„
„
Eliminate all nodes whose “1-edge” directly points to 0-terminal.
Share equivalent nodes as well as ordinary BDDs.
If an item x does not appear in any itemset, the ZDD
node of x is automatically eliminated.
„
When average appearance ratio of each item is 1%, ZDDs are
more compact than ordinary BDDs, up to 100 times.
x
f
Complement set Æ logical NOT
Shin-ichi Minato
A variant of BDDs for combinatorial itemets.
Uses a new reduction rule different from ordinary BDDs.
„
0
Combinatorial itemset:
0
F = {ab, ac, c}
0
(customer’s choice)
1 Æ ab
1 Æ c „ Operations of combinatorial itemsets
can be done by BDD-based logic
1 Æ ac
operations.
0
„ Union of sets Æ logical OR
„ Intersection of sets Æ logical AND
0
„
„
9
Sep. 17, 2010
„
„
„
„
Shin-ichi Minato
0
f
Zero-suppressed reduction
10
Shin-ichi Minato
Knuth evaluated not only the data structure of ZDDs,
but more interested in the algebra on ZDDs.
㱢, {1}
P.top
P.onset(v)
P.offset(v)
P.change(v)
㺥, 㺤, 䌜
I honored to serve
proofreading of the draft
version of his article.
Knuth recommended to use
“ZDD” instead of “ZBDD.”
He reorganized ZDD
operations and named
“Family Algebra.”
2010/05, I visited Knuth’s
home and discussed the
direction of future work.
13 Sep. 17, 2010
f
f
(jump)
Algebraic operations for ZDDs
The latest Knuth’s book fascicle (Vol. 4-1) includes a
BDD section with 140 pages and 236 exercises.
In this section, Knuth used 30 pages for ZDDs,
including more than 70 exercises.
„
x
(jump)
Ordinary BDD reduction
BDDs/ZDDs in the Knuth’s book
„
8
Shin-ichi Minato
P.count
P*Q
P/Q
P%Q
Empty and singleton set. (0/1-terminal)
Returns the item-ID at the top node of P.
Selects the subset of itemsets
including or excluding v.
Switching v (add / delete) on each itemset.
Returns union, intersection, and
difference set.
Counts number of combinations in P.
Cartesian product set of P and Q.
Quotient set of P divided by Q.
Reminder set of P divided by Q.
Formerly I called this “unate cube set algebra,”
but Knuth reorganized as “Family algebra.”
11
Sep. 17, 2010
151
Shin-ichi Minato
Basic operations
(Corresponds to
Boolean algebra)
New operations
introduced by
Minato.
Useful
Usefulfor
formany
many
practical
practicalapplications.
applications.
12
Comparison of BDDs and ZDDs
Principles for performance improvement
„
„
General principles:
Data compression & Pruning search space.
Two basic techniques for data compression:
s
s
„
Direct
processing
(naïve data)
Sep. 17, 2010
compress
compressed
Shin-ichi Minato(output data)
x
= Efficient BDD/ZDD
algebraic operations.
13
„
„
„
„
BDD and Boolean function algebra
ZDD and “Family algebra”
„
„
Frequent itemset mining
ZDD vector and “Itemset-histogram algebra”
Set of sequences and ZDD
Sequence BDD and “sequence family algebra”
Our project for discrete structure manipulation system
„
„
JST “ERATO” project and current status
Sep. 17, 2010
Shin-ichi Minato
15
Sep. 17, 2010
„
Frequent itemset mining is one of the fundamental
data mining problems.
Previous work:
„
„
„
„
„
„
Apriori [Agrawal1993]
First efficient method of enumerating all frequent patterns.
Breadth-first search with dynamic programming.
Eclat [Zaki1997]
Depth-first search algorithm. Less memory consuming.
In some cases, faster than Apriori.
FP-growth [Han2000]
Depth-first search using “FP-tree,” graph-based data
structure.
„
Shin-ichi Minato
0
f
x
x
x
f0
f1
f0
f
(share)
f1
14
Tuple
abc
ab
abc
bc
ab
abc
c
abc
abc
ab
bc
Frequency threshold = 10 { b }
Frequency threshold = 8
{ ab, a, b, c }
Frequency threshold = 7
{ ab, bc, a, b, c }
Frequency threshold = 5
{abc, ab, bc, ac, a, b, c }
Frequency threshold = 1
{abc, ab, bc, ac, a, b, c }
Shin-ichi Minato
16
LCM: [Uno2003]
Output-linear time algorithm of frequent itemset mining.
ZDD: [Minato93]
A compact graph-based representation for large-scale
sets of combinations.
Generates
Generates large-scale
large-scale frequent
frequent itemsets
itemsets on
on the
the main
main
memory,
with
a
very
small
overhead
from
the
memory, with a very small overhead from the original
original LCM.
LCM.
with a theoretical bound as output linear time.
known as one of the fastest implementation.
Sep. 17, 2010
Quasi-reduced
Quasi-reducedBDD
BDD
Sub-graph sharing
(Dictionary-based)
Naïve
representation
Shin-ichi Minato
Naïve representation
(jump)
Combination of
the two techniques
LCM (Linear time Closed itemset Miner) [Uno2003]
„
x
Asymmetric reduction
“LCM over ZDDs” [Minato et al. 2008]
Existing itemset mining algorithms
„
ZDD
ZDD
Basic and well-known problem in database analysis.
Record
ID
1
2
3
4
5
6
7
8
9
10
11
Sequence BDD for set of sequences
„
etc.
Frequent itemset mining
Database analysis based on ZDD manipulation
„
f
Sep. 17, 2010
BDD and ZDD for discrete structure manipulation
„
(jump)
f
Contents of this talk
„
- Machine learning & clustering
BDD
BDD
Symmetric reduction
(input data)
decompress
compressed
- Risk analysis
- Calculation with Bayesian networks
- VLSI Logic Design
- Formal Verification
- etc.
Run-length coding
(= BDD/ZDD redundant node deletion)
Computability without decompression.
(naïve data)
- Data mining & Knowledge discovery
- Market data analysis
- Web data analysis
(Symmetric world)
0[15]
Dictionary-based coding
(= BDD/ZDD sub-graph sharing)
(Asymmetric world)
Many of real-life
problems are likely
asymmetric.
- Formal concept analysis
000000000000000
10111
10111
„
(Æ Sub-linear time and space to the number of solutions
when ZDD compression works well.)
17
Sep. 17, 2010
152
Shin-ichi Minato
18
# solutions
LCM over ZDDs: An example
LCM over ZDDs
Original LCM
The results of frequent itemsets are obtained as ZDDs
on the main memory. (not generating a file.)
„
Record
ID
1
2
3
4
5
6
7
8
9
10
11
Tuple
F
abc
ab
abc
bc
ab
abc
c
abc
abc
ab
bc
a
0
LCM over ZDDs
Freq. thres. 㱍 = 7
0
{ ab, bc, a, b, c }
c
0
b
1
b
1
c
1
1
0
0
Sep. 17, 2010
1
0
1
19
Shin-ichi Minato
Sep. 17, 2010
Performance of LCM over ZDDs
Post Processing after LCM over ZDDs
previous method (LCM-dump)
new method (LCM over ZDDs)
㪋㪇㪇
3843.06
Dataset
Dataset11
㪊㪌㪇
LCM over
ZDDs
CPU time (sec)
㪊㪇㪇
Dataset
Dataset22
㪉㪇㪇
ZDD
ZDD
?
ZDD
ZDD
LCM over
ZDDs
㪉㪌㪇
ZDD algebraic
operation
ZDD
ZDD
Distinctive
Frequent
Itemsets
Freq.
AllAll
Frequent
Itemsets
㪈㪌㪇
㪈㪇㪇
㪌㪇
„
㪇
m
20
Shin-ichi Minato
m
oo
hr
us
4D
0I
T1
0K
10
measured by a Linux PC,
Core2Duo E6600, 2.4GHz, 2GB memory.
Sep. 17, 2010
BM
We can extract distinctive itemsets by comparing
frequent itemsets for multiple sets of databases.
„
-1
ew
Vi
eb
W
S-
s t
e s e c sb - 2
ch nn um iew
co p ebV
Shin-ichi Minato
BM
W
S-
21
Various ZDD algebraic operations can be used for the
comparison of the huge number of frequent itemsets.
Sep. 17, 2010
Itemset-histograms for DB analysis
ZDD Vectors and Multi-Terminal BDD
We sometimes need to
represent not only
yes/no but also the
frequency (number of
Itemset-histogram occurrence) for each
itemset.
Itemset Freq.
„
Itemset DB
Record
ID
1
2
3
4
5
6
7
8
9
10
11
Sep. 17, 2010
Itemset
abc
ab
abc
bc
ab
abc
c
abc
abc
ab
bc
abc
5
ab
3
bc
2
c
1
„
„
„
Shin-ichi Minato
„
22
Shin-ichi Minato
„
„
The two representation can be considered.
Just a variable ordering problem of a same BDD.
This is a histogram of
itemsets if frequency of
all itemsets are
generated.
Integer-valued Sum-OfProduct form. (VSOP)
Function from
Combination to Integer.
(CtoI)
Multi-set of combinations
23
a
a
0
F0
F1
F2
0
1
a
0
1
b
1
c
1
b
1 0
0
0
0
ZDD Vector
Sep. 17, 2010
153
0
a
1
1
b
b
b
0
1
0
c
c
c
c
1
0
1
1
2
3
5
Multi-Terminal BDD (MTBDD)
Shin-ichi Minato
24
Algebra for itemset-histograms
ZDD vector for itemset-histogram
„
„
H
A ZDD distinguishes only existence of each tuple in the
transaction data. (cannot count frequency.)
We use a binary encoded method with ZDD vectors:
„
Tuple Freq.
Encode frequency numbers into m-bit binary code, and
represent each bit of combination set using a ZDD.
(଀) itemset frequency F2 F1 F0
abc
5 (101)
1
0
1
ab
3 (011)
0
1
1
bc
2 (010)
0
1
0
c
1 (001)
0
0
1
a
0
0
1
Sep. 17, 2010
0
1
b
1
1 0
0
0
F0 = {abc, ab, c}
F1 = {ab, bc}, F2 = {abc}
a
b
c
5
ab
3
F0
F1
F2
abc
bc
2
c
1
1
b
0
1
c
1
0
0
b
„
„
bcd
2
cd
1
„
„
27
„
„
Itemset-histogram
operation
LCM over
ZDDs
ZDDV
ZDDV
All Freq.
Frequent
Itemsets
Itemset-histograms
Distinctive
Itemsets
Extracting long/short frequent patterns.
Comparison of two sets of frequent patterns.
Calculating statistical data (e.g. confidence, support)
Finding disjoint sub-factors in frequent patterns.
Shin-ichi Minato
28
Sets of combinations:
„
„
„
Frequent itemset mining
ZDD vector and “Itemset-histogram algebra”
„
„
„
„
„
29
Distinguishes all finite sequences.
{㱗}, { ab, aba, bbc }, { a, aa, aaa, aaaa }, etc.
Here we exclude infinite sequences such as { a* }.
So many real-life applications.
„
JST “ERATO” project and current status
Don’t consider order and duplication of items
“abcc” and “bca” are the same.
Sets of sequences:
„
Set of sequence and ZDD
Sequence BDD and “sequence family algebra”
Shin-ichi Minato
ZDD
ZDD
Sets of sequences (sets of strings)
BDD and Boolean function algebra
ZDD and “Family algebra”
Sep. 17, 2010
?
ZDDV
ZDDV
Sep. 17, 2010
Our project for discrete structure manipulation system
„
LCM over
ZDDs
The result can be analyzed flexibly using itemsethistogram operations.
Addition of number of occurrences.
Sequence BDD for set of sequences
„
26
Shin-ichi Minato
Post Processing after LCM over ZDD(V)s
Database analysis based on ZDD manipulation
„
„
Factoring into two parts
by an item.
Attaching an item.
Sum of two histograms.
Counting lines in the table.
- Each table is compactly represented by ZDDs.
- ZDDs are shared each other in the memory.
Sep. 17, 2010
BDD and ZDD for discrete structure manipulation
„
1
3
Contents of this talk
„
c
„
d
„
„
2
1
Primitive operations:
ab
„
„
bc
„
5
„
„
3
Freq.
„
„
3
H0=H.factor0(a)
Dataset
Dataset22
Shin-ichi Minato
7
b
c
Dataset
Dataset11
Sep. 17, 2010
bc
Tuple Freq.
Basic operations of Itemset-histogram algebra
P+Q
5
Tuple
1
25
Tuple Freq.
abcd
1
Shin-ichi Minato
H1 + H0
Tuple Freq.
bc
H*d
a
H1=H.factor1(a)
Text search and indexing
Web (html/xml) data mining
Bio informatics
Sep. 17, 2010
154
Shin-ichi Minato
30
Sequence BDD (SeqBDD)
Encoded ZDDs for Sets of sequences
„
„
„
„
Pair of (Item - position) is
considered different symbol.
“aaa” Æ “a1 a2 a3”
“aba” Æ “a1 b2 a3”
Alphabet size: |㰿|
Maximum length of sequences: n
Total encoded symbols: |㰿|㬍n
Not very efficient.
„
„
Loekito, Bailey, and Pei (2009)
„
„
„
„
„
Same as ZDD reduction rule.
Only 0-edges keep variable ordering.
1-edges has no restriction.
Still unique representation for a given
set of sequences.
Each path from root to 1-terminal
corresponds to a sequence.
Many symbols needed.
We need to put a fixed maximum
length of sequences.
Sep. 17, 2010
Shin-ichi Minato
31
Sep. 17, 2010
32
Shin-ichi Minato
Basic operations of sequence family algebra
ZDD-like algebraic operations.
„
„
„
onset, offset, and push operations are different.
Other operations are almost same.
Sep. 17, 2010
Shin-ichi Minato
Suffix-DD
Sep. 17, 2010
33
„
„
„
Frequent itemset mining
ZDD vector and “Itemset-histogram algebra”
Sequence BDD for set of sequences
„
„
„
z Executed by JST
(Japan Science and Technology Agency).
z 5 projects / Year are accepted from all scientific subjects.
(Computer Science: 0 or 1 project / Year.)
z 100 projects have been accepted in 30 years.
z Each project has 5 years long, total 1 billion Yen.
about 10 PD researchers and 3 admin staffs.
BDD and Boolean function algebra
ZDD and “Family algebra”
Database analysis based on ZDD manipulation
„
„
„ Top projects for scientific research in Japan.
BDD and ZDD for discrete structure manipulation
„
Set of sequences and ZDD
Sequence BDD and “sequence family algebra”
„ This project is accepted on Oct. 2009.
z Research activities started from April 2010,
until March 2015.
Our project for discrete structure manipulation system
„
JST “ERATO” project and current status
Sep. 17, 2010
Shin-ichi Minato
34
ERATO Projects
Contents of this talk
„
Shin-ichi Minato
35
36
155
Discrete structures and applications
Technical stance of our project
Many problems solved by computers can be decomposed as a
type of discrete structures using simple primitive operations.
design
designautomation
automation
data
datamining
mining//knowledge
knowledgediscovery
discovery
fault
machinelearning
learning//classification
classification
bioinformatics
informatics machine
faultanalysis
analysis bio
constraint
constraintsatisfaction
satisfactionproblem
problem
web
webdata
dataanalysis
analysis
Our
Ourmain
main
subject
subject
Discrete structure
manipulation system
set
settheory
theory
symbolic
inductiveproof
proof
symboliclogic
logic inductive
combinatorics
probabilistictheory
theory
graphtheory
theory probabilistic
combinatorics graph
Knowledge
discovery &
data mining
Digital system
optimization &
verification
Æ Often needs a huge amount of enumerative operations.
Application-specific
technical areas.
So many
applications
Æ Great effects
for the society.
Application
(Engineering)
Our objective layer:
- Not only concept / theory, but also
efficiency of implementation.
- Beauty of simplicity / universality.
Performance
Improvement
(10–100x)
Application
(Engineering)
Discrete structure
manipulation system
(“Art” / algebraic architecture)
Basis of CS & math.
(conceptual, theoretical)
Foundational
materials for
C.S. and math.
Application
(Engineering)
Statistical
Analysis &
modeling
Computation theory
(Science / Mathematics)
37
38
Direction of our research
Primary
output Applications
in asymmetric world
Current
Data mining, Machine learning
ZDDs
Advanced searching
Location of laboratories
„ Main lab is located in Hokakido Univ.
z Center of attractive city: Sapporo.
z 300m2 space devoted for the project.
z Convenient access from the airport
and the station.
Æ Good environment for the members
to concentrate into research.
Still many applications
remains where ZDDs
would be effective.
etc.
(Combinatorial)
(Higher model)
Further outputs
- Multisets
Advanced
Advanced
- Sequences
ZDD-like
Advanced
ZDD-like
- Permutations structure
ZDD-like
structure
- Partitions
structure
- Trees, DAGs
- Networks
Develop special new
- etc.
algebraic operations.
Applications
with higher data model
Sapporo
„ Satellite labs located in Tokyo
and Osaka.
Sequence data analysis
z Collaboration with many
univ and companies.
z Connecting by highquality tele-conf.
system.
Numerical data processing
Processing of trees or
semi-structured data
Tokyo
Osaka
(Kyushu)
25
40
39
Visit to Knuth’s home
Outside of Knuth’s house
„ In the faculty’s house area of Stanford campus
on an 80 years’ lease (1968䌾2048)
„ May, 2010, during my trip to US
for attending SIAM Data Mining Conference
„ Visited Knuth’s home in Stanford Univ. campus.
He
Hehas
hasaaroom
roomininUniv.,
Univ.,but
but
he
hemainly
mainlyworks
worksininhis
hishome.
home.
z Greetings and thanks for
personal check giving
for me and our students
who found error of the Book.
z Discussion on future work
on BDD/ZDD and other
discrete structures.
No
Noeducational
educationalwork.
work.Only
Only
attends
attendsannual
annualspecial
speciallecture
lecture
And
Andweekly
weeklyfaculty’s
faculty’slunch.
lunch.
43
44
156
Knuth’s pipe-organ and his work room
Bookshelves
45
46
Material for his book
Kyoto-award medal and display frame
47
48
Discussion with Prof. Knuth
Summary
„ Focus on “discrete structure manipulation system.”
z Fundamentals for various practical applications.
„ Sequence BDD and applications.
„ Based on BDDs/ZDDs.
z He is very interested in the new data structure.
z Representing “logic” and “set,” primitive models of discrete
structures.
z We will consider higher-level algebraic structures.
„ I told him that I got too busy to manage the project
since ZDD is written in Knuth book.
„ Technical stance of the project.
z He recognizes his writing may have significant effect,
and he said:
“I’m partly responsible to make your life change.
So, let’s discuss future work of your project.”
z Knuth’s proposals have same direction as our ERATO
project: “Finding and organizing higher-level algebraic
structures based on BDD/ZDD.”
z Producing “Art” to connect Science and Engineering.
„ Two objectives of the project.
z Organizing new algebraic structures / operations.
z Providing implementation with deep knowhow.
„ Best use of ERATO framework
z Organize an active research group with synergistic effect.
z Contribution to the society to provide good research results
as well as good researchers.
49
50
157