Here - Julia Srl

A Thread-Safe Library for Binary Decision Diagrams
Alberto Lovato
Damiano Macedonio
Fausto Spoto
Dipartimento di Informatica, Universit`
a
di Verona, Verona, Italy
Julia Srl, Verona, Italy
1 / 46
Binary Decision Diagrams
The Problem
Existing Libraries
Our Solution
Implementation
Basics
Concurrency
Parallel Benchmarks
Time
Memory
Sequential Benchmarks
2 / 46
Binary Decision Diagrams
Binary Decision Diagrams (BDDs) are compact data structures for
Boolean function representation
They are state-of-the-art structures, allowing fast manipulation of
Boolean functions
Example: BDD for the function f (x1, x2) = x1 ∨ x2
round nodes contain variables
a truth assignment defines a path
variable is false(0) → dashed edge
(leads to the low child)
variable is true(1) → solid edge (leads
to the high child)
boxes are final values for the function
3 / 46
x1
x2
0
1
The problem and our solution
4 / 46
The Context
The information flow analysis of our Julia program analyzer uses
BDDs to encode program information
Several analyses (for different information sources) can be
performed in parallel—significant data duplication between
analyses!
Indeed, statements of the program are mapped to Boolean
functions (BDDs), but only a small number of them differ when
the flow source type changes
5 / 46
The Problem
We experienced high memory consumption analyzing large
codebases, due to JavaBDD, the BDD library we used
The capability to share the memory holding duplicate BDDs
appeared to be a desirable property
We needed a better alternative to JavaBDD
6 / 46
Existing Libraries
C libraries BuDDy, CUDD, . . .
Difficult to integrate in a Java environment
Not cross-platform
Java libraries JavaBDD, JDD
Unsatisfactory time (JDD) and memory (JavaBDD)
performance
7 / 46
Our Solution
We started developing a new library for BDD manipulation. . .
. . . aiming at the best performance, as well as at thread-safety
8 / 46
Our Solution
Two words about concurrency
Single BDD operations are not easily parallelizable
So we focused on making BDD operations executing in parallel
with each other
9 / 46
Implementation
10 / 46
Implementation - Basics
Data Structures
Shared BDD all the BDDs in an application can
be considered subgraphs of a single
graph, called shared BDD; a
particular BDD is selected by
starting evaluation from its root
node
Unique table contains all the nodes of the
shared BDD, without
duplicates—implemented as a hash
table
Computed table contains results of recent BDD
computations (a cache)
11 / 46
x1 AND x2
x1 OR x2
x1
x1
x2
0
0
1
2
3
4
M
M
2
1
1
1
-1
-1
0
0
2
-1
-1
1
2
1
...
...
...
...
...
Implementation - Basics
Resize and Garbage Collection
When there is no room to store a node in the unique table, it must be
extended.
At the same time, nodes not referenced by BDD objects can be
removed by a node garbage collection (GC)
Usually a counter of incoming references is used—a node with 0
references is dead
We implemented a mark & sweep garbage collection, instead:
removing all nodes not reachable from BDD roots (cleaner
implementation)
12 / 46
Implementation - Concurrency
The BeeDeeDee Library
We wanted to minimize memory footprint, by sharing BDD
memory in order to have no duplicate subgraphs
In other words, all threads in an application should use a single
Unique Table
This object must be accessed safely: we need to synchronize
accesses to the UT
13 / 46
Implementation - Concurrency
As our first attempt, we tried to synchronize accesses at a high
level
But this made computation virtually sequential
So we worked at a lower level, trying to iteratively reduce thread
contention as much as possible
14 / 46
Implementation - Concurrency
We used two strategies:
pools of locks evenly distributed between threads (implemented
with arrays)
optimistic behaviors such as using a lock only if a condition were
(or were not) met
15 / 46
Implementation - Concurrency
x1:BDD
Thread1
:UT
x3:BDD
Thread2
or(x2)
and(x6)
x1 ∨ x2
x3 ∧ x6
Several BDD operations can use the library at the same time,
but. . .
. . . they may need to request a node from the UT
16 / 46
Implementation - Concurrency
x1:BDD
Thread1
:UT
x3:BDD
or(x2)
Thread2
and(x6)
get(1,4,1)
get(3,8,1)
node index
node index
x1 ∨ x2
x3 ∧ x6
UT.get(var, low, high) searches for a node with the given
variable number, low and high children
The node is added if it is not in the UT
17 / 46
Implementation - Concurrency
Invocations of get(...) can run in parallel, but. . .
. . . some operations can change the hash value (Resize) or the
position (GC) of a node stored in the UT!
Resize increases the size, used to compute the hash value
GC compacts the table, overwriting rows representing dead nodes
So, get(...) must not run together with these operations
18 / 46
Implementation - Concurrency
GC
x1:BDD
Thread1
:UT
or(x2)
x1 ∨ x2
Each BDD operation is mutually exclusive wrt GC
19 / 46
Implementation - Concurrency
GC
x1:BDD
Thread1
:UT
or(x2)
getGCLock()
lock for GC
x1 ∨ x2
Before using the table, every BDD operation synchronizes on a
lock taken from a pool
20 / 46
Implementation - Concurrency
GC
Low probability of contention: each thread acquires a lock
contiguous to the last acquired lock
Before starting, GC has to acquire all locks
get(...) is called only in the context of a BDD operation, so we
can prove that it cannot run in parallel with GC
21 / 46
Implementation - Concurrency
Resize
For Resize, we adopted a finer synchronization
get(...) first checks if the node already exists in the table
This happens without synchronization—an optimistic behavior
Indeed, the hash value of the node might have changed by Resize,
but if it did not, we can find the node at the same position
22 / 46
Implementation - Concurrency
Resize
If the node does not exist, get(...) synchronizes on a lock taken
from a pool—implemented with an array—in a position dependent
from the hash value of the node—and adds it
23 / 46
Implementation - Concurrency
Resize
x1:BDD
:UT
get(1,4,1)
node index
Let’s zoom in a bit on a call of get(...). . .
24 / 46
Implementation - Concurrency
Resize
x1:BDD
:UT
get(1,4,1)
getResizeLock()
lock for Resize
node index
Before extending the table, get(...) synchronizes on a lock
taken from a pool
25 / 46
Implementation - Concurrency
Resize
Low probability of contention: each lock protects a different area
of the unique table
Before starting, Resize has to acquire all locks
26 / 46
Implementation - Concurrency
Caches
A similar pattern is used by cache accesses (get/put)
We implemented caches to improve performance of these operations:
Boolean operations
Quantification
Restriction
Variable replacement
Optimistic behavior: synchronization (lock dependent on position)
occurs only when
get finds a result in the expected position
put (ComputationCache only) does not find a result in the
expected position
27 / 46
Implementation - Concurrency
Hash Update
After GC compacted the UT or Resize changed UT’s size, hash
values of nodes need to be recomputed
For big tables, this can take a significant amount of time
So, the recomputation is done in parallel by more threads, each
acting on a different region of the UT
Here, too, a pool of locks is used to reduce contention
28 / 46
Implementation - Concurrency
We verified experimentally that using pools distributes locks evenly
between threads, leading to very low thread contention
29 / 46
Benchmark Programs
30 / 46
Benchmark Programs
12-queens problem
Often used for BDD packages
comparison
Goal: to find a placement of 12
queens on a 12x12 chessboard,
so that no two queens attack
each other
It involves the construction of a
big BDD representing
constraints on queens’ position,
using simple Boolean operations
31 / 46
Benchmark Programs
Transition relation BDD construction for sequential circuits
It uses Boolean operations to
combine current state variables
V with next state variables V 0
The relation is represented by
the function
N(V , V 0 ) =
0
(v00 ⇔ f0 (V )) ∧ . . . ∧ (vn−1
⇔
fn−1 (V )),
binding n new variables to n old
variables
32 / 46
Inputs
Internal
Inputs
Combinational
Logic
Memory
Outputs
Internal
Outputs
Benchmark Programs
Information flow analysis of the JFlex JAR file
Statements in a Java program are encoded as
BDDs and combined in bigger BDDs
In contrast with the other benchmarks, flow
analysis uses replacement and quantification
along with Boolean operations
33 / 46
Parallel Benchmarks
Time
34 / 46
Parallel Benchmarks - Time
Parallel 12-queens BDD construction on an Intel i7 quad-core
with hyper-threading
Time (s)
25
20
15
1
35 / 46
2
3
4
5
6
7
Number of 12-queens instances
8
Parallel Benchmarks - Time
Time (s)
Julia analyzer, parallel flow analyses, 1-4 sources of information
100
80
60
40
20
BeeDeeDee
JavaBDD
JDD
1
2
3
Number of sources
4
For BeeDeeDee, the same table is used concurrently by all threads
For others, each thread uses its own table
JDD is not optimized for variable replacement and quantification
36 / 46
Parallel Benchmarks
Memory
37 / 46
Parallel Benchmarks - Memory
4 parallel instances of 12-queens—Table size
1,000
40
20
0
Megabytes
Millions of nodes
60
14
56
500
0
336
1,120
e
e
eeDeJavaBDD
eeDeJavaBDD
BeeD
BeeD
Perfect data duplication: each instance constructs exactly the
same BDD
38 / 46
Parallel Benchmarks - Memory
4 parallel flow analyses—Memory traces
12
12
10
Memory (Gigabytes)
Memory (Gigabytes)
11.5 GB
9.5 GB
10
8
6
4
2
0
6
4
2
0
50
100
150
200
Time (s)
BeeDeeDee
39 / 46
8
250
300
50
100
150
Time (s)
JavaBDD
200
250
300
Parallel Benchmarks - Memory
4 parallel flow analyses—Memory footprint of BDD data
structures
Millions of nodes
6
Megabytes
100
4
2
2.2
0
5.5
Dee vaBDD
Ja
ee
BeeD
40 / 46
50
0
52.8
110
e
eeDeJavaBDD
BeeD
Parallel Benchmarks
The concurrent use of BeeDeeDee does not add much time
overhead, and decreases memory usage
But how does it compare to other BDD libraries in purely
sequential problems?
41 / 46
Sequential Benchmarks
42 / 46
Sequential Benchmarks
12-queens BDD construction
31
BeeDeeDee
JavaBDD
JDD
CUDD
BuDDy
39
33
34
30
0
43 / 46
5
10
15
20
25
30
35
40
Sequential Benchmarks
Transition relation BDD construction for three ITC99 circuits
JDD
JavaBDD
BeeDeeDee
3.59
3.89
5.59
b22-1
26.71
b22
32.24
31.12
1.9
2.04
4.04
b20
0
44 / 46
5
10
15
20
25
30
35
40
Sequential Benchmarks
Even for single-threaded applications, performance is very good, at
the same level of other libraries (sometimes faster, sometimes
slower)
It seems that we succeeded in developing a fast sequential BDD
library!
45 / 46
Thank you
Our library is free for non-commercial applications (GPLv3)
Commercial license available
It can be downloaded in source and binary formats from
http://www.juliasoft.com/beedeedee
Try the Queens! Example, for 3 parallel instances of 12-queens type
time java -jar beedeedee.jar 12 12 12
Warning: battery killer! [≤ 10]-queens are fast, so you can try e.g.
time java -jar beedeedee.jar 7 8 9 10
46 / 46