A Thread-Safe Library for Binary Decision Diagrams
Alberto Lovato, Damiano Macedonio, Fausto Spoto
Dipartimento di Informatica, Università di Verona, Verona, Italy
Julia Srl, Verona, Italy

Outline
- Binary Decision Diagrams
- The Problem
- Existing Libraries
- Our Solution
- Implementation: Basics, Concurrency
- Parallel Benchmarks: Time, Memory
- Sequential Benchmarks

Binary Decision Diagrams
- Binary Decision Diagrams (BDDs) are compact data structures for representing Boolean functions
- They are the state-of-the-art structure for fast manipulation of Boolean functions
- Example: BDD for the function f(x1, x2) = x1 ∨ x2
  - round nodes contain variables
  - a truth assignment defines a path
  - variable is false (0) → dashed edge (leads to the low child)
  - variable is true (1) → solid edge (leads to the high child)
  - boxes are the final values of the function
[Figure: BDD for x1 ∨ x2, with round nodes x1 and x2 and terminal boxes 0 and 1]

The problem and our solution

The Context
- The information flow analysis of our Julia program analyzer uses BDDs to encode program information
- Several analyses (for different information sources) can be performed in parallel: this causes significant data duplication between analyses!
- Indeed, statements of the program are mapped to Boolean functions (BDDs), but only a small number of them differ when the flow source type changes

The Problem
- We experienced high memory consumption when analyzing large codebases, due to JavaBDD, the BDD library we used
- The capability to share the memory holding duplicate BDDs appeared to be a desirable property
- We needed a better alternative to JavaBDD

Existing Libraries
- C libraries: BuDDy, CUDD, ...
  - difficult to integrate in a Java environment
  - not cross-platform
- Java libraries: JavaBDD, JDD
  - unsatisfactory time (JDD) and memory (JavaBDD) performance

Our Solution
- We started developing a new library for BDD manipulation...
- ...aiming at the best performance, as well as at thread-safety

Our Solution: Two words about concurrency
- Single BDD operations are not easily parallelizable
- So we focused on making BDD operations execute in parallel with each other

Implementation

Implementation - Basics: Data Structures
- Shared BDD: all the BDDs in an application can be considered subgraphs of a single graph, called the shared BDD; a particular BDD is selected by starting evaluation from its root node
- Unique table: contains all the nodes of the shared BDD, without duplicates; implemented as a hash table
- Computed table: contains the results of recent BDD computations (a cache)
[Figure: shared BDD for x1 ∧ x2 and x1 ∨ x2, together with the corresponding unique table rows (node index, variable, low child, high child)]
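To make the shared-BDD idea concrete, here is a minimal, purely illustrative Java sketch (not BeeDeeDee's actual representation or API): nodes are rows (variable, low child, high child), the two functions above share the x2 node and the terminals, and a truth assignment selects a path by following low (dashed) or high (solid) edges.

    // Illustrative sketch only: a tiny shared BDD stored as rows (variable, low child, high child).
    final class SharedBddSketch {
        // Rows 0 and 1 are the terminal boxes "0" (false) and "1" (true);
        // the remaining rows are the inner nodes of the shared BDD.
        private static final int[][] NODES = {
            { -1, -1, -1 },   // node 0: terminal 0
            { -1, -1, -1 },   // node 1: terminal 1
            {  2,  0,  1 },   // node 2: tests x2 (low -> 0, high -> 1)
            {  1,  2,  1 },   // node 3: root of x1 ∨ x2 (x1 false -> test x2, x1 true -> 1)
            {  1,  0,  2 }    // node 4: root of x1 ∧ x2 (x1 false -> 0, x1 true -> test x2)
        };

        /** Follows low (dashed) or high (solid) edges until a terminal box is reached. */
        static boolean eval(int root, boolean[] assignment) {   // assignment[i] holds the value of x(i+1)
            int node = root;
            while (node > 1) {
                int var = NODES[node][0];
                node = assignment[var - 1] ? NODES[node][2] : NODES[node][1];
            }
            return node == 1;
        }

        public static void main(String[] args) {
            boolean[] x1FalseX2True = { false, true };
            System.out.println(eval(3, x1FalseX2True));   // x1 ∨ x2 -> true
            System.out.println(eval(4, x1FalseX2True));   // x1 ∧ x2 -> false
        }
    }

Node 2 (the test of x2) is stored once and reused by both roots: this is exactly the sharing that the unique table enforces across all BDDs of the application.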
Implementation - Basics: Resize and Garbage Collection
- When there is no room to store a node in the unique table, the table must be extended
- At the same time, nodes no longer referenced by BDD objects can be removed by a node garbage collection (GC)
- Usually a counter of incoming references is kept: a node with 0 references is dead
- We implemented a mark & sweep garbage collection instead, removing all nodes not reachable from BDD roots (a cleaner implementation)

Implementation - Concurrency: The BeeDeeDee Library
- We wanted to minimize the memory footprint by sharing BDD memory, so that there are no duplicate subgraphs
- In other words, all threads in an application should use a single unique table (UT)
- This object must be accessed safely: we need to synchronize accesses to the UT
- As our first attempt, we tried to synchronize accesses at a high level
- But this made computation virtually sequential
- So we worked at a lower level, trying to iteratively reduce thread contention as much as possible
- We used two strategies:
  - pools of locks evenly distributed between threads (implemented with arrays)
  - optimistic behaviors, such as taking a lock only if a condition is (or is not) met

[Sequence diagram: Thread1 computes x1.or(x2) while Thread2 computes x3.and(x6); both call UT.get(...), e.g. get(1,4,1) and get(3,8,1), and receive a node index]
- Several BDD operations can use the library at the same time, but...
- ...they may need to request a node from the UT
- UT.get(var, low, high) searches for a node with the given variable number, low child and high child
- The node is added if it is not in the UT
- Invocations of get(...) can run in parallel, but...
- ...some operations can change the hash value (Resize) or the position (GC) of a node stored in the UT!
  - Resize increases the table size, which is used to compute the hash value
  - GC compacts the table, overwriting rows representing dead nodes
- So, get(...) must not run together with these operations

Implementation - Concurrency: GC
- Each BDD operation is mutually exclusive with respect to GC
- Before using the table, every BDD operation synchronizes on a lock taken from a pool (getGCLock())
- Low probability of contention: each thread acquires a lock contiguous to the last one it acquired
- Before starting, GC has to acquire all the locks
- get(...) is called only in the context of a BDD operation, so we can prove that it cannot run in parallel with GC

Implementation - Concurrency: Resize
- For Resize, we adopted a finer synchronization
- get(...) first checks whether the node already exists in the table
- This check happens without synchronization: an optimistic behavior
- Indeed, the hash value of the node might have been changed by a Resize, but if it has not, we find the node at the same position
- If the node does not exist, get(...) synchronizes on a lock taken from a pool (implemented with an array), in a position that depends on the hash value of the node, and adds it
[Sequence diagram: zooming in on a call of get(1,4,1): before extending the table, get(...) synchronizes on a lock taken from the pool (getResizeLock())]
- Low probability of contention: each lock protects a different area of the unique table
- Before starting, Resize has to acquire all the locks
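The pattern just described, a pool of locks indexed by hash position plus an optimistic unsynchronized search, can be sketched as follows. This is a hypothetical illustration, not BeeDeeDee's actual code: the class and method names are invented, and Resize and GC are omitted, so the sketch simply assumes the table is large enough. Rows 0 and 1 are reserved for the terminals.

    import java.util.concurrent.atomic.AtomicInteger;
    import java.util.concurrent.atomic.AtomicIntegerArray;
    import java.util.concurrent.locks.ReentrantLock;

    final class StripedUniqueTable {
        private final int size;                                      // number of hash buckets (and node rows)
        private final AtomicIntegerArray buckets;                    // head of each bucket chain, -1 if empty
        private final AtomicIntegerArray vars, lows, highs, nexts;   // one row per node
        private final AtomicInteger nextFree = new AtomicInteger(2); // rows 0 and 1 are the terminals
        private final ReentrantLock[] locks = new ReentrantLock[64]; // the lock pool

        StripedUniqueTable(int size) {
            this.size = size;
            buckets = new AtomicIntegerArray(size);
            vars = new AtomicIntegerArray(size);
            lows = new AtomicIntegerArray(size);
            highs = new AtomicIntegerArray(size);
            nexts = new AtomicIntegerArray(size);
            for (int i = 0; i < size; i++) buckets.set(i, -1);
            for (int i = 0; i < locks.length; i++) locks[i] = new ReentrantLock();
        }

        /** Returns the index of node (var, low, high), adding it if it is not yet in the table. */
        int get(int var, int low, int high) {
            int h = Math.floorMod(31 * (31 * var + low) + high, size);
            int found = search(h, var, low, high);           // optimistic: no lock taken
            if (found >= 0)
                return found;                                // common case: the node is already there

            ReentrantLock lock = locks[h % locks.length];    // lock chosen from the pool by position
            lock.lock();
            try {
                found = search(h, var, low, high);           // re-check: another thread may have added it
                if (found >= 0)
                    return found;
                int node = nextFree.getAndIncrement();       // a real table would resize when full
                vars.set(node, var);
                lows.set(node, low);
                highs.set(node, high);
                nexts.set(node, buckets.get(h));
                buckets.set(h, node);                        // publish: the node becomes visible to readers
                return node;
            } finally {
                lock.unlock();
            }
        }

        private int search(int h, int var, int low, int high) {
            for (int n = buckets.get(h); n >= 0; n = nexts.get(n))
                if (vars.get(n) == var && lows.get(n) == low && highs.get(n) == high)
                    return n;
            return -1;
        }
    }

The unsynchronized search covers the common case of a node that already exists; the lock is taken only to add a node, and the second search under the lock guards against two threads adding the same node.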
Implementation - Concurrency: Caches
- A similar pattern is used by cache accesses (get/put)
- We implemented caches to improve the performance of these operations:
  - Boolean operations
  - quantification
  - restriction
  - variable replacement
- Optimistic behavior: synchronization (on a lock that depends on the position) occurs only when
  - get finds a result in the expected position, or
  - put (ComputationCache only) does not find a result in the expected position

Implementation - Concurrency: Hash Update
- After GC has compacted the UT, or Resize has changed the UT's size, the hash values of the nodes need to be recomputed
- For big tables, this can take a significant amount of time
- So the recomputation is done in parallel by several threads, each acting on a different region of the UT
- Here, too, a pool of locks is used to reduce contention

- We verified experimentally that using pools distributes locks evenly among threads, leading to very low thread contention
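A sketch of the same optimistic pattern for a computed table might look as follows. Again, this is illustrative only (invented names, a single lossy slot per hash, plain int arrays), not BeeDeeDee's actual cache.

    import java.util.concurrent.locks.ReentrantLock;

    final class ComputedTableSketch {
        private final int size;
        private final int[] ops, lefts, rights, results;     // one (operation, operands, result) entry per slot
        private final ReentrantLock[] locks = new ReentrantLock[32];

        ComputedTableSketch(int size) {
            this.size = size;
            ops = new int[size]; lefts = new int[size]; rights = new int[size]; results = new int[size];
            java.util.Arrays.fill(ops, -1);                  // -1 marks an empty slot
            for (int i = 0; i < locks.length; i++) locks[i] = new ReentrantLock();
        }

        /** Returns the cached result of op(left, right), or -1 on a cache miss. */
        int get(int op, int left, int right) {
            int slot = hash(op, left, right);
            if (ops[slot] != op || lefts[slot] != left || rights[slot] != right)
                return -1;                                   // optimistic miss: no lock needed
            ReentrantLock lock = locks[slot % locks.length];
            lock.lock();                                     // apparent hit: lock to re-read the entry consistently
            try {
                if (ops[slot] == op && lefts[slot] == left && rights[slot] == right)
                    return results[slot];
                return -1;
            } finally {
                lock.unlock();
            }
        }

        /** Records op(left, right) = result, overwriting whatever was in the slot. */
        void put(int op, int left, int right, int result) {
            int slot = hash(op, left, right);
            if (ops[slot] == op && lefts[slot] == left && rights[slot] == right)
                return;                                      // optimistic: the entry is already cached
            ReentrantLock lock = locks[slot % locks.length];
            lock.lock();
            try {
                ops[slot] = op; lefts[slot] = left; rights[slot] = right; results[slot] = result;
            } finally {
                lock.unlock();
            }
        }

        private int hash(int op, int left, int right) {
            return Math.floorMod(31 * (31 * op + left) + right, size);
        }
    }

Since the cache is allowed to lose entries, an unsynchronized check that turns out to be stale only costs a recomputation; a hit, however, is re-read under the position-dependent lock before it is trusted.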
Benchmark Programs

12-queens problem
- Often used for comparing BDD packages
- Goal: find a placement of 12 queens on a 12x12 chessboard so that no two queens attack each other
- It involves the construction of a big BDD representing constraints on the queens' positions, using simple Boolean operations

Transition relation BDD construction for sequential circuits
- It uses Boolean operations to combine the current state variables V with the next state variables V′
- The relation is represented by the function N(V, V′) = (v′0 ⇔ f0(V)) ∧ ... ∧ (v′n−1 ⇔ fn−1(V)), binding the n new variables to the n old variables
[Figure: sequential circuit, with combinational logic and memory elements, external and internal inputs and outputs]

Information flow analysis of the JFlex JAR file
- Statements in a Java program are encoded as BDDs and combined into bigger BDDs
- In contrast with the other benchmarks, flow analysis uses replacement and quantification along with Boolean operations

Parallel Benchmarks - Time

- Parallel 12-queens BDD construction on an Intel i7 quad-core with hyper-threading
[Chart: construction time (s), roughly between 15 and 25 s, versus the number of parallel 12-queens instances, from 1 to 8]

- Julia analyzer, parallel flow analyses, 1-4 sources of information
[Chart: analysis time (s) versus number of sources, from 1 to 4, for BeeDeeDee, JavaBDD and JDD]
- For BeeDeeDee, the same table is used concurrently by all threads
- For the others, each thread uses its own table
- JDD is not optimized for variable replacement and quantification

Parallel Benchmarks - Memory

- 4 parallel instances of 12-queens: table size
[Charts: BeeDeeDee stores about 14 million nodes in about 336 MB, JavaBDD about 56 million nodes in about 1,120 MB]
- Perfect data duplication: each instance constructs exactly the same BDD

- 4 parallel flow analyses: memory traces
[Charts: memory (GB) over time (s), up to about 300 s; BeeDeeDee peaks at about 9.5 GB, JavaBDD at about 11.5 GB]

- 4 parallel flow analyses: memory footprint of the BDD data structures
[Charts: BeeDeeDee about 2.2 million nodes and 52.8 MB, JavaBDD about 5.5 million nodes and 110 MB]

- The concurrent use of BeeDeeDee does not add much time overhead, and decreases memory usage
- But how does it compare to other BDD libraries on purely sequential problems?

Sequential Benchmarks

- 12-queens BDD construction
[Bar chart: construction times for BeeDeeDee, JavaBDD, JDD, CUDD and BuDDy, all between 30 and 39]

- Transition relation BDD construction for three ITC99 circuits
[Bar chart: construction times for JDD, JavaBDD and BeeDeeDee on b20 (about 2-4), b22-1 (about 3.5-5.6) and b22 (about 27-32)]

- Even for single-threaded applications, performance is very good, on a par with the other libraries (sometimes faster, sometimes slower)
- It seems that we succeeded in developing a fast sequential BDD library!

Thank you
- Our library is free for non-commercial applications (GPLv3); a commercial license is available
- It can be downloaded in source and binary formats from http://www.juliasoft.com/beedeedee
- Try the Queens! For example, for 3 parallel instances of 12-queens, type: time java -jar beedeedee.jar 12 12 12
- Warning: battery killer! [≤ 10]-queens are fast, so you can try e.g. time java -jar beedeedee.jar 7 8 9 10
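Finally, as a toy illustration of the sharing that the parallel benchmarks exercise, the driver below (hypothetical code, reusing the StripedUniqueTable sketch from the concurrency section rather than BeeDeeDee's real API) lets four threads build the same conjunction x1 ∧ x2 ∧ ... ∧ x12 through one shared table; since every node is stored only once, all tasks should report the same root index.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    final class SharedTableDemo {
        public static void main(String[] args) throws InterruptedException {
            StripedUniqueTable table = new StripedUniqueTable(1 << 20);   // shared by all threads
            ExecutorService pool = Executors.newFixedThreadPool(4);
            for (int t = 0; t < 4; t++)
                pool.submit(() -> {
                    int node = 1;                        // terminal "true"
                    for (int v = 12; v >= 1; v--)        // build x1 ∧ x2 ∧ ... ∧ x12 bottom-up
                        node = table.get(v, 0, node);    // x_v false -> 0, x_v true -> rest of the conjunction
                    System.out.println("root node index: " + node);
                });
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
        }
    }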