Experience in Testing Compiler Optimizers Using Comparison Checking Masataka Sassa and Daijiro Sudo Dept. of Mathematical and Computing Sciences Tokyo Institute of Technology PLC '06 Background Compiler Optimization Improves time efficiency of object code Optimization is a program transformation Should not change the meaning of programs Algorithm error Implementation error Testing Optimizers Incorrect optimizations cause different output Difficult to identify the place of incorrect optimization Even incorrect optimizations may not change the output (seems correct at first glance) It will cause incorrect result in other untested programs PLC '06 Contribution Systematic test whether the optimizations are correct Compare the programs before and after optimization Not the proof of optimizers Confirm that the optimizers are correct in fine granularity and with relatively high reliability Values of all the left-hand sides of assignment statements, etc. are output to trace file Compare the trace file In case of error, error messages are given together with the intermediate code shown in C-style, which makes finding the cause of errors easily We found four bugs through experiments, including two unknown bugs Time for testing may be long, but we regard it as not important PLC '06 Testing Optimizers through Comparison Checking First proposed by Jaramillo [2002], but no enough experience Our paper focuses on its experience in a real development of an optimizing compiler COINS (http://coins-project.org, around 230 kloc) to test its SSA optimizers PLC '06 Outline of the method 1. Duplicate intermediate code (no. of optimization + 1) source front-end intermediate code copy interm code interm code interm code interm code interm code loop transformation loop transformation loop transformation loop transformation to SSA to SSA to SSA to SSA no opt optimize1 optimize1/2 optimize1/2/3 loop analysis loop analysis loop analysis loop analysis insert trace output insert trace output insert trace output insert trace output SSA back-trans SSA back-trans SSA back-trans SSA back-trans object code object code object code object code run run run run var info var info var info var info comparison check display error comparison check display error PLC '06 comparison check display error Static Single Assignment (SSA) Form Definitions of variables are unique in the program text. Each variable definition is distinguished by attaching a suffix. Introduce the -function at the join point where different definitions of a variable reach. a:= a:=x+y a:=a+3 b:=x+y normal form a1:=x0+y0 a2:=a1+3 b1:=x0+y0 SSA form a:= :=a normal form PLC '06 a1:= a2:= a3:= (a1,a2) :=a3 SSA form 2. Loop transformation on natural loops L0 L0 a=10 b=20 c=0 d=0 a=10 b=20 c=0 d=0 L1 L1 a>5 a>5 L2 L2 b>10 b>10 L3 L3 c=a+b c=3 d=a+b a=0 c=a+b c=3 d=a+b a=0 L4 a=1 L4 a=1 Lt L5 printf(c) L5 printf(c) PLC '06 3. SSA translation L0 a_0 =10 b_0=20 c_0 =0 d_0=0 L0 a=10 b=20 c=0 d=0 L1 a_1=(a_0,a_4) c_1=(c_0,c_4) d_1=(d_0,d_3) a_1>5 L1 a>5 L2 L2 b_0 >10 b>10 L3 L3 c=a+b c=3 d=a+b a=0 c_2=a_1+b_0 c_3=3 d_2=a_1+b_0 a_2=0 L4 a=1 L4 a_3=1 Lt Lt a_4= (a_2,a_3) c_4= (c_3,c_1) d_3= (d_2,d_1) L5 printf(c) L5 PLC '06 printf(c_1) 4. SSA form optimization (1) Loop invariant code motion L0 L0 a_0=10 b_0=20 c_0=0 d_0=0 c_3=3 a_2=0 a_3=1 a_0 =10 b_0=20 c_0 =0 d_0=0 L1 a_1= (a_0,a_4) c_1= (c_0,c_4) d_1= (d_0,d_3) a_1>5 L1 a_1= (a_0,a_4) c_1= (c_0,c_4) d_1= (d_0,d_3) a_1>5 L2 b_0 >10 L2 L3 c_2=a_1+b_0 c_3=3 d_2=a_1+b_0 a_2=0 b_0>10 L4 L3 a_3=1 c_2=a_1+b_0 d_2=a_1+b_0 Lt L4 Lt a_4= (a_2,a_3) c_4= (c_3,c_1) d_3= (d_2,d_1) a_4= (a_2,a_3) c_4= (c_3,c_1) d_3= (d_2,d_1) L5 L5 printf(c_1) PLC '06 printf(c_1) 4. SSA form optimization (2) Common subexpression elimination L0 L0 a_0=10 b_0=20 c_0=0 d_0=0 c_3=3 a_2=0 a_3=1 a_0=10 b_0=20 c_0=0 d_0=0 c_3=3 a_2=0 a_3=1 L1 L1 a_1= (a_0,a_4) c_1= (c_0,c_4) d_1= (d_0,d_3) a_1>5 a_1= (a_0,a_4) c_1= (c_0,c_4) d_1= (d_0,d_3) a_1>5 L2 L2 b_0>10 L3 c_2=a_1+b_0 d_2=a_1+b_0 b_0>10 L3 L4 c_2=a_1+b_0 d_2=c_2 Lt L4 Lt a_4= (a_2,a_3) c_4= (c_3,c_1) d_3= (d_2,d_1) a_4= (a_2,a_3) c_4= (c_3,c_1) d_3= (d_2,d_1) L5 L5 printf(c_1) PLC '06 printf(c_1) 4. SSA form optimization (3) Dead code elimination L0 L0 a_0=10 b_0=20 c_0=0 d_0=0 c_3=3 a_2=0 a_3=1 a_0=10 b_0=20 c_0=0 c_3=3 a_2=0 a_3=1 L1 L1 a_1= (a_0,a_4) c_1= (c_0,c_4) d_1= (d_0,d_3) a_1>5 a_1= (a_0,a_4) c_1= (c_0,c_4) a_1>5 L2 L2 b_0>10 L3 c_2=a_1+b_0 d_2=c_2 b_0>10 L3 L4 Lt L4 Lt a_4= (a_2,a_3) c_4= (c_3,c_1) d_3= (d_2,d_1) a_4= (a_2,a_3) c_4= (c_3,c_1) L5 L5 printf(c_1) PLC '06 printf(c_1) Outline of the method (cont.) 5. Loop analysis 6. Insertion of statements for outputting trace The value of the left-hand side variable after each assignment statement The values of both sides of relational operator before each conditional expression The values of parameters before and after each function call PLC '06 Outline of the method (cont.) 7. SSA back-translation and C-style program output ◆ Output intermediate code (in SSA form) in a C-style program 8. Execution Give each duplicated compiler the same input and execute PLC '06 8. Execution source front-end intermediate code copy interm code interm code interm code interm code interm code loop transformation loop transformation loop transformation loop transformation to SSA to SSA to SSA to SSA no opt optimize1 optimize1/2 optimize1/2/3 loop analysis loop analysis loop analysis loop analysis insert trace output insert trace output insert trace output insert trace output SSA back-trans SSA back-trans SSA back-trans SSA back-trans object code object code object code object code run run run run var info var info var info var info comparison check display error comparison check display error PLC '06 comparison check display error Example of trace file 0_1:a_0:10 0_2:b_0:20 0_3:c_0:0 0_4:d_0:0 5_1:CALL_printf_1_1_bef:3 (i) basic block number (ii) instruction number in the basic block (iii) variable name, parameters etc. (iv) value (of variable name etc.). 5_1:CALL_printf_1_1_aft:3 PLC '06 Output of intermediate code in C-style (part) PLC '06 9. Comparison Checking Result of trace (Error messages are output if the values do not match) Loop Lv.0 Loop Lv.0 0_1:a_0:10 0_2:b_0:20 0_3:c_0:0 0_4:d_0:0 5_1:CALL_printf_1_1_bef:3 5_1:CALL_printf_1_1_aft:3 Loop Lv.1 <L0> 1_1:a_1:10 1_2:c_1:0 1_3:d_1:0 1_4:JUMP_l:10 1_4:JUMP_r:5 2_1:JUMP_l:20 2_1:JUMP_r:10 3_1:c_2:30 3_2:c_3:3 3_3:d_2:30 3_4:a_2:0 t_1:a_4:0 t_2:c_4:3 t_3:d_3:30 1_1:a_1:0 1_2:c_1:3 1_3:d_1:30 1_4:JUMP_l:0 1_4:JUMP_r:5 </L0> (a) no optimization Loop Lv.0 0_1:a_0:10 0_2:b_0:20 0_3:c_0:0 0_4:d_0:0 0_5:c_3:3 0_6:a_2:0 0_7:a_3:1 5_1:CALL_printf_1_1_bef:3 5_1:CALL_printf_1_1_aft:3 Loop Lv.0 0_1:a_0:10 0_2:b_0:20 0_3:c_0:0 0_4:d_0:0 0_5:c_3:3 0_6:a_2:0 0_7:a_3:1 5_1:CALL_printf_1_1_bef:3 5_1:CALL_printf_1_1_aft:3 Loop Lv.1 Loop Lv.1 Loop Lv.1 <L0> 1_1:a_1:10 1_2:c_1:0 1_3:d_1:0 1_4:JUMP_l:10 1_4:JUMP_r:5 2_1:JUMP_l:20 2_1:JUMP_r:10 3_1:c_2:30 3_2:d_2:30 t_1:a_4:0 t_2:c_4:3 t_3:d_3:30 1_1:a_1:0 1_2:c_1:3 1_3:d_1:30 1_4:JUMP_l:0 1_4:JUMP_r:5 </L0> (b) loop invariant code motion 0_1:a_0:10 0_2:b_0:20 0_3:c_0:0 0_4:c_3:3 0_5:a_2:0 0_6:a_3:1 5_1:CALL_printf_1_1_bef:3 5_1:CALL_printf_1_1_aft:3 <L0> 1_1:a_1:10 1_2:c_1:0 1_3:d_1:0 1_4:JUMP_l:10 1_4:JUMP_r:5 2_1:JUMP_l:20 2_1:JUMP_r:10 3_1:c_2:30 3_2:d_2:30 t_1:a_4:0 t_2:c_4:3 t_3:d_3:30 1_1:a_1:0 1_2:c_1:3 1_3:d_1:30 1_4:JUMP_l:0 1_4:JUMP_r:5 </L0> PLC '06 (c) common subexpresssion elimination <L0> 1_1:a_1:10 1_2:c_1:0 1_4:JUMP_l:10 1_4:JUMP_r:5 2_1:JUMP_l:20 2_1:JUMP_r:10 t_1:a_4:0 t_2:c_4:3 1_1:a_1:0 1_2:c_1:3 1_3:JUMP_l:0 1_3:JUMP_r:5 </L0> (d) dead code elimination How to compare instructions moved by code motion optimization In loop invariant code motion and partial redundancy elimination, code is moved. The system compares instructions moved out of the loop after optimization with all instructions in the corresponding loop before optimization, by looking at the tags made by the loop analysis Ex: in comparing figures (a) and (b), c_3 and a_2 are moved out of the loop PLC '06 9. Comparison Checking Result of trace (Error messages are output if the values do not match) Loop Lv.0 Loop Lv.0 0_1:a_0:10 0_2:b_0:20 0_3:c_0:0 0_4:d_0:0 5_1:CALL_printf_1_1_bef:3 5_1:CALL_printf_1_1_aft:3 Loop Lv.1 <L0> 1_1:a_1:10 1_2:c_1:0 1_3:d_1:0 1_4:JUMP_l:10 1_4:JUMP_r:5 2_1:JUMP_l:20 2_1:JUMP_r:10 3_1:c_2:30 3_2:c_3:3 3_3:d_2:30 3_4:a_2:0 t_1:a_4:0 t_2:c_4:3 t_3:d_3:30 1_1:a_1:0 1_2:c_1:3 1_3:d_1:30 1_4:JUMP_l:0 1_4:JUMP_r:5 </L0> (a) no optimization Loop Lv.0 0_1:a_0:10 0_2:b_0:20 0_3:c_0:0 0_4:d_0:0 0_5:c_3:3 0_6:a_2:0 0_7:a_3:1 5_1:CALL_printf_1_1_bef:3 5_1:CALL_printf_1_1_aft:3 Loop Lv.0 0_1:a_0:10 0_2:b_0:20 0_3:c_0:0 0_4:d_0:0 0_5:c_3:3 0_6:a_2:0 0_7:a_3:1 5_1:CALL_printf_1_1_bef:3 5_1:CALL_printf_1_1_aft:3 Loop Lv.1 Loop Lv.1 Loop Lv.1 <L0> 1_1:a_1:10 1_2:c_1:0 1_3:d_1:0 1_4:JUMP_l:10 1_4:JUMP_r:5 2_1:JUMP_l:20 2_1:JUMP_r:10 3_1:c_2:30 3_2:d_2:30 t_1:a_4:0 t_2:c_4:3 t_3:d_3:30 1_1:a_1:0 1_2:c_1:3 1_3:d_1:30 1_4:JUMP_l:0 1_4:JUMP_r:5 </L0> (b) loop invariant code motion 0_1:a_0:10 0_2:b_0:20 0_3:c_0:0 0_4:c_3:3 0_5:a_2:0 0_6:a_3:1 5_1:CALL_printf_1_1_bef:3 5_1:CALL_printf_1_1_aft:3 <L0> 1_1:a_1:10 1_2:c_1:0 1_3:d_1:0 1_4:JUMP_l:10 1_4:JUMP_r:5 2_1:JUMP_l:20 2_1:JUMP_r:10 3_1:c_2:30 3_2:d_2:30 t_1:a_4:0 t_2:c_4:3 t_3:d_3:30 1_1:a_1:0 1_2:c_1:3 1_3:d_1:30 1_4:JUMP_l:0 1_4:JUMP_r:5 </L0> PLC '06 (c) common subexpresssion elimination <L0> 1_1:a_1:10 1_2:c_1:0 1_4:JUMP_l:10 1_4:JUMP_r:5 2_1:JUMP_l:20 2_1:JUMP_r:10 t_1:a_4:0 t_2:c_4:3 1_1:a_1:0 1_2:c_1:3 1_3:JUMP_l:0 1_3:JUMP_r:5 </L0> (d) dead code elimination How to compare instructions deleted or added by optimization We do not compare variables deleted or added by optimization. We compare only those variables that retain correspondence. Ex: in comparing figures (c) and (d), d_0, d_1, d_2, d_3, c_2 are deleted by dead code elimination, and are not compared. PLC '06 9. Comparison Checking Result of trace (Error messages are output if the values do not match) Loop Lv.0 Loop Lv.0 0_1:a_0:10 0_2:b_0:20 0_3:c_0:0 0_4:d_0:0 5_1:CALL_printf_1_1_bef:3 5_1:CALL_printf_1_1_aft:3 Loop Lv.1 <L0> 1_1:a_1:10 1_2:c_1:0 1_3:d_1:0 1_4:JUMP_l:10 1_4:JUMP_r:5 2_1:JUMP_l:20 2_1:JUMP_r:10 3_1:c_2:30 3_2:c_3:3 3_3:d_2:30 3_4:a_2:0 t_1:a_4:0 t_2:c_4:3 t_3:d_3:30 1_1:a_1:0 1_2:c_1:3 1_3:d_1:30 1_4:JUMP_l:0 1_4:JUMP_r:5 </L0> (a) no optimization Loop Lv.0 0_1:a_0:10 0_2:b_0:20 0_3:c_0:0 0_4:d_0:0 0_5:c_3:3 0_6:a_2:0 0_7:a_3:1 5_1:CALL_printf_1_1_bef:3 5_1:CALL_printf_1_1_aft:3 Loop Lv.0 0_1:a_0:10 0_2:b_0:20 0_3:c_0:0 0_4:d_0:0 0_5:c_3:3 0_6:a_2:0 0_7:a_3:1 5_1:CALL_printf_1_1_bef:3 5_1:CALL_printf_1_1_aft:3 Loop Lv.1 Loop Lv.1 Loop Lv.1 <L0> 1_1:a_1:10 1_2:c_1:0 1_3:d_1:0 1_4:JUMP_l:10 1_4:JUMP_r:5 2_1:JUMP_l:20 2_1:JUMP_r:10 3_1:c_2:30 3_2:d_2:30 t_1:a_4:0 t_2:c_4:3 t_3:d_3:30 1_1:a_1:0 1_2:c_1:3 1_3:d_1:30 1_4:JUMP_l:0 1_4:JUMP_r:5 </L0> (b) loop invariant code motion 0_1:a_0:10 0_2:b_0:20 0_3:c_0:0 0_4:c_3:3 0_5:a_2:0 0_6:a_3:1 5_1:CALL_printf_1_1_bef:3 5_1:CALL_printf_1_1_aft:3 <L0> 1_1:a_1:10 1_2:c_1:0 1_3:d_1:0 1_4:JUMP_l:10 1_4:JUMP_r:5 2_1:JUMP_l:20 2_1:JUMP_r:10 3_1:c_2:30 3_2:d_2:30 t_1:a_4:0 t_2:c_4:3 t_3:d_3:30 1_1:a_1:0 1_2:c_1:3 1_3:d_1:30 1_4:JUMP_l:0 1_4:JUMP_r:5 </L0> PLC '06 (c) common subexpresssion elimination <L0> 1_1:a_1:10 1_2:c_1:0 1_4:JUMP_l:10 1_4:JUMP_r:5 2_1:JUMP_l:20 2_1:JUMP_r:10 t_1:a_4:0 t_2:c_4:3 1_1:a_1:0 1_2:c_1:3 1_3:JUMP_l:0 1_3:JUMP_r:5 </L0> (d) dead code elimination Example trace output with errors 0_1:a:1 0_2:t_0:1 0_3:b:3 0_4:t_1:3 0_5:t_2:1 0_6:t_3:0 0_7:t_4:0 0_8:c:3 0_9:t_5:1 0_10:t_6:3 0_11:t_7:3 0_12:t_8:6 0_13:t_9:1 0_14:t_10:6 0_15:x:7 before optimization 0_1:a:1 0_2:t_0:1 0_3:b:3 0_4:t_1:3 0_5:t_2:1 0_6:t_3:0 0_7:t_4:0 0_8:c:3 0_9:t_5:1 0_10:t_6:3 0_11:t_7:0 0_12:t_8:3 0_13:t_9:1 0_14:t_10:3 0_15:x:4 Errors in t_7, t_8, t_10, x after optimization (CSEQP) PLC '06 Example of error message -----NG!----opt::LoopLevel=0,BlkID=0,InstrNumber=11,t_7:0 unopt::LoopLevel=0,BlkID=0,InstrNumber=11,t_7:3 -----------------NG!----opt::LoopLevel=0,BlkID=0,InstrNumber=12,t_8:3 unopt::LoopLevel=0,BlkID=0,InstrNumber=12,t_8:6 -----------------NG!----opt::LoopLevel=0,BlkID=0,InstrNumber=14,t_10:3 unopt::LoopLevel=0,BlkID=0,InstrNumber=14,t_10:6 -----------------NG!----opt::LoopLevel=0,BlkID=0,InstrNumber=15,x:4 unopt::LoopLevel=0,BlkID=0,InstrNumber=15, x:7 ------------- PLC '06 Experiments We used the COINS C compiler. We implemented our system in Java. Approximately 5000 lines were added, of which 2000 lines are for comparison checker. PLC '06 Tested optimizations Tested optimizations are: copy propagation, constant propagation, common subexpression elimination, CSEQP, partial redundancy elimination, loop invariant code motion, strength reduction and test replacement of induction variables, dead code elimination, removal of redundant -functions, and removal of empty basic blocks The test programs used are: 700 small test programs 181.mcf from SPEC CINT2000. PLC '06 Results We found four bugs in the COINS' SSA optimizers: Two for PRE (partial redundancy elimination) (unknown bugs) One for CSEQP (common subexpression elimination based on question propagation) One for operator strength reduction and test replacement of induction variable PLC '06 A bug in partial redundancy elimination PLC '06 False alarm False alarm is a situation where comparison checking displays errors although the optimization is correct. Major cause: Values representing addresses that are dynamically allocated at runtime. They may change if optimizations are applied. Errors concerning source level integers usually have a small number of digits. We provided an option to let the user specify the limiting number of digits for checking integer values. By specifying this limit as seven digits, the ratio of false alarm becomes approximately 5%. In practice, values of addresses that are a false alarm can be easily recognized. PLC '06 Conclusions Experience in testing compiler optimizers using comparison checking Could find bugs in optimizers Almost all program values before and after optimizations are output as a trace and checked Applied to the SSA optimizers of a real project COINS Four bugs of optimizers including two unknown bugs The false alarm can be made as low as approx. 5% Can find erroneous optimization point easily Error message indicates the critical point Intermediate code in C-style helps locate the point Quite beneficial in practice PLC '06 PLC '06 Difference with Jaramillo’s work 1. Alternate execution is not used and no breakpoints are used 2. Trace files are generated 3. The intermediate code is displayed in C-style program in the case of error 4. False alarms by pointers are well managed 5. SSA form is used 6. Other implementation issues, such as handling of natural loops, checking conditional expressions, and handling of each type of optimization are described PLC '06 Related work Method by Necula et al. Validation using symbolic execution Found bugs of gcc2.7.2.2 Method by Zuck, Pnueli et al. loop expansion, register allocation Make correspondence to computation model Confirm using validation tool Method by Okuma and Minamide Prove definition of each pass using a theorem prover Generate compiler programs automatically PLC '06 Trace file A trace of the values of variables etc. called “Variable information” is taken at execution time before and after optimization (i) basic block number (ii) instruction number in the basic block (iii) variable name etc. (iv) value (of variable name etc.). Values of left-hand side of assignment statements Values of both sides of the comparison of conditional expressions Values of parameters of function calls PLC '06 _L2: ((int)(*(((int *)(unsigned char *)&(a))))) = ((int)( 1)); divexI32_0 = ((int)(((int)(*(((int *)(unsigned char *)&(a))))))); ((int)(*(((int *)(unsigned char *)&(b))))) = ((int)(((int)(divexI32_0 + 2)))); divexI32_1 = ((int)(((int)(*(((int *)(unsigned char *)&(b))))))); divexI32_2 = ((int)(((int)(*(((int *)(unsigned char *)&(a))))))); divexI32_3 = ((int)(((int)(*(((int *)(unsigned char *)&(c))))))); divexI32_4 = ((int)(((int)(divexI32_2 * divexI32_3)))); ((int)(*(((int *)(unsigned char *)&(c))))) = ((int)(((int)(divexI32_1 + divexI32_4)))); divexI32_5 = ((int)(((int)(*(((int *)(unsigned char *)&(a))))))); divexI32_6 = ((int)(((int)(*(((int *)(unsigned char *)&(b))))))); divexI32_7 = ((int)(((int)(*(((int *)(unsigned char *)&(c))))))); divexI32_8 = ((int)(((int)(divexI32_6 + divexI32_7)))); divexI32_9 = ((int)(((int)(*(((int *)(unsigned char *)&(a))))))); divexI32_10 = ((int)(((int)(divexI32_8 * divexI32_9)))); ((int)(*(((int *)(unsigned char *)&(x))))) = ((int)(((int)(divexI32_5 + divexI32_10)))); _L2: ((int)(*(((int *)(unsigned char *)&(a))))) = ((int)( 1)); divexI32_0 = ((int)(((int)(*(((int *)(unsigned char *)&(a))))))); ((int)(*(((int *)(unsigned char *)&(b))))) = ((int)(((int)(divexI32_0 + 2)))); divexI32_1 = ((int)(((int)(*(((int *)(unsigned char *)&(b))))))); divexI32_2 = ((int)(divexI32_0)); divexI32_3 = ((int)(((int)(*(((int *)(unsigned char *)&(c))))))); divexI32_4 = ((int)(((int)(divexI32_2 * divexI32_3)))); ((int)(*(((int *)(unsigned char *)&(c))))) = ((int)(((int)(divexI32_1 + divexI32_4)))); divexI32_5 = ((int)(divexI32_0)); divexI32_6 = ((int)(divexI32_1)); divexI32_7 = ((int)(divexI32_3)); divexI32_8 = ((int)(((int)(divexI32_6 + divexI32_7)))); divexI32_9 = ((int)(divexI32_0)); divexI32_10 = ((int)(((int)(divexI32_8 * divexI32_9)))); ((int)(*(((int *)(unsigned char *)&(x))))) = ((int)(((int)(divexI32_5 + divexI32_10)))); PLC '06
© Copyright 2024 ExpyDoc