How to select superinstructions for Ruby ZAKIROV Salikh*, CHIBA Shigeru*, and SHIBAYAMA Etsuya** * Tokyo Institute of Technology, dept. of Mathematical and Computing Sciences ** Tokyo University, Information Technology Center Ruby • Dynamic language • Becoming popular recently • Numeric benchmarks 100—1000 times slower than equivalent program in C Numeric benchmarks marked in red * http://shootout.alioth.debian.org/ 2 Interpreter optimization efforts • Many techniques to optimize interpreter were proposed – Threaded interpretation – Stack top caching – Pipelining – Superinstructions Focus of this presentation • Superinstructions – Merge code of operations executed in sequence 3 Superinstructions (contrived example) PUSH: // put <imm> argument on stack stack[sp++] = *pc++; goto **pc++; ADD: // add two topmost values on stack sp--; stack[sp-1] += stack[sp]; goto **pc++; Optimizations applied PUSH_ADD: // add <imm> to stack top stack[sp-1] += *pc++; goto **pc++; PUSH_ADD: // add <imm> to stack top stack[sp++] = *pc++; //goto **pc++; sp--; stack[sp-1] += stack[sp]; goto **pc++; Dispatch eliminated 4 Superinstructions (effects) • Effects 1. Reduce dispatch overhead a. Eliminate some jumps b. Provide more context for indirect branch predictorby replicating indirect jump instructions 2. Allow more optimizations within VM op 5 Good for reducing dispatch overhead Superinstructions help when: • VM operations are small (~10 hwop/vmop) • Dispatch overhead is high (~50%) Examples of successful use in prior research • ANSI C interpreter: 2-3 times improvement (Proebsting 1995) • Ocaml: more than 50% improvement (Piumarta 1998) • Forth: 20-80% improvement (Ertl 2003) 6 Ruby does not fit well Superinstructions help when: • VM operations are small (~10 hwop/vmop) • Dispatch overhead is high (~50%) Only 1-3% BUT misprediction overhead Hardware profiling data on Intel Core 2 Duo on interpreter dispatch 60-140 hardware ops per VM op 7 Superinstructions for Ruby • We experimentally evaluated effect of “naive” superinstructions on Ruby – Superinstructions are selected statically – Frequently occurring in training run combinations of length 2 selected as superinstructions – Training run uses the same benchmark – Superinstructions constructed by concatenating C source code, C compiler optimizations applied 8 Naive superinstructions effect on Ruby Limited benefit 4 benchmarks Normalized execution time Number of superinstructions used Unpredictable effects 9 Branch mispredictions 2 benchmarks: mandelbrot and spectral_norm Normalized execution time Number of superinstructions used 10 Branch mispredictions, reordered 2 benchmarks: mandelbrot and spectral_norm Normalized execution time Number of superinstructions used, reordered by execution time 11 So why Ruby is slow? • Profile of numeric benchmarks Garbage collection takes significant time Boxed floating point values dominate allocation 12 Floating point value boxing Typical Ruby 1.9 VM operation OPT_PLUS: New “box” object is allocated VALUE a = *(sp-2); on each operation VALUE b = *(sp-1); /* ... */ if (CLASS_OF(a) == Float && CLASS_OF(b) == Float) { sp--; *(sp-1) = NEW_FLOAT(DOUBLE_VALUE(a) + DOUBLE_VALUE(b)); } else { CALL(1/*argnum*/, PLUS, a); } goto **pc++; 13 Proposal: use superinstructions for boxing optimization • 2 operation per allocation instead of 1 Boxing of intermediate result eliminated OPT_MULT_OPT_PLUS: VALUE a = *(sp-3); VALUE b = *(sp-2); VALUE c = *(sp-1); /* ... */ if (CLASS_OF(a) == Float && CLASS_OF(b) == Float && CLASS_OF(c) == Float) { sp-=2; *(sp-1) = NEW_FLOAT(DOUBLE_VALUE(a) + DOUBLE_VALUE(b)*DOUBLE_VALUE(c)); } else { CALL(1/*argnum*/, MULT/*method*/, b/*receiver*/); CALL(1/*argnum*/, PLUS/*method*/, a/*receiver*/); } goto **pc++; 14 Implementation • VM operations that handle floating point values directly: – – – – – opt_plus opt_minus opt_mult opt_div opt_mod • We implemented all 25 combinations of length 2 – Based on Ruby 1.9.1 – Using existing Ruby infrastructure for superinstructions with some modifications 15 Limitations • Coding style-sensitive • Not applicable to other types (e.g. Fixnum, Bignum, String) – Fixnum is already unboxed – Bignum and String cannot be unboxed • Sequences of 3 arithmetic instructions or longer virtually non-existent – No occurrences in the benchmarks 16 Evaluation • Methodology – median time of 30 runs • Reduction in allocation 17 Results • Up to 22% benefit on numeric benchmarks • No slowdown on other benchmarks 18 Example: mandelbrot tweak • Slight modification produces 20% difference in performance + + ITER.times do tr = zrzr - zizi + cr tr = cr + (zrzr - zizi) ti = 2.0*zr*zi + ci ti = ci + 2.0*zr*zi Normalized execution time – 4 of 9 arithmetic instructions get merged into 2 superinstructions – 24% reduction in float allocation 19 Discussion of alternative approaches • Faster GC would improve performance as well – Superinstructions still apply, but with reduced benefit • Type inference – Would allow to specialize expressions and eliminate boxing – Interoperability with dynamic code is an issue • Dynamic specialization – Topic for further research 20 Related work: Tagged values • Use lower bits of pointers to trigger alternative handling • Embed floating point value into higher bits • Limited to 64-bit platforms, as Ruby uses double precision 64 bit floating point arithmetic – Our approach has same effect on 32 and 64 bit platforms • Allows to eliminate majority of boxed floats • Provides 28-35% benefit (on the same benchmarks) * Sasada 2008 21 Related work: Lazy boxing • Java-like language with generics over valuetypes • Boxing needed to avoid duplication of template instantiation code for primitive types • Lazy optimization works by allocating boxed objects in the stack frame, and moving to heap as needed • Relies on static compiler analysis for escape path detection, and runtime checks * Owen 2004 22 Related work:Superinstructions Superinstructions used for code compression – ANSI C hybrid compiler-interpreter – Trimedia code compression system • Superinstructions chosen statically to minimize code size * Proebsting 1995 * Hoogerbrugge 1999 Superinstructions used to reduce dispatch overhead – Forth, Ocaml • Superinstructions chosen dynamically * Piumarta 1998 * Ertl 2003 23 Conclusion • Naive approach to superinstructions does not produce substantial benefit for Ruby • Floating point values boxing overhead is a problem of Ruby • Superinstructions provide some help (up to 22%) Future work • Eliminate float boxing further – Specializing computation loop 24
© Copyright 2025 ExpyDoc