Re-Visiting Power Measurement for the Green500
Thomas R. W. Scogland (LLNL/CASC, Green500)
EE HPC Working Group, http://eehpcwg.lbl.gov/
New Considerations for the Level 1 Measurement Methodology
The Green500 List and its Continuing Evolution BoF, November 2014
http://www.green500.org

Level 1 Requirements
• Workload phase: Measure at least 20% of the middle 80% of the core phase
• Machine fraction: Measure at least 1/64th of the system or 1 kW, whichever is greater
• Subsystems measured: Measure the compute components; network, storage, and other subsystems are not required

Workload Phase: A Classic HPL Profile
[Figure: power (kW) versus time from start (seconds) for an HPL run. The profile is nearly flat, except during job launch and job cleanup.]

The Core Phase
• The time period under test
• Possible core phases:
  • Job scheduling -> job completion
  • Application start -> application end
  • Benchmark start -> benchmark end
• Any is valid, so long as it matches your other metrics

The Core Phase: Linpack Example
[Figure: power (kW) versus time from start (seconds), split into startup, core, and tear-down segments. The core phase cuts off most of the cruft.]

What do we require now?
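As a rough illustration, the current Level 1 workload-phase and machine-fraction rules listed above can be written as two small checks. This is only a sketch under stated assumptions: the function names are hypothetical, "20% of the middle 80%" is read as 20% of the length of the middle-80% span, and the machine fraction is approximated in terms of power rather than node count.

```python
def level1_window_ok(core_start, core_end, win_start, win_end):
    """Check a measurement window against the current Level 1 workload rule."""
    core_len = core_end - core_start
    mid_start = core_start + 0.10 * core_len  # middle 80% begins 10% into the core phase
    mid_end = core_end - 0.10 * core_len      # and ends 10% before the core phase ends
    inside_middle = mid_start <= win_start and win_end <= mid_end
    # Assumption: the window must span at least 20% of the middle-80% length.
    long_enough = (win_end - win_start) >= 0.20 * (mid_end - mid_start)
    return inside_middle and long_enough


def level1_min_measured_kw(total_system_kw):
    """Minimum power that must be covered: 1/64th of the system or 1 kW, whichever is greater."""
    # Assumption: the system fraction is approximated by its share of total power.
    return max(total_system_kw / 64.0, 1.0)


if __name__ == "__main__":
    # A 10000-second core phase: the middle 80% runs from 1000 s to 9000 s.
    print(level1_window_ok(0, 10000, 2000, 4000))  # window inside the middle, long enough
    print(level1_window_ok(0, 10000, 500, 2500))   # starts before the middle 80%
    print(level1_min_measured_kw(400.0))           # a 400 kW system
```

Under this reading, a submitter is free to place the window anywhere in the middle 80%, which is the freedom the following slides examine.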

Workload Timing by Measurement Level
[Figure: power (kW) versus time from start (seconds), with startup, core, and tear-down segments. Annotations: Level 1: a 20% window; Level 2: evenly spaced average measurements; Level 3: continuously integrated energy.]

Power Variability
[Figure: power (kW) versus time from start (seconds), with startup, core, and tear-down segments.]
• First 20%: 398.1 kW
• Last 20%: 398.2 kW
• Core phase average: 398.7 kW

Why Change the Requirement?
Newer system designs have a different pattern.

Piz Daint (GPU accelerated) Linpack Profile
[Figure: power (kW) versus time from start (seconds), with startup, core, and tear-down segments. The tail-off is much longer.]

Core Phase Averaged for Piz Daint
[Figure: power (kW) versus time from start (seconds), with startup, core, and tear-down segments.]
• First 20%: 873.8 kW
• Last 20%: 698.4 kW
• Core phase average: 833.4 kW
• Average power is far lower in the last 20%: the first-20% average is about 25% higher than the last-20% average!

What do we propose?
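The proposal is motivated by how strongly a 20% window can bias the reported average when power tails off. A minimal sketch of that effect follows; the trace is synthetic (not measured data), the function names are hypothetical, and only the two Piz Daint averages at the end come from the slides.

```python
def window_mean(samples, start, end):
    """Mean power of (time, kW) samples with start <= time < end."""
    values = [power for time, power in samples if start <= time < end]
    return sum(values) / len(values)


def first_last_core(samples, core_start, core_end):
    """Return (first-20% mean, last-20% mean, whole-core-phase mean)."""
    fifth = (core_end - core_start) / 5
    first = window_mean(samples, core_start, core_start + fifth)
    last = window_mean(samples, core_end - fifth, core_end)
    core = window_mean(samples, core_start, core_end)
    return first, last, core


# Synthetic trace: flat at 800 kW for 60 samples, then a linear tail-off.
trace = [(t, 800.0 if t < 60 else 800.0 - 10.0 * (t - 60)) for t in range(100)]
first, last, core = first_last_core(trace, 0, 100)
print(first, last, core)

# Ratio of the two Piz Daint window averages reported on the slides.
piz_daint_first_kw, piz_daint_last_kw = 873.8, 698.4
excess = (piz_daint_first_kw - piz_daint_last_kw) / piz_daint_last_kw
print(round(excess, 3))  # about 0.251, i.e. roughly 25%
```

With a tail-off like this, a window taken late in the run reports noticeably less power than the core-phase average, and therefore a better Mflops/Watt figure, than a window taken early.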

Workload Timing by Measurement Level
[Figure: power (kW) versus time from start (seconds), with startup, core, and tear-down segments. Annotations: Level 1: 100% of the core phase; Level 2: evenly spaced average measurements; Level 3: continuously integrated energy.]

Measurement Fraction
• Level 1 requires 1/64th of the machine
• Which 64th of the machine?

Variability Across Levels: SuperMUC

Quality Level                                                      Mflops/Watt (full run)   Efficiency Drop From Level 1
L1 (compute only)                                                  1055                     0
L2 (>10 kW) (compute and interconnect)                             1011                     44 (~4%)
L2 (>1/8) (compute and interconnect)                               994                      61 (~6%)
L3 (compute, interconnect, storage, cooling, power distribution)   887                      168 (~16%)

Source: A Power-Measurement Methodology for Large-Scale, High-Performance Computing, International Conference on Performance Engineering, March 2014

Subsystem Contribution
• Networks have been considered "in the noise" by Level 1 to this point
• We have increasing reports of the network contributing 10-20% of overall power use

Conclusions
• Our current requirements for Level 1 are no longer sufficient
• We propose raising the requirements of Level 1:
  • Measurement phase: 100% of the core phase
  • System fraction: 1/16th or more
  • Subsystems included: Compute and networking
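The three proposed Level 1 rules can be sketched as a single check. This is an illustration only, not an official validator: the function name and the subsystem labels "compute" and "network" are hypothetical, and the system fraction is assumed to be counted in nodes.

```python
def meets_proposed_level1(window, core_phase, measured_nodes, total_nodes, subsystems):
    """Return True if a measurement satisfies the three proposed Level 1 rules."""
    win_start, win_end = window
    core_start, core_end = core_phase
    # Measurement phase: the window must cover 100% of the core phase.
    covers_core = win_start <= core_start and win_end >= core_end
    # System fraction: 1/16th or more (assumption: counted in nodes).
    enough_machine = measured_nodes * 16 >= total_nodes
    # Subsystems included: compute and networking (labels are hypothetical).
    has_subsystems = {"compute", "network"} <= set(subsystems)
    return covers_core and enough_machine and has_subsystems


if __name__ == "__main__":
    core = (120, 8000)
    print(meets_proposed_level1((100, 8100), core, 64, 1024, ["compute", "network"]))
    # The old 1/64th fraction no longer passes.
    print(meets_proposed_level1((100, 8100), core, 16, 1024, ["compute", "network"]))
```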