DI-MMAP: Data-Intensive Memory-map Runtime

Motivation
Our goal is to enable scalable out-of-core computations for data-intensive computing by effectively integrating non-volatile random access memory (NVRAM) into the HPC node's memory architecture. We are making NVRAM effective for supporting large persistent data structures and for using DRAM-cached NVRAM as an extension to main memory. Overall, we are enabling latency-tolerant applications to smoothly transition a larger percentage of their working set out-of-core and into persistent memory with minimal performance loss.

[Figure: DI-MMAP runtime & PerMA simulator system diagram. A workload (e.g., simulation + analysis) with many process threads and network I/O issues loads/stores and direct accesses against the DRAM page cache managed by DI-MMAP (hot-page FIFO, primary FIFO eviction queue, page writeback); page faults are serviced by read/write I/O to NVRAM devices such as PCIe flash, SSD flash, or PerMA-simulated next-generation NVRAM (e.g., RRAM).]

Data-Intensive HPC Applications
Data-intensive high-performance computing applications have large data sets and large working sets. Furthermore, they tend to be memory bound due to irregular memory access patterns, a poor computation-to-communication ratio, or latency-sensitive algorithms. Examples of data-intensive applications are the processing of massive real-world graphs, bioinformatics / computational biology, and in-situ VDA algorithms such as streamline tracing. To make applications work well with NVRAM, it is important to tune their data structures for page-level locality and to provide sufficient I/O concurrency. Additionally, adapting algorithms to be more latency tolerant allows computation and communication to overlap the longer memory latencies.

DI-MMAP runtime
Our data-intensive memory-map runtime provides a high-performance alternative to the standard Linux memory-map system. The performance of DI-MMAP scales up with increased application I/O concurrency and does not degrade under memory pressure. It is a loadable kernel module that provides a custom buffering scheme for NVRAM data. Additionally, it is open source and integrated into the Simple Linux Utility for Resource Management (SLURM) and TriLabs Open Source Software (TOSS) environment. (A usage sketch follows the PerMA simulator section below.)

[Figure: K-mers per second vs. number of threads for standard-fs-mmap and DI-MMAP.] Linux RHEL6 2.6.32 mmap vs. DI-MMAP using the Livermore Metagenomic Analysis Toolkit (LMAT) with a 16 GiB DRAM buffer and a 375 GiB persistent database. Linux mmap performance peaks at 16 threads, while DI-MMAP scales past 200 threads. At peak performance, DI-MMAP is 4x faster than Linux mmap.

[Figure: BFS performance of Linux mmap vs. DI-MMAP; DI-MMAP is 7.44x faster than mmap, and only 23% slower with 50% less DRAM.] Performance comparison of Linux RHEL6 2.6.32 mmap and DI-MMAP using the HavoqGT library for large-scale Breadth-First Search (BFS) graph analysis. DI-MMAP provides high-performance, scalable out-of-core execution. The graph was R-MAT Scale 31 (146 GiB of vertex and edge data), and the system was provisioned with 16 GiB DRAM for the buffer cache plus 24 GiB DRAM for algorithmic data.

PerMA simulator
Our persistent memory simulator is a companion project to DI-MMAP that allows our runtime to simulate the performance of future generations of NVRAM technologies. This allows us to execute applications at scale and test how they will perform on future platforms.
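The access pattern that DI-MMAP targets is ordinary load/store traffic against a memory-mapped data set issued by many concurrent threads. The following is a minimal sketch of that pattern using only the standard POSIX mmap() interface; the command-line file path, the 4 KiB page-aligned record size, and the 128-thread count are illustrative assumptions, not parameters taken from this poster, and whether the mapping is served by the standard Linux page cache or by the DI-MMAP buffer depends on how the underlying file or device is provisioned, which is not shown here.

/* Hedged sketch: scan a large memory-mapped database with many threads.
 * Records are page-aligned (page-level locality) and each thread touches a
 * disjoint set of pages (I/O concurrency), per the tuning guidance above.  */
#include <fcntl.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define NUM_THREADS 128   /* illustrative: enough concurrent faults to keep NVRAM busy */
#define RECORD_SIZE 4096  /* illustrative: one record per 4 KiB page                   */

struct worker_arg {
    const uint8_t *base;  /* start of the mapped region    */
    size_t nrecords;      /* total records in the database */
    int tid;              /* this worker's index           */
};

static void *scan(void *p)
{
    struct worker_arg *a = p;
    uint64_t sum = 0;
    /* Strided partition: each thread faults in its own disjoint pages. */
    for (size_t r = a->tid; r < a->nrecords; r += NUM_THREADS)
        sum += a->base[r * (size_t)RECORD_SIZE];  /* plain load, no read() calls */
    return (void *)(uintptr_t)sum;
}

int main(int argc, char **argv)
{
    if (argc != 2) { fprintf(stderr, "usage: %s <database file>\n", argv[0]); return 1; }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) != 0) { perror("fstat"); return 1; }

    /* Standard mmap(); any custom buffering by the runtime is transparent here. */
    uint8_t *base = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED) { perror("mmap"); return 1; }

    pthread_t tids[NUM_THREADS];
    struct worker_arg args[NUM_THREADS];
    for (int i = 0; i < NUM_THREADS; i++) {
        args[i] = (struct worker_arg){ base, (size_t)st.st_size / RECORD_SIZE, i };
        pthread_create(&tids[i], NULL, scan, &args[i]);
    }
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(tids[i], NULL);   /* per-thread checksums are discarded */

    munmap(base, (size_t)st.st_size);
    close(fd);
    return 0;
}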
Runtime Analysis
DI-MMAP provides introspection interfaces that allow applications to log the sequence of page faults and the frequency of page (re-)faults for both online and offline analysis. Additionally, applications can track page-specific statistics such as residency, the number of major and minor faults, and dynamic fault rates. (A parsing sketch appears below, after the acknowledgements.)

# Monitoring di-mmap fault sequence window every 1 ms
#            Fault Statistics                                Evicted Page Statistics
# fault_seq  V T D Q FC  DevID         pgOffset        V T D Q FC  DevID         pgOffset
  1          1 1 0 F 1   3047. 693438  0x000000000000  0 0 0 F 0   0. 0          0x000000000000
  2          1 1 0 F 1   3047. 693438  0x00074aaaa000  0 0 0 F 0   0. 0          0x000000000000
  3          1 1 0 F 1   178. 47476    0x000000000000  0 0 0 F 0   0. 0          0x000000000000
  ...
  42938497   1 2 1 V 2   1765. 431957  0x000429cea000  0 0 1 P 1   3047. 693438  0x00000054a5c5
  42938498   1 1 0 F 1   2362. 514471  0x00045ca08000  0 0 1 H 1   989. 458561   0x00000000cd97
  42938499   1 1 0 F 1   1765. 431957  0x000429d15000  0 0 1 P 1   2076. 191920  0x0000002b0dd8
  42938500   1 1 0 F 1   1324. 340240  0x000419910000  0 0 1 P 1   4006. 935087  0x0000003b01f2
  42938501   1 1 0 F 1   2014. 694450  0x00040381f000  0 0 1 P 1   2416. 804018  0x0000004c2972
  42938502   1 1 0 F 1   1053. 984643  0x000419931000  0 0 1 P 1   2660. 608042  0x000000152914
  42938503   1 2 1 V 2   548. 72548    0x00040a071000  0 0 1 P 1   1636. 362025  0x00000027b308

FUNDING AGENCY: LLNL/LDRD, ASCR
FUNDING ACKNOWLEDGEMENT: This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract DE-AC52-07NA27344.
AUTHORS: Brian Van Essen [email protected], Roger Pearce, Sasha Ames, Maya Gokhale
RESOURCES: Center for Applied Scientific Computing (LLNL), Livermore Computing (LLNL), https://computation-rnd.llnl.gov/perma/activities.php
LLNL-POST-662211
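The fault-sequence log shown under Runtime Analysis can be post-processed offline. Below is a hedged sketch of such an analysis in C: it assumes only that each data line eventually contains a hexadecimal page offset (the faulted page), as in the excerpt above, and it counts how often each page re-faults. The hash-table size and the re-fault metric are illustrative choices, not part of the DI-MMAP interface.

/* Hedged sketch: offline re-fault analysis of a DI-MMAP fault-sequence log.
 * Extracts the first hex page offset on each data line and accumulates a
 * per-page fault count in a small open-addressing hash table.              */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE (1u << 20)  /* power of two; sized generously for a sketch */

struct slot { uint64_t off; uint64_t count; };

int main(int argc, char **argv)
{
    if (argc != 2) { fprintf(stderr, "usage: %s <fault log>\n", argv[0]); return 1; }

    FILE *f = fopen(argv[1], "r");
    if (!f) { perror("fopen"); return 1; }

    struct slot *tab = calloc(TABLE_SIZE, sizeof *tab);
    if (!tab) { perror("calloc"); return 1; }

    char line[512];
    while (fgets(line, sizeof line, f)) {
        if (line[0] == '#') continue;            /* header / comment line     */
        char *hex = strstr(line, "0x");          /* first page offset on line */
        if (!hex) continue;                      /* e.g., the "..." line      */
        uint64_t off = strtoull(hex, NULL, 16);
        uint64_t h = (off >> 12) * 0x9e3779b97f4a7c15ULL;  /* mix page number */
        for (uint64_t i = h & (TABLE_SIZE - 1); ; i = (i + 1) & (TABLE_SIZE - 1)) {
            if (tab[i].count == 0 || tab[i].off == off) {   /* empty or match */
                tab[i].off = off;
                tab[i].count++;
                break;
            }
        }
    }
    fclose(f);

    /* Report pages that faulted more than once (re-faults). */
    for (size_t i = 0; i < TABLE_SIZE; i++)
        if (tab[i].count > 1)
            printf("0x%012llx re-faulted %llu times\n",
                   (unsigned long long)tab[i].off,
                   (unsigned long long)(tab[i].count - 1));
    free(tab);
    return 0;
}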