Mai Zheng
Research Statement

Storage system failures are extremely damaging: if your browser crashes you sigh, but when your family photos disappear you cry. We therefore need highly reliable storage systems that keep our data safe no matter what happens. Such a high standard of reliability is difficult to achieve, which makes the problem all the more fascinating. The primary focus of my research is thus the reliability of storage systems, broadly defined to span everything from low-level storage devices to high-level information management systems such as databases.

Besides data storage, another challenge in this era of Big Data is how to process huge volumes of data correctly and efficiently. From a systems perspective, the question boils down to how to exploit parallelism and achieve efficient parallel or distributed computation while maintaining reliability, which is far more challenging than traditional sequential computation. My research has therefore centered on two critical aspects of Big Data: the reliability and the efficiency of data storage and computation. In terms of system categories, I focus mainly on storage systems and on parallel and distributed systems.

In particular, my work on system reliability includes a record-and-replay framework that tests and diagnoses the failure-recovery capability of modern databases [1], a fault-injection framework that uncovers the failure patterns of flash-based solid-state drives (SSDs) [2], and two low-overhead bug detectors for parallel and concurrent programs [3][4]. My work on system efficiency includes a fine-grained profiler for tuning the shared-memory usage of parallel programs [5] and a low-latency data-layout scheme for distributed storage systems [6]. My future research will continue to focus on making computer systems more reliable and more efficient, and will expand into closely related areas such as security.
Specifically, in the short term, I plan to build an enhanced reliability-analysis framework for storage systems along two dimensions: first, analyzing failure propagation among the different layers of the whole storage stack (e.g., file systems, logical volume manager, software RAID) on a single machine; second, analyzing failure handling in cloud storage systems with multiple replicas. Another interesting opportunity opened up by this analysis is enhancing the reliability of data transactions based on the errors the framework exposes. Moreover, when a system becomes unreliable (e.g., a transaction is partially committed), it may also become more vulnerable to security attacks, so I will explore the combined effect when reliability meets security. In the longer term, I will explore the reliability, security, and efficiency of emerging systems built on emerging technologies (e.g., mobile systems and phase-change memory, or PCM). My ultimate goal is to make computer systems better so that people benefit from them more. To this end, I will keep looking for cross-area and cross-disciplinary challenges and opportunities. Through collaboration with researchers from different domains, I can amplify my expertise and experience and thus maximize my contribution to society as a researcher. The following sections elaborate on the existing and future research mentioned above.

1. Previous and ongoing research

Reliability analysis of modern databases [1]. People use databases when they want a high level of reliability. Specifically, they want the sophisticated ACID (atomicity, consistency, isolation, and durability) protection that modern databases provide. However, the ACID properties are far from trivial to provide, particularly when high performance must also be achieved.
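To make the failure scenario concrete, consider atomicity under a power cut: a transaction's writes may reach the disk only partially. The sketch below replays every possible crash point of a recorded block-write sequence and checks that the surviving disk image is all-or-nothing; it is a minimal illustration in plain Python, and all names (`replay_prefix`, `atomic_check`, and so on) are invented for this example rather than taken from any real framework.

```python
# Illustrative sketch: enumerate every possible crash point of a recorded
# block-write sequence and check atomicity of the surviving disk image.
# All names here are invented for this example.

def replay_prefix(initial_image, writes, crash_point):
    """Disk state if power is lost right after `crash_point` writes."""
    image = dict(initial_image)
    for block, data in writes[:crash_point]:
        image[block] = data
    return image

def find_violations(initial_image, writes, check):
    """Return every crash point whose surviving image fails the check."""
    return [i for i in range(len(writes) + 1)
            if not check(replay_prefix(initial_image, writes, i))]

# Toy transaction: blocks 0 and 1 must be updated all-or-nothing.
def atomic_check(image):
    return (image[0] == b"new") == (image[1] == b"new")

writes = [(0, b"new"), (1, b"new")]
torn = find_violations({0: b"old", 1: b"old"}, writes, atomic_check)
# a crash after the first write alone leaves a torn transaction
```

Real systems complicate this picture considerably, since caches and devices may reorder or drop writes, which is exactly why systematic tooling is needed.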
Providing these guarantees at high performance leads to complex and error-prone code: even at a low defect rate of one bug per thousand lines, the millions of lines of code in a commercial OLTP database can harbor thousands of bugs. In particular, checking the ACID properties under failure is notoriously hard, since a failure scenario may not be conveniently reproducible.

My colleagues and I built a framework to expose and diagnose ACID violations in databases under failures. By decoupling the databases from the main framework through iSCSI, a protocol that allows one machine to access the block devices of another machine over the network, we can test databases on multiple operating systems with high fidelity. By running carefully designed workloads, we can stress different database functionalities and easily check each of the ACID properties. By using a record-and-replay technique, we can systematically simulate the effect of a power outage or system crash at every possible point during a workload and re-create the failure scenario precisely. Moreover, we discovered five low-level I/O patterns that predict which operations of a workload are most vulnerable to power faults. In addition, we developed a multi-layer tracer that covers everything from function-call-level semantics down to the lowest-level block operations, providing a whole-system picture for diagnosis. Using our framework, we studied eight widely used databases, ranging from open-source key-value stores to high-end commercial OLTP servers. Surprisingly, all eight databases exhibited erroneous behavior. We are working with the databases' developers to fix the defects our framework exposed, and we have been contacted by several interested companies, including Microsoft and Facebook.

Reliability analysis of flash-based solid-state drives (SSDs) [2].
High-level storage systems (e.g., databases) rely on low-level storage devices to provide basic data-integrity and consistency guarantees. Although the interface has remained the same, the underlying technology has evolved substantially, and modern storage devices such as SSDs bring new reliability challenges to the already-complicated storage stack. Among other things, their behavior under adverse conditions (e.g., power outages) is an important yet mostly ignored issue.

My colleagues and I built a framework to expose reliability issues in block devices such as SSDs under power faults. The framework includes specially designed hardware that injects power faults directly into devices, workloads that stress storage components, and techniques that detect various types of failures. Applying our framework, we tested fifteen commodity SSDs from five different vendors, using more than three thousand fault-injection cycles in total. Our experiments revealed that thirteen of the fifteen tested devices exhibit surprising failure behaviors under power faults, including bit corruption, shorn writes, unserializable writes, metadata corruption, and total device failure. Our results have been reported by several media outlets, and we have been contacted by several major SSD manufacturers (e.g., Intel, sTec, and LSI), storage system builders (e.g., IBM, Savvis, and SolidFire), and consumers (e.g., Airbus, PepsiCo, and RBA Consulting).

Reliability improvements for parallel and concurrent programs [3][4]. Besides storage, another challenge in this era of Big Data is computation: how do we process huge volumes of data efficiently? The processor industry has answered this question by introducing ever more aggressive parallel architectures. Besides multi-core CPUs, many-core GPUs (graphics processing units) have emerged as an extremely cost-effective means for general-purpose computation.
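The canonical correctness hazard on such parallel platforms is the data race: two threads access the same memory location without synchronization, and at least one access is a write. A toy check over one synchronization interval can be sketched as follows; the data structures and the per-interval logging scheme are simplifications invented for illustration, far removed from what a production race detector must handle.

```python
# Toy race check for one barrier-delimited interval: flag any address
# touched by two different threads where at least one access is a write.
# The access log format is invented for this illustration.

def find_races(accesses):
    """accesses: list of (thread_id, address, is_write) tuples recorded
    within one synchronization interval; returns racy addresses."""
    races = set()
    seen = {}  # address -> list of (thread_id, is_write) seen so far
    for tid, addr, is_write in accesses:
        for other_tid, other_write in seen.get(addr, []):
            # conflict: different threads, and at least one write
            if other_tid != tid and (is_write or other_write):
                races.add(addr)
        seen.setdefault(addr, []).append((tid, is_write))
    return races

log = [(0, 0x10, True), (1, 0x10, False),   # write/read conflict on 0x10
       (0, 0x20, False), (1, 0x20, False)]  # 0x20 is read-only: no race
```

A real detector must additionally account for warp-level execution, the distinction between shared and device memory, and the platform's synchronization primitives, which is where static analysis and architectural features come in.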
However, these parallel platforms are non-trivial to use correctly because of the additional thread interleaving and synchronization they introduce. To improve the reliability of programs running on these platforms, my colleagues and I combined static and dynamic analysis, made full use of architectural features, and built GRace, a low-overhead data race detector for GPU programs. For multi-threaded CPU programs, we also developed a state-machine-based approach that captures a set of concurrency bugs involving shared objects.

Efficiency improvements for parallel and distributed systems [5][6]. On parallel GPU systems, efficiency depends heavily on how the memory hierarchy is used. In particular, making an optimal tradeoff between the large-but-slow device memory and the small-but-fast shared memory is a critical and non-trivial question for achieving high performance. To help answer it, my colleagues and I built a fine-grained profiler for tuning shared-memory usage, through a combination of software analysis and architectural optimizations. Similarly, in hybrid storage systems containing both hard disks and SSDs, efficiency depends heavily on how the storage hierarchy is used. By carefully placing data with different access patterns on different devices, we improved the efficiency of a distributed storage system by up to 5x under typical workloads.

2. Future research

Based on my existing research, I see many opportunities in computer systems and related areas where my expertise and experience may help advance the frontier. In the short term, I will enhance my current reliability-analysis framework. The first direction is analyzing failure propagation among the different layers of the whole storage stack on a single machine.
Databases and SSDs are just two layers in a typical storage stack, which may contain multiple other layers (e.g., file systems, logical volume manager, and software RAID). Each of these additional layers has unique characteristics and thus may require unique workloads and checking logic. Given that we have observed erroneous behavior in many databases and SSDs, the other layers likely contain defects as well. I will first analyze each individual layer, which is fundamental for whole-system analysis. Then I will analyze how the failure of a lower layer propagates to upper layers and interacts with potential errors inside them, which is crucial for improving the reliability of the system as a whole.

The second direction is analyzing failure handling in cloud storage systems with multiple replicas. A great deal of data is now stored and managed in the cloud. If even the relatively mature single-machine storage systems can exhibit erroneous behaviors, it becomes both important and urgent to perform similar in-depth analysis on cloud storage systems, which add more layers on top of the local storage stack and are responsible for protecting far more data. The cloud environment introduces many new challenges, not only for the system under analysis but also for the analysis framework itself. For example, synchronizing timestamps, which is necessary for combining the traces, is a new issue for the analysis. I will study state-of-the-art techniques for handling the classic distributed-systems problems, incorporate them into my analysis methodology, and design a distributed analysis framework tailored to cloud storage systems.

Besides enhancing the analysis framework, another interesting opportunity it opens up is improving the reliability of data transactions based on the errors the framework exposes.
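As an aside, the timestamp-synchronization issue mentioned above admits a simple sketch of the basic correction: subtract each node's estimated clock offset before merging per-node traces, so that events sort into a consistent global order. How the offsets are estimated (e.g., via NTP) is outside this sketch, and all names and values here are illustrative.

```python
# Illustrative sketch: merge per-node event traces after correcting each
# node's timestamps by its estimated clock offset. Names are invented.

def merge_traces(traces, offsets):
    """traces: {node: [(local_ts, event), ...]};
    offsets: {node: estimated clock skew of that node}.
    Returns all events ordered by corrected timestamp."""
    merged = [(ts - offsets[node], node, ev)
              for node, events in traces.items()
              for ts, ev in events]
    return sorted(merged)

traces = {"a": [(10, "write"), (30, "ack")],
          "b": [(25, "replicate")]}
offsets = {"a": 0, "b": 12}   # node b's clock is estimated 12 units fast
ordered = merge_traces(traces, offsets)
# corrected order: write (t=10), replicate (t=13), ack (t=30)
```

Real clock skew drifts over time, so in practice offsets must be re-estimated continuously; naive subtraction like this is only a starting point for the distributed analysis framework.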
Our database analysis, for example, shows a large gap in understanding and assumptions between database developers and operating-system developers regarding the behavior of the system-call interface. By collaborating with researchers from different communities, we can bridge such gaps and make storage transactions truly reliable under failures while achieving acceptable efficiency.

In addition, I will explore security, especially the combined effect when reliability meets security. Many issues (e.g., buffer overflow) concern both reliability and security. When a system becomes unreliable (e.g., a transaction is partially committed, as our framework can trigger), it may also become more vulnerable to security attacks, an increasingly critical issue in this Big Data era. I will collaborate with researchers from the security community to address the combined challenges.

In the longer term, I will explore the reliability, security, and efficiency of emerging systems built on emerging technologies. For example, mobile systems such as Android have become an increasingly important computing platform in daily life. These systems rely on flash memory for persistent storage, and they have special constraints (e.g., energy efficiency). My experience with flash-based SSDs, and with storage systems in general, may help optimize mobile systems as well. Also, just as SSDs have revolutionized the storage market, new technologies such as phase-change memory (PCM) will probably improve the performance of existing systems greatly while introducing new challenges; new design tradeoffs must be made to achieve an optimal balance among reliability, security, and efficiency. Moreover, as systems and technologies advance, combining the best of different systems will become possible. For example, parallel platforms such as GPUs are mainly used for high-performance computing.
It is interesting to see whether we can harness that parallel computing power for efficient storage without sacrificing reliability and security.

My ultimate goal is to make computer systems better, in terms of reliability, security, efficiency, and other desirable properties, so that people can benefit from them more. To this end, I will start from my existing domains and keep looking for cross-area and cross-disciplinary challenges and opportunities. I believe that through collaboration with researchers from different domains, I can amplify my expertise and experience and thus maximize my contribution to society as a researcher.

References

[1] M. Zheng, J. Tucek, D. Huang, F. Qin, M. Lillibridge, E. Yang, B. Zhao, and S. Singh. "Torturing Databases for Fun and Profit". In Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI '14), 2014.
[2] M. Zheng, J. Tucek, F. Qin, and M. Lillibridge. "Understanding the Robustness of SSDs under Power Fault". In Proceedings of the 11th USENIX Conference on File and Storage Technologies (FAST '13), 2013.
[3] M. Zheng, V. Ravi, F. Qin, and G. Agrawal. "GRace: A Low-Overhead Mechanism for Detecting Data Races in GPU Programs". In Proceedings of the 16th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '11), 2011.
[4] Q. Gao, W. Zhang, Z. Chen, M. Zheng, and F. Qin. "2ndStrike: Towards Manifesting Hidden Concurrency Typestate Bugs". In Proceedings of the 16th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '11), 2011.
[5] M. Zheng, V. T. Ravi, W. Ma, F. Qin, and G. Agrawal. "GMProf: A Low-Overhead, Fine-Grained Profiling Approach for GPU Programs". In Proceedings of the 19th IEEE International Conference on High Performance Computing (HiPC '12), 2012.
[6] D. Huang, X. Zhang, W. Shi, M. Zheng, S. Jiang, and F. Qin. "LiU: Hiding Disk Access Latency for HPC Applications with a New SSD-Enabled Data Layout". In Proceedings of the 21st IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS '13), 2013.