Haifa Systems and Storage Conference 2007
IBM Haifa Labs
October 29, 2007. Organized by IBM Haifa Labs and the Technion.
GWiQ-P: An Efficient Decentralized Grid-Wide Quota Enforcement Protocol
Kfir Karmon, Liran Liss and Assaf Schuster (Technion)
Mega-grids span several continents and may consist of millions of nodes and billions of tasks executing at any point in time. This setup calls for a scalable and highly available resource utilization control that adapts itself to dynamic changes in the grid environment as they occur. In this paper, we address the problem of enforcing upper bounds on the consumption of grid resources. We propose a grid-wide quota enforcement system, called GWiQ-P. GWiQ-P is light-weight and in practice infinitely scalable, satisfying concurrently any number of resource demands, all within the limits of a global quota assigned to each user. GWiQ-P adapts to dynamic changes in the grid as they occur, improving future performance by means of improved locality. This improved performance does not impair the system's ability to respond to current requests, tolerate failures, or maintain the allotted quota levels.
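The abstract does not spell out the protocol itself, so the following is only a minimal sketch of the general idea of decentralized quota enforcement, not GWiQ-P. It assumes a user's global quota is split into local shares held at grid nodes, that a node short on quota borrows unallocated units from its peers, and that borrowed units stay at the borrowing node so later requests there can be served locally. The names `QuotaNode`, `acquire`, `lend`, `release`, and `total_quota` are hypothetical.

```python
class QuotaNode:
    """One grid site holding a local share of a user's global quota (illustrative only)."""

    def __init__(self, name, local_share):
        self.name = name
        self.free = local_share  # unallocated quota units held at this node
        self.in_use = 0          # units currently granted to local tasks
        self.peers = []          # nodes to ask when the local share runs out

    def lend(self, amount):
        """Hand over up to `amount` unallocated units to a requesting peer."""
        given = min(amount, self.free)
        self.free -= given
        return given

    def acquire(self, amount):
        """Grant `amount` units to a local task, borrowing from peers if needed."""
        shortfall = amount - self.free
        if shortfall > 0:
            for peer in self.peers:
                got = peer.lend(shortfall)
                self.free += got
                shortfall -= got
                if shortfall <= 0:
                    break
        if self.free < amount:
            # Request denied. Units already borrowed stay here (assumed
            # locality heuristic), so the global total is still conserved.
            return False
        self.free -= amount
        self.in_use += amount
        return True

    def release(self, amount):
        """Return units from a finished task to this node's unallocated pool."""
        amount = min(amount, self.in_use)
        self.in_use -= amount
        self.free += amount


def total_quota(nodes):
    """Sum of free and in-use units; this must always equal the global quota."""
    return sum(n.free + n.in_use for n in nodes)
```

In this sketch no node ever creates quota units; they only move between nodes or between the `free` and `in_use` counters, which is why the sum over all nodes can never exceed the global quota. Failure handling and concurrency, both central to the paper's claims, are deliberately left out.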
Storage Virtualization using a block-device File System
Sorin Faibish, Stephen Fridella, Peter Bixby and Uday Gupta (EMC Corporation)
Virtualizing block storage devices requires a location-independent namespace, dynamic allocation and management of free space, and a scalable and reliable meta-data store. A file system provides all of these features, which makes a file an attractive abstraction on which to build virtual block devices or LUNs. However, mapping virtual block devices to files represents an atypical use case for traditional file systems. In a block-device file system, objects can be expected to be few in number and very large. These objects will have high data throughput requirements and very high meta-data reliability requirements.
In this paper, we present a new idea on meta-data management in file systems. We propose a new file system design that cleanly separates file data from meta-data; we demonstrate that our design allows users to achieve differing service level objectives for data and meta-data without modifying any file system code; and we present experimental results that validate the benefits for applications and users of such an approach.
We explore the ways in which this particular usage of a file system challenges some basic assumptions used in the design of general-purpose file systems, and present the performance characteristics of the new file system architecture that might lead us toward an ideal block-device file system. This architecture can be easily extended for use by general-purpose FFS-type file systems. An early implementation as an extension of the ext3 Linux file system was introduced recently by the DualFS file system.
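To make the "file as virtual block device" abstraction concrete, here is a minimal sketch that exposes a single file as a fixed-size array of blocks. It is not the authors' file system and does not show their separation of data from meta-data; the class name `FileBackedLUN`, the 4096-byte block size, and the method names are assumptions for illustration. Blocks that were never written read back as zeros, which mimics the dynamic allocation a file system provides.

```python
import os

BLOCK_SIZE = 4096  # assumed block size for this illustration


class FileBackedLUN:
    """A virtual block device whose blocks are stored in one ordinary file."""

    def __init__(self, path, num_blocks):
        self.path = path
        self.num_blocks = num_blocks
        if not os.path.exists(path):
            # Start with an empty file; space is allocated only as blocks are written.
            open(path, "wb").close()

    def _check(self, lba):
        if not 0 <= lba < self.num_blocks:
            raise IndexError(f"block {lba} is outside the device")

    def write_block(self, lba, data):
        """Write exactly one block at logical block address `lba`."""
        self._check(lba)
        if len(data) != BLOCK_SIZE:
            raise ValueError("data must be exactly one block long")
        with open(self.path, "r+b") as f:
            f.seek(lba * BLOCK_SIZE)
            f.write(data)

    def read_block(self, lba):
        """Read one block; regions never written are returned as zeros."""
        self._check(lba)
        with open(self.path, "rb") as f:
            f.seek(lba * BLOCK_SIZE)
            data = f.read(BLOCK_SIZE)
        # Reads at or past end-of-file come back short, so pad with zeros.
        return data.ljust(BLOCK_SIZE, b"\0")
```

The sketch also hints at why this is an atypical workload for a file system: the device is one very large object, and every operation is a block-aligned read or write inside it rather than a create, rename, or delete.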
Keynote: Future Directions in Advanced Storage Services
Danny Dolev (HUJI)
Reliability has become a major challenge for computer systems research. Online service providers typically deploy a three-tier software architecture that includes servers at the middle tier to process requests from clients (the first tier) and a distributed database system at the third tier for storing the data. Replication is key to protecting the middle tier, just as transactions are essential for protecting the database tier. Current replication techniques usually deploy either a distributed scheme based on total ordering or a simple primary-backup technique. The distributed scheme is more expensive in terms of communication steps and processing requirements, because distributed consensus must be reached before each received message is delivered. The primary-backup scheme is easier to implement and more efficient (both in latency and throughput), but it is not as scalable as the distributed solution. The inherent problem of current solutions is that the replication layer introduced in the packet flow of the middle tier is not enough to guarantee the consistency of the tier. Additional services, such as a replica membership service and failure detection, must also be introduced at each server node. This places a significant load on each server, which is already handling an enormous number of client requests.
In the lecture, we will discuss an alternative to user-level replication algorithms. The new approach suggests that instead of running such algorithms on the host CPUs, available processing power at the peripheral devices can be utilized. Using network offloading for replication purposes will enable a clearer separation of concerns between a server's logic and the replication layer. Additionally, offloading the replication-related services to the disk controllers and the networking devices will relieve the load on the server CPUs and improve overall system performance.
Time Travel in the Virtualized Past: Cheap Fares and First Class Seats
Liuba Shrira (Brandeis University), Catherine van Ingen (Microsoft), and Ross Shaull (Brandeis University)
"Time travel" in the storage system is accessing past storage system states. Legacy application programs could run transparently over the past states if the past states were virtualized in a form that makes them look like the current state. There are many levels in the storage system at which past-state virtualization could occur. How do we choose? We think that past-state virtualization should occur at a high storage system buffer manager level, such as a database buffer manager. Everything above this level can run legacy programs. The system below can manage the mechanisms needed to implement the virtualization. This approach can be applied to any kind of storage system, ranging from traditional databases and file systems to the new generation of specialized storage managers such as Bigtable. Granted that time travel is a desirable feature, this position paper considers the design axis for virtualizing past states for time travel, and asks what amounts to the question: Can we sit in first class and still have cheap fares?
Virtual Machine Time Travel using Continuous Data Protection and Checkpointing
Paula Ta-Shma, Guy Laden, Muli Ben-Yehuda and Michael Factor (IBM HRL)
Virtual machine (VM) time travel enables quickly reverting a VM's state to some prior point in time. Such a capability is useful for forensics, testing, and improving VM availability. Prior approaches have focused on capturing the VM's transient state, for the most part ignoring the persistent disk state. By contrast, our approach is based on combining Continuous Data Protection support in a storage subsystem with checkpointing a VM's transient state using existing hypervisor support for live migration. Our contribution is demonstrating the coordinated use of persistent and transient state-saving technologies in the context of VM time travel.
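The coordination of persistent and transient state described above can be sketched as follows. This is an illustrative toy under stated assumptions, not the authors' system: the disk is modeled as a journal of timestamped writes (standing in for Continuous Data Protection), the VM's transient state is a plain dictionary, and the names `CDPVolume`, `TimeTravelVM`, `checkpoint`, and `travel_to` are hypothetical.

```python
class CDPVolume:
    """Block volume that journals every write with a logical timestamp."""

    def __init__(self):
        self.journal = []  # append-only list of (timestamp, lba, data)
        self.clock = 0     # logical time, advanced by one on each write

    def write(self, lba, data):
        self.clock += 1
        self.journal.append((self.clock, lba, data))
        return self.clock

    def image_at(self, timestamp):
        """Rebuild the block map as of `timestamp` by replaying the journal."""
        image = {}
        for ts, lba, data in self.journal:
            if ts > timestamp:
                break  # the journal is ordered, so later entries can be skipped
            image[lba] = data
        return image


class TimeTravelVM:
    """Pairs each transient-state checkpoint with the disk time it was taken at."""

    def __init__(self):
        self.disk = CDPVolume()
        self.memory = {}       # stand-in for the VM's transient state
        self.checkpoints = []  # list of (disk_timestamp, memory_copy)

    def checkpoint(self):
        self.checkpoints.append((self.disk.clock, dict(self.memory)))
        return self.disk.clock

    def travel_to(self, timestamp):
        """Return (disk image, memory) for the latest checkpoint at or before `timestamp`."""
        candidates = [c for c in self.checkpoints if c[0] <= timestamp]
        if not candidates:
            raise LookupError("no checkpoint at or before the requested time")
        ts, memory = candidates[-1]
        # The disk is rolled back to the checkpoint's own timestamp, not the
        # requested one, so persistent and transient state stay consistent.
        return self.disk.image_at(ts), dict(memory)
```

The design point the sketch tries to capture is that the disk journal allows recovery to any point in time, while transient state exists only at checkpoints, so a consistent revert has to pick a checkpoint and bring the disk back to exactly that moment.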
Robust paravirtualized IO drivers for the modern hypervisor
Dor Laor (Qumranet)
As virtualization gains widespread adoption, the need for high-performance virtual guests with close to native performance becomes critical. One of the major bottlenecks is I/O. Paravirtualized drivers are not unique to KVM; they exist for the Xen, VMware, and Lguest hypervisors as well. The VirtIO interface implements generic network and block driver logic to be used by specific hypervisors. Since this logic is error-prone, sensitive code that is hard to optimize, it is a good candidate for sharing and re-use among hypervisors.
This work started by implementing a VirtIO backend for KVM and continued by enhancing VirtIO. The hypervisor-specific part is quite small, and most of the code can be encapsulated behind a well-defined interface. The new VirtIO has been adopted by the community, and a Plan9 interface for KVM was developed on top of it. The interface and architecture will also be the basis for s390 paravirtualized drivers.
Capability based Secure Access Control to Networked Storage Devices
Michael Factor, Dalit Naor, Eran Rom, Julian Satran and Sivan Tal (IBM HRL)
Today, access control security for storage area networks (zoning and masking) is implemented by mechanisms that are inherently insecure and tied to the physical network components. However, what we want to secure is at a higher logical level independent of the transport network; raising security to a logical level simplifies management, provides a more natural fit to a virtualized infrastructure, and enables finer-grained access control. In this paper, we describe the problems with existing access control security solutions, and present our approach that leverages the OSD (Object-based Storage Device) security model to provide logical, cryptographically secured, in-band access control for today's existing devices. We then show how this model can easily be integrated into existing systems and demonstrate that this in-band security mechanism has negligible performance impact while simplifying management, providing a clean match to compute virtualization, and enabling fine-grained access control.
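The idea of a cryptographically secured capability can be illustrated with a short sketch. It assumes a security manager and a storage device that share a secret key; the manager issues a capability naming an object, the permitted operations, and an expiry time, and protects it with an HMAC that the device recomputes on each request. This is a deliberate simplification and not the OSD protocol or the authors' design: here the tag travels with the request, which would allow replay on an untrusted network. The function names and the capability fields are hypothetical.

```python
import hashlib
import hmac
import json


def _tag(secret, capability):
    """HMAC-SHA256 over a canonical JSON encoding of the capability."""
    message = json.dumps(capability, sort_keys=True).encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def issue_capability(secret, object_id, rights, expires_at):
    """Security manager side: build a capability and the tag that protects it."""
    capability = {
        "object_id": object_id,
        "rights": sorted(rights),
        "expires_at": expires_at,
    }
    return capability, _tag(secret, capability)


def device_allows(secret, capability, tag, object_id, operation, now):
    """Storage device side: check integrity, expiry, target object, and rights."""
    if not hmac.compare_digest(_tag(secret, capability), tag):
        return False  # capability was forged or modified after issuance
    if now > capability["expires_at"]:
        return False  # capability has expired
    return capability["object_id"] == object_id and operation in capability["rights"]
```

Because the check depends only on the shared secret and the contents of the capability, the decision is independent of which port, switch, or zone the request arrived through, which is the sense in which access control is raised to a logical level.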
Keynote: Improving the Performance of Network Virtualization
Willy Zwaenepoel (EPFL)
Despite recent advances in virtual machine technology, virtualization overhead seriously degrades the performance of network-intensive applications. In this talk I will analyze the sources of this overhead, and I will present two approaches to address the problem, a software-only approach and a hardware-software approach.
The software-only approach defines a new high-level virtual network interface, which allows virtualized operating systems to take advantage of common optimizations found in hardware network interfaces, such as scatter-gather DMA, checksum offloading, and TCP segmentation offloading. This approach provides transmit performance far superior to the common approach of using a lowest-common-denominator virtual network interface definition, but it does not do much for receive performance.
The hardware-software approach uses a new hardware network interface card, which supports multiple concurrent "contexts" on the device, each of which can be used in isolation by a separate virtual machine. Protection between contexts and virtual machines is guaranteed by the hardware and a modified version of the hypervisor. This approach improves both transmit and receive performance, and scales to a large number of virtual machines.
This is joint work with Alan Cox, Scott Rixner, Jeff Shafer and Paul Willman from Rice University, and Aravind Menon from EPFL.