Getting Your Virtualization Priorities Straight

November 18, 2010 - by Jerry Jelinek

Having recently come to work here at Joyent, after spending many years as a kernel developer working on virtualization at Sun/Oracle, it's been interesting to see the difference in development priorities between the two companies. Obviously virtualization is a key foundation for Joyent SmartMachines. Using Operating System Virtualization, Joyent is able to deliver outstanding performance and scalability compared to Virtual Machine-based alternatives.

Virtualization was also important at Sun, but the emphasis was completely different. At Sun/Oracle, most of our development work in this area over the last few years was focused on consolidation. We delivered the capability to run Solaris 8 and Solaris 9 branded zones on Solaris 10; we delivered a Solaris 10 branded zone for Solaris 11; and we delivered a variety of physical-to-virtual (p2v) solutions. All of this effort was great for users wishing to consolidate legacy Sun environments into zones, but didn't do much to address the issues related to actually running production environments with zones. In fact, you have to go back several years, to the duckhorn project, to see some of the big improvements we made to resource management for virtualization. At Joyent, the issues around resource control, monitoring and management of a massive cloud are fundamental, whereas consolidation is relatively less interesting.

Looking back at duckhorn, you can see the framework we put in place for resource management around CPU and memory utilization on a virtualized system. The other two dimensions, which weren't as well supported at the time, were storage and network resource management. Together, CPU, memory, storage and network are the four primary categories which need some sort of limits to ensure that one virtual environment does not unfairly hog the underlying physical system. Since the duckhorn project was completed, Crossbow has come along to lay the foundation for network resource management. At Joyent we deploy on ZFS, and while this inherently provides many of the storage capabilities needed to support virtualization, there are still some fundamental enhancements that can be made.

Here in the Joyent kernel group we've been researching the topic of storage IO control and exploring possible solutions. Due to the fairly unique nature of Joyent SmartMachines, some of the alternatives used in virtual machine-based solutions are not fully applicable in our environment. Because we're running a single OS kernel we're able to deliver the full performance of the IO subsystem to a SmartMachine. There is no overhead of an IO virtualization layer or awkward situation in which multiple OS kernels, without any awareness of each other, are fighting amongst themselves in an attempt to manage IO.

Delivering this high performance is a key differentiator for SmartMachines and it's critical that we continue to provide this as we design new resource management capabilities into the system. Thus, IO capping seems to be clearly inappropriate in our environment. If there's any doubt about this, looking at Brendan's slides from his recent LISA '10 presentation, particularly slide 34, should make it obvious. You can see that the achievable IOPS from a given disk can vary over two orders of magnitude on differing workloads, and the number of IOPS that you can physically obtain is sometimes very low. With this much variability, there is no way to throttle IO without potentially leaving a large amount of performance on the table. Instead, we need a solution that lets us continuously drive the physical IO subsystem as fast as possible so that we can deliver that speed directly to applications in each SmartMachine, without any loss.
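
To make the argument against fixed caps concrete, here is a rough back-of-the-envelope sketch in Python. The IOPS figures, the tenant count and the function names are all assumptions made up for illustration; they are not taken from Brendan's slides or from anything we run. The only property borrowed from the discussion above is that achievable IOPS can differ by about two orders of magnitude between workloads.

```python
# Hypothetical figures for illustration only: achievable IOPS for one disk
# under two workloads roughly two orders of magnitude apart.
ACHIEVABLE_IOPS = {"random": 100, "sequential": 10_000}


def evaluate_static_cap(cap_iops, tenants, achievable_iops):
    """Return (idle_iops, oversubscribed) for one fixed per-tenant cap.

    idle_iops: capacity left unused when a single tenant runs alone,
               since that tenant can never exceed its cap.
    oversubscribed: True if all tenants running at their cap would ask
               for more than the disk can deliver, so the cap protects
               nobody.
    """
    idle_iops = max(0, achievable_iops - cap_iops)
    oversubscribed = cap_iops * tenants > achievable_iops
    return idle_iops, oversubscribed


def summarize(cap_iops, tenants=4):
    """Evaluate the same cap against every workload in ACHIEVABLE_IOPS."""
    return {
        workload: evaluate_static_cap(cap_iops, tenants, achievable)
        for workload, achievable in ACHIEVABLE_IOPS.items()
    }


if __name__ == "__main__":
    for cap in (2500, 25):
        print(cap, summarize(cap))
```

With these made-up numbers, a cap sized for the sequential case (2500 IOPS across four tenants) gives no protection at all once the workload turns random, while a cap sized for the random case (25 IOPS) leaves nearly the whole disk idle when the workload is sequential. No single fixed number works for both, which is why a cap looks like the wrong tool here.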

This is a particularly interesting and complex problem. It's exciting to be building a solution that will allow us to continue to deliver our incredible IO performance directly to SmartMachines, while also enabling innovative new forms of resource management that are unavailable on other platforms.