Why SmartOS? KVM, DTrace, Zones and More

On August 15th of this year, Joyent announced that it had ported KVM to its operating system, SmartOS, and was open-sourcing the entire OS. You might be wondering why the world needs another operating system, and what's so great about this one. Well, let me explain...

A Little History

In 2005, Sun Microsystems open-sourced Solaris, its renowned Unix operating system, eventually to be released as a distribution called OpenSolaris. Among the earliest adopters and most effective advocates of OpenSolaris was Ben Rockwood, who wrote The Cuddletech Guide to Building OpenSolaris in June 2005, the first of his many important contributions to the nascent OpenSolaris community. Meanwhile, Joyent's CTO Jason Hoffman was frustrated by the inability of most operating systems to answer seemingly simple questions like: "Why is the server down? When will it be back up? ... Now that it's back up, why is my database still slow?"

Jason knew that these questions would be a lot easier to answer on Solaris-based systems, and recognized Sun’s open-sourcing initiative as a huge opportunity. He hired Ben, and Joyent became one of the most innovative users of the open sourced Solaris kernel ("Solaris 11 Nevada builds"), over the years amassing a great deal of know-how in tweaking and tuning it for Joyent’s cloud computing needs.

After acquiring Sun Microsystems in 2010, Oracle Corp. closed OpenSolaris. Fortunately, an alternative (Illumos, a new fork of the OpenSolaris code base) was already in the works, and many Solaris engineers had left Oracle and were free to contribute to it. Unsurprisingly, some of those engineers ended up at Joyent, as part of a talented team that now contributes substantially to Illumos, extending it in key areas like KVM (Kernel-based Virtual Machine) and enhancing the Illumos kernel specifically for cloud use.

The Real Cloud OS

What does it mean for an operating system to be designed “for” cloud computing? The fundamental challenge for a cloud computing OS is to present a single server to many (and varied) customers, while making each customer feel as if they are the only one using that machine. From the user's perspective, a cloud OS has to be:

  • fast: minimizing latency (the time it takes for an operation to complete)
  • flexible: with automatic bursting and easy scaling
  • secure: I should never have to worry about what my neighbors are doing

For the cloud data center operator, the OS additionally must provide:

  • ultra-fast provisioning and de-provisioning (i.e., the creation and destruction of virtual machines)
  • efficient and fair resource sharing
  • multithreading and multiprocessor support
  • easy/automated operation
  • reliability
  • observability: when something doesn't behave as it should, we need to be able to find out quickly what is wrong and why

Inherited Features

From Illumos, SmartOS inherits powerful features that address these needs. We'll give a brief overview here; some of these topics will be covered in depth in future posts.

Operating System Virtualization

As ReadWriteEnterprise put it: "Thanks to the Solaris/Illumos heritage, SmartOS already had Containers and Zones – container-based virtualization (containers is supposed to mean zones + resource controls) that allowed users to run multiple application sets on one server isolated from one another. With KVM on SmartOS, Joyent can now address workloads that require running a full operating system for those customers who need Linux, Windows, or other operating systems to run in full, hardware-assisted virtualization. Unlike any other 'hypervisor', Joyent's KVM images run as a process inside of a zone: [this] turns out to be a very secure way to run Windows. And, unlike Linux, SmartOS will also give customers access to Solaris technologies that many users find compelling – like DTrace and ZFS."

ZFS

This future-proof file system, which is also a logical volume manager, gives us:

  • Fast file system creation: creating and starting additional zones ("SmartMachines" in Joyent terminology), in other words adding new paying customers, is nearly instantaneous.
  • Data integrity is guaranteed, with particular emphasis on preventing silent data corruption.
  • Storage pools: "virtualized storage" makes administrative tasks and scaling far easier. To expand storage capacity, all you need to do is add new disks (hard disks, flash memory, and whatever may come along in the future) to a zpool.
  • Snapshots: ZFS's copy-on-write transactional model makes it possible to capture a snapshot of an entire file system at any time, storing only the differences between the snapshot and the working file system as it continues to change. This creates a backup point that the administrator can easily roll back to.
  • The ARC (Adaptive Replacement Cache) improves file system and disk performance, driving down overall system latency.
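To make the copy-on-write idea behind snapshots concrete, here is a toy Python model. It is a sketch under simplifying assumptions, not how ZFS lays data out on disk: the CowFileSystem class and its methods are hypothetical, a file system is reduced to a map from logical block numbers to immutable data blocks, and a snapshot is simply a frozen copy of that map. Because writes always allocate a new block instead of overwriting one in place, a snapshot keeps seeing the old data, and only changed blocks consume extra space.

```python
class CowFileSystem:
    """Toy copy-on-write file system with snapshots (hypothetical model).

    Data blocks are immutable once written. The live file system and every
    snapshot are maps from logical block number to block id, so unchanged
    blocks are shared between them rather than copied.
    """

    def __init__(self):
        self._store = {}     # block id -> bytes, never modified after write
        self._next_id = 0
        self.live = {}       # logical block number -> block id
        self.snapshots = {}  # snapshot name -> frozen block map

    def write(self, lbn, data):
        # Copy-on-write: allocate a fresh block instead of overwriting.
        block_id = self._next_id
        self._next_id += 1
        self._store[block_id] = data
        self.live[lbn] = block_id
        self._free_unreferenced()

    def read(self, lbn, snapshot=None):
        block_map = self.live if snapshot is None else self.snapshots[snapshot]
        return self._store[block_map[lbn]]

    def snapshot(self, name):
        # Only the small block map is copied; the data blocks are shared.
        self.snapshots[name] = dict(self.live)

    def rollback(self, name):
        # Make the snapshot's view the live view again.
        self.live = dict(self.snapshots[name])
        self._free_unreferenced()

    def blocks_in_use(self):
        return len(self._store)

    def _free_unreferenced(self):
        # A block is freed once neither the live map nor any snapshot uses it.
        referenced = set(self.live.values())
        for block_map in self.snapshots.values():
            referenced.update(block_map.values())
        for block_id in list(self._store):
            if block_id not in referenced:
                del self._store[block_id]


fs = CowFileSystem()
fs.write(0, b"hello")
fs.write(1, b"world")
fs.snapshot("before")
fs.write(1, b"there")                       # old block survives for the snapshot
print(fs.read(1), fs.read(1, "before"))     # b'there' b'world'
print(fs.blocks_in_use())                   # 3: one extra block, not a full copy
fs.rollback("before")
print(fs.read(1), fs.blocks_in_use())       # b'world' 2
```

In this model a snapshot costs only the blocks that have changed since it was taken, which is the property the bullet above describes.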

Scalability

Wikipedia defines scalability as "...the ability of a system, network, or process, to handle growing amounts of work in a graceful manner, or its ability to be enlarged to accommodate that growth."

Solaris has been the OS of choice for major enterprise computing for decades. 'nuff said!

Resource Controls

SmartOS offers two methods for controlling CPU consumption:

  • The fair share scheduler (FSS) lets the operator guarantee each zone a minimum share of CPU. It takes effect only when the system is busy with demand from more than one zone, ensuring that each zone gets its fair share. When the system is not otherwise busy, a zone can "burst" beyond its usual share, consuming more than its minimum as needed, up to the CPU cap set for it.
  • The CPU cap is a hard maximum, for example the amount of CPU time that a user has paid for. Caps can also be used to set user expectations about system performance, even when the server is not yet fully populated and load is still light.
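The interplay between shares and caps can be illustrated with a small Python allocator. This is a simplified "water-filling" model of the behavior described above, not the illumos FSS implementation (which adjusts thread priorities continuously rather than computing allocations up front); the Zone class, the allocate_cpu function and the use of whole CPUs as the unit are assumptions made for illustration. Capacity is divided in proportion to shares among zones that still want CPU, and whatever a zone cannot use, because its demand is met or its cap is reached, is handed back to the others.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    name: str
    shares: int     # relative weight when zones compete for CPU
    cap: float      # hard ceiling, in CPUs (e.g. 1.5 = 150% of one CPU)
    demand: float   # CPU the zone would use if unconstrained, in CPUs


def allocate_cpu(zones, capacity):
    """Split `capacity` CPUs among zones for one interval (hypothetical model)."""
    grant = {z.name: 0.0 for z in zones}
    # No zone ever receives more than it wants or more than its cap.
    limit = {z.name: min(z.demand, z.cap) for z in zones}
    active = [z for z in zones if z.shares > 0 and limit[z.name] > 0]
    remaining = capacity
    while remaining > 1e-9 and active:
        total_shares = sum(z.shares for z in active)
        leftover = 0.0
        still_wanting = []
        for z in active:
            offer = remaining * z.shares / total_shares
            take = min(offer, limit[z.name] - grant[z.name])
            grant[z.name] += take
            leftover += offer - take          # unused CPU goes back in the pool
            if limit[z.name] - grant[z.name] > 1e-9:
                still_wanting.append(z)
        remaining = leftover
        active = still_wanting
    return grant


# Contention: both zones want all 4 CPUs, so shares decide (3:1).
busy = [Zone("a", 3, 4.0, 4.0), Zone("b", 1, 4.0, 4.0)]
print(allocate_cpu(busy, 4.0))    # {'a': 3.0, 'b': 1.0}

# Light load: "a" bursts past its 50% share, but only up to its 3-CPU cap.
quiet = [Zone("a", 1, 3.0, 4.0), Zone("b", 1, 4.0, 0.5)]
print(allocate_cpu(quiet, 4.0))   # {'a': 3.0, 'b': 0.5}
```

The two calls show the split described in the bullets: shares only matter under contention, while the cap bounds how far a zone can burst when the system is quiet.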

Network Virtualization

Virtualization is also used to create the illusion of things that aren’t actually on the real system, such as virtual network interfaces (VNICs). Joyent was one of the first users of Project Crossbow, which added network virtualization to OpenSolaris. Using this technology, each Joyent SmartMachine gets up to 32 VNICs, each with its own TCP/IP stack. This helps maximize another scarce resource, IPv4 addresses, through the use of network pools.

Observability

Users of Illumos, Mac OS X and FreeBSD know that DTrace gives you an unprecedented view of what's going on throughout the software stack. In SmartOS, this allows operators to observe and troubleshoot across all the zones and nodes in an entire data center. In SmartDataCenter, the Joyent team have harnessed the power of DTrace in a more user-friendly form with Cloud Analytics, which is available to both cloud operators and their customers.

Security

Solaris has long been the operating system of choice in highly secure data centers, thanks to several features which SmartOS inherits. SmartOS zones, though they share system resources such as CPU and disk space, simply cannot see each other. Users in a multi-tenant environment are thus protected from each other; your neighbor's security lapse will not affect your zone. Data security is also ensured: no byte of data from one customer is shared with any other customer, now or later, because:

  • A zone can only see its own network traffic.
  • Disk storage is accessed only via ZFS file systems, never raw devices. Each SmartMachine has its own file system and does not even know of the existence of any other.
  • A user has no access to raw memory devices, and so cannot scan system memory.

Upon deletion of a SmartMachine, its file system is destroyed, and there is no device path by which a future customer could access any data left over in that file system. Some of the same features that guarantee each SmartMachine a fair share of system resources also contain the damage from DDoS attacks: the fair share scheduler, CPU caps, process limits, rcapd (the resource capping daemon), swap caps, disk file system limits and quota limits. By capping each zone's resource usage, SmartOS ensures that, even under heavy attack, a zone will not bring down its neighbors.

Reliability

SmartOS is made more reliable by:

  • Fault Management Architecture (FMA): "fine-grained fault isolation and restart where possible of any component — hardware or software — that experiences a problem. To do so, the system must include intelligent, automated, proactive diagnoses of errors that are observed on the system. The diagnosis system is used to trigger targeted automated responses or guided human intervention that mitigates a specific problem or at least prevents it from getting worse."
  • The Service Management Facility (SMF) is "a feature of the Solaris operating system that creates a supported, unified model for services and service management on each Solaris system".

Joyent-Added Features in SmartOS

Above and beyond what we inherited from Solaris, Joyent has extended SmartOS with features of particular interest to cloud operators, including disk I/O throttling. A drawback of multi-tenancy in classic Solaris is that, where storage is shared, a single application can monopolize local storage with a stream of synchronous I/O requests, effectively blocking the system from servicing I/O requests from other zones and applications and slowing down other tenants. This new operator-configurable setting throttles I/O from misbehaving zones (by adding a small delay to each read or write), ensuring that other zones also get a turn at reading from and writing to disk. As with the fair share scheduler, disk I/O throttling only comes into effect when a system is under load from multiple tenants. When a system is relatively quiet, a single tenant can enjoy faster I/O without bothering the neighbors.
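A throttling policy of this kind can be sketched in a few lines of Python. Everything here is hypothetical: the io_delays function, the per-interval operation counts it takes as input, the rule that the delay grows with how far a zone exceeds the average, and the delay_step_us and max_delay_us parameters are illustrative assumptions, not the actual SmartOS tunables or algorithm. What it does capture is the behavior described above: a zone is slowed only when more than one zone is doing I/O and it is taking more than its share.

```python
def io_delays(ops_per_zone, delay_step_us=10.0, max_delay_us=100.0):
    """Return the delay (microseconds) to add to each read/write, per zone.

    ops_per_zone maps zone name -> I/O operations issued in the last interval.
    Hypothetical policy: a zone doing more I/O than the average of the busy
    zones gets delay_step_us of delay for each 100% it is above that average,
    up to max_delay_us. With fewer than two busy zones, nobody is throttled.
    """
    delays = {zone: 0.0 for zone in ops_per_zone}
    busy = {zone: ops for zone, ops in ops_per_zone.items() if ops > 0}
    if len(busy) < 2:
        return delays                      # a quiet system: full speed
    average = sum(busy.values()) / len(busy)
    for zone, ops in busy.items():
        if ops > average:
            excess = ops / average - 1.0   # e.g. 0.5 means 50% above average
            delays[zone] = min(max_delay_us, delay_step_us * excess)
    return delays


print(io_delays({"db": 3000, "web": 1000}))   # {'db': 5.0, 'web': 0.0}
print(io_delays({"db": 3000, "idle": 0}))     # {'db': 0.0, 'idle': 0.0}
```

Because the yardstick is the average of the busy zones, the same tenant can be throttled at one moment and run at full speed the next, depending only on what its neighbors are doing.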

Orchestrating the Cloud

SmartOS allows a cloud hosting provider to put more customers on each physical server (each in their own SmartMachine), while still giving them all phenomenal performance. Joyent’s servers typically run at 70% CPU utilization, against an industry norm of 15%. Joyent SmartMachines also run faster. SmartOS provides the underlying features; SmartDataCenter adds the orchestration layer that exposes these concepts and operations through a GUI and/or an API.

Beyond the Cloud

We should add that SmartOS potentially has applications well beyond the cloud and large data centers. Here's an idea from Stacy Higginbotham of Gigaom:

"SmartOS only requires 128 MB of RAM to boot, which means it can be used for a variety of smaller gadgets such as digital signs, set-top boxes and even high-end sensors. Looking ahead, having an OS that can work at both the data-center level and on sensors in the field enables a sensor-rich network."

Now What?

To learn more, and to download SmartOS to try for yourself, visit smartos.org.



Post written by deirdrebstraughan