The software stack that manages CloudLab is based on Emulab, a testbed control suite developed by the Flux Research Group at the University of Utah. Emulab’s primary strength lies in provisioning an ensemble of resources at the physical level, giving experimenters “raw” access to compute, network, and storage resources. An ensemble’s description includes the full network topology, enabling Emulab to tightly control that topology and to perform network-aware resource placement. This places CloudLab at the infrastructure layer, below the cloud software architecture, enabling experimenters to run cloud software stacks such as OpenStack, Eucalyptus, and CloudStack as experiments within CloudLab. Some of these experiments may be very long-lived, representing persistent production-quality clouds that may themselves offer cloud services to other users for years at a time. Others may last weeks, days, or even hours, representing targeted experiments to test specific hypotheses. CloudLab is extremely flexible in its allocation policies, as the Emulab software is capable of fast re-provisioning, switching between experiments (in this case, entire clouds) on the order of minutes.
The technology behind Emulab was first described in a paper at OSDI, and dozens of subsequent papers have covered further steps in its evolution.
GENI is a distributed infrastructure funded by the National Science Foundation to support research in networking and distributed systems. CloudLab uses many technologies that were originally developed for GENI.
GENI is built around federation: each of the clusters that make up CloudLab operates autonomously, yet they can be seamlessly combined into a single whole for running large experiments. Federation also means that any testbed that supports the GENI APIs may federate with CloudLab.
GENI has a rich API for listing, requesting, and controlling resources, as well as a full language, called RSpecs, for describing resources. CloudLab offers a straightforward web-based GUI for building clouds, but the full power of GENI is available through these APIs. Many tools that work with GENI will also work out of the box on CloudLab.
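For a sense of what that API looks like, here is a minimal sketch that calls a GENI aggregate manager's GetVersion method directly over XML-RPC. The endpoint URL and certificate paths are placeholders, and in practice most users rely on existing GENI tools or the CloudLab portal rather than calling the API by hand.

```python
# Minimal sketch: calling a GENI aggregate manager over XML-RPC.
# The endpoint URL and certificate paths below are placeholders.
import ssl
import xmlrpc.client

AM_URL = "https://am.example.net:12369/protogeni/xmlrpc/am/3.0"  # placeholder endpoint

# Aggregate managers authenticate callers with an SSL client certificate
# issued within the federation.
ctx = ssl.create_default_context()
ctx.load_cert_chain(certfile="geni_cert.pem", keyfile="geni_key.pem")  # placeholder paths

am = xmlrpc.client.ServerProxy(AM_URL, context=ctx)

# GetVersion needs no signed credentials and reports which AM API and RSpec
# versions the aggregate speaks; calls such as ListResources, Allocate, and
# Provision additionally require user credentials.
print(am.GetVersion({}))
```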
CloudLab uses profiles to describe the hardware and software needed to build a cloud. A profile can be as simple as a “clean slate” installation of a standard operating system on bare metal hosts, or it can contain an entire software stack. The profile shown below contains a canned instance of OpenStack.
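Under the hood, a profile's resource description is typically written as a geni-lib script that generates a GENI RSpec. The sketch below is a deliberately minimal, illustrative example rather than the OpenStack profile shown above: it requests two bare-metal nodes connected by a LAN, and the disk image URN is a placeholder.

```python
# Illustrative geni-lib sketch of the kind of script a CloudLab profile is
# built from. The disk image URN is a placeholder; real profiles pick an
# image available on the target cluster.
import geni.portal as portal

pc = portal.Context()
request = pc.makeRequestRSpec()

# Two bare-metal nodes ("raw PCs") loaded with a stock OS image.
node1 = request.RawPC("node1")
node2 = request.RawPC("node2")
for node in (node1, node2):
    node.disk_image = "urn:publicid:IDN+emulab.net+image+emulab-ops:UBUNTU22-64-STD"  # placeholder URN

# The network topology is part of the description: a LAN connecting both nodes.
lan = request.LAN("lan0")
lan.addInterface(node1.addInterface("if1"))
lan.addInterface(node2.addInterface("if2"))

# Emit the request RSpec (XML) that the portal and aggregate managers consume.
pc.printRequestRSpec(request)
```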
Clicking the create button causes CloudLab to start provisioning the hardware resources specified by the profile and loading the appropriate software onto them. When all provisioning is done (which typically takes a few minutes), you get a screen like the one below, showing the resources that have been allocated to your cloud. These resources are for your exclusive use for the duration of the experiment (though you can open up the cloud to other users if you choose).
Once your cloud is ready, you can interact with it directly. The pictures below show an instance of OpenStack that's running inside of CloudLab. You have full control over all of the machines, including the controller, and can replace any piece of software you want!
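For instance, you can drive the cloud with any standard OpenStack client. The sketch below uses the openstacksdk library to list hypervisors and instances; the controller URL and credentials are placeholders for whatever your particular experiment reports.

```python
# Sketch: talking to the OpenStack cloud running inside a CloudLab experiment
# using openstacksdk. The auth URL and credentials are placeholders.
import openstack

conn = openstack.connect(
    auth_url="http://ctl.example.cloudlab.us:5000/v3",  # placeholder controller URL
    project_name="admin",
    username="admin",
    password="REPLACE_ME",
    user_domain_name="Default",
    project_domain_name="Default",
)

# List hypervisors and running instances, just as you would against any
# production cloud; here you also control the machines underneath.
for hv in conn.compute.hypervisors():
    print("hypervisor:", hv.name)
for server in conn.compute.servers():
    print("instance:", server.name, server.status)
```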
To provide bare-metal access, CloudLab PXE-boots all physical nodes and uses a custom boot loader to control the first-stage boot process: this allows it to get control back between users. Our boot loader can boot from the local disk or boot network-loaded kernels. One of the uses for the latter is imaging local disks: we built a custom, extremely fast, highly scalable multicast disk loader for this purpose. Taken together, this lets CloudLab re-provision machines at the bare-metal level rapidly (actual disk loads often take less than a minute) and give users nearly full control. CloudLab can power cycle or perform hard resets on all nodes to recover them if they become wedged. We take the security of the re-imaging process seriously: we have built a TPM-based secure disk loading system. We also use a sophisticated state machine system that tracks the boot process of the nodes at multiple stages. It takes action to recover from a variety of failure modes, and in some cases will even give up on one machine and re-provision another to take its place.
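To make that last point concrete, here is a small, purely illustrative model of a boot-tracking state machine with a retry-then-replace recovery policy; the states, retry limits, and actions are invented for illustration and are not the actual Emulab implementation.

```python
# Illustrative sketch of a boot-tracking state machine with a recovery policy.
# States, retry limits, and actions are invented for illustration; this is
# not the Emulab implementation.
from enum import Enum, auto

class BootState(Enum):
    PXE_BOOTING = auto()    # node fetches its first-stage boot loader over the network
    LOADING_IMAGE = auto()  # multicast disk loader writes the local disk
    BOOTING_OS = auto()     # boot loader hands off to the freshly loaded disk
    READY = auto()          # node reported in; handed to the experimenter
    FAILED = auto()         # given up on this machine; a replacement is provisioned

MAX_RETRIES = 2  # hard resets before giving up on the machine

def power_cycle(node: str) -> None:
    print(f"{node}: hard power cycle via the power controller")

def provision_replacement(node: str) -> None:
    print(f"{node}: giving up; provisioning another node of the same type")

def handle_timeout(node: str, state: BootState, attempts: int) -> BootState:
    """Recovery policy when a node wedges in some boot stage."""
    if attempts < MAX_RETRIES:
        power_cycle(node)
        return state  # retry the same stage from the top
    provision_replacement(node)
    return BootState.FAILED

if __name__ == "__main__":
    # A node stuck loading its disk image gets reset twice, then replaced.
    state = BootState.LOADING_IMAGE
    for attempt in range(MAX_RETRIES + 1):
        state = handle_timeout("pc042", state, attempt)
    print("final state:", state.name)
```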