
Communications of the ACM

ACM News

Sharing Resources in Condo Computing


[Image: IT workers at Purdue University adding capacity to the Coates compute cluster. Credit: Purdue University]

Condo computing allows users to purchase one component in a shared computing infrastructure, while gaining access to greater resources depending on need and availability, and sharing infrastructure expenses with other condo owners.

The University of California, San Diego’s Research Cyberinfrastructure (RCI) program recently started offering researchers a new form of access to its high-performance computing services. The Triton Shared Computing Cluster (TSCC), operated by the San Diego Supercomputer Center, will let users purchase their own computing nodes or servers and have them installed in the cluster; this arrangement will give researchers unlimited access to their own nodes, plus occasional access, as needed, to idle nodes elsewhere in the cluster.

This offering is one example of the trend towards condominium or "condo" computing in university and research center supercomputers. The term "condo" comes from the fact that a user owns one component in a shared infrastructure, much the way a condominium owner owns one unit in a shared building, sharing maintenance and other services (and their fees) with the other residents. This is in contrast to the more familiar "hotel" approach, in which users rent time on a central computer resource, but must contend with possible lack of availability.

The condo computing trend started in the middle of the last decade. One of its pioneers was Purdue University, where Gerry McCartney was hired as CIO in 2007 (today he also serves as the university's vice president of information technology). McCartney came up with a business plan whereby the university would pay for the computing facility, electricity, and racks, while faculty would buy their own computing nodes with which to build it out.

Initially it was a hard sell, says Steven Tally, STEM senior strategist in Purdue’s office of marketing and media. "The faculty was giving up control," he recalls, "trading ownership for what is essentially a localized cloud service."

Yet Purdue’s IT department was already planning to retire two of the campus’s supercomputers, and had the funds to replace them. IT department representatives approached the campus’s top researchers and proposed a joint purchase of additional nodes to make the incoming supercomputer even larger. McCartney was able to show the researchers that by buying in bulk, they could get access to a faster machine for a quarter of the price of buying their own. Now, says Tally, about 150 researchers and groups on campus have bought into the program, and not one has changed their mind and decided to pull out.

The experience at Rice University was similar, according to Kim Andrews, Rice’s assistant director for research computing support and data services. Standard practice had been for a professor to get a grant, buy a computing cluster, put it in a closet, and designate a graduate student to run it. "I was one of those grad students," he recalls. At one point, there were 14 such clusters scattered around the Rice campus, in addition to the general computer center.

Then in 2006, a professor got a research grant and asked for more computing time and dedicated access. That led to the university’s first condo, "but it was a one-off, and a way for me to leverage scarce resources," says Andrews. The university started writing grants to purchase large systems that could grow with researchers’ needs. The result, the Shared University Grid at Rice (SUG@R), was installed in the university's Primary Data Center in 2008. The university now offers both hotel and condo computer access in what it has come to call the "SUG@R Resort."

In the beginning, though, condo computing was a hard sell at Rice, too. "Professors took pride in owning their own clusters," says Andrews, "and didn’t trust IT." So Andrews identified what he calls the "top five big men on campus"—the most influential researchers—and made sure the cluster let them do what they needed to; that worked. "What finally broke down barriers was building trust—our track record, and the word of mouth that we were successful," he says.

Condo computing may seem like an obvious idea, but it required both overcoming on-campus political issues and having the necessary technical infrastructure in place. "It was an idea that was bouncing around," says Tally. "With the rise of cluster computers rather than big iron, it gave the flexibility to actually do it."

The approach has inherently limited application; only a small percentage of researchers require enough computing power to make purchasing a condo reasonable. "Someone who needs less than 36,000 CPU-hours per year can get it from the computer commons," says Andrews. "Eighty-five percent of my users don’t need condo computing." Among those who do are researchers who need very long-term calculations: "in quantum dynamics, simulations might run for eight years," he says. Others need on-demand access: "geneticists need quick turnaround because they’re trying to develop a cure." Some benefit from massively parallel computing: "Genomics involves tasks that could be individually done on one computer, but they’re doing them a thousand times." Another scenario is that a professor will make an "exception request" for special treatment in the common area that Andrews can’t accommodate—a peculiar operating system, say, or an extravagant memory configuration, such as 256 GB of RAM—and will be steered into a condo purchase instead.

Also, the model works in a heterogeneous environment with multiple sources of funding, which is why it has taken hold at universities and research centers like the one in San Diego. "It’s hard to imagine such a system in a corporation," says Tally, where one would be faced with centralized purchasing and a monolithic budget. Individual departments in a corporation, for the most part, don’t have their own IT budgets.

In San Diego, the new TSCC is built mostly on the condo model. Researchers can purchase general computing nodes, each containing two eight-core Intel Xeon processors and 64 GB of memory, for $3,934, plus a $939 infrastructure fee and a $495 annual operations fee; they can also purchase an Nvidia GPU node for $6,310 with the same infrastructure and operations fees. Users can remove their hardware at any time, but they cannot reinstall it later.
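The pricing above can be summarized in a small sketch. This is an illustration built only from the figures quoted in the article, not an official cost calculator; the function name and interface are assumptions.

```python
# TSCC condo pricing as described in the article (illustrative only).
CPU_NODE = 3934   # general node: two 8-core Intel Xeons, 64 GB RAM
GPU_NODE = 6310   # Nvidia GPU node
INFRA_FEE = 939   # one-time infrastructure fee per node
OPS_FEE = 495     # annual operations fee per node

def condo_cost(n_cpu=0, n_gpu=0, years=1):
    """Total cost of a condo purchase over `years` years of operation."""
    nodes = n_cpu + n_gpu
    upfront = n_cpu * CPU_NODE + n_gpu * GPU_NODE + nodes * INFRA_FEE
    return upfront + nodes * OPS_FEE * years

# One general node kept for three years:
print(condo_cost(n_cpu=1, years=3))  # 3934 + 939 + 3*495 = 6358
```

For a researcher, the recurring operations fee is the main long-term variable; the hardware and infrastructure fees are paid once per node.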

The TSCC also has hotel nodes that condo owners, as well as outside researchers, can rent by the hour. Condo purchasers receive a time allocation on the cluster proportional to the number of nodes they purchase, but their jobs may run on any combination of their own nodes, other users’ nodes, and the hotel nodes.

Other research centers providing condo computing include the Lawrence Berkeley National Laboratory, but the practice still isn’t widespread. "We’ve always been surprised that we seem to be alone in doing this in a big way," says Tally.

Logan Kugler is a freelance technology writer based in Silicon Valley. He has written for over 60 major publications.


 
