CORAL: the next big thing in supercomputing; next-gen machines for Oak Ridge, Argonne, Livermore

Three of the nation’s premier national labs — Oak Ridge, Argonne and Lawrence Livermore — are banding together to push the boundaries of supercomputing and acquire next-generation supercomputers for each of the labs at the best value for the U.S. government.

It’s a big deal. Each of the computers is estimated to cost in the vicinity of $125 million and have capabilities of 100-200 petaflops (that’s 100-200 million billion calculations per second).

The request for proposals was released last week, and the due date for proposals is Feb. 18.

I talked this afternoon with Buddy Bland, who directs the Leadership Computing Facility at Oak Ridge National Laboratory. He is the project director for the Oak Ridge role in CORAL (which, by the way, stands for Collaboration of Oak Ridge, Argonne and Livermore). A total of about 100 experts from the three labs will be participating in the reviews, etc., in the coming months.

The impetus for the three-way collaboration was that each of the labs will need a replacement supercomputer at about the same time, with ORNL looking for a successor to Titan, Argonne looking for a follow-on to Mira, and Livermore replacing its successful Sequoia machine. In addition to the collaboration among the labs, the project also requires coordination between DOE’s Office of Science and the National Nuclear Security Administration.

According to Bland, the search for the next, best supercomputers will be a full and open competition, with a goal of two different types of architectures for the future machines as the Department of Energy leads the U.S. on the path to exascale machines 1,000 times more powerful than today’s best. It’s anticipated that different vendors will come up with the different architectures desired for the CORAL program, but it’s not inconceivable that the same company could present two different designs of choice, Bland said.

“We’re going to see what the marketplace will bring,” he said.

Even though ORNL has a longstanding relationship with Cray, which built the Jaguar and Titan systems as well as others housed at the Oak Ridge supercomputer center, Bland said that doesn’t mean the next system in Oak Ridge will be a Cray.

“The previous relationship or the ongoing relationship that we have with Cray . . . is something that has worked out very well,” Bland said, noting that Argonne and Livermore have had similarly good relationships with IBM. It wouldn’t surprise him at all to see changes in the vendor affiliations over the next few years, he said.

Although the nominal cost of each of the next-generation supercomputers is $125 million, the funding for the program is subject to congressional appropriations. The costs could go up if the timetable is delayed.

Initially, there will be two NRE (non-recurring engineering) subcontracts awarded, valued at $25 million apiece, to showcase the research and development of new computer systems, “in order to accelerate technology, improve capabilities, improve application performance, and lower the total cost of ownership of the delivered systems.”

Those subcontracts will be issued by Lawrence Livermore, but the three laboratories will jointly negotiate them and evaluate their merits, and technical representatives from each of the labs will oversee them.

Each of the labs will award a subcontract to build its next supercomputer, with delivery anticipated in the 2017 timeframe.

It’s apparently already decided that ORNL and Argonne will choose machines with different architectures, and then Livermore will choose one or the other architecture for its next-generation supercomputer. It’s not clear, however, how the order of choice is being decided.

Both the NRE and build subcontracts are expected to be awarded this calendar year, Bland said.

The new Oak Ridge supercomputer will not require a new building. Nor will it require displacement of the Titan machine at the ORNL supercomputing center.

Bland said the new supercomputer will be located on the bottom floor of the annex that was constructed at the back of Building 5600 a few years ago. It has never been occupied, he said.

Titan will continue to run at least through 2017, Bland said. By then it will no longer be the second-fastest computer in the world, as it is now, but it should still be plenty productive, he said.

The next computer system, which is expected to be roughly the size of Titan, will require upgrades for electrical power and chilled water at the Oak Ridge Leadership Computing Facility, the ORNL official said. But the extent of those upgrades won’t be known until more is known about the new supercomputer, he said.

