Blue Gene is a computer architecture project to produce several next-generation supercomputers designed to reach operating speeds in the petaflops range. The fastest system currently sustains over 280 teraflops. It is a cooperative project among the United States Department of Energy (which is partially funding the project), industry (IBM in particular), and academia. There are five Blue Gene projects in development, among them Blue Gene/L, Blue Gene/C, and Blue Gene/P.
Blue Gene/L
The first computer in the Blue Gene series, Blue Gene/L, developed through a partnership with Lawrence Livermore National Laboratory (LLNL), has a theoretical peak performance of 360 TFLOPS, and scores over 280 TFLOPS sustained on the Linpack benchmark.
History:
On September 29, 2004, IBM announced that a Blue Gene/L prototype at IBM Rochester (Minnesota) had overtaken NEC's Earth Simulator as the fastest computer in the world, with a speed of 36.01 TFLOPS on the Linpack benchmark, beating Earth Simulator's 35.86 TFLOPS. This was achieved with an 8-cabinet system, with each cabinet holding 1,024 compute nodes. Upon doubling this configuration, the machine reached a speed of 70.72 TFLOPS by November.
On March 24, 2005, the US Department of Energy announced that the Blue Gene/L installation at LLNL had broken its own world speed record, reaching 135.5 TFLOPS. This was made possible by doubling the number of cabinets to 32.
On the June 2005 Top500 list, Blue Gene/L installations across several sites world-wide took 5 out of the 10 top positions, and 16 out of the top 64.
On October 27, 2005, LLNL and IBM announced that Blue Gene/L had once again broken its own world speed record, reaching 280.6 TFLOPS, upon reaching its final configuration of 65,536 "Compute Nodes" (i.e., 2^16 nodes) and an additional 1,024 "IO nodes" in 64 air-cooled cabinets.
Blue Gene/L is also the first supercomputer ever to run over 100 TFLOPS sustained on a real-world application, namely a three-dimensional molecular dynamics code (ddcMD), simulating solidification (nucleation and growth processes) of molten metal under high pressure and temperature conditions. This won the 2005 Gordon Bell Prize.
Architecture:
Each Compute or IO node is a single ASIC with associated DRAM memory chips. The ASIC integrates two 700 MHz PowerPC 440 embedded processors, each with a double-pipeline, double-precision Floating Point Unit (FPU), a cache sub-system with a built-in DRAM controller, and the logic to support multiple communication sub-systems. The dual FPUs give each Blue Gene/L node a theoretical peak performance of 5.6 GFLOPS. Node CPUs are not cache coherent with one another.
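The 5.6 GFLOPS figure can be reproduced with simple arithmetic. The sketch below assumes that each FPU pipeline completes one fused multiply-add per cycle, counted as two floating-point operations. That assumption is not stated in the text, and the constant and function names are purely illustrative.

```python
# Back-of-the-envelope peak performance for one Blue Gene/L node.
CLOCK_HZ = 700e6              # 700 MHz clock, from the text
CPUS_PER_NODE = 2             # two PowerPC 440 processors per ASIC, from the text
PIPELINES_PER_FPU = 2         # "double-pipeline" FPU, from the text
FLOPS_PER_PIPELINE_CYCLE = 2  # ASSUMPTION: one fused multiply-add per cycle


def node_peak_gflops() -> float:
    """Theoretical peak of a single node in GFLOPS."""
    flops = (CLOCK_HZ * CPUS_PER_NODE
             * PIPELINES_PER_FPU * FLOPS_PER_PIPELINE_CYCLE)
    return flops / 1e9


def system_peak_tflops(compute_nodes: int) -> float:
    """Theoretical peak of a system with the given number of compute nodes."""
    return compute_nodes * node_peak_gflops() / 1e3


if __name__ == "__main__":
    print(node_peak_gflops())          # per-node peak in GFLOPS
    print(system_peak_tflops(65536))   # full 65,536-node system in TFLOPS
```

Under these assumptions the per-node result is 5.6 GFLOPS, and 65,536 compute nodes come to about 367 TFLOPS, in the same range as the roughly 360 TFLOPS peak quoted for the full machine.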
By integrating all essential sub-systems on a single chip, each Compute or IO node dissipates little power (about 17 watts, including DRAMs). This allows very aggressive packaging of up to 1,024 Compute nodes plus additional IO nodes in a standard 19" cabinet, within reasonable limits of electrical power supply and air cooling. The performance metrics in terms of FLOPS per watt, FLOPS per m² of floor space, and FLOPS per unit cost allow scaling up to very high performance.
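To give a feel for the packaging numbers, the following sketch estimates the node power drawn by one cabinet and the peak FLOPS per watt of a node. It uses only the figures quoted in this article and deliberately ignores IO nodes, fans, and power-supply losses, so it is a rough lower bound on cabinet power rather than a measured value. All names are illustrative.

```python
# Rough power and efficiency estimate from the figures quoted in the text.
NODE_POWER_W = 17.0        # about 17 W per node including DRAM, from the text
NODE_PEAK_GFLOPS = 5.6     # theoretical peak per node, from the text
NODES_PER_CABINET = 1024   # compute nodes only; IO nodes ignored (simplification)


def cabinet_node_power_kw(nodes: int = NODES_PER_CABINET) -> float:
    """Power dissipated by the compute nodes of one cabinet, in kW."""
    return nodes * NODE_POWER_W / 1000.0


def peak_gflops_per_watt() -> float:
    """Theoretical peak GFLOPS delivered per watt at the node level."""
    return NODE_PEAK_GFLOPS / NODE_POWER_W


if __name__ == "__main__":
    print(f"{cabinet_node_power_kw():.1f} kW per cabinet (nodes only)")
    print(f"{peak_gflops_per_watt():.2f} peak GFLOPS per watt")
```

With these inputs a fully populated cabinet dissipates roughly 17.4 kW in its compute nodes, and each node delivers about 0.33 peak GFLOPS per watt.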
Each Blue Gene/L node is attached to three parallel communications networks: a 3D toroidal network for peer-to-peer communication between compute nodes, a collective network for collective communication, and a global interrupt network for fast barriers. The I/O nodes, which run the Linux operating system, provide communication with the outside world via an Ethernet network. Finally, a separate and private Ethernet network provides access to any node for configuration, booting, and diagnostics.
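In a 3D torus, every node has six nearest neighbours (one in each direction along each axis), and the edges wrap around so that no node sits on a boundary. The sketch below illustrates that wraparound addressing. The function name and the 8 x 8 x 8 torus used in the example are hypothetical and are not taken from the Blue Gene/L design.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]


def torus_neighbors(node: Coord, dims: Coord) -> List[Coord]:
    """Return the six nearest neighbours of `node` in a 3D torus of size `dims`.

    Coordinates wrap around at the edges, which is what distinguishes a
    torus from a plain 3D mesh.
    """
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            coord = list(node)
            # Python's % always yields a non-negative result for a positive
            # modulus, so stepping below 0 wraps to the far edge.
            coord[axis] = (coord[axis] + step) % dims[axis]
            neighbors.append((coord[0], coord[1], coord[2]))
    return neighbors


if __name__ == "__main__":
    # Hypothetical 8 x 8 x 8 torus (512 nodes), corner node.
    for n in torus_neighbors((0, 0, 0), (8, 8, 8)):
        print(n)
```

Even the corner node (0, 0, 0) has six neighbours, three of them reached by wrapping to coordinate 7. In a dimension of size 2, the two neighbours along that axis coincide.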
Blue Gene/L Compute nodes use a minimal operating system supporting a single user program. To allow multiple programs to run concurrently, a Blue Gene/L system can be partitioned into electronically isolated sets of nodes. The number of nodes in a partition must be a positive integer power of 2, and a partition must contain at least 2^5 = 32 nodes. The largest possible partition comprises all nodes in the computer. To run a program on Blue Gene/L, a partition of the computer must first be reserved. The program is then run on all the nodes within the partition, and no other program may access nodes within the partition while it is in use. Upon completion, the partition nodes are released for future programs to use.
With so many nodes, component failures are inevitable. The system is able to electrically isolate faulty hardware so that the machine can continue to run.
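The partition size rule (a power of 2, at least 32 nodes, at most the whole machine) can be expressed as a short check. This is only a sketch of the rule as stated here, not the actual Blue Gene/L control software. The function names are hypothetical, and the default total of 65,536 nodes is the final LLNL configuration mentioned earlier. Any further constraints a real system imposes, for example from the network topology, are not modelled.

```python
from typing import List

MIN_PARTITION_NODES = 32      # minimum partition size, from the text
TOTAL_COMPUTE_NODES = 65536   # final LLNL configuration, from the text


def is_valid_partition_size(n: int, total: int = TOTAL_COMPUTE_NODES) -> bool:
    """Check the rule: power of two, at least 32 nodes, at most `total`."""
    # A positive integer is a power of two iff it has exactly one bit set.
    is_power_of_two = n > 0 and (n & (n - 1)) == 0
    return is_power_of_two and MIN_PARTITION_NODES <= n <= total


def valid_partition_sizes(total: int = TOTAL_COMPUTE_NODES) -> List[int]:
    """List every partition size allowed on a machine with `total` nodes."""
    sizes = []
    n = MIN_PARTITION_NODES
    while n <= total:
        sizes.append(n)
        n *= 2
    return sizes


if __name__ == "__main__":
    print(is_valid_partition_size(512))   # power of two within range
    print(is_valid_partition_size(48))    # not a power of two
    print(valid_partition_sizes())
```

For a 65,536-node machine this yields twelve allowed sizes, from 32 up to the full system.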
Blue Gene/C
The Blue Gene/C project, also known as the Cyclops64 (C64) architecture, aims to design a supercomputer based on a cellular architecture. Each "cell" consists of several dozen (approximately 75) custom-designed 64-bit processors. Each processor will have two thread units, two integer units, and a floating point unit. The architecture was conceived by Cray award winner Monty Denneau, who is currently leading the project. Verification testing and system software development are being done at the University of Delaware.
Design and fabrication are expected to be completed in 2005.
There are no plans to release any final performance results.
Blue Gene/P
IBM currently plans to finish Blue Gene/P in 2006. Blue Gene/P is expected to be the first supercomputer to break 1 petaflops, or 1 quadrillion floating-point operations per second.
Blue Gene/Q
The last known supercomputer in the Blue Gene series, Blue Gene/Q, is expected to reach 3 petaflops.