
EnterTheGrid - Primeur Live!

EnterTheGrid - Primeur is the premier Grid and Supercomputing information source in the world.

News digest 20 June 2008
We need to build our own supercomputers
Dresden 20 June 2008 Dr. John Salmon from D.E. Shaw Research stated that his company feels the need to build its own supercomputers. In the next couple of days the company will celebrate its 20th anniversary. Its founder first worked on Wall Street. D.E. Shaw Research is an independent research laboratory in computational biochemistry. There are two approaches in this field, namely laboratory experiments and biophysical simulation.

The goal is to achieve strong scaling in single, millisecond-scale molecular dynamics (MD) simulations. This means one long trajectory rather than many short ones. This is a harder problem but often essential. Why a millisecond? This is a time scale at which many biologically interesting things start to happen.

Dr. Salmon explained the interactions between proteins and the binding of drugs to their molecular targets. The laboratory developed a drug that targets a specific cancer without damaging healthy cells. The speaker showed an illustration of the required speed-up. It used to take two weeks to do the calculation at D.E. Shaw Research. Now, it can be done in ten minutes.

What will it take to simulate a millisecond? Can it be done with a machine bought off-the-shelf? The lab needs 10,000 ns/day. The challenges are simply to do that much computation and to keep the computing elements busy.

The approach is to design a specialized machine, Anton, with an enormously parallel architecture. Dr. Salmon gave an example of molecular dynamics. Time has to be divided into discrete steps: at each step the forces are calculated, and the process is iterated time and time again. Non-bonded calculations account for most of the work.
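The time-stepping loop described here can be sketched in a few lines of Python. This is a minimal illustration and not D.E. Shaw Research's code: it assumes a velocity Verlet integrator, a single particle in one dimension and a toy spring force. The names `harmonic_force`, `velocity_verlet` and `energy` are hypothetical.

```python
def harmonic_force(x, k=1.0):
    """Toy force law (an assumption for illustration): a spring pulling x toward 0."""
    return -k * x


def velocity_verlet(x, v, dt, n_steps, mass=1.0, force=harmonic_force):
    """Advance one particle in 1D by n_steps discrete time steps."""
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # recompute the force at the new position
        v = v_half + 0.5 * dt * f / mass   # second half-step velocity update
    return x, v


def energy(x, v, k=1.0, mass=1.0):
    """Total energy of the toy spring system, useful as a sanity check."""
    return 0.5 * mass * v * v + 0.5 * k * x * x


if __name__ == "__main__":
    x, v = velocity_verlet(x=1.0, v=0.0, dt=0.01, n_steps=1000)
    print(x, v, energy(x, v))
```

In a real MD code the force evaluation inside the loop is the expensive part, which is consistent with the remark that non-bonded calculations account for most of the work.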

Dr. Salmon described the algorithms used for this process. Ewald methods decompose the electrostatics. The typical way to parallelize is to partition space into boxes. There are two-dimensional home and neutral territory methods. The speaker showed a picture of scaling with the traditional versus the non-traditional method.
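Partitioning space into boxes can be illustrated with a simple box assignment. The sketch below assumes a cubic, periodic simulation volume of side `box_length` split into `n` boxes per dimension. `assign_to_boxes` is a hypothetical helper and not part of Desmond or Anton, and it shows only the traditional spatial decomposition, not the neutral territory methods.

```python
from collections import defaultdict


def assign_to_boxes(positions, box_length, n):
    """Map each particle index to the (i, j, k) box that contains it."""
    side = box_length / n
    boxes = defaultdict(list)
    for index, (x, y, z) in enumerate(positions):
        # Wrap each coordinate into [0, box_length) and find its box along that axis.
        # min(...) guards against a wrapped value rounding up to box_length.
        key = tuple(min(int((c % box_length) / side), n - 1) for c in (x, y, z))
        boxes[key].append(index)
    return dict(boxes)


if __name__ == "__main__":
    particles = [(0.5, 0.5, 0.5), (9.5, 0.5, 0.5), (5.0, 5.0, 5.0)]
    print(assign_to_boxes(particles, box_length=10.0, n=2))
```

In a parallel code each box would typically be owned by one processor, which then needs particle data only from nearby boxes.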

Anton will execute the calculations. For comparison, the speaker showed the performance of existing systems, including Desmond on a commodity cluster:

  • GROMACS on single processor - 1 processor core - about 1 ns/day
  • MDGRAPE-3 - 12 ASICs - 3.3 ns/day
  • Desmond on cluster - 512 processor cores - 132 ns/day
  • Desmond on cluster - 512 processor cores - 280 ns/day
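To put these rates in perspective, the short Python calculation below converts each quoted figure into the wall-clock time needed for one millisecond of simulated time. The rates and the 10,000 ns/day target come from the talk; the labels and the helper name `days_to_simulate` are only illustrative.

```python
NS_PER_MS = 1_000_000  # 1 millisecond = 1,000,000 nanoseconds

# Simulation rates in ns/day as quoted in the talk.
rates_ns_per_day = {
    "GROMACS, 1 core": 1.0,
    "MDGRAPE-3, 12 ASICs": 3.3,
    "Desmond, 512 cores (lower figure)": 132.0,
    "Desmond, 512 cores (higher figure)": 280.0,
    "Target rate": 10_000.0,
}


def days_to_simulate(ns, rate_ns_per_day):
    """Wall-clock days needed to simulate `ns` nanoseconds at the given rate."""
    return ns / rate_ns_per_day


if __name__ == "__main__":
    for label, rate in rates_ns_per_day.items():
        days = days_to_simulate(NS_PER_MS, rate)
        print(f"{label}: {days:,.0f} days ({days / 365:.1f} years)")
```

At the target rate a millisecond takes 100 days of computing. At 280 ns/day it would take about 3,571 days, which is close to ten years.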

The lab has designed a single Anton ASIC; the machine has 4 ASICs per board and 512 ASICs in total. What makes Anton fast? There is a high-throughput interaction subsystem with extremely high computational density for specific application-dependent operations. The communication subsystem is high-performance and highly integrated. There is also a flexible subsystem. Amdahl's Law requires high performance here too.
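The reference to Amdahl's Law can be made concrete with a small Python sketch. The formula is the standard one; the fraction of 0.9 and the acceleration factors used below are made-up values for illustration and are not figures from the talk. The function name `amdahl_speedup` is hypothetical.

```python
def amdahl_speedup(accelerated_fraction, acceleration):
    """Overall speed-up when only a fraction of the work is accelerated.

    accelerated_fraction: share of the original run time that is sped up (0..1).
    acceleration: factor by which that share is sped up.
    """
    remaining = 1.0 - accelerated_fraction
    return 1.0 / (remaining + accelerated_fraction / acceleration)


if __name__ == "__main__":
    # Illustrative values only: 90% of the work accelerated by various factors.
    for factor in (10, 100, 1000):
        print(factor, round(amdahl_speedup(0.9, factor), 2))
```

However fast the accelerated part becomes, the overall speed-up in this example stays below 10, because the remaining 10% of the work runs at the original speed. This is why the general-purpose part of a specialized machine has to be fast as well.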

The bandwidth is 42 gigabits/second per link, 250 gigabits/second per node and 5 terabits/second across the cross section. The latency is 50 ns per hop. There is unification across network layers 2-7.

The computational density is achieved with pairwise point interaction modules: 32 per chip x 28 stages deep x 800 MHz.

The process is as follows:

1. import tower particles

2. import plate particles

3. create direct product

4. select pairs

5. evaluate function

6. accumulate plate forces

7. accumulate tower forces

8. export plate forces

9. export tower forces
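The nine steps above can be sketched in software. This is only an illustration of the data flow and not Anton's hardware design: the cutoff-based pair selection, the toy inverse-square repulsion and all function names are assumptions made for the example.

```python
import itertools
import math


def pair_force(p, q):
    """Toy pairwise function (assumed): inverse-square repulsion acting on p from q."""
    d = [a - b for a, b in zip(p, q)]
    r = math.sqrt(sum(c * c for c in d))
    return [c / r**3 for c in d]


def interact(tower, plate, cutoff):
    """Run steps 1-9 for two sets of particle positions (lists of 3-tuples)."""
    # Steps 1-2: the imported tower and plate particles are the two arguments.
    tower_forces = [[0.0, 0.0, 0.0] for _ in tower]
    plate_forces = [[0.0, 0.0, 0.0] for _ in plate]
    # Step 3: form the direct product of tower and plate particles.
    for (i, t), (j, p) in itertools.product(enumerate(tower), enumerate(plate)):
        # Step 4: select only pairs within the cutoff distance.
        dist = math.dist(t, p)
        if dist == 0.0 or dist > cutoff:
            continue
        # Step 5: evaluate the pairwise function.
        f = pair_force(t, p)
        # Steps 6-7: accumulate equal and opposite forces on plate and tower.
        for axis in range(3):
            plate_forces[j][axis] -= f[axis]
            tower_forces[i][axis] += f[axis]
    # Steps 8-9: export the accumulated plate and tower forces.
    return plate_forces, tower_forces


if __name__ == "__main__":
    print(interact([(0, 0, 0)], [(1, 0, 0), (5, 0, 0)], cutoff=2.0))
```

Each selected pair is evaluated once and the result is applied to both particles, so the work per pair is not duplicated between the two sets.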

The flexible subsystem for general-purpose computation consists of four processing slices. The Tensilica cores control the floor. There are 32 kilobytes of memory and the researchers hardly ever touch the DRAM. There is a racetrack station and a correction pipeline to undo an operation that is not right.

Dr. Salmon also showed an example of protein folding, the villin headpiece. It folds in a millisecond in the lab. In simulation the protein does not always fold, so there is more work to be done here.

Leslie Versweyveld

EnterTheGrid - Primeur Magazine

James Stewartstraat 248

1325 JN Almere

The Netherlands

http://EnterTheGrid.com

mailto:primeur.editor@hoise.com

© EnterTheGrid - Primeur Live!