EnterTheGrid - Primeur Live!

EnterTheGrid - Primeur is the premier Grid and Supercomputing information source in the world.

News digest 25 June 2009
Building your own Cloud data centre? For one billion Euro you have one!
Hamburg 24 June 2009

Clouds and HPC: do they fit together? Not so well yet for real high-end supercomputing, was the conclusion of the "Cloud Computing and HPC - Synergy or Competition?" session at ISC09. For industrial throughput computing in HPC, Clouds could already be useful, especially for bursty applications like chip design.

But what does it take to set up a Cloud data centre? As lectures from Amazon, Google, and Microsoft explained, it takes a lot of infrastructure, computers, middleware and management. And you need deep pockets: to be commercially exploitable, data centres have to be very large. A typical Cloud data centre with one million processor cores costs about a billion euros or dollars.

And that is even if you build them at low cost. Cloud data centres consist of containers (yes, the same type you see on ships) that house the servers. There are none of the fancy glass buildings or former churches that are in use as HPC centres today.

Some major general computing vendors and the large Cloud providers presented their views and products on Clouds. Some even talked a bit about HPC and Clouds.

Richard Kaufmann, CTO of the Scalable Computing Infrastructure Organization at Hewlett-Packard, explained the types of servers that HP is developing for use in the Cloud. There is nothing really cloudy about general purpose servers, but Kaufmann explained what the characteristics of these Cloud servers are and how they differ from HPC servers. Cloud servers do not need very reliable hardware: fault tolerance is built into the software. They need to be cheap, so you can pack a large number of them into a data centre, and they need to use as little power as possible. Kaufmann believes that servers designed for these mega-scale data centres need a design that is a bit different from regular servers.

HPC systems, on the other hand, Kaufmann explained, need to have very reliable hardware. In addition they need a massive, fast interconnect. For Cloud servers, medium-speed interconnects will do.

In building large-scale data centres, the ones holding a million compute cores, you need efficient cooling and, of course, sufficient power. Multicore helps in cooling, as do efficient power supplies. Today's power supplies run at about 50% load and not at 90% as they used to do. For a server-producing company like HP, it is not easy to find additional power reduction possibilities. If you put your data centre in containers, you had better not put it in a very hot area. If you look for a cooler climate, you can even do without additional cooling.

How do HPC guys adapt to the Cloud? Half joking, Kaufmann said he had noticed that all HPC Grid people had scratched "Grid" off their business cards and replaced it with "Cloud".

But, he concluded, in fact, Cloud computing is just sliced bread.

Would servers look different if the Cloud had not been invented? Yes, because these servers are really designed for very, very large-scale centres, where economy of scale counts.

The next speaker was Sun Microsystems' Marc Hamilton. He started by explaining that HPC, especially at the high end, does not need virtualisation. In HPC you want to use as much processing power as possible to run jobs. And you have enough of them in an HPC system.

An issue in the Cloud is that you are running on the same server as other people. So you have to trust the Google or Amazon security mechanisms. In general that is OK, but some application areas do not fit that model. For instance in the banking business, you cannot use it, because of auditability requirements.

Two important characteristics of Clouds are real-time user-controlled provisioning and pay-per-use. Real-time user-controlled provisioning has to be fully automated. Otherwise you need hundreds of operators to run a centre. Pay-per-use is the other cornerstone: you only pay for the part of the system you use and for the time you use it. An advantage is that in this way, as a user, you also have access to the latest technology. If you buy a server of your own, you are probably still using it after three years or so. But by that time, the technology is already "old". In the Cloud, systems are fully utilized, so they can be upgraded faster.
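
The pay-per-use trade-off can be made concrete with a small calculation. The sketch below is illustrative only: the purchase price, the three-year lifetime and the hourly rate are made-up assumptions, not figures from the session, and the model ignores power, housing and staff costs.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours


def pay_per_use_cost(hours_used, rate_per_hour):
    """Cost when you pay only for the hours actually used."""
    return hours_used * rate_per_hour


def break_even_utilisation(purchase_cost, lifetime_years, rate_per_hour):
    """Fraction of its lifetime a bought server must be busy before
    owning becomes cheaper than renting at the given hourly rate."""
    lifetime_hours = lifetime_years * HOURS_PER_YEAR
    return purchase_cost / (rate_per_hour * lifetime_hours)


# Hypothetical numbers: a server costing 5000 (any currency), kept for
# three years, compared with renting an equivalent machine at 0.50 per hour.
utilisation = break_even_utilisation(5000, 3, 0.50)
print(f"Owning pays off above {utilisation:.0%} utilisation")
```

Under these assumed numbers, a user who keeps the machine busy less than roughly a third of the time would be better off paying per use.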

In Cloud computing he sees three different layers:

  • Software as a service (Salesforce.com)
  • Platform as a service (Google App Engine, Microsoft Azure Platform)
  • Infrastructure as a service (basic storage and compute capabilities: Amazon Web Services, Microsoft Infrastructure services, Mosso)

He also sees three business models for Clouds: Public, Private, and Hybrid Clouds.

According to Marc Hamilton, Clouds can be used in different application domains, from HPC and medical to finance and the Web. So basically it is just business as usual, not really different from Sun's server business to date, which was useful for all purposes.

One question to Hamilton was why SunGrid did not work. According to Hamilton, SunGrid was not easy to use for most customers: you had to log in with SSH and learn Sun Grid Engine. The pricing of 1 USD per hour was too high, although it sounded like good marketing at the time.

A completely different approach was taken by Thomas Lippert from Juelich. He started, of course, from HPC and tried to find the place where Clouds could play a role. The vision in Germany is that there are several tiers of supercomputer centres that form a pyramid. Tier-0 consists of a few very powerful machines of national importance. In Tier-2 there are a dozen or so machines for regional or topical services. At Tier-3 are hundreds of local services and the Grid. In Germany, the Gauss Centres provide the high-level tiers and the Gauss Alliance provides access to the lower tiers. In Europe, PRACE is planning to put together the tiered structure of HPC centres.

Lippert sees that the needs of HPC users are addressed by the effective use of the tier-0 to tier-2 computers. The HPC users work together in a kind of "simulation lab" with a lot of experts from several disciplines. This includes experts in operating the HPC systems, experts in simulation and experts in a specific scientific discipline.

An example Lippert mentioned is the SoftComp project. This Juelich "Cloud computer" is a Linux cluster. Access is provided by Unicore, a popular open source Grid software package.

The SoftComp machine is mainly used in serial mode and not so much in parallel mode as one would expect of HPC users. Running an application in parallel mode always involves many people from different disciplines. If Cloud providers want to serve users in the top tiers, beyond tier-3, they must offer leading-edge tier 0-3 performance and guarantee absolute security and privacy. They must actively offer the highest level of support and research for the science community and industry, supporting these multi-disciplinary teams.

What is the difference between Grids and Clouds? Lippert does not see much difference. The Grid has by now been defined by how it has been used during the past years. For Clouds it is still unclear what exactly they are.

Dan Reed from Microsoft also started with tiered technical computing. In his picture, the millions of mobile devices are at the lowest level. On top are the petascale and exascale systems.

For high-performance systems, the most important determining feature is the interconnect in the machine. The cost of operating computer systems is shifting more and more away from hardware. Bulk computing is almost free today if you can pay the energy bill. But moving large amounts of data around is still hard to do. And the people involved in managing and operating systems are expensive.

Reed sees that, once again, consumer commodities will drive HPC system development, much like Commercial Off-The-Shelf (COTS) systems did in the past. Today's economics are based on many-core/accelerator processors and software as a service/Cloud computing. He sees a convergence happening between HPC, Clouds and manycore developments.

Reed sees a number of Cloud Application Frameworks emerging. Basically there are three levels: infrastructure as a service, applications as a service, and software as a service.

Microsoft's framework is based on their Azure services platform. Azure provides computing, storage and management. It also supports virtualization, although Reed thinks that is not an essential feature of Clouds. The "magic" of Azure is the Azure Fabric Controller that takes care of load balancing between the nodes. Nodes can be physical or virtual nodes.

With all this computing, Reed also sees a data explosion going on. There are cases where many petabytes of data will routinely be searched. As an example, Microsoft is building "Metagenomics on Azure" together with Argonne Lab.

Microsoft is also developing Cloud solutions for science: a toolkit that will be released soon and is called DryadLINQ.

What does it take in hardware infrastructure to support a Cloud platform? Reed explains you need very large data centres. Each data centre is about 10 times the size of a football field and costs about a billion dollars to build. Such a data centre can easily house a million compute cores.
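
Dividing the two round figures Reed quoted gives a rough building cost per core. This is a back-of-the-envelope sketch: the variable names are chosen here for illustration, and the result covers the whole facility, not just the servers.

```python
# Round figures as quoted in the talk.
data_centre_cost_usd = 1_000_000_000  # about a billion dollars
compute_cores = 1_000_000             # about a million compute cores

# Building cost attributed to each core, facility included.
cost_per_core_usd = data_centre_cost_usd / compute_cores
print(f"About {cost_per_core_usd:.0f} dollars per core")
```

That is on the order of a thousand dollars per core before any energy or staff costs are counted.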

Large-scale data centres require a look at some specific technical issues.

Cooling technologies, for instance. You have to determine the optimal operating point of the hardware. You also need to look into locality-aware algorithms. The speed of light is not so fast that you do not notice the delay of a signal travelling around the data centre. New fault-tolerant algorithms also have to be looked into. The classical checkpoint-restart needs revision: it does not scale.
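
The remark about the speed of light can be checked with simple arithmetic. The sketch below assumes a 300 m cable run, a signal travelling at about two thirds of the vacuum speed of light (typical for optical fibre) and a 3 GHz processor clock; all three values are illustrative assumptions, not figures from the talk.

```python
SPEED_OF_LIGHT = 299_792_458  # metres per second, in vacuum


def one_way_delay_ns(distance_m, velocity_factor=1.0):
    """Propagation delay in nanoseconds over distance_m metres.

    velocity_factor is the signal speed as a fraction of the vacuum
    speed of light (assumed to be about 0.67 for optical fibre).
    """
    return distance_m / (SPEED_OF_LIGHT * velocity_factor) * 1e9


def cycles_in_flight(delay_ns, clock_ghz):
    """Clock cycles that pass while the signal is travelling."""
    return delay_ns * clock_ghz


# Hypothetical 300 m run across a very large data centre.
delay = one_way_delay_ns(300, velocity_factor=0.67)
cycles = cycles_in_flight(delay, 3.0)
print(f"{delay:.0f} ns one way, about {cycles:.0f} cycles at 3 GHz")
```

Under these assumptions a core waits thousands of clock cycles for a single one-way hop, before any switching delay is added, which is why keeping computation close to its data matters.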

Google takes a different perspective, Robin Williams from this company explained. He sees three levels in Clouds. At the lowest level is the Cloud provider. It delivers services to the next level: the SaaS (software as a service) providers, which are themselves Cloud users. At the highest level are the SaaS users. Google is active in the two lower layers.

According to Williams, analyzing, transforming and querying data can all be done best centrally (adjacent to the data). That is why he said we should centralize the data, and centralize the computing near it. Google has a large query support infrastructure for its search engine, of course. It is a very complex technology stack and innovation is needed at all layers of the stack to keep it efficient. At the lowest level, the Platform layer, single-threaded performance matters less. Moore's law manifests itself as more cores being used.

Managing the data centres themselves is also a core competency of Google. You must be able to turn full racks on and off very fast as demand rises or falls. And this has to be done automatically.

To use the Cloud infrastructure you need a high-level programming abstraction that hides the complexity and the faults of the infrastructure.

The Google Cloud systems infrastructure contains:

  • the Google File System: fault-tolerant distributed disk storage;
  • Bigtable: a large storage system for semi-structured data, with a database-like model, but stored on thousands of machines;
  • MapReduce: a programming model to simplify large-scale computations on clusters.
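
The MapReduce model named in the list above can be illustrated in a few lines. The following is a minimal single-machine sketch in Python, not Google's implementation; the function names `map_phase`, `shuffle`, `reduce_phase` and `map_reduce` are invented here for illustration. A real framework distributes the map and reduce calls over many machines and handles failures.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework would do."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def map_reduce(documents):
    """Run map, shuffle and reduce in sequence on a list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values)
                for key, values in shuffle(pairs).items())


counts = map_reduce(["the cloud and the grid", "the grid"])
print(counts)  # {'the': 3, 'cloud': 1, 'and': 1, 'grid': 2}
```

Because each map call looks at one document only and each reduce call at one key only, the calls are independent of each other, which is what lets a framework spread them over a cluster.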

On top of the stack are the software services. These include the usual Google products and some special services for software developers in the Google App Engine. The Google App Engine is easy to start with and easy to scale. It can be programmed using Python or Java code and allows you to develop locally and deploy to the Cloud seamlessly.

Google App Engine is built on standard HTTP requests and responses. Programmers get a high-level programming model and do not have to worry about the "raw iron". Advanced features for developing secure applications, such as API support for login and identity management, are also available.

Although it is widely used, it is still early technology. Hence it does not yet provide SLAs. For small amounts of usage, Google App Engine is free. For larger amounts there is a pay-per-use model. Paying users have an admin interface.

Amazon was one of the first Cloud providers. In Hamburg Simone Brunozzi brought the attendees up to date with the latest Cloud developments at Amazon.

Currently there are more than 540,000 AWS (Amazon Web Services) users. Amazon stores about 52 billion objects and the bandwidth used by AWS is much larger than that used by the web store.

To support programming, Amazon has introduced Elastic MapReduce.

Traditionally Amazon delivers Virtual Machines as the basic computing unit. Recently they announced additional features and scalability tools. These are Amazon specific.

Related to HPC, Amazon Elastic MapReduce and the Hosted Hadoop Framework are the most important new services. MapReduce and Hadoop are offered by other Cloud providers too. An advantage of using Elastic MapReduce is that it can automatically create and destroy the VM instances needed for a computation. Elastic MapReduce manages a cluster for you. The steps you need are:

1. Develop a data processing application

2. Upload data and application to Amazon S3

3. Start EMR "jobflow"

4. Monitor the progress

5. Get the results.

In the future Amazon will provide AWS certification for developers. There are plans to expand into Asia, to make new announcements in Europe without lagging too far behind the USA, and to announce more features.

Sanjay Radia from Yahoo was the last presenter in the HPC/Cloud session.

He explained the importance of Hadoop for Yahoo: their services depend on it. Hence they also actively participate in the open source development of Hadoop at Apache. They probably contribute at least 80% of the development, he says.

Yahoo has a very strong open source technology base. In addition to Hadoop this includes the Pig programming language, the HBase database and MapReduce.

Radia sees two different kinds of Clouds:

  • Horizontal (Platform) cloud services
  • Functional cloud services

So not too different from what others see.

Yahoo itself has some 500 million unique users per month and hence runs a multi-data-centre operation with data replication, in which consistency and availability of data are very important. The infrastructure is designed to be flexible.

The discussion that ended the session mainly consisted of further clarifications from the vendors. Google, for instance, stressed that they are also strong contributors to open source software. Reed said that the MPI parallel programming model is not well suited to the Cloud. MapReduce, with its functional programming origins, provides a better fit.

Lippert noted that the HPC community will not adapt to the Cloud paradigm: that will not happen. The community has specific computation needs and data needs that need to be addressed.

Kaufmann is not too sure about that. It could well be that industrial HPC will drive the HPC Cloud, not research HPC.

Ad Emmen

EnterTheGrid - Primeur Magazine

James Stewartstraat 248

1325 JN Almere

The Netherlands

http://EnterTheGrid.com

mailto:primeur.editor@hoise.com

© EnterTheGrid - Primeur Live!