Posted by Shawn Kung in Blogroll, Frontline data warehouse, TCO on January 27, 2009

Back in March 2005, I attended the AFCOM Data Center World Conference while working at NetApp.  It was a great opportunity to learn about enterprise data center challenges and network with some very experienced folks.  One thing that caught my attention was a recurring theme on growing power & cooling challenges in the data center.

Vendors, consultants, and end-user case study sessions trumpeted dire warnings that the proliferation of powerful 1U blade servers would result in power demands outstripping supply (for example, a typical 42U rack consumed 7-10kW, while new-generation blade servers were said to exhibit peak rack heat loads of 15-25kW).  In fact, estimates were that HVAC cooling (to remove the heat the hardware emits) was an equally significant power consumer (i.e., for every watt you burn to power the hardware, you burn another watt to cool it down).
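
To put rough numbers on that rule of thumb, here is a back-of-the-envelope sketch in Python. The function name and the default 1:1 cooling ratio are illustrative assumptions based only on the estimate quoted above, not measured data, and real facilities will differ.

```python
def facility_power_kw(it_load_kw: float, cooling_watts_per_it_watt: float = 1.0) -> float:
    """Estimate total draw: IT load plus the power spent cooling it.

    cooling_watts_per_it_watt=1.0 encodes the "one watt to cool every
    watt burned" rule of thumb (an assumption, not a measurement).
    """
    return it_load_kw * (1.0 + cooling_watts_per_it_watt)


# Rack figures quoted above: 7-10kW for a typical rack, 15-25kW peak for blades.
for label, rack_kw in [
    ("typical rack (low)", 7),
    ("typical rack (high)", 10),
    ("blade rack (low)", 15),
    ("blade rack (high)", 25),
]:
    total = facility_power_kw(rack_kw)
    print(f"{label}: {rack_kw} kW of IT load, roughly {total:.0f} kW including cooling")
```

Under that assumption, every kilowatt shaved off the hardware is worth about two at the meter.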

Not coincidentally, 2005 marked the year when many server, storage, and networking vendors came out with “green” messaging.  The idea was to promote technologies that reduce power consumption and heat emissions, saving both money and the environment.  While some had credible stories (e.g., VMware), more often than not the result was bland me-too positioning or sheer hype (also known as “greenwashing”).

Luckily, Aster doesn’t suffer from this, as the architecture was designed for cost-efficiency (both people costs and facilities costs).  Among many examples:

[1] Heterogeneous scaling: we use commodity hardware, but the real innovation is making new servers work with pre-existing older ones.  This saves power & cooling costs because rather than having to create a new cluster from scratch (which requires new Queen nodes, new Loader nodes, more networking equipment, etc.), you can just plug in new-generation Worker nodes and scale out on the existing infrastructure…

[2] Multi-layer scaling: A related concept is that nCluster doesn’t require the same hardware for each “role” in the data warehousing lifecycle.  This division-of-labor approach ensures cost-effective scaling and power efficiency.  For example, Loader nodes are focused on ultra-fast partitioning and loading of data - since data doesn’t persist to disk on them, these servers contain very few spinning disk drives to save power.  On the opposite end, Backup nodes are focused on storing full/incremental backups for data protection - typically these nodes are “bottom-heavy” and contain lots of high-capacity SATA disks for power efficiency benefits (fewer servers, fewer disk drives, slower-spinning 7.2K RPM drives).

[3] Optimized partitioning: one of our secret-sauce algorithms maximizes the locality of joins via intelligent data placement.  As a result, less data crosses the network, which means IT orgs can stretch their existing network assets (without having to buy more networking gear and burn more power).

[4] Compression: we love to compress things.  Tables, cross-node transfers, backup & recovery, etc. all leverage compression algorithms to get 4x - 12x compression ratios - this means fewer spinning disk drives to store data and lower power consumption.

…and others (too many to list in a short blog like this)
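
As a rough illustration of the compression point, the sketch below estimates how many drives a given dataset needs at different compression ratios. The 100 TB dataset and 1 TB drive size are hypothetical numbers picked for the example, the function name is made up, and the math ignores RAID, replication, and free-space overhead; it is not a description of how nCluster sizes storage.

```python
import math


def drives_needed(raw_tb: float, compression_ratio: float, drive_capacity_tb: float) -> int:
    """Drives required to hold raw_tb of data after compression.

    Simplified on purpose: no RAID, replication, or headroom is modeled.
    """
    stored_tb = raw_tb / compression_ratio
    return math.ceil(stored_tb / drive_capacity_tb)


RAW_TB = 100    # hypothetical dataset size
DRIVE_TB = 1    # hypothetical drive capacity

# 1x is the uncompressed baseline; 4x and 12x are the ends of the range quoted above.
for ratio in (1, 4, 12):
    count = drives_needed(RAW_TB, ratio, DRIVE_TB)
    print(f"{ratio}x compression: {count} drives")
```

Fewer drives spinning means less power drawn directly, and less heat to remove afterwards.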

I’d love to continue the conversation with IT folks passionate about power consumption…what are your top challenges today and what trends do you see in power consumption for different applications in the data center?
