06
Oct
By Tasso Argyros in Analytics, Blogroll, Frontline data warehouse on October 6, 2008
   

Back in the days when Mayank, George and I were still students at Stanford, working hard to create Aster, we had a pretty clear vision of what we wanted to achieve: allow the world to do more analytics on more data. Aster has grown tremendously since these days, but that vision hasn’t changed. And one can see this very clearly in the new release of our software, Aster nCluster 3.0, which is all about doing more analytics with more data. Because 3.0 introduces so many and important features, we tried to categorize them in three big buckets: Always Parallel, Always On, and In-Database MapReduce.

Always Parallel has to do with the “Big Data” part of our vision. We want to build systems that can handle 10x -100x more data than any other system today. But this is too much data for any single “commodity server” (that is, a server with reasonable cost) that one can buy. So we put a lot of R&D effort into parallelizing every single function of the system -not only querying, but also loading, data export, backup, and upgrades. Plus, we allow our users to choose how much they want to parallelize all these functions, without having to scale up the whole system.

Always On also stems from the need to handle “Big Data”, but in a different way. In order for someone to store and analyze anything from a terabyte to a petabyte, she needs to use a system with more than a single server. But then availability and management can become a huge problem. What if a server fails? How do I keep going, but also how do I recover from the failure (either by introducing the same server or a new, replacement, server) with no downtime? And how can I seamlessly expand the system, in order for me to realize the great promise of horizontal scaling, without taking the system down? And, finally, how do I backup all these oceans of data without disrupting my system’s operation? All these issues are handled in our new 3.0 release.

We introduced In-Database MapReduce in a previous post so I won’t spend too much time here. But I want to point out how this fits our overall vision. Having a database which is always parallel and always on allows you to handle Big Data with high performance, low cost, and high availability. But once you have all this data, you want to do more analytics - to extract more value and insights. In-Database MapReduce is meant to do exactly that -push the limits of what insights you can extract by providing the first-ever system that tightly integrates MapReduce (a powerful analytical paradigm) with a wide-spread standard like SQL.

These are the big features in nCluster 3.0, and in the majority of our marketing materials we stop here. But I also want to talk about the other great things we have in there; things more subtle or technical to mention in the headlines, but still very important. We’ve added table compression features that offer online, multi-level compression for cost-savings. With table compression, you can choose your compression ratio and algorithm and have different tables compressed differently. This paves the way for data life-cycle management that can compress data differently depending on its age.

We’ve also implemented richer workload management to offer quality of service for fine-grained mixed workload prioritization via priority and fair-share based resource queue.  You can even allocate resource weights based on transaction number or time (useful when both big and small jobs occur).

3.0 also has Network Aggregation (NIC “bonding”) for performance and fault tolerance. This is a one-click configuration that automates network setup -usually a tedious error-prone sys admin task. And that’s not the end of it -we also are introducing an Upgrade Manager that automates upgrades from one version of nCluster to another, including what most frequently breaks upgrades: the operating system components. This is another building block of the low cost of ongoing administration that we’re so proud of achieving with nCluster. I could go on and on (new SQL enhancements, new data validation tools, heterogeneous hardware support, LDAP authentication, …), but since blog space is supposed to be limited, I’ll stop here. (Check out our new resource library if you want to dig deeper.)

Overall, I am delighted to see how our product has evolved towards the vision we laid out years back. I’m also thrilled that we’re building a solid ecosystem around Aster nCluster -we now support all the major BI platforms -and are establishing quite a network of systems integrators to help customers with implementation of their frontline data warehouses. In a knowledge-based economy full of uncertainty, opportunities, and threats, doing more analytics on more data will drive the competitiveness of successful corporations -and Aster nCluster 3.0 will help you deliver just that for your own company.

Bookmark and Share

Comments:
Aster Data has a new release | DBMS2 -- DataBase Management System Services on October 7th, 2008 at 7:48 am #

[...] is doing a great job telling their story in their own blog.  The post summarizing Release 3.0 is here. Share: These icons link to social bookmarking sites where readers can share and discover new web [...]

Post a comment
Name: 
Email: 
URL: 
Comments: