I am very pleased to announce today that Aster nCluster now brings together the expressive power of a MapReduce framework with the strengths of a Relational Database!
Jeff Dean and Sanjay Ghemawat at Google had invented the MapReduce framework in 2004 for processing large volumes of unstructured data on clusters of commodity nodes. Jeff and Sanjay’s goal was to provide a trivially parallelizable framework so that even novice developers (a.k.a interns) could write programs in a variety of languages (Java/C/C++/Perl/Python) to analyze data independent of scale. And, they have certainly succeeded.
In mapping our product trajectory, we realized early on that the intersection of MapReduce and Relational Databases for structured data analysis has a powerful consonance. Let me explain.
Relational Databases present SQL as an interface to manipulate data using a declarative interface rooted in Relational Algebra. Users can express their intent via set manipulations and the database runs off to magically optimize and efficiently execute the SQL request.
Such an abstraction is sunny and bright in the academic world of databases. However, any real-world practitioner of databases knows the limits of SQL and those of its Relational Database implementations: (a) a lack of expressive power in SQL (consider doing a Sessionization query in SQL!), and (b) a cost-based optimizer that often has a mind of its own refusing to perform the right operations.
A final limitation of SQL is completely non-technical: most developers struggle with the nuances of making a database dance well to their directions. Indeed, a SQL maestro is required to perform interesting queries for data transformations (during ETL processing or Extract-Load-Transform processing) or data mining (during analytics).
These problems become worse at scale, where even minor weaknesses result in longer run-times. Most developers (the collective us), on the other hand, are much more familiar with programming in Java/C/C++/Perl/Python than in SQL.
MapReduce presents a simple interface for manipulating data: a map and a reduce function written in the language of choice (Java/C/C++/Perl/Python) of a developer. Its real power lies in the Expressivity it brings: it makes the phrasing of really interesting transformations and analytics breathtakingly easy. The fact that MapReduce, in its use of Map and Reduce functions is a “specific implementation of well known techniques developed nearly 25 years ago” is its beauty: every programmer understands it and knows how to leverage it.
As a computer scientist, I am thrilled at the simple elegant interface that we’ve enabled with SQL/MR. If our early beta trials with customers are any indication, databases have just taken a major step forward!
You can now write against the database in a language of your choice and invoke these functions from within SQL to answer critical business questions. Data analysts will feel liberated to have simple powerful tools to compete effectively on analytics. More importantly, analysts now have simplicity, working within the environs of simple SQL that we all love.
The Aster nCluster will orchestrate resources transparently to ensure that tasks make progress and do not interfere with other concurrent queries and loads in the database.
We proudly present our SQL/MapReduce framework in Aster nCluster as the most powerful analytical database. Seamlessly integrating MapReduce with ANSI SQL provides a quantum leap that will empower analysts and ultimately unleash the power of data for the masses.
That is our prediction. And we are working to make it happen!