By Mayank Bawa in Blogroll, Database, MapReduce on August 25, 2008
   

I am very pleased to announce today that Aster nCluster now brings together the expressive power of a MapReduce framework with the strengths of a Relational Database!

Jeff Dean and Sanjay Ghemawat at Google invented the MapReduce framework in 2004 for processing large volumes of unstructured data on clusters of commodity nodes. Their goal was to provide a trivially parallelizable framework so that even novice developers (a.k.a. interns) could write programs in a variety of languages (Java/C/C++/Perl/Python) to analyze data independent of scale. And they have certainly succeeded.

Once implemented, the same MapReduce framework has been used successfully within Google (and outside, via the Yahoo!-sponsored Apache Hadoop project) to analyze structured data as well.

In mapping our product trajectory, we realized early on that the intersection of MapReduce and Relational Databases for structured data analysis has a powerful consonance. Let me explain.

Relational Databases present SQL as a declarative interface, rooted in Relational Algebra, for manipulating data. Users express their intent via set manipulations, and the database runs off to magically optimize and efficiently execute the SQL request.

Such an abstraction is sunny and bright in the academic world of databases. However, any real-world practitioner of databases knows the limits of SQL and those of its Relational Database implementations: (a) a lack of expressive power in SQL (consider doing a Sessionization query in SQL!), and (b) a cost-based optimizer that often has a mind of its own, refusing to perform the right operations.

A final limitation of SQL is completely non-technical: most developers struggle with the nuances of making a database dance well to their directions. Indeed, a SQL maestro is required to perform interesting queries for data transformations (during ETL, Extract-Transform-Load, or ELT, Extract-Load-Transform, processing) or data mining (during analytics).

These problems become worse at scale, where even minor weaknesses result in longer run-times. Most developers (the collective us), on the other hand, are much more familiar with programming in Java/C/C++/Perl/Python than in SQL.

MapReduce presents a simple interface for manipulating data: a map function and a reduce function, written in the developer's language of choice (Java/C/C++/Perl/Python). Its real power lies in the expressivity it brings: it makes the phrasing of really interesting transformations and analytics breathtakingly easy. The fact that MapReduce, in its use of Map and Reduce functions, is a “specific implementation of well known techniques developed nearly 25 years ago” is its beauty: every programmer understands it and knows how to leverage it.
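To make the map/reduce interface concrete, here is a minimal sketch of the Sessionization problem mentioned earlier, written as one map function and one reduce function in plain Python. It is an illustration only, not Aster's SQL/MR API or Hadoop's API: the names `map_click`, `reduce_sessions`, and `run_mapreduce`, the 30-minute inactivity timeout, and the sample click data are all assumptions made for the example, and the driver simply simulates the shuffle step in a single process.

```python
from collections import defaultdict

# Assumed inactivity timeout: a gap longer than this starts a new session.
SESSION_GAP_SECONDS = 30 * 60


def map_click(record):
    """Map: key each click by user so one reducer sees all of that user's clicks."""
    user_id, timestamp = record
    yield user_id, timestamp


def reduce_sessions(user_id, timestamps):
    """Reduce: sort one user's clicks by time and assign a session number to each."""
    session_id = 0
    previous = None
    for ts in sorted(timestamps):
        if previous is not None and ts - previous > SESSION_GAP_SECONDS:
            session_id += 1
        yield user_id, ts, session_id
        previous = ts


def run_mapreduce(records, mapper, reducer):
    """Toy single-process driver: map, group by key (the shuffle), then reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    results = []
    for key in sorted(groups):
        results.extend(reducer(key, groups[key]))
    return results


# Hypothetical click log: (user, timestamp in seconds).
clicks = [("alice", 0), ("bob", 100), ("alice", 600), ("alice", 5000), ("bob", 200)]
sessions = run_mapreduce(clicks, map_click, reduce_sessions)
```

The procedural part (walking one user's clicks in time order and carrying state from one row to the next) lives entirely in the reduce function, which is the kind of logic that is awkward to phrase as pure set manipulation.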

As a computer scientist, I am thrilled at the simple, elegant interface that we’ve enabled with SQL/MR. If our early beta trials with customers are any indication, databases have just taken a major step forward!

You can now write functions against the database in a language of your choice and invoke them from within SQL to answer critical business questions. Data analysts will feel liberated to have simple, powerful tools to compete effectively on analytics. More importantly, analysts keep their simplicity, working within the environs of the simple SQL that we all love.

The Aster nCluster will orchestrate resources transparently to ensure that tasks make progress and do not interfere with other concurrent queries and loads in the database.

We proudly present Aster nCluster, with its SQL/MapReduce framework, as the most powerful analytical database. Seamlessly integrating MapReduce with ANSI SQL provides a quantum leap that will empower analysts and ultimately unleash the power of data for the masses.

That is our prediction. And we are working to make it happen!


Comments:
Database vendors add Google’s MapReduce | about ICT on August 26th, 2008 at 12:38 pm #

[...] with the nuances of making a database dance well to their directions," he wrote in a blog post. "Indeed, a SQL maestro is required to perform interesting queries for data transformations [...]

Krish Krishnan on August 26th, 2008 at 1:35 pm #

Mayank and Aster team, this is a wonderful step forward to a future world. Keep the prediction working. We are curious about any benchmarks that you have in terms of SQL coding in this environment versus a traditional environment.

[...] struggle with the nuances of making a database dance well to their directions,” he wrote in a blog post. “Indeed, a SQL maestro is required to perform interesting queries for data transformations (during [...]

News » Database vendors add Google’s MapReduce on August 26th, 2008 at 6:10 pm #

[...] with the nuances of making a database dance well to their directions," he wrote in a blog post. "Indeed, a SQL maestro is required to perform interesting queries for data transformations [...]

Dana Gardner’s BriefingsDirect mobile edition on August 27th, 2008 at 9:18 am #

[...] will be part of its Greenplum Database beginning in September. Aster Data, Redwood Shores, Calif., says that MapReduce will be included in its Aster nCluster. [Disclosure: Greenplum is a sponsor of BriefingsDirect podcasts.] Curt Monash, president of Monash [...]

[...] Most data is organized in the database. At the same time, there is some data that needs to be analyzed that cannot be organized. Social behavioral data is a mix of structured and unstructured data. To get better value out of this data, you need a tool like MapReduce. MapReduce gives you the power to analyze your unstructured data, and best-of-breed database technology lets you analyze your typical structured data. The combination of these is the best platform for the next generation of analytics, which is what Aster nCluster provides us with In-Database MapReduce [...]

[...] Aster announced In-Database MapReduce last summer, we saw tremendous interest and intrigue. Today, Amazon announced that it is helping promote the [...]

[...] year ago we introduced SQL/MapReduce for the Aster nCluster database, which integrates MapReduce and SQL to enable deep analytics within [...]

jacqueline friedberb on April 12th, 2013 at 3:05 pm #

Having read this I believed it was rather enlightening. I appreciate you finding the time and effort to put this information together. I once again find myself spending a lot of time both reading and posting comments. But so what, it was still worth it!
