Archive for July 2nd, 2010

02
Jul
By Tasso Argyros in Analytics, Blogroll on July 2, 2010
   

There is a lot of talk these days about relational vs. non-relational data. But what about analytics? Does it make sense to talk about relational and non-relational analytics?

I think it does. Historically, a lot of data analysis in the enterprise has been done with pure SQL. SQL-based analysis is a type of “relational analysis,”? which I define as analysis done via a set-based declarative language like SQL. Note how SQL treats every table as a set of values; SQL statements are relational set operations; and any intermediate SQL results, even within the same query, need to follow the relational model. All these are characteristics of a relational analysis language. Although recent SQL standards define the language to be Turing Complete, meaning you can implement any algorithm in SQL, in practice implementing any computation that departs from the simple model of sets, joins, groupings, and orderings is severely sub-optimal, in terms of performance or complexity.

On the other hand, an interface like MapReduce is clearly non-relational in terms of its algorithmic and computational capabilities. You have the full flexibility of a procedural programming language, like C or Java; MapReduce intermediate results can follow any form; and the logic of a MapReduce analytical application can implement almost arbitrary formations of code flow and data structures. In addition, any MapReduce computation can be automatically extended to a shared-nothing parallel system which implies ability to crunch big amounts of data. So MapReduce is one version of “non-relational”? analysis.

So Aster Data’s SQL-MapReduce becomes really interesting if you see it as a way of doing non-relational analytics on top of relational data. In Aster Data’s platform, you can store your data in a purely relational form. By doing that, you can use popular RDBMS mechanisms to achieve things like adherence to a data model, security, compliance, integration with ETL or BI tools etc. The similarities, however, stop there. Because you can then use SQL-MapReduce to do analytics that were never possible before in a relational RDBMS, because they are MapReduce-based and non-relational and they extend to TBs or PBs. And that includes a large number of analytical applications like fraud detection, network analysis, graph algorithms, data mining, etc.