By Tasso Argyros in Analytic platform, Analytics, Database, MapReduce on January 26, 2011
   

When we kicked off Aster Data back in 2005, we envisioned building a product that would advance the state of the art in data management in two areas: (1) size and diversity of data and (2) depth of insight/analytics. My co-founders and I quickly realized that building just another database wouldn’t cut it. With yet-another-database, even if we enabled companies to more cost-effectively manage large data sizes, it was not going to be enough given the explosion in diverse data types and the massive need to process all of it. So we set out to build a new platform that would solve these challenges – what’s now commonly known as the ‘Big Data’ challenge.

Fast forward to 2008, when Aster Data led the way in putting massively parallel processing inside an MPP database, using MapReduce, to advance how you process massive amounts of diverse data. While this was fully aligned with our vision for managing hordes of diverse data and allowing deep data processing in a single platform, most thought it was intriguing but couldn’t quite see where the future was going. At one point, we thought of naming our product XAP – “extreme analytic platform” or “extreme analytic processing” – as that’s what it was designed to do from day one. However, we decided against it, since we would have had to educate people too much on what an “analytic platform” was and how it was different from a traditional DBMS for data warehousing. Since we were also serving the data architects in organizations as well as the front-line business that demands better, faster analytics, we needed to use terminology that resonated with both.

Then, in the fall of 2009, with our flagship product Aster Data nCluster 4.0, we made further strides in running advanced analytics inside the database by including all the built-in application services (e.g., dynamic WLM, backup, and monitoring) to go with it. At that time, we referred to it as a Data-Application Server – which our customers quickly started calling a Data-Analytics Server. I remember when analyst Jim Kobielus at Forrester said,

“It’s really innovative and I don’t use those terms lightly. Moving application logic into the data warehousing environment is ‘a logical next step’.”

And others said,

“The platform takes a different approach from traditional data warehouses, DBMS and data analytics solutions by housing data and applications together in one system, fully parallelizing both. This eradicates the need for movements of massive amounts of data and the problems with latency and restricted access that creates.”

What they started to fully appreciate is that big data is not just about storing hordes of data, but rather about cracking the code on how to process all of it in deep ways, at blazing-fast speeds.

By 2010, everyone whose roots were in cost-effective data warehousing suddenly started claiming to be a platform for deep analytics. How ironic, since none of the other vendors had built any fundamental architecture that lent itself to being a true analytics machine. Aster Data did so – from the very beginning – with a deep in-database analytics engine; SQL-MapReduce analytics processing; data-application services so you could run all procedural code in-database and support it with all the key application services; out-of-the-box analytics modules (that now number over 1000); and a visual development environment for point-and-click development of apps and push-down of apps into our platform.

By late 2010, the term “analytic platform” started to take shape. The definition of it fit exactly with what Aster Data has built. And now, traditional DW appliances are claiming to be analytic platforms. Even Netezza is taking the same box they had before and calling it “An Appliance for Deep Business Analytics,” and pure columnar MPP DBMSs like Vertica and ParAccel overnight went from being ‘the world’s fastest database’ to ALL claiming to be an analytic(s) platform. This is a typical marketing trajectory once vendors see where the future lies in big data management. The market as a whole is gravitating toward accepting that if you truly want to manage big, diverse data, you ultimately want to analyze all of it, and for that you really need a big data analytic platform – not just a big data store. Curt Monash recently supported a similar notion when describing choices in analytic computing system design.

I predict this year will be the year where the analytic platform – which we at Aster Data started to talk about and deliver in 2008 – will now be a distinct and unique category: distinct from an enterprise data warehouse (EDW); distinct from traditional DBMSs; distinct from even some pure MPP DBMSs; and distinct from even Hadoop.

An analytic platform, put simply, must have the following:

1. Native in-database processing engine for application embedding that provides the capability to run applications inside an MPP database with high performance and high reliability and provides the necessary services so that applications can process the right data at the right time. This is not UDFs, which due to architectural limitations have always been a small “niche” in the RDBMS world, or Stored Procedures, which can’t match the performance and flexibility needed to push applications inside a large-scale MPP system.

2. Native support for MapReduce. Just like the world needs a language for basic data management (SQL), it also needs a framework for writing and deploying applications inside the data analytics platform. MapReduce is by far the most prominent and promising interface for doing exactly that. Hadoop’s open source popularity will only propel MapReduce’s significance, and MapReduce has already become the de facto standard for writing large, data-intensive parallel applications inside the database. “Native” means that MapReduce is not built on top of UDFs or Stored Procedures, nor is it a side-implementation on the DBMS, nor is it a simple Hadoop connector as every other DBMS vendor has done. All these approaches make MapReduce totally unusable for big data applications & analytics.

3. Tight integration with SQL. A great Analytics Platform for diverse data management, built for the Enterprise, needs to respect SQL or otherwise risk running into objections in the Enterprise world, where the majority of the skill set is around SQL. As of now, Aster Data’s patent-pending SQL-MapReduce is the only technology that manages to tightly blend two different worlds – SQL and MapReduce – together. Integration means that SQL analysts can seamlessly use MR applications inside the database, and MR data scientists can use the flexibility and power of SQL to complement their MR applications.

4. Enterprise feature support for in-database applications – High Availability; Backup/Restore; Access and Security; Resource Partitioning and Prioritization for concurrent operations (WLM). Several database systems offer these capabilities when it comes to SQL queries, but none offers all of them when it comes to in-database applications. Having the same support for both interfaces is critical because otherwise in-database apps are naturally forced into second-class-citizen status, making DBAs reluctant to let users run in-db apps and Enterprise security and compliance groups wary of the risks of going beyond SQL. Aster Data’s platform is unique in that SQL is just another interface, just like MapReduce. All platform capabilities have been built as services and treat each interface in the same way, resulting in 100% enterprise feature support for both interfaces.

5. Making in-database application development EASY and cost-effective.  An otherwise great Analytical Platform that’s too hard and too expensive to use will end up doing no analytics and thus does not deserve to be called an Analytical Platform.  This is why elements like the following are critical: libraries of pre-packaged SQL/MR analytic functions; 3rd party in-database application integrations; integrated, graphical, development environment for in-database apps and MapReduce; data visualization tools; unified monitoring and debugging of SQL and in-database application processes; and integrations with front-end visualization tools that can distribute the power of MapReduce to 1000s of analysts and business users.
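
To make the MapReduce programming model in points 2 and 3 concrete, here is a minimal sketch in plain Python. It is not Aster Data’s SQL-MapReduce API, and it runs in a single process rather than across MPP nodes; the function names, the sample click data, and the 30-minute session timeout are all assumptions made for illustration. It shows clickstream sessionization, the kind of ordered, per-key procedural logic that is natural to write as a map step plus a reduce step.

```python
from collections import defaultdict

# Assumption for illustration: a gap of more than 30 minutes starts a new session.
SESSION_GAP_SECONDS = 30 * 60


def map_click(record):
    """Map step: key each click by user so one user's clicks reach the same reducer."""
    user_id, timestamp = record
    yield user_id, timestamp


def reduce_sessions(user_id, timestamps):
    """Reduce step: walk one user's clicks in time order and number the sessions."""
    session = 0
    previous = None
    for ts in sorted(timestamps):
        if previous is not None and ts - previous > SESSION_GAP_SECONDS:
            session += 1
        yield user_id, ts, session
        previous = ts


def run_mapreduce(records):
    """Single-process stand-in for the shuffle a parallel engine performs across nodes."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_click(record):
            groups[key].append(value)
    output = []
    for key in sorted(groups):
        output.extend(reduce_sessions(key, groups[key]))
    return output


# Hypothetical sample data: (user, seconds since some start time).
clicks = [("alice", 0), ("bob", 100), ("alice", 600), ("alice", 4000), ("bob", 160)]
result = run_mapreduce(clicks)
for row in result:
    print(row)
```

In a parallel platform the grouping step would be distributed, with each node running the reduce logic on its own subset of users; the sketch only illustrates the shape of the code an application developer writes.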

So from what I see in the organizations I talk to every week, their need for a new Analytics Platform for managing and processing big, diverse data is loud and clear. I think this is the year when it is validated and legitimized as a new and distinct category. If you agree – or disagree, for that matter – leave a comment or point me to a resource that argues the counterpoints. I look forward to hearing from you.


Comments:

Name on February 28th, 2011 at 10:22 am #

Can you provide an example of an Analytics operation that can only run using MapReduce, rather than be run as an SQL statement that is implicitly parallelized by the database?

Thank you.

[...] Our journey started when we realized that (a) it was hard and expensive to manage big data, and (b) it was nearly impossible to process and analyze diverse (non-relational) data types like Web clicks, social connections, and text files at scale. The two worlds of data management and data processing were separate – RDBMSs would store and manage data in their world; however, applications and tools would do analytics outside of the database. This division severely restricted the types of analytics possible on large amounts of data. We discussed this in more detail on an earlier blog post from January 26. [...]
