By Mayank Bawa in Analytics, Cloud Computing, MapReduce on April 15, 2010

In the last few years, users and vendors alike have shown significant interest in data clouds and advanced analytics - specifically, in a new class of data-driven applications that run in a data cloud or on premises. What sets these applications apart from past approaches is the frequency and speed at which they are accessed, the depth of the analysis, the number of data sources involved, and the volume of data they mine - terabytes to petabytes. Amid all the noise, recent vendor announcements are helping to clarify the different visions and approaches to the big data challenge.

Both Aster Data and Greenplum made announcements this week that illustrated different approaches. At the same time that Aster Data announced the Aster Analytics Center, Greenplum announced an upcoming product named Chorus. I wanted to take a moment to compare and contrast what these announcements say about the direction of the two companies.

Greenplum’s approach speaks to two traditional problem areas: i) access to data, from provisioning of data marts to connectivity to data across marts, and ii) some level of collaboration among certain developers and analysts. Their approach is to create a tool for provisioning, unified data access, and sharing of annotations and data among developers and analysts. This is not an entirely new concept; these are well-known problems for which a number of companies and tools have developed best-of-breed solutions over the last 15 years. For example, the data access capabilities are another version of the Export/Copy primitives that already exist in all databases. Common ETL and EII tools have built upon those primitives for cases in which richer support than Export/Copy is needed - for instance, when data has to be transformed, correlated or cleaned while being moved from one context (mart) to another.
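To make the distinction between Export/Copy and an ETL-style move concrete, here is a minimal Python sketch. It is an illustration only: the mart contents, field names and the functions `export_copy` and `etl_move` are hypothetical and do not represent any vendor's API or product.

```python
from copy import deepcopy

# Hypothetical rows in a source mart; names and values are illustrative only.
SALES_MART = [
    {"customer_id": " c1 ", "amount_cents": 1250},
    {"customer_id": "C2", "amount_cents": None},  # dirty row: missing amount
    {"customer_id": "c3", "amount_cents": 400},
]

# Hypothetical second mart used for correlation.
CUSTOMER_MART = {"C1": "retail", "C3": "wholesale"}


def export_copy(rows):
    """Export/Copy: move rows to another mart unchanged."""
    return deepcopy(rows)


def etl_move(rows, customers):
    """ETL-style move: clean, transform and correlate rows in transit."""
    moved = []
    for row in rows:
        if row["amount_cents"] is None:
            continue  # clean: drop rows with a missing amount
        customer_id = row["customer_id"].strip().upper()  # clean: normalize keys
        moved.append({
            "customer_id": customer_id,
            "amount_dollars": row["amount_cents"] / 100,  # transform: cents to dollars
            "segment": customers.get(customer_id, "unknown"),  # correlate with second mart
        })
    return moved


copied = export_copy(SALES_MART)
cleaned = etl_move(SALES_MART, CUSTOMER_MART)
```

In this sketch `copied` carries the dirty row and the inconsistent keys into the target mart as-is, while `cleaned` arrives with normalized keys, converted units and a segment looked up from the second mart.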

This approach is indicative of a product direction whose primary focus is adding another option to the list of tools customers already have for addressing these problems. It is not a ground-breaking innovation that advances the world of analytics. The enormous opportunity lies in new types of analytics, or ‘data-driven applications.’ The Greenplum approach of data collaboration is interesting in a test environment or sandbox. When it comes to real production value, however, it increases the functions available to the end user, but at a high cost: significantly more complexity, security issues and administrative overhead. What does this mean exactly?

  • Spinning up marts and moving data around can result in “data sprawl,” which increases administrative overhead and is dangerous in these days of compliance requirements and sensitivity to privacy and data leaks.
  • Adding a new toolset to the data processing stack forces a painful choice: either manage and administer multiple toolsets that serve similar purposes, or transition away from investments in existing toolsets.
  • To enable effective communication and sharing, users need strong processes and features for source identification of data, data collection, data transformation, rule administration, error detection & correction, data governance and security. The quality and security policies around metadata are especially important, as free-form annotations can propagate errors or leaks in the absence of strong oversight.

In contrast, Aster Data’s recent announcements build on our long-standing investment in our in-database architecture, in which applications run fully inside Aster Data’s platform with the complete application services essential for complex analytic applications. The announcements highlight that our vision is not to create a new set of tools and layers in the data stack that recreate capabilities already available from a number of leading vendors. Rather, it is to deliver a new Analytics Platform, a Data-Application Server, that enables analytics professionals to create data-rich applications that were impossible or impractical before - namely, to create and use advanced analytics for rich, rapid, and scalable insights into their data. This focus is complemented by our partners, who offer proven best-of-breed solutions for collaboration and data transformation.

A key illustration of the investments that Aster is making in this vision is the formation of the new Aster Analytics Center. The Center combines a center of excellence, ready-to-use analytics solutions that leverage MapReduce, and best practices for advanced analytics on big data. Its charter is to develop products and provide insights that help organizations use data in clever ways to enable data-driven decisions. The Center is headed by Dr. Jonathan Goldman, our Director of Analytics and Applications, and staffed by a team of analytics experts. Jonathan joined us from LinkedIn, where, as Principal Scientist, he led a team of analytics researchers building cutting-edge products on the rich data sets LinkedIn has amassed. His team focused on driving growth and user engagement for the LinkedIn social network, and it developed a successful model to build, ship, and iterate - to deliver value to LinkedIn effectively and sustainably. Over three years, he and his team delivered several industry-first features that surprised and delighted LinkedIn’s users - “People You May Know,” “Who Viewed My Profile?,” “Jobs that are Similar to Mine,” and several others.

One of the first product solutions from the Aster Analytics Center is Aster Data Analytic Foundation, a suite of advanced analytics modules built on SQL and MapReduce. The suite makes it easy for data analysts to leverage large volumes of diverse data effectively. It made its debut with our nCluster 4.5 release, announced in February, and provides rich analytic functions that let data scientists and users manipulate data easily rather than build primitives from scratch.

The second aspect of the Analytics Center’s charter, the methodology of using data, is being addressed through its work on analytics best practices. These provide blueprints that help data analysts turn their insights into an operational data product that can be delivered repeatably.

From what we already see with customers, the Aster Analytics Center - with the Aster Data Analytic Foundation solution, big data analytics best practices, and deep analytics expertise - will be a catalyst that accelerates a chain reaction and revolutionizes data usage across industries.
