Aster Data nCluster:
Application Embedding and SQL-MapReduce for Ultra-fast Data Analysis

The "Big Data" Challenge

Exponential data growth, coupled with the desire to do more with data, is forcing organizations to seek new methods of analyzing it. The limited functionality of SQL has traditionally pushed companies toward architectures in which most data analysis is done in the application tier, using programming languages such as Java, C#, Python, C++, and R. At current volumes such architectures are no longer feasible, because moving big data volumes from a DBMS or data warehouse to the application tier for analytics and data processing does not scale.

To overcome the problems of data movement and high latency, Aster Data 4.0 for the first time brings application logic inside Aster Data's MPP database. Co-locating data and application logic in Aster Data 4.0 enables ultra-fast analysis of big data sets and makes possible a new generation of interactive big data applications.

Embed Applications with Data for Ultra-fast Analysis

Aster Data’s nCluster is the first MPP data warehouse architecture that allows applications to be fully embedded within the database engine, enabling ultra-fast, deep analysis of massive data sets. Aster Data's unique “applications-within™” approach lets application logic reside and execute alongside the data itself. Termed a “Massively Parallel Data-Application Server”, Aster Data’s solution combines its patent-pending SQL-MapReduce with parallelized data processing and applications to address the big data challenge.

MapReduce is a programming model, popularized by Google in 2004, for processing large unstructured data sets distributed across thousands of nodes. Aster Data's patent-pending SQL-MapReduce enables enterprises to harness the power of MapReduce while managing their data in Aster Data nCluster.
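For readers new to the model, the map, shuffle, and reduce phases can be sketched with the classic word-count example. This is a minimal single-process Python illustration of the general programming model, not Aster Data's implementation; the function names are hypothetical, and all phases run locally rather than across nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all intermediate values by key.

    In a real cluster this step moves data between nodes so that every
    value for a given key reaches the same reducer.
    """
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a final result."""
    return key, sum(values)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    # Each document could be mapped on a different node; here it is sequential.
    intermediate = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


docs = ["big data big clusters", "data moves to code", "code moves to data"]
print(word_count(docs))
```

Because map calls are independent per record and reduce calls are independent per key, both phases can be spread across many machines; only the shuffle requires moving data between them.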

Using the SQL-MapReduce parallel framework, nCluster embeds applications where the data natively resides: within an MPP database running on commodity servers. Enterprises can push selected applications down to Aster Data's MPP cluster and get the performance they need, whether for standard ANSI SQL queries, custom procedures (written in Java, C/C++, C#, Perl, Python, Ruby, or R), or packaged analytic applications.

Aster Data nCluster: a massively parallel data-application server

SQL-MapReduce

SQL-MapReduce (SQL-MR) embeds the MapReduce parallel processing framework inside nCluster, enabling custom or packaged analytic applications to be pushed down to where the data is stored. Applications can iteratively read and write data, enabling highly interactive, analytics-intensive applications. SQL-MR provides:

  • Powerful Expressiveness – A dramatic reduction in SQL code complexity, plus the procedural flexibility to express virtually any question in the language of choice (Java, C/C++, Python, Perl, etc.).
  • Seamless SQL Integration – Users (SQL developers, data miners, and business users working through BI tools) simply plug SQL-MR functions into the arbitrarily composable SQL they already know and love.
  • Re-Usability – Polymorphic functions save significant development time: instead of writing a new function every time the output changes, a single SQL-MR function adapts its output at the last possible moment (run time).
  • Scalable Performance – Distributed query planning and optimization apply to SQL-MR just as they do to regular SQL, enabling high-speed parallel processing.
  • Fault Isolation – Sandboxed containers and process management ensure strict isolation, preventing any single SQL-MR statement (e.g., one with poorly written code) from taking down another.
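The Re-Usability and Scalable Performance points can be made concrete with a small sketch of a polymorphic function that runs once per data partition. This is an illustration of the idea under stated assumptions, not Aster Data's SQL-MR API: the class RunningTotal, its operate_on_partition method, and the run_partitioned driver are hypothetical names. The function derives its output schema from its input schema when it is constructed, so the same code works on any table with a numeric column, and the driver processes partitions independently.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby
from typing import Any, Dict, Iterable, List, Sequence

Row = Dict[str, Any]


class RunningTotal:
    """Hypothetical polymorphic partition function (not the SQL-MR API).

    The output schema is not fixed when the function is written; it is
    derived at run time from whatever input schema the function receives:
    every input column plus a 'running_total' column.
    """

    def __init__(self, input_columns: Sequence[str], value_column: str) -> None:
        if value_column not in input_columns:
            raise ValueError(f"unknown column: {value_column}")
        self.value_column = value_column
        # Output schema is declared once, before any rows are processed.
        self.output_columns = list(input_columns) + ["running_total"]

    def operate_on_partition(self, rows: Iterable[Row]) -> List[Row]:
        """Process one partition: emit each row with its cumulative total."""
        total = 0
        out: List[Row] = []
        for row in rows:
            total += row[self.value_column]
            out.append({**row, "running_total": total})
        return out


def run_partitioned(rows: List[Row], partition_by: str, order_by: str,
                    fn: RunningTotal) -> List[Row]:
    """Split rows into partitions and invoke fn once per partition.

    Partitions are independent, so they are processed concurrently here
    with a thread pool; an MPP database would instead run each partition's
    work on the node that stores that data.
    """
    ordered = sorted(rows, key=lambda r: (r[partition_by], r[order_by]))
    partitions = [list(g) for _, g in groupby(ordered, key=lambda r: r[partition_by])]
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(fn.operate_on_partition, partitions))
    return [row for part in results for row in part]


sales = [
    {"region": "east", "day": 2, "amount": 5},
    {"region": "west", "day": 1, "amount": 7},
    {"region": "east", "day": 1, "amount": 3},
]
fn = RunningTotal(["region", "day", "amount"], value_column="amount")
print(fn.output_columns)
for row in run_partitioned(sales, partition_by="region", order_by="day", fn=fn):
    print(row)
```

Because the function hard-codes no column names beyond the one it is told to sum, pointing it at a different table only changes the constructor arguments.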

SQL-MapReduce unites MapReduce with functionally-rich SQL

SQL-MapReduce Highlights

Aster Data nCluster provides a pre-packaged software development kit (SDK), SQL-MapReduce APIs, and documentation: rich tools for creating custom functions and applications. nCluster also provides several ready-made functions to jump-start powerful analytics:

  • nPath – complex sequential analysis for time-series and behavioral pattern analysis.
  • SSSP – single-source shortest path graph algorithm, useful for fraud and segmentation analysis.
  • Approximate percentiles – ultra-fast percentile (or N-tile) statistical distribution analysis.
  • Linear regression – statistical technique used to predict values based on a set of related variables.
  • Tokenize – text analysis that splits strings into words, categorizes them, and counts word occurrences.
  • Sessionize – session categorization that groups a sequence of clicks into sessions based on a specified timeout.
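The nPath bullet describes sequential pattern analysis. The page does not document nPath's syntax, so this Python sketch only illustrates the underlying idea: encode each user's ordered events as symbols and match a regular-expression-style pattern against the sequence. The symbol mapping, the pattern, and the session data are all hypothetical.

```python
import re
from typing import Dict, List

# Hypothetical mapping from page names to single-letter symbols.
SYMBOLS: Dict[str, str] = {"home": "H", "product": "P", "cart": "C", "checkout": "X"}


def matches_pattern(events: List[str], pattern: str) -> bool:
    """Encode an ordered event sequence as a string and regex-search it."""
    encoded = "".join(SYMBOLS.get(e, "?") for e in events)  # unknown pages become "?"
    return re.search(pattern, encoded) is not None


# Pattern: home, then one or more product views, an optional cart, then checkout.
PATTERN = r"HP+C?X"

sessions = {
    "u1": ["home", "product", "product", "cart", "checkout"],
    "u2": ["home", "checkout"],
    "u3": ["product", "home", "product", "checkout"],
}
converted = [user for user, events in sessions.items() if matches_pattern(events, PATTERN)]
print(converted)
```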
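To show what the SSSP bullet computes, here is a single-machine sketch using Dijkstra's algorithm. This is a swapped-in textbook technique that assumes non-negative edge weights; the page does not say how nCluster's SSSP function is implemented or parallelized, and the graph below is a made-up example.

```python
import heapq
from typing import Dict, List, Tuple

# Adjacency list: node -> list of (neighbor, edge weight).
Graph = Dict[str, List[Tuple[str, float]]]


def shortest_paths(graph: Graph, source: str) -> Dict[str, float]:
    """Dijkstra's algorithm: distance from source to every reachable node.

    Assumes non-negative edge weights. Unreachable nodes are omitted.
    """
    dist: Dict[str, float] = {source: 0.0}
    heap: List[Tuple[float, str]] = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry; a shorter path was already found
        for neighbor, weight in graph.get(node, []):
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


# Hypothetical example graph.
graph: Graph = {
    "a": [("b", 1.0), ("c", 4.0)],
    "b": [("c", 2.0), ("d", 5.0)],
    "c": [("d", 1.0)],
}
print(shortest_paths(graph, "a"))
```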
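Linear regression parallelizes well when each partition can summarize its own rows. The following is a minimal sketch for simple (one-variable) ordinary least squares, assuming that approach: each partition computes the sums of x, y, x², and xy, the partial sums are added together, and the fit is computed once from the totals. The function names and the two-partition data set are hypothetical; the page does not describe how nCluster's own regression function works.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]
# Sufficient statistics: (n, sum_x, sum_y, sum_xx, sum_xy)
Stats = Tuple[int, float, float, float, float]


def partial_stats(points: Iterable[Point]) -> Stats:
    """Per-partition step: reduce local rows to five running sums."""
    n, sx, sy, sxx, sxy = 0, 0.0, 0.0, 0.0, 0.0
    for x, y in points:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        sxy += x * y
    return n, sx, sy, sxx, sxy


def combine(parts: Iterable[Stats]) -> Stats:
    """Merge step: sufficient statistics simply add across partitions."""
    n, sx, sy, sxx, sxy = 0, 0.0, 0.0, 0.0, 0.0
    for pn, psx, psy, psxx, psxy in parts:
        n += pn
        sx += psx
        sy += psy
        sxx += psxx
        sxy += psxy
    return n, sx, sy, sxx, sxy


def fit(stats: Stats) -> Tuple[float, float]:
    """Final step: least-squares intercept a and slope b for y = a + b*x."""
    n, sx, sy, sxx, sxy = stats
    denominator = n * sxx - sx * sx
    if denominator == 0:
        raise ValueError("need at least two distinct x values")
    slope = (n * sxy - sx * sy) / denominator
    intercept = (sy - slope * sx) / n
    return intercept, slope


# Two hypothetical partitions, e.g. rows stored on two different nodes.
partitions = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0)],
]
intercept, slope = fit(combine(partial_stats(p) for p in partitions))
print(intercept, slope)
```

Only five numbers per partition cross the network, regardless of how many rows each partition holds.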
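The Sessionize bullet can be sketched as a single pass over each user's time-ordered clicks. In this minimal Python version, a new session starts when the gap since the user's previous click is strictly greater than the timeout. The function name, the (user, timestamp) tuple layout, per-user session numbering from 0, and the strict comparison are assumptions rather than nCluster's documented behavior.

```python
from itertools import groupby
from typing import List, Tuple

Click = Tuple[str, int]  # (user_id, timestamp in seconds)


def sessionize(clicks: List[Click], timeout: int) -> List[Tuple[str, int, int]]:
    """Assign a session number to every click.

    Clicks are grouped per user and ordered by time; a new session starts
    whenever the gap since that user's previous click exceeds `timeout`.
    Returns (user_id, timestamp, session_id) tuples, with session ids
    counted from 0 within each user.
    """
    result = []
    ordered = sorted(clicks)  # sorts by user, then by timestamp
    for user, user_clicks in groupby(ordered, key=lambda c: c[0]):
        session_id = 0
        previous_ts = None
        for _, ts in user_clicks:
            if previous_ts is not None and ts - previous_ts > timeout:
                session_id += 1
            result.append((user, ts, session_id))
            previous_ts = ts
    return result


clicks = [("alice", 0), ("alice", 30), ("bob", 10), ("alice", 500), ("alice", 520)]
for row in sessionize(clicks, timeout=60):
    print(row)
```

Since each user's clicks are processed independently, the work splits naturally by user, which is the kind of partitioning the SQL-MR framework described above is built to parallelize.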

"Aster Data's approach for interactive, big data applications is highly unique and allows us to store and process data in ways that were unimaginable in the past."

comScore
Michael Brown, EVP of Software Engineering

"Recent attempts to bring analytic logic into databases as user defined functions or stored procedures are a step in the right direction, but inherently limited because most databases aren't optimized for application logic. Aster Data has tackled this issue by embedding the equivalent of an application server in the database, such that application logic is fully parallelized for maximum speed and scalability with advanced data analytics."

TDWI
Philip Russom, Senior Manager of TDWI Research