Aster Data nCluster:
Application Embedding and SQL-MapReduce for Ultra-fast Big Data Analytics

The "Big Data" Challenge

Exponential data growth, coupled with the desire to do more with data, is forcing organizations to seek new methods of analyzing data. The limited functionality of SQL has traditionally pushed companies toward architectures in which most data analysis is done in the application tier, using programming languages such as Java, C#, Python, C++, and R. At current volumes, such architectures are no longer feasible: transferring big data volumes from a DBMS or data warehouse to the application tier for analytics and data processing does not scale.

To overcome the problems of data movement and high latency, Aster Data brings application logic inside its MPP database for the first time. Co-locating data and application logic inside Aster Data enables ultra-fast in-database analytics on big data sets and allows for a new generation of interactive, advanced analytics on large data volumes.

Aster Data nCluster 4.5 goes a step further, making it easier to develop analytic applications for big data, such as business analytics and predictive analytics. It provides the first integrated development environment for SQL and MapReduce applications and a suite of analytic modules designed to use SQL-MapReduce for in-database analytics.

Embed Applications with Data for Ultra-fast Analysis

Aster Data’s nCluster is the first MPP data warehouse architecture that allows applications to be fully embedded within the database engine to enable ultra-fast, deep analysis of massive data sets. Aster Data's unique “applications-within™” approach allows application logic to exist and execute with the data itself. Termed a “Massively Parallel Data-Application Server”, Aster Data’s solution effectively uses Aster Data’s patent-pending SQL-MapReduce together with parallelized data processing and applications to address the big data challenge.

MapReduce is a programming model that was popularized by Google in 2004 to process large unstructured data sets distributed across thousands of nodes. Aster Data's patent-pending SQL-MapReduce enables enterprises to harness the power of MapReduce while managing their data in Aster Data nCluster.
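
To make the programming model concrete, the following is a minimal single-process sketch of the map, shuffle, and reduce steps, using a word count as the example. It is an illustration only: the function names are hypothetical, nothing here is Aster Data's or Google's API, and a real framework would run the map and reduce steps in parallel across many nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle step: group emitted values by key between the two phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values emitted for one key."""
    return key, sum(values)


def word_count(documents):
    """Run the three steps in sequence over a list of text records."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

Because each map call sees only one record and each reduce call sees only one key, both steps can be distributed across nodes without changing the user-written functions.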

Using the SQL-MapReduce parallel framework, nCluster embeds applications where the data natively resides: within an MPP database running on commodity servers. Enterprises can push selected applications, such as business analytics and predictive analytics, down to Aster Data's MPP cluster and get the performance they need, whether for standard ANSI SQL queries, custom procedures (written in Java, C/C++, C#, Perl, Python, Ruby, or R), or packaged analytic applications.

Aster Data nCluster: a massively parallel data-application server

SQL-MapReduce

SQL-MapReduce (SQL-MR) embeds the MapReduce parallel processing framework inside nCluster, enabling custom or packaged analytic applications to be pushed down to where the data is stored for ultra-fast, advanced in-database analytics. Applications can iteratively read and write data, enabling highly interactive, analytics-intensive applications. SQL-MR provides:

  • Powerful Expressiveness – Dramatic reduction in SQL code complexity, plus the procedural flexibility to express any question in the language of choice (Java, C/C++, Python, Perl, etc.).
  • Seamless SQL Integration – Users (SQL developers, data miners, and business users via BI tools) simply plug the SQL-MR function into the arbitrarily composable SQL code they already know.
  • Re-Usability – Polymorphic functions save significant development time by avoiding the need to rewrite a function every time the output changes. SQL-MR adapts at the last possible moment (run-time).
  • Scalable Performance – Distributed query planning and optimization apply to SQL-MR in the same way as to regular SQL, enabling high-speed parallel processing.
  • Fault Isolation – Sandboxed containers and process management ensure strict isolation, preventing any single SQL-MR statement from taking down another (e.g., due to poorly written code).
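
The general idea of pushing a user-written function to partitioned data can be sketched as follows. This is a simplified, hypothetical model and not the SQL-MR API: rows are plain dictionaries, `partition_by` and `run_partition_function` are invented names, and a thread pool stands in for the worker nodes of an MPP cluster.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition_by(rows, key):
    """Group rows by a key column, as a database would when partitioning."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[key]].append(row)
    return parts


def run_partition_function(rows, key, func, workers=4):
    """Apply func to each partition in parallel and concatenate the output rows."""
    parts = partition_by(rows, key)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order in which partitions were submitted.
        results = list(pool.map(func, parts.values()))
    return [out_row for part in results for out_row in part]


def total_spend(partition):
    """Example user function: one output row per user with the summed amount.

    It reads only the columns it needs, so extra input columns are ignored.
    """
    user = partition[0]["user_id"]
    return [{"user_id": user, "total": sum(r["amount"] for r in partition)}]
```

In this sketch the user function never sees more than one partition, which is what allows the framework, rather than the application, to decide where and how widely the work runs.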

SQL-MapReduce applications are simple to write and manage

SQL-MapReduce Highlights

Aster Data nCluster provides a pre-packaged software development kit (SDK), SQL-MapReduce APIs, and documentation, which together offer rich tools for creating custom functions and applications. nCluster also provides several useful functions to jump-start powerful analytics:

  • nPath – complex sequential analysis for time-series analysis and behavioral pattern analysis.
  • SSSP – single-source shortest path graph algorithm useful for fraud and segmentation analysis.
  • Approximate percentiles – ultra-fast percentile (or N-tile) statistical distribution analysis.
  • Linear regression – statistical technique used to predict values based on a set of related variables.
  • Tokenize – text analysis that splits strings into words, categorizes them, and does a word count.
  • Sessionize – session categorization based on a sequence of clicks within a specified timeout.
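
As an illustration of the kind of logic behind a function such as Sessionize, here is a minimal sketch of timeout-based session assignment. It is not Aster Data's implementation: the function signature, the tuple layout, and the choice to number sessions from 0 per user are all assumptions made for this example.

```python
def sessionize(clicks, timeout):
    """Assign a per-user session number to each click.

    clicks:  iterable of (user_id, timestamp_seconds) tuples (assumed layout).
    timeout: a gap longer than this many seconds starts a new session.
    Returns a list of (user_id, timestamp_seconds, session_id) tuples,
    ordered by user and then by time.
    """
    last_seen = {}   # user_id -> timestamp of that user's previous click
    session_of = {}  # user_id -> that user's current session number
    out = []
    for user, ts in sorted(clicks):
        # First click for this user, or a gap beyond the timeout: new session.
        if user not in last_seen or ts - last_seen[user] > timeout:
            session_of[user] = session_of.get(user, -1) + 1
        last_seen[user] = ts
        out.append((user, ts, session_of[user]))
    return out
```

For example, with a 30-second timeout, clicks by one user at 0, 20, and 100 seconds fall into two sessions: the first two clicks share a session, and the 80-second gap before the third click starts a new one.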


"Aster Data's approach for interactive, big data applications is highly unique and allows us to store and process data in ways that were unimaginable in the past."
Michael Brown, EVP of Software Engineering, comScore

"Recent attempts to bring analytic logic into databases as user defined functions or stored procedures are a step in the right direction, but inherently limited because most databases aren't optimized for application logic. Aster Data has tackled this issue by embedding the equivalent of an application server in the database, such that application logic is fully parallelized for maximum speed and scalability with advanced data analytics."
Philip Russom, Senior Manager of TDWI Research