MapReduce, SQL-MapReduce Resources & Learning

What is MapReduce?

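MapReduce is a programming framework developed by Google to simplify data processing across massive data sets. A job is expressed as two functions: a map function that turns each input record into key-value pairs, and a reduce function that combines all the values that share a key. The framework takes care of distributing that work across many machines. As people increase their online activity and digital footprint, organizations need to analyze the large volumes of data their customers and audiences generate quickly enough to understand and serve them. MapReduce is one of the tools that helps them do so.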
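To make the model concrete, here is a minimal single-process sketch of a word count written in the MapReduce style. It is an illustration only: the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical, and a real framework would run the map and reduce steps in parallel across many machines rather than in one Python process.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as a framework would do."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of strings."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["the quick fox", "the lazy dog", "The end"]))
```

Because each call to map_phase looks at one record and each call to reduce_phase looks at one key, both steps can in principle be spread across machines without changing the logic.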

What is SQL-MapReduce?

SQL-MapReduce (SQL-MR) is a framework created by Aster Data that lets developers write expressive SQL-MR functions in languages such as Java, C#, Python, C++, and R and push them into the database. Analysts can then invoke these functions using standard SQL through Aster Data's nCluster, the first massively parallel processing (MPP) data warehouse that allows applications to be fully embedded within the database engine, enabling fast, deep analysis of massive data sets.

SQL-MapReduce functions are simple to write and are seamlessly integrated within SQL statements. They rely on SQL queries to manipulate the underlying data and provide input. The functions can procedurally manipulate such input data and provide outputs that can be further consumed by SQL queries or written into tables within the database.
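The flow described above (a SQL query supplies input, a function manipulates it procedurally, and the output goes back to SQL or into a table) can be sketched with Python's built-in sqlite3 module. This is an analogy, not Aster Data's API: in SQL-MR the function runs inside the database engine, whereas here it runs in the client process. The table names, the flag_large_orders function, and the threshold are all hypothetical.

```python
import sqlite3


def flag_large_orders(rows, threshold=100.0):
    """Procedural step: consume rows from a SQL query and emit new rows."""
    for customer, amount in rows:
        if amount >= threshold:
            yield customer, amount, "review"


def run_pipeline():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("ann", 250.0), ("bob", 40.0), ("cy", 100.0)],
    )

    # 1. A SQL query selects and shapes the input.
    rows = conn.execute("SELECT customer, amount FROM orders ORDER BY customer")

    # 2. The function manipulates that input procedurally.
    flagged = list(flag_large_orders(rows))

    # 3. The output is written to a table that later SQL can query.
    conn.execute("CREATE TABLE flagged (customer TEXT, amount REAL, status TEXT)")
    conn.executemany("INSERT INTO flagged VALUES (?, ?, ?)", flagged)
    result = conn.execute(
        "SELECT customer, status FROM flagged ORDER BY customer"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(run_pipeline())
```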

Figure: MapReduce functions seamlessly integrate into SQL queries

Applications

Aster Data's customers use SQL-MapReduce to ask questions of their data that were previously impossible to answer, or that took so long to answer that service level agreements could not be met. In these short tutorials and case studies, you will learn how companies are writing SQL-MapReduce functions for:

  • Fraud Detection – A large online gaming company catches cases of fraud that previous queries could not detect. The company also reduced its fraud analytics cycle time from one week to 15 minutes, with query response time dropping from 90 minutes to 90 seconds.
  • Graph Analysis – A social media company uses the SQL-MapReduce function nPath for graph analysis to understand how its users are connected and enhance the networks of its community.
  • Sharing Behavior – ShareThis uses MapReduce to reduce query times as it analyzes the items that people share online to understand sharing behavior.
  • Sessionization – A social network uses the SQL-MapReduce function "sessionize" to break user data into sessions based on the length of time between activity on the network. With sessionize, the SQL code dropped from more than 1,000 lines to fewer than 100, and performance improved dramatically.
  • Search Behavior – To improve its search results, an online media company uses the SQL-MapReduce function nPath to better understand the paths its users follow after conducting a search.
  • Transformations – Where data transformations previously required multiple complex self-joins, a media company now uses the SQL-MapReduce function nPath to make a single pass over its data, significantly simplifying the code and improving performance.
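
The sessionization idea in the list above, splitting each user's activity into sessions wherever the gap between events is too long, can be sketched in a few lines of Python. This is not Aster Data's sessionize function; the function below, its (user, timestamp) tuple layout, and the timeout parameter are assumptions chosen for illustration.

```python
from itertools import groupby
from operator import itemgetter


def sessionize(events, timeout):
    """Assign a per-user session number to each (user, timestamp) event.

    A new session starts whenever the gap since the same user's previous
    event is greater than `timeout` (in the same units as the timestamps).
    """
    out = []
    # Sort by user, then by timestamp, so each user's events are contiguous.
    ordered = sorted(events, key=itemgetter(0, 1))
    for user, user_events in groupby(ordered, key=itemgetter(0)):
        session_id = 0
        previous = None
        for _, ts in user_events:
            if previous is not None and ts - previous > timeout:
                session_id += 1
            out.append((user, ts, session_id))
            previous = ts
    return out


if __name__ == "__main__":
    clicks = [("a", 0), ("a", 10), ("a", 100), ("b", 5), ("a", 105)]
    for row in sessionize(clicks, timeout=30):
        print(row)
```

The work is naturally partitioned by user and ordered by time, which is the kind of single ordered pass over each partition that is awkward to express with plain SQL self-joins.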

Writing with SQL-MapReduce

In this tutorial series, Peter Pawlowski and Eric Friedman take you through the inner workings of Aster Data's integration of SQL with MapReduce and explain how to write and call a MapReduce function with SQL-MR.
