MapReduce, SQL-MapReduce Resources & Learning

What is MapReduce?
What is SQL-MR?
Applications
Writing with SQL-MR

What is MapReduce?

MapReduce is a programming framework developed by Google to simplify data processing across massive data sets. As people's online activity and digital footprints grow, organizations need to analyze the large volumes of data their customers and audiences generate quickly enough to understand and serve them better. MapReduce helps by splitting the work into map and reduce steps that can run in parallel across many machines.
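To make the map and reduce steps concrete, here is a minimal single-machine sketch in Python of the classic word-count example. It is an illustration only: the function names (map_phase, shuffle, reduce_phase, word_count) are made up for this page, and a real MapReduce system would distribute these steps across many machines.

```python
from collections import defaultdict

def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    """Run map, shuffle, and reduce in sequence on an in-memory list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

# Made-up sample input.
counts = word_count(["big data big insight", "data at scale"])
```

Because each document is mapped independently and each key is reduced independently, both steps can in principle be spread over as many workers as there are documents or keys.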

What is SQL-MR?

SQL-MapReduce (SQL-MR) is a framework created by Aster that lets developers write powerful and highly expressive SQL-MR functions in languages such as Java, C#, Python, C++, and R and push them into the database. Analysts can then invoke these functions using standard SQL through Aster nCluster, a highly scalable relational database for frontline data warehousing.

Aster nCluster In-Database MapReduce functions are simple to write and are seamlessly integrated within SQL statements. They rely on SQL queries to manipulate the underlying data and provide input. The functions can procedurally manipulate such input data and provide outputs that can be further consumed by SQL queries or be written into tables within the database.

MapReduce functions seamlessly integrate into SQL queries
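The flow described above (a query supplies input rows, a function manipulates them procedurally, and its output rows feed further queries) can be modeled in a few lines of plain Python. This is a conceptual sketch only and not the Aster nCluster API: run_partition_function and total_spend are hypothetical names, and the rows are made-up sample data.

```python
from itertools import groupby
from operator import itemgetter

def run_partition_function(rows, partition_key, func):
    """Group input rows by a key and call func once per partition, collecting its output rows."""
    ordered = sorted(rows, key=itemgetter(partition_key))
    output = []
    for key, partition in groupby(ordered, key=itemgetter(partition_key)):
        output.extend(func(key, list(partition)))
    return output

def total_spend(user_id, partition):
    """Hypothetical procedural function: emit one output row per user."""
    yield {"user_id": user_id, "total": sum(row["amount"] for row in partition)}

# Rows as an upstream query might supply them (made-up data).
orders = [
    {"user_id": "a", "amount": 10},
    {"user_id": "b", "amount": 5},
    {"user_id": "a", "amount": 7},
]
totals = run_partition_function(orders, "user_id", total_spend)

# The output rows can feed a further filter, much as a downstream query would consume them.
big_spenders = [row for row in totals if row["total"] > 10]
```

Since each partition is processed independently, a database can in principle run the function on many partitions at once.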

Applications

Aster customers are using In-Database MapReduce to ask questions of their data that were previously impossible to answer, or that returned results too slowly to meet service level agreements. In these webcasts, you will learn how customers are writing and using SQL-MR functions for:

  • Fraud Detection – A large online gaming company catches cases of fraud that previous queries could not detect, and it has reduced its fraud analytics cycle time from one week to 15 minutes.
  • Graph Analysis – A social media company uses the SQL-MR function nPath for graph analysis to understand how its users are connected and enhance the networks of its community.
  • Sharing Behavior – ShareThis uses MapReduce to reduce query times as it analyzes the items that people share online to understand sharing behavior.
  • Sessionization – A social network uses the SQL-MR function sessionize to break user data into sessions based on the length of time between activity on the network. With sessionize, the SQL code dropped from more than 1,000 lines to fewer than 100, and performance improved dramatically.
  • Search Behavior – An online media company uses the SQL-MR function nPath to better understand the patterns its users follow after conducting a search so the company can improve search results.
  • Transformations – Where data transformations previously required multiple complex self-joins, a media company now uses the SQL-MR function nPath to make a single pass over its data, significantly simplifying the code while improving performance.
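The sessionization idea in the list above, splitting a user's activity into sessions wherever the gap between events is too long, can be sketched in Python in a single ordered pass. This is an illustration under stated assumptions, not the prepackaged sessionize function: the tuple layout, the timeout in seconds, and the sample clicks are all made up.

```python
def sessionize(events, timeout):
    """Assign a session number to each (user, timestamp) event.

    A new session starts whenever the gap since the same user's
    previous event exceeds `timeout` seconds.
    """
    last_seen = {}    # user -> timestamp of that user's previous event
    session_ids = {}  # user -> current session number
    result = []
    # Sorting by (user, timestamp) lets one pass see each user's events in order.
    for user, ts in sorted(events):
        if user not in last_seen:
            session_ids[user] = 0
        elif ts - last_seen[user] > timeout:
            session_ids[user] += 1
        last_seen[user] = ts
        result.append((user, ts, session_ids[user]))
    return result

# Made-up click data: (user, seconds since some start time).
clicks = [("u1", 0), ("u1", 20), ("u1", 500), ("u2", 10), ("u1", 530)]
sessions = sessionize(clicks, timeout=60)
```

Expressing the same logic in plain SQL typically needs self-joins or window functions to compare each event with the one before it, which is why a procedural single pass can be so much shorter.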

Writing with SQL-MR

In the webcast series below, Peter Pawlowski and Eric Friedman take you through the inner workings of Aster nCluster's SQL-MapReduce integration and explain how to write and call a MapReduce function with SQL-MR.

  • SQL-MR Session 1: The Basics of SQL and MapReduce Integration – Peter Pawlowski explains the benefits and limitations of SQL and MapReduce for organizations pushing their analytics to the next level.
  • SQL-MR Session 2: nPath – Peter Pawlowski explains how nPath, a SQL-MR function prepackaged in nCluster for the analysis of ordered data, is integrated into a SQL query.
  • SQL-MR Session 3: Writing a SQL-MR Function – Eric Friedman describes the execution model a developer writing a SQL-MR function needs to consider. Friedman then shows how the prepackaged sessionize function was written.

Read more about In-Database MapReduce on our blog.

