MapReduce, SQL-MapReduce® Resources & Learning
What is MapReduce?
MapReduce, or map reduce, is a programming framework developed by Google to simplify data processing across massive data sets. As people rapidly increase their online activity and digital footprint, organizations are finding it vital to quickly analyze the huge amounts of data their customers and audiences generate to better understand and serve them. MapReduce helps by splitting a job into a map step, which processes records in parallel, and a reduce step, which combines the intermediate results for each key.
You can learn more about MapReduce at www.mapreduce.org.
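As a rough illustration of the MapReduce model described above, the sketch below counts words in plain Python on a single machine. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and are not part of any Google, Hadoop, or Aster API; a real framework would run the map and reduce steps in parallel across many nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as a framework would between the two steps."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data big insight", "data at scale"])
print(counts)
```

Because each document is mapped independently and each key is reduced independently, both steps can in principle be spread across machines without changing the result.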
What is SQL-MapReduce?
SQL-MapReduce is a framework created by Teradata Aster to allow developers to write powerful and highly expressive SQL-MapReduce functions in languages such as Java, C#, Python, C++, and R and push them into the discovery platform for advanced in-database analytics. Analysts can then invoke SQL-MapReduce functions using standard SQL through Aster Database, the first discovery platform that allows applications to be fully embedded within the database engine to enable ultra-fast, deep analysis of massive data sets.
SQL-MapReduce functions are simple to write and are seamlessly integrated within SQL statements. They rely on SQL queries to manipulate the underlying data and provide input. The functions can procedurally manipulate such input data and provide outputs that can be further consumed by SQL queries or written into tables within the database.
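The pattern in the paragraph above (a SQL query supplies the input, a procedural function transforms it, and the output goes back into a table) can be sketched with Python's built-in `sqlite3` module. This is only a stand-in to show the data flow: it is not the Aster SQL-MapReduce API, and the table names and the `running_totals` function are hypothetical.

```python
import sqlite3


def running_totals(rows):
    """Procedural step: take (user_id, amount) rows and emit
    (user_id, amount, running_total) rows, one per input row."""
    totals = {}
    for user_id, amount in rows:
        totals[user_id] = totals.get(user_id, 0) + amount
        yield user_id, amount, totals[user_id]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (user_id TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [("a", 5), ("b", 7), ("a", 3)],
)
conn.execute(
    "CREATE TABLE purchase_totals "
    "(user_id TEXT, amount INTEGER, running_total INTEGER)"
)

# 1. A SQL query selects and orders the input rows.
input_rows = conn.execute(
    "SELECT user_id, amount FROM purchases ORDER BY user_id, rowid"
).fetchall()

# 2. The function manipulates the rows procedurally.
# 3. Its output is written into a table that later SQL queries can read.
conn.executemany(
    "INSERT INTO purchase_totals VALUES (?, ?, ?)",
    running_totals(input_rows),
)

result = conn.execute(
    "SELECT user_id, running_total FROM purchase_totals "
    "ORDER BY user_id, running_total"
).fetchall()
print(result)
```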
MapReduce functions seamlessly integrate into SQL queries
Teradata Aster's customers use SQL-MapReduce to ask questions of their data that were previously impossible to answer, or that took so long to answer that service level agreements could not be met. In these short tutorials and case studies, you will learn how companies are writing SQL-MapReduce functions for:
- Fraud Detection – A large online gaming company catches cases of fraud that previous queries could not detect. The company also reduced its fraud analytics cycle time from one week to 15 minutes, with query response time dropping from 90 minutes to 90 seconds.
- Graph Analysis – A social media company uses the SQL-MapReduce function nPath for graph analysis to understand how its users are connected and enhance the networks of its community.
- Sharing Behavior – ShareThis uses MapReduce to reduce query times as it analyzes the items that people share online to understand sharing behavior.
- Sessionization – A social network uses the SQL-MapReduce function "sessionize" to break user data into sessions based on the length of time between activity on the network. With sessionize, the SQL code dropped from more than 1,000 lines to fewer than 100, and performance improved dramatically.
- Search Behavior – An online media company uses the SQL-MapReduce function nPath to better understand the paths its users follow after conducting a search to improve search results.
- Transformations – Where data transformations previously required multiple complex self joins, a media company now uses the SQL-MapReduce function nPath to make a single pass of its data, significantly simplifying the code and improving performance.
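The sessionization idea from the list above, splitting each user's activity into sessions wherever the gap between events exceeds a threshold, can be sketched in plain Python. This is an illustration of the technique only, not the Aster "sessionize" function or its signature; the 30-minute timeout and the sample events are assumptions.

```python
from datetime import datetime, timedelta


def sessionize(events, timeout=timedelta(minutes=30)):
    """Assign a per-user session number to each (user_id, timestamp) event.

    A new session starts whenever the gap since the user's previous
    event exceeds `timeout` (an assumed 30 minutes by default).
    Returns a list of (user_id, timestamp, session_id) tuples.
    """
    last_seen = {}   # user_id -> timestamp of that user's previous event
    session_of = {}  # user_id -> current session number
    out = []
    # Sort by user, then time, so each user's events are seen in order.
    for user, ts in sorted(events, key=lambda e: (e[0], e[1])):
        if user not in last_seen:
            session_of[user] = 0
        elif ts - last_seen[user] > timeout:
            session_of[user] += 1
        last_seen[user] = ts
        out.append((user, ts, session_of[user]))
    return out


events = [
    ("a", datetime(2024, 1, 1, 9, 0)),
    ("b", datetime(2024, 1, 1, 9, 5)),
    ("a", datetime(2024, 1, 1, 9, 10)),
    ("a", datetime(2024, 1, 1, 10, 0)),
]
sessions = sessionize(events)
for row in sessions:
    print(row)
```

A single ordered pass over the events replaces the self joins that a pure SQL formulation would typically need to compare each event with its predecessor.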
The integration of Aster Database and Apache™ Hadoop™ allows businesses to leverage Hadoop for data retention and pre-processing capabilities, while using Aster to perform data transformations, reporting and complex data analytics. Using Aster’s unique integration with Hadoop, business analysts can easily access and analyze large volumes of multi-structured data with the Teradata Aster Discovery Platform. There are currently two ways to integrate Hadoop data into Aster Database:
- Aster-Hadoop Adaptor: Uses Teradata Aster's patented SQL-MapReduce functions for high-speed, two-way data transfer between the Hadoop Distributed File System (HDFS) and Aster's discovery platform.
- Aster SQL-H™: Empowers business analysts to directly analyze vast amounts of Hadoop data without requiring complex MapReduce programming skills or an understanding of how data is stored within the Hadoop Distributed File System (HDFS™). With SQL-H, analysts can use common BI and reporting tools that leverage their business knowledge and SQL skills. They can access data in Hadoop directly and easily join it with data in Aster Database, utilizing the analytical power of SQL-MapReduce® and the Aster Discovery Platform. SQL-H interfaces with the Apache HCatalog project to provide a mechanism for users to directly access the Hadoop data from Aster.
To learn more about how MapReduce can add more value to your business, contact us by phone at 1.888.Aster.Data, or e-mail firstname.lastname@example.org.
Harnessing the Value of Big Data Analytics
The Next Generation of Big Data Analytics
Tasso Argyros, Co-President, Teradata Aster, discusses a seamless way to bridge the Hadoop and SQL gap