Archive for August, 2008

By Tasso Argyros in Blogroll, Database, MapReduce on August 27, 2008

Building on Mayank’s post, let me dig deeper into a few of the most important differences between Aster’s In-Database MapReduce and User Defined Functions (UDFs):

| Feature | User Defined Functions | Aster SQL/MR Functions | What does it mean? |
|---|---|---|---|
| Dynamic polymorphism | No. Requires changing the function's code and static declarations | Yes | SQL/MR functions work just like SQL extensions; no need to change function code |
| Parallelism | Only in some cases, and for a small number of nodes | Yes, across 100s of nodes | Huge performance increases, even for the most complex functions |
| Availability ensured | No. In most cases UDFs run inside the database | Always. Functions run outside the database | Even if functions have bugs, the system remains resilient to failures |
| Data flow control | No. Requires changing the UDF code or writing complex SQL subselects | Yes. PARTITION BY and SEQUENCE BY natively control the flow of data into and out of the SQL/MR functions | Input/output of SQL/MR functions can be redistributed across the database cluster in different ways with no function code change |

In this blog post we’ll focus on Polymorphism – what it is and why it’s so critically important for building real SQL extensions using MapReduce.

Polymorphism allows Aster SQL/MR functions to be coded once (by a person who understands a programming or scripting language) and then used many times through standard SQL by analysts. In this context, comparing Aster SQL/MR functions and UDFs is like comparing SQL with the C language. The former is flexible, declarative and dynamic; the latter requires customization and recompilation even for the slightest change in usage.

For instance, take a SQL/MR function that performs sessionization. Let us assume that we have a table webclicks(userId int, timestampValue timestamp, URL varchar, referrerURL varchar) that contains a record of each user's clicks on our website. The same function, with no additional declarations, can be used in all the following ways:

SELECT sessionId, userId, timestampValue
FROM Sessionize( 'timestampValue', 60 ) ON webclicks;

SELECT  sessionId, userId, timestampValue
FROM Sessionize( 'timestampValue', 60 ) ON
(SELECT userid, timestampValue FROM webclicks WHERE userid = 50 );

[Note how the input changed in the above clause (going from all columns of webclicks to just two of its columns) yet the same function can be used. This is not possible with a plain UDF without writing additional declarations and UDF code.]

SELECT  sessionId, UID, TS
FROM Sessionize( 'ts', 60 ) ON
(SELECT userid as UID, timestampValue as TS FROM webclicks WHERE userid = 50 );

[Note how the names of the arguments changed but the Sessionize() function does the right thing]

In other words, Aster SQL/MR functions are real SQL extensions: once they've been implemented there is zero need to change their code or write additional declarations; there is perfect separation between implementation and usage. This is an extremely powerful concept, since in many cases the people who implement UDFs (engineers) and the people who use them (analysts) have different skills. Requiring a lot of back-and-forth can simply kill the usefulness of UDFs, but not so with SQL/MR functions.

How can we do that? There’s no magic, just technology. Our SQL/MR functions are dynamically polymorphic. To do this, our SQL/MR implementation (the sessionize.o file) includes not only code but also logic to determine its output schema based on its input, which is invoked at every query. This means that there is no need for a static signature as is the case with UDFs!
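To make the idea concrete, here is a minimal sketch in Python. This is purely illustrative: the function name, schema representation, and API are assumptions for this post, not Aster's actual interface or the internals of sessionize.o. The point it shows is that the output schema is *computed* from whatever input schema the query supplies, so no static signature is needed.

```python
# Sketch (not Aster's actual API): a dynamically polymorphic function
# derives its output schema from the input schema at query time,
# instead of declaring a static signature up front.

def sessionize_output_schema(input_schema, ts_column):
    """input_schema: list of (column_name, type) pairs.
    Output: all input columns pass through, plus a new sessionId column."""
    if ts_column not in dict(input_schema):
        raise ValueError(f"timestamp column {ts_column!r} not in input")
    return input_schema + [("sessionId", "integer")]

# The same function adapts to any input the query hands it:
print(sessionize_output_schema(
    [("userId", "int"), ("timestampValue", "timestamp")], "timestampValue"))
print(sessionize_output_schema(
    [("UID", "int"), ("TS", "timestamp")], "TS"))
```

Because this schema logic runs at every query, renaming or projecting away columns in a subselect changes the function's signature automatically, which is exactly the behavior the examples above rely on.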

In-Database MapReduce Flow

Polymorphism also makes it trivial to nest different functions arbitrarily. Consider a simple example with two functions, Sessionize() and FindBots(). FindBots() can filter out input from any users that seem to act as bots, e.g. users whose interactions are implausibly frequent (who could click on 10 links per second? probably not a human). To use these two functions in combination, one would simply write:

SELECT  sessionId, UserId, ts, URL
FROM Sessionize( 'ts', 60 ) ON FindBots( 'userid', 'ts' ) ON webclicks;

Using UDFs instead of SQL/MR functions would mean that this statement would require multiple subselects and special UDF declarations to accommodate the different inputs that come out of the different stages of the query.
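To illustrate what a function like FindBots() might do internally, here is a hedged Python sketch. The function name, the row fields, and the 10-clicks-per-second intuition come from the example above; the row format and rate computation are assumptions for illustration, not Aster's implementation.

```python
from collections import defaultdict

def find_bots(rows, max_rate=10.0):
    """Filter out clicks from users whose sustained click rate exceeds
    max_rate clicks/second (probably not a human). rows is a list of
    dicts with 'userid' and 'ts' (seconds) keys."""
    by_user = defaultdict(list)
    for r in rows:
        by_user[r["userid"]].append(r["ts"])
    bots = set()
    for user, stamps in by_user.items():
        stamps.sort()
        span = stamps[-1] - stamps[0]
        if len(stamps) > 1 and len(stamps) / max(span, 0.1) > max_rate:
            bots.add(user)
    return [r for r in rows if r["userid"] not in bots]

clicks = [{"userid": 1, "ts": t / 20.0} for t in range(100)]      # 20 clicks/sec: bot-like
clicks += [{"userid": 2, "ts": float(t) * 30} for t in range(5)]  # human pace
human_only = find_bots(clicks)
print({r["userid"] for r in human_only})
```

The output of find_bots() is row-shaped just like its input, which is what lets Sessionize() consume it directly in the nested query above without any glue declarations.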

So what is it that we have created? SQL? Or MapReduce? It really doesn’t matter. We just combined the best of both worlds. And it’s unlike anything else the world has seen!

By Mayank Bawa in Analytics, Analytics tech, Blogroll, Database, MapReduce on August 26, 2008

Pardon the tongue-in-cheek analogy to Oldsmobile when describing user-defined functions (UDFs), but I want to draw out some distinctions between this new class of functions that In-Database MapReduce enables and traditional UDFs.

Not Your Granddaddy's Oldsmobile

While similar on the surface, in practice there are stark differences between Aster In-Database MapReduce and traditional UDFs.

MapReduce is a framework that parallelizes procedural programs, relieving developers of traditional cluster programming. UDFs are simple database functions; while there are some syntactic similarities, that is where the similarity ends. Several major differences between In-Database MapReduce and traditional UDFs include:

Performance: UDFs have limited or no parallelization capabilities in traditional databases (even MPP ones). Even where UDFs are executed in parallel in an MPP database, they're limited to accessing local node data, have byzantine memory-management requirements, and require multiple passes and costly materialization. In contrast, In-Database MapReduce automatically executes SQL/MR functions in parallel across potentially hundreds or even thousands of server nodes in a cluster, all in a single-pass (pipelined) fashion.

Flexibility: UDFs are not polymorphic. Some variation in input/output schema may be allowed by capabilities like function overloading or permissive data-type handling, but that tends to greatly increase the burden on the programmer to write compliant code. In contrast, In-Database MapReduce SQL/MR functions are evaluated at run-time to offer dynamic type inference, an attribute of polymorphism that offers tremendous adaptive flexibility previously found only in mid-tier object-oriented programming.

Manageability: UDFs are generally not sandboxed in production deployments. Most UDFs are executed in-process by the core database engine, which means bad UDF code can crash a database. SQL/MR functions execute in their own process for full fault isolation: bad SQL/MR code results in an aborted query, leaving other jobs uncompromised. A strong process-management framework also ensures proper resource management for consistent performance and progress visibility.
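The fault-isolation point can be illustrated with a small Python sketch. This shows the general out-of-process pattern, not Aster's actual process framework: the "function" runs in a child process, so a crash aborts only that one query while the host keeps running.

```python
import subprocess
import sys

# Illustrative sketch of out-of-process execution: a crashing function
# takes down only its own child process, never the host engine.

def run_function(code):
    """Run function code in a child process; report failure to the
    caller instead of letting the failure propagate to the host."""
    proc = subprocess.run([sys.executable, "-c", code],
                          capture_output=True, text=True, timeout=10)
    if proc.returncode != 0:
        return ("query aborted", proc.returncode)
    return ("ok", proc.stdout.strip())

print(run_function("print(6 * 7)"))           # well-behaved function
print(run_function("import os; os.abort()"))  # buggy function; host survives
```

An in-process UDF has no such boundary: the same os.abort() inside the engine's own process would bring down the whole database, which is exactly the risk described above.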

By Mayank Bawa in Analytics, Blogroll, Business analytics, MapReduce on August 25, 2008

I’m unbelievably excited about our new In-Database MapReduce feature!

Google has used MapReduce and GFS for PageRank analysis, but the sky is really the limit for anyone to build powerful analytic apps. Curt Monash has posted an excellent compendium of applications that are successfully leveraging the MapReduce paradigm today.

A few examples of SQL/MapReduce functions that we’ve collaborated with our customers on so far:

1. Path Sequencing: SQL/MR functions can be used to develop regular-expression matching of complex path sequences (e.g., time-series financial analysis or clickstream behavioral recommendations). They can also be extended to discover Golden Paths that reveal interesting behavioral patterns useful for segmentation, issue resolution, and risk optimization.

2. Graph Analysis: many interesting graph problems, like BFS (breadth-first search), SSSP (single-source shortest path), APSP (all-pairs shortest paths), and PageRank, depend on graph traversal and can be expressed as SQL/MR functions.

3. Machine Learning: several statistical algorithms, like linear regression, clustering, collaborative filtering, naive Bayes, support vector machines, and neural networks, can be used to solve hard problems like pattern recognition, recommendations/market-basket analysis, and classification/segmentation.

4. Data Transformations and Preparation: large-scale transformations can be parameterized as SQL/MR functions for data cleansing and standardization, unleashing the true potential of Extract-Load-Transform pipelines and making large-scale data model normalization feasible. Push-down also enables rapid discovery and data pre-processing to create analytical data sets used by advanced analytics tools such as SAS and SPSS.

These are just a few simple examples Aster has developed for our customers and partners via Aster’s In-Database MapReduce to help them with rich analysis and transformations of large data.

I'd like to finish with a code snippet example of a simple, yet powerful SQL/MR function we've developed called “Sessionization.”

Our Internet customers have conveyed that defining a user session can't be done easily (if at all) using standard SQL. One possibility is to use cookies, but users frequently remove them or they expire.

Aster In-Database MapReduce

Aster developed a simple “Sessionization” SQL/MR function via our standard Java API library to easily parameterize the discovery of a user session. A session is defined by a timeout value (e.g., in seconds): if the elapsed time between consecutive click events is greater than the timeout, a new session has begun for that user.

From a user's perspective, the input is user clicks (e.g., timestamp, userid), and the output associates each click with a unique session identifier based on the Java procedure noted above. Here's the simple syntax:

SELECT timestamp, userid, sessionid
FROM sessionize( 'timestamp', 600 ) ON clickstream
SEQUENCE BY timestamp;

Indeed, it is that simple.

So simple that we have reduced a complex multi-hour Extract-Load-Transform task to a toy example. That is the power of In-Database MapReduce!
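For readers curious what the function itself has to do, here is a sketch of the sessionization algorithm described above. Python is used purely for illustration (the real function is written against Aster's Java API), and the tuple-based row format is an assumption of this sketch.

```python
# Sketch of the sessionization logic: a new session starts whenever the
# gap between a user's consecutive clicks exceeds the timeout.

def sessionize(clicks, timeout=600):
    """clicks: list of (ts, userid) pairs, ts in seconds.
    Returns (ts, userid, sessionid) with a per-user session counter."""
    out, last_ts, session = [], {}, {}
    for ts, user in sorted(clicks):
        if user not in last_ts or ts - last_ts[user] > timeout:
            session[user] = session.get(user, -1) + 1  # open a new session
        last_ts[user] = ts
        out.append((ts, user, session[user]))
    return out

clicks = [(0, "u1"), (100, "u1"), (1000, "u1"), (50, "u2")]
print(sessionize(clicks))
# u1's click at t=1000 arrives more than 600s after its click at t=100,
# so it opens a new session for u1
```

Note the sequential dependence on the previous click per user: that is precisely what makes this hard to express in plain set-oriented SQL, and trivial once SEQUENCE BY delivers each user's clicks to the function in timestamp order.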

By Mayank Bawa in Blogroll, Database, MapReduce on August 25, 2008

I am very pleased to announce today that Aster nCluster now brings together the expressive power of a MapReduce framework with the strengths of a Relational Database!

Jeff Dean and Sanjay Ghemawat at Google invented the MapReduce framework in 2004 for processing large volumes of unstructured data on clusters of commodity nodes. Jeff and Sanjay's goal was to provide a trivially parallelizable framework so that even novice developers (a.k.a. interns) could write programs in a variety of languages (Java/C/C++/Perl/Python) to analyze data independent of scale. And they have certainly succeeded.

Once implemented, the same MapReduce framework has been used successfully within Google (and outside, via Yahoo! sponsored Apache’s Hadoop) to analyze structured data as well.

In mapping our product trajectory, we realized early on that the intersection of MapReduce and Relational Databases for structured data analysis has a powerful consonance. Let me explain.

Relational Databases present SQL as an interface to manipulate data using a declarative interface rooted in Relational Algebra. Users can express their intent via set manipulations and the database runs off to magically optimize and efficiently execute the SQL request.

Such an abstraction is sunny and bright in the academic world of databases. However, any real-world practitioner of databases knows the limits of SQL and those of its Relational Database implementations: (a) a lack of expressive power in SQL (consider doing a Sessionization query in SQL!), and (b) a cost-based optimizer that often has a mind of its own and refuses to perform the right operations.

Making an elephant dance!

A final limitation of SQL is completely non-technical: most developers struggle with the nuances of making a database dance well to their directions. Indeed, a SQL maestro is required to perform interesting queries for data transformations (during ETL or Extract-Load-Transform processing) or data mining (during analytics).

These problems become worse at scale, where even minor weaknesses result in longer run-times. Most developers (the collective us), on the other hand, are much more familiar with programming in Java/C/C++/Perl/Python than in SQL.

MapReduce presents a simple interface for manipulating data: a map and a reduce function written in the language of choice (Java/C/C++/Perl/Python) of a developer. Its real power lies in the Expressivity it brings: it makes the phrasing of really interesting transformations and analytics breathtakingly easy. The fact that MapReduce, in its use of Map and Reduce functions is a “specific implementation of well known techniques developed nearly 25 years ago” is its beauty: every programmer understands it and knows how to leverage it.
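The interface really is that small. As a sketch, here is the entire programming model in a few lines of Python, showing only the grouping logic; a real framework layers parallelism, distribution, and fault tolerance under the same two-function interface.

```python
from collections import defaultdict
from itertools import chain

# Minimal sketch of the map/reduce model: the developer supplies just a
# mapper and a reducer; the framework handles grouping by key (and, in
# a real system, parallel execution across nodes).

def map_reduce(records, mapper, reducer):
    groups = defaultdict(list)
    for key, value in chain.from_iterable(mapper(r) for r in records):
        groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}

# Example: count clicks per user.
clicks = [("u1", "/home"), ("u2", "/buy"), ("u1", "/buy")]
counts = map_reduce(clicks,
                    mapper=lambda rec: [(rec[0], 1)],
                    reducer=lambda key, vals: sum(vals))
print(counts)
```

Every programmer can hold this whole model in their head at once, which is exactly the point being made above.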

As a computer scientist, I am thrilled at the simple elegant interface that we’ve enabled with SQL/MR. If our early beta trials with customers are any indication, databases have just taken a major step forward!

You can program a database too!

You can now write against the database in a language of your choice and invoke these functions from within SQL to answer critical business questions. Data analysts will feel liberated to have simple, powerful tools to compete effectively on analytics. More importantly, analysts now have simplicity, working within the environs of the simple SQL that we all love.

The Aster nCluster will orchestrate resources transparently to ensure that tasks make progress and do not interfere with other concurrent queries and loads in the database.

Aster: Do More!

We proudly present our SQL/MapReduce framework in Aster nCluster as the most powerful analytical database. Seamlessly integrating MapReduce with ANSI SQL provides a quantum leap that will empower analysts and ultimately unleash the power of data for the masses.

That is our prediction. And we are working to make it happen!


Is anyone out there attending the TDWI World Conference in San Diego this week? If so, and you would like to meet up, please drop me a line or comment below, as I will be in attendance. I'm of course very excited to be making the trip to sunny San Diego and hope to catch a glimpse of Ron Burgundy and the Channel 4 news team! :-)

But of course it’s not all fun and games, as I’ll participate in one of TDWI’s famous Tool Talk evening sessions discussing data warehouse appliances. This should make for some great dialogue between me and other database appliance players, especially given the recent attention our industry has seen. I think Aster has a really different approach to analyzing big data and look forward to discussing exactly why.

For those interested in the talk, here are the details. Come on by and let's chat!
What: TDWI Tool Talk session on data warehouse appliances
When: Wednesday, August 20, 2008 @ 6:00 p.m.
Where: Manchester Grand Hyatt, San Diego, CA

By Tasso Argyros in Analytics, Blogroll, Business analytics, Statements on August 17, 2008

When Polo lets you use your mobile phone to buy a pair of pants, you know there’s something interesting going on.

The trend is inevitable: purchasing becomes easier and more frictionless. You could buy something at the store or from your home. But now you can buy stuff while you jog in the park, while you bike (it’s not illegal yet), or even while you’re reading a distressing email on your iPhone (shopping therapy at its best.)

As purchasing gets easier and more pervasive, we'll tend to buy things in smaller quantities and more often. That means more consumer-behavior data will be available for advertisers and retailers to analyze, so they can better target promotions to the right people at the right time.

In this new age, where buyers' interactions with shops and brands are much more frequent and intimate, enterprises that use their data to understand their customers will have a huge advantage over their competition. That's one of the reasons we at Aster are so excited to be building the tools for tomorrow's winners.

By Tasso Argyros in Administration, Availability, Blogroll, Manageability, Scalability on August 12, 2008

- John: “What was wrong with the server that crashed last week?”

- Chris: “I don’t know. I rebooted it and it’s just fine. Perhaps the software crashed!”

I'm sure anyone who has been in operations has had the above dialog, sometimes quite frequently! In computer science such a failure is called “transient” because it affects a piece of the system only for a limited amount of time. People who have been running large-scale systems for a long time will attest that transient failures are extremely common and can lead to system unavailability if not handled right.

In this post I want to explore why transient failures are an important threat to availability and how a distributed database can handle them.

To see why transient failures are frequent and unavoidable, let’s consider what can cause them. Here’s an easy (albeit non-intuitive) reason:  software bugs.  All production-quality software still has bugs; most of the bugs that escape testing are difficult to track down and resolve, and they take the form of Heisenbugs, race conditions, resource leaks, and environment-dependent bugs, both in the OS and the applications. Some of these bugs will cause a server to crash unexpectedly.  A simple reboot will fix the issue, but in the meantime the server will not be available.  Configuration errors are another common cause.  Somebody inserts the wrong parameters into a network switch console and as a result a few servers suddenly go offline. And, sometimes, the cause of the failure just remains unidentified because it can be hard to reproduce and thus examine more thoroughly.

I submit to you that it is much harder to prevent transient failures than permanent ones. Permanent failures are predictable, and are often caused by hardware failures. We can build software or hardware to work around permanent failures. For example, one can build a RAID scheme to prevent a server from going down if a disk fails, but no RAID level can prevent a memory leak in the OS kernel from causing a crash!

What does this mean? Since transient failures are unpredictable and harder to prevent, MTTF (mean time to failure) for transient failures is hard to increase.

Clearly, a smaller MTTF means more frequent outages and larger downtimes. But if MTTF is so hard to increase for transient failures, what can we do to always keep the system running?

The answer is that instead of increasing MTTF we can reduce MTTR (mean time to recover). Mathematically this concept is expressed by the formula:

Availability = MTTF/(MTTF+MTTR)

It is obvious that as MTTR approaches zero, Availability approaches 1 (i.e., 100%). In other words, if failure recovery is very fast (instantaneous, in the extreme case), then even if failures happen frequently, overall system availability will remain very high. This approach to availability, called Recovery-Oriented Computing, was developed jointly by Berkeley and Stanford researchers, including my co-founder George Candea.
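Plugging numbers into the formula makes the effect concrete (a quick Python sketch; the hour figures are purely illustrative):

```python
# Availability = MTTF / (MTTF + MTTR): for a fixed MTTF, shrinking MTTR
# drives availability toward 1 even when failures remain frequent.

def availability(mttf, mttr):
    return mttf / (mttf + mttr)

mttf_hours = 100.0  # transient failures are frequent and hard to prevent
for mttr_hours in (10.0, 1.0, 0.01):
    print(f"MTTR={mttr_hours:>5} h -> "
          f"availability={availability(mttf_hours, mttr_hours):.4f}")
```

With MTTF held at 100 hours, cutting MTTR from 10 hours to 36 seconds moves availability from roughly 91% to better than 99.99%, which is why the design effort below focuses on recovery time rather than failure prevention.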

Applying this concept to a massively parallel distributed database yields interesting design implications. As an example, let’s consider the case where a server fails temporarily due to an OS crash in a 100-server distributed database. Such an event means that the system has fewer resources to work with: in our example after the failure we have a 1% reduction of available resources. A reliable system will need to:

(a) Be available while the failure lasts and

(b) Recover to the initial state as soon as possible after the failed server is restored.

Thus, recovering from this failure needs to be a two-step process:

(a) Keep the system available with a small performance/capacity hit while the failure is ongoing (availability recovery)

(b) Upgrade the system to its initial levels of performance and capacity as soon as the transient failure is resolved (resource recovery)

Minimizing MTTR means minimizing the sum of the time it takes to do (a) and (b), ta + tb. Keeping ta very low requires having replicas of data spread across the cluster; this, coupled with fast failure detection and fast activation of the appropriate replicas, will ensure that ta remains as low as possible.

Minimizing tb requires seamless re-incorporation of the transiently failed nodes into the system. Since in a distributed database each node has a lot of state, and the network is the biggest bottleneck, the system must be able to reuse as much of the state that pre-existed on the failed nodes as possible to reduce the recovery time. In other words, if most of the data that was on the node before the failure is still valid (a very likely case) then it needs to be identified, validated and reused during re-incorporation.

Any system that lacks the capacity to keep either ta or tb low does not provide good tolerance to transient failures.

And because a bigger system will always see more transient failures, any architecture that cannot handle failures correctly is, simply, not scalable. Any attempt to scale it up will likely result in outages and performance problems. A system designed with a Recovery-Oriented architecture, such as the Aster nCluster database, ensures that transient failures are tolerated with minimal disruption, making true scalability possible.

By Mayank Bawa in Analytics, Blogroll, Business analytics, Business intelligence, Database on August 5, 2008

Today we are pleased to welcome Pentaho as a partner to Aster Data Systems. What this means is that our customers can now use Pentaho open-source BI products for reporting and analysis on top of Aster nCluster.

We have been working with Pentaho for some time on testing the integration between their BI products and our analytic database. We’ve been impressed with Pentaho’s technical team and the capabilities of the product they’ve built together with the open source community. Pentaho recently announced a new iPhone application which is darn cool!

I guess, by induction, Aster results can be seen on the iPhone too. :-)