ACT III: On the road to nPath

By Michael Riordan, Teradata Aster

The foundations of the Aster platform are a parallel database with all the features you'd expect from a SQL platform.  However the power of the Aster platform really shines once you start using its library of SQL-MR analytic components.  With these components, complex data analysis that is difficult to code in SQL becomes much, much easier.  A clear example of this can be seen with Aster's nPath module.

Consider the example dataset, bank_web_clicks, that we loaded in our prior tutorial, Using Aster Express: Act 2. This fairly simple data set contains web log event data of users' web page visits to a fictional bank web site.  The table fields in this data are:


For those unfamiliar with web logs, 'sessionizing' refers to grouping a customer’s web clicks into distinct viewing sessions based upon the time of the page view.  Page visits that are close together in time are considered a session.  When there is a sizeable break in time, for example a break greater than an hour, a new session is created.  Think of your own browsing habits.  You may visit a site regularly during the day, but your morning visit is considered a separate session from your afternoon or evening visits.

In our data table, bank_web_clicks, the web log records have already been parsed, and the sessions have already been defined with the field ‘session_id’.  For example, let's look at the data for a single customer with customer_id = 32.



From the results of this SQL query we can see that customer 32 has visited our site during 8 individual sessions (only 5 of the sessions appear in this screen shot). 

A query like this is a simple metric that we can use to capture information about our customers; for example,  to see who are the most frequent visitors to our website.

(note the LIMIT 10 syntax that keeps the output manageable - there are a million rows in this table!):



OK, so far these are pretty standard SQL queries.  Yes, there is a lot of customer viewing behavior that we can derive from our web logs using these types of SQL queries.  But for more complex analytics, its time to head toward Aster’s powerful analytics libraries and dive into SQL-MapReduce using Aster’s nPath module.

nPath – MapReduce Analytics using SQL Queries

With nPath we can start asking more complex questions like "what are the common paths that our customers take on our web site", or even focus it more to key web transactions like "what path did the customers take that lead them to making a purchase on-line".  These queries need to find patterns or relationships between the rows in the table.  These types of queries, especially Time Series types of queries, become difficult with standard SQL.  Relational databases are great tools when the data model maps the relationships between objects or tables.  But when the relationships are between rows in the same table, well, that's another story.

Let's take a look at our bank_web_clicks table for an example.  So how do we find the customer's path through the web site?  For starters, we'd need to order the page visits by time.  We can do that with the page visit timestamp field in our table (which is what was used to create the session groupings).  But what would that SQL look like?  It would certianly have quite a few 'MAX(datestamp) < datestamp' iterations for each level deep that we wanted to query.  Getting the 2nd page wouldn't be too bad, but the 3rd in the series, or the 4th or 5th - now that would be some complex SQL query!

Now with Aster's nPath module, it is much more straight-forward and simple query to write.  Let's build one here so that we can take a closer look.

First, we start with a SELECT statement that calls nPath.  The parameters that we'll pass to nPath tell it which table to use to pull data, as well as how to partition that data and which patterns we want to analyze.

We’ll start again by looking at the paths taken by our favorite customer (customer_ID=32). With nPath, we are analyzing the data to find common paths.  If we know the end point, its easy to understand the query by thinking of looking backward at the steps that led up to that end point.  In this example, as in many nPath examples, we'll call the end point our 'conversion event', which in this case is the BILL MANAGER ENROLLMENT page. 

While the nPath SQL below does have a little bit of a learning curve, the first thing that should be easily recognized is that this does look like a 'standard' SQL statement.  Take a look at the nPath query below and at the inputs that we are passing to it.  You can see that we are passing the table bank_web_clicks as the input records.  We also are definining an alias named 'CONVERSION' as our endpoint, the 'BILL MANAGER ENROLLMENT' page.  All other pages are aliased as 'PAGE'.  Another key syntax to take a closer look at is the 'PATTERN' statement.  Here we are telling nPath that we are looking for all pages ('PAGE+') that lead to our CONVERSION step; PATTERN('PAGE+.CONVERSION').



In these results we see that customer_32 did enroll in the Bill Manager program during his 4th web session.  And the path that he/she took was first the Account Summary page, then the Bill Manager Form, and finally the Bill Manager Enrollment page.  Cool!

Now what about the other customers?  Well, its "just SQL", go ahead and remove the WHERE clause with customer_id = 32 and rerun the query.  That will show all customers who have joined the Bill Manager program, along with the paths that they took through the web site that lead to that action/conversion.

Or what if we wanted to know the most common paths that customer were taking through our web site?  This might be good information to help us design the site better to encourage customer behaviour.  Let's make another small change to our "just SQL" query:



With the addition to our SQL query of a simple DISTINCT and count/GROUP BY, we have some great info on how customers are using the web site.  Now this Aster-nPath is starting to look like a pretty powerful tool.
Hopefully that gives you a good peek into the power of Aster's SQL-MR library, especially the nPath module.  More examples will be coming soon.

In the meanwhile, have fun experimenting with these examples.  On the next tutorial we've also published a guide to the Aster nPath syntax that should be very helpful as you start building your own queries against your data, Aster nPath Guide.

Have fun!