Act II: Loading Data
By Michael Riordan, Teradata Aster
In this second part of our series on using Aster Express, we'll continue using the Aster ACT query tool and introduce a new tool for bulk loading data, Aster's ncluster_loader. (See the Using Aster Express: Act 1 tutorial for part 1 of this series.)
With the Aster Express images for VMware Player, we've also included some sample data sets. These data sets will allow us to create analytic tutorials that showcase the power and flexibility of the Teradata Aster Discovery Platform. They have been zipped and placed in a "demo" directory on the Queen node. As we saw in the Using Aster Express: Act 1 tutorial, the Aster client tools are also included on the Queen. ACT is the tool we've been using for submitting SQL queries to our Aster cluster. Another very useful tool is "ncluster_loader", Aster's tool for bulk loading data.
Let's start by logging into our Queen node. For simplicity with our virtual images, we'll log in directly to the Queen instance. As we showed previously, it is also good practice to use a remote SSH utility; both approaches are equivalent for our purposes. The login/password for Aster Express is "aster/aster". Once we're on the Queen, we open a terminal window by double-clicking the GNOME Terminal icon, and stretch it a bit to give ourselves some elbow room. Start the ACT query tool by typing "act" on the command line and enter the password "beehive" when prompted.
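From the terminal, the steps above boil down to a single command (the password prompt text may differ slightly between ACT versions):

```shell
# Start the ACT query tool from the Queen's command line.
# Enter "beehive" when prompted for the database password.
act
```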
Creating a new Table
We are going to create a new table for our first dataset. This dataset contains just over 1 million records that simulate customer visits to a bank web site. Here is the SQL CREATE TABLE statement that we'll use:
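The statement itself appears in the screenshot; a representative version is sketched below. The table name matches the "bank_web_data.zip" file we'll unzip shortly, and customer_id is the distribution key discussed next, but the other column names and types are illustrative assumptions, not the article's exact schema:

```sql
-- Illustrative landing table for the bank web-visit dataset.
-- Only customer_id (the distribution key) is confirmed by the article;
-- the remaining columns are assumptions for the sake of the example.
CREATE TABLE bank_web_data (
    customer_id INTEGER,        -- distribution key
    session_id  INTEGER,        -- illustrative
    page        VARCHAR(100),   -- web page visited (illustrative)
    datestamp   TIMESTAMP       -- time of visit (illustrative)
)
DISTRIBUTE BY HASH (customer_id);
```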
Type or copy/paste this CREATE TABLE statement into ACT. (See the screenshot below.) This gives us a landing table for loading our data. Notice again the 'DISTRIBUTE BY HASH' syntax. This tells Aster how the data should be distributed across the Worker nodes in the cluster. We're choosing to distribute based on the customer_id. We'll cover this in more detail once we get to the analytic tutorials.
Let's exit ACT and find our dataset. Use the ACT quit command, "\q", to return to the shell command prompt.
The zipped datasets are in a folder named "demo" under the aster home directory. Let's first unzip the "bank_web_data.zip" file.
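Assuming the demo folder sits directly under the aster user's home directory, the shell commands look like this:

```shell
# Move into the demo directory under the aster home directory
# (path assumed from the article's description) and unzip the dataset.
cd ~/demo
unzip bank_web_data.zip
```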
We're now ready to use the Aster load tool, ncluster_loader.
For help with the syntax of this tool, run it with the "--help" parameter.
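That is simply:

```shell
# Print ncluster_loader's usage summary and available options.
ncluster_loader --help
```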
In our example, we'll specify all the required parameters to connect to our Aster cluster, plus our load table and the data file. We'll also tell the loader to skip the first row, as that contains the field names, plus we'll add the 'verbose' flag so that we can see all the processing details.
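The invocation looks roughly like the following sketch. The flag spellings, the table name (assumed here to match the dataset, bank_web_data), the unzipped file name, and the Queen's address are all assumptions; confirm the exact options against your version's --help output:

```shell
# Illustrative ncluster_loader invocation (verify flags with --help):
#   -h  Queen hostname/IP    -U  database user    -w  password
#   -d  target database      --skip-rows 1        skip the header row
#   --verbose                print processing details
ncluster_loader -h 192.168.100.100 -U beehive -w beehive -d beehive \
    --skip-rows 1 --verbose bank_web_data bank_web_data.txt
```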
You'll see the processing details scroll on your screen: connecting to Aster, reading file formats, and finally the output, showing just over 1 million records loaded in only a few seconds!
That's it. We now have some sample data to play with. Here's an example to show the distinct web pages visited by customers:
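The query in question might look like this; the table and column names are assumptions based on the dataset's description, not the article's exact text:

```sql
-- Distinct web pages visited by customers
-- (table and column names are illustrative assumptions).
SELECT DISTINCT page
FROM bank_web_data
ORDER BY page;
```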
Also, if you want to load and play with the other 2 data sets in the demo directory, here are the CREATE TABLE statements, along with the ncluster_loader commands:
So there you go: with these 3 data sets, you have 25 million rows of sample data to play with. In our next tutorial, Aster Act 3, we'll show some powerful Aster analytics that you can apply to these tables to yield new insights from your Big Data.