Blog: Winning with Data
Posted on June 29th, 2009 by Mayank Bawa

We are announcing the availability of an Enterprise-Ready MapReduce Data Warehouse Appliance.

The appliance is powered by Dell hardware and Aster’s nCluster SQL/MR database, with an optional BI platform from MicroStrategy and optional data modeling software from AquaFold (Aqua Data Studio).

Our product portfolio now allows our customers to get the benefits of our flagship Aster nCluster SQL/MR database in the packaging that they are most comfortable with - on-premise software, in-cloud service, or pre-packaged appliance.

The appliance offering packs a lot of punch compared to other data warehousing appliances in the market - it has the highest ratio of compute & memory to data sizes, allowing you to run rich queries on the appliance without breaking a sweat.

We are especially proud of the open nature of our appliance - the hardware is from Dell, built from industry-standard components; the BI server is from MicroStrategy; and the data modeling tool is from AquaFold (Aqua Data Studio). The appliance brings together industry-leading components of a full data warehouse stack - all pre-tested and configured for optimal performance.

Even the programming of our appliance is open - our SQL/MR framework allows applications to push computation into the appliance using industry-standard SQL augmented with MapReduce in the language of your choice (Java, C#, Perl, Python, etc.).

We have been approached by a number of customers seeking a get-started-quickly system, especially those groups of users and departments seeking a Hadoop framework to build their solutions upon.

In response to these requests, we are proud to announce an Express Edition of the appliance that is designed to work for up to 1TB of user data. And it comes at an even more attractive price - only $50K - complete with hardware and software!

Give us a call - we’ll get your warehouse set up on our appliance to ensure that the time-to-first-query is measured in hours, not months!

Posted on June 19th, 2009 by rpai

I am excited to be announcing Aster’s Global Partner Program, which will be singularly focused on empowering our software and service provider partners to grow robust, profitable businesses solving rich, big data analytics challenges for end customers.

Aster is leading a revolution in frontline data warehousing and analytic solutions and has shown success with several marquee customers. Earlier this year we launched our channel efforts, and as I look ahead, channel partners will play a critical role in Aster’s strategy and success. Through industry domain expertise and specialized data management knowledge and experience, our partners extend Aster’s offerings, helping our customers maximize their investment and benefit from innovative “Aster-powered” solutions.

In this blog, I want to focus on outlining the differentiating value Aster brings to partner service providers and independent software (application) vendors.

In a recent analyst briefing, after presenting Aster’s company and product differentiators, I was asked why a service provider (system integrator) gets excited about working with Aster.

I responded with my top 3 reasons:

- Gain access to new “big-data” (e.g. risk management, customer targeting, churn analysis, customer behavior analysis) projects at enterprises across vertical domains

- Deliver real economic benefits to enterprises by changing the business of enterprise data warehousing (challenging current norms of scale, performance and price)

- Build a high-margin, competitive, domain-specific services practice working with the world-class Aster product team

As we continue to push the envelope of technical innovation with our application-friendly relational database, we are witnessing a surge of interest from application software vendors (and developers) who realize analytics and big data management cannot be an afterthought.

Savvy application developers realize storing and analyzing structured and semi-structured user and usage data are critical to success. Being able to plan and provision a robust, proven internet-scale database for current and growing data needs is now a necessity.

With the rapid consolidation in the enterprise application market (thank you, Oracle) and constant pressure to harvest the economic value of business data, we notice a shift in application development:

- Application developers and vendors want a data platform that scales elastically with the business (and is not held captive by proprietary hardware vendor lock-in) while being flexible enough to be deployed on-premise or in the cloud

- They also want a data platform that seamlessly combines the power of relational (SQL) with modern frameworks for big data processing (MapReduce), at a lower overall total cost of ownership, so developers can focus on applications (and not on the manageability, scalability, and reliability of data)

For more details, check out http://www.asterdata.com/partners/index.php. To join as a partner, apply at http://www.asterdata.com/partners/application.php

Posted on June 9th, 2009 by Peter Pawlowski

The Aster SQL/MapReduce framework allows developers to push analytics code for applications closer to the data in the database, without dealing with the headaches of extracting and analyzing data outside of the database. We’ve supported a variety of languages from day one, including Java, Python, and Perl. Today we’re pleased to announce official support for the .NET family of languages via Mono, an excellent open source .NET implementation. This will allow developers who use .NET languages like C# and VB (and, of course, F#) to more easily leverage nCluster for massively parallel analytics.

Our .NET support is enabled through our Stream SQL/MR function, which allows users to process data via a simple streaming interface: provide a program that reads rows from the console (stdin) and writes rows back to the console (stdout). Let’s consider a simple C# program called Tokenize, which splits incoming rows into tokens and then outputs each token (one per line):

[Image: net-blog-post-code.jpg (C# source listing for Tokenize)]
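The original C# listing survives here only as an image reference, so its exact contents are not reproduced. As a rough illustration of the same stdin/stdout contract, here is a minimal sketch in Java rather than C#. The class name, the whitespace-based tokenization rule, and the helper methods are assumptions made for this sketch, not the code from the original post.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

class Tokenize {
    // Splits one input row on runs of whitespace; empty tokens are dropped.
    // (Assumption: the original program may have used a different rule.)
    static List<String> tokenize(String row) {
        List<String> tokens = new ArrayList<>();
        for (String token : row.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Reads rows from `in` and writes one token per line to `out`.
    static void run(Reader in, PrintStream out) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        String row;
        while ((row = reader.readLine()) != null) {
            for (String token : tokenize(row)) {
                out.println(token);
            }
        }
    }

    // Streaming entry point: rows arrive on stdin, tokens leave on stdout.
    public static void main(String[] args) throws IOException {
        run(new InputStreamReader(System.in), System.out);
    }
}
```

Run locally, piping a line such as "big data analytics" into the program prints three lines, one token each. The program knows nothing about the database; it only reads and writes text streams.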

To run this program over data stored in nCluster, a developer just needs to compile the above Tokenize.cs into Tokenize.exe with a C# compiler (in our case, the Mono C# compiler gmcs). With the compiled executable in hand, one command in our terminal client will install it in nCluster. The program can then be invoked from SQL. The example below will run the program over all the rows in the documents table, outputting a table with a single column (token). Each row in the result of the query will correspond to a single token in the input documents.

[Image: net-blog-post-code_2.jpg (SQL query invoking Tokenize over the documents table)]

It’s as simple as that. We hope our new .NET support will enable an ever-broader group of developers to take advantage of SQL/MR, our in-database analytics technology! If you’re interested in learning more, please check out a host of new resources around our implementation of MapReduce within Aster nCluster, including example applications and code.

Posted on June 5th, 2009 by Mayank Bawa

 Rajeev was a close friend and a cherished mentor. We were saddened to hear the news today and we will miss him dearly. Our thoughts are with his family.

Posted on May 1st, 2009 by Mayank Bawa

Data poetry by Mason Hale. Awesome!

A big congratulations to our CTO and Co-Founder, Tasso Argyros, who has been recognized as one of BusinessWeek’s Best Young Tech Entrepreneurs for 2009. I’d have given him a run for his spot, but I am over-the-hill and probably too old to run the distance - I wish they’d start a list for Best Entrepreneurs under the age of 40 :-)

Tasso’s hard work, dedication, confidence and vision have been a huge part of our success to date, and we know they will be a big part of great things ahead for Aster. Congratulations to you, and to all the other great companies that made the list as well; it’s an honor for them to be recognized alongside you.

Posted on April 22nd, 2009 by Peter Pawlowski

Aster’s SQL/MR framework (In-Database MapReduce) enables our users to write custom analytic functions (SQL/MR functions) in a programming language like Java or Python, install them in the cluster, and then invoke them from SQL to analyze data stored in nCluster database tables. These SQL/MR functions transform one table into another, but do so in a massively parallel way. As increasingly valuable analytic functions are pushed into the database, the value of constructing a data structure once, and reusing it across a large number of rows, increases substantially. Our API was designed with this in mind.

What does the SQL/MR API look like? The SQL/MR function is given an iterator to a set of input rows, as well as an emitter for outputting rows. We decided on this interface for a number of reasons, with one of the most important being the ability to maintain state between rows. We’ve found that many useful analytic functions need to construct some state before processing a row of input, and this state construction should be amortized over as many rows as possible.

Here’s a wireframe of one type of SQL/MR function (a RowFunction):

class RealAsterFunction implements RowFunction
{
  void operateOnSomeRows(RowIterator iterator, RowEmitter outputEmitter)
  {
    //
    // Construct some data structure to enable fast processing.
    //
    ...
    //
    // Read a row from iterator, process it, and emit a result.
    //
    ...
  }
}

When this SQL/MR function is invoked in nCluster, the system starts several copies of this function on each node (think: one per CPU core). Each function is given an iterator to the rows that live in its local slice of the data. An alternative design, which is akin to the standard scalar UDF, would have been:

class NotRealAsterFunction implements PossibleRowFunction
{
  static void operateOnRow(Row currentRow, RowEmitter outputEmitter)
  {
    //
    // Process the given row and emit a result.
    //
    ...
  }
}

In this design, the static operateOnRow method would be called for each row in the function’s input. State can no longer be easily stored between rows. For simple functions, like computing the absolute value or a substring of a particular column, there’s no need for such inter-row state. But, as we’ve implemented more interesting analytic functions, we’ve found that enabling the storage of such state, or more specifically paying only once for the construction of something complex and then reusing it, has real value. Without the ability to save state between rows, the construction of this state would dominate the function’s execution.
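To make the cost difference concrete, here is a small, self-contained sketch that contrasts the two designs. It is not the SQL/MR API. `java.util.Iterator` and `java.util.function.Consumer` stand in for RowIterator and RowEmitter, rows are plain strings, and the stop-word set is a hypothetical placeholder for whatever expensive structure a real function would build.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

class StatefulRowFunctionSketch {
    // Counts how many times the "expensive" structure is built.
    static int constructions = 0;

    // Stand-in for costly setup, e.g. loading a model or a dictionary.
    static Set<String> buildStopWords() {
        constructions++;
        return new HashSet<>(Arrays.asList("a", "an", "the"));
    }

    // Iterator style: state is built once, then reused for every row.
    static void operateOnSomeRows(Iterator<String> rows, Consumer<String> emitter) {
        Set<String> stopWords = buildStopWords();
        while (rows.hasNext()) {
            String row = rows.next();
            if (!stopWords.contains(row)) {
                emitter.accept(row);
            }
        }
    }

    // Per-row style: nothing carries over between calls, so setup repeats.
    static void operateOnRow(String row, Consumer<String> emitter) {
        Set<String> stopWords = buildStopWords();
        if (!stopWords.contains(row)) {
            emitter.accept(row);
        }
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("the", "data", "a", "warehouse");
        List<String> out = new ArrayList<>();

        constructions = 0;
        operateOnSomeRows(rows.iterator(), out::add);
        System.out.println("iterator style: " + constructions + " construction(s), output " + out);

        out.clear();
        constructions = 0;
        for (String row : rows) {
            operateOnRow(row, out::add);
        }
        System.out.println("per-row style: " + constructions + " construction(s), output " + out);
    }
}
```

Both styles emit the same rows, but the iterator style builds the structure once for four rows while the per-row style builds it four times. With a genuinely expensive structure and millions of rows, that difference is the whole runtime.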

Examples abound. Consider a SQL/MR function which applies a complex model to score the data in the database, whether it’s scoring a customer for insurance risk, scoring an internet user for an ad’s effectiveness, or scoring a snippet of text for its sentiment. These functions often construct a data structure in memory to accelerate scoring, which works very well with the SQL/MR API: build the data structure once and reuse it across a large number of rows.

A sentiment analysis SQL/MR function, designed to classify a set of notes written up by customer service reps or a set of comments posted on a blog, would likely first build a hash table of words to sentiment scores, based on some dictionary file. This function would then iterate through each snippet of text, converting each word to its stem and then doing a fast lookup via the hash table. Such a persistent data structure accelerates the sentiment scoring of each text snippet.
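A rough sketch of that shape, in plain Java, might look like the following. Everything specific here is an assumption for illustration: the four-word dictionary is hard-coded rather than loaded from a file, the `stem` method is a crude suffix stripper rather than a real stemmer, and `Iterator`/`Consumer` again stand in for the SQL/MR row iterator and emitter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Consumer;

class SentimentSketch {
    // Hypothetical dictionary; a real function would load it from a file.
    static Map<String, Integer> buildDictionary() {
        Map<String, Integer> scores = new HashMap<>();
        scores.put("great", 2);
        scores.put("help", 1);
        scores.put("slow", -1);
        scores.put("terrible", -2);
        return scores;
    }

    // Crude stand-in for a stemmer: lowercases, drops non-letters,
    // and strips at most one of a few common suffixes.
    static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT).replaceAll("[^a-z]", "");
        for (String suffix : new String[] {"ful", "ing", "ed", "s"}) {
            if (w.length() > suffix.length() + 2 && w.endsWith(suffix)) {
                return w.substring(0, w.length() - suffix.length());
            }
        }
        return w;
    }

    // Sums the dictionary scores of the stemmed words in one snippet.
    static int score(String snippet, Map<String, Integer> dictionary) {
        int total = 0;
        for (String word : snippet.split("\\s+")) {
            total += dictionary.getOrDefault(stem(word), 0);
        }
        return total;
    }

    // Builds the hash table once, then scores every snippet with it.
    static void scoreRows(Iterator<String> snippets, Consumer<Integer> emitter) {
        Map<String, Integer> dictionary = buildDictionary();
        while (snippets.hasNext()) {
            emitter.accept(score(snippets.next(), dictionary));
        }
    }

    public static void main(String[] args) {
        List<String> snippets = Arrays.asList(
            "The rep was helpful and great",
            "Terrible, slow service");
        List<Integer> scores = new ArrayList<>();
        scoreRows(snippets.iterator(), scores::add);
        System.out.println(scores);
    }
}
```

The per-snippet work is a handful of hash lookups. The dictionary construction, which would dominate if repeated per row, happens once in `scoreRows`.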

Another example is Aster’s nPath SQL/MR function. At a high level, this function looks for patterns in ordered data, with the pattern specified as a regular expression. When nPath runs, it converts the pattern into a data structure optimized for fast, constant-memory pattern matching. If state couldn’t be maintained between rows, there’d be a large cost to reconstructing this data structure on each new row.
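The same compile-once, match-many idea can be sketched with Java's standard `java.util.regex` package. This is not how nPath is implemented, and Java's regex engine is a backtracking matcher, not the constant-memory structure described above. The one-letter event encoding (H = home, S = search, C = checkout) and the sample rows are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternReuseSketch {
    // Each row holds one user's ordered events, one letter per event.
    // The pattern is compiled once and reused for every row.
    static void matchRows(String regex, Iterator<String> rows, Consumer<String> emitter) {
        Pattern pattern = Pattern.compile(regex);
        while (rows.hasNext()) {
            String events = rows.next();
            Matcher matcher = pattern.matcher(events);
            while (matcher.find()) {
                // Emit the full event sequence alongside the matched span.
                emitter.accept(events + ":" + matcher.group());
            }
        }
    }

    public static void main(String[] args) {
        // "S+C": one or more searches followed directly by a checkout.
        List<String> rows = Arrays.asList("HSSC", "HC", "SHSCSC");
        List<String> matches = new ArrayList<>();
        matchRows("S+C", rows.iterator(), matches::add);
        for (String match : matches) {
            System.out.println(match);
        }
    }
}
```

Calling `Pattern.compile` inside the row loop would produce the same matches while paying the compilation cost on every row, which is the overhead the iterator-style API avoids.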

Repeating the high bit: as increasingly valuable analytic functions are pushed into the database, the value of constructing a data structure once, and reusing it across a large number of rows, increases substantially. The SQL/MR API was designed with this in mind.

Posted on April 7th, 2009 by Steve Wooledge

I’m delighted to welcome Specific Media to the quickly growing family of Aster customers! I had the pleasure of briefly meeting the folks from Specific Media in our offices last week. Like Aster, Specific Media is incredibly focused on doing more with data to increase the value they provide to their customers: advertisers that represent 300 of the top Fortune 500 brands.

They’re also really smart and humble about what they do, which makes it a pleasure to work with them. And what you wouldn’t know from a brief introduction is how cutting-edge their analytic methodologies and capabilities are.  We’re just starting our partnership together and hope to have some success metrics to share later about how they are using the Aster nCluster database for their data warehouse. They have some interesting ideas for using the Aster In-Database MapReduce framework to perform rich analysis of data efficiently for improved ad targeting and relevancy.

Posted on April 6th, 2009 by Tasso Argyros


When Mayank, George and I were at Stanford, one of the things that brought us together was a shared vision of how the world could benefit from a more scalable database to address exploding volumes of data. This led to the birth of Aster Data Systems and our flagship product, Aster nCluster, a highly scalable relational database system for what we call “frontline” data warehousing – an intersection of large data volumes, rich analytics, and mission-critical availability.

One way we found to solve the problem of managing and analyzing so much data was by implementing In-Database MapReduce. MapReduce is a programming model that was popularized at Google in 2003 to process large unstructured data sets distributed across thousands of nodes, and at Stanford we worked with some of the professors that had worked with the Google founders. In-Database MapReduce enables enterprises to harness the power of MapReduce while managing their data in Aster nCluster. Just like its massively parallel execution environment for standard SQL queries, Aster nCluster adds the ability to implement flexible MapReduce functions for parallel data analysis and transformation inside the database.

Much of the work of the Aster Data team is “fusing” best practices from the relational database world with innovations that Google pioneered for distributed computing; this takes strong engineering, so it’s no wonder that we are an engineering-driven company with some of the best minds available on our team. Among our 26 engineers on staff are seven PhDs and six PhDs on leave. Over time in this blog I plan to highlight the members of the Aster team that help make nCluster a reality.

One key member is Dr. Mohit Aron. Mohit is an architect, and his focus is on the distributed aspects of the nCluster architecture. His achievements include the delivery of several key projects at Aster, notably in areas related to quality of service, SQL/MR, compression, performance, and fault-tolerance.

Before joining Aster Data Systems, Mohit was a Staff Engineer at Google Inc., where he was one of the lead designers of the super-scalable, award-winning Google File System. Dr. Aron has held senior technical positions in industry where his work has focused on scalable cluster-based storage and database technologies. He received his B.Tech. degree from the Indian Institute of Technology, New Delhi and his M.S. and Ph.D. from Rice University, Houston. His graduate research focused on high performance networking and cluster-based web server systems. He was one of the primary contributors to the ScalaServer project and won numerous best paper awards at prestigious conferences.

I am also very glad today to announce that another key member of our organization, Dheeraj Pandey, has been promoted to VP of Engineering. He has been with Aster ever since September ‘07. Dheeraj has played an instrumental role in building this strong team together with me. He has been my alter ego all this while, as we shipped two major releases and four patchsets in the last 19 months. Beyond the tangibles, he has an acute focus on nurturing emotional intelligence within the engineering organization. Too many organizations, with strong technical mindsets, falter because people begin to underemphasize the value of honest communication, trust, and self-awareness. I am proud that we are building a culture, from very early on, which will endure the test of time as the company grows.

Dheeraj came to Aster from Oracle Corporation, where he managed the storage engine of the database. Under his leadership, Oracle built the unstructured data management stack, called Oracle SecureFiles, from the ground up. He also led the development of Oracle 11g Advanced Compression Option for both structured and unstructured data. Dheeraj has co-invented several patent-pending algorithms on database transaction management, Oracle Real Application Clusters, and Data Compression. Previously, he was building commodity-clustered fileservers at Zambeel. In the past 10 years of his industry career, he has developed diverse software ranging from mid-tier Java/COM applications to fileservers, databases, and firmware in storage switches. Dheeraj received an M.S. in Computer Science from The University of Texas (Austin), where he was a doctoral fellow. He received a B.Tech. in Computer Science from IIT Kanpur, where he was judged the “Best All-Rounder Student Among All Graduating Students in All Disciplines.”

I am confident that, as an innovation-driven company, we are entrusting one of our most critical functions, Engineering, to very safe hands.

I hope you continue to watch this space for updates on Aster, our products, and our people.

Posted on April 2nd, 2009 by Shawn Kung

When Aster announced In-Database MapReduce last summer, we saw tremendous interest and intrigue. Today, Amazon announced that it is helping promote the use of parallel processing frameworks such as Hadoop (an open-source implementation of MapReduce) by making it available on EC2. (Note: Aster announced production customers and availability of MapReduce on both Amazon’s EC2 and AppNexus in February.)

Our vision was, and continues to be, to bring the power of MapReduce to a whole new class of developers and mission-critical enterprise systems. When would you use Aster’s In-Database MapReduce vs. a system like Hadoop? You need to ask a few questions as you think about this:

[1] Can I use my MapReduce system only for batch processing, or can I also do real-time reporting and analysis? Can I have a single system to do number-crunching AND needle-in-a-haystack summary or aggregation lookup? Can I get a response to my short queries in seconds, or do I need to wait for several minutes?

[2] How do I maximize developer productivity, using SQL for regular data processing and MapReduce for richer analysis?

[3] Do I only want to manage raw data files using file name conventions, or do I also want to use database primitives like partitions, tables, and views?

[4] How do I easily integrate the MapReduce system with my standard ETL and reporting tools, so I don’t have to reinvent the wheel on dashboards, scorecards, and reports?

[5] When I have such large data in an enterprise system, how do I control access to data and provide appropriate security privileges?

[6] Workload management: When I have invested in a system with hundreds or thousands of processors, how do I efficiently share it among multiple users and guarantee response-time SLAs?

[7] For mission-critical data-intensive applications, how do I do full and incremental backup and disaster recovery?

We conducted an educational webcast on MapReduce recently, together with a Stanford data mining professor, which details some of these differences further.

It’s great to see MapReduce going mainstream and companies such as Amazon supporting the proliferation of innovative approaches to the data explosion problem. Together, we hope to help build mind-share around MapReduce and help companies do more with their data. In fact, we welcome users to put Amazon Elastic MapReduce output into Aster nCluster Cloud Edition for persistence, sharing, reporting, and easy, fast, concurrent access. Lots of Aster customers are using both, and it’s easy to move data since Aster is on the same Amazon Web Services cloud.

Please contact us if you’d like help getting started with your MapReduce explorations. We conducted a web seminar to introduce you to the concept.

Copyright © 2008 Aster Data Systems, Inc. All rights reserved.