By Shawn Kung in Blogroll, Cloud Computing on April 2, 2009
   

When Aster announced In-Database MapReduce last summer, we saw tremendous interest and intrigue. Today, Amazon announced that it is helping promote the use of parallel processing frameworks such as Hadoop (an open-source implementation of MapReduce) by making it available on EC2. (Note: Aster announced production customers and availability of MapReduce on both Amazon’s EC2 and AppNexus in February.)

Our vision was, and continues to be, to bring the power of MapReduce to a whole new class of developers and mission-critical enterprise systems. When would you use Aster’s In-Database MapReduce vs. a system like Hadoop? You need to ask a few questions as you think about this:

[1] Can I use my MapReduce system only for batch processing, or can I also do real-time reporting and analysis? Can I have a single system do number-crunching AND needle-in-a-haystack summary or aggregation lookups? Can I get responses to my short queries in seconds, or do I need to wait several minutes?

[2] How do I maximize developer productivity, using SQL for regular data processing and MapReduce for richer analysis?

[3] Do I only want to manage raw data files using file name conventions, or do I also want to use database primitives like partitions, tables, and views?

[4] How do I easily integrate the MapReduce system with my standard ETL and reporting tools, so I don’t have to reinvent the wheel on dashboards, scorecards, and reports?

[5] When I have this much data in an enterprise system, how do I control access to it and grant appropriate security privileges?

[6] Workload management: When I have invested in a system with hundreds or thousands of processors, how do I efficiently share it among multiple users and guarantee response-time SLAs?

[7] For mission-critical data-intensive applications, how do I do full and incremental backup and disaster recovery?
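
To make question [2] concrete, here is a minimal sketch of the split between SQL for routine aggregation and a map/reduce step for richer analysis. It is an illustration only, not Aster’s In-Database MapReduce interface or Hadoop’s API: the SQL side uses Python’s standard sqlite3 module, the map/reduce side is a toy in-process driver, and the click data, the function names, and the 30-minute session gap are all assumptions invented for the example.

```python
import sqlite3
from collections import defaultdict

# Hypothetical click log: (user, timestamp in seconds). Invented for illustration.
CLICKS = [
    ("alice", 0), ("alice", 10), ("alice", 4000),
    ("bob", 5), ("bob", 20), ("bob", 30),
]


def clicks_per_user_sql(rows):
    """Routine processing: a plain aggregate is a one-line SQL GROUP BY."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE clicks (user TEXT, ts INTEGER)")
    conn.executemany("INSERT INTO clicks VALUES (?, ?)", rows)
    result = dict(conn.execute("SELECT user, COUNT(*) FROM clicks GROUP BY user"))
    conn.close()
    return result


def map_click(row):
    """Map step: emit (key, value) pairs keyed by user."""
    user, ts = row
    yield user, ts


def reduce_sessions(user, timestamps, gap=1800):
    """Reduce step: count sessions, starting a new one after a gap of more than `gap` seconds."""
    ordered = sorted(timestamps)
    sessions = 1
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > gap:
            sessions += 1
    return user, sessions


def run_mapreduce(rows, mapper, reducer):
    """Toy single-process driver: map every row, group values by key, reduce each group."""
    groups = defaultdict(list)
    for row in rows:
        for key, value in mapper(row):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in groups.items())


if __name__ == "__main__":
    print(clicks_per_user_sql(CLICKS))
    print(run_mapreduce(CLICKS, map_click, reduce_sessions))
```

The count per user is the kind of query SQL already expresses well. Sessionization depends on the order of and gaps between rows, which is awkward in plain SQL but falls out naturally from a procedural reduce function.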

We conducted an educational webcast on MapReduce recently, together with a Stanford data mining professor, which details some of these differences further.

It’s great to see MapReduce going mainstream and companies such as Amazon supporting the proliferation of innovative approaches to the data explosion problem. Together, we hope to help build mind-share around MapReduce and help companies do more with their data. In fact, we welcome users to put Amazon Elastic MapReduce output into Aster nCluster Cloud Edition for persistence, sharing, reporting, and easy, fast, concurrent access. Lots of Aster customers are using both, and it’s easy to move data since Aster is on the same Amazon Web Services cloud.

Please contact us if you’d like help getting started with your MapReduce explorations. We conducted a web seminar to introduce you to the concept.


Comments:
Robert Mahfoud on April 13th, 2009 at 1:55 pm #

Why doesn’t Aster provide a freely available development version of its parallel DB product??
It sounds interesting, and for someone looking to deploy Hive on my small sandbox of a few hosts, it would be great to be able to test nCluster, be it locally or on the cloud.

Steve Wooledge on April 15th, 2009 at 1:27 pm #

Robert,

Thanks for your comment. We are definitely exploring the most effective way to make our nCluster software available for testing. We are exploring several options, including via cloud, installation on local servers, and more. Stay tuned.

Thanks
Steve
