Archive for the ‘Cloud Computing’ Category

By Tasso Argyros in Cloud Computing on July 13, 2010

Amazon announced today the availability of special EC2 cluster instances that are optimized for low-latency network operations. This is useful for applications in the High-Performance Computing (HPC) space, where servers need to request and exchange data very quickly. HPC applications range from nuclear simulations in government labs to chess engines.

I find this development interesting, not only because it makes scientific applications in the cloud a possibility, but also because it’s an indication of where cloud infrastructure is heading.

In the early days, Amazon EC2 was very simple: if you wanted 5 “instances” (that is, 5 virtual machines), that’s what you got. However, the instances were limited in both memory and disk capacity. Over time, more and more configurations were added, and today one can choose an instance type from a range of disk and memory profiles, with up to 15GB of memory and 2TB of disk per instance. The network, however, was always a problem, regardless of instance size. (According to rumors, EC2 made things worse by spreading instances as far from each other as possible in the datacenter to increase reliability; as a result, network latency suffered.) Now the network problem is being addressed by these special “Cluster Compute Instances,” which provide guaranteed, non-blocking access to a 10GbE network.
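To make this concrete, here is a minimal sketch of launching such a cluster with boto3, the current Python AWS SDK (which postdates this post); the AMI ID, instance count, and placement group name are illustrative placeholders:

```python
# Sketch: launch a tightly coupled cluster into a "cluster" placement group,
# the mechanism behind EC2's guaranteed low-latency 10GbE networking.
# The AMI ID and counts below are placeholders, not real values.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# A cluster placement group asks EC2 to pack instances close together,
# the opposite of its default spread-for-reliability behavior.
ec2.create_placement_group(GroupName="hpc-cluster", Strategy="cluster")

response = ec2.run_instances(
    ImageId="ami-00000000",        # placeholder HVM AMI
    InstanceType="cc1.4xlarge",    # the original Cluster Compute type
    MinCount=8,
    MaxCount=8,
    Placement={"GroupName": "hpc-cluster"},
)
print([i["InstanceId"] for i in response["Instances"]])
```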

Overall, this move represents a departure from the super-simple black-box model that EC2 started with. Amazon wisely realizes that accommodating more applications requires transparency into, and guarantees about, the underlying infrastructure. Guaranteed network latency is just the beginning: Amazon has the opportunity to add many more options and guarantees around I/O performance, quality of service, SSDs versus hard drives, failover behavior, and so on. The more options and guarantees Amazon offers, the closer we get to the promise of the cloud, at least for resource-intensive IT applications.



Dell and Aster Data: A Powerful Combination for Large-Scale Customers
By dkloc in Cloud Computing on April 26, 2010

If you read this blog, you’ve probably seen the news about the partnership between Aster Data and Dell around their new PowerEdge C-Series servers. Together we have enabled some very successful customers, such as MySpace and Mint.com, and proven that Dell hardware running Aster Data software easily scales to support large-scale data warehousing and advanced analytics.

Aster Data CEO Mayank Bawa explains this combination in more detail, as well as Aster Data’s history and the distinct advantages offered by the partnership with Dell, including Online Precision Scaling™, out-of-the-box advanced in-database analytics, and always-on availability.



By Mayank Bawa in Cloud Computing on May 1, 2009

Data poetry by Mason Hale. Awesome!



Enterprise-Class MapReduce
By Shawn Kung in Blogroll, Cloud Computing on April 2, 2009

When Aster announced In-Database MapReduce last summer, we saw tremendous interest and intrigue. Today, Amazon announced that it is helping promote the use of parallel processing frameworks such as Hadoop (an open-source implementation of MapReduce) by making it available on EC2. (Note: Aster announced production customers and the availability of MapReduce on both Amazon EC2 and AppNexus in February.)

Our vision was, and continues to be, to bring the power of MapReduce to a whole new class of developers and mission-critical enterprise systems. When would you use Aster’s In-Database MapReduce vs. a system like Hadoop? You need to ask a few questions as you think about this:

[1] Can I use my MapReduce system only for batch processing, or can I also do real-time reporting and analysis? Can I have a single system for heavy number-crunching AND needle-in-a-haystack lookups or summary aggregations? Do my short queries return in seconds, or do I have to wait several minutes?

[2] How do I maximize developer productivity, using SQL for routine data processing and MapReduce for richer analysis? (A toy sketch of this division of labor appears after this list.)

[3] Do I only want to manage raw data files using file-naming conventions, or do I also want database primitives like partitions, tables, and views?

[4] How do I easily integrate the MapReduce system with my standard ETL and reporting tools, so I don’t have to reinvent the wheel on dashboards, scorecards, and reports?

[5] When I have this much data in an enterprise system, how do I control access to it and grant appropriate security privileges?

[6] Workload management: when I have invested in a system with hundreds or thousands of processors, how do I share it efficiently among multiple users and guarantee response-time SLAs?

[7] For mission-critical data-intensive applications, how do I do full and incremental backup and disaster recovery?
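To illustrate question [2] concretely, here is a toy sketch in plain Python and sqlite3 of that pattern: SQL does the routine projection and ordering, while a MapReduce-style function does richer analysis (sessionization) that is awkward to express in SQL alone. The table and function names are invented for illustration; this is not Aster’s actual API.

```python
# Toy sketch: SQL for routine processing, a MapReduce-style step for richer
# analysis. Mimics the in-database pattern with sqlite3 and plain Python.
import sqlite3
from itertools import groupby
from operator import itemgetter

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicks (user_id TEXT, ts INTEGER, url TEXT)")
conn.executemany("INSERT INTO clicks VALUES (?, ?, ?)", [
    ("alice", 100, "/home"), ("alice", 130, "/buy"),
    ("alice", 5000, "/home"), ("bob", 200, "/home"),
])

# SQL handles the routine part: projection and ordering.
rows = conn.execute(
    "SELECT user_id, ts, url FROM clicks ORDER BY user_id, ts").fetchall()

def sessionize(events, timeout=1800):
    """Reduce one user's time-ordered clicks into numbered sessions."""
    session, last_ts = 0, None
    for user, ts, url in events:
        if last_ts is not None and ts - last_ts > timeout:
            session += 1          # gap exceeded the timeout: new session
        last_ts = ts
        yield (user, session, url)

# The "partition by user, reduce each partition" shape of MapReduce.
for _, events in groupby(rows, key=itemgetter(0)):
    for row in sessionize(events):
        print(row)
```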

We recently conducted an educational webcast on MapReduce, together with a Stanford data-mining professor, that details some of these differences further.

It’s great to see MapReduce going mainstream, with companies such as Amazon supporting the proliferation of innovative approaches to the data explosion problem. Together, we hope to build mind-share around MapReduce and help companies do more with their data. In fact, we welcome users to load Amazon Elastic MapReduce output into Aster nCluster Cloud Edition for persistence, sharing, reporting, and fast concurrent access. Many Aster customers use both, and moving data is easy since Aster runs on the same Amazon Web Services cloud.
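As a rough sketch of that hand-off (the bucket name, prefix, and tab-separated file format are assumptions, and boto3 postdates this post), Elastic MapReduce leaves its output as part-* files in S3, which can be gathered into a single file for any bulk loader:

```python
# Sketch: collect Elastic MapReduce output (tab-separated part-* files in S3)
# into one local CSV that a database bulk loader can ingest.
import csv
import boto3

s3 = boto3.client("s3")
bucket, prefix = "my-emr-bucket", "job-output/"   # placeholders

with open("emr_output.csv", "w", newline="") as out:
    writer = csv.writer(out)
    for page in s3.get_paginator("list_objects_v2").paginate(
            Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            if "part-" not in obj["Key"]:
                continue  # skip _SUCCESS markers, logs, etc.
            body = s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"]
            for line in body.iter_lines():
                writer.writerow(line.decode("utf-8").split("\t"))
```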

Please contact us if you’d like help getting started with your MapReduce explorations; the webcast mentioned above is a good introduction to the concept.



The Value of Flexibility
By Chris Neumann in Blogroll, Cloud Computing on March 3, 2009

As the Director of Technology Delivery for Aster Data Systems, I oversee the teams responsible for delivering and deploying our nCluster analytic database to customers, and for enabling prospective customers to evaluate our solutions effectively and efficiently. Recently, Shawn posted about the release of Aster nCluster Cloud Edition and discussed how cloud computing enables businesses to scale their infrastructure without huge hardware investments. As a follow-on, I’d like to describe how the flexibility provided by nCluster’s support for multiple platforms can reduce the time and cost of evaluating nCluster.

Evaluating enterprise software can be costly in both time and money. The process typically requires weeks of prep work by the evaluation team, possibly including purchasing different hardware for each vendor being evaluated. Few companies can afford to spend significant money and lose weeks of productivity on an evaluation, particularly in these uncertain times.

With our recent public release of Aster nCluster Cloud Edition, we now provide the most platform options of any major data warehouse vendor. While it’s natural to focus on the flexibility this affords production systems, it also lets us be very flexible in how customers try our solution:

Commodity Hardware Evaluation
Several warehouse vendors claim to support commodity hardware, but most are very closely tied to one “preferred” vendor.  Aster nCluster supports any x86-based hardware, meaning that you can evaluate us on either new hardware (if performance is a key aspect of the evaluation) or older hardware that is being repurposed (if you want to test the functionality of nCluster without buying new hardware).

Aster-Hosted Evaluation
Our data center in San Carlos, CA has racks of servers dedicated to customer evaluations.  With an Aster-hosted system, functional evaluations of nCluster can be performed with minimum infrastructure requirements.

Cloud Evaluation
With Aster nCluster Cloud Edition, custom-configured nClusters can be brought up in minutes on either Amazon EC2 or AppNexus. POCs can be performed on one or more systems in parallel, with zero infrastructure requirements. Your teams can evaluate all of nCluster’s functionality in the cloud, with complete control over sizing and scaling. (While other vendors have announced cloud offerings, we’re the only data warehouse vendor with production customers on two separate cloud services.)

Whether you’re building a new frontline data warehouse or looking to replace an existing system that doesn’t scale or costs too much, you should check us out. We have a great product that’s turning heads as an alternative to overpriced hardware appliances for multi-terabyte data warehouses. With all the flexibility our offerings provide, you can evaluate the full power of Aster nCluster without the costs of a traditional POC.

Give us a try and see everything you can do with Aster nCluster!



By Shawn Kung in Analytics, Blogroll, Cloud Computing on February 10, 2009


Cloud computing is a fascinating concept. It offers greenfield opportunities (or, more appropriately, blue-sky frontiers) for businesses to affordably scale their infrastructure without plunking down a huge hardware investment (and the space, power, and cooling costs of managing your own hosted environment). It removes the risk of mis-provisioning by enabling on-demand scaling in line with your data growth. Especially in these economic times, the benefits of cloud computing are very attractive.

But let’s face it - there’s also a lot of hype, and it’s hard to separate truth from fiction.  For example, what qualities would you say are key to data warehousing in the cloud?

Here’s a checklist of things I think are important:

[1] Time-to-scalability. The whole point of clouds is to offer easy access to virtualized resources. A cloud warehouse needs to scale out and scale in quickly to adapt to changing needs. It can’t take days to scale; it has to happen on demand, in minutes (under an hour at worst). (A minimal sketch of on-demand scale-out appears after this list.)

[2] Manageability. You go with clouds to save not only on hardware, but also on the operational people costs of maintaining that infrastructure. A cloud warehouse needs to offer one-click scaling, easy install and upgrade, and self-managed resiliency.

[3] Ecosystem. While clouds offer *you* huge TCO savings, you can’t compromise service levels for your customers, especially if you run your business on the cloud. BI, ETL, and monitoring tools, backup and recovery, and ultra-fast data loading can’t be overlooked for “frontline” mission-critical warehousing on the cloud.

[4] Analytics. Lots of valuable data is generated in the cloud, and there are opportunities to subscribe to new data feed services. It’s not enough for a cloud warehouse to do basic SQL reporting; it must offer the ability to do deep analytics very quickly.

[5] Choice.  A truly best-in-class cloud warehouse won’t lock you in to a single cloud vendor.  Rather, it will offer portability by enabling you to choose the best cloud for you to run your business on.
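As a back-of-the-envelope illustration of point [1], the cloud half of scaling out and back in is a pair of API calls; the warehouse itself still has to rebalance data onto the new nodes. This sketch uses boto3, today’s Python AWS SDK, and the AMI ID and instance type are placeholders:

```python
# Sketch: on-demand scale-out and scale-in of warehouse worker nodes.
# The database layer must still rebalance data; this is only the EC2 side.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

def scale_out(worker_count):
    """Add worker nodes in minutes, not days."""
    resp = ec2.run_instances(
        ImageId="ami-00000000",      # placeholder worker-node image
        InstanceType="m1.xlarge",    # era-appropriate instance type
        MinCount=worker_count,
        MaxCount=worker_count,
    )
    return [i["InstanceId"] for i in resp["Instances"]]

def scale_in(instance_ids):
    """Release nodes when the workload shrinks, stopping the meter."""
    ec2.terminate_instances(InstanceIds=instance_ids)

new_nodes = scale_out(4)
# ... the warehouse rebalances onto the new nodes and serves peak load ...
scale_in(new_nodes)
```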

Finally, here are a couple of ideas on the future of cloud warehousing. What if you could link multiple cloud warehouses together and run interesting queries across clouds? And what about the opportunities for game-changing new analytics: with so many emerging data subscription services, wouldn’t this offer ripe opportunities for mash-up analytics (e.g., using Aster SQL/MapReduce)?

What do you think are the standards for “best-in-class” cloud warehousing?