Archive for the ‘Blogroll’ Category

By Tasso Argyros in Analytics, Blogroll, Data-Analytics Server on June 23, 2010

Recently, a journalist called to ask about in-memory data processing, a very interesting subject. I have always thought that in-memory processing will become more and more important as memory prices keep falling drastically. In fact, these days you can get 128GB of memory into a single system for less than $5K plus the server cost, not to mention that DDR3 and multiple memory controllers are giving a huge performance boost. And if you run software that can handle shared-nothing parallelism (MPP), your memory cost increases only linearly as you add servers, and systems with TBs of memory are possible.

So what do you do with all that memory? There are two classes of use cases that are emerging today. First is the case where you need to increase concurrent access to data with reduced latency. Tools like memcached offer in-memory caching that, used properly, can vastly improve latency and concurrency for large-scale OLTP applications like websites. The nice thing about object caching is that it also scales well in a distributed way, and people have built TB-level caches. Memory-only OLTP databases, such as VoltDB, have started to emerge. And memory is used implicitly as a very important caching layer in open-source key-value products like Voldemort. We should only expect memory to play a more and more important role here.
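To make the caching idea concrete, here is a minimal sketch of the cache-aside pattern that object caches are typically used for. It rests on stated assumptions: a plain Python dictionary stands in for the cache, a function with an artificial delay stands in for the database, and the names `slow_db_lookup`, `get_user`, and `stats` are hypothetical. It is not memcached's API.

```python
import time

# Stand-in for a remote in-memory cache (assumption: a plain dict,
# not a real cache client library).
cache = {}

# Mutable counter so we can see how often the slow store is really hit.
stats = {"db_calls": 0}


def slow_db_lookup(user_id):
    """Hypothetical database read with artificial latency."""
    stats["db_calls"] += 1
    time.sleep(0.01)  # simulate disk/network latency
    return {"id": user_id, "name": f"user-{user_id}"}


def get_user(user_id):
    """Cache-aside read: try memory first, fall back to the database."""
    key = f"user:{user_id}"
    if key in cache:
        return cache[key]            # cache hit: no database round trip
    value = slow_db_lookup(user_id)  # cache miss: read from the slow store
    cache[key] = value               # populate the cache for later readers
    return value


if __name__ == "__main__":
    for _ in range(1000):
        get_user(42)
    print(f"1000 reads, {stats['db_calls']} database call(s)")
```

Only the first read pays the database latency; every later read for the same key is served from memory, which is where the latency and concurrency gains come from.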

The second way to use memory is to gain “processing flexibility” when doing analytics. The idea is to throw your data into memory (as much of it as fits, of course) without spending much time thinking about how to do that or what queries you’ll need to run. Because memory is so fast, most simple queries will execute at interactive speeds, and concurrency is handled well. European upstart QlikView exploits this fact to offer a memory-only BI solution which provides simple and fast BI reporting. The downside is that it applies to only tens of GBs of data, as Curt Monash notes.

By exploiting an MPP shared-nothing architecture, Aster Data has production clusters with TBs of total memory. Our software takes advantage of memory in two ways: first, it uses caching aggressively to ensure the most relevant data stays in memory; and when data is in memory, processing is much faster and more flexible. Secondly, MapReduce is a great way to utilize memory as it provides full flexibility to the programmer to use memory-focused data structures for data processing. In addition, Aster Data’s SQL-MapReduce provides tools to the user to encourage the development of memory-only MapReduce applications.

However, one shouldn’t fall into the trap of thinking that all analytics will be in-memory anytime soon. While memory is down to $30/GB, disk manufacturers have been busy increasing platter density and dropping their price to less than $0.06/GB. Given that the amount of data in the world grows faster than Moore’s law and memory, there will always be more data to be stored and analyzed than what fits into any amount of memory that an enterprise can use. In fact, most big data applications will have data sets that do not fit into memory because, while tools like memcached worry only about the present (e.g. current Facebook users), analytics need to worry about the past, as well - and that means much more data. So a multi-layer architecture will be the only cost-effective way of analyzing large amounts of data for some time.
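The arithmetic behind this argument can be sketched with the two prices quoted above. The 10 TB data set and the 10% "hot" fraction are assumptions chosen purely for illustration, and `storage_cost` is a hypothetical helper.

```python
# Prices quoted in the post (USD per GB).
MEMORY_PER_GB = 30.00
DISK_PER_GB = 0.06


def storage_cost(total_gb, hot_fraction):
    """Cost of keeping `hot_fraction` of the data in memory and all of it on disk."""
    memory_gb = total_gb * hot_fraction
    return memory_gb * MEMORY_PER_GB + total_gb * DISK_PER_GB


if __name__ == "__main__":
    total_gb = 10 * 1024  # hypothetical 10 TB data set
    ratio = MEMORY_PER_GB / DISK_PER_GB
    print(f"memory costs about {ratio:.0f}x as much as disk per GB")
    print(f"everything in memory: ${storage_cost(total_gb, 1.0):,.0f}")
    print(f"10% hot in memory:    ${storage_cost(total_gb, 0.1):,.0f}")
```

Under these assumptions the memory tier dominates the bill, so keeping only the most frequently used data in memory and the rest on disk is what makes the layered approach cost-effective.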

One shouldn’t discuss memory without mentioning solid-state disk products (like those of Aster Data partner company Fusion-io). SSDs are likely to be the surprise here, given that their per-GB price is falling faster than that of disks (being a solid-state product that follows Moore’s law does help). In the next few years we’ll witness SSDs in read-intensive applications providing similar advantages to memory while accommodating much larger data sizes.

By Tasso Argyros in Blogroll, Data-Analytics Server on June 22, 2010

Rumors abound that Intel is “baking” the successor of the very successful Nehalem CPU architecture, codenamed Westmere. It comes with an impressive spec: 10 CPU cores (supporting 20 concurrent threads) packed in a single chip. You can soon expect to see 40 cores in mid-range 4-socket servers - a number hard to imagine just five years ago.

We’re definitely talking about a different era. In the old days, you could barely fit a single core in a chip. (I still remember 15 years ago when I had to buy and install a separate math co-processor on my Mac LC to run Microsoft Excel and Mathematica.) And with the hardware, software has to change, too. In fact, modern software means software that can handle parallelism. This is what makes MapReduce such an essential and timely tool for big data applications. MapReduce’s purpose in life is to simplify data and processing parallelism for big data applications. It gives the programmer ample freedom in how to do things locally, and it takes over when data needs to be communicated across processes, cores, or servers, thus removing much of the complexity of parallelism.
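As a rough illustration of that division of labor, here is a minimal word-count sketch in the MapReduce style. It is a toy written for this post, not Aster Data’s SQL-MapReduce or any particular framework; the function names are hypothetical, and a thread pool is used only to keep the example self-contained (a real framework would spread the chunks across cores or servers).

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_chunk(lines):
    """Map step: count words in one chunk using purely local work."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """Reduce step: combine two partial results into one."""
    return left + right


def word_count(lines, workers=4):
    """Split the input into chunks, map them concurrently, then reduce."""
    chunk_size = max(1, len(lines) // workers)
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce(merge_counts, partials, Counter())


if __name__ == "__main__":
    text = ["big data needs parallelism", "parallelism needs MapReduce"]
    print(word_count(text, workers=2).most_common(3))
```

The programmer writes only `map_chunk` and `merge_counts`; how chunks are scheduled and how partial results travel between workers is left to the surrounding machinery.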

Once you design your software and data to operate in a parallelized environment using MapReduce, gains will come on multiple levels. Not only will MapReduce help your analytical applications scale across a cluster of servers with terabytes of data, it will also exploit the billions of transistors and the tens of CPU cores inside each server. The best part: the programmer doesn’t need to think about the difference.

As an example, consider this great paper out of Stanford, which discusses MapReduce implementations of popular Machine Learning algorithms. The Stanford researchers considered MapReduce as a way of “porting” these algorithms (traditionally implemented to run on a single CPU) to a multi-core architecture. But, of course, the same MapReduce implementations can be used to scale these algorithms across a distributed cluster as well.
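One common way to express a learning algorithm in this style is to write it in terms of sums over the data, so that each core (or server) computes partial sums for its chunk and a final step adds them up. The sketch below does this for a simple least-squares line fit. It is an illustration under that assumption, with hypothetical function names, and is not claimed to be the paper’s own formulation.

```python
from functools import reduce


def map_partial_sums(points):
    """Map step: summarize one chunk of (x, y) pairs as five running sums."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    return (n, sx, sy, sxx, sxy)


def reduce_sums(a, b):
    """Reduce step: partial sums from two chunks simply add."""
    return tuple(u + v for u, v in zip(a, b))


def fit_line(chunks):
    """Least-squares slope and intercept from per-chunk partial sums."""
    n, sx, sy, sxx, sxy = reduce(reduce_sums, map(map_partial_sums, chunks))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


if __name__ == "__main__":
    # Two chunks that could live on different cores or different servers.
    chunks = [[(0, 1), (1, 3)], [(2, 5), (3, 7)]]
    print(fit_line(chunks))
```

Because the chunks are combined only through addition, it makes no difference to the code whether they sit on different cores of one chip or on different machines in a cluster.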

Hardware has changed - MPP, shared-nothing, commodity servers, and, of course, multi-core. In this new world MapReduce is software’s response for big data processing. Intel and Westmere have just found an unexpected friend.

By Steve Wooledge in Blogroll on June 14, 2010

As the market around big data heats up, it’s great to see the ecosystem for Hadoop, MapReduce, and massively parallel databases expanding. This includes events for education and networking around big data.

As such, Aster Data is co-sponsoring our first official “unconference” the night before the 2010 Hadoop Summit. It’s called BigDataCamp and will be June 28th at the TechMart from 5:00-9:30PM (adjacent to the Hyatt where Hadoop Summit is taking place). Similar to our ScaleCamp event last year where we heard from companies like LinkedIn and ShareThis and industry practitioners like Chris Wensel (author of Cascading), there will be a lineup of great talks, including hands-on workshops led by Amazon Web Services, Karmasphere, and more. In addition, we’re lucky to have Dave Nielsen as the moderator/organizer of the event as he’s chaired similar unconferences such as CloudCamp, and is an expert at facilitating content and discussions to best fit attendee interest.

It’s very fitting to have the more open/dynamic agenda style of an unconference given the audience will be more of the “analytic scientists” - a title which I’ve seen LinkedIn use when describing the rise in job roles dedicated to tackling big data in companies to tease out insights and develop data-driven products and applications. The analytic scientist-customers I speak with who use Aster Data together with Hadoop challenge the norms and move quickly - not unlike an unconference agenda. I expect a night of free thinking (and free drinks/food), big ideas, and a practical look at emerging technologies and techniques to tackle big data. Best of all, the networking portion is a great chance to meet folks to hear what they’re up to and exchange ideas.

Check out the agenda, and note that seats are limited and we expect to sell out, so please REGISTER NOW. Hope to see you there!

By Tasso Argyros in Analytics, Blogroll on April 16, 2010

This Monday we announced a new web destination for MapReduce. At a high level, this site is the first consolidated source of information & education around MapReduce, the groundbreaking programming model which is rapidly revolutionizing the way people deal with big data. Our vision is to make this site the one-stop shop for anyone looking to learn how MapReduce can help analyze large amounts of data.

There were a couple of reasons why we thought the world of big data analytics needed a resource like this. First, MapReduce is a relatively new technology and we are constantly getting questions from people in the industry wanting to learn more about it, from basic facts to using MapReduce for complex data analytics at petabyte scale. By placing our knowledge and references in one public destination, we hope to build a valuable self-serve resource that educates many more people than we could ever reach directly. Second, we were motivated by the fact that most MapReduce resources out there focus on specific implementations of MapReduce, which fragments the available knowledge and reduces its value. In this new effort we hope to create a multi-vendor & multi-tool resource which will benefit anyone interested in MapReduce.

We’re already working with analysts such as Curt Monash, Merv Adrian, Colin White and James Kobielus to syndicate their MapReduce-related posts. Going forward, we expect even more analysts, bloggers, practitioners, vendors, and academics to contribute. If traffic grows like we expect, we may eventually add a community forum to aid in interaction and sharing of knowledge and best practices.

I hope you enjoy surfing this new site! Feel free to email me with any suggestions as we work to make it more useful for you.

By rpai in Analytics, Blogroll, Frontline data warehouse, TCO on February 22, 2010


Today Aster took a significant step and made it easier for developers building fraud detection, financial risk management, telco network optimization, customer targeting and personalization, and other advanced, interactive analytic applications.

Along with the release of Aster Data nCluster 4.5, we added a new Solution Partner level for systems integrators and developers.

Why is this relevant?

Recession or no recession, IT executives are constantly challenged. They are asked to execute strategies based on better analytics and information to improve the effectiveness of business processes (customer loyalty, inventory management, revenue optimization, etc.), while staying on top of technology-based disruptions and managing shrinking or flat IT budgets.

IT organizations have taken on the challenge by building analytics-based offerings, leveraging existing data management skills and increasingly taking advantage of MapReduce, a disruptive technology introduced by Google and now being rapidly adopted by mainstream enterprise IT shops in Finance, Telco, LifeSciences, Govt. and other verticals.

As MapReduce and big data analytics go mainstream, our customers and ecosystem partners have asked us to make it easier for their teams to leverage MapReduce across enterprise application lifecycles, while harvesting existing IT skills in SQL, Java and other programming languages. The Aster development team that brought us the SQL-MapReduce® innovation has now delivered the market’s first integrated visual development environment for developing, deploying and managing MapReduce and SQL-based analytic applications.

Enterprise MapReduce developers and system integrators can now leverage the integrated Aster platform and deliver compelling business results in record time.

We are also teaming up with leaders in our ecosystem like MicroStrategy to deliver an end-to-end analytics solution to our customers that includes SQL/MapReduce enabled reporting and rich visualization. Aster is proud to be driving innovation in the Analytics and BI market and was recently honored at MicroStrategy’s annual customer conference.

I am delighted with the rapid adoption of Aster Data’s platform by our partners and the strong continued interest from enterprise developers and system integrators in building big data applications using Aster. New partners are endorsing our vision and technical innovation as the future of advanced analytics for large data volumes.

Sign up today to be an Aster solution partner and join the revolution to deliver compelling information and analytics-driven solutions.


By Steve Wooledge in Blogroll on February 16, 2010

We’re a couple of days away from our second Big Data Summit event in as many calendar quarters, and it’s shaping up to be jam-packed with good presentations and conversation around innovations in data management and advanced analytics. For people not aware, we’re conducting Big Data Summits regionally in North America and Europe with an eye on helping educate organizations that are looking for ways to tackle the enormous amount of data growing inside (and outside of) their four walls. More importantly, we’re helping people answer the question, “what do I do with all this data?” If you can’t make it to the Bay Area on Feb 18th, look for one coming soon to a venue near you. Here are some previews of what’s coming on Thursday:

1) Intuit (formerly will be talking about how they anonymize consumer data and provide financial benchmarks and insights to individuals.  e.g., Do you spend more on dinner than the average Joe or Jane in California?

2) Mobclix, a mobile advertising network, will discuss how they enable application developers for the iPhone, Android, BlackBerry, etc. to make more money by providing targeted advertising based on better analytics (and how they’ve deployed it on Amazon Web Services to scale up & down on demand).

3) Merv Adrian from IT Market Strategy will keynote to talk about trends in data management and the questions you need to ask yourself as you consider ways to tackle big data problems. He’s also moderating a panel that I’m hearing will have some cameo appearances from other industry analysts with big brains.

4) Dell is our platinum sponsor and they’ll be talking about some new hardware they’re rolling out for big data computing and putting on display at the show.

5) Our CEO and co-founder will be talking about our approach to harnessing the power of big data and giving a glimpse of the future from Aster Data.

6) Other sponsors include Amazon Web Services, Informatica, and a new partner of ours, Impetus.

Hope to see you at the show.

By rpai in Blogroll on November 11, 2009

Last week I attended Bank of America’s Technology Innovation Summit in Silicon Valley. In attendance were leading technology executives from Bank of America who outlined needs and challenges for the global banking giant. BofA’s annual IT spend is greater than $5 billion. The bank serves almost 59 million, or one out of every two, U.S. households, with a distribution strength of about 6,000 branches, 18,000 ATMs and 24 million online banking customers, and more than 3,000 customer touches every second. Key themes discussed involved Cloud computing, Information Management, Security, Mobility and Green IT. And as I sat through the panel discussions and spoke to some of the IT leaders, it became evident that underpinning all the major business and IT initiatives for the global bank was a central theme: lots of data, and a need for better and faster insights.

A senior BofA IT executive stated “Broad BI and data mining remain objectives, not realized goals”. There was a high level of interest in analytics and a big drive to be information-driven across business units.

Clearly, for a large bank like BofA, the business drivers exist. For example, the consumer channels executive was interested in understanding consumer behavior across different channels. In a saturated marketplace for retail customers, and facing stiff competition from Chase (which now owns WaMu) and Wells Fargo (which now owns Wachovia), BofA is keenly interested in strengthening its bond with its existing customer base. With thousands of interactions per second, every interaction with the customer is an opportunity to learn more about customer behavior and customer preferences.

In the credit card division, early detection of fraud patterns can translate into big savings for a market that is undergoing dramatic transformation due to reforms mandated by Congress.

On the IT front, BofA has lots of existing investments in BI tools and data management software.

So where is the gap? Why are BI/data mining unrealized goals?

The answer lies in re-thinking and challenging the status quo in data management and analytic application development in today’s big data IT environments. Google, Amazon, and other innovators are leading this and it is only a matter of time before leaders in the financial services industry follow suit. A new mandate and architecture for big data applications is emerging.

This new class of analytic applications will require a strategic investment in infrastructure that brings advanced analytics processing right next to the terabytes to petabytes of enterprise data for key business initiatives, including:

  • Customer service effectiveness to predict customer requirements as well as fully understand customer relationships across branch office, ATM, online, and mobile channels
  • Ability to respond faster to regulators or to management, and to drive decisions based on insights derived from accurate, timely data

Broader, more pervasive BI and richer analytics are on the threshold of becoming a reality!

By Steve Wooledge in Blogroll on November 10, 2009

Aster Data 4.0 is here and for those of you who subscribe to Aster Data’s blog, “Winning with Data”, you may have noticed that we’ve changed things up a bit.  This blog is now called the “Big Data Blog” and will continue to be a mash-up of opinions and news from the team at Aster Data. Topics will continue to be a mix of technical deep-dives as well as company announcements and content.

At the same time, our CEO and co-founder Mayank will be sharing his thoughts on a separate blog called “Winning with Data” where he will talk about his perspectives of the market trends, customer use-cases, technology evolution and company growth. You can find all of his previous posts there, as well as fresh content starting with the announcement of Aster Data’s massively parallel data-application server.

By Steve Wooledge in Blogroll, nPath on October 15, 2009

We just wrapped up the first of a two-part series on Mastering MapReduce together with Curt Monash. We’ve spent a lot of time discussing MapReduce with Curt and wanted to help educate the community on exactly what it is and how it applies to data management and analysis. We’ve published the recorded webcast, and below are the slides we presented from an Aster Data perspective, which outline:

- What is Aster Data’s SQL-MapReduce?
- Example industry applications of SQL-MapReduce
- Walking through the SQL-MapReduce syntax

Mastering MapReduce: MapReduce for Big Data Management and Analysis

Curt has also posted his slides on DBMS2 with a great overview on dispelling the myths around MapReduce, and how MapReduce and SQL play nicely with each other.

We had a great turnout and great questions during the session. If you have any questions after reviewing the material, please drop a comment.

By Steve Wooledge in Blogroll on October 5, 2009

On Friday we had a series of events and announcements around our new Aster-Hadoop Data Connector, which utilizes key new SQL-MapReduce functions to provide ultra-fast, two-way data loading between HDFS (Hadoop Distributed File System) and Aster Data’s MPP data warehouse.

In addition to the Big Data Summit we held in New York City (which we’ll detail in a separate post), Colin White presented on a Webcast with Aster on the various use-cases for Hadoop within data warehouse environments. Colin does a great job summarizing what Hadoop is, how it’s different from an RDBMS, the different types of users for each, and how they co-exist nicely in customer environments.

Below are the slides to view if you weren’t able to attend the event, which will also be available for on-demand viewing soon in our resource library.

Making Sense of Hadoop - It's Fit with Data Warehousing Solutions