Archive for July, 2008

On DATAllegro’s Acquisition by Microsoft

Stuart announced yesterday that Microsoft has agreed to acquire DATAllegro. It is pretty clear that Stuart and his team have worked hard for this day, and it is heartening to see that hard work gets rewarded sooner or later. Congratulations, DATAllegro!

Microsoft is clearly acquiring DATAllegro for its technology. Indeed, Stuart says that DATAllegro will start porting away from Ingres to SQL Server once the acquisition completes. Microsoft’s plan is to provide a separate offering from its traditional SQL Server Clustering.

In effect, this event provides a second admission from a traditional database vendor that OLTP databases are not up to the task of large-scale analytics. The first admission came in the 1990s, when Sybase (ironically, the originator of the SQL Server code base) offered Sybase IQ as a separate product from its OLTP offering.

The market already knew this fact: the key point here is that Microsoft is waking up to the realization.

A corollary is that it must have been really difficult for the Microsoft SQL Server division to scale SQL Server for larger deployments. Microsoft is an engineering shop, and the effort of integrating alien technology into the SQL Server code base must have been carefully weighed in a build-vs-buy decision. The buy decision is a tacit admission that it is incredibly hard to scale an offering whose roots lie in a traditional OLTP database.

We can expect Oracle, IBM, and HP to face similar problems in scaling their 1980s code bases to the data scales and query workloads of today's data warehousing systems. Will the market wait for those scaling efforts to come to fruition? Or will Oracle, IBM, and HP soon acquire companies to improve their own scalability?

It is interesting to note that DATAllegro will be moving to an all-Microsoft platform. The acquisition could also be read as a defensive move by Microsoft. All of the large-scale data warehouse offerings today run on Unix variants (Unix/Linux/Solaris), which has led to an uncomfortable situation at some all-Microsoft shops that chose Unix-based data warehouse offerings because SQL Server would not scale. Microsoft needed an offering that could keep its enterprise-wide customers on Microsoft platforms.

Finally, there is a difference in philosophy between Microsoft's and DATAllegro's product offerings. Microsoft SQL Server has sought to cater to the lower end of the BI spectrum; DATAllegro has actively courted the higher end. Correspondingly, DATAllegro uses powerful servers, fast storage, and an expensive interconnect to deliver its solution, whereas Microsoft SQL Server has sought to deliver a solution at a much lower cost. We can only wait and watch: will the algorithms of one philosophy work well in the infrastructure of the other?

At Aster Data Systems, we believe that the market dynamics will not change as a result of this acquisition: companies will want the best solutions to derive the most value from data. In the last decade, the Internet changed the world, and old-market behemoths could not translate their might into the new market. In this decade, Data will produce a similar disruption.

A Belief in Partnerships
By Mayank Bawa in Blogroll, Statements on July 24, 2008

We decided early on in building the company that we'd make our platform open in technology and keep an inclusive philosophy in business.

I am glad to say that this year we have started delivering on our business philosophy.

We have good relationships with several smart consulting teams and are actively working with them to bring innovative solutions to market for our joint customers. We recently recommended a partner to a company where we were not a good fit, because we felt that our partner could bring a lot of value to the prospect and that such introductions strengthen our extended network. We were genuinely surprised at the warmth this generated for us at both the company and the partner!

In the last few years, we’ve actively built our product to work on a variety of hardware platforms: we have customers running IBM, HP, Dell, and even white-box offerings! Earlier this week, we announced our partnership with Informatica. You will see a series of announcements appearing in the next few months.

We are actively looking for a person who can lead our efforts in establishing meaningful partnerships in the data warehousing space. If you know one, or are one, who shares an inclusive philosophy, drop us a note!

How to Answer Analytic Questions
By George Candea in Analytics, Business analytics on July 14, 2008

In a recent interview with Wired magazine, IBM's Wattenberg mentioned an interesting yardstick for data analytics: compare the data you give to a human with the sum total of the words that human will hear in a lifetime, which is less than 1 TB of text. Incidentally, this 1 TB figure is also how big Gordon Bell thinks a lifetime of recording daily minutiae would be. Bell now maintains MyLifeBits, the most extensive personal archive, in which he records all his e-mails, photographs, phone calls, Web pages visited, IM conversations, desktop activity (such as which apps he ran and when), health records, books in his library, labels of the bottles of wine he enjoyed, and so on. His collection grows at about 1 GB per month, amounting to ~1 TB for a lifetime; and that is what the human brain is built for.
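The ~1 TB lifetime figure is easy to sanity-check. A quick back-of-the-envelope calculation, assuming Bell's reported ~1 GB/month growth rate and a nominal 80-year lifespan (the lifespan is our assumption, not from the interview):

```python
# Back-of-the-envelope check of the ~1 TB lifetime archive figure.
# Assumptions (ours, not from the original post): 80-year lifespan,
# constant growth of 1 GB per month, binary (1024 GB) terabytes.
GB_PER_MONTH = 1
MONTHS_PER_YEAR = 12
LIFESPAN_YEARS = 80

lifetime_gb = GB_PER_MONTH * MONTHS_PER_YEAR * LIFESPAN_YEARS
lifetime_tb = lifetime_gb / 1024  # convert GB to binary TB

print(f"{lifetime_gb} GB ≈ {lifetime_tb:.2f} TB")  # prints "960 GB ≈ 0.94 TB"
```

So the archive lands just under a terabyte, consistent with the figure both Wattenberg and Bell cite.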

Wattenberg offers an interesting perspective: human language is a form of compression (“Twelve words from Voltaire can hold a lifetime of experience”), thanks to the strong contextual information carried by each phrase. MyLifeBits does not itself capture life experiences; it provides the bits from which those experiences are built, through connections and interpretations.

Herein lies the challenge of data analytics: how to “compress” vast amounts of data into a small volume of information that the human brain can absorb, process, and act upon. How to leverage context in delivering answers, recommendations, and insights. The Web brought data out in the open, search engines allowed us to ask questions of this data, analytics engines are now starting to allow precise and deep questions to be asked of otherwise overwhelming amounts of data. We, as an industry, are just entering the Neolithic of information history.

We need breakthroughs in visualization and, in particular, in the way we leverage the context of previous answers. Researchers at University College London are looking into how the hippocampus encodes spatial and episodic memories; they are going as far as analyzing fMRI (functional MRI) scans of the brain to extract the memories stored in that brain. In computerized data analytics, we face a comparatively simple task: record all past answers and then leverage that context to communicate new results more effectively. Understand how the current answer relates to the previous one, and deliver an interpretation of the delta. That is where we would like to be, sooner rather than later.
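The record-then-interpret idea above can be sketched in a few lines. This is a minimal illustration under our own assumptions (the class and method names are hypothetical, not from any product): keep the previous answer to each recurring analytic question and report the change, rather than restating the full result every time.

```python
# Hypothetical sketch: remember past answers per question and
# interpret the delta between consecutive answers.
from typing import Dict, Optional


class AnswerContext:
    """Remembers the last answer to each question and explains changes."""

    def __init__(self) -> None:
        self._history: Dict[str, float] = {}

    def report(self, question: str, answer: float) -> str:
        previous: Optional[float] = self._history.get(question)
        self._history[question] = answer
        if previous is None:
            return f"{question}: {answer} (first measurement)"
        delta = answer - previous
        direction = "up" if delta > 0 else "down" if delta < 0 else "unchanged"
        return f"{question}: {answer} ({direction} {abs(delta)} vs. last answer)"


ctx = AnswerContext()
print(ctx.report("weekly active users", 1200))  # first measurement
print(ctx.report("weekly active users", 1350))  # reported as a delta, up 150
```

A real system would of course need richer context (trends, seasonality, relationships between questions), but even this simple memory turns a repeated raw number into an interpretation.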