Archive for November, 2010

By Barton George in Analytics, Cloud Computing on November 19, 2010

Barton George is Cloud Computing and Scale-Out Evangelist for Dell.

Today at a press conference in San Francisco we announced the general availability of our Dell cloud solutions. One of the solutions we debuted was the Dell Cloud Solution for Data Analytics, a combination of our PowerEdge C servers with Aster Data’s nCluster, a massively parallel processing database with an integrated analytics engine.

Earlier this week I stopped by Aster Data's headquarters in San Carlos, CA, and met up with their EVP of Marketing, Sharmila Mulligan. I recorded this video, in which Sharmila discusses the Dell and Aster solution and the fantastic results a customer is seeing with it.

Some of the ground Sharmila covers:

  • What customer pain points and problems this solution addresses (hint: organizations trying to manage huge amounts of both structured and unstructured data)
  • How Aster's nCluster software is optimized for the Dell PowerEdge C2100, and how it provides very high-performance analytics as well as a cost-effective way to store very large data sets.
  • (2:21) InsightExpress, a leading provider of digital marketing research solutions, has deployed the Dell and Aster analytics solution and has seen great results:
    • Up and running within 6 weeks
    • Queries that took 7-9 minutes now run in 3 seconds

Pau for now…

By Mayank Bawa in Analytics, MapReduce on November 9, 2010

It’s ironic that, all of a sudden, Vertica is shifting its focus from being a column-only database to claiming to be an Analytic Platform.

If you’ve used an Analytic Platform, you know it takes more than bolting a layer of analytic functions on top of a database. Yet that is essentially the basis on which Vertica now claims to be a full-blown analytic platform, when in fact its analytics capability is rather thin. For instance, its first layer is a pair of window functions, CONDITIONAL_TRUE_EVENT (CTE) and CONDITIONAL_CHANGE_EVENT (CCE). The CCE window function is used, for example, to do sessionization, and Vertica has a blog post that positions sessionization as a major advanced analytic operation. In truth, Vertica’s sessionization is not analytics. It is a basic data preparation step that adds a session attribute to each clickstream event so that very simple session-level analytics can be performed.
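To see why sessionization is just data preparation, here is a minimal sketch in standard SQL window syntax, run through Python's sqlite3 (the table, column names, sample rows, and the 1800-second timeout are all invented for illustration): a new session starts whenever the gap since a user's previous event exceeds the timeout, and a running sum of those session starts yields the session attribute.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE clicks (user_id TEXT, ts INTEGER);  -- ts in seconds
INSERT INTO clicks VALUES
  ('u1', 0), ('u1', 10), ('u1', 2000), ('u1', 2005),
  ('u2', 5), ('u2', 4000);
""")

# Mark a session start when there is no previous event (LAG is NULL)
# or the gap since the previous event exceeds the timeout, then take a
# running SUM of those marks: LAG + CASE + SUM, plain standard SQL.
rows = con.execute("""
SELECT user_id, ts,
       SUM(new_session) OVER (PARTITION BY user_id ORDER BY ts) AS session_id
FROM (
  SELECT user_id, ts,
         CASE WHEN LAG(ts) OVER (PARTITION BY user_id ORDER BY ts) IS NULL
                OR ts - LAG(ts) OVER (PARTITION BY user_id ORDER BY ts) > 1800
              THEN 1 ELSE 0 END AS new_session
  FROM clicks
)
""").fetchall()

for user_id, ts, session_id in rows:
    print(user_id, ts, session_id)
```

Each clickstream event simply picks up a session attribute; any actual analytics happens afterward, on top of that prepared column.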

What’s interesting is that the CCE window function is simply a pre-built function – some might say just syntactic sugar – that combines the functionality of finite-width SQL window functions (LEAD/LAG) with CASE expressions (WHEN condition THEN result). Nothing groundbreaking, to say the least!

For example, the CTE query referred to in a Vertica blog post can be rewritten very simply using SQL-99:

SELECT symbol, bid, timestamp,
       SUM(CASE WHEN bid > 10.6 THEN 1 ELSE 0 END)
         OVER (PARTITION BY symbol ORDER BY timestamp) window_id
FROM tickstore;

Layering custom pre-built functions has long been the traditional way of adding functionality to a database; the SQL-99 and SQL-2003 analytic functions themselves follow this tradition.

The problem here is not just with Vertica but also with the giants of the market, Oracle and Microsoft for instance. In their approach the customer is at the mercy of the database vendor: pre-built analytic functions are hard-coded into each major release of the DBMS. There is no independence between the analytics layer and the DBMS – an independence that real, well-architected analytic platforms need to have. Simply put, if you want a different sessionization semantic, you’ll have to wait for Vertica to build a whole new function.

By Steve Wooledge in Analytics on November 8, 2010

One of the coolest parts of my job is seeing how companies use analytics to drive their business. The term “big data” has become something of a superstar in the world of analytics recently, but it’s also the complexity and richness of the insights in that data that make it a “big data” challenge for companies to tackle with traditional data management infrastructures. It’s not just size that matters – it’s analytical power. That is to say, what you DO with data.

And it’s not just in Silicon Valley or on Wall Street. October marked one year of the “Big Data Summit” road show, which we hosted across the US to offer high-level executives, data analytics practitioners, and analysts the opportunity to share best practices and exchange ideas about solving big data problems within their industries and organizations. It was a huge success, with an average of 80-100 people attending each summit in major cities including New York, Chicago, San Francisco, Dallas, and Washington DC. We are starting the tour again later this month, on November 18 in New York City, and are rebranding it the “Data Analytics Summit” based on feedback that it’s really about the application of data and analytics within a specific business area or industry.

Here are some examples. Attendees at the summits, who come from a variety of industries ranging from traditional retailers to bleeding-edge digital media companies, have been providing us with interesting data through surveys. We asked respondents questions like, “What are the biggest opportunities for benefiting from big data within the market?” Let me know if you think they missed any big opportunities. Here are a few of our findings:

- Data exploration to discover new market opportunities: Nearly 30% of respondents thought that analyzing big data to find “the next big thing” was a huge opportunity. This supports the notion that data scientists will be one of the sexiest jobs in the future.

- Behavioral targeting: 16% of those surveyed called out the importance of establishing links between purchasing behavior and areas like advertising spend to better tailor budgets and promotional campaigns.

- Social network analysis: 15% of those surveyed responded that using social network analysis to build a more complete profile of their customer base is a key business opportunity.

- Monetizing data: 15% of respondents say monetizing data is key for organizations seeking to unlock the hidden value within previously untapped assets.

- Fraud reduction and risk profiling: Distinguishing good customers from bad ones, for fraud reduction (10%) and risk profiling (10%), was identified as critical for financial institutions.

Another general observation from attendees is that using sampled or aggregated data is no longer a viable option for rich analytics; there is an urgent need to analyze all available data, both structured and unstructured.

What other areas do you see? Let us know what you think or if you have any questions on the statistics. I don’t claim to be an industry analyst, but it was fun to look at the breakdown of how various cities responded to the survey.