Archive for the ‘Analytics’ Category

 
26
Apr
Posted by Mayank in Analytics on April 26, 2010

I’ve remarked in an earlier post that the usage of data is changing and new applications are on the horizon. In the last couple of years, we’ve observed interesting design patterns for business processes that use data.

In a previous post, I outlined a design pattern that we call “The Automated Feedback Loop.” In this post, I want to outline a design pattern that we call “The (Iterative) Analytics Data Warehouse”.

The traditional well-understood design pattern of a data warehouse is a central (for Enterprise Data Warehouse) or departmental (for Data Marts) repository of data. Data is fed into the warehouse from ETL processes that pull data from a variety of sources. The data is organized in a data model that caters to 3 use-cases of the warehouse:

  1. Reports - A set of BI queries are run with regular frequency to monitor the state of the business. The target of the reports are business users who want to understand what happened. The goal is to keep them in touch with the pulse of the business.
  2. Exports - A set of export jobs are run with adhoc frequency to provide data sets for further analysis. The target of the exports are business analysts who want to optimize business practices. The goal is to provide them with true, quality-stamped data so that they can make confident optimization recommendations.
  3. Adhoc - A set of queries are run with adhoc frequency to detect or verify patterns that influence business events. The source of the queries are data scientists who want to understand and optimize business practices. The goal is to provide them with computation capabilities (good query interfaces, enough processing, memory and storage resources) to allow them to interact with the data.

The exports and adhoc tasks are transient tasks. Once the data analysts or data scientists find a pattern valuable to the business, that pattern is incorporated into a report so that business users can monitor that pattern on a frequent repeatable practice.

In a typical data warehouse, the bulk of tasks (~80%) are from [1] Reports. The remainder of 20% is from [2] Exports and [3] Adhoc.

Since Reports are frequent and generate known queries, the design of the data warehouse is done to cater to reporting. This includes data models, indexes, materialized views or derived tables – and other optimizations – to make the known Reporting queries go fast. Read the rest of this entry »



 
15
Apr
Posted by Mayank in Analytic applications, Analytics, Cloud Computing, MapReduce on April 15, 2010

In the last few years there has been a significant amount of market pickup, from users and vendors, on data clouds and advanced analytics – specifically a new class of data-driven applications run in a data cloud or on-premise. What’s different about this from past approaches is the frequency and speed at which these applications are accessed, the depth of the analysis, the number of data sources involved and the volume of data mined by these applications – terabytes to petabytes. In the midst of this cacophony of dialogue, recent announcements from vendors in this space are helping to clarify different visions and approaches to the big data challenge.

Both Aster Data and Greenplum made announcements this week that illustrated different approaches. At the same time that Aster Data announced the Aster Analytics Center, Greenplum announced an upcoming product named Chorus. I wanted to take a moment to compare and contrast what these announcements say about the direction of the two companies.

Greenplum’s approach speaks to two traditional problem areas i) access to data, from provisioning of data marts to connectivity to data across marts, and ii) some level of collaboration among certain developers and analysts. Their approach is to create a tool for provisioning, unified data access, and sharing of annotations and data among different developers and analysts. Interestingly, this is not an entirely new concept; these are well-known problems for which a number of companies and tools have already developed best-of-breed solutions over the last 15 years. For example, the capabilities for data access are another version of Export/Copy primitives that already exist in all databases and that have been built upon by common ETL and EII tools for cases in which richer support than Export & Copy are needed – for instance, when data has to be transformed, correlated or cleaned while being moved from one context (mart) to another (mart).

This approach is indicative of a product direction in which the primary focus is on adding another option to the list of tools available to customers to address these problems. It’s really not a ground-breaking innovation that evolves the world of analytics. New types of analytics, or ‘data-driven applications,’ is where the enormous opportunity lies. The Greenplum approach of data collaboration is interesting in a test environment or sandbox. When it comes to real production value however, it effectively increases the functions available to the end user, but at a big cost due to significant increases in complexity, security issues and extra administrative overhead. What does this mean exactly?

  • The spin-up of marts and moving data around can result in “data sprawl” which ultimately increases administrative overhead and is dangerous in these days of compliance and sensitivity to privacy and data leaks.
  • Adding a new toolset into the data processing stack creates difficult and painful work to either manage and administer multiple tool sets for similar purposes or to eliminate and transition away from investments in existing toolsets.
  • To enable effective communication and sharing, users need strong processes and features for source identification of data, data collection, data transformation, rule administration, error detection & correction, data governance and security. The quality and security policies around meta-data are especially important as free-form annotations can lead to propagation of errors or leaks in the absence of strong oversight.

In contrast, Aster Data’s recent announcements support our long-standing investments in our unique advanced in-database architecture where applications run fully inside Aster Data’s platform with complete application services essential for complex analytic applications. The announcements highlight that our vision is not to create a new set of tools and layers in the data stack that recreate capabilities currently available from a number of leading vendors, but rather to deliver a new Analytics Platform, a Data-Application Server, to uniquely enable analytics professionals to create data-rich applications that were impossible or impractical before – namely, to create and use advanced analytics for rich, rapid, and scalable insights into their data. This focus is complemented by our partners, who offer proven best-of-breed solutions for collaboration and data transformation.

Read the rest of this entry »



 
06
Nov
Posted by Mayank in Analytics, Frontline data warehouse, Interactive marketing on November 6, 2008

I was at Defrag 2008 yesterday and it was a wonderful, refreshing experience. A diverse group of Web 2.0 veterans and newcomers came together to accelerate the “Aha!” moment in today’s online world. The conference was very well organized and there were interesting conversations on and off the stage.

The key observation was that individuals, groups and organizations are struggling to discover, assemble, organize, act on, and gather feedback from data. Data itself is growing and fragmenting at an exponential pace. We as individuals feel overwhelmed by the slew of data (messages, emails, news, posts) in the microcosm, and we as organizations feel overwhelmed in the macrocosm.

The very real danger is that an individual or organization’s feeling of being constantly overwhelmed could result in the reduction of their “Aha!” moments – our resources will be so focused on merely keeping pace with new information that we won’t have the time or energy to connect the dots.

The goal then is to find tools and best practices to enable the “Aha!” moments – to connect the dots even as information piles up on our fingertips.

Read the rest of this entry »



 
06
Oct
Posted by Mayank in Analytics, Business Intelligence, Interactive marketing on October 6, 2008

It is really remarkable how many companies today view data analytics as the cornerstone of their businesses.acerno logoaCerno is an advertising network that uses powerful analytics to predict which advertisements to deliver to which person at what time. Their analytics are performed on completely anonymous consumer shopping data of 140M users obtained from an association of 450+ product manufacturers and multi-channel retailers. There is a strong appetite at aCerno to perform analytics that they have not done before because each 1% uplift in the click-through rates is a significant revenue stream for them and their customers.

Aggregate KnowledgeAggregate Knowledge powers a discovery network (The Pique Discovery™ Network) that delivers recommendations of products and content based on what was previously purchased and viewed by an individual using the collective behavior of the crowds that had behaved similarly in the past. Again, each 1% increase of engagement is a significant revenue stream for them and their customers.ShareThis logo

ShareThis provides a sharing network via a widget that makes it simple for people to share things they find online with their friends. In a short period of time since their launch, ShareThis has reached over 150M unique monthly users. The amazing insight is that ShareThis knows which content users actually engage with, and want to tell their friends about! And in its sheer genius, ShareThis gives away its service to publishers and consumers free; relying on delivering targeted advertising for its revenue: by delivering relevant ad messages while knowing the characteristics of that thing being shared. Again, the better their analytics, the better their revenue.

Which brings me to my point: data analytics is a direct contributor of revenue gains in these companies. Read the rest of this entry »



 
15
May
Posted by Mayank in Analytics, Statement on May 15, 2008

Have you ever discovered a wonderful little restaurant off the beaten path? You know the kind of place. It’s not part of some corporate conglomerate. They don’t advertise. The food is fresh and the service is perfect – it feels like your own private oasis. Keeping it to yourself would just be wrong (even if you selfishly don’t want the place to get too crowded).

We’re happy to see a similar anticipation and word-of-mouth about some new ideas Aster is bringing to the data analytics market. Seems that good news is just too hard to keep to yourself.

We’re serving up something unique that we’ve been preparing for several years now. We’re just as excited to be bringing you this fresh approach.