Archive for the ‘Business analytics’ Category

12 Jun

Back in 2005, when we first founded Aster Data, our vision was to take some of the latest technology innovations – including MPP shared-nothing architectures; Linux-based commodity hardware; and novel analytical interfaces like Google’s MapReduce – and bring them to mainstream enterprises. This vision translated into a strategy focused not only on big data innovations, but also on delivering technologies that make big data viable for enterprise environments. SQL-MapReduce®, our industry-leading patented technology that combines standard SQL processing with a native MapReduce execution environment, is one example of how we make big data enterprise ready.

Today we reached another major milestone in delivering value to our customers by announcing a major innovation: Aster SQL-H™, a seamless way to execute SQL & SQL-MapReduce on Apache™ Hadoop™ data.

This is a significant step forward from the state of the art until yesterday: a common DBMS-Hadoop connector operating at the physical layer. That approach meant that getting data from Hadoop to a database required a Hadoop expert in the middle to do the data cleansing and the data type translation. If the data was not 100% clean (which is the case in most circumstances), a developer was needed to get it into a consistent, proper form. Besides wasting the valuable time of that expert, this process meant that business analysts couldn't directly access and analyze data in Hadoop clusters. Other database connectors require duplicating the data into HDFS using proprietary formats, a cumbersome and expensive approach by any measure.

SQL-H, an industry first, solves all of these problems.

First, we have integrated Aster's metadata engine with Hadoop's emerging metadata standard, HCatalog. This means that data stored in Hadoop using Pig, Hive & HBase can be "seen" in an Aster system as if it were just another Aster view. The business implication is that a business analyst using standard SQL or a BI tool can have full and seamless access to Hadoop data through Aster's standard ODBC/JDBC connector and Aster's SQL engine. There is no need for a human in the middle to translate the data or ensure its consistency, and no need to file tickets or call up experts to get the data the business needs. Everything happens transparently, seamlessly, and instantly. This is an industry first: today's available Hadoop tools either do not provide well-optimized standard SQL interfaces, do not provide native BI compatibility, or require manual data translation and movement from Hadoop to a third-party system. None of these approaches are viable options for SQL & BI execution on Hadoop data, making it hard for enterprises to get value from Hadoop.
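To make that concrete, here is a minimal, hypothetical sketch. The weblogs table, its columns, and the query are illustrative assumptions, not actual SQL-H syntax; the point is that a Hive table surfaced through HCatalog could be queried like any other Aster view:

-- Hypothetical example: "weblogs" is a Hive table registered in HCatalog
-- and exposed through SQL-H. An analyst, or a BI tool generating SQL over
-- ODBC/JDBC, queries it like a regular Aster view.
SELECT userid, COUNT(*) AS pageviews
FROM weblogs                      -- data physically resides in Hadoop
WHERE pagetype = 'product'
GROUP BY userid
ORDER BY pageviews DESC;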

Second, SQL-H provides a high-performance, type-safe data connector that can take a SQL or SQL-MapReduce query involving Hadoop data, automatically select the minimum subset of data in Hadoop required to execute the query, and run the query on the Aster system. The performance of running SQL and SQL-MapReduce analytics in Aster is significantly higher than in Hadoop because (a) Aster can optimize data partitioning and distribution, reducing network transfers and overhead; (b) Aster's engine can keep statistics about the data and use them to optimize execution of both SQL & MapReduce; and (c) Aster's SQL queries are cost-based optimized, which means Aster can handle very complex SQL, including SQL produced by BI tools, very efficiently.
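As a hedged sketch of the connector's behavior, consider a query that joins a native Aster table with Hadoop-resident data; the table and column names are again illustrative. The idea is that only the Hadoop columns and rows the query actually needs would be transferred:

-- Hypothetical example: customers is a native Aster table; weblogs lives
-- in Hadoop. The SQL-H connector would fetch only the weblogs columns
-- referenced here (userid, ts) and only the rows passing the date filter,
-- not the whole table.
SELECT c.segment, COUNT(*) AS visits
FROM customers c
JOIN weblogs w ON c.userid = w.userid   -- Hadoop data via SQL-H
WHERE w.ts >= DATE '2012-05-01'
GROUP BY c.segment;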

In addition, one can take advantage of SQL-H to apply the 50+ pre-built SQL-MapReduce apps that Teradata Aster provides to Hadoop data, thus performing big data analytics that are impossible in any other database, without writing a single line of Java MapReduce code! These apps include functions for path & pattern analysis, statistics, graph, text analysis, and more.
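For instance, a pre-built path-analysis function could be pointed directly at Hadoop data. The snippet below is a schematic sketch loosely modeled on Aster's nPath function; the clause details, the weblogs table, and its columns are assumptions for illustration, not exact syntax:

-- Schematic sketch: find users whose clickstream (stored in Hadoop) went
-- home -> one or more searches -> checkout, using a pre-built path-analysis
-- function instead of hand-written Java MapReduce.
SELECT userid, path
FROM nPath(
    ON weblogs                        -- Hadoop data, accessed via SQL-H
    PARTITION BY userid
    ORDER BY ts
    MODE (NONOVERLAPPING)
    PATTERN ('Home.Search+.Checkout')
    SYMBOLS (pagetype = 'home'     AS Home,
             pagetype = 'search'   AS Search,
             pagetype = 'checkout' AS Checkout)
    RESULT (FIRST(userid OF Home) AS userid,
            ACCUMULATE(pagetype OF ANY(Home, Search, Checkout)) AS path)
);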

Teradata Aster is committed to groundbreaking product innovation as the key strategy in maintaining our #1 position in the big analytics market. SQL-H is another important step that we expect will make Hadoop and big data analytics much more palatable for enterprise environments, allowing business analysts, SQL power-users & BI tool users to analyze Hadoop data without having to learn about Hadoop interfaces and code.

If you want to find out more, we'll be talking about SQL-H at Hadoop Summit, on a webcast taking place June 21st, at the upcoming Big Analytics 2012 events in Chicago & New York, and at the annual Teradata Partners event.



13 Apr
By Mayank Bawa in Analytics, Business analytics, Teradata Aster on April 13, 2012
   

We live in interesting times!

In the past 30 years, data was used to record business events and report on them. Over the last 5 years, data has gotten closer to the business. Now data is being used to record business events, report on them, and influence them. We now realize that the more data we record, the more comprehensively data can influence business events.

Hence the excitement around "big data": it is an opportunity for each line of business to influence business events toward favorable outcomes.

The responsibility of technologists is to provide the right platforms and tools to make influencing business events easy and simple.

There are two relentless forces playing out in the big data space to which technology has to respond.

The first force is the diversity of data. As we record more data, we end up with more formats of data to manage. About 20% is relational, but we also have text, emails, PDFs, Twitter feeds, Facebook profiles, social graphs, CDRs, Apache logs, JSON formats, …

The second force is the richness of analytics. As we influence more of the business, we end up with richer analytics to perform. About 20% is SQL, but we also have time series analysis, statistical analysis, geo-spatial analysis, graph analysis, sentiment analysis, entity extraction, …

Note that I am not saying MapReduce lacks a diverse set of analytics: MapReduce is a way of programming to do analysis, and time series, statistical, and geo-spatial analyses each require different MapReduce programs to be written.

Today, the platforms and tools for big data are very complex. They expect line-of-business owners to write programs to manage different forms of big data, to write sophisticated programs to analyze big data, to master the management and administration of big clusters, and to be self-sustaining in managing data quality. This last point is very important: data values change over time. We have to keep values consistent, otherwise our analysis will be wrong and our influence on business will be negative; the garbage-in, garbage-out rule of computing.

As a result, big data is in danger of entering the DIY (do it yourself) space. A line of business is now expected to support big clusters = big administration = big programs = big friction = low influence.

We have to acknowledge these challenges as technologists. If we let big data solutions remain DIY solutions, only pockets of the enterprise will embrace big data; the non-technology-savvy business leaders will be left out of the opportunity.

We have to simplify this equation. We need to enable line of business owners to benefit from big data a lot more easily. We have to make it simpler for business leaders to get from big data to big analytics.

Our goal: big data = small clusters = easy administration = big analytics = big influence.

This entails solving the following problems:

[1] Make platforms and tools easier to use for managing and curating data. Otherwise, garbage in = garbage out, and you will get garbage analytics.

[2] Provide rich analytics functions out of the box. Each line of programming a user must write cuts your reachable audience by 50%.

[3] Provide tools to update or delete data. Otherwise, data consistency will drift away from truth as history accumulates.

[4] Provide applications to leverage data and find answers relevant to the business. Otherwise the cost of DIY applications is too high to influence business, and the work won't be done.

At Teradata Aster, we are continuing to lead the big data revolution. We have led the revolution for the past 5 years and helped shape the market and technologies. We are convinced that the path to big data success in the coming 5 years is to connect big data with Big Analytics.



21 Mar

The conversation around "big data" has been evolving beyond a technology discussion to focus on analytics and applications for the business. As such, we've worked with our partners and customers to expand the scope of the Big Data Summit events we initiated back in 2009 and have created Big Analytics 2012, a new series of roadshow events kicking off in San Francisco on April 19, 2012.

According to previous attendees and market surveys, the greatest big data application opportunities in businesses are:

- Digital marketing applications such as multi-channel analytics and testing to better understand and engage your customers

- Using data science and analytics to explore and develop new markets or data-driven services

Companies like LinkedIn, Edmodo, eBay, and others have effectively applied data science and analytics to take advantage of the new economics of data. And they are ready to share details of what they have learned along the way.

Big Analytics 2012 is a half-day event, is absolutely free to attend, and will include insight from industry insiders in two tracks: Digital Marketing Optimization, and Data Science and Analytics. It is a great way to meet and hear from your peers: executives who want to learn how to turn advanced analytics into competitive advantage; interactive marketing innovators who want access to "game-changing" insights for digital marketing optimization; enterprise architects and business intelligence professionals looking to provide big data infrastructure; and data scientists and business analysts who are responsible for developing new data-driven products or business insights.

Come learn from the panel of experts and stay for an evening networking reception that will put you in touch with big data and analytics professionals from throughout the industry. Big Analytics 2012 will be coming soon to a city near you. Click here to learn more about the event and to register now.

 



19 Mar
By Tasso Argyros in Analytics, Business analytics, Interactive marketing, Teradata Aster on March 19, 2012
   

Tomorrow, I will have the pleasure of presenting "Radical Loyalty - Data Science Applied to Marketing" at the GigaOm Structure:Data event with Marc Parrish, VP of Membership and Customer Retention Marketing at Barnes & Noble. In contrast to most talks at this event, Marc and I will focus on the business opportunities of Big Data, specifically on marketing loyalty programs and how they relate to Big Data analytics.

The concept of a loyalty program is certainly nothing new; brick-and-mortar companies have been leveraging customer loyalty in a variety of ways for decades. What's different is the ability of businesses to use new types of data to take their customer loyalty insights and strategies to a completely new level. At tomorrow's conference, we will explore how modern retailers with a strong digital marketing strategy, like Barnes & Noble, use Big Data to leverage their customers' loyalty, and how to make loyalty programs worthwhile for customers.

Barnes & Noble has proven its ability to innovate its business model by leveraging data. I look forward to sharing some insights with Marc on retail and other real-world applications of Big Data.



25 Aug
By Mayank Bawa in Analytics, Blogroll, Business analytics, MapReduce on August 25, 2008
   

I’m unbelievably excited about our new In-Database MapReduce feature!

Google has used MapReduce and GFS for PageRank analysis, but the sky is really the limit for anyone building powerful analytic apps. Curt Monash has posted an excellent compendium of applications that are successfully leveraging the MapReduce paradigm today.

A few examples of SQL/MapReduce functions that we’ve collaborated with our customers on so far:

1. Path Sequencing: SQL/MR functions can be used to develop regular-expression matching of complex path sequences (e.g., time series financial analysis or clickstream behavioral recommendations). This can also be extended to discover Golden Paths that reveal interesting behavioral patterns useful for segmentation, issue resolution, and risk optimization.

2. Graph Analysis: many interesting graph problems depend on graph traversal, such as BFS (breadth-first search), SSSP (single-source shortest path), APSP (all-pairs shortest paths), and PageRank.

3. Machine Learning: statistical algorithms like linear regression, clustering, collaborative filtering, Naive Bayes, support vector machines, and neural networks can be used to solve hard problems like pattern recognition, recommendations/market basket analysis, and classification/segmentation.

4. Data Transformations and Preparation: large-scale transformations can be parameterized as SQL/MR functions for data cleansing and standardization, unleashing the true potential of Extract-Load-Transform pipelines and making large-scale data model normalization feasible. Push-down also enables rapid discovery and data pre-processing to create analytical data sets for advanced analytics tools such as SAS and SPSS.

These are just a few simple examples Aster has developed for our customers and partners via Aster’s In-Database MapReduce to help them with rich analysis and transformations of large data.

I'd like to finish with a code snippet of a simple yet powerful SQL/MR function we've developed called "Sessionization".

Our Internet customers have conveyed that defining a user session can't be done easily (if at all) using standard SQL. One possibility is to use cookies, but users frequently remove them, or they expire.

[Image: Aster In-Database MapReduce]

Aster developed a simple "Sessionization" SQL/MR function via our standard Java API library to easily parameterize the discovery of a user session. A session is defined by a timeout value (e.g., in seconds): if the elapsed time between consecutive click events is greater than the timeout, a new session has begun for that user.

From a user perspective, the input is user clicks (e.g., timestamp, userid). The output associates each click with a unique session identifier based on the Java procedure noted above. Here's the simple syntax:

-- Assign each click to a session: a new session begins whenever the gap
-- between a user's consecutive clicks exceeds the timeout (600 seconds).
SELECT timestamp, userid, sessionid
FROM sessionize("timestamp", 600) ON clickstream  -- (sequence column, timeout)
SEQUENCE BY timestamp   -- order each user's clicks by time
PARTITION BY userid;    -- one session sequence per user

Indeed, it is that simple.

So simple that we have reduced a complex multi-hour Extract-Load-Transform task into a toy example. That is the power of In-Database MapReduce!



19 Aug

Is anyone out there attending the TDWI World Conference in San Diego this week? If so, and you would like to meet up with me, please drop me a line or comment below, as I will be in attendance. I'm of course very excited to be making the trip to sunny San Diego and hope to catch a glimpse of Ron Burgundy and the Channel 4 news team! :-)

But of course it's not all fun and games, as I'll participate in one of TDWI's famous Tool Talk evening sessions discussing data warehouse appliances. This should make for some great dialogue between me and the other database appliance players, especially given the recent attention our industry has seen. I think Aster has a really different approach to analyzing big data, and I look forward to discussing exactly why.

For those interested in the talk, here are the details... come on by and let's chat!
What: TDWI Tool Talk session on data warehouse appliances
When: Wednesday, August 20, 2008 @ 6:00 p.m.
Where: Manchester Grand Hyatt, San Diego, CA



17 Aug
More Freedom, More Data
By Tasso Argyros in Analytics, Blogroll, Business analytics, Statements on August 17, 2008
   

When Polo lets you use your mobile phone to buy a pair of pants, you know there’s something interesting going on.

The trend is inevitable: purchasing becomes easier and more frictionless. You could buy something at the store or from your home. But now you can buy stuff while you jog in the park, while you bike (it's not illegal yet), or even while you're reading a distressing email on your iPhone (shopping therapy at its best).

As purchasing gets easier and more pervasive, we'll tend to buy things in smaller quantities and more often, which means more consumer behavior data will be available for analysis by advertisers and retailers to better target promotions to the right people at the right time.

In this new age, where buyers' interactions with shops and brands are much more frequent and intimate, enterprises that use their data to understand their customers will have a huge advantage over their competition. That's one of the reasons why, at Aster, we're so excited to be building the tools for tomorrow's winners.



05 Aug
By Mayank Bawa in Analytics, Blogroll, Business analytics, Business intelligence, Database on August 5, 2008
   

Today we are pleased to welcome Pentaho as a partner of Aster Data Systems. This means our customers can now use Pentaho's open-source BI products for reporting and analysis on top of Aster nCluster.

We have been working with Pentaho for some time on testing the integration between their BI products and our analytic database. We've been impressed with Pentaho's technical team and the capabilities of the product they've built together with the open-source community. Pentaho recently announced a new iPhone application, which is darn cool!

I guess, by induction, Aster results can be seen on the iPhone too. :-)



25 Jul

Stuart announced yesterday that Microsoft has agreed to acquire DATAllegro. It is pretty clear Stuart and his team have worked hard for this day: it is heartening to see that hard work gets rewarded sooner or later. Congratulations, DATAllegro!

Microsoft is clearly acquiring DATAllegro for its technology. Indeed, Stuart says that DATAllegro will start porting away from Ingres to SQL Server once the acquisition completes. Microsoft’s plan is to provide a separate offering from its traditional SQL Server Clustering.

In effect, this event provides a second admission from a traditional database vendor that OLTP databases are not up to the task of large-scale analytics. The first admission came in the 1990s, when Sybase (ironically, the originator of the SQL Server code base) offered Sybase IQ as a separate product from its OLTP offering.

The market already knew this fact: the key point here is that Microsoft is waking up to the realization.

A corollary is that it must have been really difficult for Microsoft's SQL Server division to scale SQL Server for larger deployments. Microsoft is an engineering shop, and the effort of integrating alien technology into the SQL Server code base must have been carefully evaluated in a build-vs.-buy decision. The buy decision is a tacit admission that it is incredibly hard to scale an offering with its roots in a traditional OLTP database.

We can expect Oracle, IBM, and HP to have similar problems scaling their 1980s code bases for the data scales and query workloads of today's data warehousing systems. Will the market wait for Oracle, IBM, and HP's scaling efforts to come to fruition? Or will they, too, soon acquire companies to improve their own scalability?

It is interesting to note that DATAllegro will be moving to an all-Microsoft platform. The acquisition could also be read as a defensive move by Microsoft. All of the large-scale data warehouse offerings today are based on Unix variants (Unix/Linux/Solaris), leading to an uncomfortable situation at all-Microsoft shops that chose to run Unix-based data warehouse offerings because SQL Server would not scale. Microsoft needed an offering that could keep its enterprise-wide customers on Microsoft platforms.

Finally, there is a difference in philosophy between Microsoft's and DATAllegro's product offerings. Microsoft SQL Server has sought to cater to the lower end of the BI spectrum; DATAllegro has actively courted the higher end. Correspondingly, DATAllegro uses powerful servers, fast storage, and expensive interconnects to deliver a solution, while Microsoft SQL Server has sought to deliver a solution at a much lower cost. We can only wait and watch: will the algorithms of one philosophy work well in the infrastructure of the other?

At Aster Data Systems, we believe that the market dynamics will not change as a result of this acquisition: companies will want the best solutions to derive the most value from data. In the last decade, the Internet changed the world, and old-market behemoths could not translate their might into the new market. In this decade, data will produce a similar disruption.



14 Jul
By George Candea in Analytics, Business analytics on July 14, 2008
   

In a recent interview with Wired magazine, IBM's Wattenberg mentioned an interesting yardstick for data analytics: compare the data you give to a human with the sum total of the words that human will hear in a lifetime, which is less than 1 TB of text. Incidentally, this 1 TB number is how big Gordon Bell thinks a lifetime of recording daily minutiae would be. Bell now has MyLifeBits, the most extensive personal archive, in which he records all his e-mails, photographs, phone calls, Web pages visited, IM conversations, desktop activity (like which apps he ran and when), health records, books in his library, labels of the bottles of wine he enjoyed, etc. His collection grows at about 1 GB per month, which over an 80-year lifetime amounts to roughly 1 TB; and that's what the human brain is built for.

Wattenberg offers an interesting perspective: human language is a form of compression ("Twelve words from Voltaire can hold a lifetime of experience") because of the strong contextual information carried by each phrase. MyLifeBits does not capture life experiences; it provides the bits from which those experiences are built, through connections and interpretations.

Herein lies the challenge of data analytics: how to "compress" vast amounts of data into a small volume of information that the human brain can absorb, process, and act upon; how to leverage context in delivering answers, recommendations, and insights. The Web brought data out into the open, search engines allowed us to ask questions of that data, and analytics engines are now starting to allow precise and deep questions to be asked of otherwise overwhelming amounts of data. We, as an industry, are just entering the Neolithic of information history.

We need breakthroughs in visualization and, in particular, in the way we leverage the context of previous answers. Researchers at University College London are studying how the hippocampus encodes spatial and episodic memories; they are going as far as analyzing fMRI (functional MRI) scans of the brain to extract the memories stored in it. In computerized data analytics, we face a comparatively simpler task: record all past answers and then leverage that context to communicate new results more effectively; understand how the current answer relates to the previous one, and deliver an interpretation of the delta. That's where we would like to be, sooner rather than later.