13
Apr
By Mayank Bawa in Analytics, Business analytics, Teradata Aster on April 13, 2012
   

We live in interesting times!

In the past 30 years, data was used to record business events and report on business events. Over the last 5 years, data has gotten closer to business. Now data is being used to record business events, report on business events as well as influence business events. We now realize that the more data we record, the more comprehensively data can influence business events.

Hence the excitement of “big data” – it is a business opportunity for each line of business – to influence business events to have favorable outcomes.

The responsibility for technologists is to provide the right platforms and tools to make influencing business easy and simple.

There are TWO relentless forces that are playing out in the big data space to which technology has to respond.

The first force is the diversity of data. As we record more data, we end up having different formats of data to manage. About 20% is relational, but we also have text, emails, PDF, Twitter feeds, Facebook profiles, social graphs, CDRs, Apache logs, JSON formats, …

The second force is the richness of analytics. As we influence more business, we end up having richer analytics to perform. About 20% is SQL, but we also have time series analysis, statistical analysis, geo-spatial analysis, graph analysis, sentiment analysis, entity extraction, …

Note that MapReduce does not escape this diversity of analytics: MapReduce is a way of programming an analysis, and time series, statistical, and geo-spatial analyses each require a different MapReduce program to be written.
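To illustrate that MapReduce is a programming style rather than a kind of analysis, here is a minimal single-process sketch in Python. The `map_reduce` helper and the sample sales records are hypothetical, invented for this illustration; a real cluster framework would distribute the same phases across machines. Note that each analysis still needs its own mapper and reducer.

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Run a map phase, group values by key, then run a reduce phase."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}

# Hypothetical sample data: (day, store, sales amount).
sales = [
    ("2012-04-01", "north", 100.0),
    ("2012-04-01", "south", 50.0),
    ("2012-04-02", "north", 70.0),
]

# A simple time series analysis: total sales per day.
def by_day(record):
    day, _store, amount = record
    yield day, amount

# A simple statistical analysis: mean sale per store.
def by_store(record):
    _day, store, amount = record
    yield store, amount

daily_totals = map_reduce(sales, by_day, lambda _key, values: sum(values))
store_means = map_reduce(sales, by_store, lambda _key, values: sum(values) / len(values))
```

The framework (`map_reduce`) is shared, but the analytic logic is not: every new kind of analysis means writing another program.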

Today, the platforms and tools for big data are very complex. They expect line of business owners to write programs to manage different forms of big data, to write sophisticated programs to analyze big data, to master the management and administration of big clusters, and to be self-sustaining in managing data quality. This last point is very important – data values change over time. We have to keep values consistent; otherwise our analysis will be wrong and our influence on business will be negative – the garbage in, garbage out rule of computing.

As a result, big data is in danger of entering the DIY (do it yourself) space. A line of business is now expected to support big clusters = big administration = big programs = big friction = low influence.

We have to acknowledge these challenges as technologists. If we let big data solutions be a DIY solution, only pockets of enterprise will embrace big data – the rest of the non-technology savvy business leaders will be left out of the opportunity.

We have to simplify this equation. We need to enable line of business owners to benefit from big data a lot more easily. We have to make it simpler for business leaders to get from big data to big analytics.

Our goal: big data = small clusters = easy administration = big analytics = big influence.

This entails solving the following problems:

[1] Make platforms and tools easier to use for managing and curating data. Otherwise, garbage in = garbage out, and you will get garbage analytics.

[2] Provide rich analytics functions out of the box. Each line of programming cuts your reachable audience by 50%.

[3] Provide tools to update or delete data. Otherwise, data consistency will drift away from truth as history accumulates.

[4] Provide applications to leverage data and find answers relevant to business. Otherwise the cost of DIY applications is too high to influence business – and won’t be done.

At Teradata Aster, we are continuing to lead the big data revolution. We have led the revolution for the past 5 years, and helped shape the market and technologies. We are convinced that the path to big data success is to connect it with Big Analytics in the coming 5 years.



21
Mar
   

The conversation around “big data” has been evolving beyond a technology discussion to focus on analytics and applications to the business. As such, we’ve worked with our partners and customers to expand the scope of the Big Data Summit events we initiated back in 2009 and have created Big Analytics 2012 – a new series of roadshow events kicking off in San Francisco on April 19, 2012.

According to previous attendees and market surveys, the greatest big data application opportunities in businesses are:

- Digital marketing applications such as multi-channel analytics and testing to better understand and engage your customers

- Using data science and analytics to explore and develop new markets or data-driven services

Companies like LinkedIn, Edmodo, eBay, and others have effectively applied data science and analytics to take advantage of the new economics of data. And they are ready to share details of what they have learned along the way.

Big Analytics 2012 is a half-day event that is absolutely free to attend, and it will include insight from industry insiders in two different tracks: Digital Marketing Optimization, and Data Science and Analytics. Big Analytics 2012 is a great way to meet and hear from your peers, such as: executives who want to learn more about leveraging advanced analytics for a competitive advantage; interactive marketing innovators who want access to “game changing” insights for digital marketing optimization; enterprise architects and business intelligence professionals looking to provide big data infrastructure; and data scientists and business analysts who are responsible for developing new data-driven products or business insights.

Come to learn from the panel of experts and stay for an evening networking reception that will put you in touch with big data and analytics professionals from throughout the industry. Big Analytics 2012 will be coming soon to a city near you. Click here to learn more about the event and to register now.

 



19
Mar
By Tasso Argyros in Analytics, Business analytics, Interactive marketing, Teradata Aster on March 19, 2012
   

Tomorrow, I will have the pleasure of presenting “Radical Loyalty – Data Science Applied to Marketing” at the GigaOm Structure:Data event with Marc Parrish, the VP of Membership and Customer Retention Marketing at Barnes & Noble. In contrast with most talks at this event, Marc and I will be focusing on the business opportunities of Big Data and specifically on marketing loyalty programs and how they relate to Big Data analytics.

The concept of a loyalty program is certainly nothing new. Brick and mortar companies have been leveraging customer loyalty in a variety of unique ways for decades. What’s different is the ability of businesses to use new types of data to take their customer loyalty insights and strategies to a completely new level. At tomorrow’s conference, we will explore ways in which modern retailers like Barnes & Noble with a strong digital marketing strategy leverage their customers’ loyalty using Big Data and how to make loyalty programs worthwhile for customers and their needs.

Barnes & Noble has proven an ability to innovate their business model by leveraging data. I look forward to sharing some insight with Marc on retail and other real world applications of Big Data.



15
Mar
By Steve Wooledge in Analytics, MapReduce, Teradata Aster on March 15, 2012
   

Yesterday I presented at the Los Angeles Teradata User Group on the topic of “Data Science: Finding Patterns in Your Data More Quickly & Easily with MapReduce”. One point discussed was the common misconception that big data is only about volume. Volume is certainly part of the issue organizations are facing. However, the big story in big data is the complexity and additional processing required to make “unstructured” data actionable through analytics. This is where procedural frameworks like MapReduce can help. Here is a great post by Teradata’s own Bill Franks about unstructured data which helps describe the requirements unstructured data demands in the context of analytics.

As Franks notes, “the thought of using unstructured data really shouldn’t intimidate people as much as it often does.” Read more to learn why.

 



28
Feb
By Stephanie in Interactive marketing on February 28, 2012
   

On a recent webinar, Rob Bronson from Forrester Research pointed out that 45% of Big Data implementations are in marketing. One of the use cases we hear about most from customers is the need to move from single-touch attribution methods like last-click and first-click to multi-channel, multi-touch attribution. Today we announced an extension of our Digital Marketing Solutions to deliver multi-touch attribution.

When I speak with customers about moving to multi-touch attribution it feels like hearing about HDTV for the first time.   More clarity, more detail, and a richer experience that is more like the real-life experience of consumers.  So, multi-touch attribution is basically the HD equivalent of single-touch attribution.

What’s different? First of all, consumers interact across many touch-points: social, mobile, search, and websites, as well as offline channels. Most existing attribution solutions look at multiple touch-points within a single channel, like an ad network or web visitors. With a Big Data Analytics approach it is easier to blend more channels into the mix and find customer connections.

This is critical today, because it better reflects the customer journey.   To be customer-centric, it is critical to be able to look at your brand through the eyes of the consumer.  A few years ago, this was impossible or at least difficult and expensive.  Now Big Data marketing analytics makes it possible to see the multi-channel journeys with incredible clarity.

As consumers dynamically adopt new technologies, keeping up with them is one of the biggest challenges for today’s marketers. To do that, you can’t be stuck in legacy single-touch or annual reviews of attribution. Big Data Analytics makes it possible to discover new patterns, test new programs and iterate to optimize in the time scales that the market demands.

An additional value is that Big Data Analytics can deliver a 3D-type enhancement to attribution.  Teradata Aster gives you the ability to use different measures for each touch point so you can use uniform, variable or exponential weightings in your model in order to test and iterate to get the right approach for your business.
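As a rough illustration of what uniform, variable and exponential weightings can mean, here is a minimal Python sketch. It is not the Teradata Aster implementation: the function names, the `decay` parameter, and the sample journey are all assumptions made for this example, and the exponential scheme shown (most recent touch weighted highest) is only one of several reasonable conventions.

```python
def attribution_weights(n_touches, scheme="uniform", decay=0.5, custom=None):
    """Return weights summing to 1 for touch points ordered oldest to newest."""
    if n_touches <= 0:
        raise ValueError("a journey needs at least one touch point")
    if scheme == "uniform":
        raw = [1.0] * n_touches
    elif scheme == "exponential":
        # Assumption: the newest touch gets weight 1 and each earlier
        # touch is discounted by another factor of `decay`.
        raw = [decay ** (n_touches - 1 - i) for i in range(n_touches)]
    elif scheme == "variable":
        # Assumption: "variable" means analyst-supplied weights per touch.
        if custom is None or len(custom) != n_touches:
            raise ValueError("variable scheme needs one custom weight per touch")
        raw = list(custom)
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    total = sum(raw)
    return [w / total for w in raw]

def attribute(journey, revenue, **kwargs):
    """Split revenue across the channels in a journey according to the weights."""
    weights = attribution_weights(len(journey), **kwargs)
    credit = {}
    for channel, weight in zip(journey, weights):
        credit[channel] = credit.get(channel, 0.0) + revenue * weight
    return credit

# Hypothetical journey that ended in a 100.0 conversion.
journey = ["search", "social", "email", "search"]
uniform_credit = attribute(journey, 100.0, scheme="uniform")
exponential_credit = attribute(journey, 100.0, scheme="exponential", decay=0.5)
variable_credit = attribute(journey, 100.0, scheme="variable", custom=[4, 1, 1, 2])
```

Swapping the `scheme` argument is the "test and iterate" step: the same journeys can be re-scored under each model and the results compared against business outcomes.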

Another big difference in using Teradata Aster to analyze attribution is the ability to link to additional data in a Teradata Data Warehouse, such as Revenue, Profit and Lifetime Value, which extends attribution beyond conversion to real bottom-line performance.

Lastly, the ability to integrate into the Aprimo marketing platform makes this insight actionable.   With Aster and Aprimo being part of Teradata, it becomes possible to operationalize your Big Data Analytics more effectively.

The infographic above highlights why some marketers might feel like they have an attribution problem.  You can download a PDF of it here. On the same page, you will also find a white paper we created with Aprimo to go into more detail around what attribution looks like today, and an On-Demand webinar with Forrester and Razorfish that looks at attribution in some depth.  For those who want to read more, check out an addition to this Delicious stack.

So my question for this post is – Do you have an attribution problem? And if so, how can having a multi-touch, multi-channel attribution model make it better?



21
Feb
By Tasso Argyros in Analytic platform, Analytics, Analytics tech, Database, MapReduce on February 21, 2012
   

It has been about seven years since Aster Data was founded, four years since our industry-first Enterprise SQL-MapReduce implementation (first commercial MapReduce offering) and three years since our first Big Data Summit event (the first “Big Data” event in the industry as far as I know). During this whole time, we have witnessed our technology investments take off together with the Big Data market – just think how many people had never even heard the word MapReduce three years ago, and how many swear by it today!

As someone who was caught in the Big Data wave since 2005, I can tell you that the stage of the market has changed significantly during this time – and with it, the challenges that Enterprise customers face. A few years ago, customers were realizing the challenges that piles of new types of data were bringing – big volumes (terabytes to petabytes) and new, complex types (multi-structured data such as weblogs, text, customer interaction data); but at the same time, the opportunities that the new analytical interfaces, like MapReduce, were enabling. Fast forward to today and most enterprises are trying to put together their Big Data strategies and make sense of what the market has to offer – and as a result there is a lot of market noise and confusion: it is usually not clear what use cases apply to traditional technologies versus new; how to reconcile existing technologies with new investments; and what types of projects will give them the highest ROI versus a long and painful failure.

Teradata and Teradata Aster have a high interest in customers being successful with Big Data challenges and technologies, because we believe that the growth of the market will translate into growth for us. Given Teradata’s history of being the #1 strategic advisor to customers around data management and analytics, we only want to offer the best solutions to our customers. This includes our products – which are recognized by Gartner as leading technologies in Data Warehousing and Big Data analytics – but also our expertise in helping customers use complementary solutions, like Hadoop, and making sure that the total solution works reliably and succeeds in tackling big business problems.

With this partnership, we are taking one more step towards this direction. So we are announcing three things:

1. Teradata and Hortonworks will work together to jointly solve big challenges for our customers. This is a win/win for customers and the industry.

2. Our intent to do joint R&D to make it easier for customers that use products from Teradata and Hadoop to utilize these products together. This is important because every enterprise will look to combine new technologies with existing investments, and there is plenty of opportunity to do better.

3. A set of reference architectures that combine Teradata and Hadoop products to accelerate the implementation of Big Data projects. We hope that this will be a starting point that will save enterprises time and money when they embark on Big Data projects.

We believe that all three of the above points will translate into eliminating risks and unnecessary trial and error. We have enough collective experience to guide customers to avoid failed projects and traps. And by helping clear up some of the confusion in the big data market, we hope to accelerate its growth and the benefit to Enterprises that are looking to utilize their data to become more competitive and efficient.



29
Sep
By Tasso Argyros in Analytic platform, Analytics, MapReduce on September 29, 2011
   

One of the great things about starting your own company (if you’re lucky and your company does well) is that you take part in the evolution of a whole new market, from its nascent days to its heyday. This was the case with Aster and the “Big Data” market. Back when we started Aster, in 2005, MPP systems that could store and analyze data using off-the-shelf servers were still a pretty new concept. I also recall that in 2008, when we first came out with our native in-database MapReduce support (and our SQL-MapReduce® technology), we had to explain to most people what MapReduce even was. In 2009, we came out with the first Big Data event series, “Big Data Summit”, because we knew we were doing something new and wanted a term to describe it. “Big Data” caught on more than we had imagined back then, and the rest is history. Product innovation was at the core of Aster’s existence, and we kept pushing ourselves and our product to become the best platform for enterprise-class data analytics using both SQL and MapReduce as first class citizens on one analytic platform.

Today there is a lot of innovation in the big data market. However, we see a “chasm” between the SQL technologies—which are very enterprise-friendly—and the new wave of open source big data or “NoSQL” software which is used extensively by engineering organizations. In the middle is a very large number of enterprises trying to understand how they can use these new technologies to push their analytical capabilities beyond purely SQL, while at the same time utilizing their existing investments in technologies and people. This is the problem that Aster solves.

With last week’s announcement, the launch of our Teradata Aster MapReduce solutions which include Aster Database 5.0 software (formerly Aster nCluster) and our new Aster MapReduce Appliance, we bring to market the best answer for the organizations that are “caught in the middle.” Unlike SQL-only systems focused primarily on analyzing structured data, our database and appliance provide support for native MapReduce which enables a new generation of analytics, such as digital marketing optimization, social graph analysis, fraud detection based on customer behavior, etc. Our newly extended libraries of pre-built MapReduce analytical functions allow such applications to be developed with significantly less time and cost versus other MapReduce technologies. And, unlike other MapReduce-based systems, we offer full SQL support, integration with all major BI and ETL vendors and a data adaptor to EDW systems that allows enterprises to utilize existing tools and skills to bring big data analytics to their businesses. Finally, with our new appliance, we leverage Teradata’s strength and engineering to provide a proven and performance-optimized system for businesses to start analyzing untapped diverse data while cutting down on time, cost and frustration!

As we move forward, Aster is committed to being the leader in SQL and MapReduce analytics for multi-structured data. Having spent 6 years in this market, we believe that it’s not just the coolest technologies that will win, but the ones that make it easier for business analysts and data scientists within organizations to solve their business problems and innovate with analytics. With the launch of our new Teradata Aster solutions, including the revamped SQL-MapReduce interfaces and the new Aster MapReduce appliance, we are pushing the state of the art in this direction (or as my marketing team likes to say – “bringing the science of data to the art of business”). :)



03
Aug
By Mayank Bawa in Analytic platform on August 3, 2011
   

The world of big data will benefit tremendously from a hybrid big data platform. Teradata’s Aster Data nCluster provides such a hybrid big data platform.

It enables multi-structured data to be stored natively in the database. Therefore, we can store relational data as tables with rows and columns. We can store PDF documents as PDF documents, HTML pages as HTML pages – and the same with Java objects, JPG files, Word documents, GIS data, and others.

It enables multi-structured data to be automatically (dynamically) interpreted natively in the database. For example, we can process PDF data to retrieve the various text blocks in that document, HTML pages to retrieve its content, and JPG files to render images or extract features. In other words, we can interact with the data in its native form to leverage the structure inherent in the stored data.

The final piece is that it enables a human or application user to step across the different structures seamlessly. For example, you can write a query that:

  1. Identifies your valuable customers by analyzing the payment history table
  2. Analyzes and interprets customer sentiment by analyzing logs of customer calls
  3. Builds a decision tree to determine the most common problem detected in customer logs
  4. Builds a linear regression model to predict the loss in revenue that can be prevented by solving customers’ problems and the cost of acquiring net new customers to overcome the losses

This can all be done in one workflow and one session. Impressive?

We live in interesting times. The future is opening up in front of us.



28
Jul
By Mayank Bawa in Analytic platform, Analytics on July 28, 2011
   

I wrote earlier that data is structured in multiple forms. In fact, it is the structure of data that allows applications to handle it “automatically” – as an automaton, i.e., programmatically – rather than relying on humans to handle it “semantically”.

Thus a search engine can search for words, propose completion of partially typed words, do spell checking, and suggest grammar corrections “automatically”.

In the last 30 years, we’ve built specialized systems to handle each data structure differently at scale. We index a large corpus of documents in a dedicated search engine for searches, we arrange lots of words in a publishing framework to compose documents, we store relational data in an RDBMS to do reporting, we store emails in an e-discovery platform to identify emails that satisfy a certain pattern, we build and store cubes in a MOLAP engine to do interactive analysis, and so on.

Each such system is a silo – it imposes a particular structure on big data, and then it leverages that structure to do its tasks efficiently at scale.

The silo approach imposes fragmentation of data assets. It is expensive to maintain these silos. It is inefficient for humans and programs to master these silos – they have to learn the nuances of each silo to become an expert in exploiting it. As a result, we have all kinds of data administrators – a cube expert, a text expert, a spreadsheet expert, and so on.

The state of data fragmentation reminds me of the “dedicated function machines” that pre-dated the “Personal Computer”. We used to have electronic type-writers that would create documents, calculators that would calculate formulae, fax machines that would transmit documents, even tax machines that would calculate taxes. All of these machines were booted to relic-status at a museum by a general-purpose computer – the functions were ported on top of its computing framework and the data was stored in its file system. The unity of all of these functions and its data on the general-purpose computer gave rise to “integration” benefits. It made tasks easier: we can now fill our tax forms in (structured form-based) PDF documents, do tax calculations, and file taxes by transmitting the document – all on one platform. Our productivity has gone up. Indeed, the assimilation of data is leading to net new tasks that were not possible before. We can let programs search for previous year’s filings, read the entries, and populate this year’s forms from previous year’s filing to minimize data-entry errors.

We have the same opportunity in front of us now in the field of big data. For too long, we have relegated functions that work on big data to isolated “dedicated function machines.” These dedicated function machines are bad because they are not “open.” Data in a search engine can only be “searched” – it cannot be analyzed for sentiments or plagiarism or edited to insert or remove references. The data is the same, but each of these tasks requires a “dedicated function machine.”

We have the option to build a general purpose machine for big data – a multi-structured big data platform – that allows multiple structures of data to co-exist on a single platform that is flexible enough to perform multiple functions on data.

Such a platform, for example, would allow us to analyze structured payments data to identify our valuable customers, interpret sentiments of calls they made to us, analyze the most common problem across negative sentiment interactions, and predict the loss in revenue that can be prevented by solving that problem and the cost of acquiring net new customers to overcome the losses. Without a multi-structure big data platform, the above workflow is a 12-18 month cycle performed by a cross-functional team of “dedicated function experts” (CFO group, Customer Support group, Products group, Marketing group) – a bureaucratic mess of project management that produces results too expensively, too infrequently and too inaccurately, making simplifying assumptions at each step as they cannot agree on even basic metrics.

An open “Multi-Structured Big Data Platform” would be hugely enabling and open up vast efficiency and functionality that we can’t imagine today.



13
Jun
By Mayank Bawa in Analytic platform on June 13, 2011
   

The “big data” world is a product of exploding applications. The number of applications that are generating data has just gone through the roof. The number of applications that are being written to consume the generated data is also growing rapidly. Each application wants to produce and consume data in a structure that is most efficient for its own use.  As Gartner points out in a recent report on big data[1], “Too much information is a storage issue, certainly, but too much information is also a massive analysis issue.”

In our data-driven economy, business models are being created (and destroyed) and shifting based on the ability to compete on data and analytics. The winners realize the advantage of having platforms that allow data to be stored in multiple structures and (more importantly) allow data to be processed in multiple structures. This allows companies to more easily 1) harness and 2) quickly process ALL of the data about their business to better understand customers, behaviors, and opportunities/threats in the market. We call this “multi-structured” data, which has been a topic of discussion lately with IDC Research (where we first saw the term referenced) and other industry analysts. It is also the upcoming topic of a webcast we’re doing with IDC on June 15th.

To us, multi-structured data means “a variety of data formats and types.” This could include any data, “structured” or “unstructured”, “relational” or “non-relational”. Curt Monash has blogged about naming such data Poly-structured or Multi-structured. At the core is the ability for an analytic platform to both 1) store and 2) process a diversity of formats in the most efficient means possible.

Handling Multi-structured Data

We in the industry use the term “structured” data to mean “relational” data. And data that is not “relational” is called “unstructured” or “semi-structured.”

Unfortunately, this definition lumps text, csv, pdf, doc, mpeg, jpeg, html, log files as unstructured data. Clearly, all of these forms of data have an implicit “structure” to them!

My first observation is that Relational is one way of manifesting the data. Text is another way of expressing the data – Jpeg, gif, bmp and other formats are structured forms of expressing images. For example, (Mayank, Aster Data, San Carlos, 6/1/2011) is a relational row stored in a table (Name, Company Visited, City Visited, Date Visited) – the same data can be expressed in text as “Mayank visited Aster Data, based in San Carlos, on June 1, 2011.” A geo-tagged photograph of Mayank entering the Aster Data office in San Carlos on June 1, 2011 will also capture the same information.
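The row-versus-text example above can be sketched in a few lines of Python. The `Visit` class and the `to_sentence` helper are hypothetical names chosen for this illustration; the point is only that one fact can be held as a typed relational row and rendered as an English sentence without changing the information.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Visit:
    """One row of the (Name, Company Visited, City Visited, Date Visited) table."""
    name: str
    company_visited: str
    city_visited: str
    date_visited: date

def to_sentence(visit):
    """Express the same row as free text."""
    d = visit.date_visited
    return (
        f"{visit.name} visited {visit.company_visited}, "
        f"based in {visit.city_visited}, on {d:%B} {d.day}, {d.year}."
    )

row = Visit("Mayank", "Aster Data", "San Carlos", date(2011, 6, 1))
sentence = to_sentence(row)
```

Going from the row to the sentence is easy because the row's structure is explicit. Going the other way, from arbitrary text back to a row, requires parsing and interpretation.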

My second observation is that “structure” of data is what makes applications understand the data and know what to do with it. For example, a SQL-based application can issue the right SQL queries to process its logic; an image viewer can decode JPG/GIF/BMP files to render the image; a text-engine can parse subject-object-verbs to interpret the data; etc.

Each application leverages the structure of data to do its processing in the most efficient manner. Thus, search engines recognize the white-space structure in English and can build inverted indexes on words to do fast searches. Relational engines recognize row headers and tuple boundaries to build indexes that can be used to retrieve selected rows very quickly. And so on.
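As a sketch of how a search engine can exploit white-space structure, the following Python builds a tiny inverted index. The function names and the three sample documents are assumptions made for this example, and real engines add stemming, ranking and much more.

```python
from collections import defaultdict

PUNCTUATION = ".,;:!?\"'()"

def build_inverted_index(documents):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        # White space marks the word boundaries in English text.
        for token in text.lower().split():
            word = token.strip(PUNCTUATION)
            if word:
                index[word].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every word in the query."""
    words = [w.strip(PUNCTUATION) for w in query.lower().split()]
    words = [w for w in words if w]
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical documents.
docs = {
    1: "Mayank visited Aster Data in San Carlos.",
    2: "Aster Data builds a big data platform.",
    3: "Big data is a business opportunity.",
}
index = build_inverted_index(docs)
```

A lookup such as `search(index, "big data")` touches only the entries for the query words instead of scanning every document, which is what makes the search fast.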

My third observation is that each application produces data in a structure that is most efficient for its use. Thus, applications produce logs; cameras produce images; business applications produce relational rows; Web content engines produce HTML pages; etc. It is very hard to “Transform” data from one structure to the other. ETL tools have their hands full in just doing transformations from a relational schema to another relational schema. And semantic engines have a hard time “transforming” text to relational forms. All such “across structure” transforms lose information.

Relational databases handle relational structure and relational processing very efficiently, but they are severely limiting in their capabilities to store and process other structures (e.g., text, xml, jpg, pdf, doc). In these engines, relations are a first-class citizen; every other structure is a distant second-class citizen.

Hadoop is exciting in the “Big Data” world because it doesn’t pre-suppose any structure. Data in any structure can be stored in plain files. Applications can read the files and build their own structures on the fly. It is liberating. However, it is not efficient – precisely because it reduces all data to its base form of files and robs the data of its structure – the structure that would allow for efficient processing or storage by applications! Each application has to redo its work from scratch.
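The trade-off can be sketched in plain Python: the data sits as raw text lines, and the application imposes structure each time it reads them. The sample lines and names below are assumptions for illustration; the pattern targets the Apache Common Log Format.

```python
import re
from collections import Counter

# Pattern for the Apache Common Log Format.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_line(line):
    """Build a structured record from one raw line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record

# Hypothetical raw file contents: the storage layer knows nothing about them.
raw_lines = [
    '10.0.0.1 - - [13/Jun/2011:10:00:00 -0700] "GET /index.html HTTP/1.1" 200 1024',
    '10.0.0.2 - - [13/Jun/2011:10:00:05 -0700] "GET /missing HTTP/1.1" 404 -',
    'not a log line',
]

# Every application that reads the file must repeat this parsing step.
records = [r for r in map(parse_line, raw_lines) if r is not None]
status_counts = Counter(r["status"] for r in records)
```

Nothing stops a second application from reading the same file with a different pattern, which is the flexibility. The cost is that the parsing work, and the handling of malformed lines, is redone on every read.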

What would it take for a platform to treat multiple structures of data as first class citizens? How could it natively support each format, yet provide a unified way to express queries or analytic logic at the end-user level so as to abstract away the complexity/diversity of the data and provide insights more quickly? It’d be liberating as well as efficient!


[1] “’Big Data’ Is Only the Beginning of Extreme Information Management”. Gartner Research, April 7, 2011