We live in interesting times!
For the past 30 years, data was used to record business events and report on them. Over the last 5 years, data has moved closer to the business. Data is now used to record business events, report on them, and influence them. We now realize that the more data we record, the more comprehensively it can influence business events.
Hence the excitement around “big data”: it gives each line of business an opportunity to steer business events toward favorable outcomes.
The responsibility of technologists is to provide the right platforms and tools to make influencing the business easy and simple.
TWO relentless forces are playing out in the big data space, and technology has to respond to both.
The first force is the diversity of data. As we record more data, we end up with more formats of data to manage. About 20% is relational, but we also have text, emails, PDFs, Twitter feeds, Facebook profiles, social graphs, call detail records (CDRs), Apache logs, JSON documents, …
The second force is the richness of analytics. As we influence more of the business, we end up with richer analytics to perform. About 20% is SQL, but we also have time series analysis, statistical analysis, geo-spatial analysis, graph analysis, sentiment analysis, entity extraction, …
Note that I am not saying MapReduce cannot perform a diverse set of analytics. MapReduce is a way of programming an analysis, not an analysis in itself: time series, statistical, and geo-spatial analyses each require a different MapReduce program to be written.
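To make the MapReduce point concrete, here is a minimal sketch in Python. It is a toy, single-process imitation of the map, group, and reduce steps, not a real cluster framework, and the `map_reduce` helper, the sample `sales` records, and the mapper and reducer names are all made up for illustration. What it shows is that the plumbing is shared, but each analysis still needs its own hand-written program.

```python
from collections import defaultdict
from statistics import mean


def map_reduce(records, mapper, reducer):
    """Toy single-process MapReduce: map, group by key, then reduce.

    Hypothetical helper for illustration only; a real framework would
    distribute these steps across a cluster.
    """
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


# Made-up sample data: (timestamp "YYYY-MM-DD HH:MM", store, sale amount)
sales = [
    ("2013-01-07 09:15", "north", 20.0),
    ("2013-01-07 09:40", "south", 35.0),
    ("2013-01-07 10:05", "north", 10.0),
    ("2013-01-07 10:30", "north", 30.0),
]


# Analysis 1 (time series): number of sales per hour.
def hourly_mapper(record):
    timestamp, _store, _amount = record
    yield timestamp[:13], 1  # key is "YYYY-MM-DD HH"


def count_reducer(_key, values):
    return sum(values)


# Analysis 2 (statistical): mean sale amount per store.
def store_mapper(record):
    _timestamp, store, amount = record
    yield store, amount


def mean_reducer(_key, values):
    return mean(values)


sales_per_hour = map_reduce(sales, hourly_mapper, count_reducer)
mean_per_store = map_reduce(sales, store_mapper, mean_reducer)
print(sales_per_hour)
print(mean_per_store)
```

Even in this tiny case, moving from a count per hour to a mean per store means writing a new mapper and a new reducer. Someone still has to write and maintain that code for every new question the business asks.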
Today, the platforms and tools for big data are very complex. They expect line-of-business owners to write programs to manage different forms of big data, to write sophisticated programs to analyze big data, to master the management and administration of big clusters, and to be self-sustaining in managing data quality. This last point is very important: data values change over time. We have to keep values consistent; otherwise our analysis will be wrong and our influence on the business will be negative. This is the garbage-in, garbage-out rule of computing.
As a result, big data is in danger of entering the DIY (do-it-yourself) space. A line of business is now expected to live with this equation: big data = big clusters = big administration = big programs = big friction = low influence.
We have to acknowledge these challenges as technologists. If we let big data remain a DIY solution, only pockets of the enterprise will embrace it, and the business leaders who are not technology-savvy will be left out of the opportunity.
We have to simplify this equation. We need to enable line-of-business owners to benefit from big data far more easily. We have to make it simpler for business leaders to get from big data to big analytics.
Our goal: big data = small clusters = easy administration = big analytics = big influence.
This entails solving the following problems:
 Make platforms and tools easier to use for managing and curating data. Otherwise, garbage in = garbage out, and you will get garbage analytics.
 Provide rich analytics functions out of the box. Each line of programming you require cuts your reachable audience by 50%.
 Provide tools to update or delete data. Otherwise, data consistency will drift away from the truth as history accumulates.
 Provide applications to leverage data and find answers relevant to the business. Otherwise, the cost of DIY applications is too high, and the work of influencing the business won’t get done.
At Teradata Aster, we are continuing to lead the big data revolution. We have led it for the past 5 years and have helped shape the market and its technologies. We are convinced that, over the coming 5 years, the path to big data success is to connect it with big analytics.